
Showing posts from March, 2014

Making a scraper in Node.js...

Let's make a multi-link crawler that walks through a multi-page query on a website listing job opportunities. First, let's turn it into a module, in a file separate from our Node server (whatever that server is). We will load it with

var worker = require( './worker' );

This line goes in our Express router, or in whatever you are using for routing. The ./ prefix matters: without it, Node searches node_modules for a package called worker.js instead of loading our local file.

Inside the module we need two libraries:

var request = require( "request" );
var cheerio = require( "cheerio" );

One makes HTTP requests easier (request), and the other brings a jQuery-like API to the server side (cheerio). Next we need the list of pages of the main website. This site shows the jobs published that day as a paged list behind a single link, and builds its pagination on top of that link, so...

var url = "http://www.bumeran.com.ar/empleos-publicacion-hoy.html";

Now, to export this function to the Node server, we are going to write a "start" function.