Scraping websites with NodeJS

Yonathan Meguira
Jul 10, 2017

Beware: scraping may be forbidden on certain websites (Facebook, for instance) and can even be illegal, just like downloading movies or crossing the street on a red light. So, beware.

Disclaimer: in a real project, you would put your methods in a separate file from the entry point of your Node app.

For a ‘real life’ demonstration integrated with Heroku, please refer to my repository.

Scraping is really useful and can come in pretty handy at times.

About a week ago, I was tasked with displaying a cybersecurity news feed on a dashboard. Here’s the deal: cybersecurity news APIs are not that common, and the few you can find are paid.

So I went ahead and built my scraping script.

If you happen (you never know) to be looking for a cybersecurity news feed in JSON format, check out the live Heroku app here: https://cyber-news-scrapr.herokuapp.com/news

So, we are going to need:

  • node (of course)
  • npm
  • cheerio (npm install --save cheerio)
  • request (npm install --save request)
  • express (npm install --save express)
  • a tiny brain (that will do)

The process is fairly simple:

We’ll request a web page with the ‘request’ package. Once we have the page, we’ll parse it with ‘cheerio’, a jQuery-like DOM parser for Node (super useful when it comes to parsing), iterate through every matching DOM element (in this case, each article) and turn each one into our custom article object.

After each iteration, our article object is pushed into an array.

This array is eventually returned and served as JSON over a port with the help of express.

So here is the full code:

PS: If you want to know how to use Heroku with NodeJS, please watch this video.
