Webscraping with NodeJS and Cheerio

Adi Nugroho
2 min read · Jul 7, 2016

When confronted with web scraping, most developers reach for Scrapy (Python), Nokogiri (Ruby), or a framework based on it such as Gokogiri (Go). Nokogiri and Scrapy require us to write selectors as strings that mirror the HTML structure. This can feel awkward, especially for those who mostly do front-end work with JavaScript and jQuery.

Fortunately, there’s a JS library, Cheerio, that makes scraping easier and offers a jQuery-style way of retrieving DOM elements. If you are familiar with jQuery traversal methods like children(), first(), next(), and siblings(), this will be a piece of cake.

Now let’s start by scraping a banking site to retrieve its IDR currency rates and deliver them as JSON.

Visit the site first and learn about its DOM structure (use Chrome DevTools or the Firefox developer tools): http://www.bca.co.id/id/Individu/Sarana/Kurs-dan-Suku-Bunga/Kurs-dan-Kalkulator

  • Install Cheerio. It parses the HTML and gives us the jQuery-style selectors.
npm install cheerio
  • Install Axios, because we need an HTTP client. It’s Promise-based, so we can chain asynchronous requests and responses easily.
npm install axios
  1. Use axios to open the URL that we want to scrape.
  2. Load the response data (the HTML) into cheerio and assign the result to a variable. I assign it to a variable named ‘$’ so we get a nice jQuery-like accessor.
  3. Select the table rows (tr) inside the table with class .text-right. This returns multiple rows, and we iterate through each row using each.
  4. Get every data cell in each row and assign it to a key-value object.
    - children : get the children of the selected element.
    - first : reduce the selection to its first element (after children, that is the first child).
    - eq(n) : get the element at index n, counting from 0.
    - text : get the text content as a string.
  5. Return the object.
  6. Use then to chain onto the request (a Promise). then receives the value returned by the previous operation as its parameter.
  7. Print the object.

Adi Nugroho

Product Maker... Learner... iOS Swift & Java + Kotlin Android Enthusiast. Blog in English and Bahasa Indonesia.