How to turn any website into a RESTful API

Romain Sylvian
Feb 9, 2016


What is a RESTful API?

In case you are not familiar with what a RESTful API is, I suggest you take a look at this WebConcepts YouTube video.

Why would I need to turn a website into an API?

Even though most of the services we use (Facebook, Twitter, YouTube …) already have APIs, some don't provide an easy way to access their data. As a developer, it can be useful, from time to time, to be able to access this data in a RESTful way.

What you'll find below is a common practice called scraping. Scraping public data is not illegal, but you should make sure you are permitted to scrape a site's content before doing so. A thorough evaluation of the legal issues involved can be found here.

Prerequisites

In the following explanations, we'll use Node.js & Express as the technology stack. If you don't know anything about those technologies, you should still be able to understand and adapt the provided code.

After making sure that Node.js & npm are installed on your environment, you'll have to install these three dependencies:

$ npm install express
$ npm install x-ray
$ npm install q

Our Kickstarter API

Let's train ourselves by writing a personal Kickstarter API using only scraping techniques. In this article we'll focus on just two features:

  • GET /projects -> will return an array of popular Kickstarter projects
  • GET /projects/:type -> will return an array of Kickstarter projects sorted by type (newest, magic, popularity …)

Part 1: Express.js

This is oversimplifying it, but Express.js is to Node.js what Ruby on Rails is to Ruby. It's a lightweight web application framework that helps organize your web application architecture on the server side. It's commonly used to build RESTful APIs in Node.js.

Here is our index.js file:
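Below is a minimal sketch of what such an index.js can look like, assuming the scraping logic lives in the scraping.js module we write in Part 2 and exposes a promise-returning scrape(type) helper (the helper name and the error handling here are illustrative, not necessarily the original code):

// index.js: Express app exposing the two routes described above.
var express = require('express');
var scraping = require('./scraping'); // assumed module, defined in Part 2

var app = express();

// GET /projects -> popular Kickstarter projects
app.get('/projects', function (req, res) {
  scraping.scrape('popularity')
    .then(function (projects) { res.json(projects); })
    .catch(function (err) { res.status(500).json({ error: err.message }); });
});

// GET /projects/:type -> projects for the given sort type (newest, magic, ...)
app.get('/projects/:type', function (req, res) {
  scraping.scrape(req.params.type)
    .then(function (projects) { res.json(projects); })
    .catch(function (err) { res.status(500).json({ error: err.message }); });
});

app.listen(3000, function () {
  console.log('Kickstarter API listening on port 3000');
});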

That’s pretty much it for the Express part. Nothing crazy if you already know how to use this framework.

Part 2: Scraping

To perform the scraping part of our project, we are going to use the x-ray node module. This module depends on the cheerio module but also includes some methods to automate common scraping tasks. One of my favorite features is the support for dynamic pages: with x-ray's PhantomJS driver, scraping them is as easy as pie!
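For reference, wiring up that driver takes just a couple of lines; this assumes the companion x-ray-phantom module (installed separately with npm install x-ray-phantom), and we won't actually need it for Kickstarter:

// Optional: render JavaScript-heavy pages through PhantomJS before scraping.
var Xray = require('x-ray');
var phantom = require('x-ray-phantom');

var x = Xray().driver(phantom());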

Let’s define our scraping.js file:
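As with index.js, this is a minimal sketch: the field selectors inside the collection (title, link, author, image) and the id cleanup are illustrative guesses at Kickstarter's markup rather than the exact original code, but the overall shape follows the explanation that comes next.

// scraping.js: scrapes Kickstarter's discover page with x-ray,
// wrapped in a Q promise so index.js can chain on the result.
var Xray = require('x-ray');
var Q = require('q');

var x = Xray();

function scrape(type) {
  var deferred = Q.defer();
  var url = 'http://www.kickstarter.com/discover/advanced?sort=' + (type || 'popularity');

  // Second parameter: each project lives in a node with the "project" CSS class.
  // Third parameter: a collection of jQuery-like selectors; "selector@attribute"
  // picks an attribute, a plain selector picks the innerText.
  x(url, '.project', [{
    title: '.project-title',
    link: '.project-title a@href',
    author: '.project-byline',
    image: '.project-thumbnail img@src'
  }])(function (err, projects) {
    if (err) {
      return deferred.reject(err);
    }

    // Data cleanup: derive an id for each project from its link.
    projects.forEach(function (project) {
      if (project.link) {
        project.id = project.link.split('?')[0].split('/').pop();
      }
    });

    deferred.resolve(projects);
  });

  return deferred.promise;
}

module.exports = { scrape: scrape };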

Here is how we are using Xray():

  • The first parameter is the URL you want to scrape. In our case, http://www.kickstarter.com/discover/advanced?sort= + the type provided (popularity will be the default if none is given).
  • The second parameter is a jQuery-like selector. It has to select the HTML node that wraps each of your items. In our case, all the items share the project CSS class.
  • Finally, the third parameter is a collection of collections. Again, each field is a jQuery-like selector that can also select attributes. The syntax for selecting an attribute is selector@attribute. If you do not supply an attribute, the default is to select the innerText.

In the Xray() callback, some data cleanup is performed in order to extract the id:
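In the sketch above, that cleanup boils down to a few lines inside the callback (how the id is derived from the link is an assumption; the original cleanup may differ):

// Derive an id for each project from the last path segment of its link.
projects.forEach(function (project) {
  if (project.link) {
    project.id = project.link.split('?')[0].split('/').pop();
  }
});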

The promise is then resolved with the whole object containing all the data.
If you followed all the steps correctly, you should be able to run your API locally by typing:

$ node index.js
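You can then query both endpoints from the command line, for example with curl:

$ curl http://localhost:3000/projects
$ curl http://localhost:3000/projects/newest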

Navigating to http://localhost:3000/projects will give you the scraped projects as JSON.

(Tip: the JSONView Chrome extension, https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc, makes raw JSON much easier to read in the browser.)

Looks like it’s working!

Conclusion

That's it! You successfully learned how to turn any website into an API. A lot more work is necessary if you want to turn the whole Kickstarter website into a RESTful API, and getting the projects is only the first step!

If you are interested in learning more about scraping techniques with Node, I recommend taking a look at the official x-ray documentation. You'll be able to learn more about concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly. If you want to check the code we wrote during this article, it is available on Gist here and here.
