Creating a Megabus “Scraper” with Node

Mathew Varughese
Aug 14, 2017


Have you ever heard of Megabus? It is a bus line that offers $1 fares.

Usually, the fare for a bus ride with Megabus is $35 one way. However, each trip offers a $1 fare.

Try going to megabus.com and searching for a dollar ticket. The farther in advance you book, the better your chances of getting a cheap ticket. When I took the screenshot below, it was August 13th.

A $1 trip from Pittsburgh to Philly? That is a steal. The toll alone costs $27! (Disclaimer: you also pay a $2.50 booking fee when you actually purchase the ticket, so the total comes to $3.50. Still a steal.)

Here are the problems I found with searching for these dollar tickets:

  1. You have to click each day to see its ticket deals, which slows down the search.
  2. You have to keep rechecking the website to see when the latest available date opens up.
  3. It’s a tedious task, and tedious tasks mean we should write code!

So why not try to make a scraper? The first thing I did was check the Network tab in the developer tools.

The Network tab can come in clutch!

With the Network tab, I can see what requests are made every time I click something. That brings us to the next section of this article. Let’s open up Postman, a tool that lets us make HTTP requests and inspect the results easily.

Routes (And I ain’t talking Bus Routes)

So here’s a quick explanation of HTTP requests.

Go to www.google.com. You got back some HTML. Nice! Now what happened?

Your web browser made a GET request, and Google’s servers responded with the HTML.

But servers can respond with more than HTML. If you make a GET request to (try opening the link up)

https://us.megabus.com/journey-planner/api/journeys/travel-dates?originCityId=128&destinationCityId=127

you get back our good friend JSON, which is easy to parse! By analyzing the Network tab and clicking around, I found the following about Megabus’s API.

Each city has an ID. For example, Pittsburgh’s ID is 128 and Philadelphia’s is 127.

The API endpoint for getting the prices for a date is

/journey-planner/api/journeys

As you will see below when we make the request in Postman, the endpoint takes the following query parameters (formatted as JSON for easy viewing):

{
  "originId": 128,
  "destinationId": 127,
  "departureDate": "2017-10-02",
  "totalPassengers": 1,
  "concessionCount": 0,
  "nusCount": 0,
  "days": 1
}

Postman comes in even more clutch than the Network tab

From Postman, we can mess around with the Megabus API to see what kind of data we get back. If you click Params, you can also see a nicely formatted list of the query parameters you are sending. I urge you to open up the Network tab on megabus.com, see what requests are made, open them up in Postman, and mess around with the query. It is interesting to see how the data is formatted.
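If you would rather build that query in code than in Postman, here is a minimal sketch using the URL and URLSearchParams classes built into Node. It only assumes the base URL and parameter names shown above as they worked in 2017; the defaults are copied from the example query, and the live API may no longer accept them. The journeysUrl name is my own.

```javascript
// Sketch: build the /journey-planner/api/journeys query from the parameters
// shown above. Names and defaults come from the 2017 example query.
const BASE_URL = 'https://us.megabus.com';

// The two city IDs mentioned in the article.
const CITY_IDS = { pittsburgh: 128, philadelphia: 127 };

function journeysUrl({
  originId,
  destinationId,
  departureDate, // 'YYYY-MM-DD'
  totalPassengers = 1,
  concessionCount = 0,
  nusCount = 0,
  days = 1,
}) {
  const url = new URL('/journey-planner/api/journeys', BASE_URL);
  const params = {
    originId, destinationId, departureDate,
    totalPassengers, concessionCount, nusCount, days,
  };
  // searchParams handles encoding and keeps insertion order.
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

console.log(journeysUrl({
  originId: CITY_IDS.pittsburgh,
  destinationId: CITY_IDS.philadelphia,
  departureDate: '2017-10-02',
}));
// prints https://us.megabus.com/journey-planner/api/journeys?originId=128&destinationId=127&departureDate=2017-10-02&totalPassengers=1&concessionCount=0&nusCount=0&days=1
```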

Node

So making requests in Node is not that hard once you see an example.

You can use Node’s built-in http module, but I like request.

Here is a quick gist you can run that saves the response to a JSON file.
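The gist itself isn’t reproduced here. As a rough stand-in, here is a minimal sketch that uses Node’s built-in https and fs modules instead of request (which has since been deprecated). The function names getJson and saveJson are my own, and the travel-dates URL in the usage comment is the 2017 endpoint from above, so it may no longer respond the same way.

```javascript
// Sketch: GET a JSON endpoint with the built-in https module and save it to disk.
const https = require('https');
const fs = require('fs');

// GET a URL and resolve with the parsed JSON body.
function getJson(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is freed
        reject(new Error(`Request failed with status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err); // the response was not valid JSON
        }
      });
    }).on('error', reject); // network errors (DNS, refused connection, ...)
  });
}

// Write data to a file as pretty-printed JSON.
function saveJson(data, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
}

// Usage (makes a real request):
// getJson('https://us.megabus.com/journey-planner/api/journeys/travel-dates?originCityId=128&destinationCityId=127')
//   .then((data) => saveJson(data, 'travel-dates.json'))
//   .catch((err) => console.error(err.message));
```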

Now, to make this actually useful, we have to increment departureDate and make multiple requests. We can also parse the data into objects and filter them by price.
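Here is a minimal sketch of the date and filtering steps, assuming dates are 'YYYY-MM-DD' strings like the departureDate parameter. It uses plain Date arithmetic in UTC rather than moment, and the journey shape (an object with a numeric price field) is a hypothetical placeholder; check a real response in Postman for the actual field names.

```javascript
// Sketch: consecutive departure dates plus a price filter.
// The { price } field on each journey is a hypothetical placeholder.

// Return `count` consecutive dates as 'YYYY-MM-DD' strings, starting at `start`.
function nextDates(start, count) {
  const dates = [];
  const day = new Date(`${start}T00:00:00Z`); // parse as UTC to avoid DST shifts
  for (let i = 0; i < count; i++) {
    dates.push(day.toISOString().slice(0, 10));
    day.setUTCDate(day.getUTCDate() + 1); // rolls over months and years
  }
  return dates;
}

// Keep only journeys whose price is at most maxPrice (default: the $1 fares).
function cheapFares(journeys, maxPrice = 1) {
  return journeys.filter((journey) => journey.price <= maxPrice);
}

// Prints the four dates from 2017-10-30 through 2017-11-02.
console.log(nextDates('2017-10-30', 4));
```

With moment, the increment step would look like moment(date).add(1, 'days').format('YYYY-MM-DD'); the UTC approach above just avoids the extra dependency.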

I am currently working on something on my GitHub to achieve this. If you wish to try it yourself, it is a fun little coding project. You will probably find the npm module moment helpful for incrementing the date. I also opted to use the request-promise module because I am familiar with Promises.
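To make multiple requests, one approach is to loop over the dates with async/await. This sketch uses native Promises rather than request-promise. fetchDay is a hypothetical function you supply that returns a Promise for one date’s results (for example, by building the journeys query URL and parsing the JSON response), and the half-second pause between requests is an arbitrary choice.

```javascript
// Sketch: sequential requests with native Promises (async/await).
// fetchDay is a hypothetical (date) => Promise function supplied by the caller.

// Resolve after ms milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Query each date in turn, pausing between requests to go easy on the server.
// Resolves to an object mapping each date to its result (or an error message).
async function searchDates(dates, fetchDay, delayMs = 500) {
  const results = {};
  for (const date of dates) {
    try {
      results[date] = await fetchDay(date);
    } catch (err) {
      results[date] = { error: err.message }; // keep going if one day fails
    }
    await sleep(delayMs);
  }
  return results;
}

// Offline stub standing in for a real HTTP request.
const stubFetchDay = async (date) => ({ date, journeys: [] });

searchDates(['2017-10-02', '2017-10-03'], stubFetchDay, 0)
  .then((results) => console.log(results));
```

Running the dates one at a time is slower than firing them all at once with Promise.all, but it is gentler on the server.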

Currently, I am able to produce the following with a Node script, which is actually useful!

My next idea involves creating a cron job and connecting a database to check when new $1 fares show up. View my progress on that here: https://github.com/varughese/megabus-ticket-finder
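A minimal sketch of the “check when new $1 fares show up” step, assuming each run already has a list of $1 fares. A JSON file stands in for the database, and the fare fields used to build a key (date, departure) are hypothetical.

```javascript
// Sketch: remember which $1 fares were seen on earlier runs and report new ones.
// A JSON file stands in for a database; the fare fields are hypothetical.
const fs = require('fs');

// Stable key for a fare so runs can be compared.
const fareKey = (fare) => `${fare.date}|${fare.departure}`;

// Return the fares whose key is not already in `seen`.
function findNewFares(seen, fares) {
  return fares.filter((fare) => !seen.has(fareKey(fare)));
}

// Load previously seen keys (an empty set on the first run).
function loadSeen(filePath) {
  if (!fs.existsSync(filePath)) return new Set();
  return new Set(JSON.parse(fs.readFileSync(filePath, 'utf8')));
}

function saveSeen(filePath, seen) {
  fs.writeFileSync(filePath, JSON.stringify([...seen]));
}

// One scheduled run: return the new $1 fares and remember them for next time.
function checkForNewFares(filePath, currentFares) {
  const seen = loadSeen(filePath);
  const fresh = findNewFares(seen, currentFares);
  for (const fare of fresh) seen.add(fareKey(fare));
  saveSeen(filePath, seen);
  return fresh;
}
```

Scheduling could then be a crontab entry such as 0 * * * * node check-fares.js to run it every hour, where check-fares.js is a hypothetical script that fetches the current $1 fares and calls checkForNewFares.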

Thanks for reading my first Medium article!
