Unlocking the Power of JSON APIs in Ruby

One of the first labs we had at Flatiron School involved using the rest-client and JSON gems to access data in a JSON object via an API. The exercise was meant to strengthen our skills in working with hashes, but coming from a research background, I could see how powerful JSON parsing would be as I level up my programming wizard powers.

With that in mind, I decided to dig a little deeper into JSON parsing in Ruby, but before I go down that rabbit hole, here’s a little background on what I’m talking about.

JavaScript Object Notation (JSON)

First, JSON, or JavaScript Object Notation, is a format for structuring data that can be read as a JavaScript object. Each object is wrapped in curly braces, like this: {stuff}. Inside is a series of key/value pairs: each key name is separated from its value by a colon, and each pair is separated from the next by a comma. Values can be strings, numbers, booleans, null, arrays, or even other JSON objects. The JSON format maps wonderfully to the hash data type in Ruby. I’ve included an example below for reference.

{
  "key" : "value",
  "name" : "Harry Potter",
  "wizard status" : "baby wizard",
  "age" : 11,
  "friends" : ["Hermione", "Ron"]
}
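To see how neatly this maps onto a Ruby hash, here is a minimal sketch using the json library from Ruby’s standard library. The string is just the example above wrapped in a heredoc; JSON.parse turns JSON objects into Hashes with string keys and JSON arrays into Arrays.

```ruby
require "json"

# The example object from above, as a raw JSON string
raw = <<~JSON
  {
    "key" : "value",
    "name" : "Harry Potter",
    "wizard status" : "baby wizard",
    "age" : 11,
    "friends" : ["Hermione", "Ron"]
  }
JSON

# JSON.parse returns a Hash whose keys are strings
harry = JSON.parse(raw)

puts harry.class            # prints Hash
puts harry["name"]          # prints Harry Potter
puts harry["age"] + 1       # prints 12 (JSON numbers become Integers)
puts harry["friends"].last  # prints Ron
```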

Application Programming Interface (API)

An Application Programming Interface, commonly referred to as an API, is a pretty broad term for a set of rules describing how computer programs should interact. In the case of accessing JSON, an API is just a program that tells us about the available data structures and lets us query them. For a deeper dive on APIs, check out this freeCodeCamp blog post: https://medium.freecodecamp.org/what-is-an-api-in-english-please-b880a3214a82

Ruby JSON Parser Gems

Anyhow, to begin my dive into parsing JSON data, I did some basic Googling to see what gems were out there. It turns out there are over 25 different gems with varying capabilities and complexity. Twilio gives a great overview of the top four, which include:

  • net/http: a library built right into Ruby’s standard library. It is a little clunkier than some of the other options, but no gems required!
  • HTTParty: a simple interface with lots of convenient methods, including parsed_response, which parses JSON responses without you having to call the JSON methods explicitly. HTTParty was actually introduced in a later Flatiron lab on scraping.
  • rest-client: the tried-and-true package used in my early lab. This is also a simple package, but it does require you to load the JSON library when parsing JSON.
  • Faraday: this package allows for more control, letting users specify the HTTP adapter and handle other formats like XML and CSV. I’ll be looking into this in the future as my programming wizard powers progress and I have more complex needs.
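For comparison, here is a rough sketch of the no-gems route from the first bullet. Net::HTTP.get returns the response body as a plain string, so you call JSON.parse yourself, which is the step HTTParty’s parsed_response takes care of for you. The method names fetch_json and parse_records are made up for illustration.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: turn a raw JSON body string into Ruby objects
def parse_records(body)
  JSON.parse(body)
end

# Hypothetical helper: fetch a URL with net/http (no gems) and parse the body.
# Net::HTTP.get returns the response body as a String.
def fetch_json(url)
  body = Net::HTTP.get(URI(url))
  parse_records(body)
end

# Usage (this makes a real network request):
# records = fetch_json("https://data.cityofnewyork.us/resource/3ktt-gd74.json")
```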

Given these options, I decided to go with HTTParty because I’m not doing anything particularly complicated at this point, and why not cut out that extra step of loading the JSON library? As I installed the gem, I was pleased to spot a reference to the party king himself, Andrew W.K.

When you HTTParty, you must party hard!

Choosing a JSON dataset to explore

I’ve been fishing around for easily accessible data online and really like the trove at NYC OpenData. The site is maintained by the NYC municipal government and houses thousands of datasets, accessible via API, on everything you could imagine related to New York City. As a newcomer to the city and a data nerd, I find this cool as hell.

After a bit of browsing on the site, I came across a dataset on LinkNYC access points, New York’s free Wi-Fi and charging stations. I know there are a lot in Manhattan, but I’ve always wondered how common they are across the five boroughs. To start, let’s parse the JSON with HTTParty.

Get to parsing!!

At the top of the NYC OpenData LinkNYC page, there is an option labeled API. Clicking on it provides a URL for the JSON API, called the API endpoint. We’ll need this URL to access the JSON data with HTTParty.

Click API, then copy the API Endpoint URL.

Parsing the data is a breeze with HTTParty: all you need to do is call HTTParty’s get method on the API endpoint URL we copied above to fetch the page contents, and store the result in a variable (I called mine “page_data”). From there, calling the “parsed_response” method on the page contents converts them into an array of Ruby hashes, one per record in the data file. I stored that array in a variable called responses. Here’s the code I used:

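require "httparty"

# API endpoint copied from the NYC OpenData LinkNYC page
url = "https://data.cityofnewyork.us/resource/3ktt-gd74.json"

# Fetch the endpoint, then let HTTParty turn the JSON body into Ruby objects
page_data = HTTParty.get(url)
responses = page_data.parsed_response  # an Array of Hashes, one per record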

Now that we’ve parsed the data, we can start exploring. As a first step, checking the number of hashes in the array will tell us how many records, and thus how many access points, there are:

responses.length
=> 1000

This shows us there are 1000 records. It seems a little strange to get such a round number; suspiciously tidy numbers like this often indicate that something is awry in a dataset. We can check the main NYC OpenData LinkNYC Locations dataset page for any indication of how many records there should be.
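That “suspiciously round number” instinct is easy to turn into a habit. Here is a minimal sketch, assuming (as the API documentation turns out to confirm) that the API returns at most 1000 records per request: if we get back exactly that many, we may not have everything. The constant DEFAULT_LIMIT and the method possibly_truncated? are hypothetical names for illustration.

```ruby
# Assumed default page size for this API
DEFAULT_LIMIT = 1000

# Hypothetical helper: a result set that fills the whole page
# is a hint that the API may have cut it off.
def possibly_truncated?(records, limit = DEFAULT_LIMIT)
  records.length >= limit
end

# Made-up sample data: 1000 placeholder records
sample = Array.new(1000) { |i| { "id" => i } }

puts possibly_truncated?(sample)             # prints true
puts possibly_truncated?(sample.first(414))  # prints false
```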

Sure enough, if we scroll down a little further, there is a section titled “What’s in this Dataset?” that tells us there should be 1,414 rows, which correspond to records, or objects in our array, and 18 columns, which map to key-values in each record. What do we do now?!?!

Take a deep breath and look at the documentation! (Or ask someone for help like I did. Thanks, Tim!) Returning to the API section of the NYC OpenData LinkNYC data page, there is a link to the API docs.

In a section of the API documentation called “Paging through data,” we can see that the default limit on the number of records returned is 1000. AHA! The page also indicates that we can change this by adding a query string to the end of the API URL: ?$limit=xxxx&$offset=xxxx. $limit sets the maximum number of records returned, and $offset indicates how many records the request should skip before it starts returning results (i.e., if the offset were 50, you would start at the 51st record). For our purposes, let’s change the limit to 1500 and keep the offset at 0, since we want all the records (note: we need to include both in the query). The revised URL used by the HTTParty get method should be: “https://data.cityofnewyork.us/resource/3ktt-gd74.json?$offset=0&$limit=1500”
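Bumping $limit works here because we know the dataset has 1,414 rows. If the row count were unknown, or larger than the API lets you request at once, a more general approach is to walk through the data page by page with $offset. Below is a minimal sketch; fetch_all and page_url are hypothetical names, and the actual fetching is passed in as a block so the loop can be tried without a network connection. With HTTParty, the block could be { |url| HTTParty.get(url).parsed_response }. Depending on the API, you may also want to sort the results (for example with an $order parameter) so pages don’t shift between requests.

```ruby
# Hypothetical helper: append the $limit and $offset paging parameters
def page_url(base_url, limit:, offset:)
  "#{base_url}?$limit=#{limit}&$offset=#{offset}"
end

# Hypothetical helper: request pages until one comes back short.
# The block receives a page URL and must return an Array of records.
def fetch_all(base_url, limit: 1000)
  records = []
  offset = 0
  loop do
    page = yield(page_url(base_url, limit: limit, offset: offset))
    records.concat(page)
    break if page.length < limit  # a short (or empty) page means we're done
    offset += limit
  end
  records
end

# Usage with HTTParty (makes real network requests):
# all = fetch_all("https://data.cityofnewyork.us/resource/3ktt-gd74.json") do |url|
#   HTTParty.get(url).parsed_response
# end
```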

When we rerun the Ruby code above with this revised URL and check the length, it should be 1414, matching the number of rows reported on the main LinkNYC data page!!

Analysis Time!

Now that we’ve verified we have the data we think we have, we can actually analyze it to find out how many access points there are in each of the five boroughs. First, let’s see what the keys for each record are by calling the .keys method on one of the records.

=> ["boro", "borough_block_lot_bbl", "building_identification_number_bin", "cb_link_id",
    "census_tract_ct", "community_board", "council_district", "latitude", "link_installation",
    "link_installation_status", "link_site_id", "link_smoke_tested_and_activated_a", "location",
    "longitude", "neighborhood_tabulation_area_nta", "postcode", "smallest_ppt", "street_address"]

There are 18 keys, which matches the LinkNYC data page on NYC OpenData. This is another way of saying there are 18 variables per record. It’s also worth pointing out that the LinkNYC data page includes additional information about each of the variables.
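Checking the keys of a single record is a quick first look, but it assumes every record has the same shape. Some APIs (Socrata-based portals like NYC OpenData among them, as far as I can tell) leave a key out entirely when its value is empty, so a more thorough check is to gather keys across all records. A minimal sketch, with made-up sample records standing in for responses:

```ruby
# Made-up sample records standing in for the parsed API response
responses = [
  { "boro" => "Brooklyn", "postcode" => "11201" },
  { "boro" => "Queens" },
  { "boro" => "Bronx", "postcode" => "10451", "street_address" => "1 Example St" }
]

# Every key that appears in at least one record, in first-seen order
all_keys = responses.flat_map(&:keys).uniq

# For each key, how many records actually include it
key_counts = all_keys.to_h { |key| [key, responses.count { |r| r.key?(key) }] }

p all_keys    # prints ["boro", "postcode", "street_address"]
p key_counts
```

If every count equals responses.length, each record carries every key; a smaller count flags fields that are missing for some access points.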

Looking back at the list of keys, we can see that there is a “boro” key. Let’s check the actual value of the boro key for this record:

=> "Brooklyn"

It is indeed what we’d hoped: a borough! From here, we just need to count the records for each of the five boroughs and we’ll have answered our question. If we use the collect method to pull out the borough of each access point, then use the inject method to build a hash counting each borough, we’ll have the information we need.

responses.collect { |response| response["boro"] }.inject(Hash.new(0)) { |counts, boro| counts[boro] += 1; counts }
=> {"Brooklyn"=>197, "Manhattan"=>870, "Queens"=>229, "Staten Island"=>29, "Bronx"=>89}
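The collect/inject combination works on any Ruby version. On Ruby 2.7 or later, Enumerable#tally does the same counting in a single call, and sorting the result makes the ranking easier to read. A sketch with a small made-up sample standing in for responses:

```ruby
# Small made-up sample standing in for the parsed LinkNYC records
responses = [
  { "boro" => "Manhattan" }, { "boro" => "Queens" },
  { "boro" => "Manhattan" }, { "boro" => "Bronx" }
]

# tally counts occurrences of each borough name (Ruby 2.7+)
counts = responses.map { |r| r["boro"] }.tally

# Sort boroughs from most to fewest access points
ranked = counts.sort_by { |_boro, n| -n }

ranked.each { |boro, n| puts "#{boro}: #{n}" }
```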

Shocker of shockers: Manhattan has the most access points, with a whopping 870. Staten Island is the least served by LinkNYC (29 access points), and the Bronx appears to be pretty underserved, too (89 access points). I was a little surprised that there are more in Queens (229 access points) than in Brooklyn (197 access points).

Huzzah, we’ve successfully parsed a JSON API response, including working around a default limit that was too low for our number of records! This is just the beginning of the questions we could ask about LinkNYC. The map of access points provided by LinkNYC tells us more, and returning to the other variables (remember the 18 columns/key-value pairs per row) may add more to this picture and raise additional questions. Happy exploring!