Accessing Open Data from Ruby code

Quinn Daley
Mar 29, 2016 · 3 min read
How can a coder like me get access to all the data Leeds City Council provides?

As part of Fish Percolator’s work with the Urban Sustainable Development Lab, I get to work a lot with the wide variety of Open Data that’s available from sources such as OpenStreetMap and Leeds Data Mill.

As I’m a coder, it’s not in my nature to want to download CSV files and play with them in Excel pivot tables or whatever else data experts do. I want to be able to pull them straight from upstream and manipulate them directly in my code.

There are various reasons I might want to do this:

  • I can automate the retrieval of the data as it changes.
  • I can perform complex queries over the metadata itself or over multiple datasets simultaneously.
  • I just feel more comfortable with code and I don’t like having to maintain a folder full of CSV files to process the data.

For OpenStreetMap, getting programmatic access to the data is relatively easy: the Overpass API is well documented and there’s even a really nice Ruby gem for it, ready to go.
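To give a flavour of what an Overpass request involves, here is a minimal sketch that uses only Ruby's standard library rather than the gem. It assumes the public `overpass-api.de` endpoint, and the helper names are made up for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Public Overpass endpoint (an assumption; other instances exist).
OVERPASS_URL = URI('https://overpass-api.de/api/interpreter')

# Build an Overpass QL query for nodes with a given amenity tag inside a
# bounding box given as [south, west, north, east].
def amenity_query(amenity, bbox)
  format('[out:json];node["amenity"="%s"](%s);out;', amenity, bbox.join(','))
end

# Pull the "name" tag out of each element in an Overpass JSON response,
# skipping elements that have no name.
def element_names(response_body)
  JSON.parse(response_body)
      .fetch('elements', [])
      .map { |element| element.dig('tags', 'name') }
      .compact
end

# Send the query to Overpass and return the raw JSON body (network call).
def run_query(query)
  Net::HTTP.post_form(OVERPASS_URL, 'data' => query).body
end
```

Something like `puts element_names(run_query(amenity_query('pub', [53.79, -1.56, 53.81, -1.53])))` would then list named pubs in a box that very roughly covers central Leeds. The coordinates are only illustrative.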

But what about Leeds Data Mill? At first glance, the data looks like it’s just collected together on a website and the only option is to manually trawl that site for the right CSV. Right?

A website is great for casual users and Excel geniuses, but I want an API!

Leeds Data Mill, under the hood, is built on a platform called DataPress, which in turn implements the CKAN API.

Conveniently, there’s also a gem for CKAN, which with a few tweaks can be used to query the data in Leeds Data Mill. All you need to do is set the CKAN::API.api_url:
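Setting that URL simply tells the client where the CKAN API lives. As a rough illustration of what sits behind it, here is a standard-library sketch (not the gem) that builds request URLs for CKAN's Action API under `/api/3/action/`. The host `data.example.org` is a hypothetical placeholder, not the real Leeds Data Mill address, and `ckan_action_uri` is a made-up helper name.

```ruby
require 'uri'

# Build the URI for a CKAN Action API call such as "package_search".
# base_url is the root of the CKAN site; params become the query string.
def ckan_action_uri(base_url, action, params = {})
  uri = URI.join(base_url, "/api/3/action/#{action}")
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri
end

# Hypothetical host, used purely for illustration.
puts ckan_action_uri('https://data.example.org', 'package_search', { 'rows' => 5 })
```

Swapping in the real site root is the only change needed to point the same helper at a different CKAN installation.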

For example, let’s say I want to know what datasets are available that are tagged with ‘history’. I can easily query the package API:
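A tag query like this maps onto CKAN's `package_search` action with a filter of the form `tags:history`. The sketch below uses plain `net/http` and `json` instead of the gem, so the helper names are my own invention and the base URL you pass in is up to you. It assumes the usual Action API response shape: a `success` flag and a `result.results` list of packages.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build a package_search URI that filters on a single tag.
def tag_search_uri(base_url, tag)
  uri = URI.join(base_url, '/api/3/action/package_search')
  uri.query = URI.encode_www_form('fq' => "tags:#{tag}")
  uri
end

# Extract dataset titles from a package_search response body.
def dataset_titles(response_body)
  payload = JSON.parse(response_body)
  raise 'CKAN reported a failed request' unless payload['success']

  payload.dig('result', 'results').map { |package| package['title'] }
end

# Fetch the titles of all datasets carrying the given tag (network call).
def datasets_tagged(base_url, tag)
  dataset_titles(Net::HTTP.get(tag_search_uri(base_url, tag)))
end
```

Calling `puts datasets_tagged(base_url, 'history')` with the site's real root URL would print one dataset title per line.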

This prints out the information I asked for:

How can I get the phone number of everywhere with a jukebox?

Querying metadata is all well and good, but what I really want is the latest version of the data itself to play with. Oddly, the CKAN gem doesn’t support this out of the box, but I’ve created a fork that does. If you want to use my fork, just add this to your Gemfile:

Now we can get access to CSV content using the #content_csv method on a package’s resource object.
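For anyone curious what a method like that has to do, here is a hedged standard-library approximation, not the fork's actual implementation. It assumes each CKAN package lists its files under `resources`, each with a `format` and a `url`. The helper names are hypothetical.

```ruby
require 'csv'
require 'net/http'
require 'uri'

# Pick the first resource in a CKAN package hash whose format is CSV.
def csv_resource(package)
  package.fetch('resources', []).find do |resource|
    resource['format'].to_s.casecmp?('csv')
  end
end

# Parse CSV text into a table whose rows can be read by column name.
def parse_csv(text)
  CSV.parse(text, headers: true)
end

# Download a resource's file and parse it (network call).
def resource_csv(resource)
  parse_csv(Net::HTTP.get(URI(resource['url'])))
end
```

Given a package hash from the API, `resource_csv(csv_resource(package))` would return the rows ready for filtering. Note that `Net::HTTP.get` does not follow redirects, so a real client may need a little more care.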

With the CSV in hand, we can do wonderful things with the data completely in code. For example, this scriptlet prints out all the bars in Leeds that were rated five stars for beer by Leeds Beer Quest:
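A sketch of that kind of filter, run against a few made-up rows so it works without a network connection. The column names `name` and `stars_beer` are assumptions about the dataset's layout, and the venues are invented.

```ruby
require 'csv'

# Made-up sample rows standing in for the real download.
BEER_SAMPLE = <<~CSV_TEXT
  name,stars_beer
  Example Arms,5
  Placeholder Tavern,3.5
  Sample Tap,5
CSV_TEXT

# Names of venues whose beer rating is exactly five stars.
def five_star_beer(rows)
  rows.select { |row| row['stars_beer'].to_f == 5.0 }
      .map { |row| row['name'] }
end

puts five_star_beer(CSV.parse(BEER_SAMPLE, headers: true))
```

Against the real data, the same `five_star_beer` call would take the parsed CSV from the resource instead of the sample text.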

Or perhaps you want to print the name and phone number of every bar with a jukebox:
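Again as a sketch on invented rows: this assumes the dataset has `name`, `phone` and a comma-separated `tags` column, which may not match the real headers exactly.

```ruby
require 'csv'

# Made-up sample rows; the phone numbers are from a range reserved for fiction.
JUKEBOX_SAMPLE = <<~CSV_TEXT
  name,phone,tags
  Example Arms,0113 496 0001,"food,jukebox,live music"
  Placeholder Tavern,0113 496 0002,"beer garden"
CSV_TEXT

# [name, phone] pairs for venues whose tags include "jukebox".
def jukebox_contacts(rows)
  rows.select { |row| row['tags'].to_s.split(',').map(&:strip).include?('jukebox') }
      .map { |row| [row['name'], row['phone']] }
end

jukebox_contacts(CSV.parse(JUKEBOX_SAMPLE, headers: true)).each do |name, phone|
  puts "#{name}: #{phone}"
end
```

The `select` and `map` chain is the heart of it; everything else is sample data and printing.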

Not bad for 2 lines of code and no Excel in sight!
