Accessing Open Data from Ruby code

How can a coder like me get access to all the data Leeds City Council provides?

As part of Fish Percolator’s work with the Urban Sustainable Development Lab, I get to work a lot with the wide variety of Open Data that’s available from sources such as OpenStreetMap and Leeds Data Mill.

As I’m a coder, it’s not in my nature to want to download CSV files and play with them in Excel pivot tables or whatever else data experts do. I want to be able to pull them straight from upstream and manipulate them directly in my code.

There are various reasons I might want to do this:

  • I can automate the retrieval of the data as it changes.
  • I can perform complex queries over the metadata itself or over multiple datasets simultaneously.
  • I just feel more comfortable with code and I don’t like having to maintain a folder full of CSV files to process the data.

For OpenStreetMap, getting programmatic access to the data is relatively easy: the Overpass API is well documented and there’s even a really nice Ruby gem for it, ready to go.

But what about Leeds Data Mill? At first glance, the data looks like it’s just collected together on a website, and the only option is to trawl that site manually for the right CSV. Right?

DataPress and CKAN

Under the hood, Leeds Data Mill is built on a technology called DataPress, which in turn implements the CKAN API.

Conveniently, there’s also a gem for CKAN, which with a few tweaks can be used to query the data in Leeds Data Mill. All you need to do is set the CKAN::API.api_url:
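To give a feel for what that URL points at, here is a minimal sketch that uses only Ruby's standard library rather than the gem. The base URL is a placeholder, not Leeds Data Mill's real endpoint; action_uri is a hypothetical helper; and the /api/3/action/ path is the standard CKAN action API layout, which this sketch assumes the site exposes.

```ruby
require 'uri'

# Placeholder base URL: substitute the portal's real CKAN endpoint.
CKAN_BASE = 'https://data.example.org/api/3/action/'

# Hypothetical helper: builds the URI for a CKAN action such as
# package_search, encoding any query parameters.
def action_uri(action, params = {})
  uri = URI.join(CKAN_BASE, action)
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri
end

puts action_uri('package_search', fq: 'tags:history')
```

With the gem, the value assigned to CKAN::API.api_url plays the same role as CKAN_BASE here, though the exact path the gem expects may differ.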

Getting to the metadata

For example, let’s say I want to know what datasets are available that are tagged with ‘history’. I can easily query the package API:
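As a rough illustration of that kind of query, the sketch below talks to CKAN's package_search action directly with the standard library instead of going through the gem. The helper names packages_tagged and extract_packages are hypothetical, and the sample response is made up and trimmed to the two fields it prints.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Pulls the package list out of a package_search response body.
def extract_packages(body)
  response = JSON.parse(body)
  raise 'CKAN request failed' unless response['success']

  response.fetch('result').fetch('results')
end

# Hypothetical helper: asks a CKAN portal for packages carrying a tag.
# base_url is the portal's action API root and must end in a slash.
def packages_tagged(base_url, tag)
  uri = URI.join(base_url, 'package_search')
  uri.query = URI.encode_www_form(fq: "tags:#{tag}")
  extract_packages(Net::HTTP.get(uri))
end

# Made-up response, so the parsing step can be tried offline.
sample = <<~JSON
  {"success": true,
   "result": {"count": 1,
              "results": [{"name": "example-history-dataset",
                           "title": "Example history dataset"}]}}
JSON

extract_packages(sample).each do |package|
  puts "#{package['name']}: #{package['title']}"
end
```

Against a live portal, packages_tagged(base_url, 'history') would return the same kind of array, one hash per dataset.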

This prints out the information I asked for:

Getting the data itself

Querying metadata is all fine and good, but what I really want is the latest version of the data itself to play with. Oddly, the CKAN gem doesn’t support this out of the box, but I’ve created a fork that does. If you want to use my fork, just add this to your Gemfile:

Now we can get access to CSV content using the #content_csv method on a package’s resource object.
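For readers without the fork, a similar result can be sketched with the standard library alone: find the CSV resource in a package's metadata, download it, and parse it. This is not how #content_csv is implemented, only an approximation of what it provides. The helper names and the sample package are hypothetical; the 'resources', 'format' and 'url' keys follow the usual shape of CKAN package metadata.

```ruby
require 'csv'
require 'net/http'
require 'uri'

# Hypothetical helper: returns the URL of the first CSV resource in a
# package hash, or nil if the package has none.
def csv_resource_url(package)
  resource = package.fetch('resources', []).find do |r|
    r['format'].to_s.casecmp?('csv')
  end
  resource && resource['url']
end

# Downloads a CSV resource and parses it, treating row one as headers.
# Note that Net::HTTP.get does not follow redirects.
def fetch_csv(url)
  CSV.parse(Net::HTTP.get(URI(url)), headers: true)
end

# Made-up package, standing in for a real API response.
package = {
  'name' => 'example-dataset',
  'resources' => [
    { 'format' => 'PDF', 'url' => 'https://data.example.org/notes.pdf' },
    { 'format' => 'CSV', 'url' => 'https://data.example.org/data.csv' }
  ]
}

puts csv_resource_url(package)
```

Passing that URL to fetch_csv yields a CSV::Table whose rows can be indexed by column name.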

Now we can do wonderful things with the data completely in code. For example, this scriptlet prints out all the bars in Leeds that Leeds Beer Quest rated 5 stars for beer:
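A minimal sketch of that kind of filter, run over made-up rows so it works offline. The column names name and stars_beer are assumptions about the dataset's layout, and the bars are invented.

```ruby
require 'csv'

# Made-up rows; the column names are assumptions about the dataset.
beer_csv = CSV.parse(<<~DATA, headers: true)
  name,phone,stars_beer
  The Example Arms,0113 496 0000,5
  Sample Tap,0113 496 0001,3.5
  Placeholder Inn,0113 496 0002,5
DATA

# Keep only the rows whose beer rating is exactly five stars.
five_star = beer_csv.select { |row| row['stars_beer'].to_f == 5.0 }
puts five_star.map { |row| row['name'] }
```

With real data, beer_csv would come from the downloaded resource rather than an inline string.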

Or perhaps you want to print the name and phone number of every bar with a jukebox:
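Again as a sketch over invented rows: this assumes the dataset has a tags column holding a comma-separated list and a phone column, neither of which is confirmed here.

```ruby
require 'csv'

# Made-up rows; the tags and phone columns are assumed, not confirmed.
bars_csv = CSV.parse(<<~DATA, headers: true)
  name,phone,tags
  The Example Arms,0113 496 0000,"food,jukebox"
  Sample Tap,0113 496 0001,"coffee,free wifi"
  Placeholder Inn,0113 496 0002,"jukebox,live music"
DATA

# Split each tags cell and keep the bars that list a jukebox.
with_jukebox = bars_csv.select do |row|
  row['tags'].to_s.split(',').map(&:strip).include?('jukebox')
end

with_jukebox.each { |row| puts "#{row['name']}: #{row['phone']}" }
```

Splitting the tags rather than doing a substring match avoids false positives from tags that merely contain the word.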

Not bad for 2 lines of code and no Excel in sight!

Quinn is the main developer at Fish Percolator: changing the world in small ways through technology.
