Accessing Open Data from Ruby code

How can a coder like me get access to all the data Leeds City Council provides?

As part of Fish Percolator’s work with the Urban Sustainable Development Lab, I get to work a lot with the wide variety of Open Data that’s available from sources such as OpenStreetMap and Leeds Data Mill.

As I’m a coder, it’s not in my nature to want to download CSV files and play with them in Excel pivot tables or whatever else data experts do. I want to be able to pull them straight from upstream and manipulate them directly in my code.

There are various reasons I might want to do this:

  • I can automate the retrieval of the data as it changes.
  • I can perform complex queries over the metadata itself or over multiple datasets simultaneously.
  • I just feel more comfortable with code and I don’t like having to maintain a folder full of CSV files to process the data.

For OpenStreetMap, getting programmatic access to the data is relatively easy: the Overpass API is well documented and there’s even a really nice Ruby gem for it, ready to go.

But what about Leeds Data Mill? At first glance, the data looks like it’s just collected together on a website, and the only option is to manually trawl that site for the right CSV. Right?

A website is great for casual users and Excel geniuses, but I want an API!

DataPress and CKAN

Under the hood, Leeds Data Mill is built on a technology called DataPress, which in turn implements the CKAN API.

Conveniently, there’s also a gem for CKAN, which with a few tweaks can be used to query the data in Leeds Data Mill. All you need to do is set the CKAN::API.api_url:
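A minimal sketch of that setting. The `CKAN::API.api_url` setter is the one named above; the URL is a placeholder assumption rather than the confirmed Leeds Data Mill endpoint, and the `LoadError` guard is only there so the snippet still runs on a machine without the gem.

```ruby
# Placeholder API root: an assumption, swap in the endpoint the site documents.
LEEDS_DATA_MILL_API = 'https://example.org/api/'

begin
  require 'ckan'
  # Point the gem at the CKAN-compatible API.
  CKAN::API.api_url = LEEDS_DATA_MILL_API
rescue LoadError
  warn 'ckan gem not installed; add it to your Gemfile first'
end
```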

Getting to the metadata

For example, let’s say I want to know what datasets are available that are tagged with ‘history’. I can easily query the package API:
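For comparison, the same kind of tag query can be sketched without the gem, using only Ruby's standard library against CKAN's Action API. This assumes the site exposes the standard `package_search` action; the API root, the helper names `tag_search_url` and `dataset_titles`, and the sample response are all illustrative, with the sample hand-written in the shape CKAN returns so the snippet runs offline.

```ruby
require 'json'
require 'uri'

# Hypothetical API root: an assumption, not the confirmed endpoint.
API_ROOT = 'https://example.org/api/3/action/'

# Build the package_search URL for datasets carrying a given tag.
def tag_search_url(tag)
  uri = URI.join(API_ROOT, 'package_search')
  uri.query = URI.encode_www_form(fq: "tags:#{tag}")
  uri
end

# Pull the dataset titles out of a package_search JSON response body.
def dataset_titles(json_body)
  response = JSON.parse(json_body)
  raise 'CKAN request failed' unless response['success']

  response.dig('result', 'results').map { |package| package['title'] }
end

# Hand-written stand-in for a real response.
SAMPLE_SEARCH_BODY = <<~JSON
  {"success": true, "result": {"count": 1, "results": [{"title": "Blue plaques of Leeds"}]}}
JSON

puts tag_search_url('history')
puts dataset_titles(SAMPLE_SEARCH_BODY)
```

To run it against a live server, replace the sample body with `Net::HTTP.get(tag_search_url('history'))` after a `require 'net/http'`.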

This prints out the information I asked for:

Leeds listed buildings
Who’s who in Leeds
Blue plaques of Leeds

Getting the data itself

How can I get the phone number of everywhere with a jukebox?

Querying metadata is all well and good, but what I really want is the latest version of the data itself to play with. Oddly, the CKAN gem doesn’t support this out of the box, but I’ve created a fork that does. If you want to use my fork, just add this to your Gemfile:

gem 'ckan', github: 'fishpercolator/CKAN'

Now we can get access to CSV content using the #content_csv method on a package’s resource object.
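What `#content_csv` returns isn't spelled out here, so the sketch below doesn't call it. Instead it shows a gem-free equivalent of the parsing step using Ruby's standard `csv` library. The helper name `parse_resource_csv` and the sample rows are illustrative assumptions.

```ruby
require 'csv'

# Parse the raw text of a CSV resource into an array of hashes,
# one per row, keyed by the column headers.
def parse_resource_csv(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# Hand-written stand-in rows so the sketch runs offline.
SAMPLE_RESOURCE = <<~ROWS
  name,phone
  Sample Alehouse,0000 000 0001
  Sample Lounge,0000 000 0002
ROWS

rows = parse_resource_csv(SAMPLE_RESOURCE)
puts rows.first['name']
```

For live data, the text to parse would be the body downloaded from the `url` field that CKAN records for each resource, for example `Net::HTTP.get(URI(resource_url))`, where `resource_url` is a hypothetical variable holding that URL.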

With the CSV content in hand, we can do wonderful things with the data entirely in code. For example, this scriptlet prints out all the bars in Leeds that Leeds Beer Quest rated 5 stars for beer:
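A filter of that shape can be sketched with the standard `csv` library alone. The column names `name` and `stars_beer` are assumptions about the dataset's layout, the helper name `five_star_beer_bars` is hypothetical, and the rows are stand-ins rather than real ratings.

```ruby
require 'csv'

# Return the names of bars with a perfect beer score, sorted alphabetically.
# Assumes 'name' and 'stars_beer' columns.
def five_star_beer_bars(csv_text)
  CSV.parse(csv_text, headers: true)
     .select { |bar| bar['stars_beer'].to_f == 5.0 }
     .map { |bar| bar['name'] }
     .sort
end

# Stand-in rows, not real Leeds Beer Quest data.
SAMPLE_RATINGS = <<~ROWS
  name,stars_beer
  Sample Tap,5
  Sample Lounge,3.5
  Sample Alehouse,5
ROWS

puts five_star_beer_bars(SAMPLE_RATINGS)
```

Given the real CSV text instead of the stand-in rows, and assuming the column names match, the result should be a list like this one: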

Friends of Ham
Mr Foley’s Cask Ale House
North Bar
Tapped Brew Co.
The Brunswick
The Head of Steam

Or perhaps you want to print the name and phone number of every bar with a jukebox:
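The jukebox query can be sketched the same way. This assumes a `tags` column holding a comma-separated list and a `phone` column; both column names, the helper name `jukebox_bars` and the sample rows are illustrative.

```ruby
require 'csv'

# Return "name phone" strings for every bar tagged with 'jukebox'.
# Assumes 'name', 'phone' and a comma-separated 'tags' column.
def jukebox_bars(csv_text)
  CSV.parse(csv_text, headers: true).filter_map do |bar|
    tags = bar['tags'].to_s.split(',').map(&:strip)
    "#{bar['name']} #{bar['phone']}" if tags.include?('jukebox')
  end
end

# Stand-in rows, not real venue data.
SAMPLE_VENUES = <<~ROWS
  name,phone,tags
  Sample Alehouse,0000 000 0001,"food,jukebox"
  Sample Lounge,0000 000 0002,"live music"
ROWS

puts jukebox_bars(SAMPLE_VENUES)
```

Run over the real CSV, with matching column names, this would give output in the following form: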

Archie’s Bar & Kitchen 0113 243 1001
Bad Apples 07872 648781
Dry Dock 0113 391 2658
Fox & Newt 07506 741039
Mojo: Music for the People 0845 611 8643
Spencer’s 0113 243 9070
The Horse & Trumpet 0113 243 0338
The Northern Monkey 0113 242 6630
The Regent 0113 245 6040
Three Legs 0113 245 6316
Wax Bar & Jukejoint 0113 242 9442
West Riding 0113 246 8772

Not bad for 2 lines of code and no Excel in sight!