A Layman’s Guide to the Plenario API

Welcome to Plenario. The heart and soul of the Plenario project is its API, a powerful, multifaceted tool for data observation and comparison, and one you’ve used before if you’ve played around with the Explorer. However, Plenario’s real capabilities are under the hood, and they cannot be fully realized if the user is a novice mechanic. This guide attempts to serve as a bridge between complete and utter inexperience and a functional knowledge of our web-based API.

If you have any knowledge of APIs whatsoever, this guide will seem overly explanatory, though the examples are still relevant. For a more in-depth look at our endpoints and their capabilities, head over to our official API Docs. There, you’ll find example queries and JSON responses, an explanation of our shape, aggregate, raw and metadata endpoints, and an in-depth look at how to form complex queries involving “and” and “or”, among other information.

If the previous sentence read like a foreign language, don’t worry — the rest of this guide is tailored to you.

Note: code snippets will be represented inline as italics.


What does Plenario do?

Plenario itself serves as a host for open datasets from around the world. Users submit large datasets, usually in CSV format, and our administrators approve them and add them to our hub, which currently holds about 150 datasets and shapefiles. The actual content of the datasets varies, but a majority of Plenario consists of open urban event data — recorded instances of food inspections, crimes, 311 calls, building permit issuances, etc. — provided by cities. The site has 118 event datasets and 38 shape datasets, many of which are updated daily. As you might expect, we end up with an extreme amount of information at the end of this consistent open-data aggregation. That information is merely waiting to tell a story; this is where our API comes in.

What’s an API? At its most basic, an Application Programming Interface operates as a machine that asks questions (known as “queries” or “calls”) of another entity, usually a piece of software, a database, or a website. The latter entity then responds with a structured file tailored to the API’s request. API “endpoints” are identifiers that point your query recipient towards the specific API functionality you’d like to perform on it.

The Plenario API queries datasets from the Plenario data hub using a set of parameters specified in an HTTP GET request — which you do every day when you hit ‘enter’ in your address bar. By default, Plenario responds with a compressed list of results in JSON (JavaScript Object Notation) format. The API includes 13 endpoints, listed in the left column on the docs. You’ll be introduced to a couple of them later on.


Trying the API

You can call the API by following the link below or by typing the text into your address bar. Either way, requesting plenar.io/v1/api/<endpoint> sends an API query based on the information you give it.

Here is the first example from Plenario’s v1/api/datasets endpoint. The following URL tells Plenario to return a list of all its event datasets and their variables. Let’s GET it:

http://plenar.io/v1/api/datasets/

You may have ended up with a screen similar to this:

Frankly, it’s hellish. JSON is popular with high-level users because it’s easy to develop with, but the untrained eye may feel assaulted by the wall of text it sees. The fix for this, and the first step in fully utilizing the API, is to install a JSON viewer if you don’t have one already. One is available as both a Chrome extension and a Firefox add-on.

Once the viewer is installed, you’ll have a much more organized look at Plenario’s JSON response:

We’re now able to see the API’s metadata output, which basically acts as a “success” message ("status": "ok") that adds a bit of information about the rest of the response ("total": 118, meaning Plenario has 118 event datasets in total). With that groundwork in place, we’re now able to move into more complicated territory.
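If you’d rather read the metadata in code than by eye, here is a minimal Python sketch. The sample string is a hand-written stand-in trimmed to the two fields mentioned above, and the top-level key names ("meta" and "objects") are my assumption about the response layout, so check them against a real response before relying on them.

```python
import json

# Hand-written stand-in for a /v1/api/datasets/ response, trimmed to the
# two metadata fields discussed above. The "meta"/"objects" key names are
# assumptions made for illustration.
SAMPLE_RESPONSE = '{"meta": {"status": "ok", "total": 118}, "objects": []}'


def summarize_meta(raw_json):
    """Return (status, total) from the metadata block of a response."""
    payload = json.loads(raw_json)
    meta = payload.get("meta", {})
    return meta.get("status"), meta.get("total")


status, total = summarize_meta(SAMPLE_RESPONSE)
print(f"status={status}, total={total}")
```

Using .get() rather than square brackets means a response without a metadata block yields None values instead of an error, which is a little friendlier while you’re exploring.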

Plenario in Action: API Examples

Summer in Chicago

Summer months in large cities are widely considered disproportionately dangerous. Uninhibited by the weather, people are constantly out and about, especially kids and teens who are out of school for the season. This freedom, however liberating, may actually lend itself to criminal activity and harm to innocent people.

Is summer the most dangerous season in Chicago? We can use the Plenario API to check the validity of that assertion. We won’t do a full-blown statistical analysis of the data — that’ll come in a later post — but we’ll casually observe the API response and see if we can spot a pattern at surface level. As of now, we know what we want Plenario to give us: A few years of Chicago homicide data aggregated by month. All we need to do now is go step-by-step through that statement and fit Plenario’s abilities to our needs.


1. First, we need to decide which Plenario endpoint to invoke. We’re looking at homicides grouped (in other words, aggregated) by month, and we’re not too concerned with shape, so we can narrow our choices down to the temporal aggregation endpoints. /timeseries is our best (and, at the time of this writing, only) option here. We can now begin building the URL for our query. Start like this:

http://plenar.io/v1/api/timeseries/ 


2. With our endpoint determined, we can specify our query parameters, which will match the data to our specifications. Take a look at the query parameters under timeseries.

dataset_name is already clear; we just need to find Plenario’s machine name for the Chicago crime data. Query v1/api/datasets and CTRL-F for “Chicago Police”. You’ll come upon ”crimes_2001_to_present”, which represents our Chicago crime data. Add it to the URL you’re building, and you’ll end up at this point:

http://plenar.io/v1/api/timeseries/?dataset_name=crimes_2001_to_present

Notice that I added a question mark immediately in front of dataset_name — this must be done directly before specifying your first query parameter.
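The CTRL-F step can also be done in code. The sketch below searches a list of dataset records for a phrase and returns the matching machine names. The two records are placeholders I wrote for illustration (the second one is entirely hypothetical), and the field names "dataset_name" and "attribution" are assumptions about what each record contains.

```python
def find_datasets(objects, needle):
    """Return machine names of records whose values mention `needle`.

    Like CTRL-F, this looks through every field of every record,
    ignoring upper/lower case.
    """
    needle = needle.lower()
    return [
        obj.get("dataset_name")
        for obj in objects
        if any(needle in str(value).lower() for value in obj.values())
    ]


# Placeholder records, NOT copied from a real response.
placeholder_objects = [
    {"dataset_name": "crimes_2001_to_present",
     "attribution": "Chicago Police Department"},
    {"dataset_name": "hypothetical_other_dataset",
     "attribution": "Some Other Agency"},
]

matches = find_datasets(placeholder_objects, "Chicago Police")
print(matches)
```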


3. Next we’ll specify the date range. Since I’m an intern, I’m writing this over the summer, and can’t rightly include the ongoing summer of 2016 in this analysis. Let’s look at data from 2010–15 by defining obs_date__ge and obs_date__le as 2010-01-01 and 2016-01-01, respectively. We’re now getting into multiple-parameter territory; every time a new parameter is added to the query, it should be preceded with an ampersand (&). Therefore, our URL is now:

http://plenar.io/v1/api/timeseries/?dataset_name=crimes_2001_to_present&obs_date__ge=2010-01-01&obs_date__le=2016-01-01

By now, you can probably tell how the rest of this query is going to work, so soldier on and feel free to skim my directions if you feel confident.
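The question mark and ampersand rules are easy to get wrong by hand, so here is a small Python sketch that applies them for you using the standard library. The build_url helper is a name I made up for this guide; the parameter names and values are the ones we’ve chosen so far.

```python
from urllib.parse import urlencode

BASE_URL = "http://plenar.io/v1/api/timeseries/"


def build_url(base, params):
    """Join a base URL and a dict of query parameters.

    urlencode places "&" between parameters; the "?" that starts the
    query string is added here by hand.
    """
    return base + "?" + urlencode(params)


url = build_url(BASE_URL, {
    "dataset_name": "crimes_2001_to_present",
    "obs_date__ge": "2010-01-01",
    "obs_date__le": "2016-01-01",
})
print(url)
```

Keeping the parameters in a dictionary makes it painless to add or change one later without retyping the whole URL.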


4. [dataset_name]__filter is where we’ll separate homicides from other crimes. The crimes dataset has 22 fields, as we can see from another quick /v1/api/datasets/ API call, this time specifying our machine name. We’ll use IUCR (Illinois Uniform Crime Reporting) code as our differentiating variable, and a quick check of the IUCR code list reveals our desired code: 0110, or “HOMICIDE”. We want to see all crimes with that code, so let’s add a filter to our query.

Plenario’s attribute filtering has just been revamped, and now includes the possibility for “or” functions and queries of (theoretically) unlimited length. This added functionality comes with new syntax. Here’s the template:

{"op":"<operator>", "col":"<column_name>", "val":"<target_value>"}

It’s lengthy, yes, but really quite simple. The list of possible operators is in the docs; since we only want one IUCR code, the operator we’ll choose is eq for “equal to”. “iucr” replaces <column_name>, which is a synonym for dataset field_name, and 0110 is our target value, as we determined earlier.

Therefore, our new URL is:

http://plenar.io/v1/api/timeseries/?dataset_name=crimes_2001_to_present&obs_date__ge=2010-01-01&obs_date__le=2016-01-01&crimes_2001_to_present__filter={"op":"eq","col":"iucr","val":"0110"}
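Typing braces and quotation marks by hand invites typos, so it can help to let a script write the filter. This is a rough sketch using Python’s standard library; make_filter is a helper name I invented, not part of Plenario. Your browser quietly percent-encodes the braces and quotes when you paste the URL, and the second half of the sketch shows that step done explicitly.

```python
import json
from urllib.parse import quote, unquote


def make_filter(op, col, val):
    """Serialize one filter condition in the {"op", "col", "val"} form."""
    # separators=(",", ":") drops the spaces json.dumps adds by default.
    return json.dumps({"op": op, "col": col, "val": val},
                      separators=(",", ":"))


homicide_filter = make_filter("eq", "iucr", "0110")
print(homicide_filter)

# Percent-encode every special character so the value is safe in a URL.
encoded = quote(homicide_filter, safe="")
print(encoded)

# Decoding gives back exactly the text we started with.
print(unquote(encoded) == homicide_filter)
```

Note that the IUCR code is passed as the string "0110" rather than a number, so the leading zero survives.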

5. The last two parameters, agg and data_type, are the most straightforward of the query parameters. All we need to do is add &agg=month to the end of our URL. We can ignore data_type, as its default value is our preferred format (JSON).

We’ve completed our URL. Let’s run our query. Click below.

http://plenar.io/v1/api/timeseries/?dataset_name=crimes_2001_to_present&obs_date__ge=2010-01-01&obs_date__le=2016-01-01&crimes_2001_to_present__filter={"op":"eq","col":"iucr","val":"0110"}&agg=month
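For anyone who wants to run the same query from a script instead of a browser, here is a hedged sketch using only Python’s standard library. The parameters are the five we assembled above; fetch_json is a helper I made up, and it is defined but deliberately not called here, since whether the service answers depends on your connection and on the API still being available at this address.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

params = {
    "dataset_name": "crimes_2001_to_present",
    "obs_date__ge": "2010-01-01",
    "obs_date__le": "2016-01-01",
    "crimes_2001_to_present__filter": json.dumps(
        {"op": "eq", "col": "iucr", "val": "0110"}, separators=(",", ":")
    ),
    "agg": "month",
}
# urlencode percent-encodes the braces and quotes in the filter for us.
query_url = "http://plenar.io/v1/api/timeseries/?" + urlencode(params)
print(query_url)

# Sanity check: decode the query string back into its parameters.
decoded = parse_qs(urlsplit(query_url).query)
print(decoded["agg"], decoded["crimes_2001_to_present__filter"])


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON body of the reply."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json(query_url) should then hand back the same response your JSON viewer shows, as a Python dictionary.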

You should receive a pretty JSON file, formatted by your viewer, that looks something like this:

Scroll down past the coordinates. Pulled from the dataset, they represent the smallest rectangle that could contain every record in the chosen dataset(s). Take a look at the counts above 20XX-06-01 and 20XX-09-01 for each year.

I quickly averaged data from June-September and compared it to an average from the other 8 months. The results are below:

Unfortunately, the pundits look to be correct. Warmer months have markedly elevated homicide counts. We’re not going to go to the trouble of performing a true data analysis here, so our results aren’t technically “official”, but they’re results nonetheless. For the most part, it’s true that Chicago’s summer months are its most deadly.
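The averaging itself is simple enough to script. The sketch below splits monthly counts into June through September versus the other eight months and averages each group. I’m assuming each monthly record carries a "datetime" and a "count" field, and the four records are placeholder numbers invented purely to show the mechanics; they are NOT real homicide counts.

```python
from datetime import datetime

SUMMER_MONTHS = {6, 7, 8, 9}  # June through September, as in the text


def seasonal_averages(items):
    """Return (summer_mean, other_mean) of the monthly counts."""
    summer, other = [], []
    for item in items:
        # Only the first ten characters (YYYY-MM-DD) are needed, so a
        # trailing time-of-day on the value is ignored.
        month = datetime.strptime(item["datetime"][:10], "%Y-%m-%d").month
        (summer if month in SUMMER_MONTHS else other).append(item["count"])
    return sum(summer) / len(summer), sum(other) / len(other)


# Placeholder numbers invented for illustration only; NOT real data.
placeholder_items = [
    {"datetime": "2010-01-01", "count": 10},
    {"datetime": "2010-04-01", "count": 20},
    {"datetime": "2010-06-01", "count": 30},
    {"datetime": "2010-09-01", "count": 50},
]

summer_mean, other_mean = seasonal_averages(placeholder_items)
print(summer_mean, other_mean)
```

To repeat my comparison, you would pass in the monthly records from the real response in place of the placeholders.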

Now that we’ve built a groundwork, let’s bring Plenario’s spatial aggregation abilities into the mix.


Examining Spatial Interaction of Government Programs

Like any major city, Chicago consistently has trouble with certain areas. Depressed neighborhoods on the city’s South and West Sides face rapid depopulation coupled with rising crime rates. Furthermore, neighborhoods like Fuller Park on the South Side have unemployment rates hovering around 40%. For places like this, the path to improvement is rife with obstacles.

Outside assistance is needed, and it comes from a host of different entities. Governments at the city, county, state and federal levels all have a hand in improvement initiatives in troubled urban areas. Using our API, let’s see how a couple of those entities interact.

The US Department of Housing and Urban Development (HUD) introduced their Community Renewal Initiative in 1993, seeking to “reduce unemployment and generate economic growth… [in] distressed communities”. To this end, they designated 5 Empowerment Zones in Chicago, signifying areas of urban distress that would receive tax credits and grants. During this time, the City of Chicago introduced several different efforts to perform a similar task as the Feds; one of these was the Micro Market Recovery Program, “a neighborhood stabilization initiative targeting small geographic areas that are experiencing higher-than-normal problems with foreclosures”.

The Micro Market Recovery Program (MMRP) exists on Plenario as an event dataset. Events consist of city employees providing specific assistance at locations as well as court hearings that serve to distribute grants to other locations. Since March 25, there have been 293 events in the city:

Chicago’s Empowerment Zones also live on Plenario, but as a shape dataset. Shape datasets can be a set of point locations, line segments, or bounded sections of land. The Empowerment Zones dataset consists of 5 polygons drawn on a map of Chicago:

Let’s overlay the two. Use the

GET /v1/api/shapes/<polygon_dataset_name>/<point_dataset_name>/

endpoint to do so. You can find the dataset machine names on Plenario (I also linked to them above). Below is the call:

http://plenar.io/v1/api/shapes/boundaries_empowerment_zones/micro_market_recovery_program_cases?obs_date__ge=2016-03-25

And its response, with the lengthy coordinates collapsed using the viewer:

The overlap is relatively small, with only ~19% (55/293) of MMRP instances occurring within Empowerment Zones since March. Two of the Zones don’t see any MMRP action; neighborhoods on the South Side seem relatively well-represented, but the Chicago Department of Planning and Development did not focus on Pilsen.
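Here is roughly how that percentage can be computed from a response. I’m assuming the reply is GeoJSON-style, with one feature per polygon and a "count" property holding the number of events inside it. The totals (55 inside, 293 overall) come from the text above, but the per-zone split in the placeholder is invented for illustration.

```python
def count_points_in_shapes(feature_collection):
    """Sum the per-polygon event counts in a GeoJSON-style response."""
    return sum(
        feature.get("properties", {}).get("count", 0)
        for feature in feature_collection.get("features", [])
    )


# Placeholder response: five zones, two of them empty. The split across
# zones is made up; only the total of 55 comes from the text.
placeholder_response = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"count": n}}
        for n in (25, 20, 10, 0, 0)
    ],
}

TOTAL_EVENTS = 293  # MMRP events citywide since March 25, per the text

inside = count_points_in_shapes(placeholder_response)
share = inside / TOTAL_EVENTS
print(f"{inside} of {TOTAL_EVENTS} events fall inside a zone ({share:.1%})")
```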

The disparity between the two datasets we chose can be explained. The MMRP focuses more on property value and home-ownership, while HUD focuses more on giving tax credits and business subsidies within Empowerment Zones (moreover, the latter program stopped providing incentives in December 2014). The MMRP is also different in scope, extending to 13 specific micro-markets instead of 5 (really, 3) broad areas.

Moving Forward

Consider, though, a program in which a governmental body subsidizes property purchases in an area to create an influx of new businesses. Combining that shapefile with an event dataset of new business licenses would be telling as to the effectiveness of the government initiative. /grid can show us which sub-areas benefit most from the help, at almost any scale. A few months after the program begins, /timeseries can tell us how well it’s worked; in a few years, we’ll know whether the initiative is progressing or stagnating.

Imagine a shapefile of Chicago alleyways. A combination /shapes query (like the second example above) with the Chicago crime dataset can tell us how many crimes happen in alleys. We can bring the weather endpoints into play — are crimes more likely to happen in alleyways when it’s raining? When it’s windy? /weather/hourly can give us accurate information on those fronts. What kind of crimes are likely? Add a filter “or” two (ha, ha) and find out.


Plenario is one possible tool to access the vast ocean of open data, and now (I hope), you’re literate in its basic functionality. Have at it.

Reach out to us at plenario@uchicago.edu with any questions. Check us out on GitHub.