Working with JSON in Processing.py

Published in Data Mining the City, Oct 9, 2017

JSON, or JavaScript Object Notation, is a common data interchange format, meaning that it’s used to exchange data between different services and make it consumable in a human-readable, standard way.

For example, a simple JSON file might look like this:

At first glance, you might think it looks very much like a Python dictionary, and you’d be right. It has the same key/value pairing, e.g. "common name": "Godzilla", that you’d expect to see in a Python dictionary. JSON is a bit less flexible, though, in that it can only contain a limited set of data types: strings, numbers, booleans, null, lists, and dictionaries.

The structures in JSON are dictionaries and lists that can be “nested” within each other. That is, not only can dictionaries and lists contain strings and numbers, but they can also contain other dictionaries and lists. A simple list might be:

["1954-11-03", "1998-05-20", "2014-05-08", "2016-07-25"]

which contains several strings representing dates. A simple dictionary might look like:

{
  "genus": "gojira",
  "species": "giganticus",
  "subspecies": "rex"
}

The keys are “genus,” “species,” and “subspecies,” and their respective values are “gojira,” “giganticus,” and “rex.” In Python, given a dictionary, you can access those values by supplying the keys. In JSON dictionaries, keys must be strings.
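As a quick illustration of key lookup, here is a minimal sketch that writes the same dictionary as a Python literal and reads one value back. The variable names scientific_name and genus are only for this example.

```python
# The same dictionary as above, written as a Python literal.
scientific_name = {
    "genus": "gojira",
    "species": "giganticus",
    "subspecies": "rex"
}

# Supplying a key in square brackets returns the matching value.
genus = scientific_name["genus"]
print(genus)  # gojira
```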

JSON is a format that can be used to represent so-called “structured data.” Because it uses dictionaries, it can be much more understandable to a human reader: the keys contain cues as to what a particular value is intended to be. This also helps in parsing JSON in Python code. Take the following example.

Let’s go over how to load JSON data into Python (and Processing). For simplicity, let’s just copy/paste it into a new Processing (Python Mode) sketch, like this:

The first line, import json, loads Python’s library that handles JSON files.
The next few lines, starting with json_str =, declare a new variable to contain a multi-line string, demarcated by the triple-quotes. Inside is our JSON string.
Then, we have the data = json.loads(json_str) line, which calls the loads() function in the json library. This converts (i.e. “parses”) the JSON string for us into a Python data structure.
Last, the final line prints the value contained in the data variable. Let’s see what it does.
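The sketch described above can be reconstructed roughly as follows. The JSON values are assembled from the keys and values quoted elsewhere in this article, so treat the exact contents as an assumption rather than the original listing.

```python
import json

# Reconstructed from the keys and values quoted in this article;
# the original listing may have differed in detail.
json_str = """
{
  "common name": "Godzilla",
  "scientific name": {
    "genus": "gojira",
    "species": "giganticus",
    "subspecies": "rex"
  },
  "sightings": ["1954-11-03", "1998-05-20", "2014-05-08", "2016-07-25"]
}
"""

# Parse the JSON string into Python dictionaries, lists, and strings.
data = json.loads(json_str)

print(data)
```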

If you run the sketch, you’ll get this:

In the console, you’ll notice what looks like our original JSON string. There are some curious u characters in front of the single-quoted strings; they just mark Unicode strings in Python 2, which Processing’s Python Mode is based on, so don’t worry about those now. And the values look a bit out of order, but that’s ok, because Python dictionaries in this version don’t maintain order like lists do.

Traversing the Data

Ok, well, we’ve parsed the data, but how do we actually get to the values we need in the JSON? We’ll go through a process called “traversing” the data structure.

Generally, we start from the outside and work our way in, much like an outline for topics in a book or navigating folders in Finder or Windows Explorer. The outermost data structure in the JSON string is demarcated by curly braces { … }, so it’s a dictionary. Thus, to get to any value, we need to provide a key. What keys are available?
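One way to answer that question in code is to ask the dictionary for its keys. This is a minimal sketch that uses a shortened stand-in for the JSON string, with only enough content to show the three top-level keys this article works with.

```python
import json

# Shortened stand-in for the article's JSON string (an assumption).
json_str = """
{
  "common name": "Godzilla",
  "scientific name": {"genus": "gojira"},
  "sightings": ["1954-11-03"]
}
"""
data = json.loads(json_str)

# sorted() gives a stable, alphabetical listing of the top-level keys.
keys = sorted(data.keys())
print(keys)
```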

So write this line at the end of the sketch:

print(data['common name'])

You should see something like this:

The new line in the console is the result of the new print() function call, which shows us the value of the 'common name' key of the dictionary data: Godzilla. This is the most basic way we can access JSON data.

Dictionaries in JSON

Great! But how do we get to, say, the genus of the scientific name? Well, first, let’s try to access the 'scientific name' value and see what we get:

print(data['scientific name'])

The result is:

Hmm, this seems to tell us that instead of getting a string, like when we accessed 'common name', we’re getting a dictionary. If that’s the case, then we should be able to do this:

data['scientific name']['genus']

to get to the genus value. Let’s try!

Whoa, neat! So data['scientific name'] returns a dictionary, which we can access in turn with the same syntax, ['genus'], to get the value we’re interested in.
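The chained lookup can be sketched step by step as follows. The JSON string is a shortened stand-in built from the values quoted in this article, and the intermediate variable names are only illustrative.

```python
import json

# Shortened stand-in for the article's JSON string (an assumption).
json_str = """
{
  "common name": "Godzilla",
  "scientific name": {
    "genus": "gojira",
    "species": "giganticus",
    "subspecies": "rex"
  }
}
"""
data = json.loads(json_str)

# The first lookup returns the inner dictionary...
scientific_name = data['scientific name']
# ...and the second lookup reads a value from it.
genus = scientific_name['genus']

# Both steps can be chained into a single expression.
same_genus = data['scientific name']['genus']

print(genus)  # gojira
```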

This is the essence of traversing a JSON dataset. We work our way through the data structure from the outside by accessing keys of dictionaries and indices of lists. We’ve done the former, so let’s try the latter.

Lists in JSON

How do we get to the sighting dates and iterate over them? Try this:

print(data['sightings'])

Now it seems that we have a list instead of a dictionary, indicated by the square brackets: [...]. If that’s the case, we should be able to iterate over it like we would any list:
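A minimal sketch of that loop, assuming the parsed data holds the list of dates shown earlier under the 'sightings' key:

```python
import json

# Only the 'sightings' part of the data is needed for this example.
data = json.loads('{"sightings": ["1954-11-03", "1998-05-20", "2014-05-08", "2016-07-25"]}')

# Assign the list to a variable, then loop over it like any Python list.
sightings = data['sightings']
for sighting in sightings:
    print(sighting)
```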

Sweet! So we assign the list to a variable sightings then iterate over it like a regular list! Assuming that we have a well-formed JSON data structure, where every dictionary has the keys we expect, we have the tools we need to start working with that data, drawing things from it, and so on. The entire sketch code is this:
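The original listing isn't reproduced here. A version consistent with the steps in this section might read as follows; the JSON contents are reconstructed from the values quoted in the article and are an assumption.

```python
import json

# Reconstructed JSON string (an assumption, see the note above).
json_str = """
{
  "common name": "Godzilla",
  "scientific name": {
    "genus": "gojira",
    "species": "giganticus",
    "subspecies": "rex"
  },
  "sightings": ["1954-11-03", "1998-05-20", "2014-05-08", "2016-07-25"]
}
"""

data = json.loads(json_str)
print(data)

# A plain string value at the top level.
print(data['common name'])

# A nested dictionary, then a value inside it.
print(data['scientific name'])
print(data['scientific name']['genus'])

# A list, which we iterate over like any other list.
sightings = data['sightings']
for sighting in sightings:
    print(sighting)
```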

What we’ll find most often is lists of dictionaries, so we’ll iterate over such lists and visualize them. To demonstrate that, let’s do a more real-world example.

This will walk through (1) getting search results from YouTube and (2) drawing their thumbnails using the same techniques described above.

Get the Data

Go to the YouTube Search API page at https://developers.google.com/youtube/v3/docs/search/list and scroll down to just above the “Request” section.

On the right-hand side, click on “Execute without OAuth.” This will execute the API query but without requiring us to jump through the hoops of authenticating into the service.

After the request finishes, you should see a green result that appears just below the form:

This is a JSON-formatted response from the YouTube API (minus the /** … */ lines at the top). Let’s see if we can use this in Processing. For convenience, I’ve already saved the results in a .json file for you, so go ahead and download the JSON file for “surfing” results here.

After clicking on the link, just click File → Save (or Ctrl+S on Windows or Cmd+S on Mac).
In Processing, create a new sketch in Python Mode.
Save the sketch (with a name like “YouTubeJson”).
Include the data file by selecting Sketch → Add File… in the menu bar, then pick the data file you just downloaded. The file should now be in the data folder of your sketch folder.

Traversing the YouTube JSON Data

First, we need to open the file and parse it using the same json library. Since we’re using a data file, we need to open it with the open() builtin function:

The line using with open() opens the file and declares a variable called file that represents the open file. The next line reads the file and parses it, storing the structured data in the data variable.
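The two lines described above can be sketched as a small helper. The function name load_json_file is introduced here for illustration and is not from the original sketch. Note that json.load() (without the "s") reads from an open file, whereas json.loads() parses a string.

```python
import json

def load_json_file(path):
    """Open the file at `path` and parse its contents as JSON."""
    # `with` closes the file automatically once the block ends.
    with open(path) as file:
        # json.load() reads from an open file rather than from a string.
        return json.load(file)
```

In the sketch, you would call it with the name of the file you downloaded, for example data = load_json_file("surfing.json") followed by print(data). That filename is a placeholder, and depending on your setup you may need to give the full path to the file inside the sketch's data folder.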

When you run the sketch you should see something like this:

Now that we have a working sketch up and running, let’s take a closer look at the JSON data so we can understand the best way to traverse it.

The first thing we should take note of is that the first character in the JSON data is an opening curly brace {, so we’re dealing first with a dictionary.

Next, take a look at the keys of that dictionary. These include "kind", "etag", "nextPageToken", and so forth. But the one we’re interested in is "items", the actual results of the search query. Looking at the original API documentation page, under the “Properties” section, we see the description for what "items" means:

Thus, perhaps we can find a way to extract a thumbnail image from the JSON data, inside the "items" list. How do we know it’s a list? It starts with a bracket [.

This means that items is effectively a list of dictionaries. So, to test how to work with this, let’s see if we can extract the first item. We can first access the dictionary value by providing the "items" key to data, then print how many values are in the "items" list:

items = data["items"]
print(len(items))

Adding these lines to the end of our sketch, now we get:

It prints 25 items, which is the number of items requested from the original query (unless you changed that number):

Let’s print the first element of the items list to see what we’re working with:

print(items[0])

This should result in:

Getting the Videos’ Thumbnail URLs

The highlighted portion above is the JSON structure of the first item, representing information about the first YouTube video in our search results. Expanded, it looks like this:
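The expanded listing isn't reproduced here. As a rough stand-in, the sketch below writes one search result as a Python dictionary. The key names follow the general shape of a YouTube search result as I understand it, but every value is a made-up placeholder, so only the nesting should be relied on.

```python
# Placeholder values throughout; only the nesting of keys matters here.
item = {
    "kind": "youtube#searchResult",
    "etag": "ETAG_PLACEHOLDER",
    "id": {"kind": "youtube#video", "videoId": "VIDEO_ID_PLACEHOLDER"},
    "snippet": {
        "publishedAt": "2017-01-01T00:00:00.000Z",
        "channelId": "CHANNEL_ID_PLACEHOLDER",
        "title": "Example surfing video",
        "description": "Example description.",
        "thumbnails": {
            "default": {"url": "https://example.com/default.jpg", "width": 120, "height": 90},
            "medium": {"url": "https://example.com/medium.jpg", "width": 320, "height": 180},
            "high": {"url": "https://example.com/high.jpg", "width": 480, "height": 360}
        },
        "channelTitle": "Example channel",
        "liveBroadcastContent": "none"
    }
}

# Listing the keys at each level shows the path down to a thumbnail.
print(sorted(item.keys()))
print(sorted(item["snippet"].keys()))
print(sorted(item["snippet"]["thumbnails"].keys()))
```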

This shows us a lot of information available for each search result, including the title of the video, its description, and a set of thumbnails of various sizes. The part we’re trying to get to this time around is the "url" value of the "default" thumbnail. How can we get there?

First, we have to extract the "snippet" value, which results in a dictionary.
That gives us access to the "thumbnails" dictionary, which in turn gives us the "default" thumbnail.
Finally, the "url" key of the "default" thumbnail holds the address of the image.

In Processing, it’s like this:

defaultUrl = items[0]["snippet"]["thumbnails"]["default"]["url"]
print(defaultUrl)

Now that we have a URL to an image, we can use loadImage() to display it on the Processing canvas:

size(600, 450)
background(0)
thumbnail = loadImage(defaultUrl)
image(thumbnail, 0, 0)

That should request the image from the Internet, then display it in the canvas:

Drawing All the Thumbnails

Thumbnails, straight from YouTube, delivered with JSON data. The challenge now is to display all the thumbnails, which is fairly straightforward given the following hints:

Each default thumbnail is 120 pixels wide and 90 pixels tall.
We can iterate over the items list using a for ... in loop.

Starting with the size() function again, try this:

When you run the sketch, you should see a grid of thumbnails:

Of course, the math inside the for loop, computing the row, column, x, and y values, is just in service of showing the thumbnails in a grid. The real point of this is to show you how you can start to pull apart data in JSON format and use that data.

Originally published at spatialpixel.com.