Processing Community Day NYC 2019

PCDNYC 2019

One of my resolutions for 2019 was to give a public tech talk. So, with the encouragement of a few co-workers, I applied for the Processing Foundation’s Community Day NYC. They accepted my proposal and, with some guidance from a mentor, I landed on the topic ‘Visualizing Museum Collections’. This talk is based on research I conducted for a class called Data-Mining the City while I was a student in the Graduate Architecture Department at Columbia University in the fall of 2017.

This is the first in a series of articles on how to visualize museum collections metadata. This article will explain how to get metadata using a scraper or an API endpoint, how to import it into a Python Processing sketch, and how to process and clean that data for use in visualizations.

1. Create a sketch. Open the Processing Development Environment (PDE) with Python Mode installed. When you open Processing, you’ll see the text editor on top and the console below. You can download the software here. Let’s create a new sketch by doing the following:

File > New
File > Save As... ./Processing/examples/sketch_pcdnyc

2. Import data. It’s not difficult to imagine a time, not too long ago, when museums didn’t have social media accounts (many still don’t), much less open data access to their art collections (not all museums have provided the public with access to their digital collections). Regardless, there is still quite a bit of data readily available to us.

We’re going to look at two different kinds of metadata: social media (specifically Instagram) and open data digital collections. Each of these formats provides different information. For example, Instagram provides captions, number of likes, number of comments, tags, a timestamp (so you can see what day and what time of day a photo was posted), images of varying sizes, and location. Open data collections provide an overwhelming wealth of information, such as title, artist name, artist bio, nationality, gender, medium, dimensions, date acquired, and much, much more.

Let’s bring in some data to be cleaned.

Importing Social Media Data:

We’re going to scrape data using instagram-scraper, a command-line application written in Python that scrapes and downloads Instagram photos, captions, and comments. If you haven’t installed this library already, run the following command to install it:

$ pip install instagram-scraper

In your operating system’s terminal, type the following to scrape a user’s data:

# These are some of the options you may need when scraping data:
#   -u, -p              your Instagram username and password
#   -m                  the maximum number of items to scrape
#   --media_metadata    saves the media metadata associated with the metmuseum posts to <your_destination>/metmuseum.json
#   --include-location  includes location metadata when saving media metadata
$ instagram-scraper [username] -u [your_instagram_username] -p [your_instagram_password] -m 10 --media_metadata --include-location

# Example that returns ten photos from the metmuseum Instagram account
$ instagram-scraper metmuseum -u jane_smith -p myInstaPassword01 -m 10 --media_metadata --include-location

Once you’ve run this command, the downloaded media will be placed in a folder at current_working_directory/username. Two types of data are returned: images and a JSON file. JSON stands for JavaScript Object Notation. Python’s standard library includes a module called json that lets us parse JSON data. Add the import statement to the top of the sketch file.

import json

In Python, functions are always written with the same syntax: the keyword def, a space, the name of the function, parentheses, and then a colon. We’re going to write two functions, setup() and draw(). In the setup() function, we will define the size of our canvas. In this case, it’s 200 pixels square.

import json

def setup():
    size(200, 200)

def draw():
    pass  # nothing to draw yet

Now, load the JSON data we just scraped into our sketch. First, open the file and parse it using the json module. Since we’re reading a data file, we open it using the built-in open() function inside a with statement:

import json

def setup():
    size(200, 200)
    # open the scraped JSON file and parse it
    with open('../data/metmuseum_instagram.json') as file:
        data = json.load(file)
    print(data)

def draw():
    background(0)

The with open() line opens the JSON file and declares a variable called file that represents the open file. The next line parses the file’s contents and stores the result in a variable called data.
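The same open-and-parse pattern works in plain Python outside of Processing. Here is a minimal, self-contained version using json.loads on an inline sample string; the field names mirror the scraped Instagram JSON, but the values are illustrative, not real scraped data:

```python
import json

# an inline sample standing in for the scraped file's contents;
# the structure (a list of post objects) mirrors what the scraper saves
raw = '[{"thumbnail_src": "https://example.com/a.jpg", "tags": ["art", "museum"]}]'

# json.loads parses a string; json.load(file) does the same for an open file
data = json.loads(raw)
print(data[0]['tags'])  # ['art', 'museum']
```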

Now, when we run the sketch, we’ll see something like the following printed out in the console:

[Screenshot: the scraped JSON printed in the PDE console]

Importing Open Data Collections:

Alternatively, we can load JSON from a URL, which returns a JSON object. For this example we will use The Metropolitan Museum of Art Collection API, because it does not require users to register or obtain an API key in order to access its metadata.

We save the URL to a variable and then pass it as an argument to the built-in function loadJSONObject():

def setup():
    size(200, 200)
    # load the object's metadata directly from the Met's API
    met = 'https://collectionapi.metmuseum.org/public/collection/v1/objects/437133'
    data = loadJSONObject(met)
    print(data)

def draw():
    background(0)

So now we have a working sketch.
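loadJSONObject() is a Processing convenience function. Outside the PDE, plain Python can parse the same kind of response; the sketch below uses a trimmed, made-up stand-in for the Met API’s reply (real responses include fields like objectID and title, but these particular values are placeholders, not actual Met data):

```python
import json

# a made-up stand-in for the text of a Met Collection API object response;
# the field names follow the Met API, the values are placeholders
response_text = '{"objectID": 437133, "title": "Example Title", "artistDisplayName": "Example Artist"}'

data = json.loads(response_text)
print(data['title'])  # Example Title
```

In a real run you would fetch the URL first (for example with urllib) and pass the response body to json.loads.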


3. Traverse the data. Now that we have our sketch up and running, let’s take a closer look at the JSON data printed in the PDE console so we can understand the best way to traverse it. In the console, we have the data we just scraped from the Met Museum Instagram account in JSON format. Ugh … this looks like gibberish, right? That’s OK, because our goal is to clean it up.

We need to put our list of Instagram data into a Python list called collections so that we can visualize the data in the draw() function. To do that, we will declare a global variable called collections, which is a variable that can be referenced from anywhere within the sketch. Then we use the list method append() to add each set of metadata to the collections list we just created.

# establish a global variable that can be accessed throughout sketch
global collections
# empty list to store parsed social media data
collections = []

First, we’ll write a for…in loop that grabs the properties we want to use from the metadata.

for i, item in enumerate(data):
    collection = item
    # append the metadata to the collections list variable
    collections.append(collection)
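
To see the loop in isolation, here is a runnable version with a two-item stand-in for data; the field names mirror the scraped JSON, but the values are invented for illustration:

```python
# stand-in for the parsed scraper output: a list of post dictionaries
data = [
    {'thumbnail_src': 'https://example.com/a.jpg', 'tags': ['art']},
    {'thumbnail_src': 'https://example.com/b.jpg', 'tags': ['museum']},
]

collections = []  # empty list to store parsed social media data

for i, item in enumerate(data):
    collection = item
    # append each post's metadata to the collections list
    collections.append(collection)

print(len(collections))  # 2
```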

To check and make sure the metadata was successfully iterated over, print it out:

print collections

4. Create variables. Now, create variables to reference the properties of each item in the collections list. For example, let’s print a thumbnail URL:

thumbnailSrcUrl = collection['thumbnail_src']
print thumbnailSrcUrl
[Image: thumbnail images for all the scraped Instagram posts]

Setting all the properties equal to variables gives us the following:

displayUrl = collection['display_url']
count = collection['edge_media_to_comment']['count']
thumbnailSrcUrl = collection['thumbnail_src']
timeTaken = collection['taken_at_timestamp']
tags = collection['tags']
likes = collection['edge_media_preview_like']['count']
defaultUrls = collection['urls'][0]
location = collection['location']
caption = collection['edge_media_to_caption']['edges'][0]['node']['text']
thumbnailResources = collection['thumbnail_resources'][0]['src']
thumbnailResourcesHeight = collection['thumbnail_resources'][0]['config_height']
thumbnailResourcesWidth = collection['thumbnail_resources'][0]['config_width']
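One caution, which is my addition rather than part of the original walkthrough: not every scraped post contains every one of these keys (posts without a caption or location are common), and indexing a missing key raises a KeyError. A defensive variant using dict.get:

```python
# a hypothetical post missing its 'location' and caption data
collection = {'thumbnail_src': 'https://example.com/a.jpg'}

# .get() returns None (or a supplied default) instead of raising KeyError
location = collection.get('location')
edges = collection.get('edge_media_to_caption', {}).get('edges', [])
caption = edges[0]['node']['text'] if edges else ''

print(location, repr(caption))  # None ''
```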

5. Convert values. Sometimes, you will need to convert a value to a human-readable format. In this instance, ‘taken_at_timestamp’ returns a Unix timestamp: the number of seconds since January 1, 1970.

We must convert this timestamp, which we have stored in the variable timeTaken, to a datetime object. Luckily, Python has a module called datetime that we can import to handle this conversion for us.

import datetime as dt

def setup():
    ...
    timeTaken = collection['taken_at_timestamp']
    convertedDateTime = dt.datetime.fromtimestamp(timeTaken).strftime('%Y-%m-%d %H:%M:%S')
    ...

The above conversion returns something like:

2019-02-07 08:11:29
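As a self-contained check, the same conversion can be run on an example timestamp. Here utcfromtimestamp is used instead of fromtimestamp so the output does not depend on the machine’s local time zone; the timestamp value is an arbitrary example, not one taken from the scraped data:

```python
import datetime as dt

# an example Unix timestamp (seconds since Jan 1, 1970 UTC), chosen for illustration
timeTaken = 1549545089

# utcfromtimestamp makes the result independent of the local time zone
convertedDateTime = dt.datetime.utcfromtimestamp(timeTaken).strftime('%Y-%m-%d %H:%M:%S')
print(convertedDateTime)  # 2019-02-07 13:11:29
```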

Resources:

Tools
Python Mode for Processing
instagram-scraper

APIs
Met Museum Collection API
Walters Art Museum Collections API
Rijksmuseum API
Cooper Hewitt API
Cleveland Museum of Art Collection API

Articles
Why Build an API for a Museum Collection? by Keir Winesmith and Anna Carey, September 2014
Scaling the Mission: The Met Collection API by Loic Tallon, Chief Digital Officer, October 25, 2018
Information Age: Visualizing Museum Data by Christian Marc Schmidt, Schema Design, Museums and the Web Asia 2015