Helen Frankenthaler, Picasso, and Barr’s Torpedo, oh my!

We held an #artdatathon and it was awesome.

Published in Digital @ MoMA, Mar 26, 2016

In July 2015, MoMA released metadata for our full collection on GitHub. Fiona Romeo has documented some of the ways people have used that data since we released it, but when Laura Norén from NYU approached us about hosting an art datathon, we thought it would be a great way to really dig into the data.

Hold up: what exactly is a “datathon”?

Most people have heard of “hackathons” — events where people gather to engage in intensive and collaborative computer programming. A datathon is similar in that people gather to work together, but differs in the outcome: instead of an app or other software output, the results are answers to questions posed to the data. Datathons challenge participants to come up with research designs that can utilize specific data — in this case, collection data from museums — to create models, figures, maps, and other presentations of findings.

Untitled: Art Datathon was a two-day workshop in which multidisciplinary teams explored what could be learned from the data about museums, their collections, and art. Laura did a great job matching up teammates so that teams would be balanced in terms of technical skills and “domain expertise” — in this case, art historical knowledge.

Many of our participants did not have data science or programming backgrounds, so all of the data was available in its most basic form — as CSV files.

So how’d it go?

Day 1 started off with an R tutorial by Adriana Crespo-Tenorio, Lead Researcher at Facebook, to ensure a baseline knowledge among the group. After lunch, participants (and organizers) were treated to a guest presentation by Lev Manovich, who had used MoMA’s collection dataset with his students and showed Helen Wall’s in-depth visualization of it as an example. Lev also spoke about his project SelfieCity, which compares self-portraits (“selfies”) taken in five cities around the world.

Then, the real work started. Teams had just under 24 hours to come up with an interesting question to ask the data and then figure out the answer. In addition to MoMA’s collection data, datasets were available from the Cooper Hewitt, Tate, Carnegie Museum of Art, and SFMoMA (who provided us with a CSV file), as well as The Frick Collection’s Montias Database of 17th-century Dutch art inventories. MoMA also contributed two datasets not yet released to the public: one of our exhibitions from 1929 to 1990, and a second of all artists in our collection.

On the second day, each team presented their findings to judges Matt Lincoln, Alise Tifentale, Ramona Bronkar Bannayan, and Mark Hansen, who selected two teams as the winners.

Team 4, composed of Joan Beaudoin (Wayne State University, Detroit), Michael Fehrenbach (MoMA), Shira Feldman (NYU), Juliet Fong (BBDO), and Aiyi Zhang (NYU), won for their project, “Creators and Concepts: A Computational Analysis of Curatorial Approaches,” which examined curatorial approaches through language analysis of exhibition titles. The background of their presentation featured Helen Frankenthaler’s Mountains and Sea.

Team 4’s computational approach explained against a backdrop of Helen Frankenthaler

Team 6, composed of Woojin Kim (Columbia), Marily Konstantinopoulou (MoMA), Nomaduma Masilela (Museum Research Consortium Fellow), A’Nisa Megginson (NYU), and Manuel Rueda (Columbia), won for their study of the artists exhibited most often at MoMA over time.

A screenshot of Team 6’s data visualization of artists exhibited at MoMA between 1929 and 1990.

They found that between 1929 and 1990, Pablo Picasso was on view in an exhibition at MoMA for 16,869 days. That’s three-quarters of the time MoMA had been open! While we knew we had held a lot of Picasso shows, the hard data really did reinforce that our devotion to Pablo is steadfast.
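For readers curious how a "days on view" figure like this can be computed, here is a minimal sketch in Python. It is not Team 6's actual code. It assumes each exhibition is a pair of opening and closing dates, merges overlapping runs so that no day is counted twice, and then compares the 16,869-day figure to the length of the period. The sample runs are made up, and the window used for the share (MoMA's opening on November 7, 1929 through the end of 1990) is our assumption about how the period was bounded.

```python
from datetime import date

def days_on_view(intervals):
    """Count distinct days covered by (start, end) date pairs, inclusive."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # This run does not overlap the running one: bank it, start anew.
            if current_end is not None:
                total += (current_end - current_start).days + 1
            current_start, current_end = start, end
        else:
            # Overlapping run: extend the running interval if it ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += (current_end - current_start).days + 1
    return total

# Hypothetical exhibition runs for illustration only, not real MoMA data.
runs = [
    (date(1940, 1, 1), date(1940, 1, 31)),
    (date(1940, 1, 15), date(1940, 2, 10)),  # overlaps the first run
    (date(1941, 6, 1), date(1941, 6, 30)),
]
print(days_on_view(runs))  # 71

# Share of the period, using the figure reported above.
# Assumed window: November 7, 1929 through December 31, 1990.
picasso_days = 16_869
window_days = (date(1990, 12, 31) - date(1929, 11, 7)).days + 1
share = picasso_days / window_days
print(f"{share:.1%}")
```

Merging the intervals first matters because an artist can appear in several exhibitions at once, and simply summing exhibition lengths would overstate the total.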

What else did the teams find out?

Color and gender were popular themes, examined by multiple teams. Team 1 looked at the use of color in works in MoMA’s collection over the course of the 20th century by gender, while Team 2 explored the relationship between dominant colors in paintings and the geographical background of the artists. Team 3 compared the gender of artists in MoMA’s and SFMoMA’s collections, examined the age of MoMA’s collection over time, and engaged in a semantic analysis of works in the collection. Team 5 asked, “How contemporary is MoMA’s collection?” by looking at the age of artworks when they entered the collection, while Team 7 looked at the breakdown of artists in MoMA’s collection by gender and nationality.

The most popular dataset was MoMA’s exhibition history. (The only other museum to have released exhibition data is the Cooper Hewitt.) Many teams dug into MoMA’s history to try to better understand the data, using press releases, images, and even the torpedo diagram by MoMA’s inaugural director, Alfred H. Barr, Jr., in their presentations to contextualize the questions they were asking the data.

Barr’s “torpedo” diagram makes an appearance in Team 3’s presentation.

Overall, what was most impressive was the excitement and enthusiasm of all the participants. In fact, many of them have continued to work with the data! We’re looking forward to seeing their questions develop even further.

What’s next?

Our Artists dataset is now available on GitHub, and based on the participants’ experience, we’re updating both the Artworks and Artists datasets to parse out nationality, birth year and death year, and dimensions (height, width, depth), and to add the URLs for thumbnails where available.
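To give a sense of what "parsing out" nationality and birth and death years can involve, here is a rough Python sketch. It assumes the artist information arrives as a single free-text string shaped like "(Spanish, 1881–1973)" or "(American, born 1930)". That format, the `parse_bio` helper, and the output field names are our illustration, not a description of the actual dataset columns or of how the update is implemented.

```python
import re

# Assumed input shape: "(Nationality, YYYY–YYYY)" or "(Nationality, born YYYY)".
BIO_PATTERN = re.compile(
    r"""^\(?\s*
        (?P<nationality>[^,()\d]+?)\s*,\s*
        (?:born\s+)?(?P<birth>\d{4})
        (?:\s*[–-]\s*(?P<death>\d{4}))?
        \s*\)?$""",
    re.VERBOSE,
)

def parse_bio(bio):
    """Split a bio string into nationality, birth year, and death year.

    Returns None when the string does not match the assumed format.
    """
    match = BIO_PATTERN.match(bio.strip())
    if match is None:
        return None
    death = match.group("death")
    return {
        "nationality": match.group("nationality"),
        "birth_year": int(match.group("birth")),
        "death_year": int(death) if death else None,
    }

print(parse_bio("(Spanish, 1881–1973)"))
print(parse_bio("(American, born 1930)"))
print(parse_bio("unknown"))
```

Returning None for strings that do not fit, rather than guessing, keeps messy records visible so they can be reviewed by hand.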