Helen D. Wall
Jan 11, 2016


120kMoMA - A data visualization study of The Museum of Modern Art collection dataset of 123,919 records

This project is an exploration of MoMA’s artworks in ways not possible by visiting the galleries, reading catalogues or searching the online collection, although all of these inform and guide the research. By analyzing various features, categories and groupings within the collection, we gain new insights about the artists and the artworks, as well as the institution that has amassed and preserved their legacies.

About these data visualizations:
These findings are based on the MoMA collection data downloaded from GitHub on 10/31/2015, DOI 10.5281/zenodo.35610. The visualizations were created in R.

Working with real-world data is challenging, and preparing this dataset required considerable formatting for consistency, as well as parsing, extracting and aggregating.

The primary artist in the artistbio visualization is defined as the single artist named on a record or, where several are listed, the first name in the list. This yields 11,593 primary artists from 140 countries. Where the artistbio record includes est. or founded, the artist is considered to be a company. Plotting the top 40 nationalities shows that American artists lead by a wide margin at 4,095, far ahead of the next-ranked nationalities. However, set against all other nationalities worldwide, American and naturalized American citizens combined (4,522) comprise under 40% of the total.
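The two rules above, and the share quoted at the end, can be sketched in a few lines. This is an illustration in Python rather than the R code used for the project. The function names, the comma separator between multiple artists and the exact matching pattern are assumptions, not details taken from the original analysis.

```python
import re

# Assumption: a bio containing "est." or "founded" marks a company,
# following the rule described in the text.
COMPANY_PATTERN = re.compile(r"\b(est\.|founded)", re.IGNORECASE)


def primary_artist(artist_field: str) -> str:
    """Return the single name, or the first name in a multiple list.

    Assumes (hypothetically) that multiple names are comma-separated.
    """
    return artist_field.split(",")[0].strip()


def is_company(artist_bio: str) -> bool:
    """True when the bio text carries a company marker."""
    return bool(COMPANY_PATTERN.search(artist_bio))


# Figures quoted in the article.
TOTAL_PRIMARY_ARTISTS = 11_593
AMERICAN = 4_095
AMERICAN_INCL_NATURALIZED = 4_522

american_share = 100 * AMERICAN / TOTAL_PRIMARY_ARTISTS
combined_share = 100 * AMERICAN_INCL_NATURALIZED / TOTAL_PRIMARY_ARTISTS
print(f"American: {american_share:.1f}%, combined: {combined_share:.1f}%")
```

The combined share works out to roughly 39%, which is why the largest single group still falls short of half the artists in the collection.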

In the department, classification and creditline visualizations, the MoMANumber was aggregated for multiple works within a set. The dataset of 123,919 individual artworks was reduced to 66,344 records grouped by accession. For example, Mies van der Rohe Archive MR14.1 — MR14.28, which includes plans, elevations, perspectives, sections, etc., is represented in one row as accession set MR14 German Pavilion, International Exposition, Barcelona, Spain. The accession set of Diego Rivera’s May Day, Moscow sketchbook of 45 watercolors is shown simply as 137.1935.1–45. Aggregating accession sets has the greatest effect on prints contained in portfolios, pages and sheets in illustrated books, and photographic prints in albums and photographic sketchbooks. Similarly, architectural, product and graphic design projects include multiple sketches and drawings, as well as component pieces — such as building elements, furniture collections and dinnerware sets.
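One way to derive an accession set from a MoMANumber is sketched below, in Python rather than the R used for the project. The parsing rule is an assumption inferred from the examples in this article (number.year.part for most works, a lettered prefix such as MR14 for archive material). The actual grouping logic may have differed, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern: "<number>.<4-digit year>" optionally followed by ".<part>".
NUMERIC_ACCESSION = re.compile(r"^(\d+\.\d{4})(?:\.|$)")


def accession_set(moma_number: str) -> str:
    """Map an individual MoMANumber to the accession set it belongs to."""
    match = NUMERIC_ACCESSION.match(moma_number)
    if match:
        # e.g. 137.1935.12 -> 137.1935
        return match.group(1)
    # Archive-style numbers such as MR14.1: keep the part before the first dot.
    return moma_number.split(".", 1)[0]


def group_by_accession(moma_numbers):
    """Collect individual numbers under their accession set."""
    groups = defaultdict(list)
    for number in moma_numbers:
        groups[accession_set(number)].append(number)
    return dict(groups)


sample = ["MR14.1", "MR14.28", "137.1935.1", "137.1935.45", "1.1969"]
grouped = group_by_accession(sample)
print(len(sample), "records ->", len(grouped), "accession sets")
```

Counting the keys of the grouped result is what turns a record count into an accession-set count, as in the reduction from 123,919 to 66,344 described above.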

The creditline analysis was done on unique donor names. This produced separate totals even where the same person, entity or collection was repeated. For example, gift of Abby Aldrich Rockefeller and Abby Aldrich Rockefeller Fund were totaled separately, as were Mies van der Rohe Archive, gift of the architect and Lilly Reich Collection, Mies van der Rohe Archive. One exception was the merging of Given anonymously (1,434) and Anonymous gift (1,021) under one donor title.
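The counting rule can be illustrated with a short Python sketch (the project itself used R). Each distinct creditline string is tallied on its own, and only the two anonymous variants are folded together. The merged label "Anonymous" and the function names are assumptions made for illustration.

```python
from collections import Counter

# The one merge described in the text.
ANONYMOUS_VARIANTS = {"Given anonymously", "Anonymous gift"}


def donor_key(credit_line: str) -> str:
    """Return the label under which a creditline is counted."""
    cleaned = credit_line.strip()
    # "Anonymous" is a hypothetical merged label.
    return "Anonymous" if cleaned in ANONYMOUS_VARIANTS else cleaned


def count_donors(credit_lines):
    """Tally creditlines by exact donor string, merging only anonymous gifts."""
    return Counter(donor_key(line) for line in credit_lines)


sample = [
    "Given anonymously",
    "Anonymous gift",
    "Gift of Abby Aldrich Rockefeller",
    "Abby Aldrich Rockefeller Fund",
]
counts = count_donors(sample)
print(counts.most_common())

# Combined anonymous total from the figures quoted in the article.
anonymous_total = 1_434 + 1_021
```

With the article's figures, the merged anonymous entry totals 2,455, while the two Rockefeller creditlines remain separate rows.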

There were two dramatic changes in creditline order when individual artworks and grouped accession sets were compared. The 10,966 entries in The Louis E. Stern Collection come from 449 illustrated books. The collection of 4,855 Eugène Atget photographs in the Abbott-Levy Collection, Partial gift of Shirley C. Burden is listed under a single accession set, 1.1969.

The dimension visualization required extracting measurements given in a variety of formats, including imperial and metric units and textual information, and converting them into area in square centimeters. Some additional sleuthing was required where descriptions included overall dimension variable or where the artwork comprises multiple pieces. The total number of painting records analyzed is 2,230. While F-111 by James Rosenquist is commonly thought of as the largest painting in the collection, Jennifer Bartlett’s Rhapsody, comprising 987 enamel on steel plates, is 1.3 times larger. Photos of her work installed in MoMA’s atrium can be seen in this review in The New York Times: nyti.ms/1JsceeK. Hint: The third-largest artwork in this dataset is a billboard.
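A minimal Python sketch of this kind of extraction follows (the project itself used R). It assumes a dimension string of the form height x width, with centimeters preferred when present and inches converted otherwise. The patterns, the function name and the sample strings are illustrative assumptions, not the parsing rules actually used.

```python
import re

CM_PER_INCH = 2.54
NUMBER = r"(\d+(?:\.\d+)?)"

# Assumed layouts: "101.6 x 76.2 cm" and '40 x 30"' or "40 x 30 in".
METRIC = re.compile(NUMBER + r"\s*x\s*" + NUMBER + r"\s*cm", re.IGNORECASE)
INCHES = re.compile(NUMBER + r"\s*x\s*" + NUMBER + r'\s*(?:"|in\b)', re.IGNORECASE)


def area_cm2(dimensions: str):
    """Return height x width in square centimeters, or None if unparseable."""
    match = METRIC.search(dimensions)
    if match:
        return float(match.group(1)) * float(match.group(2))
    match = INCHES.search(dimensions)
    if match:
        height_cm = float(match.group(1)) * CM_PER_INCH
        width_cm = float(match.group(2)) * CM_PER_INCH
        return height_cm * width_cm
    # Text such as "dimensions variable" needs manual review.
    return None


for text in ['40 x 30" (101.6 x 76.2 cm)', "10 x 20 in", "Overall dimensions variable"]:
    print(text, "->", area_cm2(text))
```

Strings that return None, as well as works with three dimensions or several pieces, are the cases that call for the manual sleuthing mentioned above. This sketch does not attempt to handle them.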

This raises the question of how some artworks come to be classified as painting, or indeed how any artwork is classified by its medium or assigned to one department or another. Consider Hannah Höch’s Indian Dancer: From an Ethnographic Museum, made of cut-and-pasted printed paper and metallic foil on paper. Is this collage a drawing? Is Robert Rauschenberg’s Bed, which includes pillow, quilt and sheet on wood supports, a painting? The media of Frank Stella’s massive wall-reliefs Kastura (1979) and Giufà, la luna, i ladri e le guardie (1984) both include oil paint and aluminum, yet in this dataset these works are classified as painting and sculpture, respectively.

Modern art has, notably, broken down the barriers traditionally defined throughout art history. Museum departmental practice is only now catching up. As reported recently in The New York Times (nyti.ms/1IT7jDc), MoMA is making an “uncorporate” seismic shift in collecting and displaying its collection. How and when the collection metadata will reflect these changes remains to be seen and visualized.

My initial work on the MoMA dataset started in Lev Manovich’s “Social and Cultural Computing” course at the Graduate Center, CUNY, during Fall 2015.

Special thanks to Lev Manovich for lessons in Cultural Analytics and R.