Parsing the wisdom of crowds: Artstor’s Arcades results

Lisa Gavell
Published in ITHAKA Tech
Jul 8, 2016

A zealous group of users of the digital image resource Artstor has pitched in to collaboratively catalog images from the D. James Dee Archive of contemporary art on our crowdsourcing site, arcades.artstor.org. Thanks to a combination of their expertise and a lot of internet sleuthing, 555 works are now a welcome addition to the D. James Dee Archive of Contemporary Art collection in the Artstor Digital Library. You can read more about Arcades on this blog, and about the Dee Archive in the New York Times.

At Arcades, participants are presented with images in a game environment where they can enter basic data, such as creator, title, date, medium, and exhibition history, in order to accumulate points. In doing so, they “level up” and progressively acquire titles ranging from “flâneur” and “connoisseur” to “apprentice” and “master” (all references to Walter Benjamin’s unfinished Arcades Project; more about that to come). At the time of our October 2015 launch, we wondered what kind of results we would get. General crowdsourcing theory assumes that the more entries, the smarter the results. Would we secure enough participants? Would they feel compelled to return again and again?

Six months, 208 participants, and 2,916 cataloged entries later, I sat down with our intrepid Collections Editor, Nancy Minty, to decide how to publish what had amounted to a remarkably rich set of user-contributed data. In the planning phase we had decided to establish an accuracy standard based on the point system in order to whittle the data down to its most credible entries: a creator needed a “weight” of three or more; a title, two or more; a date, two or more. Yet in weeding out the flâneurs and connoisseurs, we lost a lot of “incidental” data: a user with few points might have entered a typo in the title, yet also supplied the only plausible date.
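As a rough illustration, that threshold standard boils down to a per-field weight check. Here is a minimal sketch in Python; the field names, the per-entry “weight” structure, and the sample values are hypothetical stand-ins, not Arcades’ actual schema:

```python
# Hypothetical credibility filter: keep only fields whose accumulated
# "weight" (points backing an entry) meets the per-field minimum.
MIN_WEIGHTS = {"creator": 3, "title": 2, "date": 2}

def credible_fields(entry: dict) -> dict:
    """Return only the fields of an entry that meet their weight threshold."""
    return {
        field: data["text"]
        for field, data in entry.items()
        if field in MIN_WEIGHTS and data["weight"] >= MIN_WEIGHTS[field]
    }

entry = {
    "creator": {"text": "Jean-Michel Basquiat", "weight": 8},
    "title": {"text": "Per Capita", "weight": 3},
    "date": {"text": "1983", "weight": 1},  # below threshold, so dropped
}
print(credible_fields(entry))
# {'creator': 'Jean-Michel Basquiat', 'title': 'Per Capita'}
```

As the dropped date shows, a strict cutoff discards exactly the kind of “incidental” data described above.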

Though we generally strive to provide a creator, title, or date at minimum for each record in the Digital Library, we were comfortable lowering our standards somewhat for this experiment, knowing that once published, we could still flesh out the records. We poked through the data looking for other quick fixes: if we filtered out the records lacking creators, we were left with a set that, while imperfect, could be published without too much objection. But again, we would exclude useful data. In the end, we consolidated some records ourselves in order to create one good record. Since part of the purpose of the exercise was to publish blindly, we really had to limit ourselves. Tempting as it was to peer into each entry, we had to let go and trust the crowdsourcing phenomenon: that the data indeed had “legs.” We also felt a little better once we added a note in the metadata window signaling that the records were crowdsourced.
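For what it’s worth, that creator filter amounts to a single pass over the records; in a hypothetical list-of-dicts representation:

```python
# Hypothetical consolidated records; publish only those with a creator.
records = [
    {"creator": "Jean-Michel Basquiat", "title": "Per Capita", "date": "1983"},
    {"creator": None, "title": "Untitled", "date": None},  # would be excluded
]
publishable = [r for r in records if r.get("creator")]
```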

The Basquiat below amassed quite a panoply from just eleven participants: three different dates (with five entries left blank), four variants of “Jean-Michel Basquiat,” two different titles (again with five entries left blank), and two different mediums (one of them repeated nine times).

Jean-Michel Basquiat, Per Capita, 1982. Photograph by James Dee. © 2014 The Estate of Jean-Michel Basquiat / ADAGP, Paris / Artists Rights Society, New York

All in all, we could trust the consensus:

  • Creator: Jean-Michel Basquiat [8 identical entries]
  • Title: Per Capita [3 identical entries]
  • Date: 1983 [3 identical entries]
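In code terms, that consensus step is little more than a per-field majority vote over the non-blank submissions. Here is a minimal sketch; the submission counts loosely echo the Basquiat example above but are invented for illustration:

```python
from collections import Counter

# Invented submissions for a single image (eleven participants).
submissions = {
    "creator": ["Jean-Michel Basquiat"] * 8
    + ["J.-M. Basquiat", "Jean Michel Basquiat", "Basquiat"],
    "title": ["Per Capita"] * 3 + ["Untitled"] * 2 + [""] * 6,
    "date": ["1983"] * 3 + ["1982"] * 2 + ["1981"] + [""] * 5,
}

def consensus(values):
    """Majority vote over non-blank values; returns (value, count)."""
    tally = Counter(v.strip() for v in values if v.strip())
    return tally.most_common(1)[0] if tally else (None, 0)

for field, values in submissions.items():
    value, count = consensus(values)
    print(f"{field}: {value!r} [{count} identical entries]")
```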

Once we had whittled the dataset down to a set of unique works containing the richest possible data, certain (rather predictable) patterns emerged. Works by contemporary artists whose web presence was limited to artists’ sites and galleries with clear, thorough documentation were “easily definable.” Lesser-known works by well-known artists formed a “promiscuous” category: they lacked a single, irrefutable authority and were obscured by Pinterest hits, print vendor sites, and general internet noise. Lastly, lesser-known works by modern and contemporary artists fell into a “difficult” category, as they yielded so few fully cataloged entries. We also found that many artist and gallery webpages were not engineered for this kind of task, as they often omitted dates, and even titles.

On the image-searching side, we found that a reverse image search was unsuccessful when the image was:

  • Incorrectly oriented (though surprisingly, color bars did not pose an issue)
  • An installation shot
  • A color-block painting
  • A study by a well-known artist
  • A poster by a well-known artist

Based on these findings, future iterations of our crowdsourcing interface may incorporate:

  • A way for users to view the metadata they have already contributed
  • An alert to let participants know when various images have been identified
  • A meter showing progress across the entire collection
  • Zoom and pan

In the coming months, we will add more James Dee images to Arcades (and possibly images from other collections as well), so please join us at arcades.artstor.org and share your experiences with user-contributed data.
