Stuffed animals, computer vision, and the Barnes Foundation collection online.

Things have been going swimmingly over here as we build out the Barnes collection online. I’ve been astonished at just how much easier and faster this process has been compared to ten years ago, the last time I did this kind of thing. That project, bringing the Brooklyn Museum collection to the web, took us about a year. Today the same objective is taking less than a quarter of the time because of new tools: getting our data out of TMS via an API, using Elasticsearch, using various plug-ins to grasp collection data at a glance, and keeping the rodeo under control through the use of microservices. The process is a lot more structured and less complex than it has been in the past.

Diving into the second phase of development — using computer vision to analyze our collection to ascertain visual similarity among objects — has been much less straightforward. As we started to research this part of the project I read something which, I will admit, gave me pause. Researchers at Rutgers had been doing a lot of work specifically with art, but computers were still having trouble distinguishing impressionist works from one another; these types of works are the majority in the Barnes Foundation holdings. Not only were we diving into somewhat uncharted territory, but our jump would be into the most difficult end of the spectrum.

As a start, we thought we’d just try a few things. Following the example of Tate’s Recognition project, we used the Microsoft Computer Vision API to see what would come back…and, in a moment I will never forget, what the computer saw were lots and lots of stuffed animals in the Barnes collection. I can’t count how many times the tag “stuffed” has shown up in results, and “teddy bear” seems to make it into image captions more often than you would think. You can imagine my reaction, right? Here I’ve got a project in its very earliest stages telling me I’ve got a collection full of stuffed animals. What kind of curatorial upheaval is this going to create? My first thought was: total fail.

Renoir’s Young Mother (BF15) interpreted by a computer as “a boy holding a teddy bear.” Note, also, the misidentified gender; more on this in a future post.
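For readers curious how one might actually count how often a tag like “stuffed” turns up, here is a minimal sketch. It assumes each image’s result has already been saved as a dictionary with a `tags` list of `name` and `confidence` pairs, a shape loosely modeled on what image-tagging services return. The function name `count_tags`, the `EXAMPLE-3` identifier, and every tag and confidence value below are made up for illustration; they are not our real results or our project code.

```python
from collections import Counter

# Hypothetical saved responses keyed by object identifier.
# BF15 and BF576 are real invoice numbers mentioned in this post, but the
# tags and confidence values here are invented for illustration only.
sample_responses = {
    "BF15": {"tags": [
        {"name": "person", "confidence": 0.98},
        {"name": "stuffed", "confidence": 0.71},
        {"name": "teddy bear", "confidence": 0.64},
    ]},
    "BF576": {"tags": [
        {"name": "indoor", "confidence": 0.90},
        {"name": "stuffed", "confidence": 0.58},
    ]},
    "EXAMPLE-3": {"tags": [  # hypothetical object, not a real record
        {"name": "tree", "confidence": 0.88},
    ]},
}


def count_tags(responses, min_confidence=0.5):
    """Count how many objects carry each tag at or above a confidence floor."""
    counts = Counter()
    for response in responses.values():
        # A set ensures each object contributes at most once per tag.
        names = {
            tag["name"]
            for tag in response.get("tags", [])
            if tag["confidence"] >= min_confidence
        }
        counts.update(names)
    return counts


if __name__ == "__main__":
    for name, total in count_tags(sample_responses).most_common():
        print(f"{name}: {total}")
```

Raising `min_confidence` is one simple way to see whether the surprising tags are strong signals or just low-confidence guesses.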

I jokingly brought up the results with one of our curators, Sylvie Patry, and her response was “Oh no. No, no, no [shakes head], but have you discussed this with Martha Lucy? She may find this interesting.” And then I spoke with Martha, whose scholarly focus has been on the work of Renoir, an artist whose work represents 19% of all the paintings in the collection. Her response? “That’s fascinating.” Turns out, she had been working on a theory that Renoir was trying to elicit the sense of touch when depicting his subjects, especially fleshy nudes and babies. Martha will tell this story in full in a later post, but I mention it here because this became a very unexpected and pivotal moment in our project.

We started to speak with Kelly Freed at Microsoft about why computers were interpreting collection objects as stuffed animals. Kelly is part of the cognitive services team, which made the computer vision APIs that Fabrica used to create Tate’s Recognition project, as well as this newly released custom vision API. Kelly’s explanation: “The computer sees them as soft.” The models are trained using photographs, not art, so “it’s as if someone who has never been exposed to art before is trying to tell you what they are literally seeing.” By comparison, the Rutgers model is more like the person who has taken art history 101 and is about to head off to grad school; this model knows the basics of art and is now on a path toward specialization.

Another teddy bear. This time Modigliani’s Reclining Nude from the Back (BF576).

The Rutgers model will be integral to finding visual connections between works of art. This will most likely be the majority of what a visitor ends up seeing on our website as “visually similar results,” based on the formal elements that Barnes used to teach his own students. The other models (out-of-the-box solutions coming from Clarifai, Google Vision API, Microsoft Computer Vision API, [fill_in_the_blank]) have the potential to show us new and different ways of seeing. While web visitors may never discover that computers think our collection is composed of stuffed animals, we’re curious about this second path and what we may learn from it; there may very well be potential research material here.
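To make “visually similar results” a little more concrete, here is a minimal sketch of one common way to rank images by similarity: represent each image as a list of numbers (a feature vector) and compare vectors with cosine similarity. This is an assumption for illustration, not a description of how the Rutgers model works. The names `cosine_similarity` and `most_similar`, the `painting_*` identifiers, and the three-number vectors are all hypothetical; real feature vectors would come from a trained model and be far longer.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, top_n=3):
    """Rank every other object by similarity to the query, best first."""
    query = features[query_id]
    scores = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in features.items()
        if other_id != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical feature vectors; the values are invented for illustration.
features = {
    "painting_a": [0.9, 0.1, 0.3],
    "painting_b": [0.8, 0.2, 0.4],
    "painting_c": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for other_id, score in most_similar("painting_a", features):
        print(f"{other_id}: {score:.3f}")
```

With these made-up numbers, `painting_b` ranks above `painting_c` for `painting_a` because its vector points in nearly the same direction. What the numbers encode (line, color, composition, or “softness”) depends entirely on the model that produced them.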

The “experienced” and the “newcomer” models represent different ways of seeing. They are not too dissimilar from human processes and, interestingly, are similar to the way we categorize museum visitors; this realization stopped us in our tracks. All along, we had conceived of the project as a way for computers to give us the “right” answers (show us which paintings have similar lines, color, composition, space, etc.) but, in the end, both points of view have potential importance.

Cassie from Girlfriends Labs is going to be up next on the Barnes Publication discussing all of her research into computer vision and machine learning. She’s talked to a lot of folks working in the field and has landed on a couple of consultants — including Ahmed Elgammal from Rutgers — who will be helping us bring visual similarity to the web. Also, you’ll start hearing from Martha Lucy about art history and what the “newcomer” model may be showing us as computers see our collection for the first time.

Stay tuned: this is going to be a fun up-and-down ride with some fascinating twists and turns. Stuffed animals are only the beginning.


The Barnes Foundation collection online project is funded by the Knight Foundation and our code is open source. Follow the Barnes Foundation on Medium.