Using computer vision to tag the collection.

Shelley Bernstein
Barnes Foundation
Oct 26, 2017

Throughout our collection project we’ve talked about the use of computer vision and machine learning to help us determine visual relationships. In addition, we’ve used these same tools to analyze images and provide subject keywords for searching. Sam Hains did this part of the project for us, and I will tell you it was a tricky bit. Computers can only get so far, especially with Impressionist paintings…remember the stuffed animals?!?

The first part of this project used six services — Microsoft Azure, IBM Watson, Google, AWS Rekognition, TensorFlow, Clarifai — to “look” at paintings in our collection and tell us what was in them. We were using the default models supplied by each service, which were trained mostly on photographs, so we knew there would be a lot of error. This part of the project became a careful balance between minimizing error and keeping some of the serendipity the machines were giving us.
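
Just to make the mechanics concrete (this isn’t the project’s code, which Sam documented separately): asking one of these services for keywords looks roughly like the sketch below, here using AWS Rekognition via boto3, with a placeholder image file and illustrative parameter values.

```python
# Minimal sketch: asking AWS Rekognition (one of the six services) to label an image.
# The file path, label limit, and confidence floor are placeholders for illustration.
import boto3

rekognition = boto3.client("rekognition")

with open("bf805.jpg", "rb") as f:  # hypothetical local image of a collection object
    image_bytes = f.read()

response = rekognition.detect_labels(
    Image={"Bytes": image_bytes},
    MaxLabels=25,     # cap the number of returned keywords
    MinConfidence=0,  # take everything; filtering on confidence happens later
)

# Each label arrives with a name and a confidence score (0-100 from this service).
tags = [(label["Name"].lower(), label["Confidence"] / 100) for label in response["Labels"]]
print(tags)  # e.g. [("painting", 0.98), ("person", 0.83), ...]
```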

So, let’s talk about serendipity first. Here’s a really good example of computers making connections through words that no human would likely ever come up with — especially not those tasked with creating “accurate” keywords. Here’s what the search for “cherubs” looks like…

Search results for “cherubs” draw interesting connections.

Clearly, these are not cherubs, but many of these things could be considered cherub-like and, in an instant, you can see a concept being applied across the collection. This gets you thinking and making connections; it’s this magic that we want to keep.

Computers tagged Giorgio de Chirico’s portrait of Dr. Barnes (BF805) as [0.37 — christ, 0.12 — man, 0.06 — old+woman, 0.06 — saint, 0.04 — portrait, 0.03 — cherubs, 0.03 — mother, 0.03 — seduction]. And, thus, let the scrubbing of religious and gender-specific terms begin.

While we’re trying to keep the magic, we’re also trying to minimize error, and one of the ways we are doing that is through the confidence ratings that most of these services deliver along with the keywords. We started by creating a cutoff, so the more esoteric words with lower confidence ratings were dropped. We also compared keywords across services to determine whether they “agreed” on tags; if two or more services indicated a painting contained “apples,” we kept that result.
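
As a rough sketch of that pruning logic (the cutoff value and the exact way the two rules combine are assumptions here, not the values we actually used):

```python
# Sketch of the pruning step: keep a tag if it clears a confidence cutoff,
# or if two or more services independently agree on it. The 0.4 cutoff is
# illustrative only.
from collections import defaultdict

CONFIDENCE_CUTOFF = 0.4

def prune_tags(tags_by_service):
    """tags_by_service: {"rekognition": [("apples", 0.72), ...], "clarifai": [...], ...}"""
    services_per_tag = defaultdict(set)
    best_confidence = defaultdict(float)

    for service, tags in tags_by_service.items():
        for tag, confidence in tags:
            services_per_tag[tag].add(service)
            best_confidence[tag] = max(best_confidence[tag], confidence)

    kept = {}
    for tag, services in services_per_tag.items():
        clears_cutoff = best_confidence[tag] >= CONFIDENCE_CUTOFF
        agreed = len(services) >= 2  # two or more services "agreed" on the tag
        if clears_cutoff or agreed:
            kept[tag] = best_confidence[tag]
    return kept
```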

Then we also created our own model, a project that took a bit more work. I sat down with a number of objects and tagged them, and Sam used my results to train a model; after some noodling, which is documented on GitHub, the results were pretty accurate and most of those tags were kept. In the end, we felt all of these methods were necessary to both prune results and give us something of value.
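
The real training code and the noodling live in the GitHub documentation; purely as an illustration of the general approach, fine-tuning a pretrained image classifier on a folder of hand-tagged images might look something like this (the directory layout and hyperparameters are invented for the example):

```python
# Sketch only: the project's actual training process is documented on GitHub.
# This shows the general shape of fine-tuning a pretrained network on a small
# set of hand-tagged images, assuming one folder per tag under ./hand_tagged/.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "hand_tagged/", image_size=(224, 224), batch_size=16
)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # keep the pretrained features, train only the new head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```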

Even with all those methods we still needed to scrub results. Computers seemed to have an exceedingly difficult time determining the gender of people depicted in works of art, so we eliminated any gender-specific words and their variations. We also eliminated words related to religion after we saw computers refer to our founder as “Christ.” This wasn’t the only instance; Sam reported in his documentation, “it seemed that the ‘Christ’ class seemed to be picking up any bearded man.”
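
A scrub like that can be as simple as a word blocklist. The sketch below is illustrative only; the flagged words are a small sample drawn from the examples in this post, not our actual list.

```python
# Sketch of the scrubbing step: drop any tag containing a red-flagged word.
# The flagged words here are a tiny illustrative sample, not the real list.
FLAGGED_WORDS = {"christ", "saint", "man", "woman", "mother"}

def scrub(tags):
    """tags: dict of {tag: confidence}; returns a copy with flagged tags removed."""
    cleaned = {}
    for tag, confidence in tags.items():
        words = tag.replace("+", " ").split()  # handles multi-word tags like "old+woman"
        if not any(word in FLAGGED_WORDS for word in words):
            cleaned[tag] = confidence
    return cleaned
```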

Computers thought our “Horse (BF1166)” was *every* kind of animal. [0.68 — deer, 0.41 — wildlife, 0.27 — dog, 0.27 — horse, 0.25 — mammal, 0.25 — no person, 0.24 — cat, 0.23 — one, 0.23 — giraffe, 0.22 — horse like mammal, 0.19 — lion, 0.11 — lamb, 0.05 — tiger, 0.04 — cow, 0.03 — goat, 0.03 — lion]. Tags with higher confidence ratings were kept and lower ones dropped; errant tags will still crop up, but hopefully they become a discovery mechanism.

We performed color analysis by processing the actual visual characteristics of each painting; this is what drives our search by color. It should be noted, however, that we eliminated any tag where a computer attempted to determine that something specific (or, most likely, someone specific) was a color, for obvious reasons. We also eliminated computers telling us something was a “dog,” because that tag was being applied to an extraordinary number of objects depicting women when there were no dogs present at all. In case you are curious, cats were much less of an issue.
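
The post above doesn’t spell out the color pipeline, but one common way to drive a search-by-color feature, shown here purely as an assumption, is to cluster each painting’s pixels and index the cluster centers as its dominant colors:

```python
# Assumed approach, not the project's documented pipeline: k-means over a
# downsampled image, with cluster centers treated as the dominant colors.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(path, n_colors=5):
    image = Image.open(path).convert("RGB").resize((100, 100))  # downsample for speed
    pixels = np.asarray(image).reshape(-1, 3)
    kmeans = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    # Return the cluster centers as hex strings, e.g. "#7a9e6b"
    return ["#%02x%02x%02x" % tuple(int(c) for c in center)
            for center in kmeans.cluster_centers_]
```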

All of this scrubbing meant going through a list of 40k tags to red-flag words that we think could be problematic across a variety of applications. We then did a spot check to figure out what our tolerance level was when looking at tags applied directly to objects. We decided to let a lot of other things go because, in the end, even errant keywords can help discovery: go try a search for “graffiti” or “selfie.”

It goes without saying that titles, descriptions, and other fields in our metadata are highly prioritized in the search results, and the keywords generated by computers are surfaced last. A task after launch will be to refine the search result order by taking the confidence ratings into account. This will help us maintain some of these more serendipitous connections while boosting more accurate results.
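
As an illustration of that prioritization (the index and field names are assumptions, not our actual schema), a search engine query with field boosting might look like:

```python
# Hedged illustration of how human-created metadata can outrank computer-generated
# keywords: an Elasticsearch-style multi_match query with field boosts. The index
# name and field names are assumptions for the example.
query = {
    "query": {
        "multi_match": {
            "query": "cherubs",
            "fields": [
                "title^10",       # human-created metadata gets the biggest boost
                "description^5",
                "tags",           # computer-generated keywords surface last
            ],
        }
    }
}

# e.g. with the official client (assumed index name "collection"):
# from elasticsearch import Elasticsearch
# results = Elasticsearch().search(index="collection", body=query)
```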

You’ll also notice that there is no “keyword” field visible in the object metadata; surfacing these keywords through search is one thing, but it may be a while before you’ll see computer-derived keywords directly attributed on the page.

The Barnes Foundation collection online project is funded by the Knight Foundation and our code is open source. Follow the Barnes Foundation on Medium.
