Discovering artwork with visual search, or the reasonable effectiveness of convolutional neural networks
This article was initially published November 6th, 2017, prior to the acquisition of Thread Genius by Sotheby’s.
Something we’ve always wanted to try is expanding the Thread Genius visual search experience to let people discover artworks based on visual similarity. It was never a high priority, because training models for a new domain usually means gathering entirely new training data. Just as we trained our fashion neural net to learn the silhouettes of dresses and the many variations of plaid, we figured we’d have to do the same for art.
But out of curiosity, how well would our existing models do on art? That is, would the visual features learned from fashion also help discern similarities between pieces of art?
In this blog post, we explore this question further.
Composition vs. Subject Matter
At Thread Genius, we have a few neural net models trained to recognize various concepts found in imagery. One model, nicknamed FashNet or Fashion Model, was trained only to recognize concepts related to fashion. These include patterns (stripes vs. camouflage), shape (dress vs. pants), colors (blue vs. red), and embellishments (buckles vs. epaulettes). Another model, which we internally call Super Model, was trained to recognize fashion concepts as well as general concepts that most people know about: think animals, plants, buildings, etc. This model was built primarily to minimize false positives when dealing with user-generated photos.
Without introducing any new concepts about art, we created two search indices by running these models on a catalog of 600K+ artworks.
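Conceptually, building one of these indices means running each artwork through a model to get an embedding vector, then answering a query by finding the stored vectors closest to the query image's embedding. The sketch below is a minimal brute-force version of that idea, not Thread Genius's actual implementation: the `BruteForceIndex` class, the artwork IDs, and the toy three-dimensional vectors are hypothetical, and cosine similarity is assumed as the similarity measure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class BruteForceIndex:
    """Hypothetical exact nearest-neighbor index over unit-normalized embeddings."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, item_id: str, embedding: Sequence[float]) -> None:
        # Store the normalized embedding a model produced for one artwork.
        self._items[item_id] = normalize(embedding)

    def query(self, embedding: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        # Score every stored item by cosine similarity to the query.
        q = normalize(embedding)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, vec)))
            for item_id, vec in self._items.items()
        ]
        # Highest similarity first; keep only the top k.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy usage with made-up 3-dimensional embeddings.
index = BruteForceIndex()
index.add("red_apple_still_life", [0.9, 0.1, 0.0])
index.add("blue_seascape", [0.0, 0.2, 0.95])
index.add("red_abstract", [0.8, 0.3, 0.1])
for item_id, score in index.query([1.0, 0.2, 0.0], k=2):
    print(item_id, round(score, 3))
```

Brute-force scoring touches every stored item on each query, which is fine for a sketch; at the scale of 600K+ artworks, a production system would more likely use an approximate nearest-neighbor index.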
A comparison of the results from these two models reveals a trade-off. Our fashion model doesn’t know anything outside of fashion. What are clouds? Never seen them before, but they look like feather prints. Apples? Nah, those are probably red watches. So when you ask it to extract visual features from images of concepts it doesn’t know, interesting things happen. One effect is that it emphasizes what it does know, things like colors and textures. To the Fashion Model, an oil painting of an apple is just a round, red thing with blotchy textures, so it groups all round, red, blotchy things together. The Super Model, which knows about apples, groups apples together.
The Fashion Model is great for abstract art; the Super Model, less so. The Super Model is great for sculptures; the Fashion Model, less so. Depending on your taste, you may prefer one over the other.
Side note: interestingly, the Fashion Model found an apple painting that appeared to be the exact same one as the input image. On closer inspection, they’re actually two different paintings by two different artists: the input image is by Jane Palmer, and the search result is a 2016 piece by George Cassallo. OOOOOH.
One thing we find fascinating is that, although we never trained these models to know anything about artists or painting techniques, pieces by the same artist naturally get grouped together based on similarities in brush strokes, color choice, etc.
Degrees of Separation
Of course, what would a deep learning blog post be without a giant t-SNE image of our embeddings? Obligatory money shot follows.
We used to have a demo at Spotify called “Boil the Frog,” which took two random songs and used machine learning techniques to find a chain of songs that gradually morphed one into the other. Here are some examples of the same concept applied to works of art.
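One simple way to build a chain like this for artworks (an illustrative approach, not necessarily the one behind the Spotify demo or the examples here) is to link each piece to its k nearest neighbors in embedding space and run Dijkstra's shortest-path algorithm between the two endpoints. In this hypothetical sketch, edge costs are squared Euclidean distances, which favor several small hops over one big jump: two half-length steps cost half as much as one full-length step. The function names and the toy one-dimensional embeddings are assumptions for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph(embeddings: Dict[str, Sequence[float]], k: int) -> Graph:
    """Link each item to its k nearest neighbors, weighted by squared distance."""
    graph: Graph = {}
    for src, vec in embeddings.items():
        neighbors = [
            (dst, squared_distance(vec, other))
            for dst, other in embeddings.items()
            if dst != src
        ]
        neighbors.sort(key=lambda pair: pair[1])
        graph[src] = neighbors[:k]
    return graph


def morph_chain(
    embeddings: Dict[str, Sequence[float]], start: str, goal: str, k: int = 3
) -> List[str]:
    """Return the cheapest chain of items from start to goal, or [] if none exists."""
    graph = knn_graph(embeddings, k)
    best: Dict[str, float] = {start: 0.0}
    previous: Dict[str, str] = {}
    frontier: List[Tuple[float, str]] = [(0.0, start)]
    visited = set()
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph[node]:
            new_cost = cost + weight
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    if goal not in visited:
        return []
    # Walk back from the goal to reconstruct the chain.
    chain = [goal]
    while chain[-1] != start:
        chain.append(previous[chain[-1]])
    return chain[::-1]


# Toy usage: made-up 1-D "embeddings" laid out along a line.
artworks = {"sunrise": [0.0], "dawn": [1.0], "dusk": [2.0], "night": [3.0]}
print(morph_chain(artworks, "sunrise", "night", k=3))
```

Even though "sunrise" links directly to "night" here, the squared-distance cost (9) makes the three one-step hops (total 3) the cheaper route, which is what gives the chain its gradual feel.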
Last month we launched our API, which gives any developer access to our visual search engine. In fact, developers using the API have access to the same fashion model we used to produce the results in this blog post. Admittedly, though, getting our models to work well for all types of artwork would require some additional fine-tuning. As we saw, there’s a trade-off between a model that emphasizes composition and a model that emphasizes subject matter. Finding an optimal model would mean striking the right balance between these two strengths.
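One straightforward way to explore that balance, offered as a hypothetical sketch rather than anything Thread Genius shipped, is to compute a cosine similarity under each model and blend the two with a weight `alpha`. The function names and the `alpha` parameter are assumptions. Weighting each model's unit-normalized embedding by the square root of its weight and concatenating the results gives a single unit vector whose dot products equal the blended score, so one search index could serve the blend.

```python
import math
from typing import List, Sequence


def unit(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def check_alpha(alpha: float) -> None:
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")


def blended_similarity(fashion_a, fashion_b, super_a, super_b, alpha: float) -> float:
    """alpha weights the Fashion Model (composition); 1 - alpha weights the Super Model."""
    check_alpha(alpha)
    return alpha * cosine(fashion_a, fashion_b) + (1.0 - alpha) * cosine(super_a, super_b)


def blended_embedding(fashion_vec, super_vec, alpha: float) -> List[float]:
    """Concatenate sqrt-weighted unit vectors; dot products reproduce blended_similarity."""
    check_alpha(alpha)
    w_f = math.sqrt(alpha)
    w_s = math.sqrt(1.0 - alpha)
    return [w_f * x for x in unit(fashion_vec)] + [w_s * x for x in unit(super_vec)]


# Toy usage with made-up embeddings for two artworks under each model.
fa, fb = [1.0, 0.0], [0.6, 0.8]            # hypothetical Fashion Model embeddings
sa, sb = [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]  # hypothetical Super Model embeddings
print(blended_similarity(fa, fb, sa, sb, alpha=0.25))
ea, eb = blended_embedding(fa, sa, 0.25), blended_embedding(fb, sb, 0.25)
print(sum(x * y for x, y in zip(ea, eb)))
```

With `alpha` near 1, results lean toward composition (colors, textures), like the Fashion Model; near 0, they lean toward subject matter, like the Super Model.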