Redefining Visual Search in Adobe Stock by Creating Innovative Image Similarity Technology
Visual search is a powerful tool that lets customers find similar images when the image they have is close, but not exactly what they were looking for. Partnering with both Adobe Research and the Adobe Stock team, the Adobe Sensei and Search team has now developed a pioneering image similarity technology to enhance visual search capabilities within Adobe Stock.
In this article, I will explain the key concepts of image similarity, what you can do with it, why Adobe developed it, and how the underlying technology works.
Finding exactly the right image can be really hard. Text queries don’t always work well because it’s often difficult to describe what you’re looking for. Furthermore, sometimes you don’t know what you really want until you see it, or something really close to it. These challenges are what led us to approach this problem through an image similarity engine, which enables a customer to use an image itself as a search query to serve up a new set of results composed of other images that have similar properties.
In the first implementation of similarity search, a tag prediction model was created from images and their metadata. Every image is made up of pixels, which are numerically represented by RGB (red, green, blue) intensity values. This model takes in an image and learns to translate the RGB values into an understanding of the image defined by a list of representative tags or tokens. Once the model is trained and tuned, embeddings (dense vectors) from just below the output layer are used to represent this information about the image. Using these numerically based abstract image representations enables us to compute which images live near each other, thereby creating a measure of “similarity.”
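The nearest-neighbor idea behind this can be sketched in a few lines. The snippet below is an illustrative toy, not Adobe's production system: it assumes the dense embedding vectors have already been extracted from the model, fills the index with random stand-ins, and ranks images by Euclidean distance between their embeddings.

```python
import numpy as np

# Hypothetical stand-in for the real index: one dense embedding per image,
# taken from the layer just below the tag-prediction output.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256)).astype(np.float32)

def find_similar(query_vec, index, k=5):
    """Return indices of the k vectors in `index` closest to `query_vec` (L2)."""
    dists = np.linalg.norm(index - query_vec, axis=1)
    return np.argsort(dists)[:k]

# Using image 42 as the query: its own embedding comes back first (distance 0).
neighbors = find_similar(embeddings[42], embeddings, k=5)
```

A production system would replace the brute-force scan with an approximate nearest-neighbor index, but the principle is the same: images whose vectors lie close together are reported as similar.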
The goal for this first pass was to make image search more visual so that users could either just upload an image or select “find similar” from within a generated result set. It’s become a popular way to search on Adobe Stock. However, when we started to dig into the feedback from users, we heard that while basic image similarity is great, it was really only meeting the needs of those who wanted images with similar objects in them. The reason why is traceable back to the tag prediction model that was used to originate the numerical image summaries. This kind of model results in image understanding that’s more “tag” related in that it’s tightly mapped to the objects in an image. However, customers told us that they wanted more control over what aspects of image similarity they could search for, which would require a more conceptual understanding of images.
A refined image similarity search in Adobe Stock
In an effort to incorporate this feedback and improve the search function, the cross-company team investigated how people perceive similarity. With the help of Judy Massuda from Adobe Stock and Emily Sermons from Design, we conducted several rounds of research to learn how our customers comprehend and describe the elements that make up a photograph. Based on this research, we decided to focus on three main dimensions that users told us they consider when they think about how similar an image is to another image: content, color, and composition.
The result of all that work is that customers can now specify the element of similarity that they want to attain within their search. “Find Similar Controls,” a feature powered by Adobe Sensei, allows customers to leverage visual search on each of the above three dimensions by just changing the search criterion. They are able to take attributes of an original reference image and apply them to a different search with one click. For example, if they select “composition,” they can swap out seasonal images in their layouts to make them fit with the current season.
Just as in the original similarity search model, this technology works by using embeddings; however, these embeddings are specifically created to represent the content, color, and composition dimensions. This enables the customer to pick a particular dimension and find similar images along it. The embeddings are stored in an index, and when a user uploads an image, or clicks on “Find Similar by” in the UI, Adobe Stock runs a k-nearest-neighbor search to find the images that are closest in a high-dimensional space.
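Keeping one index per dimension makes the per-dimension lookup straightforward. The sketch below is hypothetical: the index shapes, the `find_similar_by` helper, and the brute-force scan are stand-ins for the real service, which would use an approximate nearest-neighbor index at Adobe Stock's scale.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical per-dimension indexes: one embedding matrix per similarity axis.
# Embedding sizes here are illustrative, not the production values.
indexes = {
    "content": rng.normal(size=(500, 128)).astype(np.float32),
    "color": rng.normal(size=(500, 48)).astype(np.float32),
    "composition": rng.normal(size=(500, 64)).astype(np.float32),
}

def find_similar_by(dimension, query_vec, k=10):
    """Brute-force k-nearest-neighbor search in the chosen dimension's index."""
    index = indexes[dimension]
    dists = np.linalg.norm(index - query_vec, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# "Find Similar by color" for image 7: the image itself ranks first.
ids, dists = find_similar_by("color", indexes["color"][7], k=3)
```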
Challenges along the way and how we solved them
While building the services that underlie the Find Similar Controls (FSC) feature, we came across a few interesting problems that required creative approaches to solve, particularly for composition and color search.
The key challenge here was finding and implementing, for each chosen dimension, an embedding that matched customer expectations. For composition, we used the embedding from a model that understands the regions where objects and concepts occur within an image.
“One of the issues that we confronted was understanding what composition is and what we can do about it,” explains Baldo Faieta, senior computer scientist on the Applied Science and Machine Learning team. “Designers tend to agree on when an image has a good composition but they are at pains to describe what makes a good composition in a way that can be operationalized. So, the research team decided to try a straw-man definition: Two images have similar composition if the salient blobs somewhat match spatially in the canvas. This doesn’t address aspects like layers, horizon, or orientation, but it’s a good start. In order to make it operational, we needed an embedding that is an abstract representation of the composition, and usually that’s derived by modeling a proxy task like classification and using one of the next-to-final layers’ outputs as the embedding vector. However, in this case we didn’t have labeled data, so we had to explore how we could derive this kind of representation from an existing CNN model.”
The original model had been trained to predict the tags from an image taken from Adobe Stock. Internally, this model had learned filters at various layers that activate based on patterns of the image. These correspond to wider and wider patches of the image and produce smaller and smaller activation images.
These activation images capture localization of objects and can be converted into an embedding that details the localization. We index these embeddings so that when we have a new image and its own corresponding localized embedding, we can find similar localized embeddings that effectively capture the notion of composition.
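One simple way to realize this idea, sketched under the straw-man definition above, is to average the activation volume across channels so that only the spatial layout of activity survives. The function below is an illustration under that assumption, not the team's actual embedding.

```python
import numpy as np

def composition_embedding(activations):
    """Collapse a CNN activation volume (channels, H, W) into a spatial
    embedding: averaging across channels keeps *where* activity occurs
    rather than *what* triggered it."""
    saliency = activations.mean(axis=0)            # (H, W) saliency map
    vec = saliency.flatten().astype(np.float32)    # preserve spatial layout
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Toy activation volumes: a and b both have a salient blob in the top-left
# corner, while c's blob sits bottom-right.
a = np.zeros((8, 7, 7)); a[:, :3, :3] = 1.0
b = np.zeros((8, 7, 7)); b[:, :4, :3] = 0.9
c = np.zeros((8, 7, 7)); c[:, 4:, 4:] = 1.0
```

Under this scheme, the L2 distance between the embeddings of `a` and `b` is smaller than between `a` and `c`, matching the intuition that matching blob positions mean similar composition.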
For color, an engineer on my team, Saeid Motiian, came up with a novel approach based on the Hellinger distance.
“To find an embedding for color similarity, we had to take into account some limitations,” Saeid explains. “Extracting an embedding of an image needed to be fast (around a few milliseconds), and the distance between two embeddings needed to be based on the L2 distance (Euclidean Distance) because that’s what our large-scale search is based on.”
The method the team settled on, which uses a histogram approach, is simple yet very powerful. It does not require any learning, which saves us a lot of time and money for data collection and training a deep model. Mathematically, the method corresponds to computing a Hellinger kernel-based distance between two histograms.
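The Hellinger trick is easy to sketch: the Hellinger distance between two histograms equals, up to a constant factor, the L2 distance between their element-wise square roots. So storing the square root of each image's color histogram as its embedding lets a plain Euclidean search rank images by Hellinger distance. The snippet below is a simplified illustration; the bin count and normalization choices are assumptions, not the production values.

```python
import numpy as np

def color_embedding(image_rgb, bins=8):
    """Color embedding compatible with L2 search: the element-wise square
    root of a normalized 3-D RGB histogram. The L2 distance between two
    such embeddings is proportional to the Hellinger distance between the
    underlying color histograms."""
    pixels = image_rgb.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist = hist.flatten()
    hist /= hist.sum()          # normalize to a probability distribution
    return np.sqrt(hist).astype(np.float32)
```

No learning is involved, which is what makes the approach fast: computing the embedding is a single histogram pass over the pixels.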
We came up with a couple different approaches, but it wasn’t obvious which candidate was better. So, we did some internal A/B testing with the Adobe Design team, which included a custom two-paned UI with side-by-side results and an accompanying survey.
What’s next for the image similarity engine
Image similarity gives Adobe Stock customers more control. Find Similar Controls (FSC) gives them the power to have a meaningful visual search experience and find the exact asset they are looking for in their creative digital content projects.
Next generation work on FSC is already underway. We are thinking about enhancements to our existing ways of searching as well as completely new visual search capabilities. There are plenty of exciting developments coming up, so watch this space!
Alex Filipkowski is a product manager for the Applied Science and Machine Learning team at Adobe.