Visual search image created by author, Clark Boyd.

The Past, Present, and Future of Visual Search

Visual search is one of the most complex and fiercely contested sectors of the technology industry. Google, Pinterest, and Microsoft are all battling to get in on the act.

Ours is a culture mediated by images, so it stands to reason that visual search has assumed such importance for the world’s largest technology companies. The pace of progress is certainly quickening, but there is no clear visual search ‘winner’, nor will there be one soon.

The search industry has developed significantly over the past decade, through advances in personalization, natural language processing, and multimedia results. And yet, one could argue that the power of the image remains untapped.

This is not due to a lack of attention or investment. Quite the contrary, in fact. Cracking visual search will require a combination of technological nous, psychological insight, and neuroscientific know-how.

This makes visual search a fascinating area of development, but also one that will not be mastered easily.

Therefore, in this article, we will begin with an outline of the visual search industry and the challenges it poses, before analyzing the recent progress made by Google, Microsoft and Pinterest.

What is visual search?

We all partake in visual search every day. Every time we need to locate our keys among a range of other items, for example, our brains are engaged in a visual search.

We learn to recognize certain targets and we can locate them within a busy landscape with increasing ease over time.

This is a trickier task for a computer, however.

Image search, in which a search engine takes a text-based query and tries to find the best visual match, is subtly distinct from modern visual search. Visual search can take an image as its ‘query’, rather than text. In order to perform an accurate visual search, search engines require much more sophisticated processes than they do for traditional image search.
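To make the distinction concrete, here is a minimal sketch of an image-as-query lookup. It assumes each image has already been reduced to a short list of numbers (a ‘feature vector’); in a real system a neural network would produce these, whereas the file names and values below are invented purely for illustration. The search itself is then a matter of finding the stored vectors that sit closest to the query.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def visual_search(query_vector, index, top_k=2):
    """Rank indexed images by similarity to the query and keep the best top_k."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical, hand-written feature vectors (assumption: a real engine
# would compute these from the pixels with a trained network).
INDEX = {
    "armchair.jpg": [0.9, 0.1, 0.3],
    "sofa.jpg": [0.8, 0.2, 0.4],
    "teapot.jpg": [0.1, 0.9, 0.2],
}

query = [0.85, 0.15, 0.35]  # features of the photo the user just took
results = visual_search(query, INDEX)
print(results)
```

Note that no text is involved at any point: the ‘query’ is the photo’s own features, which is what separates this from traditional image search.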

Typically, as part of this process, deep neural networks are put through their paces in tests like the one below, with the hope that they will mimic the functioning of the human brain in identifying targets:

Visual search: How it works

The decisions (or inherent ‘biases’, as they are known) that allow us to make sense of these patterns are more difficult to integrate into a machine.

When processing an image, should a machine prioritize shape, color, or size? How does a person do this? Do we even know for sure, or do we only know the output?

As such, search engines still struggle to process images in the way we expect them to. We simply don’t understand our own biases well enough to be able to reproduce them in another system.

There has been a lot of progress in this field, nonetheless. Google image search has improved drastically in response to text queries, and other options, like TinEye, also allow us to use reverse image search. This is a useful feature, but its limits are self-evident.

For years, Facebook has been able to identify individuals in photos, in the same way a person would immediately recognize a friend’s face. This example is a closer approximation of the holy grail for visual search; however, it still falls short. In this instance, Facebook has set up its networks to search for faces, giving them a clear target.

At its zenith, online visual search allows us to use an image as an input and receive another, related image as an output. This would mean that we could take a picture with a smartphone of a chair, for example, and have the technology return pictures of suitable rugs to accompany the style of the chair.

The typically ‘human’ process in the middle, where we would decipher the component parts of an image and decide what it is about, then conceptualize and categorize related items, is undertaken by deep neural networks. These networks are often described as ‘unsupervised’, meaning that no human intervenes as they adjust their own functioning based on feedback signals and work to deliver the desired output.

The result can be mesmerising, as in the interpretations below of an image of Georges Seurat’s ‘A Sunday Afternoon on the Island of La Grande Jatte’ by Google’s neural networks:

Google visual search experiment

This is just one approach to answering a delicate question, however.

There are no right or wrong answers in this field as it stands; simply more or less effective ones in a given context.

We should therefore assess the progress of a few technology giants to observe the significant strides they have made thus far, but also the obstacles left to overcome before visual search is truly mastered.

Google Lens Visual Search

Google recently announced a slew of updates to its Lens product at the 2018 I/O conference. The aim of Lens is really to turn your smartphone into a visual search engine.

Google Lens

Take a picture of anything out there and Google will tell you what the object is, along with any related entities. Point your smartphone at a restaurant, for example, and Google will tell you its name, whether your friends have visited it before, and highlight reviews for the restaurant too.

How Google can integrate Lens with Maps

This is supplemented by Google’s enviable inventory of data, both from its own knowledge graph and the consumer data it holds.

All of this data can fuel and refine Google’s deep neural networks, which are central to the effective functioning of its Lens product.

The core of Google’s strategy is really to get people on board with visual search first and foremost, and then to introduce more overt forms of ecommerce.

Beyond that, visual search can allow us to take better pictures. Google has demonstrated forthcoming versions of Lens that will automatically detect and remove obstructions from images, and input Wi-Fi codes just by showing the camera the password.

What we’re really looking for are those intangibles that only an image can get close to capturing. So anything related to style or design, such as the visual arts or even tattoos (the most searched for ‘item’ on Pinterest visual search), will be a natural fit.

Search has been a fantastic medium when we want to locate a product or service. Its text-based input format limits its reach, however. If search is to continue expanding, it must become a more comprehensive resource, actively searching on our behalf before we provide explicit instruction.

Visual search strategy for ecommerce

Google-owned company DeepMind is at the forefront of visual search innovation. As such, DeepMind is also particularly familiar with just how challenging this technology is to master.

The challenge is no longer necessarily in just creating neural networks that can understand an image as effectively as a human. The bigger challenge (known as the ‘black box problem’ in this field) is that the processes involved in arriving at conclusions are so complex, obscured, and multi-faceted that even Google’s engineers struggle to keep track.

This points to a rather poignant paradox at the heart of visual search and, more broadly, the use of deep neural networks. The aim is to mimic the functioning of the human brain; however, we still don’t really understand how the human brain works.

As a result, DeepMind has started to explore new methods. In a fascinating blog post, it summarized the findings from a recent paper in which it applied the inductive reasoning evident in human perception of images.

Drawing on the rich history of cognitive psychology (rich, at least, in comparison with the nascent field of neural networks), scientists were able to apply within their technology the same biases we apply as people when we classify items.

DeepMind uses the following prompt to illuminate its thinking:

“A field linguist has gone to visit a culture whose language is entirely different from our own. The linguist is trying to learn some words from a helpful native speaker, when a rabbit scurries by. The native speaker declares “gavagai”, and the linguist is left to infer the meaning of this new word. The linguist is faced with an abundance of possible inferences, including that “gavagai” refers to rabbits, animals, white things, that specific rabbit, or “undetached parts of rabbits”. There is an infinity of possible inferences to be made. How are people able to choose the correct one?”

Experiments in cognitive psychology have shown that we have a ‘shape bias’; that is to say, we give prominence to the fact that this is a rabbit, rather than focusing on its color or its broader classification as an animal. We are aware of all of these factors, but we choose shape as the most important criterion.

“Gavagai” Credit: Misha Shiyanov/Shutterstock
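A toy sketch can show why the choice of bias matters. The attribute names, weights, and objects below are all hypothetical and are not taken from DeepMind’s paper; the point is only that the same two candidates are ranked differently depending on whether shape or color carries the most weight.

```python
# Hypothetical weightings: how much each attribute counts towards a match.
SHAPE_BIASED = {"shape": 0.7, "color": 0.2, "size": 0.1}
COLOR_BIASED = {"shape": 0.2, "color": 0.7, "size": 0.1}

def similarity(a, b, weights):
    """Add up the weight of every attribute on which the two objects agree."""
    return sum(w for attr, w in weights.items() if a[attr] == b[attr])

def best_match(target, candidates, weights):
    """Return the name of the candidate most similar to the target."""
    return max(candidates, key=lambda name: similarity(target, candidates[name], weights))

# The thing the native speaker pointed at when saying "gavagai".
target = {"shape": "rabbit", "color": "white", "size": "small"}

# Two invented objects we might later be asked to label.
candidates = {
    "brown rabbit": {"shape": "rabbit", "color": "brown", "size": "small"},
    "white mug": {"shape": "mug", "color": "white", "size": "small"},
}

print(best_match(target, candidates, SHAPE_BIASED))  # brown rabbit
print(best_match(target, candidates, COLOR_BIASED))  # white mug
```

With a shape bias, the brown rabbit counts as another ‘gavagai’; with a color bias, the white mug does. People reliably make the first choice, which is the behavior the researchers wanted to see in their networks.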

DeepMind is one of the most essential components of Google’s development into an ‘AI-first’ company, so we can expect findings like the above to be incorporated into visual search in the near future. When they are, we shouldn’t rule out the launch of Google Glass 2.0 or something similar.

Pinterest Lens Visual Search

Pinterest aims to establish itself as the go-to visual search engine when you don’t have the words to describe what you are looking for.

There are now over 600 million visual searches on Pinterest every month, so it seems people are really starting to engage with the technology.

The launch of its Lens product last year was a real statement of intent and Pinterest has made a number of senior hires from Google’s image search teams to fuel development.

In combination with its establishment of a paid search product and features like ‘Shop the Look’, there is a growing consensus that Pinterest could become a real marketing contender. Along with Amazon, it should benefit from advertisers’ thirst for more options beyond Google and Facebook.

Where visual search comes into its own, and truly goes beyond the realm of the purely novel, is when it suggests new ideas that people have not yet thought of.

Pinterest’s Lens the Look tool is a great example. A consumer could search for shoes and find the pair they wanted, but Pinterest can also suggest an outfit that would go with the shoes too. This then becomes more of an ongoing conversation.

Pinterest president Tim Kendall noted at TechCrunch Disrupt:

“We’re starting to be able to segue into differentiation and build things that other people can’t. Or they could build it, but because of the nature of the products, this would make less sense.”

This drives at the heart of the matter. Pinterest users come to the site for something different, which allows Pinterest to build different products for them. While Google fights battles on numerous fronts, Pinterest can focus on improving its visual search offering.

Admittedly, it remains a work in progress, but Pinterest Lens is the most advanced visual search tool available at the moment. Using a smartphone, a Pinner (as the site’s users are known) can take a picture within the app and have it processed with a high degree of accuracy by Pinterest’s technology.

Pinterest Lens visual search: How it works

The results are quite effective for items of clothing and homeware, although there is still a long way to go before we use Pinterest as our personal stylist. As a tantalising glimpse of the future, however, Pinterest Lens is a welcome and impressive development.

Pinterest visual search for fashion

The next step is to monetize this, which is exactly what Pinterest plans to do. Visual search will become part of its paid advertising package, a fact that will no doubt appeal to retailers keen to move beyond keyword targeting and social media prospecting.

Bing: “intelligent” visual search

Earlier this year, Microsoft announced that it would now allow users to “search by picture.”

This is notable for a number of reasons. First of all, although Bing image search has been present for quite some time, Microsoft actually removed its original visual search product in 2012. People simply hadn’t been using it since its 2009 launch, as it wasn’t accurate enough.

Furthermore, it would be fair to say that Microsoft is running a little behind in this race. Rival search engines and social media platforms have provided visual search functions for some time now.

As a result, it seems reasonable to surmise that Microsoft must have something compelling if it has chosen to re-enter the fray with such a public announcement. While it is not quite revolutionary, the new Bing visual search is still a useful tool that builds significantly on its image search product.

Bing visual search

A Bing search for “kitchen decor ideas” which showcases Bing’s new visual search capabilities

What sets Bing visual search apart is the ability to search within images and then expand this out to related objects that might complement the user’s selection.

Bing visual search isolates objects

A user can select specific objects, home in on them, and purchase similar items if they desire. The opportunities for retailers are both obvious and plentiful.

It’s worth mentioning that Pinterest’s visual search has been able to do this for some time. But the important difference between Pinterest’s capability and Bing’s in this regard is that Pinterest can only redirect users to Pins that businesses have made available on Pinterest — and not all of them might be shoppable. Bing, on the other hand, can index a retailer’s website and use visual search to direct the user to it, with no extra effort required on the part of either party.

Visual search in action

Powered by Silverlight technology, this should lead to a much more refined approach to searching through images. Microsoft provided the following visualisation of how its query processing system works for this product:

Microsoft’s visual search process

Microsoft combines this system with the structured data it owns to provide a much richer, more informative search experience. Although restricted to a few search categories, such as homeware, travel, and sports, we should expect to see this rolled out to more areas through this year.

The latest updates from Microsoft, announced in the video below, show that it has grand ambitions for this department of the business.

Bing visual search

Using a picture as a search query, users will now be able to shop and learn more just by pointing their smartphone in the right direction. The level of accuracy still lags behind both Pinterest and Lens, but Microsoft is seemingly in this battle for the long run.

What’s next?

First of all, the technology will keep improving in accuracy.

Acquisitions will likely be a part of this process. Pinterest’s early success can be put down to personnel and business strategy, but it also bought Kosei in 2015 to help understand and categorize images.

We should expect Google to put a lot of resource into integrating visual search with its other products, like Google Maps and Shopping. The I/O developers conference provided some tantalizing glimpses of where this will lead us.

Lens is already built into the Pixel smartphone camera, which makes it much easier to access, but it still isn’t integrated with other products in a truly intuitive way. People are impressed when their smartphone can recognize objects, but that capability doesn’t really add long-term value.

So, we will see a more accurate interpretation of images and, therefore, more varied and useful results.

A gap still remains between the search engine and the content it serves, however.

For this to function, brands need to play their part too. There are plentiful best practices for optimizing for Pinterest search, and all visual search engines make use of contextual signals and metadata to understand what they are looking at.

One way this could happen is when brands team up with influencers to showcase their products. As long as their full range is tied thematically to the products on show, these can be served to consumers as options for further ideas.

Furthermore, there remains a gap between visual search technology and consumer behaviors. Consumers simply aren’t accustomed to using their smartphone camera as a search input and, although that is changing, progress will take time. We shouldn’t rule out the possibility of visual search a lot closer to our line of sight, perhaps even through a revived Google Glass product.

In summary, the technology has a bit of development still to come, but we need to meet the machine learning algorithms halfway by giving them the right data to work with; Pinterest has used over one billion images in its training set, for example. Meeting them halfway means taking ownership of all online real estate and identifying opportunities for our content to surface through related results.

We may still be years from declaring a winner in the battle for visual search supremacy, but it is clear to see that the victor will claim significant spoils.

Some tips to optimize for visual search

Any time we are dealing with search, there will be a lot of theory and practice that can help anyone get better results. We just don’t have the shortcuts we used to.

To optimize for visual search, you should:

  • Read blogs like Pinterest Engineering. It can seem as though these things work magically, but there is a clear methodology behind visual search.
  • Organize your presence across Instagram, Google, and Pinterest. Visual search engines use these as hints to understand what each image contains.
  • Follow the traditional image search best practices.
  • Analyze your own results. Look at how your images perform and try new colors and new themes. Results will be ever more personalized, so there isn’t a blanket right or wrong.
  • Consider how your shoppable images might surface. You either want to be the item people search for or the logical next step from there. Look at your influencer engagements and those of your competitors to see what tends to show up.
  • Engage directly with creative teams. Search remains a data-intensive industry and always will be, but this strength is now merging with the more creative aspects. Search marketers need to be working with social media and brand teams to make the most of visual search.
  • Make it easy to isolate and identify items within your pictures. Visual search engines have a really tough job on their hands; don’t make it harder for them.
  • Use a consistent theme and, if you use stock imagery, adapt it a bit. Otherwise, the image will be recognized based on the millions of other times it has appeared.
  • Think about how to optimize your brick-and-mortar presence. If people use products as the stimulus for a search, what information will they want to know? Price, product information, similar items, and so on. Then ensure that you are optimized for these. Use structured data to make it easy for a search engine to surface this information. In fact, if there’s one thing to focus on for visual search right now, it is structured data.
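As a starting point for that last tip, here is a minimal sketch of product structured data using the schema.org Product and Offer types, generated as JSON-LD. The product, price, and example.com URL are placeholders invented for illustration, and the helper function name is my own; check each search engine’s documentation for the properties it actually requires.

```python
import json

def product_json_ld(name, image_url, description, price, currency):
    """Build a schema.org Product description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "image": [image_url],
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }

# Placeholder values (assumption: swap in your own catalogue data).
markup = product_json_ld(
    name="Mid-century armchair",
    image_url="https://example.com/images/armchair.jpg",
    description="Walnut-framed armchair with grey wool upholstery.",
    price=249,
    currency="USD",
)

# Paste the output into a <script type="application/ld+json"> tag on the product page.
print(json.dumps(markup, indent=2))
```

The price, availability, and image sit alongside the product name in one machine-readable block, which covers the information a shopper is likely to want after pointing a camera at the item.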

Want to keep up to date with the latest visual search news? Check out this resource! It’s a list of visual search stats, trends, news, and tips, updated daily.

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by 338,320+ people.
