Search Is Not Solved Yet
Most of the progress in search over the last two decades can be traced back to Web search. A large part of that comes from the industry's ability to rapidly translate research outputs from labs into product development and enhancement. The focus on making more and more things efficiently searchable on the Web meant that the utility of Web search engines was constantly improving. It is no surprise, then, that the general populace squarely associates search with Web search, so much so that some of us have even bought into the idea that search is solved.
In this post, I discuss why we are still a while away from ‘solving’ search despite the progress that we have seen in Web search. You might disagree with what I have to say if your experiences with the likes of Google are all that define your view of search. But before you do, note that I am not only talking about the primarily one-shot type of searches that Web search engines specialise in. Perhaps I should rephrase: there is more to search than Web search.
Search is 90% solved? Yes and no
I picked this up from a 2008 article on TechCrunch. The article states that “if search was 90% solved, Google could look at a picture of me standing by the Eiffel Tower and know, without textual metadata, what’s there”. Clearly, object recognition in images for search was poor a decade ago. And guess what, Google today is doing exactly that. So is 90% of search solved? Yes and no.
The search by image example above is a testament to the rate at which Web search engines improve in specific verticals or query types. The sometimes controversial health-related queries are another example, where related conditions and self-treatment options are provided as answers at the top of the results. More importantly, Web search engines continue to push ahead with their core search in areas such as evaluation frameworks, query understanding and expansion, search over non-English content, ranking with signals beyond topical relevance (such as recency and source quality), snippet generation, and the handling of offensive and misleading content.
For the reasons above, I would not be so quick to dismiss the statement that search is 90% solved, but only in the context of Web search.
The remaining 10% in Web search
Web search has reached a point of diminishing returns in terms of improving search efficiency and experience for users. The opportunities to push further ahead in this space, in my view, are defined by (1) the discovery and use of more and more features from vast amounts of data to optimise ranking for different user segments, and (2) the coming of age of contextual and conversational search and of voice as a paradigm for accessing search services. My previous post on the use of machine learning techniques and big data in search covers the first point. It is worth pointing out that while contextual, conversational and voice search are somewhat inter-related, they are three entirely different concepts. They are often used interchangeably, which confuses the uninitiated.
On the point of contextual search, Web search engines do slightly adjust the ranking of our results based on the additional things that they know about us, such as our current location and our search history. There is another type of context known as conversational context, which is crucial to enabling conversational search. The idea of conversational context is to reduce the onus on Web search users to be overly explicit about their intent between successive searches. This matters because addressing a single information need may require multiple searches. It makes the otherwise quite mechanical activity of searching more natural.
For instance, assume that I want to find out more about the Eiffel Tower as I am planning a trip to Europe. That is a perfectly valid information need. In order to address that need, I may choose to perform several searches to figure out “Where is Eiffel Tower located?” followed by “When was it built?” and finally “Who built it?”. However, if you use those questions verbatim as search queries, you will not get the answers you need in a natural, successive fashion. The two screenshots below illustrate the case, despite Bing's announcement three years ago that they were on track to introduce this capability.
The example above highlights a rather obvious type of conversational context, where pronouns in successive searches need to be resolved to concepts or entities raised earlier in the session. While this capability may still be absent from regular Web search accessed via browsers, it is definitely required and is present (with questionable effectiveness, in my experience) in the so-called voice-activated personal assistants on smartphones.
The rise of voice search, on the other hand, is reminiscent of the earlier growth in the share of searches on mobile. Both pose interesting challenges for the advertising business, which provides search engines such as Google with the lion’s share of their revenue. Initially, the question was how ads were to be targeted and presented for the increasing share of mobile traffic. The challenge is now even more prominent with voice search. In essence, how do we display ads in a world void of pixels, text and images? Any push towards voice search will inevitably bring with it the question of how to continue to monetize search in this new paradigm. For this reason, innovation in voice search will only progress as fast as these companies are able to figure out its impact on their business models.
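To make the pronoun case concrete, here is a minimal sketch of session-level context, written for illustration only. It assumes a hypothetical list of known entities and a naive rule (replace a pronoun with the entity most recently named in the session); the class name, the pronoun list and the rule itself are my assumptions, and a real system would rely on proper entity recognition and coreference resolution.

```python
import re

# Illustrative pronoun list only; real coreference resolution is far more involved.
PRONOUNS = re.compile(r"\b(it|they|them)\b", re.IGNORECASE)


class SearchSession:
    """Keeps the last explicitly named entity and uses it to rewrite later queries."""

    def __init__(self, known_entities):
        # Hypothetical entity list; a real system would use an entity recogniser.
        self.known_entities = known_entities
        self.last_entity = None

    def rewrite(self, query):
        # If the query names a known entity, remember it and leave the query alone.
        for entity in self.known_entities:
            if entity.lower() in query.lower():
                self.last_entity = entity
                return query
        # With no earlier entity in the session, there is nothing to resolve.
        if self.last_entity is None:
            return query
        # Otherwise substitute pronouns with the remembered entity.
        return PRONOUNS.sub(self.last_entity, query)


session = SearchSession(["Eiffel Tower"])
for q in ["Where is Eiffel Tower located?", "When was it built?", "Who built it?"]:
    print(session.rewrite(q))
```

Even this toy version shows why the problem is hard: the rule breaks as soon as a session mentions two entities, or when "it" does not refer to anything said before.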
Search beyond Web search
The outstanding issues discussed above aside, there is no doubt that Web search has come a long way. How good is it when your query “Date of birth of Sergey Brin” is met with an exact date as the answer? You will even get a succinct response for a query like “How to eradicate ants?”. Even when Web search engines today are unable to provide you with a short paragraph as an answer, the results for most query types are pretty decent.
Search, however, does not stop there. There is a whole range of search engines on the Web that deal with specific segments or types of content. We loosely refer to them under the vertical search umbrella. They often exist in the form of portals or marketplaces to facilitate the trade of products and services (e.g., eBay, AirBnB, Envato, RedBubble) or sharing of information (e.g., TripAdvisor, Yelp, AllRecipes).
With this broader view of search, I would only consider search as solved when I can get a decent response to the query “I want to go somewhere warm for winter this year. I am on a budget this time, so where would you recommend?”. Similarly, “My in-laws are coming over this weekend and they love seafood and Indian spices. I’m thinking of what to cook.”. At first glance, handling these queries may appear straightforward. But the reality is that, despite their length, these queries are still under-specified and complex. You will need to make a lot of assumptions to begin with and really dive into what the different parts of the queries mean and how they hang together.
Pushing the frontier of vertical search
Search is something a person embarks on when they have information needs. At the heart of it, the role of search systems, Web or vertical, is to understand those needs and give users the information that best serves them. In vertical search, the breadth of queries is often far more constrained than on the Web. In other words, there are only so many types of things we can expect a person to do when they are at eBay or Yelp. The real challenge for vertical search, however, lies in the complexity of the tasks. This often translates to user needs that are harder to express and equally hard for the system to decipher.
The opportunities for vertical search are in a few areas. Firstly, the more constrained domains that vertical search operates within mean more opportunity for semantics. The availability of domain-specific data assets makes it possible to structure queries, documents, etc. for things like intent recognition, and to better understand the semantics of content for semantic search. These data assets, which can range from basic synonym files to taxonomies and ontologies, can be constructed (semi-)automatically using a range of techniques. For certain vertical search services where recall is important, these assets are invaluable for returning content that would otherwise have been missed.
Secondly, the complex intent coupled with potentially under-specified queries in vertical search opens up opportunities for a conversational search experience. This is when suggestions for pivoting, clarifications of ambiguous intent, suggestions for digging into a topic, etc. come into play. Together with conversational context, these mechanisms allow users to work through complex needs with the system across multiple searches to arrive at a solution. This is discussed in detail in another post.
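As a rough illustration of how even a basic synonym file can improve recall, the sketch below expands each query term with its synonyms before matching. The synonym table, the sample listings and the whitespace tokenisation are all made up for the example; they are assumptions, not a description of how any particular marketplace does it.

```python
# Hypothetical synonym file for a furniture marketplace.
SYNONYMS = {
    "couch": {"sofa", "settee"},
    "sofa": {"couch", "settee"},
}


def expand_query(query, synonyms):
    """Turn each query term into a group containing the term and its synonyms."""
    return [{term} | synonyms.get(term, set()) for term in query.lower().split()]


def matches(document, expanded_query):
    """A document matches if it contains at least one word from every group."""
    tokens = set(document.lower().split())
    return all(group & tokens for group in expanded_query)


listings = ["Leather sofa in good condition", "Red couch", "Oak table"]
query = "leather couch"

exact = [d for d in listings if matches(d, [{t} for t in query.split()])]
expanded = [d for d in listings if matches(d, expand_query(query, SYNONYMS))]

print(exact)     # no listing contains both "leather" and "couch"
print(expanded)  # the leather sofa is now found via the synonym
```

The trade-off is the usual one: expansion recovers listings that exact matching misses, at the risk of lower precision when a synonym is only valid in some contexts.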
Thirdly, vertical search tends to offer a greater variety of behavioural signals than Web search. These signals, which often reflect the different stages in the conversion funnels of the corresponding businesses, can give rise to interesting features for improving ranking. In addition to clicks to view documents or listings, AirBnB for example has other signals such as booking requests, contacting the hosts, booking acceptance and booking rejection.
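One simple way to picture how such funnel signals could become a ranking feature is a weighted count normalised by views, as sketched below. The field names and the weights are purely hypothetical and chosen for illustration; in practice the weights would be learned from data rather than hand-picked, and this is not a description of how AirBnB ranks listings.

```python
from dataclasses import dataclass


@dataclass
class ListingSignals:
    views: int
    host_contacts: int
    booking_requests: int
    acceptances: int
    rejections: int


# Hypothetical weights: deeper funnel stages count more, rejections count against.
WEIGHTS = {
    "host_contacts": 1.0,
    "booking_requests": 2.0,
    "acceptances": 3.0,
    "rejections": -2.0,
}


def engagement_feature(signals, weights=WEIGHTS):
    """Weighted funnel activity per view; 0.0 for listings nobody has viewed."""
    if signals.views == 0:
        return 0.0
    raw = (
        weights["host_contacts"] * signals.host_contacts
        + weights["booking_requests"] * signals.booking_requests
        + weights["acceptances"] * signals.acceptances
        + weights["rejections"] * signals.rejections
    )
    return raw / signals.views


often_accepted = ListingSignals(100, 10, 5, 4, 1)
often_rejected = ListingSignals(100, 10, 5, 1, 4)
print(engagement_feature(often_accepted) > engagement_feature(often_rejected))
```

Two listings with identical clicks and booking requests end up with different scores here, which is exactly the kind of distinction that click data alone cannot make.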
Originally published at wilsonwong.co on May 18, 2017.