Elasticsearch is awesome, but it won’t fix your search problems

Elasticsearch, A.I. and the future of on-site search

Jacopo Tagliabue

Published in Tooso · 6 min read · May 17, 2017

“So, what do you think of Elasticsearch?”

“Man must shape his tools lest they shape him.” A. Miller

A question our customers ask me a lot these days is ‘What do you think of Elasticsearch?’. My usual reply is that I can give either a somewhat misleading answer in one line or a well-crafted one in ten minutes. As it turns out, everybody just wants to hear the one-liner:

Elasticsearch is awesome, but it won’t fix your search problems.

However, we think there is value in sharing the full-page version, so that we can present a more complete picture of our view on the future of search as a user-facing product. For more on what we mean by that, read on.

Elasticsearch 101

If you spent the last 15 years in hibernation, this section is for you. The first version of Elasticsearch was released back in 2010 as a distributed full-text search engine based on Lucene, a popular open source library first released in 1999. Fast forward to 2017: Elasticsearch is now the most popular enterprise search engine in the world.

While you were in hibernation, a couple of things happened: the Cubs won the World Series, Miley Cyrus went from this to this, and, even more importantly, Wikipedia was invented, so feel free to go there for more historical info.

We are now gonna dive straight into Elasticsearch’s core features.

Why Elasticsearch is awesome (for the right use cases)

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” A. de Saint-Exupéry

Elasticsearch is awesome — what’s not to like?

It is open source, so you can actually look inside it; probably because of that, it has good documentation and great community support.
It is scalable and reliable, and has already proved its worth in many battles; Elasticsearch is used by some of the best-run enterprises in the world to process gazillions of data points every day.
It is easy to interact with, as it has a RESTful interface and comes with wrapping libraries for many popular languages.

In fact, we like Elasticsearch so much that it is an integral part of our own data pipeline: in particular, we use AWS Kinesis to ingest all Tooso related events and we drop tons of JSONs to an AWS-hosted Elasticsearch cluster; we then use Kibana to navigate through our log haystack when we need to find the proverbial needle. Hint: if those tools are not already part of your data pipeline, they most likely should be.

Sifting through terabytes of logs is not, however, the kind of search problem our clients — digital retailers, especially in the fashion industry — experience.

What retailers want is a way to make their customers happier. And people who search are their best customers, with a conversion rate that is 2–11x higher than average.

The happier the searchers, the more they buy, the happier our clients: pretty neat, huh?

While it is certainly possible in theory, producing a consistent and enjoyable search experience for digital shoppers through Elasticsearch requires making decisions on non-trivial issues, which in turn requires experience and specific knowledge. For example (in no particular order of importance/salience):

What is the difference between query context and filter context?
What is the default analyzer behavior on the query ‘I don’t like New York’?
By the way, what is an analyzer again?
Which stemming process should I use? And why stemming and not lemmatization?
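To make the first of these questions concrete, here is a minimal sketch of a search request body that uses both contexts. The field names (`title`, `gender`, `price`) and the helper `build_search_body` are hypothetical, made up for illustration; only the `bool` / `must` / `filter` structure comes from the Elasticsearch query DSL. Clauses under `must` run in query context and contribute to the relevance score, while clauses under `filter` run in filter context and simply include or exclude documents.

```python
import json


def build_search_body(text, gender=None, max_price=None):
    """Build a request body for a hypothetical product index.

    Assumed (not from the article): fields named title, gender and price.
    """
    filters = []
    if gender is not None:
        # Filter context: a yes/no match, no effect on the score.
        filters.append({"term": {"gender": gender}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {
        "query": {
            "bool": {
                # Query context: how well does the document match the text?
                "must": [{"match": {"title": text}}],
                "filter": filters,
            }
        }
    }


if __name__ == "__main__":
    body = build_search_body("shirts", gender="women", max_price=100)
    print(json.dumps(body, indent=2))
```

Deciding which constraints belong in `must` and which in `filter` is exactly the kind of choice that shapes what shoppers see first, and it is only one of many.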

It is worth pointing out that this is not just a generic tech gap (‘you have to be Albert Einstein to figure it out’): even good engineers without previous exposure to Information Retrieval will have to go through a painful learning process. In other words, Elasticsearch is more a “tool” than a “product”, in the same way that, say, Postgres is. For Postgres to become an effective CRM for your sales team, some non-trivial work and non-obvious choices need to be made; similarly, for Elasticsearch to power a satisfactory shopping experience, some non-trivial setup and fine-tuning need to happen. Postgres and Elasticsearch are exceptional tools for building great customer-facing products, but they are not per se ready-made products.

So, the question for digital retailers then becomes:

Is this painful setup necessary or are there more effective ways to make my customers happy (and, ultimately, earn more)?

It turns out, there are.

Search in the AI era

“I skate to where the puck is going to be, not where it has been.” W. Gretzky

Now that we know that Elasticsearch cannot be in itself the answer to the retailers’ problem, can we come up with a list of purely (or mostly) technical desiderata for a perfect user-facing product?

Service-oriented: the ideal solution should not involve any maintenance or hosting responsibilities on the retailer side, and should be seamless to integrate (API-based integration comes to mind as the default choice). In the subscription economy, a SaaS model looks like the natural offer, but there could be exceptions for enterprise-level services.
Search algorithm: matching keywords, tf-idf and the vector space model are all fine concepts — but this is not the Seventies anymore (unfortunately, one may say). If I don’t have more than 100 bucks for my new basketball shoes, I should be able to just type ‘nike shoes 100 dollars’ and get what I want.
No search fits all: search bars don’t care whether you are in Italy or the US, a new customer or a faithful client, a mobile-addicted millennial or a geek with her laptop. A smart search bar should be somewhat personal, intelligently adapting to user behavior and changing according to context: if I navigate to the women’s section of a website and then search for ‘shirts’, women’s shirts should be more relevant than men’s. Why can’t today’s search bars even learn that?
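As a rough illustration of the last two points, the sketch below turns ‘nike shoes 100 dollars’ into structured intent and carries the shopper’s current section along with it. It is a deliberately naive, rule-based toy, not a description of how Tooso or any AI-based engine works: the brand list, the price pattern and the `parse_query` helper are all assumptions made up for this example.

```python
import re

# Hypothetical brand list; a real catalog would supply this.
KNOWN_BRANDS = {"nike", "adidas", "puma"}

# A number followed by a currency word, e.g. "100 dollars" or "80 bucks".
PRICE_PATTERN = re.compile(r"\b(\d+)\s*(?:dollars?|bucks?|usd)\b", re.IGNORECASE)


def parse_query(raw_query, section=None):
    """Split a raw query into free text, a brand, a price ceiling and context."""
    price_match = PRICE_PATTERN.search(raw_query)
    max_price = int(price_match.group(1)) if price_match else None
    # Remove the price expression so it is not matched as a keyword.
    remaining = PRICE_PATTERN.sub(" ", raw_query)

    brand = None
    keywords = []
    for token in remaining.lower().split():
        if brand is None and token in KNOWN_BRANDS:
            brand = token
        else:
            keywords.append(token)

    return {
        "text": " ".join(keywords),
        "brand": brand,
        "max_price": max_price,
        "section": section,  # e.g. the site section the shopper is browsing
    }


if __name__ == "__main__":
    print(parse_query("nike shoes 100 dollars"))
    print(parse_query("shirts", section="women"))
```

Here `parse_query("nike shoes 100 dollars")` yields the text `shoes`, the brand `nike` and a price ceiling of `100`, which a search backend could then treat as filters rather than keywords. Hand-written rules like these break quickly on real queries, which is where the case for a deeper understanding of language comes in.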

While there are well-established, successful companies out there already offering an out-of-the-box experience that is way better than Elasticsearch (two that we like a lot are Algolia and Fact Finder), it is hard to ignore that a paradigm shift is happening and that Artificial Intelligence is (and increasingly will be) the key for the future of search.

Truth is, good engineering and a nice product package will get you only so far.

It’s only by combining a deep understanding of language and search experience — blending together linguistics, cognition and computer science — that we will bring search to the next level. With the AI era upon us, we finally have at our disposal all the pieces in the puzzle to radically change how people interact with the oldest, and still most important, HTML object on the page.

It is on this vision (and on our entire research life) that we built Tooso from the ground up and it is our mission to fulfill those needs, and many more.

In a future post, we will go deeper into how awesome and personal search can become with Artificial Intelligence: AI-based search is the technology we deserve and the one we need right now.

See you, space cowboys

If you want to share your Elasticsearch experience or your perspective on the future of search, please reach out directly at jacopo.tagliabue@tooso.ai.

Acknowledgments

Many thanks to Maria Paola Sforza Fogliani and Katherine Yoshida for very helpful comments on a previous draft.