Improving Search Suggestions for eCommerce

Published in

Empathy.co

Sep 27, 2018

“three person pointing the silver laptop computer” by John Schnobrich on Unsplash

Suggestions are one of the key parts of any site search system. They are often the first thing a user interacts with in a search session. But wait, what exactly is a search suggestion? When users start typing a query into a search box, they usually expect some guidance about what the search engine could find for them. Search suggestions provide this guidance.

Let’s start from the beginning

When we start building a new search engine, its fundamental purpose is to help users find what they’re looking for. One way to achieve this is to include search suggestions, which not only help users find items but also help them choose the most relevant search terms.

To create a basic suggestion system, the usual first step is to use a static list of suggestions. It’s the fastest approach, but it has a lot of drawbacks.

Using a static list of suggestions

Although it seems easy, this method only works for very small or static catalogues, that is, catalogues whose products never change.

Pros:

It’s a one-time action.
It can be curated or refined to offer the most relevant terms.

Cons:

If the catalogue changes, the list needs to be rebuilt by adding new terms or removing existing ones.
Your suggestions will probably not use the same language that users do, so the gap between how users search and how products are named remains.
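
A static list can be as simple as a hard-coded array combined with prefix matching. The sketch below is a minimal illustration, not a specific product’s implementation. It assumes a hypothetical `STATIC_SUGGESTIONS` list and a hypothetical `suggest` helper that matches the typed prefix against the start of any word in a suggestion.

```python
# Hypothetical, manually curated list of suggestions for a small, static catalogue.
STATIC_SUGGESTIONS = ["shirt", "black shirt", "red shirt", "shoes", "running shoes"]


def suggest(prefix: str, suggestions: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` suggestions containing a word that starts with `prefix`."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [
        s for s in suggestions
        if any(word.startswith(prefix) for word in s.lower().split())
    ]
    return matches[:limit]


print(suggest("shi", STATIC_SUGGESTIONS))  # ['shirt', 'black shirt', 'red shirt']
```

Whenever the catalogue changes, someone has to edit `STATIC_SUGGESTIONS` by hand. That is exactly the maintenance cost listed in the cons.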

At this point we have a working suggestion system that offers the manually provided list of terms, but it’s not a scalable solution. The next step is to incorporate user queries into the suggestion system. This also lets us fix one of the most common problems: users and catalogues speaking different languages.

Building your suggestions using raw data from user queries

With this approach, you solve one of the most important issues, because you can offer the same terms that users actually use. In an ideal world, suggestions generated from user input would be high-quality terms that are highly relevant to other users. I’m sorry to say that this won’t be the case.

In the real world, if the system automatically generates suggestions from the data provided by users, it will return incomplete, misspelled and redundant terms. For example, when users type “shi” (the beginning of “shirt”), you’d expect the suggestion system to show words like “shirt”, “red shirt” and “black shirt”. Unfortunately, the most likely scenario is that it will offer something like “shir”, “shirt”, “shirts” and “red shi”.
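
The problem is easy to reproduce. The minimal sketch below ranks past queries by frequency. It assumes a hypothetical `QUERY_LOG` that stores every submitted query, partial ones included, and a hypothetical `naive_suggest` helper. Nothing in it checks whether a query is complete, correctly spelled or redundant.

```python
from collections import Counter

# Hypothetical raw query log: every query users submitted, including partial ones.
QUERY_LOG = ["shir", "shirt", "shirt", "shirts", "red shi",
             "shirt", "shir", "red shi", "shoes"]


def naive_suggest(prefix: str, log: list[str], limit: int = 4) -> list[str]:
    """Suggest past queries containing a word that starts with `prefix`, most frequent first."""
    prefix = prefix.strip().lower()
    counts = Counter(query.strip().lower() for query in log)
    candidates = [
        (query, count) for query, count in counts.items()
        if any(word.startswith(prefix) for word in query.split())
    ]
    # Sort by frequency (descending), then alphabetically to break ties.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in candidates[:limit]]


print(naive_suggest("shi", QUERY_LOG))  # ['shirt', 'red shi', 'shir', 'shirts']
```

The output contains exactly the kind of noise described above: a partial word (“shir”), a truncated phrase (“red shi”) and a plural that duplicates “shirt”.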

Now we’ve got a big problem. We have a suggestion system that returns data based on user input, but we’re not getting the kind of suggestions we expected.

Pros:

It’s an automated process.

Cons:

A low-quality suggestion system can be even worse than not having one at all. Redundant or low-quality suggestions just frustrate users.
It’s difficult to add exclusions or manual improvements to fix the problem because the data can change frequently.

Now our search engine has suggestions that help our users. However, if you analyse this approach, you’ll soon see things that don’t work as well as you’d like. There is no silver bullet for creating a relevant suggestion system, but some techniques can improve it significantly.

How can we improve the quality of our suggestions?

The first thing we need to figure out is what kind of suggestions our users expect. For example, offering suggestions for a technology dataset is not the same as offering them for a clothing dataset, because users interact with each search engine differently. So, the first step is to analyse the queries users make.

As we saw before, if our search engine supports partial queries (using n-gram based queries), we will receive partial queries from users. In the “shirt” example, we could end up with “sh”, “shi”, “shir” and finally “shirt”. Some users stop typing as soon as the search engine starts offering results, while others keep going until they have typed the whole word. Additionally, one user could type the full term “shirt” while another types “shirts”.
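
To see why partial queries end up in the logs, here is a minimal sketch of an edge n-gram index, which is one common way to support partial queries. The two-product `PRODUCTS` catalogue, the `min_gram` value of 2 and the helper names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical two-product catalogue.
PRODUCTS = {1: "Black Shirt", 2: "Running Shoes"}


def edge_ngrams(word: str, min_gram: int = 2) -> list[str]:
    """Return the prefixes of `word` from `min_gram` characters up to the full word."""
    return [word[:n] for n in range(min_gram, len(word) + 1)]


def build_index(products: dict[int, str]) -> dict[str, set[int]]:
    """Map every edge n-gram of every title word to the ids of matching products."""
    index = defaultdict(set)
    for product_id, title in products.items():
        for word in title.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(product_id)
    return dict(index)


index = build_index(PRODUCTS)
print(edge_ngrams("shirt"))  # ['sh', 'shi', 'shir', 'shirt']
print(index["shi"])          # {1}
print(index["sh"])           # {1, 2}
```

Because “shi” already returns the shirt, a user may stop typing there and leave a partial query in the log. “sh” matches both products, which is the ambiguity discussed under partial suggestions.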

There are some techniques we can apply to solve these and other kinds of problems.

Using catalogue data to improve your suggestions

As mentioned previously, users don’t know the exact names that products have inside a catalogue. So it’s a good idea to build and improve the search suggestions from the terms users type to find products, and to use the catalogue data to refine them.

Avoid partial suggestions
A term like “sh” could be expanded to either “shirt” or “shoes”. So, how can we determine the intention behind a partial input? A good way is to analyse the interactions the user makes during the search session. If a user interacts with shirts, for example by clicking on or buying one, we can infer that the intended query was “shirt”. However, there may be interactions with both terms: some users may have meant “shirt” while others meant “shoes”. At this point we need to define a threshold above which one term is considered more relevant than the other.
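
The threshold idea can be sketched in a few lines. This is a minimal sketch under stated assumptions. Each interaction (click, add to cart, purchase) in a session that searched the partial query is assumed to have already been mapped to a full term, for example from the product’s category. The function name `resolve_partial` and the 0.7 threshold are hypothetical, and the right threshold depends on your data.

```python
from collections import Counter
from typing import Optional


def resolve_partial(interacted_terms: list[str], threshold: float = 0.7) -> Optional[str]:
    """Return the full term behind a partial query, or None if no term is dominant.

    `interacted_terms` holds one full term per interaction recorded in
    sessions that searched the partial query (hypothetical input format).
    """
    if not interacted_terms:
        return None
    counts = Counter(interacted_terms)
    term, count = counts.most_common(1)[0]
    share = count / len(interacted_terms)
    return term if share >= threshold else None


# "sh" sessions: 8 interactions with shirts, 2 with shoes -> share 0.8.
print(resolve_partial(["shirt"] * 8 + ["shoes"] * 2))  # shirt
# An even split stays ambiguous.
print(resolve_partial(["shirt"] * 5 + ["shoes"] * 5))  # None
```

When no term passes the threshold, you could drop the partial or keep both candidates. Which option works better depends on how your users search.
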
Avoid suggestions that will return zero results
The data that drives your search engine is ephemeral. Today your catalogue could contain a whole collection of “bags”, but by tomorrow those bags may all be gone and replaced with another set of products. From that moment, you need to stop suggesting “bag”, “black bag” or any other term that leads to a zero-results page. This means that every candidate suggestion must be checked against your catalogue data.
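
Here is a minimal sketch of the catalogue check. It assumes a simplified matching rule: a product matches when its title contains every word of the suggestion. In production you would more likely run each candidate through the search engine itself and keep it only if it returns at least one hit. The `CATALOGUE` contents and helper names are hypothetical.

```python
# Hypothetical current catalogue: the bags are gone.
CATALOGUE = ["Black Shirt", "Red Shirt", "Leather Wallet"]


def has_results(suggestion: str, catalogue: list[str]) -> bool:
    """True if at least one product title contains every word of the suggestion."""
    words = set(suggestion.lower().split())
    return any(words <= set(title.lower().split()) for title in catalogue)


def filter_zero_results(candidates: list[str], catalogue: list[str]) -> list[str]:
    """Keep only candidate suggestions that would return at least one product."""
    return [c for c in candidates if has_results(c, catalogue)]


print(filter_zero_results(["black shirt", "black bag", "bag", "red shirt"], CATALOGUE))
# ['black shirt', 'red shirt']
```

Because the catalogue changes, this filter has to run whenever the product data is updated, not only once.
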
Correct suggestions that are misspelled
A suggestion may not look misspelled, because every word in it has a meaning in the context of the suggestion. Even so, we can consider a term misspelled if it isn’t relevant enough to be offered as a suggestion. (A misspelled term should never be suggested as typed; the corrected term should be suggested instead.)
For example, “iphone back cover” could be converted to “iphone black cover” if the latter is considered more relevant, offering a particular colour of case rather than a particular type of case.
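
One way to express this is to compare each candidate with its near-identical neighbours and replace it when a neighbour is much more relevant. The sketch below rests on three hypothetical parameters. Relevance is a click count. Terms within a Levenshtein distance of 1 count as neighbours. A neighbour must be at least five times more relevant to replace a term.

```python
# Hypothetical relevance scores (e.g. number of clicks) per candidate suggestion.
RELEVANCE = {"iphone black cover": 120, "iphone back cover": 10, "iphone case": 80}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def correct_suggestions(relevance: dict[str, int], max_distance: int = 1,
                        ratio: float = 5.0) -> dict[str, str]:
    """Map each term to a near-identical term that is at least `ratio` times more relevant."""
    corrected = {}
    for term, score in relevance.items():
        best = term
        for other, other_score in relevance.items():
            if (other != term
                    and levenshtein(term, other) <= max_distance
                    and other_score >= ratio * score
                    and other_score > relevance[best]):
                best = other
        corrected[term] = best
    return corrected


print(correct_suggestions(RELEVANCE)["iphone back cover"])  # iphone black cover
```

Requiring a large relevance gap keeps legitimate low-traffic terms from being overwritten by a popular neighbour that happens to be spelled similarly.
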

Avoid returning duplicated or similar entries

Having followed the previous steps, we now have a search suggestion system that offers suggestions relevant to the user. This includes:

Full-term suggestions (avoiding partials).
Suggestions that return results.
Suggestions that are relevant based on user interactions.

However, we could still be offering redundant suggestions. As mentioned previously, different users don’t use a search engine in the same way, so for the input “sh” you could have a suggestion dataset with terms like these:

“shirt”
“shirts”
“shirt black”
“black shirt”
“black shirts”
“red shirt”
“red shirts”

From a machine’s perspective, these suggestions are good enough: they are full terms, they are all different, and they return results. From a human perspective, they are redundant. We would be suggesting plurals, or even the same words in a different order.

To avoid this situation, we could use a term distance algorithm such as Levenshtein or Jaro-Winkler, which calculates the distance between two strings. Given that distance and a threshold, we can decide whether a term is similar enough to another one that only one of them should be kept by the suggestion engine.

There are many websites that compare two strings using different distance algorithms, such as this one.

For example, comparing “shirt” and “shirts” gives a similarity score of 83 using Levenshtein and 97 using Jaro-Winkler.

The closer the value is to 100, the more similar the terms are. However, if we have two very similar terms, which one should we keep? It should be the most relevant one according to the criteria we’ve chosen, such as number of clicks or number of occurrences.
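
The scores on a scale from 0 to 100 are similarity values rather than raw distances. The sketch below assumes the common normalisation 100 * (1 - distance / length of the longer string). This reproduces the Levenshtein value of 83 for “shirt” and “shirts”, which is one edit over six characters, although the online tool may normalise differently. The relevance scores, the threshold of 80 and the helper names are hypothetical. Terms are visited from most to least relevant, so the most relevant term of each group of near-duplicates is the one kept.

```python
# Hypothetical relevance scores (e.g. clicks) for the "sh" suggestions listed above.
RELEVANCE = {
    "shirt": 120, "shirts": 40, "black shirt": 60, "black shirts": 25,
    "shirt black": 5, "red shirt": 30, "red shirts": 10,
}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein similarity on a 0-100 scale (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def deduplicate(relevance: dict[str, int], threshold: float = 80.0) -> list[str]:
    """Keep the most relevant term of each group of near-duplicates."""
    kept: list[str] = []
    # Most relevant first, so the first term seen in a group is the one kept.
    for term in sorted(relevance, key=lambda t: (-relevance[t], t)):
        if all(similarity(term, k) < threshold for k in kept):
            kept.append(term)
    return kept


print(round(similarity("shirt", "shirts")))  # 83
print(deduplicate(RELEVANCE))  # ['shirt', 'black shirt', 'red shirt', 'shirt black']
```

The plurals are removed, but “shirt black” survives. Character-level distance does not handle reordered words.
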

Terms that contain the same words in a different order can’t be handled directly by a term distance algorithm. For example, the similarity between “red shirt” and “shirt red” is 11 using Levenshtein and 71 using Jaro-Winkler.

These scores suggest that the terms are very different, but we know that this isn’t the case. One possible solution is to evaluate the distance over the Cartesian product of the words contained in each term, that is, to compare every word of one term with every word of the other.
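
Here is a minimal sketch of the Cartesian product idea. It uses `itertools.product` to score every word pair with the same assumed Levenshtein similarity as above (0 to 100). Each word keeps its best match, and the best-match scores from both terms are averaged. The averaging rule and the name `word_set_similarity` are assumptions. Taking each word’s best match is a simplification, not a one-to-one word alignment.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein similarity on a 0-100 scale (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1 - levenshtein(a, b) / longest)


def word_set_similarity(a: str, b: str) -> float:
    """Order-insensitive similarity built from the Cartesian product of the words."""
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a or not words_b:
        return 0.0
    # Score every (word of a, word of b) pair.
    scores = {(wa, wb): similarity(wa, wb) for wa, wb in product(words_a, words_b)}
    # Each word keeps its best match on the other side.
    best_a = [max(scores[wa, wb] for wb in words_b) for wa in words_a]
    best_b = [max(scores[wa, wb] for wa in words_a) for wb in words_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


print(word_set_similarity("red shirt", "shirt red"))             # 100.0
print(round(word_set_similarity("red shirts", "shirt red"), 2))  # 91.67
```

Used as the similarity measure when deduplicating, this treats “black shirt” and “shirt black” as duplicates. It also still catches plurals, because “shirts” and “shirt” score 83 at the word level.
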

Conclusion

Relevant data is a must for any search engine. Suggestions are a key part of the data we return to the user, so, to summarise, the following points outline how to design a good suggestion system:

Offer suggestions only if they are relevant and good enough to provide genuine guidance during the search experience. It’s important to have suggestions on your search engine, but only if they are relevant to users.
Good data analysis is the first step towards a good suggestion system.
There is no silver bullet. You need to decide the best action for improving your suggestions based on how your users are using your search engine.
A good suggestion should be meaningful and relevant.
To create good suggestions, you need to use all the data that you have available.