Creating our own Kiwi-flavoured spell checker

Nicole Williams
Published in Trade Me Blog
4 min read · Aug 15, 2018


What people commonly search for on Trade Me is flavoured by popular brand names as well as Kiwi slang terms. The frequency of words in Trade Me searches differs greatly from what generic spell-checking solutions expect. Our original spell checker wasn’t tuned to Trade Me members, which led to curious suggestions, such as asking if you were looking for a “free dog” when you searched for “Freedom furniture”.

A one-size-fits-all spell checker wasn’t going to work for us. The solution was to construct our own dictionary, using data from real Trade Me searches. We took 30 million searches that had resulted in a click, cleaned up the search terms (lowercasing, removing punctuation, etc.), and extracted all the words we saw more than 5 times. This was accomplished using our search data lake, AWS Athena and the Python library scikit-learn.
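To make the dictionary-building step concrete, here is a minimal sketch using only the Python standard library. It is not our production pipeline (which ran over the data lake with Athena and scikit-learn); the function names `clean` and `build_dictionary` and the toy search list are made up for illustration. Only the "more than 5 times" threshold comes from the description above.

```python
import re
from collections import Counter

MIN_COUNT = 5  # keep words seen more than 5 times, as described above


def clean(term):
    """Lowercase a search term and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", term.lower())


def build_dictionary(searches, min_count=MIN_COUNT):
    """Return {word: frequency} for words seen more than min_count times."""
    counts = Counter()
    for term in searches:
        counts.update(clean(term).split())
    return {word: n for word, n in counts.items() if n > min_count}


# Toy stand-in for the real search log (hypothetical data).
searches = ["Freedom furniture!"] * 6 + ["freedom couch"] * 2 + ["free dog"]
dictionary = build_dictionary(searches)
print(dictionary)  # {'freedom': 8, 'furniture': 6}
```

Rare words such as “couch” and “dog” fall below the threshold in this toy data, so they would be treated as candidates for correction rather than as known words.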

Whenever we see a word that hasn’t appeared more often than this threshold, we try to correct it to something we have seen. If the word is by itself, we search for close misspellings using Levenshtein distance, and then rank these by popularity to return the most likely result. If we have other words for context, we take these into account to find the most likely word. For example, the word “fot” by itself will correct to the common word “for”, but when “honda” is also present, it will correct to “fit”, giving the common car model “Honda Fit”. The contextual model is based on Bayes’ theorem, and uses the word2vec algorithm to predict the likelihood of a word in a given context.
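The correction logic can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual model: the maximum edit distance of 1, the toy frequencies and the `cooccur` table are all invented, and simple smoothed co-occurrence counts stand in for the word2vec-based likelihood. The scoring follows the Bayes-style shape of prior (popularity) times the likelihood of each context word given the candidate.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct(word, dictionary, context=(), cooccur=None, max_distance=1):
    """Suggest a replacement for word, or return it unchanged if known."""
    if word in dictionary:
        return word
    cooccur = cooccur or {}
    candidates = [w for w in dictionary if levenshtein(word, w) <= max_distance]
    if not candidates:
        return word
    vocab_size = len(dictionary)

    def score(candidate):
        # Prior: how often the candidate is searched overall.
        s = dictionary[candidate]
        # Likelihood of each context word given the candidate,
        # with add-one smoothing (a stand-in for word2vec).
        for c in context:
            seen_together = cooccur.get((candidate, c), 0)
            s *= (seen_together + 1) / (dictionary[candidate] + vocab_size)
        return s

    return max(candidates, key=score)


# Hypothetical frequencies and co-occurrence counts.
dictionary = {"for": 1000, "fit": 200, "fat": 50, "honda": 300}
cooccur = {("fit", "honda"): 150, ("for", "honda"): 5}

print(correct("fot", dictionary))                                      # for
print(correct("fot", dictionary, context=("honda",), cooccur=cooccur))  # fit
```

With no context, popularity alone decides and “for” wins. Once “honda” is present, the rare pairing of “for” with “honda” is penalised and “fit” comes out on top.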

What did we learn from creating our own spell checker?

1. Live data is addictive

We monitored which spelling corrections were suggested on a live Splunk dashboard. Splunk is an application used to capture, index, and correlate real-time data to create dashboards, reports and visualisations. Our dashboard allowed every member of the squads (from data scientists to developers and testers) to provide input into improving the suggestions, which could be fed into the data science model. We also reviewed 1,000 results manually to understand where the model was doing well and where it could be improved.

Example of the real-time data from Splunk showing what Trade Me users search for and what our spell checker suggests

2. Iterating quickly is valuable

The nature of A/B testing and rollouts means our search squads are often working across multiple features, actively developing some while others are in A/B testing phases. While this means we’re not sitting around waiting for results, it can mean too much time elapses between starting tests and gaining insights.

With this spell checker, our first iteration spanned weeks. We put our first version live, found an issue, rolled back and waited for the squads to revisit it with a fix. In the meantime, we weren’t delivering value to our users. We all agreed we could be faster and more focused. For the next round we swarmed, with focused data science and development effort, and were able to push 3 iterations of the spell checker in 3 days!

As well as a sense of momentum and excitement in the teams, we also saw fast improvements in our key metric, the click-through rate (CTR) on suggested terms, which rose from 1% to 10%.

3. Our models can know more about Trade Me users than we do

Watching the live Splunk dashboard, we had a few surprising moments where our model understood our search users better than we did. For example, when the spell checker corrected “loolies” to “loobies”, we assumed it was wrong. Surely “lollies” was a better suggestion? However, we dug deeper and found the model was predicting a much higher frequency for “loobies” than for “lollies”. It turns out Loobies Story is a clothing brand and, while unfamiliar to our team, it is commonly searched for by our users. It was a great example of data helping us uncover our own blind spots!

Loobies clothing, not lollies, is what Trade Me users were searching for

We plan to keep iterating on our spell checker and retraining it with search data to improve it further. We’re currently testing a Trade Me-flavoured thesaurus, which will also make it easier and faster to find things. With millions of goods and services available on Trade Me, our work to improve search is vital for helping Kiwis enjoy finding what they want, whether it’s second-hand bargains, a new job or the home of their dreams.

Nicole Williams

Head of Product @TradeMe. Prev Head of Product @SilverStripe. Marketing blogger and podcaster at www.techmarketer.org, everything else lands here.