Searching Synonyms (Words with the Same Meaning) with Elasticsearch

No more empty results because of slang and local language words

Ridho Perdana
Nov 17, 2020 · 4 min read

At my workplace, our application lets users search our data from a search box. It worked very well until we looked at the data and found that many users were getting empty results. One of the main reasons is that they type slang or a different-language version of a word, which we don't store in our database.

Photo by Markus Winkler on Unsplash

In my country, Indonesia, there are many local languages and slang words. For example, Bakso and Baso have the same meaning: meatball. Besides using local languages, our users are starting to search with English keywords, such as apple instead of apel, although our app is targeted at Indonesian people.

There are many ways to solve this problem, and I narrowed them down to one: make our search retrieve words together with their synonyms. I know that for non-native languages we could use some kind of localization, but I think that is too much, since only a small share of our users' searches use a non-native language.

elasticsearch logo.

We use Elasticsearch to index our data so that it is easier to search, and luckily it has a feature called the Synonym Token Filter. I was very happy to find out that Elasticsearch supports this natively, because fixing our empty-results bug would have been harder if we had needed another database or application.

Here is how I implemented this feature. These are the steps needed to make it work in our application:

1. Create a Synonym Filter

We need to update the current index of our application to include a new synonym filter. Add a new entry under the existing filter key of the index's analysis settings.

The new entry is an object named synonym with three keys: type, set to synonym; lenient, set to true; and synonyms, an array of strings.
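As a minimal sketch, the filter entry described here could be built as a Python dictionary and serialized to JSON for the request body. The two synonym rules reuse the example words from this article; the filter name synonym is only a label, and the rest of the index settings are assumed to exist already.

```python
import json

# Synonym token filter with the three keys discussed in this step.
synonym_filter = {
    "type": "synonym",
    "lenient": True,
    "synonyms": [
        "bakso, baso",  # local spelling variants of the same word
        "apel, apple",  # Indonesian and English
    ],
}

# The entry lives under analysis.filter in the index settings.
settings_fragment = {"analysis": {"filter": {"synonym": synonym_filter}}}

print(json.dumps(settings_fragment, indent=2))
```

Python's True is serialized as the JSON literal true, which is what Elasticsearch expects for lenient.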

Type
This is self-explanatory: it tells Elasticsearch that this object is a synonym filter.

Lenient
Based on Elasticsearch documentation:

If true ignores exceptions while parsing the synonym configuration. It is important to note that only those synonym rules which cannot get parsed are ignored.

The default value for this key is false. I set it to true so that a synonym rule that fails to parse does not cause an error in our application.

Synonyms
This is where we configure the words and their synonyms. Elasticsearch supports two formats, Solr and WordNet; in my case, I prefer the Solr format.

# Blank lines and lines starting with pound are comments.

# Explicit mappings match any token sequence on the LHS of "=>"
# and replace with all alternatives on the RHS. These types of mappings
# ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit

# Equivalent synonyms may be separated with commas and give
# no explicit mapping. In this case the mapping behavior will
# be taken from the expand parameter in the schema. This allows
# the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
lol, laughing out loud

# If expand==true, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod

# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz

These rules can be written directly in the synonyms key, but if the list is long, the documentation suggests using the synonyms_path key instead and passing the location of a text file.

"filter" : {
    "synonym" : {
        "type" : "synonym",
        "synonyms_path" : "analysis/synonym.txt"
    }
}
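If the synonym list is maintained elsewhere (for example in code or a spreadsheet), the text file can be generated. The sketch below is one possible way to do that; the groups are the example words from this article, and a temporary directory stands in for the Elasticsearch config directory, which is where a relative synonyms_path is resolved from.

```python
import tempfile
from pathlib import Path


def render_synonym_file(groups):
    """Render one comma-separated rule per line (Solr format)."""
    return "\n".join(", ".join(group) for group in groups) + "\n"


groups = [["bakso", "baso"], ["apel", "apple"]]

# Assumption: this temporary directory plays the role of the config directory.
config_dir = Path(tempfile.mkdtemp())
target = config_dir / "analysis" / "synonym.txt"
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(render_synonym_file(groups), encoding="utf-8")

print(target.read_text(encoding="utf-8"))
```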

2. Create an Analyzer for Document Properties

After creating the filter, we need an analyzer that uses it.
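A minimal sketch of such an analyzer, again as a Python dictionary: the analyzer name synonym_analyzer, the standard tokenizer, and the lowercase filter are assumptions chosen for illustration, not necessarily what our application uses. Placing lowercase before synonym lets rules written in lowercase also match input such as Bakso.

```python
import json

analysis = {
    "filter": {
        "synonym": {
            "type": "synonym",
            "lenient": True,
            "synonyms": ["bakso, baso", "apel, apple"],
        },
    },
    "analyzer": {
        # Hypothetical analyzer name; it references the filter by its key.
        "synonym_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "synonym"],
        },
    },
}

print(json.dumps({"analysis": analysis}, indent=2))
```

A text property can then refer to this analyzer by name through its analyzer (or search_analyzer) mapping parameter.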

2a. Append the Filter Into Existing Analyzer

Elasticsearch allows only one analyzer per document property, so if a property already has an analyzer, you need to append the synonym filter to that analyzer's filter list.
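Appending the filter to an existing analyzer can be sketched as a small helper. The analyzer name product_analyzer and its current configuration are hypothetical; in practice you would start from the settings your index already has.

```python
import copy


def append_filter(analysis, analyzer_name, filter_name):
    """Return a copy of `analysis` with `filter_name` appended to the analyzer."""
    updated = copy.deepcopy(analysis)
    filters = updated["analyzer"][analyzer_name].setdefault("filter", [])
    if filter_name not in filters:  # avoid adding the filter twice
        filters.append(filter_name)
    return updated


# Hypothetical existing analyzer that does not use synonyms yet.
existing = {
    "analyzer": {
        "product_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"],
        }
    }
}

updated = append_filter(existing, "product_analyzer", "synonym")
print(updated["analyzer"]["product_analyzer"]["filter"])
```

The helper works on a copy, so the original settings stay untouched until the new ones are sent to Elasticsearch.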

3. Update the Elasticsearch Index Setting

Send a PUT request to /application/_settings on your Elasticsearch host. The request body should contain the synonym filter and the analyzer with the synonym filter appended to its filter list.
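The request could be prepared as in the sketch below, using only the Python standard library. The host http://localhost:9200 and the analyzer name are assumptions. One thing to be aware of: on an existing index, analysis settings can generally only be changed while the index is closed, so the sketch wraps the PUT in _close and _open calls. The requests are built but not sent.

```python
import json
import urllib.request


def build_requests(host, index, analysis):
    """Build (but do not send) the close, update, and reopen requests."""
    body = json.dumps({"analysis": analysis}).encode("utf-8")
    base = f"{host.rstrip('/')}/{index}"
    return [
        urllib.request.Request(f"{base}/_close", method="POST"),
        urllib.request.Request(
            f"{base}/_settings",
            data=body,
            headers={"Content-Type": "application/json"},
            method="PUT",
        ),
        urllib.request.Request(f"{base}/_open", method="POST"),
    ]


analysis = {
    "filter": {
        "synonym": {"type": "synonym", "lenient": True, "synonyms": ["bakso, baso"]}
    },
    "analyzer": {
        # Hypothetical analyzer name.
        "synonym_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "synonym"],
        }
    },
}

requests = build_requests("http://localhost:9200", "application", analysis)
for request in requests:
    print(request.get_method(), request.full_url)

# To send them for real, pass each request to urllib.request.urlopen(request).
```

Documents that were indexed before the change are not re-analyzed automatically, so they may need to be reindexed if the synonym filter is applied at index time.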

4. Done!

Now you can search for an item using synonyms of its keyword! Note that these steps are only a simple example; if you want more details about this feature, you can read the official Elasticsearch documentation. Happy Coding! 🖖
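To check the result, two request bodies are useful: one for the _analyze API, which returns the tokens an analyzer produces for a piece of text, and one for a regular match query. The sketch below only builds the bodies; the analyzer name synonym_analyzer and the field name name are hypothetical.

```python
import json

# Body for POST /application/_analyze: shows which tokens "baso" becomes.
analyze_body = {"analyzer": "synonym_analyzer", "text": "baso"}

# Body for POST /application/_search: a match query on a hypothetical field.
search_body = {"query": {"match": {"name": "baso"}}}

print(json.dumps(analyze_body))
print(json.dumps(search_body))
```

If the synonym filter is active, the _analyze response should list bakso among the tokens for baso.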
