Spring Boot Autocomplete with Elasticsearch

Milos Biljanovic
Jan 5 · 4 min read

There are a few ways to add an autocomplete feature to your Spring Boot application with Elasticsearch:

  1. Using a wildcard search
  2. Using a custom analyzer with ngrams
  3. Elasticsearch Completion Suggester

We are going to focus on the custom analyzer with ngrams. It sounds complex, but it actually isn’t. Let’s get started!


Use case

We want to create a simple REST API for searching our list of users, which are stored in Elasticsearch. There will be one GET endpoint that accepts the search input describing who we are looking for. Let’s say we are interested in searching users by country.

We want our search to support the following queries:

  1. Complete words: Bahamas/bahamas
    Looking for users from Bahamas.
  2. Partial words: baham, bah
    Looking for users from Bahamas.
  3. Multiple complete words: bahamas belize
    Looking for users from Bahamas or Belize.
  4. Multiple partial words: baham beliz
    Looking for users from Bahamas or Belize.
  5. Mix of partial and complete words: trin and toba
    Looking for users from Trinidad and Tobago.

Basic setup: Spring Boot with Elasticsearch

Two things are needed. First we need to start Elasticsearch, and second we need to implement the search in our Spring Boot application.

Startup Elasticsearch

Version Support

Startup Spring Boot Application

Clone this git repo (check out the branch master-prefix-phrase-match), and open the project in your favourite IDE. When you first start the application, users from the sample data will be added to Elasticsearch.

You can check the list of users added with this command:

curl localhost:8080/users

Now let us search. Below is the core of our search logic, which uses a phrase prefix query.

Match by prefix phrase

An example of how phrase prefix works:
Keywords: “puerto r”
It treats “puerto” as an exact word that needs to be in the country name, and “r” as a prefix for the word that follows “puerto”. This will match “Puerto Rico”.
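
To make the rule above concrete, here is a minimal plain-Java sketch that imitates it on in-memory strings. It is only an illustration, not the code from the repo and not how Elasticsearch implements the query; the class and method names (PhrasePrefixSketch, matchesPhrasePrefix) are made up for this example, and tokenization is simplified to lowercasing and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of phrase prefix matching; not Elasticsearch code.
class PhrasePrefixSketch {

    // Simplified analysis (assumption): lowercase, then split on whitespace.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // Every keyword except the last must equal a field token, in order and adjacent;
    // the last keyword only has to be a prefix of the token that follows.
    static boolean matchesPhrasePrefix(String fieldValue, String keywords) {
        List<String> field = tokens(fieldValue);
        List<String> query = tokens(keywords);
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start + query.size() <= field.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                String fieldToken = field.get(start + i);
                String keyword = query.get(i);
                boolean isLast = i == query.size() - 1;
                ok = isLast ? fieldToken.startsWith(keyword) : fieldToken.equals(keyword);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrasePrefix("Puerto Rico", "puerto r"));
        System.out.println(matchesPhrasePrefix("Trinidad and Tobago", "trin and toba"));
    }
}
```

In this sketch “puerto r” matches “Puerto Rico”, while “trin and toba” does not match “Trinidad and Tobago”, because “trin” is not a complete word of the country name.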

Let’s try the following searches:

curl localhost:8080/users/search?keywords=bahamas
curl localhost:8080/users/search?keywords=baham

Great, this will return users that are from Bahamas. Our first implementation covers requirements 1 and 2, but it fails for the rest because of how phrase prefix matching works: every word except the last one has to match exactly and in order.

We could potentially solve this with wildcard searches, but that would hurt performance and bypass the core of Elasticsearch, which is its inverted index. So in the next section we will go through how Elasticsearch indexes and searches data, and how we can use this in our Spring Boot application to get a more flexible search.

Elasticsearch and Custom Analyzers

Analyzers are applied to data as it is added to Elasticsearch, and they can also be applied to the search input used to query that data.

An analyzer has three parts:

  1. Character filters
    Here we can strip, remove or change input data. A basic example is the html_strip filter, which removes HTML tags.
  2. Tokenizer
    Here we can break input data into simple tokens. By default, the standard tokenizer is used.
    Example:
    Input data: “fox in a forest”
    Tokens: [fox,in,a,forest]
  3. Token filters
    Here we can add, modify or remove tokens produced by the previous step. A basic example is the lowercase token filter, which turns all tokens to lower case.
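
The three stages can be imitated in a few lines of plain Java. This is only a rough sketch to show the order of the steps; the class AnalyzerSketch and its methods are invented for illustration, and each stage is a much simpler stand-in for the real html_strip filter, standard tokenizer and lowercase filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical imitation of the three analyzer stages; not Elasticsearch code.
class AnalyzerSketch {

    // Stage 1, character filter: crude stand-in for html_strip,
    // replaces anything shaped like a tag with a space.
    static String stripHtml(String input) {
        return input.replaceAll("<[^>]*>", " ");
    }

    // Stage 2, tokenizer: split on runs of characters that are not letters or digits.
    // The real standard tokenizer follows Unicode segmentation rules and is more involved.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stage 3, token filter: turn every token to lower case.
    static List<String> lowercase(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // The stages always run in this order: character filters, tokenizer, token filters.
    static List<String> analyze(String input) {
        return lowercase(tokenize(stripHtml(input)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<b>Fox</b> in a forest")); // prints [fox, in, a, forest]
    }
}
```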

For our autocomplete we will create a custom analyzer that uses the edge_ngram token filter to create additional tokens that will match our keywords. This analyzer will be used when data is added to Elasticsearch (index time).

"autocomplete_filter": {
  "type": "edge_ngram",
  "min_gram": 1,
  "max_gram": 20
}

Example of how edge_ngram works:
Input token: bahamas
Output tokens: [b, ba, bah, baha, baham, bahama, bahamas]
It creates prefixes with min and max length specified.
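
The same prefix expansion can be written as a short plain-Java sketch. The class and method names (EdgeNgramSketch, edgeNgrams) are hypothetical, and the sketch assumes minGram is at least 1 and counts Java chars rather than Unicode code points, which is fine for the country names used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of what an edge n-gram filter produces for one token.
class EdgeNgramSketch {

    // Returns the prefixes of token whose length lies between minGram and maxGram.
    // Assumes minGram >= 1; a token shorter than minGram yields no prefixes.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int length = minGram; length <= longest; length++) {
            grams.add(token.substring(0, length));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [b, ba, bah, baha, baham, bahama, bahamas]
        System.out.println(edgeNgrams("bahamas", 1, 20));
    }
}
```

With min_gram set to 1, even a single typed letter produces matches. A higher minimum would shrink the index but delay the first suggestions.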

Use Custom Analyzer with Ngrams

The code that uses the custom analyzer is on the master branch. Below are the necessary changes from the previous solution.

  • Create a custom analyzer and set it to be used for the country field in User

Analyzer configuration

Custom Edge Ngram Analyzer

User class, using the new configuration with the @Setting and @Field annotations

User

  • Modify the search to use a match query instead of the phrase prefix query

Search

An example of how match query works:
Keywords: “puerto baham”
It will look for countries that have “puerto” or “baham” in their name, so it will return users from Puerto Rico and Bahamas, which is exactly what we want.
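
To see why this works together with the ngram analyzer, here is a small plain-Java sketch of the idea. It is an in-memory imitation, not the Spring Data code from the repo: the names NgramMatchSketch, indexTerms and search are invented, the country list is sample data for the example, and scoring is ignored, so a country either matches or it does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory imitation of a match query against an edge n-gram index.
class NgramMatchSketch {

    // Simplified analysis (assumption): lowercase, then split on whitespace.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // Index time: every token is expanded into its prefixes (min_gram 1, max_gram 20).
    static Set<String> indexTerms(String fieldValue) {
        Set<String> terms = new HashSet<>();
        for (String token : tokens(fieldValue)) {
            int longest = Math.min(20, token.length());
            for (int length = 1; length <= longest; length++) {
                terms.add(token.substring(0, length));
            }
        }
        return terms;
    }

    // Search time: keywords are only tokenized, not expanded, and a country
    // matches if at least one keyword equals one of its indexed terms (OR).
    static List<String> search(List<String> countries, String keywords) {
        List<String> hits = new ArrayList<>();
        List<String> query = tokens(keywords);
        for (String country : countries) {
            Set<String> terms = indexTerms(country);
            for (String keyword : query) {
                if (terms.contains(keyword)) {
                    hits.add(country);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> countries = List.of("Bahamas", "Belize", "Puerto Rico", "Trinidad and Tobago");
        System.out.println(search(countries, "puerto baham")); // prints [Bahamas, Puerto Rico]
        System.out.println(search(countries, "trin and toba")); // prints [Trinidad and Tobago]
    }
}
```

Because the prefixes are already stored as terms, each keyword becomes a plain exact lookup, which is the kind of lookup the inverted index is built for.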

  • Remove the old index from Elasticsearch
curl -X DELETE localhost:9200/users

Now we can start the Spring Boot application and test our new search.

curl localhost:8080/users/search?keywords=trin%20and%20toba

Great, this now returns a user from Trinidad and Tobago.
Another example for multiple countries:

curl localhost:8080/users/search?keywords=bel%20bahamas

It returns users from Belize and Bahamas. With this we’ve covered all requirements.

Conclusion

With just a few lines of code, we have added a cool autocomplete feature to our Spring Boot application using Spring Data Elasticsearch. Try it out yourself; this project can serve as a playground for testing and adding other interesting features.

Happy coding!

Thanks to Oskar

