Elasticsearch Highlighting with Kotlin
This week, I had to add highlighting to the search field of my project. Currently, clicking a search result on the frontend opens the corresponding documentation page. These documents are really long, so it takes a long time to find the section you are looking for. To solve this, I needed to highlight the best matches in the documentation so that you can see them at first glance.
Highlighting with KT-Search
KT-Search is a great Kotlin client for Elasticsearch. Although it is not an official client, it provides most features of the official Java client. Highlighting, sadly, is not supported by kt-search. Therefore, we need to write the functionality ourselves.
The end result of this tutorial will be an Elasticsearch hit that contains information about the highlighted terms.
The solution to this problem is structured in three parts:
- Creating the Query
- Receiving the Response
- Deserializing the SearchResponse
Creating the Query
First of all, we need a way to send a request to Elasticsearch that contains the query with highlighting. Since kt-search has no dedicated highlighting support, we have to use its rawBody feature.
Since this query can get pretty long, I created a separate function for it.
Please keep an eye on the lower part with the highlight block:
- The pre- and post-tags wrap around the matching terms. If you don't define them, Elasticsearch uses <em> and </em>.
- The fields block defines which field the best matches should be returned from.
- ElasticDocument is a data class with a property called content. I reference the property instead of hard-coding its name, because I can then refactor via IntelliJ and don't have to change the string manually every time I rename the attribute.
- fragment_size defines the maximum length of the fragments in characters. A fragment is the matched term together with some context around it.
- number_of_fragments defines the maximum number of fragments that are returned for the field.
- type selects which highlighter Elasticsearch uses (unified, plain, or fvh); choose it based on the trade-offs described in the Elasticsearch highlighting documentation.
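To make the options above concrete, here is a minimal sketch of such a query function. It is not the original code: the function name highlightQuery, the helper escapeJson, the match query, the tag names, and the concrete values for fragment_size, number_of_fragments, and type are assumptions chosen for illustration. Only ElasticDocument and its content property come from the text.

```kotlin
// Stand-in for the article's ElasticDocument; only the content property is taken from the text.
data class ElasticDocument(val content: String)

/** Escapes backslashes and double quotes so the term cannot break the JSON string literal. */
fun escapeJson(value: String): String =
    value.replace("\\", "\\\\").replace("\"", "\\\"")

/**
 * Builds a match query with a highlight block as a raw JSON string.
 * Tag names, fragment settings and the highlighter type are illustrative values.
 */
fun highlightQuery(searchTerm: String): String {
    // Resolved from the property reference, so an IDE rename keeps the query in sync.
    val field = ElasticDocument::content.name
    return """
        {
          "query": {
            "match": { "$field": "${escapeJson(searchTerm)}" }
          },
          "highlight": {
            "pre_tags": ["<highlighted>"],
            "post_tags": ["</highlighted>"],
            "fields": {
              "$field": {
                "fragment_size": 150,
                "number_of_fragments": 3,
                "type": "unified"
              }
            }
          }
        }
    """.trimIndent()
}
```

The escaping helper only covers backslashes and double quotes. For search terms that may contain control characters, building the body with a JSON library would be the safer choice.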
Receiving the Response
Now that we have the query for highlighting, we could just use the search() function of the library, right? No.
The SearchResponse of the search function does not contain highlighting. This means that we have to do it ourselves.
This function only returns the JSON as a String. It uses the restClient of the SearchClient to create a new POST request. The path is composed of the name of the index and the function that we want to execute on Elasticsearch. In our case, that is _search. rawBody takes the JSON string that we defined in the query function above.
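The shape of that request can be illustrated without kt-search. The sketch below deliberately uses the JDK's java.net.http.HttpClient instead of the restClient described above, so it shows the same idea (POST the raw JSON to the index's _search endpoint and return the body as a plain String) rather than the original implementation. The function names, the base URL, and the index name are hypothetical.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

/** Builds a POST request against <baseUrl>/<index>/_search with the raw JSON query as body. */
fun buildSearchRequest(baseUrl: String, index: String, queryJson: String): HttpRequest =
    HttpRequest.newBuilder(URI.create("${baseUrl.trimEnd('/')}/$index/_search"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(queryJson))
        .build()

/** Sends the request and returns the untouched response body, highlight section included. */
fun searchRaw(baseUrl: String, index: String, queryJson: String): String {
    val response = HttpClient.newHttpClient()
        .send(buildSearchRequest(baseUrl, index, queryJson), HttpResponse.BodyHandlers.ofString())
    // Fail loudly instead of handing an error payload to the deserializer.
    check(response.statusCode() in 200..299) { "Search failed with HTTP ${response.statusCode()}" }
    return response.body()
}
```

Returning the raw String keeps the highlight section of the response intact, which is exactly the part that gets lost when the body is parsed into a response type that has no field for it.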
Deserializing the SearchResponse
Lastly, we need to deserialize the String with the search results into a list of hits. Normally this would be handled by kt-search, but since the default SearchResponse of kt-search doesn't contain the highlighting info, we need to write the deserialization ourselves.
These are the Serializable classes that are needed for the deserialization.
Note that the Hit class filters out the highlighted terms, so that you don't have the <highlighted> wrappers in your search results.
Now we can combine all these steps into our own search function, so that we can call it just like we would normally do.
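One piece of that combined function is the filtering step in the Hit class mentioned above. A minimal sketch of it, using only the Kotlin standard library, could look like this. The function names are hypothetical, and the tag names are assumed to match the pre_tags and post_tags sent in the query.

```kotlin
// Assumed tag names; they must match the pre_tags/post_tags used in the query.
// DOT_MATCHES_ALL lets a highlighted span contain line breaks.
private val HIGHLIGHT_PATTERN =
    Regex("<highlighted>(.*?)</highlighted>", RegexOption.DOT_MATCHES_ALL)

/** Returns the terms that Elasticsearch wrapped in highlight tags, in order of appearance. */
fun highlightedTerms(fragment: String): List<String> =
    HIGHLIGHT_PATTERN.findAll(fragment).map { it.groupValues[1] }.toList()

/** Returns the fragment with the wrapper tags removed but the matched text kept. */
fun stripHighlightTags(fragment: String): String =
    HIGHLIGHT_PATTERN.replace(fragment) { it.groupValues[1] }
```

For the fragment "Use <highlighted>rawBody</highlighted> here", highlightedTerms returns a list with the single entry rawBody, and stripHighlightTags returns "Use rawBody here".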
Since we annotated all the classes from above with @Serializable, we can now call the decodeFromString method of the DEFAULT_JSON object provided by kt-search.
This gives us back the SearchResponse with its nested hits. We can access them by calling
Finally, we have hits that contain highlighting information.
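As a rough illustration of the deserialization step, the sketch below models only the parts of the Elasticsearch response that matter for highlighting. The class and function names are hypothetical and not the original ones, and a plain kotlinx.serialization Json instance stands in for the DEFAULT_JSON object of kt-search. It assumes the kotlinx.serialization compiler plugin and the kotlinx-serialization-json library are on the classpath.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonObject

@Serializable
data class HighlightSearchResponse(val hits: HitsContainer)

@Serializable
data class HitsContainer(val hits: List<HighlightHit> = emptyList())

@Serializable
data class HighlightHit(
    @SerialName("_id") val id: String,
    @SerialName("_score") val score: Double? = null,
    @SerialName("_source") val source: JsonObject? = null,
    // Field name -> fragments, still containing the pre/post tags from the query.
    val highlight: Map<String, List<String>> = emptyMap(),
)

// Fields that are not modelled above (took, _index, total, ...) are skipped.
private val lenientJson = Json { ignoreUnknownKeys = true }

/** Decodes the raw JSON returned by the search request into typed hits. */
fun parseHighlightResponse(raw: String): HighlightSearchResponse =
    lenientJson.decodeFromString(HighlightSearchResponse.serializer(), raw)
```

With these classes, parseHighlightResponse(raw).hits.hits yields the list of hits, and each hit exposes its fragments per field through the highlight map.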
What went well
In my opinion, the actual implementation of the hand-written logic to retrieve highlighting information went pretty well. I did not have many problems, since I could test the behaviour with unit tests.
What needs improvement
The biggest problem I had was understanding that the SearchResponse of kt-search really does not contain any information about highlighting. I was confused because I sent a query with a highlighting request but didn't get any highlighting back. I finally understood it when I sent the same request via curl and saw that the problem had to do with the SearchResponse of kt-search. Next time, I would read the JavaDoc of the response object first, so that I can see at first glance that the field I'm looking for is not contained in it.