Elasticsearch Highlighter

Rachadian Novansyah
Published in JDS Engineering
4 min read · Mar 18, 2023

Elasticsearch Highlighter is a feature of Elasticsearch, an open-source search engine that allows users to search and analyze data quickly and easily. A highlighter is a tool that highlights the matching search terms in the search results, making it easier for users to identify relevant information.

This post is part of my series of articles about Elasticsearch. In one of the earlier articles, I promised to share my experience implementing one of Elasticsearch’s features in a product that I built. Qadarullah, here I want to share how I used the Elasticsearch Highlighter in that product.

The Elasticsearch Highlighter works by analyzing the content of the matching documents and marking the matching terms in the returned results. It offers three highlighter types, selected with the "type" option: unified (the default), plain, and fvh (the fast vector highlighter). The plain highlighter is based on the standard Lucene highlighter and re-analyzes the field text at query time, so it suits small fields and simple queries. In every case the highlighted text comes back as fragments: by default, only the pieces of the field around the matching terms are returned, not the entire field.
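To make the "type" option more concrete, here is a minimal Python sketch that builds the "highlight" part of a search body for one field. The helper name `build_highlight` is hypothetical and only for illustration; the `type`, `fragment_size`, and `number_of_fragments` keys are Elasticsearch highlight options, and the default values shown (100 characters, 5 fragments) are the documented defaults.

```python
import json

# Highlighter types accepted by the "type" option of a highlighted field.
HIGHLIGHTER_TYPES = {"unified", "plain", "fvh"}


def build_highlight(field, highlighter_type="unified",
                    fragment_size=100, number_of_fragments=5):
    """Build the "highlight" section for one field (hypothetical helper)."""
    if highlighter_type not in HIGHLIGHTER_TYPES:
        raise ValueError(f"unknown highlighter type: {highlighter_type!r}")
    return {
        "fields": {
            field: {
                "type": highlighter_type,
                "fragment_size": fragment_size,
                "number_of_fragments": number_of_fragments,
            }
        }
    }


if __name__ == "__main__":
    # Same settings as the demo in this article: plain highlighter, one 150-character fragment.
    print(json.dumps(build_highlight("content", "plain", 150, 1), indent=2))
```

Rejecting unknown type names early is a design choice of this sketch, so a typo in the type fails before the request is sent.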

For example, I implemented the highlighter in the Global Search feature of the Portal Jabarprov website. If you visit it and run a search, every keyword you searched for appears in bold (strong) styling in the results. That styling comes from the highlight that Elasticsearch returns with the search results.

For the demo I use the Dev Tools console in Kibana. Here’s how we can use the highlighter to get this information:

GET ipj-content-staging/_search
{
  "from": 0,
  "size": 1,
  "track_scores": true,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "is_active": true
          }
        },
        {
          "multi_match": {
            "query": "jawa barat",
            "fields": [
              "title",
              "content"
            ],
            "fuzziness": "AUTO"
          }
        }
      ]
    }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {
      "content": {
        "fragment_size": 150,
        "number_of_fragments": 1,
        "type": "plain"
      }
    }
  }
}
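Outside of Dev Tools, the same request can be sent from application code. The following is a rough Python sketch using only the standard library, not the code used in the product. The function names `build_search_body` and `search`, the 1-based `page` argument, and the `http://localhost:9200` base URL are assumptions made for illustration; the index name and the body come from the request above.

```python
import json
import urllib.request

INDEX = "ipj-content-staging"  # index name taken from the request above


def build_search_body(keyword, page=1, size=1):
    """Return the search body shown above for a keyword and a 1-based page."""
    return {
        "from": (page - 1) * size,  # offset of the first hit to return
        "size": size,
        "track_scores": True,
        "query": {
            "bool": {
                "must": [
                    {"match": {"is_active": True}},
                    {
                        "multi_match": {
                            "query": keyword,
                            "fields": ["title", "content"],
                            "fuzziness": "AUTO",
                        }
                    },
                ]
            }
        },
        "highlight": {
            "pre_tags": ["<strong>"],
            "post_tags": ["</strong>"],
            "fields": {
                "content": {
                    "fragment_size": 150,
                    "number_of_fragments": 1,
                    "type": "plain",
                }
            },
        },
    }


def search(keyword, base_url="http://localhost:9200", page=1, size=1):
    """POST the body to the _search endpoint (base_url is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/{INDEX}/_search",
        data=json.dumps(build_search_body(keyword, page, size)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search("jawa barat")` against a reachable cluster would send the same body as the Dev Tools request. Keeping the body builder separate from the HTTP call makes it easy to inspect or test the body without a running cluster.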

This query uses a bool query that combines a match clause on the “is_active” field with a multi_match clause that searches for “jawa barat” in the “title” and “content” fields. The highlight parameter asks Elasticsearch to highlight the matching text in the results.

Here’s an explanation of each part of the search request:

  1. "from": 0 - This parameter specifies the offset of the first result to return, that is, how many results to skip. The number 0 means there is no offset, so the results start from the first hit.
  2. "size": 1 - This parameter specifies the maximum number of search results to return. In this case, only one result is requested.
  3. "track_scores": true - This parameter instructs Elasticsearch to compute and track the score of each document even when the results are sorted by a field. This request has no "sort", so the results are already ordered by score and the parameter has no visible effect here.
  4. "query" - This section specifies the search criteria. In this example, the search criteria are constructed using a bool query with two clauses under "must", so a document has to satisfy both:

"match": {"is_active": true} - This clause limits the search to documents whose "is_active" field equals true.

"multi_match" - This clause searches for the words "jawa" or "barat" in the "title" or "content" field of the document. Fuzziness is set to "AUTO", which lets a term match despite small typos: the number of edits allowed depends on the length of each search term.

  5. "highlight" - This section specifies how the matching text is marked in the results. In this example, the highlight option is used to return text snippets that match the search criteria. The pre-tag <strong> and post-tag </strong> wrap each matching term. The field to be highlighted is listed inside "fields". "fragment_size" specifies the size of each fragment in characters, while "number_of_fragments" specifies the maximum number of fragments to return. "type" is set to "plain" to select the plain highlighter, which is based on the standard Lucene highlighter.
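The effect of "fuzziness": "AUTO" can be sketched in a few lines of Python. This is only an illustration of the documented default rule (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two edits), not Elasticsearch's own code. The function name `auto_fuzziness_edits` is hypothetical.

```python
def auto_fuzziness_edits(term):
    """Maximum edit distance allowed for a search term under the default AUTO rule."""
    length = len(term)
    if length <= 2:
        return 0  # very short terms must match exactly
    if length <= 5:
        return 1  # one edit (insert, delete, substitute, or transpose) allowed
    return 2      # longer terms tolerate two edits


for term in "jawa barat".split():
    print(term, auto_fuzziness_edits(term))
# prints:
# jawa 1
# barat 1
```

Both words in the demo query are between 3 and 5 characters long, so each of them may differ from an indexed term by one edit and still match.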

That is enough explanation of the query. It returns a response like this:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {..},
  "hits" : {
    "total" : {..},
    "max_score" : 9.67622,
    "hits" : [
      {
        "_index" : "ipj-content-staging",
        "_type" : "_doc",
        "_id" : "853e1d90-4fca-48d0-a0b0-825037448b33",
        "_score" : 9.67622,
        "_source" : {...},
        "highlight" : {
          "content" : [
            "Aplikasi layanan mobile berbasis Android yang memberikan kemudahan bagi masyarakat untuk mengakses berbagai informasi dan layanan SAMSAT di <strong>Jawa</strong> <strong>Barat</strong> secara online"
          ]
        }
      }
    ]
  }
}

Here’s an explanation of each part of the result of the highlighter query example.

  1. "took": 12 - This parameter indicates the time taken by Elasticsearch to process the search request in milliseconds.
  2. "timed_out": false - This parameter indicates whether the search request timed out before it could complete. In this case, the search did not time out.
  3. "_shards": {...} - This parameter reports how many shards Elasticsearch searched for this request, and how many of them succeeded or failed.
  4. "max_score": 9.67622 - This parameter indicates the highest score of all the documents that match the search criteria.
  5. "hits": {...} - This object holds the search results: the "max_score" and an inner "hits" array with the matching documents. In this case, the array contains only one result because the "size" parameter is set to 1.
  6. "_index": "ipj-content-staging" - This parameter indicates the name of the index that was searched.
  7. "_type": "_doc" - This parameter indicates the mapping type of the matched document. Mapping types are deprecated since Elasticsearch 7, and "_doc" is the default value.
  8. "_id": "853e1d90-4fca-48d0-a0b0-825037448b33" - This parameter indicates the unique identifier of the matched document.
  9. "_score": 9.67622 - This parameter indicates the score of the document that matches the search criteria.
  10. "_source": {...} - This parameter contains the actual document that matches the search criteria.
  11. "highlight": {...} - This parameter contains the highlighted snippets of text that match the search criteria. In this case, only the "content" field appears, because it is the only field listed under "fields" in the highlight request, and its snippet is displayed with the highlighted text "Jawa" and "Barat".
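To show how these response fields might be consumed in application code, here is a small Python sketch that collects the highlighted fragments per document. The function name `extract_snippets` is hypothetical, and `sample_response` is a trimmed copy of the response above (the fragment text is shortened), not real additional data.

```python
def extract_snippets(response, field="content"):
    """Return (document id, highlighted fragments) pairs, one per hit."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        # A hit has no "highlight" entry for a field without highlighted matches.
        fragments = hit.get("highlight", {}).get(field, [])
        results.append((hit["_id"], fragments))
    return results


# Trimmed version of the response shown in this article.
sample_response = {
    "took": 12,
    "timed_out": False,
    "hits": {
        "max_score": 9.67622,
        "hits": [
            {
                "_index": "ipj-content-staging",
                "_id": "853e1d90-4fca-48d0-a0b0-825037448b33",
                "_score": 9.67622,
                "highlight": {
                    "content": [
                        "SAMSAT di <strong>Jawa</strong> <strong>Barat</strong> secara online"
                    ]
                },
            }
        ],
    },
}

for doc_id, fragments in extract_snippets(sample_response):
    print(doc_id, fragments)
```

Returning an empty list when a hit has no highlight for the field lets the caller fall back to other text, for example a field from "_source".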

Conclusion

Elasticsearch Highlighter is a useful feature that helps users quickly identify relevant information in search results by highlighting matching search terms. It offers several highlighter types (unified, plain, and fvh) and options such as custom tags and fragment size, making it a powerful tool for searching and analyzing data in Elasticsearch.

To close, I just want to say thank you for reading my article. Hopefully, this is useful. Happy Coding!

