How to improve your full-text search in Elasticsearch with the NGram Tokenizer

Ricardo Heck
Published in Dev Ticks
May 23, 2017 · 2 min read

A frequent problem I face when developing search features in Elasticsearch is how to find documents by pieces of a word, as in a suggestion feature, for example.

In this article, I will show you how to improve full-text search using the NGram Tokenizer. I will assume that you already have some knowledge of Elasticsearch, as well as an environment configured with some indexed documents containing a title field, which we will use to perform the search query.

To set up the NGram Tokenizer, we declare it as follows:
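The original code snippet is missing here; the following is a sketch of what the index settings could look like, based on the article's description (a custom analyzer named "my_analyzer" with a 3-gram tokenizer). The index name `my_index`, the tokenizer name `my_tokenizer`, and the `lowercase` filter are my assumptions, not from the original:

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}
```

Setting `token_chars` to letters and digits keeps whitespace out of the grams, so each word is split into trigrams on its own.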

Here "my_analyzer" is the name of the analyzer that we will use for the ngram field. The mapping configuration should look like the following:
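The original mapping snippet is also missing; a plausible reconstruction, using the multi-field syntax of recent Elasticsearch versions (older versions would wrap this in a mapping type), could be:

```json
PUT my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "standard",
      "fields": {
        "ngram": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
```

This gives us `title` (standard analyzer) and `title.ngram` (our custom analyzer) over the same stored value.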

As you can see above, we now have two fields for the title. The first one uses the standard analyzer, without any customization, and the second one, called "ngram", uses our newly created analyzer with the NGram Tokenizer enabled.

The field "ngram" will index the title word(s) in the following way:

String: My searched title
Ngram: [sea, ear, arc, rch, che, hed, tit, itl, tle]

Note that we configured our tokenizer with a minimum of 3 grams, so it does not generate any token for the two-letter word "My".
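You can check this tokenization yourself with the `_analyze` API (assuming the analyzer is registered on the index as `my_analyzer`, as sketched above):

```json
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My searched title"
}
```

The response lists one token per trigram, matching the list shown above.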

This way, we can execute the following search query:
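The query itself is missing from the original; one way to express what the article describes (an exact match on `title` boosted by 5, combined with a match on `title.ngram`) is a `bool` query with two `should` clauses:

```json
GET my_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "Document", "boost": 5 } } },
        { "match": { "title.ngram": { "query": "Document" } } }
      ]
    }
  }
}
```

Documents matching the exact word score highest, while ngram-only matches still appear further down the result list.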

In the query above, results containing exactly the word "Document" receive a boost of 5, while documents that contain only fragments of the word are still returned, with a lower score. Below is an example with three indexed entries:
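The original screenshot of the three entries is missing; an illustrative reconstruction consistent with the discussion that follows (the first title is "Document 1" and the last contains "Documentary"; the second title is hypothetical) might be:

```json
PUT my_index/_doc/1
{ "title": "Document 1" }

PUT my_index/_doc/2
{ "title": "My Document with a longer title" }

PUT my_index/_doc/3
{ "title": "Documentary 3" }
```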

The results are ordered by score. The document with the title "Document 1" has the highest score because the searched word makes up almost the whole title. The other two documents have lower scores, and the last one does not contain the exact word "Document" but rather "Documentary"; in this case, the "title.ngram" field matches it through the fragments and returns it as well.

In summary, this article has shown how the NGram Tokenizer can improve free-text search mechanisms, where the goal is to find matches even when the user has not typed exactly the indexed word.
