Create a simple “did you mean” with Elasticsearch

André Coelho
Apr 26, 2022

“Did you mean” is an important feature in search engines: it helps users by displaying a suggested term so that they can run a more accurate search.

To create a “did you mean”, we are going to use the Phrase suggester, because it can suggest corrections for whole phrases and not just individual terms.

First of all, we will use a shingle filter. It produces word n-grams (shingles) that the Phrase suggester uses to score candidate phrases and return the correction.

This is the mapping:

PUT idx_did_you_mean
{
  "settings": {
    "analysis": {
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text"
      },
      "title": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "text",
            "analyzer": "shingle_analyzer"
          }
        }
      },
      "year_release": {
        "type": "long"
      }
    }
  }
}
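
To get a feel for what the title.suggest field ends up holding, here is a rough Python sketch of the tokens a lowercase + shingle chain with these sizes would emit. It is only an approximation: the helper name shingles is hypothetical, splitting on whitespace is a crude stand-in for the standard tokenizer, and the token order is my assumption. It does rely on the shingle filter keeping single words by default (output_unigrams is true unless disabled).

```python
def shingles(text, min_size=2, max_size=3, output_unigrams=True):
    """Approximate the tokens a lowercase + shingle filter chain would emit."""
    words = text.lower().split()  # crude stand-in for the standard tokenizer
    tokens = []
    for start in range(len(words)):
        if output_unigrams:
            tokens.append(words[start])  # single words are kept by default
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


print(shingles("Revenge of the Fallen"))
```

The two-word and three-word tokens are what give the Phrase suggester context about which words tend to appear together.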

Now let’s run a basic query to see the suggestions. The examples assume that the index already contains some movie titles.

GET idx_did_you_mean/_search
{
  "suggest": {
    "text": "transformers revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5
      }
    }
  }
}

Results:

"suggest" : {
  "did_you_mean" : [
    {
      "text" : "transformers revenge of the falen",
      "offset" : 0,
      "length" : 33,
      "options" : [
        {
          "text" : "transformers revenge of the fallen",
          "score" : 0.004467494
        },
        {
          "text" : "transformers revenge of the fall",
          "score" : 2.0402104E-4
        },
        {
          "text" : "transformers revenge of the face",
          "score" : 6.419608E-5
        }
      ]
    }
  ]
}

Note how in a few lines you already have some promising results.
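
In an application you would typically read the top option out of this response and show it as the “did you mean” text. The sketch below does that in Python on a copy of the response above. The function name best_suggestion is hypothetical, and I am assuming the fragment sits under the top-level "suggest" key of the search response.

```python
import json

# Copy of the "suggest" section from the response above, wrapped in an object.
response = json.loads("""
{
  "suggest": {
    "did_you_mean": [
      {
        "text": "transformers revenge of the falen",
        "offset": 0,
        "length": 33,
        "options": [
          {"text": "transformers revenge of the fallen", "score": 0.004467494},
          {"text": "transformers revenge of the fall", "score": 2.0402104E-4},
          {"text": "transformers revenge of the face", "score": 6.419608E-5}
        ]
      }
    ]
  }
}
""")


def best_suggestion(resp, name="did_you_mean"):
    """Return the text of the highest-scoring option, or None if there is none."""
    entries = resp.get("suggest", {}).get(name, [])
    options = [opt for entry in entries for opt in entry.get("options", [])]
    if not options:
        return None
    return max(options, key=lambda opt: opt["score"])["text"]


print(best_suggestion(response))
```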

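Now let’s extend our query with more Phrase suggester features.
We set max_errors to 2, so that at most two terms in the phrase may be treated as misspellings and corrected. The query also sets confidence to 1 (the default), so only suggestions that score higher than the input phrase are returned. Finally, we add highlight to mark the terms that were changed in each suggestion.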

GET idx_did_you_mean/_search
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}

Results:

"suggest" : {
  "did_you_mean" : [
    {
      "text" : "transformer revenge of the falen",
      "offset" : 0,
      "length" : 32,
      "options" : [
        {
          "text" : "transformers revenge of the fallen",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>fallen</strong>",
          "score" : 0.004382903
        },
        {
          "text" : "transformers revenge of the fall",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>fall</strong>",
          "score" : 2.0015794E-4
        },
        {
          "text" : "transformers revenge of the face",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>face</strong>",
          "score" : 6.298054E-5
        },
        {
          "text" : "transformers revenge of the falen",
          "highlighted" : "<strong>transformers</strong> revenge of the falen",
          "score" : 6.159308E-5
        },
        {
          "text" : "transformer revenge of the fallen",
          "highlighted" : "transformer revenge of the <strong>fallen</strong>",
          "score" : 4.8000533E-5
        }
      ]
    }
  ]
}
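
When the suggester is called from application code, the request body is usually assembled from whatever the user typed. A minimal Python sketch of that, assuming the same field and options as the query above; the function name build_did_you_mean and its defaults are hypothetical, and sending the body to the cluster is left out.

```python
import json


def build_did_you_mean(text, field="title.suggest", size=5,
                       confidence=1.0, max_errors=2):
    """Assemble a phrase-suggester request body like the one above."""
    return {
        "suggest": {
            "text": text,
            "did_you_mean": {
                "phrase": {
                    "field": field,
                    "size": size,
                    "confidence": confidence,
                    "max_errors": max_errors,
                    "highlight": {
                        "pre_tag": "<strong>",
                        "post_tag": "</strong>",
                    },
                }
            },
        }
    }


body = build_did_you_mean("transformer revenge of the falen")
print(json.dumps(body, indent=2))
```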

Shall we improve it a little more? We add “collate”, which runs a query for each suggestion and checks whether it matches any document, improving the quality of the suggestions. I used a match query with the “and” operator so that all terms of the suggestion must appear in the same title. If I still want the suggestions that did not meet the query criteria, I set prune to true.

GET idx_did_you_mean/_search
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "collate": {
          "query": {
            "source": {
              "match": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "operator": "and"
                }
              }
            }
          },
          "params": {"field_name": "title"},
          "prune": true
        },
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}
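
To make the collate step more concrete, the Python sketch below fills in the template for one candidate, showing roughly what query is checked for each suggestion. This is only an illustration: render_collate is a hypothetical helper that does naive placeholder replacement, not the Mustache engine Elasticsearch actually uses.

```python
import json

# The collate query from the request above, kept as a template string.
COLLATE_SOURCE = json.dumps({
    "match": {
        "{{field_name}}": {
            "query": "{{suggestion}}",
            "operator": "and"
        }
    }
})


def render_collate(source, suggestion, params):
    """Naive {{name}} substitution; a stand-in for real Mustache rendering."""
    values = dict(params, suggestion=suggestion)
    for key, value in values.items():
        # json.dumps(...)[1:-1] escapes the value without adding outer quotes
        source = source.replace("{{" + key + "}}", json.dumps(value)[1:-1])
    return json.loads(source)


query = render_collate(
    COLLATE_SOURCE,
    "transformers revenge of the fallen",
    {"field_name": "title"},
)
print(json.dumps(query, indent=2))
```

A suggestion counts as a collate match when a query like this one finds at least one document.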

Results:

"suggest" : {
  "did_you_mean" : [
    {
      "text" : "transformer revenge of the falen",
      "offset" : 0,
      "length" : 32,
      "options" : [
        {
          "text" : "transformers revenge of the fallen",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>fallen</strong>",
          "score" : 0.004382903,
          "collate_match" : true
        },
        {
          "text" : "transformers revenge of the fall",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>fall</strong>",
          "score" : 2.0015794E-4,
          "collate_match" : false
        },
        {
          "text" : "transformers revenge of the face",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>face</strong>",
          "score" : 6.298054E-5,
          "collate_match" : false
        },
        {
          "text" : "transformers revenge of the falen",
          "highlighted" : "<strong>transformers</strong> revenge of the falen",
          "score" : 6.159308E-5,
          "collate_match" : false
        },
        {
          "text" : "transformer revenge of the fallen",
          "highlighted" : "transformer revenge of the <strong>fallen</strong>",
          "score" : 4.8000533E-5,
          "collate_match" : false
        }
      ]
    }
  ]
}

Note that the response has changed: each option now has a new field, “collate_match”, which indicates whether the collate query matched any document for that suggestion (the field appears because prune is true).
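
With prune set to true, deciding what to do with the unmatched suggestions is left to the client. A small Python sketch of that filtering, using the options from the response above; the helper name confirmed_suggestions is hypothetical.

```python
def confirmed_suggestions(options):
    """Keep only the options whose collate query matched a document."""
    return [opt["text"] for opt in options if opt.get("collate_match")]


# Options copied from the response above (highlighted field omitted).
options = [
    {"text": "transformers revenge of the fallen", "score": 0.004382903, "collate_match": True},
    {"text": "transformers revenge of the fall", "score": 2.0015794e-4, "collate_match": False},
    {"text": "transformers revenge of the face", "score": 6.298054e-5, "collate_match": False},
    {"text": "transformers revenge of the falen", "score": 6.159308e-5, "collate_match": False},
    {"text": "transformer revenge of the fallen", "score": 4.8000533e-5, "collate_match": False},
]

print(confirmed_suggestions(options))
```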

Now let’s set prune to false, which is the default.

"suggest" : {
  "did_you_mean" : [
    {
      "text" : "transformer revenge of the falen",
      "offset" : 0,
      "length" : 32,
      "options" : [
        {
          "text" : "transformers revenge of the fallen",
          "highlighted" : "<strong>transformers</strong> revenge of the <strong>fallen</strong>",
          "score" : 0.004382903
        }
      ]
    }
  ]
}

Now only the suggestion that matched the collate query is returned, and it is the most relevant one.

I hope you enjoyed it and that this post serves as a basis for your “did you mean” implementations.
See the Elasticsearch documentation to find out how to use the other features of the Phrase suggester.

References

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-suggesters.html#phrase-suggester

André Coelho

Developer of web and mobile systems. Enthusiast of automation and electronics, with music as a hobby.