A workaround to get term query results from a match query against an unmapped Elasticsearch index

We ran into a situation where the responses to our Elasticsearch queries against one particular instance seemed like nonsense.

It turned out that we had failed to apply a mapping to the index. As a result, Elasticsearch had run its standard analysis on every field of every document we added before indexing it (see “Why doesn’t the term query match my document?” in the Elasticsearch term query documentation).

The proper fix is to reindex: copy all the documents over to a new index that already has the mapping in place before any documents are added to it.

However, doing that properly required some architectural work and planning, so that nobody would have to ssh into each Elasticsearch instance and run a reindexing script by hand.

In the meantime, we needed a short-term way to get the right results from the existing index with the missing mapping.

The query at fault was a match query:

curl -XGET 'localhost:9200/the_index/_search?pretty' -H 'Content-Type: application/json' -d '{"from":0,"size":10,"query":{"match":{"the_field":{"query":"a-long-string-with-hypens","type":"boolean"}}},"sort":[{"creationTimestamp":{"order":"desc","unmapped_type":"long"}}]}'

The query was failing because the value we were searching for contained a lot of hyphens. Without a mapping telling Elasticsearch not to analyze the field, its default behavior is to split the string on the hyphens and store the resulting chunks as separate terms in the “inverted index”, rather than storing the value as a single term.
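To make the splitting concrete, here is a rough Python sketch of what the default analysis does to our value. The function `simple_analyze` is a hypothetical stand-in written for this post, not part of Elasticsearch. The real analyzer follows Unicode word-boundary rules, so treat this only as an approximation that happens to behave the same way for hyphenated ASCII strings.

```python
import re

def simple_analyze(text):
    """Approximate the default analysis: lowercase the text, then split it
    on every run of characters that is not a letter or a digit."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]

tokens = simple_analyze("a-long-string-with-hypens")
print(tokens)  # ['a', 'long', 'string', 'with', 'hypens']
```

Each of those five chunks ends up as its own term in the inverted index. The original hyphenated string is not stored as a term at all.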

This also meant we couldn’t simply switch to a term query. A term query does search for the entire, unanalyzed string of `the_field`, but the documents had already been stored as separate terms in the inverted index, so a lookup of the whole string would return no results.
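The following sketch shows why the exact lookup comes back empty. It is a toy model, not Elasticsearch code: `simple_analyze`, `build_inverted_index`, `term_lookup`, and the two sample documents are all made up for illustration, and the analysis is only an approximation of the default analyzer.

```python
import re
from collections import defaultdict

def simple_analyze(text):
    """Approximate the default analysis: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]

def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in simple_analyze(value):
            index[token].add(doc_id)
    return index

def term_lookup(index, term):
    """Model a term query: the input is NOT analyzed, it is looked up as-is."""
    return sorted(index.get(term, set()))

# Hypothetical documents, keyed by id, holding the value of the_field.
docs = {1: "a-long-string-with-hypens", 2: "another-long-string"}
index = build_inverted_index(docs)

print(term_lookup(index, "a-long-string-with-hypens"))  # []
print(term_lookup(index, "long"))                       # [1, 2]
```

The full string never made it into the index as a single term, so the exact lookup finds nothing, while a lookup of one chunk finds every document that shares that chunk.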

The solution was to use the minimum_should_match parameter. By default, a match query returns any document that contains at least one of the analyzed query terms, which would explain the nonsense results. By setting the value to 100% we could let Elasticsearch perform its default analysis on the query string but then require every document it returned to contain all of the resulting terms in the inverted index.
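Here is a toy Python model of the difference, under the same assumptions as before. `simple_analyze`, `build_inverted_index`, `match`, and the sample documents are hypothetical. The percentage handling is simplified to a rounded-down fraction of the query terms and ignores the other forms the real parameter accepts.

```python
import re
from collections import defaultdict

def simple_analyze(text):
    """Approximate the default analysis: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]

def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in simple_analyze(value):
            index[token].add(doc_id)
    return index

def match(index, query, minimum_should_match_pct=None):
    """Model a match query: analyze the query, then keep the documents that
    contain at least the required number of distinct query terms."""
    terms = set(simple_analyze(query))
    if minimum_should_match_pct is None:
        required = 1  # default behavior: any single shared term is enough
    else:
        required = max(1, len(terms) * minimum_should_match_pct // 100)
    hits = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, set()):
            hits[doc_id] += 1
    return sorted(doc_id for doc_id, count in hits.items() if count >= required)

# Hypothetical documents, keyed by id, holding the value of the_field.
docs = {
    1: "a-long-string-with-hypens",
    2: "a-short-string",
    3: "with-hypens-a-string-long",
}
index = build_inverted_index(docs)

print(match(index, "a-long-string-with-hypens"))       # [1, 2, 3]
print(match(index, "a-long-string-with-hypens", 100))  # [1, 3]
```

Document 2 drops out once every term is required. Document 3 still matches because it contains the same chunks in a different order, so in this model the workaround is close to an exact match but not identical to one.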

curl -XGET 'localhost:9200/the_index/_search?pretty' -H 'Content-Type: application/json' -d '{"from":0,"size":10,"query":{"match":{"the_field":{"query":"a-long-string-with-hypens","type":"boolean","minimum_should_match":"100%"}}},"sort":[{"creationTimestamp":{"order":"desc","unmapped_type":"long"}}]}'

There are obviously performance issues with this approach, but as a way to get our customers the information they are expecting in the short term, it was good enough.