Elasticsearch queries — part 02 — full text queries

Arun Mohan
Published in elasticsearch, Oct 19, 2019

Phase 03 — Elasticsearch Full text queries — Blog 12

In my previous blog, we saw the basic classification of Elasticsearch queries (leaf and compound), the basics of both categories, and the query and filter contexts. The purpose of this blog is to introduce the common full-text queries in Elasticsearch.

To prepare for this blog, let us index some data that consists mostly of text. For simplicity, I have taken a CSV of a trimmed version of publicly available Facebook posts with their descriptions and details. You can download the trimmed version from here. You can index these posts into Elasticsearch as discussed in my earlier blog on using Kibana as a dev tool.

I have indexed the above posts into an index named fb-post. A sample document after indexing looks like this:

{
  "_index" : "fb-post",
  "_type" : "_doc",
  "_id" : "TszxwG0Bm6hFGbtHjVCC",
  "_score" : 1.0,
  "_source" : {
    "status_type" : "shared_story",
    "link" : "http://abcnews.go.com/blogs/headlines/2011/12/chief-justice-roberts-responds-to-judicial-ethics-critics/",
    "description" : "PAUL J. RICHARDS/AFP/Getty Images Chief Justice John Roberts issued a ringing endorsement Saturday night of his colleagues’ ability to determine when they should step down from a case because of a conflict of interest. “I have complete confidence in the capability of my colleagues to determine when ...",
    "caption" : "abcnews.go.com",
    "love_count" : 0,
    "shares_count" : 12,
    "page_id" : 86680728811,
    "wow_count" : 0,
    "post_type" : "link",
    "id" : "86680728811_272953252761568",
    "posted_at" : "2012-01-01 00:30:26",
    "sad_count" : 0,
    "angry_count" : 0,
    "message" : "Roberts took the unusual step of devoting the majority of his annual report to the issue of judicial ethics.",
    "picture" : "https://external.xx.fbcdn.net/safe_image.php?d=AQAPXteeHLT2K7Rb&w=130&h=130&url=http%3A%2F%2Fabcnews.go.com%2Fimages%2FPolitics%2Fgty_chief_justice_john_roberts_jt_111231_wblog.jpg&cfs=1&sx=108&sy=0&sw=269&sh=269",
    "likes_count" : 61,
    "thankful_count" : 0,
    "@timestamp" : "2012-01-01T00:30:26.000+05:30",
    "comments_count" : 27,
    "name" : "Chief Justice Roberts Responds to Judicial Ethics Critics",
    "haha_count" : 0
  }
}

In the above document, our fields of interest are the text fields “name”, “message” and “description”.

Now let us go through the full-text queries one by one.

1. Match query

We discussed the match query in the previous blog but did not cover its common use cases. The match query is most commonly used to quickly find the documents in a large data set that contain particular words. The query text is analyzed with the field’s analyzer before searching, so matching happens on terms rather than on exact strings.

For example, suppose we want to find out whether the word “confidence” appears anywhere in our Facebook post collection. A simple match query against the “description” field does this:

POST fb-post/_search
{
  "query": {
    "match": {
      "description": {
        "query": "confidence"
      }
    }
  }
}

The results show the posts whose description contains the word “confidence”.
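
Because the match query works on analyzed terms rather than raw strings, a lowercase query can find a capitalized word. The Python sketch below models that idea without a cluster. The `analyze` function is a hypothetical, much-simplified stand-in for the standard analyzer (lowercase, then split on non-word characters); the real analyzer uses Unicode text segmentation, so edge cases differ.

```python
import re


def analyze(text: str) -> list[str]:
    # Hypothetical stand-in for the standard analyzer: lowercase the text,
    # then keep runs of word characters as terms.
    return re.findall(r"\w+", text.lower())


def contains_term(field_text: str, word: str) -> bool:
    # A single-word match query hits when the analyzed query term is one
    # of the analyzed terms of the field.
    doc_terms = set(analyze(field_text))
    return any(term in doc_terms for term in analyze(word))


description = "“I have complete confidence in the capability of my colleagues"
print(contains_term(description, "Confidence"))  # True
print(contains_term(description, "confident"))   # False
```

The second call fails because the standard analyzer does not stem words, so “confident” and “confidence” stay different terms.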
In the above example, we gave only a single word. What happens when we give multiple words? Let us try the following query, with the search text “confidence buildings”:

POST fb-post/_search
{
  "query": {
    "match": {
      "description": {
        "query": "confidence buildings"
      }
    }
  }
}

This returns documents matching either “confidence” OR “buildings”, because by default the match query combines the terms with OR. This can be changed: if we want documents that match both “confidence” AND “buildings”, we can set the “operator” parameter as below:

POST fb-post/_search
{
  "query": {
    "match": {
      "description": {
        "query": "confidence buildings",
        "operator": "AND"
      }
    }
  }
}

This query returns only the documents that contain both “confidence” AND “buildings” (there are none in our data set).
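
The effect of the operator parameter can be modelled the same way. This is a toy sketch that only reproduces which documents match, not how they are scored; `analyze` is the same simplified stand-in analyzer, and the three sentences are made up for illustration, not taken from the fb-post data.

```python
import re


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return re.findall(r"\w+", text.lower())


def match(field_text: str, query: str, operator: str = "or") -> bool:
    # Toy model of the match query's boolean logic only (no scoring).
    query_terms = analyze(query)
    if not query_terms:
        return False  # nothing to search for
    doc_terms = set(analyze(field_text))
    hits = [term in doc_terms for term in query_terms]
    # "or" (the default): any term is enough; "and": every term is required.
    return all(hits) if operator.lower() == "and" else any(hits)


docs = [
    "I have complete confidence in the capability of my colleagues",
    "Two office buildings were evacuated",
    "Confidence in new buildings is rising",
]

or_hits = [d for d in docs if match(d, "confidence buildings")]
and_hits = [d for d in docs if match(d, "confidence buildings", operator="AND")]
print(len(or_hits), len(and_hits))  # 3 1
```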

2. multi_match query

As the name suggests, the multi_match query searches for the keywords in more than one field. Suppose we want to search for “Giffords family” in both the “name” and “description” fields; we can use the multi_match query:

POST fb-post/_search
{
  "query": {
    "multi_match": {
      "query": "Giffords family",
      "fields": [ "name", "description" ]
    }
  }
}

Here the words “Giffords” OR “family” are searched in the fields “name” and “description”, and the matching documents are returned.

We can also boost the scores of specific fields. In the query below, the caret syntax “name^5” multiplies the score of matches in the “name” field by 5, so documents that match the keywords in “name” rank higher.

POST fb-post/_search
{
  "query": {
    "multi_match": {
      "query": "Giffords family",
      "fields": [ "name^5", "description" ]
    }
  }
}
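
A rough model of the default multi_match behaviour (the best_fields type, which uses the score of the best-matching field) with a field boost follows. It is a sketch under loud assumptions: the per-field score here is just the number of distinct query terms found in that field, whereas Elasticsearch uses BM25, and both documents are invented. It only shows how a `^5` boost can change which document ranks first.

```python
import re


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return re.findall(r"\w+", text.lower())


def field_score(field_text: str, query: str) -> int:
    # Toy relevance: how many distinct query terms occur in the field.
    # Elasticsearch really uses BM25; this only mimics the ranking shape.
    doc_terms = set(analyze(field_text))
    return sum(1 for term in set(analyze(query)) if term in doc_terms)


def parse_field(spec: str) -> tuple[str, float]:
    # "name^5" -> ("name", 5.0); "description" -> ("description", 1.0)
    name, _, boost = spec.partition("^")
    return name, (float(boost) if boost else 1.0)


def best_fields_score(doc: dict[str, str], query: str, fields: list[str]) -> float:
    # best_fields keeps the highest boosted field score for the document.
    scores = []
    for spec in fields:
        name, boost = parse_field(spec)
        scores.append(boost * field_score(doc.get(name, ""), query))
    return max(scores, default=0.0)


query = "Giffords family"
doc_a = {"name": "Giffords returns to Congress", "description": "A brief visit"}
doc_b = {"name": "Budget talks", "description": "The Giffords family thanked supporters"}

for fields in (["name", "description"], ["name^5", "description"]):
    print(fields, best_fields_score(doc_a, query, fields), best_fields_score(doc_b, query, fields))
```

Without the boost, doc_b scores higher because both words appear in its description; with “name^5”, the single hit in doc_a’s name outweighs it.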

3. query_string query

Another useful query is the query_string query. Unlike the match query, it parses the search text using a strict syntax that supports operators such as AND, OR and NOT, and grouping with parentheses. If the search text does not follow this syntax, the query returns an error.

Consider the following query:

POST fb-post/_search
{
  "query": {
    "query_string": {
      "query": "(step down) OR (official act)"
    }
  }
}

Here, the search text is first split into two parts: the part to the left of OR and the part to the right. That is, the operators in the query act as delimiters. Each part is then analyzed and queried. Since no field is specified in the example above, the query runs against all eligible fields, and each part is analyzed with the analyzer of the field being queried (the standard analyzer for default text fields). The words inside each part are combined using the default_operator parameter, which is OR by default.

The query can also be restricted to one or more specific fields:

POST fb-post/_search
{
  "query": {
    "query_string": {
      "query": "(step down) OR (official act)",
      "fields": ["description", "name"]
    }
  }
}
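
The splitting described above can be illustrated with a deliberately tiny evaluator that only understands queries of the form `(words) OR (words)`. It is not Elasticsearch’s parser; the real query_string syntax is far richer. Words inside each group are combined according to a `default_operator` argument that mirrors the query_string parameter of that name (OR by default). The helper names and sample texts are hypothetical.

```python
import re

GROUP = r"\([^()]+\)"
# Only "(...)" groups joined by OR are accepted by this toy parser.
SHAPE = re.compile(rf"\s*{GROUP}(?:\s+OR\s+{GROUP})*\s*")


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return re.findall(r"\w+", text.lower())


def toy_query_string(query: str, field_text: str, default_operator: str = "OR") -> bool:
    if not SHAPE.fullmatch(query):
        # Anything outside the supported shape is rejected here; the real
        # query_string query errors only on genuinely invalid syntax.
        raise ValueError(f"unsupported query syntax: {query!r}")
    doc_terms = set(analyze(field_text))
    # Words inside one group are joined by default_operator.
    combine = all if default_operator.upper() == "AND" else any
    groups = re.findall(GROUP, query)
    # The explicit OR between groups: any matching group is enough.
    return any(combine(t in doc_terms for t in analyze(g)) for g in groups)


q = "(step down) OR (official act)"
print(toy_query_string(q, "Nothing to step on here"))                          # True
print(toy_query_string(q, "Nothing to step on here", default_operator="AND"))  # False
```

With the default OR, the word “step” alone satisfies the first group; switching default_operator to AND requires both “step” and “down”.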

4. match_phrase query

The match_phrase query is particularly useful because it matches a phrase rather than individual words. In the example below, the match_phrase query fetches documents containing the words “deeply concerned” next to each other and in the same order.

POST fb-post/_search
{
  "query": {
    "match_phrase": {
      "description": "deeply concerned"
    }
  }
}
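
Phrase matching relies on term positions, which Elasticsearch records in the index for text fields. The following sketch builds a tiny positional index in Python and checks that the terms sit at consecutive positions. It is an illustration under the same simplified analyzer, not Lucene’s implementation, and the example sentences are made up.

```python
import re


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return re.findall(r"\w+", text.lower())


def build_positions(text: str) -> dict[str, list[int]]:
    # Tiny positional index: term -> positions where it occurs.
    positions: dict[str, list[int]] = {}
    for pos, term in enumerate(analyze(text)):
        positions.setdefault(term, []).append(pos)
    return positions


def match_phrase(field_text: str, phrase: str) -> bool:
    terms = analyze(phrase)
    if not terms:
        return False
    index = build_positions(field_text)
    # Match if, for some start position of the first term, every later
    # term sits exactly one position after the previous one.
    return any(
        all(start + i in index.get(term, []) for i, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )


print(match_phrase("We are deeply concerned about this", "deeply concerned"))  # True
print(match_phrase("We are concerned deeply about this", "deeply concerned"))  # False
```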

A very useful customization of the match_phrase query is allowing a match even if the order of the words is changed. For example, if we want both “deeply concerned” and “concerned deeply” to match, we can add the slop parameter to the match_phrase query as below:

POST fb-post/_search
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "deeply concerned",
        "slop": 2
      }
    }
  }
}

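The slop value defaults to 0. It sets how many positional moves the terms may need and still count as a phrase match. Moving a term by one position costs 1, so swapping two adjacent terms costs 2; that is why a slop of 2 is needed here for “concerned deeply” to match “deeply concerned”.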
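
One way to see why a slop of 2 is needed is to compute the distance directly. The sketch below shifts each matched position back by that term’s offset in the phrase and measures the spread of the shifted positions, which is similar in spirit to how Lucene measures sloppy phrase matches. It is a simplified model: it brute-forces every position combination, ignores scoring, and the sample strings are invented.

```python
import itertools
import re
from typing import Optional


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return re.findall(r"\w+", text.lower())


def phrase_distance(field_text: str, phrase: str) -> Optional[int]:
    """Fewest positional moves that line the text up with the phrase, or None."""
    terms = analyze(phrase)
    doc = analyze(field_text)
    candidates = [[p for p, t in enumerate(doc) if t == term] for term in terms]
    if not terms or not all(candidates):
        return None  # some phrase term never occurs
    best = None
    for combo in itertools.product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one token cannot stand in for two phrase terms
        # Shift each position back by the term's offset within the phrase;
        # an exact phrase then collapses to a single value (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best


def match_phrase_slop(field_text: str, phrase: str, slop: int = 0) -> bool:
    distance = phrase_distance(field_text, phrase)
    return distance is not None and distance <= slop


print(phrase_distance("deeply concerned", "deeply concerned"))         # 0
print(phrase_distance("deeply, truly concerned", "deeply concerned"))  # 1
print(phrase_distance("concerned deeply", "deeply concerned"))         # 2
```

So match_phrase_slop("concerned deeply", "deeply concerned", slop=2) is True, while a slop of 1 is not enough for the swapped order.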

Now consider the query below, with an incomplete keyword “ab” at the end of the phrase. The match_phrase query returns no matches for it, even if a document has the phrase “deeply concerned about” in the “description” field, because “ab” is matched as a complete term.

POST fb-post/_search
{
  "query": {
    "match_phrase": {
      "description": "deeply concerned ab"
    }
  }
}

5. match_phrase_prefix query

In the above example, we saw that the match_phrase query needs complete terms to match. Sometimes it is handy to also match when the last word is only partially typed. The match_phrase_prefix query supports this type of match: it works like match_phrase, but treats the last term of the query as a prefix.

POST fb-post/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": "deeply concerned ab"
    }
  }
}

The above query can match phrases such as:
“deeply concerned about”
“deeply concerned above”
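
The failing match_phrase query and the match_phrase_prefix query can be modelled together. In this sketch, every term except the last must match exactly at consecutive positions, while the last term is expanded to the vocabulary terms that start with it, capped by `max_expansions` (Elasticsearch’s match_phrase_prefix parameter of that name defaults to 50). Taking the expansions in sorted order is an assumption of the toy, the helper names are hypothetical, and the documents are invented.

```python
import re


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return re.findall(r"\w+", text.lower())


def phrase_at(doc_terms: list[str], start: int, choices: list[set[str]]) -> bool:
    # True if doc_terms[start + i] is one of the allowed terms in choices[i].
    return start + len(choices) <= len(doc_terms) and all(
        doc_terms[start + i] in allowed for i, allowed in enumerate(choices)
    )


def search(docs: list[str], choices: list[set[str]]) -> list[str]:
    hits = []
    for doc in docs:
        terms = analyze(doc)
        if choices and any(phrase_at(terms, s, choices) for s in range(len(terms))):
            hits.append(doc)
    return hits


def exact_phrase(docs: list[str], phrase: str) -> list[str]:
    # match_phrase: every term, including a partial last word, must match as-is.
    return search(docs, [{t} for t in analyze(phrase)])


def phrase_prefix(docs: list[str], phrase: str, max_expansions: int = 50) -> list[str]:
    # match_phrase_prefix: the last term is a prefix, expanded against the
    # vocabulary and capped at max_expansions (sorted order is an assumption).
    terms = analyze(phrase)
    if not terms:
        return []
    *head, prefix = terms
    vocabulary = sorted({t for doc in docs for t in analyze(doc)})
    expansions = [t for t in vocabulary if t.startswith(prefix)][:max_expansions]
    return search(docs, [{t} for t in head] + [set(expansions)])


docs = [
    "We are deeply concerned about the ruling",
    "Officials remain deeply concerned above all about costs",
    "Deeply concerned citizens gathered",
]
print(len(exact_phrase(docs, "deeply concerned ab")))   # 0
print(len(phrase_prefix(docs, "deeply concerned ab")))  # 2
```

With max_expansions=1, only “about” survives the cap, so only the first document matches; this is why a very short prefix on a large index can miss results.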

A practical use case is implementing autocomplete (for example, of zip codes), where users type in partial phrases.

Conclusion

In this blog, we have seen a few important full-text queries in the Elasticsearch query world. In the next blog I will write about term-level queries, and then come back to some of the special full-text queries, which will aid a better understanding.
