How to create faceted/filtered search with Elasticsearch?

Published in hepsiburadatech, May 6, 2018

Elasticsearch is a powerful search engine that makes it easy to search, filter and aggregate documents. In this article, I will show you how to build a basic search function with facets/filters, using a set of events as the example data.

Step 1 — Setup Elasticsearch and Kibana with Docker

Create a docker-compose.yml file and add the following lines to it:

version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
    container_name: elasticsearch
    ports: 
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:6.2.4
    container_name: kibana
    ports: 
      - "5601:5601"
    depends_on: ['elasticsearch']

Then run the docker-compose up -d command to start Elasticsearch and Kibana. We will use Kibana Dev Tools to send REST requests to Elasticsearch because it is convenient, but you can use any REST client or cURL instead. Within about a minute, you should be able to access Elasticsearch at http://localhost:9200 and Kibana Dev Tools at http://localhost:5601/app/kibana#/dev_tools
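If you would rather send the requests from a script than from Dev Tools, a minimal Python sketch using only the standard library might look like the following. The helper names build_request and send are illustrative, not part of any Elasticsearch client, and the base URL assumes the port mapping from the docker-compose file above.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable on the port mapped in docker-compose.yml.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build an HTTP request equivalent to a Dev Tools command such as 'PUT events'."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Elasticsearch 6.x expects an explicit content type for request bodies.
        headers["Content-Type"] = "application/json"
    url = ES_URL + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and return the decoded JSON response (needs a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_request("GET", "/")) corresponds to running GET / in Dev Tools once the containers are up.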

Step 2 — Create index and add documents

First, we must create an index with its settings and mapping. Analyzers and token filters are defined in the settings. The mapping defines each field’s type and how it is analyzed; it must be in place before any document is added, because documents are analyzed as they are indexed. Let’s open Kibana Dev Tools and create the events index by running the following command:

PUT events
{
 "settings": {
  "analysis": {
   "filter": {
    "englishStopWords": {
     "type": "stop",
     "stopwords": "_english_"
    }
   },
   "analyzer": {
    "eventNameAnalyzer": {
     "tokenizer": "standard",
     "filter": [
      "lowercase",
      "englishStopWords"
     ]
    }
   }
  }
 },
 "mappings": {
  "event": {
   "properties": {
    "eventName": {
     "type": "keyword",
     "fields": {
      "analyzed": {
       "type": "text",
       "analyzer": "eventNameAnalyzer",
       "search_analyzer": "eventNameAnalyzer"
      }
     }
    },
    "category": {
     "type": "keyword",
     "fields": {
      "analyzed": {
       "type": "text",
       "analyzer": "eventNameAnalyzer",
       "search_analyzer": "eventNameAnalyzer"
      }
     }
    },
    "location": {
     "type": "keyword"
    },
    "price": {
     "type": "float"
    }
   }
  }
 }
}

In the settings, we define a custom token filter called englishStopWords, which uses the built-in stop filter type with the _english_ stop word list. Our analyzer, eventNameAnalyzer, applies the built-in lowercase filter followed by englishStopWords. While documents are indexed, fields analyzed with eventNameAnalyzer are lowercased (to make search case-insensitive) and stop words such as a, an, and, the, at are removed.

In the mapping, a type called event is defined with the fields eventName, category, location and price. The eventName and category fields are of type keyword and have a sub-field called analyzed of type text. keyword means that the field is not analyzed and is indexed as is, which is what we need for exact-value facets. text means that the field (here, the sub-field) is analyzed at index time with the analyzer named in the analyzer setting (eventNameAnalyzer in this example). search_analyzer is only needed when you want a different analyzer at search time; here it is set to the same analyzer, so it could be omitted. eventName.analyzed and category.analyzed are the fields we will search against.

You can now check how our analyzer tokenizes a string by running the following query in Kibana:

POST events/_analyze
{
  "text": ["Elasticsearch is a POWERFUL search engine that makes it easy for us to search"],
  "analyzer": "eventNameAnalyzer"
}

You can see that the sentence is tokenized into the tokens elasticsearch, powerful, search, engine, makes, easy, us, search: everything is lowercased and the stop words are removed.
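To make the analysis steps concrete, here is a rough Python approximation of what eventNameAnalyzer does. It is only a sketch: the regular expression is a simplified stand-in for the standard tokenizer, and the stop word set is intended to mirror the default English list that _english_ refers to. Elasticsearch itself does not run this code.

```python
import re

# Intended to mirror the default English stop word list behind "_english_".
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyze(text):
    """Approximate eventNameAnalyzer: tokenize, lowercase, drop stop words."""
    # Simplified tokenizer: runs of ASCII letters and digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in ENGLISH_STOP_WORDS]


sentence = "Elasticsearch is a POWERFUL search engine that makes it easy for us to search"
print(analyze(sentence))
```

The printed list contains the same tokens as the _analyze response described above. Duplicates such as search are kept, because the analyzer does not deduplicate tokens.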

Now, let’s add some documents to our index before we start writing queries:

PUT events/event/1
{
  "eventName": "How to process streams with Kafka Streams?",
  "category": "Software Development",
  "location": "Istanbul",
  "price": 2300 
}

PUT events/event/2
{
  "eventName": "Real Madrid vs Liverpool FC - UEFA Champions League 2017-18",
  "category": "Football",
  "location": "Kiev",
  "price": 3450 
}

PUT events/event/3
{
  "eventName": "Deep Learning Conference",
  "category": "Software Development",
  "location": "Istanbul",
  "price": 300 
}

PUT events/event/4
{
  "eventName": "Boston Celtics vs Philadelphia 76ers Basketball Playoff Game",
  "location": "Boston",
  "category": "Basketball",
  "price": 450 
}

PUT events/event/5
{
  "eventName": "Fenerbahce vs. Zalgiris Kaunas Euroleague Playoff Game",
  "category": "Basketball",
  "location": "Istanbul",
  "price": 1000 
}
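As an alternative to five separate PUT requests, the same documents could be sent in a single call to the _bulk endpoint, which takes newline-delimited JSON: one action line followed by one source line per document. The sketch below only builds that request body in Python; the function name build_bulk_body is illustrative, and actually sending the body (POST _bulk with content type application/x-ndjson) is left out.

```python
import json

# The five example events from this article, keyed by document id.
EVENTS = {
    "1": {"eventName": "How to process streams with Kafka Streams?",
          "category": "Software Development", "location": "Istanbul", "price": 2300},
    "2": {"eventName": "Real Madrid vs Liverpool FC - UEFA Champions League 2017-18",
          "category": "Football", "location": "Kiev", "price": 3450},
    "3": {"eventName": "Deep Learning Conference",
          "category": "Software Development", "location": "Istanbul", "price": 300},
    "4": {"eventName": "Boston Celtics vs Philadelphia 76ers Basketball Playoff Game",
          "category": "Basketball", "location": "Boston", "price": 450},
    "5": {"eventName": "Fenerbahce vs. Zalgiris Kaunas Euroleague Playoff Game",
          "category": "Basketball", "location": "Istanbul", "price": 1000},
}


def build_bulk_body(index, doc_type, documents):
    """Return an NDJSON body with one index action per document."""
    lines = []
    for doc_id, source in documents.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("events", "event", EVENTS)
```

Indexing a document under an id that already exists overwrites it, so this body can be re-sent safely while experimenting.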

Note: If you made a mistake while creating the index or the documents, don’t worry. You can delete a document or the whole index with the following commands:

DELETE events/event/1   => delete document with id 1
DELETE events           => delete events index

Step 3 — Searching documents

We will use the multi_match query to search against multiple fields. The following query searches for the term basketball in the specified fields:

POST events/event/_search
{
  "query": {
    "multi_match": {
      "query": "basketball",
      "fields": [
        "eventName.analyzed",
        "category.analyzed"
      ],
      "operator": "and",
      "type": "best_fields" 
    }
  }
}

operators => ["and", "or"]
types => ["best_fields", "most_fields", "cross_fields", "phrase", "phrase_prefix"]

This query returns the documents with ids 4 and 5. Document 4 matched on both the event name and the category, while document 5 matched only on the category. If you check the _score field, you can see that document 4 has a higher score than document 5, so it comes first. If you want to see how the score is calculated, you can read the Elasticsearch documentation and add "explain": true to the query to see the matched fields and terms in detail. explain slows the query down and should be used only while testing.

If you remove "operator": "and" from the query above and search for “boston basketball”, you still get documents 4 and 5, although “boston” does not appear anywhere in document 5. That’s because the multi_match query uses the or operator by default, so a document matches when any of the terms is found. With "operator": "and" in place, only document 4 is returned. However, when you search for “fenerbahce basketball” with the and operator, you get no results at all. That’s because the default type of the multi_match query is best_fields: with the and operator it expects all terms to exist in a single field, and it scores each document by its best-matching field. In document 5, “fenerbahce” appears only in eventName and “basketball” only in category. You can check the official Elasticsearch documentation to find the best type for your case and to learn about the other features of the multi_match query.
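The matching behavior described above can be reproduced with a small Python simulation. This is a deliberately simplified model, not how Elasticsearch is implemented: it ignores scoring, approximates the analyzer with a regular expression, and treats cross_fields (one of the types listed above) as a search over all fields combined. The function names are illustrative.

```python
import re

STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

# Only the two searched fields of the example events are needed here.
EVENTS = {
    1: {"eventName": "How to process streams with Kafka Streams?", "category": "Software Development"},
    2: {"eventName": "Real Madrid vs Liverpool FC - UEFA Champions League 2017-18", "category": "Football"},
    3: {"eventName": "Deep Learning Conference", "category": "Software Development"},
    4: {"eventName": "Boston Celtics vs Philadelphia 76ers Basketball Playoff Game", "category": "Basketball"},
    5: {"eventName": "Fenerbahce vs. Zalgiris Kaunas Euroleague Playoff Game", "category": "Basketball"},
}


def analyze(text):
    """Approximate eventNameAnalyzer and return the set of tokens."""
    return {token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)} - STOP_WORDS


def matches(doc, query, operator="or", match_type="best_fields"):
    terms = analyze(query)
    fields = [analyze(doc["eventName"]), analyze(doc["category"])]
    if match_type == "cross_fields":
        # Treat all fields as one big field.
        fields = [set().union(*fields)]
    if operator == "and":
        # Every term must be present in one and the same field.
        return any(terms <= field for field in fields)
    # "or": any term in any field is enough.
    return any(terms & field for field in fields)


def search(query, **options):
    """Return the ids of the matching documents in ascending order."""
    return sorted(doc_id for doc_id, doc in EVENTS.items() if matches(doc, query, **options))


print(search("boston basketball"))                                    # default "or"
print(search("boston basketball", operator="and"))
print(search("fenerbahce basketball", operator="and"))                # best_fields
print(search("fenerbahce basketball", operator="and", match_type="cross_fields"))
```

In this model the third search finds nothing, while the cross_fields variant finds document 5, because its two terms are spread across eventName and category.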

Step 4 — Creating facets

We will use aggregations to create facets. First, run the aggregations without a search query so that they cover all documents. Add "size": 0 to the request so that it returns only the aggregation results and no documents.

POST events/event/_search
{
  "size": 0, 
  "aggs": {
    "Category Filter": {
      "terms": {
        "field": "category",
        "size": 10
      }
    },
    "Location Filter": {
      "terms": {
        "field": "location",
        "size": 10
      }
    },
    "Price Filter": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 2000
          },
          {
            "from": 2000,
            "to": 3000
          }
        ]
      }
    }
  }
}

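If you run the query above and check the aggregations section of the response, you can see the facets, which are simply aggregation results. We created the category and location facets with the terms aggregation and the price facet with the range aggregation. The buckets of a terms aggregation are sorted by doc_count in descending order by default, while range buckets are listed by range rather than by count. Each range includes its from value and excludes its to value, so the event priced at 1000 falls into the 1000 to 2000 bucket, and the event priced at 3450 is not counted in any bucket.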
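To see what these aggregations compute, the bucket counts can be reproduced in plain Python over the five example events. This is an illustrative sketch of the counting logic only; the helper names are made up, and breaking ties between equal counts alphabetically is an assumption of the sketch.

```python
from collections import Counter

# Facet-relevant fields of the five example events.
EVENTS = [
    {"category": "Software Development", "location": "Istanbul", "price": 2300},
    {"category": "Football", "location": "Kiev", "price": 3450},
    {"category": "Software Development", "location": "Istanbul", "price": 300},
    {"category": "Basketball", "location": "Boston", "price": 450},
    {"category": "Basketball", "location": "Istanbul", "price": 1000},
]


def terms_buckets(docs, field, size=10):
    """Count documents per distinct value, highest count first."""
    counts = Counter(doc[field] for doc in docs)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:size]


def range_buckets(docs, field, ranges):
    """Count documents per range; 'from' is inclusive, 'to' is exclusive."""
    return [
        ((low, high), sum(1 for doc in docs if low <= doc[field] < high))
        for low, high in ranges
    ]


print(terms_buckets(EVENTS, "category"))
print(terms_buckets(EVENTS, "location"))
print(range_buckets(EVENTS, "price", [(0, 1000), (1000, 2000), (2000, 3000)]))
```

The price counts add up to four rather than five, because the 3450 event lies outside every defined range.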

If you merge the queries from Step 3 and Step 4, the aggregation results are narrowed to the documents that match the search. This is the final query, with the search term changed to game:

POST events/event/_search
{
  "query": {
    "multi_match": {
      "query": "game",
      "fields": [
        "eventName.analyzed",
        "category.analyzed"
      ],
      "operator": "and",
      "type": "best_fields"
    }
  },
  "aggs": {
    "Category Filter": {
      "terms": {
        "field": "category",
        "size": 10
      }
    },
    "Location Filter": {
      "terms": {
        "field": "location",
        "size": 10
      }
    },
    "Price Filter": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 2000
          },
          {
            "from": 2000,
            "to": 3000
          }
        ]
      }
    }
  }
}
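In an application, the request body above would usually be assembled from the user's input. The following Python sketch shows one possible way to do that. The function name build_search_body is hypothetical, and the handling of selected facet values is an assumption that goes beyond the queries shown in this article: it wraps the search in a bool query and adds a term filter per selected keyword field, which is one common approach rather than the only one.

```python
import json

SEARCH_FIELDS = ["eventName.analyzed", "category.analyzed"]

# The same three aggregations used in the queries above.
FACETS = {
    "Category Filter": {"terms": {"field": "category", "size": 10}},
    "Location Filter": {"terms": {"field": "location", "size": 10}},
    "Price Filter": {"range": {"field": "price", "ranges": [
        {"from": 0, "to": 1000},
        {"from": 1000, "to": 2000},
        {"from": 2000, "to": 3000},
    ]}},
}


def build_search_body(term=None, selected=None):
    """Build a search body from a search term and selected facet values."""
    body = {"aggs": FACETS}
    must = []
    if term:
        must.append({"multi_match": {
            "query": term,
            "fields": SEARCH_FIELDS,
            "operator": "and",
            "type": "best_fields",
        }})
    # Assumption: each selected facet value becomes an exact-match term filter.
    filters = [{"term": {field: value}} for field, value in (selected or {}).items()]
    if must or filters:
        body["query"] = {"bool": {"must": must, "filter": filters}}
    else:
        # No search and no selection: return only the facets, as in Step 4.
        body["size"] = 0
    return body


print(json.dumps(build_search_body("game", {"location": "Istanbul"}), indent=2))
```

Because the filters are part of the query, they narrow the aggregation results in the same way the search term does.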

In conclusion, Elasticsearch makes it easy to build basic faceted search. Response times depend on the server specs, document counts and sizes, and query complexity (complex requirements can make queries really complex), but Elasticsearch is generally able to return results very quickly.

References:

Elasticsearch Multi Match query documentation

Elasticsearch Token Filter documentation

Elasticsearch Analyzer documentation

Elasticsearch Terms aggregation documentation

Elasticsearch Range aggregation documentation
