How to create faceted/filtered search with Elasticsearch?
Elasticsearch is a powerful search engine that makes it easy to search, filter and aggregate documents. In this article, I will show you how to build a basic search function, including facets/filters, using an example index of events.
Step 1 — Setup Elasticsearch and Kibana with Docker
Create a docker-compose.yml file and add the following lines to it:
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
    container_name: elasticsearch
    environment:
      # run as a single-node cluster, as recommended for local development
      - discovery.type=single-node
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:6.2.4
    container_name: kibana
    ports:
      - "5601:5601"
    depends_on: ['elasticsearch']
Then run the docker-compose up -d command to start Elasticsearch and Kibana. To keep things simple, we will use Kibana Dev Tools to send REST requests to Elasticsearch, but you can use any REST client or cURL instead. After about a minute, Elasticsearch should be available at http://localhost:9200 and Kibana Dev Tools at http://localhost:5601/app/kibana#/dev_tools
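Instead of waiting a fixed amount of time, you can poll Elasticsearch until it answers. Below is a minimal sketch using only the Python standard library; the function name wait_for_elasticsearch and its default timeout and interval values are illustrative assumptions, not part of the setup above.

```python
import time
import urllib.request


def wait_for_elasticsearch(url: str = "http://localhost:9200",
                           timeout: float = 60.0,
                           interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused/reset connections and socket timeouts
            # are all OSError subclasses: Elasticsearch is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_elasticsearch() after docker-compose up -d returns True once http://localhost:9200 responds, or False if it has not come up within a minute.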
Step 2 — Create index and add documents
First, we must create an index with its settings and mapping. Analyzers and token filters are defined in the settings. Field types and how each field is analyzed are defined in the mapping, and this must be done before adding any documents, because documents are analyzed while they are indexed. Open Kibana Dev Tools and create the events index by running the following command:
PUT events
{
  "settings": {
    "analysis": {
      "filter": {
        "englishStopWords": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "eventNameAnalyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "englishStopWords"
          ]
        }
      }
    }
  },
  "mappings": {
    "event": {
      "properties": {
        "eventName": {
          "type": "keyword",
          "fields": {
            "analyzed": {
              "type": "text",
              "analyzer": "eventNameAnalyzer",
              "search_analyzer": "eventNameAnalyzer"
            }
          }
        },
        "category": {
          "type": "keyword",
          "fields": {
            "analyzed": {
              "type": "text",
              "analyzer": "eventNameAnalyzer",
              "search_analyzer": "eventNameAnalyzer"
            }
          }
        },
        "location": {
          "type": "keyword"
        },
        "price": {
          "type": "float"
        }
      }
    }
  }
}
In the settings, the built-in lowercase token filter and our custom englishStopWords filter (a stop filter that uses Elasticsearch's predefined _english_ stop word list) are used in the filter section of our analyzer, eventNameAnalyzer. When documents are indexed, fields analyzed with eventNameAnalyzer are lowercased (so that search is case-insensitive) and stop words such as a, an, and, the and at are removed.
The mapping defines a type called event with the fields eventName, category, location and price. As you can see, eventName and category are of type keyword, and each of them has a sub-field called analyzed of type text. keyword means that the field is not analyzed: the whole value is indexed as a single term, exactly as it is. text means that the field (here, the sub-field) is analyzed at index time with the analyzer named in its analyzer parameter (eventNameAnalyzer in this example). search_analyzer lets you use a different analyzer at search time; we don't need that here, since it is set to the same analyzer and could be omitted. The eventName.analyzed and category.analyzed fields are the ones we will use to search documents.
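To make the keyword/text difference concrete, here is a tiny Python illustration of which terms end up in the index for the same category value. It is a simplified model, not Elasticsearch code: the helper names are hypothetical, and text_terms only imitates tokenizing and lowercasing (the stop word filter is left out for brevity).

```python
import re


def keyword_terms(value: str) -> list:
    """A keyword field indexes the whole value as one unchanged term."""
    return [value]


def text_terms(value: str) -> list:
    """A text field is analyzed; this only imitates tokenizing and lowercasing."""
    return [token.lower() for token in re.findall(r"\w+", value)]


category = "Software Development"
print(keyword_terms(category))  # ['Software Development']: one value, useful for facets
print(text_terms(category))     # ['software', 'development']: individual searchable words

# An exact lookup on the keyword field is case-sensitive and needs the whole value:
print("software development" in keyword_terms(category))  # False
print("software" in text_terms(category))                  # True
```

This is why the facets later aggregate on category and location (whole values), while full-text search goes through the .analyzed sub-fields.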
You can now check how our analyzer splits a string into tokens by running the following query in Kibana:
POST events/_analyze
{
  "text": ["Elasticsearch is a POWERFUL search engine that makes it easy for us to search"],
  "analyzer": "eventNameAnalyzer"
}
The sentence is split into the tokens elasticsearch, powerful, search, engine, makes, easy, us, search: every token is lowercased and the stop words (is, a, that, it, for, to) are removed.
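The same pipeline can be approximated in a few lines of Python, which helps when reasoning about what a query will match. This is only a sketch under stated assumptions: the regular expression is a rough stand-in for the standard tokenizer (which follows Unicode text segmentation and differs in edge cases), and the stop word set is Lucene's English list that the _english_ setting refers to.

```python
import re

# Lucene's default English stop words (what a stop filter with "_english_" uses).
ENGLISH_STOP_WORDS = frozenset("""
a an and are as at be but by for if in into is it no not of on or such
that the their then there these they this to was will with
""".split())

# Rough approximation of the standard tokenizer: runs of word characters,
# keeping inner apostrophes such as "don't".
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*")


def event_name_analyzer(text: str) -> list:
    """Approximate eventNameAnalyzer: tokenize, lowercase, drop English stop words."""
    tokens = TOKEN_PATTERN.findall(text)
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in ENGLISH_STOP_WORDS]


print(event_name_analyzer(
    "Elasticsearch is a POWERFUL search engine that makes it easy for us to search"))
# ['elasticsearch', 'powerful', 'search', 'engine', 'makes', 'easy', 'us', 'search']
```

For anything beyond quick experiments, the _analyze API above remains the authoritative way to see what Elasticsearch actually produces.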
Now, let's add some documents to our index before we start writing queries:
PUT events/event/1
{
  "eventName": "How to process streams with Kafka Streams?",
  "category": "Software Development",
  "location": "Istanbul",
  "price": 2300
}

PUT events/event/2
{
  "eventName": "Real Madrid vs Liverpool FC - UEFA Champions League 2017-18",
  "category": "Football",
  "location": "Kiev",
  "price": 3450
}

PUT events/event/3
{
  "eventName": "Deep Learning Conference",
  "category": "Software Development",
  "location": "Istanbul",
  "price": 300
}

PUT events/event/4
{
  "eventName": "Boston Celtics vs Philadelphia 76ers Basketball Playoff Game",
  "location": "Boston",
  "category": "Basketball",
  "price": 450
}

PUT events/event/5
{
  "eventName": "Fenerbahce vs. Zalgiris Kaunas Euroleague Playoff Game",
  "category": "Basketball",
  "location": "Istanbul",
  "price": 1000
}
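As an alternative to five separate PUT requests, Elasticsearch's _bulk API accepts all documents in a single request whose body is newline-delimited JSON: an action line followed by a source line per document. The Python sketch below generates that body for the same five events; the helper name bulk_body is hypothetical, and the action lines include _type because this setup uses Elasticsearch 6.x.

```python
import json

# The five example events from above, keyed by document id.
EVENTS = {
    "1": {"eventName": "How to process streams with Kafka Streams?",
          "category": "Software Development", "location": "Istanbul", "price": 2300},
    "2": {"eventName": "Real Madrid vs Liverpool FC - UEFA Champions League 2017-18",
          "category": "Football", "location": "Kiev", "price": 3450},
    "3": {"eventName": "Deep Learning Conference",
          "category": "Software Development", "location": "Istanbul", "price": 300},
    "4": {"eventName": "Boston Celtics vs Philadelphia 76ers Basketball Playoff Game",
          "category": "Basketball", "location": "Boston", "price": 450},
    "5": {"eventName": "Fenerbahce vs. Zalgiris Kaunas Euroleague Playoff Game",
          "category": "Basketball", "location": "Istanbul", "price": 1000},
}


def bulk_body(index: str, doc_type: str, docs: dict) -> str:
    """Build a _bulk body: one "index" action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects the body to end with a newline


print(bulk_body("events", "event", EVENTS), end="")
```

The printed lines can be pasted under a POST _bulk line in Kibana Dev Tools, or saved to a file and sent with curl using the header Content-Type: application/x-ndjson and --data-binary, which preserves the newlines.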
Note: If you make a mistake while creating the index or adding documents, don't worry; you can delete them and start again. The first command below deletes the document with id 1, and the second deletes the whole events index:
DELETE events/event/1
DELETE events
Step 3 — Searching documents
We will use the multi_match query to search across multiple fields. The following query searches for the term basketball in the specified fields:
POST events/event/_search
{
  "query": {
    "multi_match": {
      "query": "basketball",
      "fields": [
        "eventName.analyzed",
        "category.analyzed"
      ],
      "operator": "and",
      "type": "best_fields"
    }
  }
}
Possible values for operator: and, or.
Possible values for type: best_fields, most_fields, cross_fields, phrase, phrase_prefix.
This query returns the documents with ids 4 and 5. Document 4 matches in both the event name and the category, while document 5 matches only in the category. If you check the _score field, you can see that document 4 has a higher score than document 5, so it comes first. To see how the score is calculated, you can read the Elasticsearch documentation on relevance scoring and add "explain": true to the query, which shows the matched fields and terms in detail. explain slows the query down, so it should be used only while testing.
If you remove the operator parameter and search for "boston basketball", you still get documents 4 and 5, although the word "boston" does not appear anywhere in document 5. That's because multi_match performs an or search by default: a document matches if any of the terms matches. Adding "operator": "and", as in the query above, requires all terms to match, and then only document 4 is returned. However, if you search for "fenerbahce basketball" with the and operator, you don't get any results, even though document 5 has "fenerbahce" in its event name and "basketball" in its category. That's because the default type of multi_match is best_fields, which expects all terms to appear in a single field and scores each document by its best-matching field. The cross_fields type, which treats the fields as one combined field, would return document 5 for this search. You can check the official Elasticsearch documentation to find the best type for your case and to learn about other features of the multi_match query.
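The matching rules described above can be checked with a small Python model. It is a simplification under explicit assumptions: it decides only which documents match (no scoring), it approximates eventNameAnalyzer with a regex tokenizer plus Lucene's English stop words, and it models only the best_fields and cross_fields types. The function names are hypothetical.

```python
import re

# Lucene's English stop words (what "_english_" refers to).
STOP_WORDS = frozenset("""
a an and are as at be but by for if in into is it no not of on or such
that the their then there these they this to was will with
""".split())

# (eventName, category) of the five example events, standing in for the .analyzed fields.
DOCS = {
    1: ("How to process streams with Kafka Streams?", "Software Development"),
    2: ("Real Madrid vs Liverpool FC - UEFA Champions League 2017-18", "Football"),
    3: ("Deep Learning Conference", "Software Development"),
    4: ("Boston Celtics vs Philadelphia 76ers Basketball Playoff Game", "Basketball"),
    5: ("Fenerbahce vs. Zalgiris Kaunas Euroleague Playoff Game", "Basketball"),
}


def analyze(text: str) -> set:
    """Rough eventNameAnalyzer stand-in: tokenize, lowercase, drop stop words."""
    return {t.lower() for t in re.findall(r"\w+(?:'\w+)*", text)} - STOP_WORDS


def satisfies(field_terms: set, query_terms: set, operator: str) -> bool:
    if operator == "and":
        return query_terms <= field_terms    # every query term is present
    return bool(query_terms & field_terms)   # at least one query term is present


def multi_match(query: str, operator: str = "or", match_type: str = "best_fields") -> list:
    """Return the ids of matching documents (matching only, no scoring)."""
    query_terms = analyze(query)
    if not query_terms:
        return []  # a query of only stop words matches nothing here
    hits = []
    for doc_id, fields in DOCS.items():
        per_field = [analyze(value) for value in fields]
        if match_type == "best_fields":
            # Field-centric: a single field must satisfy the operator on its own.
            matched = any(satisfies(terms, query_terms, operator) for terms in per_field)
        elif match_type == "cross_fields":
            # Term-centric: the fields behave like one combined field.
            matched = satisfies(set().union(*per_field), query_terms, operator)
        else:
            raise ValueError(f"type not modelled in this sketch: {match_type}")
        if matched:
            hits.append(doc_id)
    return hits


print(multi_match("basketball", "and"))                             # [4, 5]
print(multi_match("boston basketball"))                             # [4, 5]
print(multi_match("boston basketball", "and"))                      # [4]
print(multi_match("fenerbahce basketball", "and"))                  # []
print(multi_match("fenerbahce basketball", "and", "cross_fields"))  # [5]
```

In real Elasticsearch, cross_fields groups fields by analyzer; both fields here share eventNameAnalyzer, so treating them as one combined field is a fair approximation for this example.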
Step 4 — Creating facets
We will use aggregations to create facets. First, run a request without a search query so that the aggregations cover all documents. Adding "size": 0 to the request returns only the aggregation results, without any documents.
POST events/event/_search
{
  "size": 0,
  "aggs": {
    "Category Filter": {
      "terms": {
        "field": "category",
        "size": 10
      }
    },
    "Location Filter": {
      "terms": {
        "field": "location",
        "size": 10
      }
    },
    "Price Filter": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 2000
          },
          {
            "from": 2000,
            "to": 3000
          },
          {
            "from": 3000
          }
        ]
      }
    }
  }
}
If you run the query above and check the aggregations section of the response, you can see the facets, which are simply the aggregation results. We created the category and location facets with the terms aggregation and the price range facet with the range aggregation. Terms aggregation buckets are sorted by doc_count in descending order by default, while range aggregation buckets are returned in the order in which the ranges are defined.
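What the two aggregation types compute can be sketched in plain Python over the five example events. This is an illustration, not Elasticsearch's implementation: the helper names are hypothetical, the terms sketch breaks doc_count ties alphabetically, and the range sketch follows the range aggregation's rule that "from" is inclusive and "to" is exclusive.

```python
from collections import Counter

# Facet fields of the five example events.
EVENTS = [
    {"category": "Software Development", "location": "Istanbul", "price": 2300},
    {"category": "Football", "location": "Kiev", "price": 3450},
    {"category": "Software Development", "location": "Istanbul", "price": 300},
    {"category": "Basketball", "location": "Boston", "price": 450},
    {"category": "Basketball", "location": "Istanbul", "price": 1000},
]


def terms_agg(docs, field, size=10):
    """Count documents per exact keyword value, highest doc_count first."""
    counts = Counter(doc[field] for doc in docs)
    buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties: alphabetical
    return buckets[:size]


def range_agg(docs, field, ranges):
    """Count documents per range; "from" inclusive, "to" exclusive, missing bound = open."""
    buckets = []
    for r in ranges:  # buckets keep the order in which the ranges are defined
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        count = sum(1 for doc in docs if low <= doc[field] < high)
        buckets.append((low, high, count))
    return buckets


PRICE_RANGES = [{"from": 0, "to": 1000}, {"from": 1000, "to": 2000}, {"from": 2000, "to": 3000}]

print(terms_agg(EVENTS, "category"))
# [('Basketball', 2), ('Software Development', 2), ('Football', 1)]
print(terms_agg(EVENTS, "location"))
# [('Istanbul', 3), ('Boston', 1), ('Kiev', 1)]
print(range_agg(EVENTS, "price", PRICE_RANGES))
# [(0, 1000, 2), (1000, 2000, 1), (2000, 3000, 1)]
```

Note that the 1000 event lands in the 1000-2000 bucket because "to" is exclusive, and the 3450 event falls outside the three ranges used in this sketch; an open-ended range such as {"from": 3000} gives it a bucket.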
If you combine the queries from Step 3 and Step 4, you can see that the aggregation results are also narrowed down to the documents that match the search. This is the final query, with an updated search term:
POST events/event/_search
{
  "query": {
    "multi_match": {
      "query": "game",
      "fields": [
        "eventName.analyzed",
        "category.analyzed"
      ],
      "operator": "and",
      "type": "best_fields"
    }
  },
  "aggs": {
    "Category Filter": {
      "terms": {
        "field": "category",
        "size": 10
      }
    },
    "Location Filter": {
      "terms": {
        "field": "location",
        "size": 10
      }
    },
    "Price Filter": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 2000
          },
          {
            "from": 2000,
            "to": 3000
          },
          {
            "from": 3000
          }
        ]
      }
    }
  }
}
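The requests above compute facets, but they do not yet apply a facet value that a user has selected. One common approach, sketched below in Python, is a bool query: the full-text multi_match goes in must and the selected facet values go in filter, as exact terms queries on the keyword fields and a range query on price. The helper build_search_body and its parameters are hypothetical; the query clauses (bool, multi_match, terms, range, match_all) are standard Elasticsearch query DSL.

```python
import json


def build_search_body(search_text: str = "", categories=(), locations=(), price_range=None) -> dict:
    """Combine the Step 3 full-text query with the facet values a user selected."""
    if search_text:
        bool_query = {"must": [{
            "multi_match": {
                "query": search_text,
                "fields": ["eventName.analyzed", "category.analyzed"],
                "operator": "and",
                "type": "best_fields",
            }
        }]}
    else:
        bool_query = {"must": [{"match_all": {}}]}

    filters = []
    if categories:
        # exact match on the keyword field that the category facet is built from
        filters.append({"terms": {"category": list(categories)}})
    if locations:
        filters.append({"terms": {"location": list(locations)}})
    if price_range is not None:
        low, high = price_range
        # gte/lt mirrors the range aggregation: "from" inclusive, "to" exclusive
        filters.append({"range": {"price": {"gte": low, "lt": high}}})
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


body = build_search_body("game", categories=["Basketball"], price_range=(0, 1000))
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST events/event/_search, optionally together with the aggs section from above; with the example data it should return only document 4. Clauses in filter context do not affect scoring. Because they also narrow the aggregations, the facet counts will reflect the selection; if the counts should ignore the selected facets, Elasticsearch's post_filter is the usual alternative.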
In conclusion, Elasticsearch makes it easy to build basic faceted search. Response times depend on the server specs, the number and size of documents, and query complexity (complex requirements can lead to complex queries), but Elasticsearch can generally return results very quickly.
References:
Elasticsearch Multi Match query documentation
Elasticsearch Token Filter documentation
Elasticsearch Analyzer documentation
Elasticsearch Terms aggregation documentation
Elasticsearch Range aggregation documentation
Images from: Elastic, Hepsiburada