Learn Elasticsearch by Practice Part 1

Sevcan Doğramacı
4 min read · Sep 23, 2023


Data. We are all surrounded by it. In the software industry, an enormous amount of data is generated, and businesses rely on it to run better. To make use of it, the data needs to be stored, and operations such as searching and analyzing need to be performed on it.

This is where Elasticsearch comes in. In this post, we will dive into how data is stored in Elasticsearch, and into the mappings and settings of an index.

Photo by Markus Winkler on Unsplash

What is Elasticsearch?

It is a distributed search and analytics engine that centrally stores our data so we can index, search, and analyze it. It is a document-oriented data store where each document is represented in JSON format.

How is data stored?

In Elasticsearch, data is stored in a data structure called an inverted index. This structure splits the input text into tokens and maps each token to the documents that contain it.

Each field in a document is indexed separately. For example, take product data with id, title, and description fields. The description field of each document is tokenized, and the resulting tokens are added to the inverted index for that field.
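
To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy model and not how Elasticsearch (or Lucene underneath it) is actually implemented: the tokenize function is a simplified stand-in for an analyzer, and the names docs, tokenize, and build_inverted_index are made up for this illustration.

```python
import re
from collections import defaultdict

# Toy documents: the description field of two products.
docs = {
    1: "Description of a beautiful knit sweater.",
    2: "Description of a beautiful cotton sweater.",
}


def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase the text
    # and split it on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


inverted = build_inverted_index(docs)
for token in sorted(inverted):
    print(token, sorted(inverted[token]))
```

Looking up a word such as knit is then a single dictionary lookup that returns the ids of the matching documents, instead of a scan over every description.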

What is mapping?

I have just stated that each field is indexed separately. A field value can be a boolean, integer, float, string, etc. In Elasticsearch, the mapping of an index defines how each field is stored and indexed. We can either let Elasticsearch infer it (dynamic mapping) or define it ourselves (explicit mapping).

Let’s start with dynamic mapping; we’ll define mappings explicitly in the next posts of this series.

1. Run docker-compose up.
2. Go to http://localhost:5601/app/dev_tools#/console.
3. Create the products index by inserting documents.
POST products/_doc
{
  "id": 1,
  "title": "Knit Sweater",
  "description": "Description of a beautiful knit sweater."
}

POST products/_doc
{
  "id": 2,
  "title": "Cotton Sweater",
  "description": "Description of a beautiful cotton sweater."
}

4. Run GET products to get info about the index.

Elasticsearch mapped the description and title fields as text, and the id field as long. Good job 😎 But wait! It also added a keyword sub-field to each of them (description.keyword and title.keyword). Why?

The main difference between text and keyword mappings is that text fields are meant for full-text search, where a query can match individual words, whereas keyword fields are meant for exact matching on the whole value. This is because text fields are analyzed (tokenized, then passed through the token filters of the configured analyzer) before indexing.
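
The dynamic mapping result from step 4 can be approximated with a few lines of Python. This is only a rough sketch of the default rules for simple JSON values: it ignores date detection, objects, arrays, and dynamic templates, and infer_dynamic_mapping is a hypothetical helper written for this post, not an Elasticsearch API.

```python
def infer_dynamic_mapping(value):
    # bool must be checked before int: in Python, True is also an int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become text with a keyword sub-field by default.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"value type not covered by this sketch: {type(value)!r}")


doc = {
    "id": 1,
    "title": "Knit Sweater",
    "description": "Description of a beautiful knit sweater.",
}
mapping = {field: infer_dynamic_mapping(value) for field, value in doc.items()}
for field, field_mapping in mapping.items():
    print(field, field_mapping)
```

The printed structure has the same shape as the properties section returned by GET products: id is a long, while title and description are text fields that each carry a keyword sub-field.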

5. See how text is analyzed.

GET products/_analyze
{
  "text": "Description of a beautiful knit sweater."
}
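
The response lists one entry per token, with its character offsets and its position in the text. As a rough approximation of what the default analyzer does to this sentence, the Python sketch below lowercases the text and splits it on non-word characters. The real analyzer follows Unicode text segmentation rules, so results can differ for other inputs; the analyze function here is a made-up helper, not the Elasticsearch implementation.

```python
import re


def analyze(text):
    # Approximation only: split on non-word characters and lowercase.
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append(
            {
                "token": match.group().lower(),
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            }
        )
    return tokens


tokens = analyze("Description of a beautiful knit sweater.")
for token in tokens:
    print(token)
```

Note that the trailing period is dropped and that Description is stored in lowercase, which is why searches on a text field are case-insensitive by default.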

6. Run the following match query.

GET products/_search
{
  "query": {
    "match": {
      "description": "Description of a beautiful knit sweater."
    }
  }
}

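Since the description field is indexed as text, the query string is analyzed as well, and by default a match query returns every document that contains at least one of the resulting tokens. Both products share tokens such as description, beautiful, and sweater, so the query returns all the products we have inserted, with the knit sweater ranked first because it matches more terms.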

However, for the keyword fields, this analysis step is not applied. They are indexed as is. Thanks to this, we are able to use them for exact match searches.

7. Run the same match query again, this time against the description.keyword sub-field.

GET products/_search
{
  "query": {
    "match": {
      "description.keyword": "Description of a beautiful knit sweater."
    }
  }
}

Now it returns just one document, because the stored keyword value has to match the query string exactly.
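
The difference between the two queries can be sketched in plain Python. This is a toy model under simplifying assumptions: it ignores relevance scoring, it reuses a simplified tokenizer instead of the real analyzer, and the function names match_text and match_keyword are invented for this illustration.

```python
import re

# Toy copies of the description values of our two products.
descriptions = {
    1: "Description of a beautiful knit sweater.",
    2: "Description of a beautiful cotton sweater.",
}


def tokenize(text):
    # Simplified stand-in for the analyzer applied to text fields.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_text(values, query):
    # Text field: the query is analyzed too, and a document matches
    # if it shares at least one token with the query (OR semantics).
    query_tokens = set(tokenize(query))
    return [doc_id for doc_id, value in values.items()
            if query_tokens & set(tokenize(value))]


def match_keyword(values, query):
    # Keyword field: the value is indexed as is, so only an
    # identical string matches.
    return [doc_id for doc_id, value in values.items() if value == query]


query = "Description of a beautiful knit sweater."
print(match_text(descriptions, query))     # ids of both products
print(match_keyword(descriptions, query))  # id of the knit sweater only
```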

What are settings?

This part contains the settings for the index. A setting can be either static (set at index creation time or on a closed index) or dynamic (can be changed on a live index).

Elasticsearch created our products index using default settings. It is configured to store data in content data nodes using one primary shard and one replica shard. Shards spread the data across nodes and replicas keep extra copies of it; these settings and more can be used to keep the data in the cluster optimally and securely.
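
As a small illustration of the static versus dynamic distinction, the sketch below classifies a handful of well-known index settings. It is only a lookup table written for this post, not an Elasticsearch API, and it covers just four settings; the function name can_update_on_open_index is hypothetical.

```python
# A few index settings, grouped by kind (not an exhaustive list).
STATIC_SETTINGS = {"index.number_of_shards", "index.codec"}
DYNAMIC_SETTINGS = {"index.number_of_replicas", "index.refresh_interval"}


def can_update_on_open_index(setting):
    # Dynamic settings can be changed while the index is live;
    # static ones cannot.
    if setting in DYNAMIC_SETTINGS:
        return True
    if setting in STATIC_SETTINGS:
        return False
    raise KeyError(f"setting not covered by this sketch: {setting}")


for name in sorted(STATIC_SETTINGS | DYNAMIC_SETTINGS):
    print(name, can_update_on_open_index(name))
```

So for our products index, the number of replicas can be changed at any time, whereas the number of primary shards is fixed once the index exists.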

To sum up, we covered the inverted index that Elasticsearch uses, as well as the mappings and settings of an index. That was all for this post, see you in the next episodes 🧐 🚗
