Search Made Simple: Your Introductory Guide to Elasticsearch

Aatif Bandey
Engineering at Bajaj Health
11 min read · Jun 29, 2024

Hello everyone! At Bajaj Finserv Health, we use a variety of databases such as MySQL and MongoDB, but a significant portion of our data is stored in Elasticsearch. Additionally, we employ the ELK stack as our logging tool. In this article, we will delve into the fundamentals of Elasticsearch, exploring its terminology, querying capabilities, and aggregation features.

What is Elastic?

Elasticsearch functions as a robust search engine, analogous to popular search engines like Google. It allows users to execute queries using keywords or specific data, generating results comparable to search engine result pages. Importantly, Elasticsearch is free to download and use.

Terminology in Elastic

Elasticsearch is very similar to MySQL in terms of terminology. A few important parallels: an index corresponds to a table, a document to a row, and a field to a column.

Installation

You can download Elasticsearch from the official Elasticsearch downloads page. After downloading, navigate to your Downloads directory and follow these steps:

cd Downloads/elastic-8.10.2/bin
./elasticsearch

Open your browser and go to https://localhost:9200/. You should see a response similar to:


{
  "name" : "M-Y4Y17GV9FQ.local",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "4-UPSPQ8Ss--cG_zvz6M5g",
  "version" : {
    "number" : "8.10.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "6d20dd8ce62365be9b1aca96427de4622e970e9e",
    "build_date" : "2023-09-19T08:16:24.564900370Z",
    "build_snapshot" : false,
    "lucene_version" : "9.7.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Kibana Installation

Now that Elasticsearch is installed, let’s move on to installing Kibana for visualization. Follow these steps to get Kibana up and running:

  1. Download Kibana using the provided link.
  2. After the download is complete, open your terminal.
  3. Navigate to the Downloads directory where the Kibana package is located.
  4. Execute the necessary commands to set up and run Kibana.

These steps will help you install Kibana and enable data visualization capabilities to complement your Elasticsearch installation.

cd Downloads/kibana-8.10.2/bin
./kibana

Be patient, as starting Kibana may take some time. Once it is up, open your browser at http://localhost:5601/

You will encounter the enrollment token screen. To generate a new token, open your terminal and run bin/elasticsearch-create-enrollment-token --scope kibana from the Elasticsearch directory. Paste the token into Kibana, and you should be redirected to the homepage.

Operations in Elastic

Now that Kibana is set up, let’s explore the CRUD (Create, Read, Update, Delete) operations you can perform. However, before you can dive into these operations, you’ll need some data. Fortunately, Elasticsearch provides sample data that you can use to get started and experiment with these operations.

Click “View data” to open the sample dashboard.

In addition to the provided sample data, you might want to create your own custom data. Elasticsearch makes this possible through the use of Dev Tools, which can be accessed via the hamburger menu. This tool will enable you to craft and manage your data for specific use cases. Your Dev Tools screen should look like the one below.

Dev Tool

Queries in Elasticsearch are formulated in JSON format. To get acquainted with some queries that will be utilized in CRUD operations, let’s begin by learning how to determine the number of indices (analogous to tables) within Elasticsearch.

Elasticsearch provides compact _cat APIs, reminiscent of Unix commands, that allow us to achieve this. Here's the command: GET _cat/indices?v=true.
When you run this command, it will show you how many indices exist within your Elasticsearch cluster.

It’s important to note that Elasticsearch supports a variety of request methods, including GET, POST, and DELETE, which can be utilized for different operations

# Get all indices
GET _cat/indices?v=true

# Create a document with ID 1 in the "learning" index
POST learning/_doc/1
{
  "title": "Learning Elastic"
}

# Search all documents in the "learning" index
GET /learning/_search

# Create a second document with ID 2
POST learning/_doc/2
{
  "title": "Learning Kibana"
}

# Delete the document with ID 1
DELETE learning/_doc/1

# Get all indices
GET _cat/indices

Here are the basic CRUD operations we performed above:

  • POST: Create a new document with ID 1 in the “learning” index (the index is created automatically if it doesn’t exist).
  • GET: Search for all documents in the “learning” index.
  • POST: Create another document with ID 2.
  • DELETE: Delete the document with ID 1.
  • GET: Retrieve a list of all available indices.

For updating data, you can still utilize the POST method as demonstrated below. Elasticsearch provides a flexible approach to manage your data.

POST learning/_update/2
{
  "doc": {
    "title": "Learning Kibana updated"
  }
}

You might have observed that in the previous examples, we predefined the document IDs. However, if you omit the ID when creating documents, Elasticsearch will automatically generate unique IDs for them. This allows for effortless document creation without specifying IDs manually.

POST learning/_doc
{
  "title": "Index without specifying an Id"
}

Response:

{
  "_index": "learning",
  "_id": "k5Hg64oBiOFHM0fkxa2q",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 5,
  "_primary_term": 1
}

After creating a new index, the next step is to search for documents within that index. Elasticsearch enables you to perform searches using key-value pairs.
For instance, if we’ve created a field called “title” containing the word “learning,” we can perform a search in the browser using this key-value pair. Elasticsearch will return the matching documents based on the provided criteria. This is a powerful way to retrieve specific data from your indices.

https://localhost:9200/learning/_search?q=title:learning

{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.5619608,
    "hits": [
      {
        "_index": "learning",
        "_id": "1",
        "_score": 0.5619608,
        "_source": {
          "title": "Learning Elastic"
        }
      },
      {
        "_index": "learning",
        "_id": "2",
        "_score": 0.49005115,
        "_source": {
          "title": "Learning Kibana updated"
        }
      }
    ]
  }
}

In the response from Elasticsearch, you’ll notice several keys such as timed_out, hits, and _shards. Let's explore the meanings behind these keys:

  • timed_out: Indicates whether the query execution hit its timeout. If set to true, the query ran longer than the allowed time, and the results may be partial, computed over only a subset of the data.
  • hits: Provides information about the search results, including the total number of hits and an array of the individual documents that match the query criteria. Note that hits.total.relation may be “gte” instead of “eq” when the total is capped (10,000 by default), meaning the real number of matches is at least that large.
  • _shards: Elasticsearch divides indices into smaller units called shards, and this key reports the shard-level execution of the query, including how many shards succeeded or failed.

Understanding these keys in the Elasticsearch response helps you interpret and utilize the search results effectively.

Let’s delve deeper to understand how Elasticsearch performs document searches.

Tokenization In Elastic

When we save a document with a title like “Learning Elastic,” Elasticsearch analyzes the text and stores it as individual tokens in an inverted index. Let’s consider an example with three documents:

  1. Title: “Learning Elastic”
  2. Title: “Learning Kibana”
  3. Title: “Learning Python”

In this case, Elasticsearch will index the data by breaking it down into individual terms, resulting in four terms: “learning,” “elastic,” “kibana,” and “python.” Understanding how Elasticsearch indexes and stores terms is crucial for efficient searching and retrieval of data.

As you can see, Elasticsearch indexes the data with the terms “learning,” “elastic,” “kibana,” and “python,” and maintains a record of which documents each term appears in. For instance:

  • “Learning” appears in all three documents.
  • “Elastic” is found in the first document.
  • “Kibana” is present in the second document.
  • “Python” is located in the third document.

When you search for a term like “Kibana,” Elasticsearch looks up where that term exists in its index and returns the document or documents that match the search criteria. This indexing mechanism enables efficient and accurate document retrieval.
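The inverted index described above can be sketched in a few lines of Python. This is a toy illustration, not Elasticsearch’s actual implementation; real analyzers also handle stemming, stop words, punctuation, and more:

```python
# A toy inverted index: map each term to the set of documents containing it.
from collections import defaultdict

docs = {
    1: "Learning Elastic",
    2: "Learning Kibana",
    3: "Learning Python",
}

# Build the index: lowercase each title and record which docs each term appears in.
inverted_index = defaultdict(set)
for doc_id, title in docs.items():
    for term in title.lower().split():
        inverted_index[term].add(doc_id)

def search(term):
    """Look the term up in the index and return the matching doc IDs."""
    return sorted(inverted_index.get(term.lower(), set()))

print(search("Kibana"))    # → [2]
print(search("learning"))  # → [1, 2, 3]
```

Searching is then just a dictionary lookup, which is why term-based retrieval stays fast even over very large document collections.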

Schema in Elastic Search

Before delving into the concept of a schema in Elasticsearch, it’s essential to grasp the notion of mapping. If you’ve already created a learning index, you can explore the default mapping that Elasticsearch has generated for it. This mapping provides insights into how Elasticsearch has structured and indexed your data, which is vital for effective data management and retrieval

GET /learning/_mapping

Response:
{
  "learning": {
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

As you can observe in the response, Elasticsearch sets the data type of the “title” field to “text” by default. This means the value is analyzed and stored in fully tokenized form, which is what enables full-text search.
Additionally, via the automatically generated keyword sub-field, Elasticsearch retains the complete text as a single token, enabling exact matching, sorting, and aggregations. Understanding how Elasticsearch handles data types is essential for tailoring your searches and operations to your specific needs.
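Here is a small Python sketch of the difference between the tokenized “text” representation and the single-token “keyword” sub-field. The `analyze` function is a deliberately simplified stand-in for Elasticsearch’s standard analyzer:

```python
# Sketch of "text" (analyzed into tokens) vs. "keyword" (one exact token).

def analyze(value):
    """Mimic (crudely) the standard analyzer: lowercase and split on whitespace."""
    return value.lower().split()

title = "Learning Kibana updated"

text_tokens = analyze(title)   # what a match query on "title" searches against
keyword_token = title          # what a term query on "title.keyword" compares with

print(text_tokens)               # → ['learning', 'kibana', 'updated']
print("kibana" in text_tokens)   # → True  (full-text match on a single word succeeds)
print("kibana" == keyword_token) # → False (exact keyword match needs the whole string)
```

This is why a term query for a single word often fails against a keyword field while a match query on the text field succeeds.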

Create Mapping

Let’s try creating a custom mapping in Elasticsearch. We’ll start by creating a new index named “learning3” and defining our mappings to tailor the data structure according to our requirements. This customization allows us to optimize Elasticsearch for our specific use case and data model.

PUT learning3

POST /learning3/_mapping
{
  "properties": {
    "title": {
      "type": "text"
    },
    "subTile": {
      "type": "keyword"
    },
    "age": {
      "type": "integer"
    }
  }
}

GET learning3/_mapping

Response:
{
  "learning3": {
    "mappings": {
      "properties": {
        "age": {
          "type": "integer"
        },
        "subTile": {
          "type": "keyword"
        },
        "title": {
          "type": "text"
        }
      }
    }
  }
}

Elasticsearch offers a wide range of data types, including common ones like long, float, and double, similar to what you’d find in MySQL. However, Elasticsearch also provides flexibility for handling complex data structures, such as JSON objects. This capability enables you to store and work with structured and nested data within Elasticsearch, making it a versatile tool for managing diverse types of information

{
  "user": "aatif",
  "id": 1,
  "age": 30
}

{
  "user": "rahul",
  "id": 2,
  "age": 27
}

POST /learning4/_doc/
{
  "usersInfo": [
    {
      "userId": "1",
      "userName": "aatif"
    },
    {
      "userId": "2",
      "userName": "rahul"
    }
  ]
}

Storing complex data in Elasticsearch is certainly possible, as demonstrated earlier. Now, let’s explore how we can define custom mappings to accommodate and structure this complex data efficiently. This mapping customization allows us to specify how Elasticsearch should index and manage the nested fields and objects within our documents, ensuring optimal data handling.

PUT /learning5

POST /learning5/_mapping
{
  "properties": {
    "usersInfo": {
      "type": "object",
      "properties": {
        "userId": {
          "type": "long"
        },
        "userName": {
          "type": "text"
        }
      }
    }
  }
}

POST /learning5/_doc/
{
  "usersInfo": [
    {
      "userId": 1,
      "userName": "aatif"
    },
    {
      "userId": 2,
      "userName": "rahul"
    }
  ]
}

Note that the field name in the mapping must match the one in the documents (“usersInfo”), and since “userId” is mapped as a long, its values must be numeric; a non-numeric string there would be rejected with a mapping error.

Indeed, this is how you can create and manage complex data types, including nested structures, within Elasticsearch. It provides the flexibility needed to work with diverse data formats efficiently and effectively.

Queries in Elastic

Queries in Elasticsearch are a powerful way to retrieve and search through extensive datasets. As you may recall, we previously installed sample data for an e-commerce application, which provides a practical context for demonstrating various types of queries and their applications within Elasticsearch.

GET /kibana_sample_data_ecommerce/_search

Response (truncated):

"hits": [
  {
    "_index": "kibana_sample_data_ecommerce",
    "_id": "UJGl6ooBiOFHM0fkH5sn",
    "_score": 1,
    "_source": {
      "category": [
        "Men's Clothing"
      ],
      "currency": "EUR",
      "customer_first_name": "Eddie",
      "customer_full_name": "Eddie Underwood",
      "customer_gender": "MALE",
      "customer_id": 38,
      "customer_last_name": "Underwood",
      "customer_phone": "",
      "day_of_week": "Monday",
      "day_of_week_i": 0,
      "email": "eddie@underwood-family.zzz",
      "manufacturer": [
        "Elitelligence",
        "Oceanavigations"
      ],
      ...

Let’s put our understanding of Elasticsearch queries into action by applying them to the existing e-commerce dataset. This hands-on approach will allow us to explore the real-world use cases and practical applications of Elasticsearch queries in a meaningful context.
Below are some common types of queries used in Elasticsearch:

  • Term Query: Finds exact matches for a specified term in a field.
  • Match Query: Searches for a term or phrase in a field; this is the workhorse of full-text search.
  • Range Query: Searches for documents within a specified range of values in a numeric or date field.
  • Bool Query: Combines multiple queries with boolean operators (must, should, must_not) to build complex queries.
  • Multi-Match Query: Searches for a term across multiple fields.

// Term query: exact match on a single value
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "term": {
      "customer_gender": {
        "value": "MALE"
      }
    }
  }
}

// Terms query: match any of the listed values
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "terms": {
      "category": [
        "men",
        "shoes"
      ]
    }
  }
}

// Match query: full-text search on a field
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "products.discount_amount": "0"
    }
  }
}

// Range query: base price between 20 and 100
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "range": {
      "products.base_price": {
        "gte": 20,
        "lte": 100
      }
    }
  }
}

// Bool query: "must" clauses filter, "should" clauses boost relevance
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "products.manufacturer": "Oceanavigations"
          }
        }
      ],
      "should": [
        {
          "range": {
            "products.price": {
              "gte": 10,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}

// Multi-match query: search one value across several fields
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "MALE",
      "fields": ["customer_gender", "day_of_week"]
    }
  }
}

The Elasticsearch queries shared above are among the most commonly used ones. These queries are designed to retrieve data based on matching results, making our search and filtering easier.
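To make the bool query’s semantics concrete, here is a tiny Python simulation over hypothetical documents shaped like the e-commerce sample data. The scoring model is deliberately simplified (real Elasticsearch uses BM25); the point is that “must” clauses filter like AND, while “should” clauses only boost the score of documents that also satisfy them:

```python
# Simulate bool-query semantics: "must" filters, "should" boosts the score.

docs = [
    {"id": 1, "manufacturer": "Oceanavigations", "price": 50},
    {"id": 2, "manufacturer": "Oceanavigations", "price": 500},
    {"id": 3, "manufacturer": "Elitelligence",   "price": 30},
]

def bool_query(docs, must, should):
    results = []
    for doc in docs:
        # must: every clause has to match, or the document is excluded
        if not all(pred(doc) for pred in must):
            continue
        # should: each matching clause adds to the score but never excludes
        score = 1.0 + sum(1.0 for pred in should if pred(doc))
        results.append((doc["id"], score))
    # Higher-scoring documents come first, as in Elasticsearch results
    return sorted(results, key=lambda r: -r[1])

hits = bool_query(
    docs,
    must=[lambda d: d["manufacturer"] == "Oceanavigations"],
    should=[lambda d: 10 <= d["price"] <= 100],
)
print(hits)  # → [(1, 2.0), (2, 1.0)]  doc 3 is filtered out; doc 1 is boosted
```

Doc 3 fails the “must” clause and disappears entirely, while doc 1 outranks doc 2 because it also satisfies the “should” range.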

Aggregations in Elastic

In Elasticsearch, aggregations are a powerful feature that allows you to perform data analysis and computation on your dataset to gain valuable insights. Aggregations enable you to summarise, group, and process data in various ways, similar to the GROUP BY clause in SQL.

With aggregations, you can perform operations such as counting unique values, grouping documents into buckets, and computing statistics like minimums, maximums, and averages. For example, to list all the categories available in the sample data:

// List all the categories available
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,  // return no documents, only the aggregation results
  "aggs": {
    "getTypeOfAllCategories": {  // you can name the aggregation anything
      "terms": {
        "field": "category.keyword",
        "size": 100  // default is 10; this fetches up to 100 buckets
      }
    }
  }
}

Response:

"aggregations": {
  "getTypeOfAllCategories": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "Men's Clothing",
        "doc_count": 2024
      },
      {
        "key": "Women's Clothing",
        "doc_count": 1903
      },
      {
        "key": "Women's Shoes",
        "doc_count": 1136
      }
    ]
  }
}

In the response, we obtained a list of the unique categories available in the dataset. By default, a terms aggregation returns at most 10 buckets, but in our case we wanted up to 100 categories, so we raised the “size” parameter inside the aggregation to request a larger result set.
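Conceptually, a terms aggregation is a group-by count over a keyword field. A minimal Python sketch with made-up documents shaped like the sample data:

```python
# Simulate a terms aggregation: count documents per category value.
from collections import Counter

docs = [
    {"category": "Men's Clothing"},
    {"category": "Men's Clothing"},
    {"category": "Women's Clothing"},
    {"category": "Women's Shoes"},
]

# Equivalent of: "aggs": { "terms": { "field": "category.keyword" } }
buckets = Counter(doc["category"] for doc in docs).most_common()
for key, doc_count in buckets:
    print(f"{key}: {doc_count}")
# → Men's Clothing: 2
#   Women's Clothing: 1
#   Women's Shoes: 1
```

Like Elasticsearch, this returns buckets sorted by document count in descending order.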

The aggregation above is a bucket aggregation, but Elasticsearch also provides another type called a metric aggregation. Suppose we want to determine the highest base price among products; we can use a metric aggregation to achieve this.

We can use the max keyword to retrieve the maximum value from Elasticsearch for a specific numeric field, such as the base price. Here's an example of how to structure the query:

// Get the maximum value of products.base_price
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,  // return no documents, only the aggregation result
  "aggs": {
    "getMaxPrice": {
      "max": {
        "field": "products.base_price"
      }
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "getMaxPrice": {
      "value": 1080
    }
  }
}

This query will return the maximum base price among the products in your Elasticsearch index.

Similarly, you can find the minimum or average value by replacing "max" with "min" or "avg" in the aggregation query.
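Conceptually, these metric aggregations reduce a numeric field to a single value, as this small Python sketch over hypothetical base prices shows:

```python
# Simulate metric aggregations (max / min / avg) over a numeric field.

base_prices = [11.99, 24.99, 1080.0, 64.5]  # made-up sample values

metrics = {
    "max": max(base_prices),                     # like the "max" aggregation
    "min": min(base_prices),                     # like the "min" aggregation
    "avg": sum(base_prices) / len(base_prices),  # like the "avg" aggregation
}

print(metrics["max"])  # → 1080.0
```

Elasticsearch computes the same reductions, but distributed across shards and without loading every document into memory at once.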

Keep in mind that these queries are simplified examples, and you can combine them with other aggregations or query clauses as needed to retrieve more complex information from your Elasticsearch data.

Well, there you have it — the essentials to kickstart your journey with ELK.
Remember that Elasticsearch is a powerful and versatile tool, so you can tailor your learning journey to match your specific needs and interests.

I hope you found this article engaging and gained some valuable insights.

If you enjoyed this content, don’t forget to follow me on Twitter and subscribe here for more captivating articles.
Show your appreciation with a round of applause 👏 👏 before you leave
