How Does Elasticsearch Work?

Anurag Patel
Geek Farmer
5 min read · Dec 25, 2022


ES: Elasticsearch

Prerequisites

In our previous article, we covered the basics of Elasticsearch and its various uses. Now, we will delve deeper into how Elasticsearch actually works. Understanding its inner workings can help us better utilize its capabilities and troubleshoot any issues that may arise. So let’s take a closer look at how Elasticsearch operates.

To understand how ES works, we should first know some basic concepts of Elasticsearch.

  1. Logical Concepts: How ES organizes the data
  2. Backend Components

An Elasticsearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns).

Logical Concepts:

Documents

Documents are the basic unit of information. We store our data as documents, which are JSON objects. So how is this data organized in the cluster? The answer is indices. In the world of relational databases, a document can be compared to a row in a table that represents a given entity.

Documents can hold text or structured data encoded in JSON (field values can be things like numbers, strings, and dates).

Example: A document can represent an encyclopedia article or a log entry from a web server.
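
As a minimal sketch of what this looks like in practice (the index name and fields below are made up for illustration), a single REST call adds a JSON document to an index:

POST web-logs/_doc
{
  "timestamp": "2022-12-25T10:15:00Z",
  "status": 200,
  "message": "GET /index.html served in 12ms"
}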

Index

ES indices are logical partitions of documents and can be compared to a database in the world of relational databases. An index is the highest-level entity that you can query against in ES. For example, we can have separate indices for cases, providers, vehicles, and policies.
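
A query is always scoped to an index (or a set of indices). As a minimal sketch, using the providers index mentioned above, a match-all search looks like this:

GET providers/_search
{
  "query": {
    "match_all": {}
  }
}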

Type

Each index has one or more mapping types that are used to divide documents into logical groups. A type can be compared to a table in the world of relational databases. (Note that mapping types were deprecated in Elasticsearch 6.x and removed in 7.x and later, where each index holds a single type of document.)

Inverted Index

It is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in. It is a hashmap-like data structure that directs you from a word to a document.
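
As a tiny illustration (the documents and their contents are made up), suppose document 1 contains "the quick brown fox" and document 2 contains "the lazy dog". The inverted index built from them maps each term to the documents it appears in:

quick → 1
brown → 1
fox   → 1
lazy  → 2
dog   → 2
the   → 1, 2

A search for "lazy fox" then only has to look up two terms and combine their document lists, instead of scanning every document.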

What happens if we have a typo in our search query, i.e. what happens if we search for “Jefferies” instead of “Jeffery”?

For cases like the one above, the text analyzer plays an important part. In full-text search, analyzers are responsible for tokenizing and normalizing the input text, stemming, stopword removal, and so on, so that searches return the correct results.

POST _analyze
{
  "tokenizer": "letter",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The letter tokenizer splits the text on every non-letter character (dropping the digit and the apostrophe), producing:

[ The, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, s, bone ]
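
Analyzers alone do not fix typos like the “Jefferies” example above; for that, Elasticsearch also supports fuzzy matching. A minimal sketch (the people index and name field are hypothetical):

GET people/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Jefferies",
        "fuzziness": "AUTO"
      }
    }
  }
}

With "fuzziness": "AUTO", terms within one or two character edits of the query term can still match, which helps absorb small typos.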

Backend Components:

Cluster

In ES, data is stored on nodes. There can be any number of nodes on a machine, and each node belongs to a cluster. So a cluster is simply a set of nodes that together hold your data and provide indexing and search across all of it.
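
A quick way to see the state of a cluster is the cluster health API, which reports (among other things) the cluster status, the number of nodes, and the number of active shards:

GET _cluster/health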

Node

A node is a single server that is a part of a cluster. A node stores data and participates in the cluster’s indexing and search capabilities. An ES node can be configured in different ways:

  • Master Node: Controls the ES cluster and is responsible for all cluster-wide operations like creating/deleting an index and adding/removing nodes.
  • Data Node: Stores data and executes data-related operations such as search and aggregation.
  • Client Node: Forwards cluster requests to the master node and data-related requests to data nodes.
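
To see which nodes make up a cluster and which roles each one holds, the cat nodes API gives a compact, human-readable listing:

GET _cat/nodes?v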

Shards

ES provides the ability to subdivide an index into multiple pieces called shards. Each shard is, in effect, a fully functional index in its own right. Sharding exists for scalability: with it, you can store billions of documents within a single index and spread them across multiple nodes.
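
The number of primary shards is fixed when an index is created. A minimal sketch (the index name and shard count are arbitrary):

PUT my-index
{
  "settings": {
    "number_of_shards": 3
  }
}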

Replicas

ES allows you to make one or more copies of your index’s shards which are called “replica shards” or just “replicas”. Basically, a replica shard is a copy of a primary shard. Each document in an index belongs to one primary shard. Replicas provide redundant copies of your data to protect against hardware failure and increase capacity to serve read requests like searching or retrieving a document.

A replica is a copy of a primary shard and has two purposes:

  1. Increase Failover: It can be promoted to a primary shard if the primary fails
  2. Increase Performance: Get and search requests can be handled by primary or replica shards
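
Unlike the primary shard count, the number of replicas can be changed at any time on a live index. A minimal sketch, continuing with the hypothetical my-index from above:

PUT my-index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}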

Working: Summary

Raw data flows into ES from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in ES. An ES index is a collection of documents that are related to each other. ES stores data as JSON documents. Each document correlates a set of keys (names of fields or properties) with their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).

ES uses a data structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.

During the indexing process, ES stores documents and builds an inverted index to make the document data searchable in near real-time. Indexing is initiated with the index API, through which you can add or update a JSON document in a specific index.
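
As a minimal sketch of the index API (the index name, document ID, and fields are made up), the same call both creates a document with the given ID and replaces it if it already exists:

PUT my-index/_doc/1
{
  "title": "How Does Elasticsearch Work?",
  "published": "2022-12-25"
}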

References

Elasticsearch from the Bottom Up, Part 1

Tokenizer reference | Elasticsearch Guide [8.5] | Elastic

Normalizers | Elasticsearch Guide [8.5] | Elastic

Stemming | Elasticsearch Guide [8.5] | Elastic

I hope you enjoyed reading about Elasticsearch and how it works. If you found this article helpful or have any further questions, please don’t hesitate to reach out to me through the comments.

For more updates and insights on the latest tech trends, be sure to follow me on Twitter or LinkedIn. Thanks for reading, and I look forward to connecting with you on social media.

Twitter: https://twitter.com/geekfarmer_

Linkedin: https://www.linkedin.com/in/geekfarmer
