Architecture of Elasticsearch

Published in YavarTechWorks · 3 min read · Oct 24, 2022

Hi friends, in this blog I will walk through the architecture of Elasticsearch.

Elasticsearch is an open-source analytics and full-text search engine. With it we can build complex search functionality, similar to Google search, as well as a powerful analytics platform.

Node:

A node is a running instance of Elasticsearch, and it is what actually stores the data. We can add as many nodes as we need.

We can run any number of nodes on the same machine.

Cluster:

A cluster is a collection of related nodes. One cluster is usually enough, although nothing stops us from running several.

How do we create a cluster:

When a node starts, it either forms a new cluster on its own or joins an existing cluster, if we have configured it to do so.

A single-node cluster has limits in terms of availability and scalability. For development, though, a cluster with a single node is enough.
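As a rough illustration of "if we have configured it to do so", here is a minimal Python sketch that renders the text of an elasticsearch.yml file for the two cases: a standalone development node, and a node that joins an existing cluster. The setting names (cluster.name, node.name, discovery.seed_hosts, discovery.type) are real Elasticsearch settings; the helper function, the cluster and node names, and the host address are assumptions made up for this example.

```python
import json


def render_node_config(cluster_name, node_name, seed_hosts=None):
    """Render a minimal elasticsearch.yml as text (hypothetical helper).

    seed_hosts lists addresses of nodes in an existing cluster to contact.
    Without it, the node is configured to run as a one-node cluster.
    """
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
    ]
    if seed_hosts:
        # json.dumps yields ["a", "b"], which is also a valid YAML list
        lines.append(f"discovery.seed_hosts: {json.dumps(seed_hosts)}")
    else:
        # Single-node discovery: the node forms its own cluster
        lines.append("discovery.type: single-node")
    return "\n".join(lines)


# Development setup: one node, one cluster
print(render_node_config("dev-cluster", "node-1"))
print()
# A second node pointed at the first (address is an assumption)
print(render_node_config("dev-cluster", "node-2", ["10.0.0.1:9300"]))
```

Nodes end up in the same cluster only when they share the same cluster.name and can reach each other through the seed hosts.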

How data is organized and stored:

Each unit of data we store in the cluster is called a document. Documents are JSON objects.

{
  "_index": "people",
  "_source": {
    "name": "Edison",
    "age": "26"
  }
}

Every document is stored within an index. The index groups documents logically and also provides the configuration options for scalability and availability. An index is a collection of similar documents. Search queries run against indices.
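To make the relationship between documents and indices concrete, here is a small Python sketch that builds the HTTP requests for storing the document above in the people index and then searching that index. The PUT /<index>/_doc/<id> and GET /<index>/_search endpoints and the match query are part of the Elasticsearch REST API; the helper names and the document id "1" are assumptions for illustration. Nothing is sent over the network, so no running cluster is needed.

```python
import json


def build_index_request(index, doc_id, source):
    """Return (method, path, body) for storing one document in an index."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(source)


def build_search_request(index, field, text):
    """Return (method, path, body) for a full-text match query on one field."""
    query = {"query": {"match": {field: text}}}
    return "GET", f"/{index}/_search", json.dumps(query)


# The document from the example above; the id "1" is made up
method, path, body = build_index_request(
    "people", "1", {"name": "Edison", "age": "26"}
)
print(method, path)
print(body)

# A search always targets an index (or several), never a bare document
method, path, body = build_search_request("people", "name", "Edison")
print(method, path)
print(body)
```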

Sharding and scalability:

Sharding is a way to divide an index into separate pieces. Each piece is a shard. Note that sharding is done at the index level, not at the node or cluster level. Each shard is independent and is itself an Apache Lucene index, so an Elasticsearch index consists of one or more Lucene indices.

A shard has no predefined size. It grows as documents are added.

A shard may store up to about two billion documents.

The purpose of sharding:

Store more data than a single node could hold.
Divide the index into smaller chunks that are easier to fit onto nodes.
Enable queries to be distributed and run in parallel across shards.
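To see how documents spread across shards, here is a simplified Python sketch of routing. Elasticsearch decides the shard by hashing a routing value (the document id by default) and taking the result modulo the number of primary shards. The real implementation uses a Murmur3 hash plus some extra factors; this sketch swaps in zlib.crc32 as a stand-in, so the shard numbers it prints will not match a real cluster. The function names and document ids are assumptions for illustration.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number (simplified stand-in hash)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


def group_by_shard(doc_ids, number_of_primary_shards):
    """Show which documents would land on which shard."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


doc_ids = [f"person-{n}" for n in range(10)]
for shard, ids in group_by_shard(doc_ids, 3).items():
    print(f"shard {shard}: {ids}")
```

Because the shard number depends on the number of primary shards, that number cannot simply be changed after the index is created without moving documents around.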

By default, an index is created with a single primary shard (since Elasticsearch 7.0; earlier versions defaulted to five).

Replica:

What happens if a node's hard drive fails? The data on it is lost. To guard against this, Elasticsearch supports replication for fault tolerance.

Replication is configured at the index level. It works by creating copies of shards, referred to as replica shards. The shard that has been copied is called the primary shard.
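Since both sharding and replication are index-level settings, they can be given when the index is created. The Python sketch below builds the request for that. The PUT /<index> endpoint and the number_of_shards and number_of_replicas settings are real; the helper names and the chosen values are assumptions for illustration, and no request is actually sent.

```python
import json


def build_create_index_request(index, primary_shards=1, replicas=1):
    """Return (method, path, body) for creating an index with explicit settings."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                # Replicas are counted per primary shard
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"/{index}", json.dumps(body)


def total_shard_copies(primary_shards, replicas):
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


method, path, body = build_create_index_request("people", primary_shards=3, replicas=2)
print(method, path)
print(body)
print("shard copies in the cluster:", total_shard_copies(3, 2))
```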

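On a single-node cluster, replica shards cannot be assigned, because a replica is never placed on the same node as its primary shard. Once we add a second node, the replica shards are allocated to it.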
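The effect of adding a node can be shown with a toy allocator in Python. It follows only one rule, that two copies of the same shard never share a node, and otherwise picks the least loaded node. Elasticsearch's real allocation logic weighs many more factors, so treat this as a sketch; the function names and node names are made up. The health colours do mirror the cluster health API: green when every copy is assigned, yellow when only replicas are unassigned, red when a primary is unassigned.

```python
def allocate(primary_shards, replicas, nodes):
    """Place shard copies on nodes (toy model, not Elasticsearch's allocator)."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(primary_shards):
        # Copy 0 is the primary (P), the remaining copies are replicas (R)
        for copy in range(1 + replicas):
            label = f"{'P' if copy == 0 else 'R'}{shard}"
            # Only nodes that hold no copy of this shard yet are eligible
            candidates = [
                n for n in nodes
                if not any(held[1:] == str(shard) for held in placement[n])
            ]
            if candidates:
                target = min(candidates, key=lambda n: len(placement[n]))
                placement[target].append(label)
            else:
                unassigned.append(label)
    return placement, unassigned


def health(unassigned):
    """Translate unassigned copies into a cluster health colour."""
    if any(label.startswith("P") for label in unassigned):
        return "red"
    return "yellow" if unassigned else "green"


for nodes in (["node-1"], ["node-1", "node-2"]):
    placement, unassigned = allocate(primary_shards=1, replicas=1, nodes=nodes)
    print(placement, "unassigned:", unassigned, "health:", health(unassigned))
```

With one node the replica stays unassigned and the cluster reports yellow; with two nodes the replica finds a home and the cluster turns green.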

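Typically, one or two replicas per shard are enough. Elasticsearch also supports snapshots, which play the role of a database backup.

Apart from fault tolerance, replicas also increase search throughput, because a search can be served by a replica shard as well as by the primary.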

Thank you for reading. Have a nice day!

Written by Edison Devadoss