Architecture of Elastic Search & Installation — Part 2

Prabhu Rajendran

Published in

Everything at Once

Aug 11, 2019

Part 1, Introduction to Elasticsearch, is here!

Nodes & Clusters (the center of the Elasticsearch architecture)

Node: a server that stores data and is part of a cluster.
Cluster: a collection of nodes (i.e. servers). Each node holds a part of the data that has been added to the cluster.
Each node takes part in the indexing and search capabilities of the cluster. A given search query may match data held on node A and on node B, so the query is matched across nodes and the combined result is returned (we will see later how data is searched across multiple nodes in a cluster). Every node supports indexing data, searching data, and manipulating existing data.

3. Every node in the cluster can handle HTTP requests from clients that want to talk to the cluster. This is done through the HTTP REST API that the cluster exposes.

4. Nodes know about every other node in the cluster and talk to each other over the transport layer, while the HTTP layer is used for communicating with clients (discussed later).

5. By default, every node is eligible to become the master node, and the cluster elects one of them. The master node is responsible for coordinating changes to the cluster, such as adding or removing nodes and creating or deleting indices. It also updates the state of the cluster, and it is the only node that does this.

6. Clusters and nodes are identified by unique names. The default cluster name is “elasticsearch”. A node's default name was derived from a generated UUID in older versions, while 7.x uses the machine's hostname. By default, a node joins the cluster called elasticsearch.

7. Scalability does not become an issue until the data grows very large (discussed later).

Now that we know a little about what nodes and clusters are, let's jump to indices and documents.

Indices & Documents: Each item of data stored in the cluster is called a document (a JSON object), and every document is stored in an index. Documents are spread across the nodes in the cluster.
How are documents organized in a cluster? In indices: an index is a collection of documents.

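Each document is uniquely identified by an ID (stored in the _id field), which is used when retrieving or updating it. Note that it is index names, not document IDs, that must be lowercase.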

A word on types: we will not go deep here, because mapping types are being phased out as of Elasticsearch 7.x. Where an API from previous versions expects a type in the path, we will use “_doc” for clarity.
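To make the relationship between an index, a document, and its ID concrete, here is a minimal Python sketch that builds (but does not send) the HTTP request for storing one document. The index name products, the ID 1, and the document fields are made up for illustration; the PUT /<index>/_doc/<id> path is the 7.x form that uses _doc in place of a type.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_id, document):
    """Build a PUT request that stores `document` under `doc_id` in `index`."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index name, ID, and fields, used only for illustration.
req = build_index_request(
    "http://localhost:9200", "products", "1",
    {"name": "Coffee maker", "price": 49},
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running cluster; the sketch stops short of that so it can be read and run without one.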

As mentioned earlier, Elasticsearch is scalable because of its distributed architecture, and one of the reasons for that is sharding. Let's see more about it.

Sharding: why is it needed? Suppose we have an index with a lot of documents (1 TB of data) and a cluster of 2 nodes, each with 512 GB available to store data. Neither node has enough space for the whole index unless the index data is split. This is where sharding comes into the picture: it solves the problem by dividing an index into smaller pieces.

So, a shard contains a subset of an index's data, and each shard is a fully functional and independent index in its own right.

By splitting an index's data into shards, the shards can be distributed across multiple nodes. An index can then be stored even if no single node has enough disk capacity to hold the entire data set.
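The capacity argument can be checked with a little arithmetic. The sketch below takes the 1 TB index from the example (treated as 1024 GB) and assumes two nodes with 512 GB of disk each; it spreads equally sized shards round-robin and reports whether every node can hold its share. The function name and the round-robin placement are simplifications for illustration; a real cluster balances shards itself and would not fill a disk completely.

```python
def assign_shards(index_size_gb, num_shards, node_capacity_gb):
    """Spread equally sized shards round-robin over nodes.

    Returns the disk used on each node and whether every node has room.
    """
    shard_size = index_size_gb / num_shards
    usage = [0.0] * len(node_capacity_gb)
    for shard in range(num_shards):
        node = shard % len(usage)  # round-robin placement (a simplification)
        usage[node] += shard_size
    fits = all(used <= cap for used, cap in zip(usage, node_capacity_gb))
    return usage, fits


# One unsplit 1024 GB index cannot fit on a single 512 GB node.
print(assign_shards(1024, 1, [512, 512]))  # ([1024.0, 0.0], False)

# Split into 4 shards of 256 GB, two shards land on each node.
print(assign_shards(1024, 4, [512, 512]))  # ([512.0, 512.0], True)
```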

A great thing about shards is that a shard can be hosted on any node in the cluster. That said, the shards of an index will not necessarily be distributed across several nodes; they can also all sit on the same node.

Why is sharding important? Operations can be distributed across multiple nodes and parallelized, and this is completely transparent to the user.

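How do we specify the number of shards for an index? It is set when the index is created. By default an index was created with 5 primary shards before version 7.0, and with 1 primary shard from 7.0 onwards. The number of primary shards cannot be changed after the index has been created.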
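As a sketch of how the shard count is given at creation time, the Python snippet below builds (without sending) a create-index request whose body carries the number_of_shards and number_of_replicas settings. The index name products and the values 3 and 1 are assumptions chosen for illustration.

```python
import json
from urllib.request import Request


def build_create_index_request(base_url, index, shards, replicas):
    """Build a PUT request that creates `index` with explicit shard settings."""
    body = {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can still be changed later
        }
    }
    return Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index name and shard counts.
create_req = build_create_index_request("http://localhost:9200", "products", 3, 1)
print(create_req.get_method(), create_req.full_url)
print(create_req.data.decode("utf-8"))
```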

Now we all know what sharding is. Let's look at something different now.

Replication (copy of a shard): Hardware can fail at any time (for example, a hard drive breaking) and software can be buggy at times, so we need some kind of fault-tolerance and failover mechanism. This is where replication comes into the picture.

Replication Serves 2 purposes.

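Provides high availability when a node or shard fails (the data stays available even if one copy is lost).
Increases search performance: searches can be executed in parallel on replicas, meaning replicas are part of the cluster's search capabilities. With the defaults before 7.0, an index has 5 primary shards and one replica of each (10 shards in total per index); from 7.0 onwards the default is 1 primary shard and 1 replica. Replicas are only allocated in a cluster of more than one node, because a replica is never placed on the same node as its primary.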
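The shard totals follow from a one-line formula: every primary shard counts once, plus once for each of its replicas. A minimal sketch (the function name is ours, not an Elasticsearch API):

```python
def total_shards(primaries, replicas_per_primary):
    """Total number of shard copies for one index."""
    return primaries * (1 + replicas_per_primary)


print(total_shards(5, 1))  # defaults before 7.0 -> 10
print(total_shards(1, 1))  # defaults from 7.0 on -> 2
```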

Sum Up:

How does Elasticsearch ensure high availability? Having replica shards means that if a node in a multi-node cluster fails, you will not lose any data. In that case you will still have a replica shard available on a different node.
What is a primary shard? A shard that has been (or can be) replicated; it is the original copy that replicas are made from.
What is a replica shard? A copy of a primary shard.
What is a replication group? A primary shard together with its replicas.

If a shard is replicated 5 times, how are the replicas updated? How does Elasticsearch keep everything in sync? It uses a model called primary-backup for data replication, which means that the primary shard in a replication group acts as the entry point for indexing operations.

Say 5 shards are each replicated 5 times, for instance. How are the replicas updated whenever data changes or is removed? Clearly the replicas need to be kept in sync.

Operations that add, update, or remove documents are routed to the primary shard, which validates the operation and makes sure everything is good. The operation is then performed on the primary shard, and the change is forwarded in parallel to its replicas.
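The write path described above can be pictured with a toy model. This is only an illustration of the primary-backup idea, not how Elasticsearch is implemented: the class names are invented, the validation is deliberately trivial, and the replicas are updated one after another rather than in parallel.

```python
class ShardCopy:
    """One copy of a shard: a plain mapping of document ID to document."""

    def __init__(self):
        self.docs = {}

    def apply(self, op, doc_id, source=None):
        if op == "delete":
            self.docs.pop(doc_id, None)
        else:  # "index" covers both adding and updating a document
            self.docs[doc_id] = dict(source)


class ReplicationGroup:
    """A primary shard plus its replicas (toy model)."""

    def __init__(self, num_replicas):
        self.primary = ShardCopy()
        self.replicas = [ShardCopy() for _ in range(num_replicas)]

    def write(self, op, doc_id, source=None):
        # 1. The primary validates the operation first.
        if op not in ("index", "delete"):
            raise ValueError(f"unsupported operation: {op}")
        if op == "index" and not isinstance(source, dict):
            raise ValueError("an index operation needs a document")
        # 2. The operation is performed on the primary shard.
        self.primary.apply(op, doc_id, source)
        # 3. The same operation is forwarded to every replica
        #    (sequentially here; the real system does this in parallel).
        for replica in self.replicas:
            replica.apply(op, doc_id, source)

    def in_sync(self):
        return all(r.docs == self.primary.docs for r in self.replicas)


group = ReplicationGroup(num_replicas=2)
group.write("index", "1", {"title": "hello"})
group.write("index", "1", {"title": "hello again"})  # update
group.write("index", "2", {"title": "bye"})
group.write("delete", "2")
print(group.primary.docs, group.in_sync())
```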

How does searching for data work? The client communicates with the Elasticsearch cluster over HTTP, and the cluster then does its magic based on the index mentioned in the HTTP request. When the results are ready, the cluster responds to the client with them.

Say we have a cluster of 3 nodes with 3 shards and their replicas distributed across them. The node that receives the client's request is called the coordinating node; it then broadcasts the request to the other nodes.
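The role of the coordinating node can be sketched as a scatter-gather loop: send the query to every shard, collect the hits, and merge them into one ranked list. The snippet below is a toy model under heavy assumptions: each shard is a plain dictionary, the score is just how often the term occurs in the text, and everything runs in one process. Real scoring and the real network protocol are far more involved.

```python
def search_shard(shard_docs, term):
    """Return (score, doc_id) hits from one shard; score = term frequency."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term.lower())
        if score:
            hits.append((score, doc_id))
    return hits


def coordinate_search(shards, term, size=10):
    """Toy coordinating node: query every shard, then merge the results."""
    all_hits = []
    for shard_docs in shards:  # scatter: one request per shard
        all_hits.extend(search_shard(shard_docs, term))
    # gather: best score first, ties broken by document ID
    all_hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in all_hits[:size]]


# Three hypothetical shards, each holding a subset of the index's documents.
shards = [
    {"a": "red shoes red laces"},
    {"b": "blue shoes"},
    {"c": "red hat"},
]
print(coordinate_search(shards, "red"))  # ['a', 'c']
```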

How are documents distributed across shards? By routing. Elasticsearch uses a simple formula: shard = hash(routing) % number of primary shards in the index, where the routing value defaults to the document's ID.

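Remember that the number of shards of an index cannot be changed once the index is created. Taking the routing formula into consideration, we can see why: changing the number of shards would change the result of the formula, so existing documents could no longer be found on the shard where they were stored.
Changing the number of shards of an index and introducing custom routing both cause problems, which we will discuss in more depth in upcoming parts.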
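The routing formula is easy to try out. In the sketch below, zlib.crc32 is only a stand-in hash chosen because it ships with Python (Elasticsearch uses its own hash function), and the document IDs are invented. It shows that the same ID always maps to the same shard, and that most IDs would map to a different shard if the number of primary shards changed from 5 to 6.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """shard = hash(routing) % number of primary shards.

    zlib.crc32 is a stand-in; it is NOT the hash Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs; by default the routing value is the ID.
doc_ids = [f"doc-{i}" for i in range(1000)]

# The same ID always lands on the same shard...
assert shard_for("doc-1", 5) == shard_for("doc-1", 5)

# ...but only while the number of primary shards stays the same.
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

A document indexed under the old shard count would be looked up on the wrong shard afterwards, which is why the setting is fixed at index creation.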

Hopefully that clears up some of the basics of Elasticsearch. Now let's move on to installing Elasticsearch.

Let's sign up and get the 14-day trial from Elastic Cloud.

Running Elasticsearch & Kibana in Elastic Cloud

Launch a new deployment by selecting a region and cloud platform.

Installing Elasticsearch on Mac/Linux :

Elasticsearch runs on Java. The 7.x archives ship with a bundled JDK; for older versions, Java has to be installed on the machine first.
From elastic.co, download the Elasticsearch archive.
Extract the downloaded archive, change into the extracted directory, and start Elasticsearch by running bin/elasticsearch.
We can then check that it is up with curl http://localhost:9200 (9200 is the default port for Elasticsearch). The response is a JSON document that includes the cluster name elasticsearch, as mentioned above.
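The same check can be done from Python. The sketch below has two parts: fetch_root calls the root endpoint of a node (it needs a running node on localhost:9200), and summarize_root picks out the node name, cluster name, and version from the JSON. The sample dictionary is hand-written to show the shape of the fields used here; it is not captured output, and its values are placeholders.

```python
import json
from urllib.request import urlopen


def fetch_root(url="http://localhost:9200"):
    """GET the root endpoint of a running node and parse the JSON reply."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_root(info):
    """One-line summary of the fields we care about."""
    return (
        f'{info["name"]} in cluster {info["cluster_name"]} '
        f'(version {info["version"]["number"]})'
    )


# Hand-written sample with placeholder values, not real output.
sample = {
    "name": "node-1",
    "cluster_name": "elasticsearch",
    "version": {"number": "7.3.0"},
}
print(summarize_root(sample))

# With a node running locally:
# print(summarize_root(fetch_root()))
```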

Configuration in Elastic Search (config/elasticsearch.yml)

1. Default values are already in place in elasticsearch.yml; no changes are needed unless you want to adjust one of the common settings.
2. Cluster name (default elasticsearch, as mentioned earlier): if a change is required, uncomment cluster.name and set the new name.

3. As with the cluster, we can change the node name as well (node.name). The default name can be pretty confusing when something goes wrong.

4. Network: the address and port on which the node listens for requests.

5. Discovery

a. Discovery nodes: the list of nodes that the node we are configuring will try to contact when it starts up and joins an existing cluster. When joining a cluster, it tries to find a node at one of the network addresses provided (this list gets hard to maintain as more nodes are added). It only needs to reach a single node to retrieve the current cluster state; that node does not have to be the master node for the new node to join the cluster.

b. Master node (this setting is easy to mess up; we will look at it in depth in later parts): for development on a local machine, no changes are required.
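To tie the settings above together, here is a small Python sketch that renders a handful of them as lines in the flat key: value style used by elasticsearch.yml. All values are placeholders for a local setup. The discovery.seed_hosts key is the 7.x name for the list of discovery nodes; older versions use a different key, so treat that line as an assumption about your version.

```python
def render_settings(settings):
    """Render flat settings as 'key: value' lines (lists become YAML flow lists)."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            rendered = "[" + ", ".join(f'"{item}"' for item in value) + "]"
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines)


# Placeholder values for a single local node.
settings = {
    "cluster.name": "my-application",
    "node.name": "node-1",
    "network.host": "127.0.0.1",
    "http.port": 9200,
    "discovery.seed_hosts": ["127.0.0.1"],  # 7.x setting name
}
print(render_settings(settings))
```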

Installing Kibana on Mac/Linux

From elastic.co, download the Kibana archive.
Extract the downloaded archive, change into the extracted directory, and start Kibana by running bin/kibana (similar to Elasticsearch).
We can then reach it at http://localhost:5601, in a browser or with curl (5601 is the default port for Kibana). By default Kibana tries to connect to Elasticsearch on port 9200.