Elastic Stack Architecture
Hello everyone! This article gives an overview of the Elastic Stack architecture.
What is the Elastic Stack (ELK)?
The Elastic Stack (ELK) is an open-source data analysis and management platform. It lets you collect data from virtually any source, then search, analyze, filter, and visualize it in near real-time.
The stack has four main components:
- Elasticsearch: A distributed, RESTful search engine that stores all collected data.
- Logstash: The data processing component of the Elastic Stack, which transforms incoming data and sends it to Elasticsearch.
- Kibana: A web interface for searching and visualizing logs.
- Beats: Lightweight, single-purpose data shippers that can send data from machines to either Logstash or Elasticsearch.
Elasticsearch Concepts
Cluster: A collection of one or more nodes (servers) that holds all your data together and provides unified indexing and search capabilities across all nodes.
Node: A single server that is part of your cluster, which stores your data and participates in the indexing and search capabilities of the cluster.
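For instance, you can inspect a running cluster and its nodes with these standard APIs (shown in Kibana Dev Tools console syntax, like the mapping example later in this post):
# Overall cluster status: green, yellow, or red
GET _cluster/health

# One row per node, including its roles
GET _cat/nodes?v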
Elasticsearch nodes can be divided into master nodes, data nodes, and coordinating nodes (formerly known as client nodes).
- Master Node: does not store any index data or shards. Its responsibility is to maintain the detailed cluster state (index and shard metadata) and help the data and other nodes locate the right indices and shards. A minimum of three master-eligible nodes is recommended, with only one elected active at any given time.
- Data Nodes: hold indexed data and perform data-related operations such as search and aggregation. They can be differentiated into hot, warm, and cold data nodes.
- Hot Data Nodes: These data nodes perform all indexing within the cluster. They also hold the most recent indices, since these usually tend to be queried most often.
- Warm Data Nodes: These data nodes are designed to handle large numbers of read-only indices that are unlikely to be queried frequently.
- Cold Data Nodes: When an index enters the cold phase, index lifecycle management (ILM) lowers the index priority once again so that hot and warm indices recover first. ILM then freezes the index and moves it to the cold nodes. Once this is done, the index waits out the configured retention period (60 days in this example) before entering the delete phase. A minimal ILM policy sketch follows this list.
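The hot/warm/cold/delete flow above maps directly onto an ILM policy. This is a minimal sketch; the policy name and age thresholds are illustrative placeholders, not recommendations (the delete phase's min_age is set 60 days after the cold phase begins, echoing the example above):
# Illustrative ILM policy covering all four phases
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "30d" },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "180d",
        "actions": {
          "set_priority": { "priority": 0 },
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "240d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}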
Index: A collection of documents that share somewhat similar characteristics. An index is roughly the equivalent of a database in a relational system.
Type: Equivalent to a database table or view. Lucene, on which Elasticsearch is built, has no concept of document types, so the type was stored in a _type field. Note that mapping types were deprecated in Elasticsearch 6.x and removed in 7.x.
Document: JSON objects stored within an Elasticsearch index and considered the base unit of storage.
Shard: A collection of documents. An index is stored across multiple distributed shards. There are two types of shards: primary shards and their copies, which are called replicas.
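Shard and replica counts are defined per index at creation time. For example (the index name and counts here are purely illustrative):
# 3 primary shards, each with 1 replica copy
PUT my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}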
Mapping: Defines how a document and its fields are indexed and stored.
You can create an index with an explicit mapping using the following request:
PUT user_details
{
  "mappings": {
    "properties": {
      "firstname":    { "type": "text" },
      "lastname":     { "type": "text" },
      "date":         { "type": "date" },
      "callduration": { "type": "double" }
    }
  }
}
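Once the index exists, you can store a document against that mapping and read the mapping back (the field values below are made-up sample data):
# Index one sample document
POST user_details/_doc
{
  "firstname": "Jane",
  "lastname": "Doe",
  "date": "2021-01-15",
  "callduration": 42.5
}

# Inspect the stored mapping
GET user_details/_mapping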
Logstash Concepts
The Logstash event processing pipeline has three stages: inputs, filters, and outputs. A Logstash pipeline has two required elements, input and output, and one optional element known as filter:
Pipeline: Comprises the data flow stages in Logstash from input to output. Incoming data enters the pipeline and is processed as an event, which is then sent to an output destination in the format the user or end system requires.
Input: This is the first stage in the Logstash pipeline, which gets the data into Logstash for further processing. Logstash offers various plugins to ingest data from different platforms; some of the most commonly used are file, syslog, redis, and beats.
Filter: This is the middle stage of Logstash, where the actual processing of events occurs. A developer can use Logstash's predefined regex (grok) patterns to split events into named fields and to define criteria for which input events are accepted.
Output: This is the last stage in the Logstash pipeline, where the output events can be formatted into the structure required by the destination systems.
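To tie the three stages together, here is a minimal pipeline configuration sketch. The beats port, the grok pattern, the index name, and the Elasticsearch address are all assumptions for illustration:
# Input: receive events from Beats shippers on port 5044
input {
  beats {
    port => 5044
  }
}

# Filter: parse raw web-server log lines into named fields
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

# Output: send the processed events to Elasticsearch
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
  }
}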
ELK Stack Architectural Recommendations
To begin with, three master-eligible nodes satisfy the usual best practices and recommendations in this case. The master nodes and Kibana nodes can be ordinary servers; no special hardware is required for a while. Data nodes should be separated into hot, warm, and cold nodes: data up to one month old can be stored on the hot data nodes, data up to six months old on the warm data nodes, and data from six to twelve months old on the cold data nodes. Also, we don't need a client node for this case.
A minimum of two Logstash nodes is recommended for high availability. Beats should load-balance across the group of Logstash nodes, as shown in the Filebeat sketch below. It's common to deploy just one Beats input per Logstash node, but multiple Beats inputs can also be deployed per Logstash node to expose independent endpoints for different data sources.
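As an illustration, this is how load balancing across two Logstash nodes might look in a Filebeat configuration (filebeat.yml); the hostnames are placeholders:
# Send events to either Logstash endpoint, balancing the load
output.logstash:
  hosts: ["logstash-node1:5044", "logstash-node2:5044"]
  loadbalance: true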
In this post, I shared my knowledge of the ELK Stack architecture.
Thanks for reading! See you in my next post.