Introduction to the Elastic Stack

Eduardo Rost
Published in ilegra · Nov 27, 2018

Hello there! In my first story I want to talk about the basic idea of the ELK Stack and its evolution into the Elastic Stack. I will also do a kind of "Hello World" by running the whole stack with Docker.

First of all, ELK is not just an animal that you can find in the north of the USA or in Asia. ELK is also a very powerful open-source stack that combines:

  1. Elasticsearch — search and analytics engine.
  2. Logstash ‑ data processing pipeline.
  3. Kibana ‑ dashboard to visualize data.

The Elastic Stack came to light in 2015, when Beats (data shippers) were incorporated into the ELK stack.

Throughout this text I will explain more about each component, but to summarize them all, I took an image from the main page of the Elastic website that shows how all these projects are arranged together.

Kibana is an analytics and visualization platform that works on top of Elasticsearch. With it you can search and visualize all the events stored in Elasticsearch, create pretty cool dashboards, configure indices, templates and other stuff, and use the Dev Tools tab, which helps you run requests against the Elasticsearch API.
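For example, once some data is indexed, you can run a simple search straight from the Dev Tools tab. A small sketch (the index name logstash-* and the message field are assumptions based on the example later in this story):

    GET logstash-*/_search
    {
      "query": {
        "match": { "message": "GET" }
      }
    }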

Elasticsearch is the main product of this stack; it is responsible for indexing all the data, making very fast searches possible. It uses a structure called an inverted index, which consists of splitting the content of the documents into words (terms) and recording, for each word, which documents contain it. Before indexing, it applies some normalizations to standardize the terms; these normalizations are called analysis.
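A quick way to see analysis in action is the _analyze API. As a small illustration (the sample text is arbitrary), the standard analyzer splits and lowercases the input into the terms that end up in the inverted index:

    POST _analyze
    {
      "analyzer": "standard",
      "text": "Introduction to the Elastic Stack"
    }

The response lists the terms introduction, to, the, elastic and stack, each with its position in the original text.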

The basic concepts of Elasticsearch:

  • Cluster: collection of nodes (servers) that holds all the data and provides indexing and search across all nodes.
  • Node: single server that is part of your cluster; it stores data and takes part in the cluster's indexing and search.
  • Index: collection of documents that have similar characteristics.
  • Document: data that can be indexed.
  • Shard: subdivision of an index into multiple independent pieces (shards).
  • Replica: copy of a shard; a replica never stays on the same node as the original shard (see the small example after this list).
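Putting a few of these concepts together, this is roughly how you would create an index with three primary shards and one replica of each (the index name and the numbers are just an example):

    PUT my-logs
    {
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }

Elasticsearch then spreads the shards across the nodes of the cluster, keeping each replica on a different node than its primary.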

Logstash is a data collection engine that normalizes events using pipelines and sends them to several destinations. In the pipeline you define the input data (beats, http, file), normalize it using filters, and send it to the output (Elasticsearch, file, email). With Logstash you can easily use multiple pipelines with different sources and outputs.
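As a rough sketch of such a pipeline, assuming events arrive from Filebeat and are Apache access logs (the port, grok pattern and host below are assumptions, not the exact file from the gist further down):

    input {
      beats {
        port => 5044                      # receive events shipped by Filebeat
      }
    }

    filter {
      grok {
        # split the raw line into fields (client IP, verb, response code, ...)
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }

    output {
      elasticsearch {
        hosts => ["elasticsearch:9200"]   # index the normalized events
      }
    }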

Beats are shippers that collect and deliver data from distinct sources. There are seven different Beats so far: one for metric data, another for network data, a serverless shipper, and so on. But here I want to talk about one specific Beat, Filebeat, which collects events from files and sends them to a specified output. Filebeat has three components, the prospector, the harvester and the spooler. These components work together to tail files and send event data to the output, for example Elasticsearch, Logstash, Kafka or Redis.

The prospector is responsible for identifying the files to read, based on the patterns you configured. The harvester is responsible for reading the files; it keeps track of what has already been read, so the process can continue if Filebeat stops. The last component is the spooler, which is responsible for combining the lines from all files after receiving them from the harvesters.
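A minimal filebeat.yml along those lines might look like this (the paths and the Logstash host are assumptions; newer Filebeat versions call the prospector section filebeat.inputs):

    filebeat.inputs:
      - type: log                    # tail plain text log files
        paths:
          - /logs/*.log              # files the prospector/harvesters will watch

    output.logstash:
      hosts: ["logstash:5044"]       # ship the collected lines to Logstash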


Putting it all together and running the "Hello World" is pretty simple. I used docker-compose to run the whole Elastic Stack at once.

I created a gist on GitHub (below) with everything you need to run the stack. The docker-compose.yml defines all the Docker images, exposes their ports and mounts the configuration files into the containers. The apache.conf is the pipeline for Logstash; here I defined the input (beats), some filters and the output (Elasticsearch). Finally, the filebeat.yml defines the input log files and the output of Filebeat.
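To give an idea of its shape, a docker-compose.yml for this setup might look roughly like the sketch below (image versions and mount paths are assumptions; the gist has the exact files):

    version: '3'
    services:
      elasticsearch:
        image: docker.elastic.co/elasticsearch/elasticsearch:6.5.0
        environment:
          - discovery.type=single-node          # one node is enough for this example
        ports:
          - "9200:9200"

      kibana:
        image: docker.elastic.co/kibana/kibana:6.5.0
        ports:
          - "5601:5601"
        depends_on:
          - elasticsearch

      logstash:
        image: docker.elastic.co/logstash/logstash:6.5.0
        volumes:
          - ./data/apache.conf:/usr/share/logstash/pipeline/apache.conf
        depends_on:
          - elasticsearch

      filebeat:
        image: docker.elastic.co/beats/filebeat:6.5.0
        volumes:
          - ./data/filebeat.yml:/usr/share/filebeat/filebeat.yml
          - ./data/logs:/logs                   # the example Apache logs
        depends_on:
          - logstash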

This is the necessary structure to successfully run this example:

  • ./docker-compose.yml
  • ./data/filebeat.yml
  • ./data/apache.conf
  • ./data/logs/

I extracted the logs.gz from here as an example log file inside the data/logs folder.

To sum up, Filebeat will ship the log file and send it to Logstash, then the apache pipeline that we set up in Logstash will normalize the logs. Each normalized log is sent to Elasticsearch, which indexes all the data. And we will use Kibana to see the logs.

Now you just need to run "docker-compose up", access Kibana, configure the Kibana index pattern as "logstash-*", use the timestamp property for the time filter, and be happy searching the logs. You can use the Kibana docs to learn how to do this.

Before the end of this story, I suggest you read the Elastic documentation whenever you have doubts; each product has pretty rich documentation with all the information we need to know.

I hope the information in this story has been useful to you and easy to understand.

Thank you for reading and have a good one,
