ELK Stack Deep Dive: Complete Guide Using Docker

Ahmed Fatir
31 min read · Aug 2, 2024


INTRODUCTION

Ready to revolutionize your log management and data analysis?

Discover the power of the ELK stack in this comprehensive guide and learn how to effortlessly set up Elasticsearch, Logstash, and Kibana using Docker.

This project consists of setting up the ELK stack in several steps: First, create the configuration files, including the .env file and Makefile. Next, build the setup container that generates the SSL certificates. Then, configure and deploy the Elasticsearch, Kibana, and Logstash containers. Finally, orchestrate all components using Docker Compose to ensure seamless deployment and management.

In this project, we won't include any Beats for simplicity. Using ELK without Beats is effective in smaller or more controlled environments where direct ingestion is practical. However, for larger, distributed environments or when collecting data from diverse sources, using Beats is recommended for better scalability and reliability.

The main parts of the article

  1. Project Overview
  2. Setting Up the Environment
  3. Elasticsearch
  4. Kibana
  5. Logstash
  6. Docker Compose

1-Project Overview

Introduction to the ELK Stack

Imagine you are running a rapidly growing e-commerce platform. Each day, the number of users increases, and so does the volume of data your applications generate. From user interactions and transactions to system logs and error reports, the sheer amount of data can quickly become overwhelming. To stay ahead of potential issues and make data-driven decisions, you need a robust solution to collect, analyze, and visualize all this data in real time. Enter the ELK stack.

The ELK stack, which consists of Elasticsearch, Logstash, and Kibana, is a powerful set of tools (open source until Elastic's license change in 2021) designed to address the challenges of managing large volumes of log and event data.

Elasticsearch: This is the core of the stack, acting as a distributed, RESTful search and analytics engine capable of storing and searching large amounts of data rapidly. It functions like a high-powered database specifically tailored for search and analytics, enabling users to query logs in milliseconds and facilitating immediate identification of issues and trends.

Logstash: Acting as the data processing pipeline, Logstash can ingest data from various sources, transform it, and then send it to Elasticsearch. It functions like a skilled plumber, capable of handling diverse data sources and formats, ensuring smooth data flow from its origin to its destination.

Kibana: Serving as the visualization layer of the stack, Kibana enables users to explore and visualize data stored in Elasticsearch. With its powerful dashboard and visualization capabilities, Kibana transforms complex data into actionable insights. It provides a dynamic control panel where users can monitor and analyze data in real time, creating beautiful visualizations and dashboards with ease.

The Need for the ELK Stack:

In the e-commerce platform example, as your user base grows, so does the complexity of your infrastructure. You have multiple servers generating logs with critical information. Manually sifting through these logs is not only tedious but also impractical. With the ELK stack, you can centralize your logs in Elasticsearch, process and transform them with Logstash, and visualize the data with Kibana. This enables your team to quickly identify the root cause of performance issues and set up alerts to notify you of potential problems, ensuring a seamless experience for your customers.

Relationship Between Elasticsearch, Kibana, and Logstash:

Imagine a bustling city with a sophisticated transportation system to keep everything running smoothly. In this city, Logstash functions like a team of expert couriers. Its main responsibility is to gather information packets, or logs, from various sources spread throughout the city. These logs come from different applications and systems. Once collected, the couriers securely deliver these packets to a central processing hub, Logstash itself. At the hub, Logstash diligently processes each packet, examining and enhancing the data with additional context, ensuring everything is accurate and complete before sending it to its next destination.

The next destination is a vast, intelligent library called Elasticsearch. This library stores the information in an organized and easily searchable manner, allowing citizens to quickly find the information they need. But a library needs a user-friendly interface to make sense of all the stored knowledge, and that’s where Kibana comes in.

Kibana is the city’s sleek, interactive dashboard that visualizes the data stored in Elasticsearch. It transforms raw data into meaningful insights, displaying trends, patterns, and anomalies that help the city’s administrators make informed decisions.

Together, Logstash, Elasticsearch, and Kibana create a seamless ecosystem where data flows efficiently from collection to visualization, ensuring the city remains well-informed and operates at its best.

Project Structure:

.
├── .env                   // Environment variables file.
├── Makefile               // Automation commands for Docker operations.
├── nginx
│   └── access.log         // The log file is where Logstash will read its input.
├── docker-compose.yml     // Defines the services and configurations for Docker Compose.
├── elasticsearch          // Configuration and Dockerfile for Elasticsearch.
│   ├── Dockerfile
│   └── conf
│       └── elasticsearch.yml
├── kibana                 // Configuration and Dockerfile for Kibana.
│   ├── Dockerfile
│   └── conf
│       └── kibana.yml
├── logstash               // Configuration, pipeline, and Dockerfile for Logstash.
│   ├── Dockerfile
│   ├── conf
│   │   └── logstash.yml
│   └── pipeline
│       └── logstash.conf
└── setup                  // Setup script and Dockerfile for initial configurations
    ├── Dockerfile
    └── setup.sh

12 directories, 17 files

In the following sections, we’ll dive deeper into each component, explaining their configurations and how they work together to form a cohesive logging and visualization solution. By the end of this guide, you’ll have a solid understanding of the ELK stack and how to deploy it using Docker, empowering you to harness the full potential of your data.

2-Setting Up the Environment

Before we dive into the details of the ELK stack components, it’s essential to set up the environment correctly. This section will guide you through the prerequisites and configurations needed to get everything up and running smoothly.

Prerequisites:

To follow this guide, you’ll need to have the following tools installed on your system:

  1. Docker: Docker is a platform that enables you to develop, ship, and run applications in containers. Containers are lightweight and portable, and ensure that your application runs consistently across different environments.
  2. Docker Compose: Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you can use a YAML file to configure your application’s services, making it easy to manage and orchestrate the components of the ELK stack.

For more information, you can check my blog about virtualization technology, where I take a deep dive into Docker and Docker Compose.

Configuration Files:

To manage environment variables and automate Docker commands, we’ll use a .env file and a Makefile.

The .env file is used to store environment variables that can be used throughout the project. This makes it easy to manage configuration settings and change them as needed without modifying the code.

# Elasticsearch password
ELASTIC_PASSWORD=changeme

# Kibana password
KIBANA_PASSWORD=changeme

# memory limit for elasticsearch and kibana
MEM_LIMIT=1g

It’s crucial to set memory limits for Elasticsearch and Kibana to ensure stable performance and effective resource management. Without limits, they could consume excessive system memory, leading to slowdowns or crashes. By controlling their memory usage, we maintain a balanced system environment and achieve predictable resource allocation.

The Makefile contains a set of commands to automate common Docker operations, making it easier to manage your Docker containers. Create a Makefile in the root of your project directory with the following content:

all: up # Default target

up: build # Start the containers
	docker-compose -f docker-compose.yml up -d

down: # Stop and remove the containers
	docker-compose -f docker-compose.yml down

stop: # Stop the containers
	docker-compose -f docker-compose.yml stop

start: # Start the containers
	docker-compose -f docker-compose.yml start

build: # Build the containers from the Dockerfiles and the docker-compose.yml file
	clear
	docker-compose -f docker-compose.yml build

clean: stop # Stop and remove the containers, images, volumes, and networks
	@docker stop $$(docker ps -qa) || true
	@docker rm $$(docker ps -qa) || true
	@docker rmi -f $$(docker images -qa) || true
	@docker volume rm $$(docker volume ls -q) || true
	@docker network rm $$(docker network ls -q) || true

re: clean up # Rebuild the containers

prune: clean # Remove all unused containers, networks, images (both dangling and unreferenced), and optionally, volumes
	@docker system prune -a --volumes -f

.PHONY: all up down stop start build clean re prune
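
With the Makefile in place, day-to-day operations come down to a few short commands. Here is a quick usage sketch, assuming Docker and Docker Compose are installed and the .env file sits next to the Makefile:

# build the images and start the whole stack in the background
make

# stop and remove the containers (named volumes are kept)
make down

# tear everything down: containers, images, volumes, and networks
make clean

# rebuild and restart from scratch
make re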

3-Elasticsearch

Elasticsearch is the core component of the ELK stack and is responsible for storing, searching, and analyzing large volumes of data. To ensure secure communication and data integrity, we must set up SSL certificates and configure security settings. This section will guide you through the process of setting up Elasticsearch with Docker, focusing on generating and managing the necessary certificates.

Setup Container for Certificates

The setup container is responsible for generating SSL certificates and setting up initial security configurations for Elasticsearch and other components. This container runs a script (setup.sh) that automates the creation of certificates and configures the security settings.

To clarify, the setup container's primary role is to create the necessary SSL certificates for Elasticsearch and Kibana; it is not the main Elasticsearch container. Once the certificates are generated and the initial security configurations are in place, the setup container's job is done.

First things first, let's start with the setup container.

The Dockerfile for the setup container uses the Elasticsearch image and copies the setup script into the container.

setup/Dockerfile :

# pull the base image 
FROM docker.elastic.co/elasticsearch/elasticsearch:8.14.3

# copy the setup script
COPY ./setup.sh /usr/share/elasticsearch/setup.sh

# switch to root user
USER root

# change the permission of the setup script
RUN chmod +x /usr/share/elasticsearch/setup.sh

# define the command to run the setup script once the container is started
CMD ["/bin/bash", "/usr/share/elasticsearch/setup.sh"]

The setup script creates the necessary SSL certificates, configures the initial security settings, and sets the password for the kibana_system user (the elastic password itself comes from the .env file).

setup/setup.sh :

For an easier explanation of the script, I divide it into seven parts, and in the next section, I will explain each part in detail.

#!/bin/bash
########## part 1 ##########
if [ -z "${ELASTIC_PASSWORD}" ] || [ -z "${KIBANA_PASSWORD}" ]; then
    echo "Set the ELASTIC_PASSWORD and KIBANA_PASSWORD environment variables in the .env file"
    exit 1
fi

########## part 2 ##########
if [ ! -f config/certs/ca.zip ]; then
    echo "Creating CA"
    bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip
    unzip config/certs/ca.zip -d config/certs
fi

########## part 3 ##########
if [ ! -f config/certs/certs.zip ]; then
    echo "Creating certs"
    cat > config/certs/instances.yml <<EOL
instances:
  - name: elasticsearch
    dns:
      - elasticsearch
      - localhost
    ip:
      - 127.0.0.1
  - name: kibana
    dns:
      - kibana
      - localhost
    ip:
      - 127.0.0.1
  - name: logstash
    dns:
      - logstash
      - localhost
    ip:
      - 127.0.0.1
EOL
    echo "Creating certs for elasticsearch, kibana, and logstash"
    bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in \
        config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key
    unzip config/certs/certs.zip -d config/certs
fi

########## part 4 ##########
echo "Setting file permissions"
chown -R root:root config/certs
find config/certs -type d -exec chmod 750 {} \;
find config/certs -type f -exec chmod 640 {} \;

########## part 5 ##########
echo "Waiting for Elasticsearch availability"
until curl -s --cacert config/certs/ca/ca.crt https://elasticsearch:9200 \
    | grep -q "missing authentication credentials"; do sleep 5; done

########## part 6 ##########
echo "Setting Kibana password"
until curl -s \
    -X POST https://elasticsearch:9200/_security/user/kibana_system/_password \
    --cacert config/certs/ca/ca.crt \
    -u "elastic:${ELASTIC_PASSWORD}" \
    -H "Content-Type: application/json" \
    -d "{\"password\":\"${KIBANA_PASSWORD}\"}" \
    | grep -q "^{}"; do sleep 5; done

########## part 7 ##########
echo "All done! Removing the setup container"
CONTAINER_ID=$(hostname)
curl --unix-socket /var/run/docker.sock -X DELETE "http://localhost/containers/${CONTAINER_ID}?force=true"

Part 1:

#!/bin/bash

# Ensure that the ELASTIC_PASSWORD and KIBANA_PASSWORD environment variables are set.
if [ -z "${ELASTIC_PASSWORD}" ] || [ -z "${KIBANA_PASSWORD}" ]; then
    echo "Set the ELASTIC_PASSWORD and KIBANA_PASSWORD environment variables in the .env file"
    exit 1
fi

This section checks if the ELASTIC_PASSWORD and KIBANA_PASSWORD environment variables are set. If they are not set, the script exits with an error message. These variables are essential for setting up security credentials for Elasticsearch and Kibana.

Part 2:

# Create the CA (Certificate Authority) if it doesn't already exist.
if [ ! -f config/certs/ca.zip ]; then
    echo "Creating CA"
    bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip
    unzip config/certs/ca.zip -d config/certs
fi

This section creates a Certificate Authority (CA) if it doesn’t already exist. The elasticsearch-certutil ca command generates a CA certificate and key, which are then extracted to the config/certs directory.

The --silent flag tells the utility to run in silent mode, meaning it will not prompt for user interaction or display detailed output.

The --pem flag specifies the output format of the certificates. PEM stands for Privacy Enhanced Mail, a Base64-encoded format that is commonly used for storing and transmitting cryptographic keys, certificates, and other data.

Part 3:

# Create the certificates for Elasticsearch, Kibana, and Logstash, if they don't already exist.
if [ ! -f config/certs/certs.zip ]; then
    echo "Creating certs"
    cat > config/certs/instances.yml <<EOL
instances:
  - name: elasticsearch
    dns:
      - elasticsearch
      - localhost
    ip:
      - 127.0.0.1
  - name: kibana
    dns:
      - kibana
      - localhost
    ip:
      - 127.0.0.1
  - name: logstash
    dns:
      - logstash
      - localhost
    ip:
      - 127.0.0.1
EOL
    echo "Creating certs for elasticsearch, kibana, and logstash"
    bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in \
        config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key
    unzip config/certs/certs.zip -d config/certs
fi

This section creates SSL certificates for Elasticsearch, Kibana, and Logstash if they don't already exist. It generates an instances.yml file that lists the instances for which certificates are to be created, specifying their DNS names and IP addresses. The elasticsearch-certutil cert command uses this file, along with the CA certificate and key, to create the SSL certificates. The certificates are then extracted to the config/certs directory.

The instances.yml file is a configuration file used by the elasticsearch-certutil cert command to specify the details of the instances for which SSL certificates need to be created. This file includes information such as the instance names, DNS names, and IP addresses. The format and content of this file are crucial because it tells the elasticsearch-certutil tool exactly what certificates to generate and for which instances.

instances: The root element in the YAML file that lists all the instances for which SSL certificates will be generated.

  • name: Specifies the name of the instance. This name is used internally and helps identify the certificates being created. In this example, the instances are elasticsearch, kibana, and logstash.

dns: A list of DNS names associated with the instance. The SSL certificate generated for this instance will be valid for these DNS names. This ensures that when clients access the instance using any of these DNS names, the certificate will be recognized as valid.

For example, elasticsearch and localhost are DNS names for the Elasticsearch instance.

ip: A list of IP addresses associated with the instance. The SSL certificate generated for this instance will also be valid for these IP addresses. This is particularly useful if clients access the instance using its IP address.

In this example, 127.0.0.1 is the IP address for each instance.

The command to generate the certificates uses the instances.yml file, the CA certificate, and the CA key:

bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key
  • --silent: Runs the command without user interaction.
  • --pem: Specifies the output format as PEM.
  • -out config/certs/certs.zip: Specifies the output file for the generated certificates.
  • --in config/certs/instances.yml: Specifies the input file containing instance details.
  • --ca-cert config/certs/ca/ca.crt: Specifies the CA certificate to use for signing the instance certificates.
  • --ca-key config/certs/ca/ca.key: Specifies the CA key to use for signing the instance certificates.
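
If you want to confirm what the utility produced, you can inspect one of the generated certificates with openssl. This is an optional check and only a sketch: it assumes the stack is already running and that openssl is available inside the elasticsearch container, which mounts the same certs volume.

# print the DNS and IP entries baked into the Elasticsearch certificate
docker exec elasticsearch openssl x509 \
    -in /usr/share/elasticsearch/config/certs/elasticsearch/elasticsearch.crt \
    -noout -text | grep -A1 "Subject Alternative Name"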

Part 4:

# Set file permissions for the certificates.
echo "Setting file permissions"
chown -R root:root config/certs
find config/certs -type d -exec chmod 750 {} \;
find config/certs -type f -exec chmod 640 {} \;

This section sets the appropriate file permissions for the certificate files. The chown command changes the ownership of the config/certs directory and its contents to the root user. The find commands set the directory permissions to 750 and the file permissions to 640, ensuring that only authorized users can access the certificates.

Part 5:

# Wait until Elasticsearch is available.
echo "Waiting for Elasticsearch availability"
until curl -s --cacert config/certs/ca/ca.crt https://elasticsearch:9200 \
    | grep -q "missing authentication credentials"; do sleep 5; done

This section waits for Elasticsearch to become available. It uses a curl command to send a request to Elasticsearch, checking if the response contains the phrase "missing authentication credentials." This indicates that Elasticsearch is up and running but requires authentication. The script retries every 5 seconds until it gets the expected response. This wait is crucial before setting the password for the kibana_system user, because that operation directly modifies Elasticsearch's security settings and requires Elasticsearch to be fully up and running.
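
You can run a similar check by hand once the stack is up. A minimal sketch from the host, assuming ELASTIC_PASSWORD is exported in your shell; -k skips certificate verification because the CA is self-signed and not installed on the host:

# ask Elasticsearch for its cluster health through the published port 9200
curl -k -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/_cluster/health?pretty"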

Part 6:

# Set the password for the Kibana system user.
echo "Setting Kibana password"
until curl -s \
    -X POST https://elasticsearch:9200/_security/user/kibana_system/_password \
    --cacert config/certs/ca/ca.crt \
    -u "elastic:${ELASTIC_PASSWORD}" \
    -H "Content-Type: application/json" \
    -d "{\"password\":\"${KIBANA_PASSWORD}\"}" \
    | grep -q "^{}"; do sleep 5; done

This section sets the password for the kibana_system user in Elasticsearch. It uses a curl command to send a POST request to Elasticsearch's security API, authenticating with the elastic user and the password specified in the ELASTIC_PASSWORD environment variable. The new password for the kibana_system user is set to the value specified in the KIBANA_PASSWORD environment variable. The script retries every 5 seconds until it gets an empty JSON response ({}), indicating success.
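
To confirm the new password actually took effect, you can authenticate as kibana_system yourself. A quick, optional check from the host, assuming KIBANA_PASSWORD is exported in your shell:

# should return a small JSON document describing the kibana_system user and its roles
curl -k -u "kibana_system:${KIBANA_PASSWORD}" "https://localhost:9200/_security/_authenticate?pretty"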

Part 7:

# Remove the setup container.
echo "All done! Removing the setup container"
CONTAINER_ID=$(hostname)
curl --unix-socket /var/run/docker.sock \
    -X DELETE "http://localhost/containers/${CONTAINER_ID}?force=true"

This final section removes the setup container after the script completes all tasks. It retrieves the container ID using the hostname command and sends a DELETE request to the Docker socket to forcefully remove the container.

Now that we have the SSL certificates and security configurations set up, let’s focus on configuring and running the Elasticsearch container securely and efficiently within our Docker environment.

If you have any issues understanding the basics of SSL/TLS encryption, how it works, and why we use it, please check out this part of my blog where I explain it in full.

Core Components of Elasticsearch:

Cluster:

  • Definition: A cluster is a collection of one or more nodes (servers) that together hold all the data and provide federated indexing and search capabilities across all nodes.
  • Purpose: Clusters allow Elasticsearch to scale horizontally by adding more nodes to distribute the data and workload.
  • How It Works: Each cluster has a unique name, and all nodes in the cluster share the same cluster name. The cluster’s health and state are managed centrally, allowing for distributed operations across nodes.

Node:

  • Definition: A node is a single server that is part of a cluster; it holds data and participates in the cluster’s indexing and search operations.
  • Purpose: Nodes can be configured to perform specific roles within the cluster, such as master node, data node, or coordinating node.
  • How It Works: When a node is started, it joins a cluster and shares the workload. The node can store data, manage search requests, or perform other roles depending on its configuration.

Index:

  • Definition: An index is a collection of documents that have similar characteristics.
  • Purpose: Indexes are used to organize data within Elasticsearch. Each index is identified by a name, which is used to reference the data during indexing, search, and management operations.
  • How It Works: An index is made up of one or more shards, and the data within an index is spread across these shards.

Shard:

  • Definition: A shard is a single Lucene index that contains a portion of the data in an index.
  • Purpose: Shards allow Elasticsearch to horizontally scale by splitting data across multiple nodes. They also provide redundancy and fault tolerance.
  • How It Works: Each index is divided into several primary shards, and each shard can have one or more replica shards. Primary shards hold the original data, while replica shards are copies that provide failover in case of a primary shard failure.

Replicas:

  • Definition: Replicas are copies of primary shards.
  • Purpose: Replicas provide redundancy and improve search performance by allowing queries to be executed on replica shards as well as primary shards.
  • How It Works: Each primary shard can have zero or more replicas, and these replicas are stored on different nodes within the cluster to ensure high availability.

Document:

  • Definition: A document is the basic unit of information that can be indexed in Elasticsearch. It is represented in JSON format.
  • Purpose: Documents are the data entries that you store in Elasticsearch. Each document belongs to an index and is identified by a unique ID.
  • How It Works: Documents are automatically stored in the appropriate shard within an index, based on the document’s unique ID.
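
To make these concepts concrete, here is a small, hypothetical session against the cluster we are building: it creates an index with an explicit shard and replica layout, stores one JSON document, and searches for it. The index name demo-index and the field names are made up for illustration; the endpoints themselves are standard Elasticsearch APIs.

# create an index with one primary shard and no replicas (sensible for a single-node cluster)
curl -k -u "elastic:${ELASTIC_PASSWORD}" -X PUT "https://localhost:9200/demo-index" \
    -H "Content-Type: application/json" \
    -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'

# index a single JSON document
curl -k -u "elastic:${ELASTIC_PASSWORD}" -X POST "https://localhost:9200/demo-index/_doc" \
    -H "Content-Type: application/json" \
    -d '{"message": "forming good habits", "status": 200}'

# search the index for the term "habits"
curl -k -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/demo-index/_search?q=message:habits&pretty"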

The fast search capabilities of Elasticsearch are primarily attributed to its distributed architecture and the utilization of inverted indices.

  • When querying a term, Elasticsearch does not need to scan through every document. Instead, it searches for the term in an index, similar to the index in a book, which directly points to the locations where the term appears.
  • Furthermore, Elasticsearch distributes data across multiple nodes, enabling it to process and return results in parallel. This significantly reduces search time even at scale.

But how do we ensure the search results are not just fast but also highly relevant? Let's look at how a search engine determines relevance.

Search Relevance: Why It Matters

It can be frustrating when the search results are not what you expected. This is where search relevance becomes important. Relevance is about how well the search results match the user's intent. But how do we measure this relevance?

When we search for something …

  • True Positives: Relevant documents correctly retrieved by the search engine.
  • False Positives: Irrelevant documents mistakenly retrieved by the search engine.
  • True Negatives: Irrelevant documents correctly rejected by the search engine.
  • False Negatives: Relevant documents mistakenly not retrieved by the search engine.

Measuring Relevance: Precision and Recall

  • Precision: Precision refers to the accuracy of the search results retrieved by Elasticsearch. It is calculated as the ratio of true positives (relevant documents retrieved) to the total number of documents retrieved (true positives + false positives). In simple terms, precision tells us what portion of the retrieved documents are relevant to the query.
  • Recall: Recall, on the other hand, measures the completeness of the search results. It is the ratio of true positives to the total number of relevant documents available (true positives + false negatives). Recall indicates what portion of the relevant documents were returned in the search results.
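
A small, made-up example helps put numbers on these definitions. Suppose a query returns 8 documents, of which 6 are relevant (true positives) and 2 are not (false positives), while 4 relevant documents were missed (false negatives):

\text{Precision} = \frac{TP}{TP + FP} = \frac{6}{6 + 2} = 0.75
\qquad
\text{Recall} = \frac{TP}{TP + FN} = \frac{6}{6 + 4} = 0.60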

These two metrics are inversely related, meaning that improving one may hurt the other. Precision focuses on accuracy, even if it results in retrieving fewer results, while recall focuses on retrieving as many relevant results as possible, even if some irrelevant ones are included.

The Trade-off Between Precision and Recall

Balancing precision and recall is the challenge in search relevance. The goal is for the search engine to return results that are both relevant and comprehensive. However, if you focus too much on precision, you might get fewer relevant results (lower recall), and if you emphasize recall, you might get more irrelevant results (lower precision). Finding the right balance requires fine-tuning based on your specific use case.

Scoring and Ranking in Elasticsearch

When it comes to search results, precision and recall decide which documents are included, but they don't determine the order in which these results are displayed. This is where Elasticsearch's scoring algorithm comes in. Elasticsearch assigns a score to each document based on its relevance to the search query. Documents with higher scores are considered more relevant and are given higher ranks in the search results.

Term Frequency and Inverse Document Frequency

Elasticsearch computes the score by taking into account various factors, with term frequency (TF) and inverse document frequency (IDF) as two crucial components.

  • Term Frequency (TF): In the search query (how to form good habits), let’s focus on the term habits. Suppose Elasticsearch retrieves two documents: in the first, habits appears four times, while in the second, it appears only once. The term frequency (TF) is higher in the first document, so Elasticsearch considers it more relevant to the query based on this term alone.
  • Inverse Document Frequency (IDF): The query also contains common terms like how, to, and form, which likely appear in many documents across the index. These common terms are less useful for determining relevance, so Elasticsearch reduces their impact using inverse document frequency (IDF). However, the term habits is more unique and appears in fewer documents, so it’s weighted more heavily, helping ensure that search results are more relevant to the query.

By combining TF and IDF, Elasticsearch ensures that documents with important and relevant terms are ranked higher, while those with common, less relevant terms are ranked lower.
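
If you are curious how Elasticsearch arrived at a particular score, it can show its work. A rough sketch reusing the hypothetical demo-index from earlier: setting "explain": true in the search body makes each hit carry a breakdown of how its relevance score was computed.

# return matching documents together with a per-hit explanation of the relevance score
curl -k -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/demo-index/_search?pretty" \
    -H "Content-Type: application/json" \
    -d '{"explain": true, "query": {"match": {"message": "habits"}}}'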

elasticsearch/Dockerfile :

# pull the official image
FROM docker.elastic.co/elasticsearch/elasticsearch:8.14.3

# copy the elasticsearch configuration file
COPY ./conf/elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

elasticsearch/conf/elasticsearch.yml :

cluster.name: docker-cluster
node.name: elasticsearch
network.host: 0.0.0.0
bootstrap.memory_lock: true
cluster.initial_master_nodes: elasticsearch
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: certs/elasticsearch/elasticsearch.key
xpack.security.http.ssl.certificate: certs/elasticsearch/elasticsearch.crt
xpack.security.http.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.key: certs/elasticsearch/elasticsearch.key
xpack.security.transport.ssl.certificate: certs/elasticsearch/elasticsearch.crt
xpack.security.transport.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.verification_mode: certificate
xpack.license.self_generated.type: basic

The elasticsearch.yml file is a crucial part of the Elasticsearch setup. It contains configuration settings that define the behavior of the Elasticsearch node. Here’s a detailed explanation of each configuration setting:

cluster.name: docker-cluster
  • Purpose: Sets the name of the Elasticsearch cluster.
  • Explanation: This configuration ensures that the node is part of a cluster named “docker-cluster”. All nodes with the same cluster.name will join the same cluster.
node.name: elasticsearch
  • Purpose: Sets the name of the Elasticsearch node.
  • Explanation: This is an identifier for the node within the cluster, making it easier to manage and monitor.
network.host: 0.0.0.0
  • Purpose: Binds the network interface.
  • Explanation: By setting network.host to 0.0.0.0, Elasticsearch will listen on all available network interfaces, making it accessible from outside the container.
bootstrap.memory_lock: true
  • Purpose: Locks the memory used by Elasticsearch.
  • Explanation: This setting prevents the Elasticsearch process from swapping memory to disk, which can significantly improve performance and stability.
cluster.initial_master_nodes: elasticsearch
  • Purpose: Defines the initial master nodes.
  • Explanation: This setting is used for cluster bootstrapping. It specifies the nodes that are eligible to become master nodes during the initial cluster formation. In this case, the single node elasticsearch is specified.
xpack.security.enabled: true
  • Purpose: Enables the X-Pack security features.
  • Explanation: X-Pack provides security features such as authentication, authorization, and encryption. Enabling it ensures that these features are active.
xpack.security.http.ssl.enabled: true
  • Purpose: Enables SSL/TLS for HTTP communication.
  • Explanation: This setting ensures that all HTTP communication to and from Elasticsearch is encrypted, enhancing security.
xpack.security.http.ssl.key: certs/elasticsearch/elasticsearch.key
xpack.security.http.ssl.certificate: certs/elasticsearch/elasticsearch.crt
xpack.security.http.ssl.certificate_authorities: certs/ca/ca.crt
  • Purpose: Specify the SSL/TLS key and certificate for HTTP.
  • Explanation: These settings point to the SSL/TLS key, certificate, and CA certificate files used to encrypt HTTP traffic. The key and certificate are specific to the Elasticsearch node, while the CA certificate is used to verify client certificates.
xpack.security.transport.ssl.enabled: true
  • Purpose: Enables SSL/TLS for transport communication.
  • Explanation: This setting ensures that all communication between Elasticsearch nodes (transport layer) is encrypted.
xpack.security.transport.ssl.key: certs/elasticsearch/elasticsearch.key
xpack.security.transport.ssl.certificate: certs/elasticsearch/elasticsearch.crt
xpack.security.transport.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.verification_mode: certificate
  • Purpose: Specify the SSL/TLS key and certificate for transport.
  • Explanation: These settings point to the SSL/TLS key, certificate, and CA certificate files used to encrypt transport traffic. The verification_mode set to certificate ensures that only certificates signed by the CA are trusted.
xpack.license.self_generated.type: basic
  • Purpose: Specify the license type.
  • Explanation: Elasticsearch offers various license types with different features. The basic license includes essential features and is free. This setting specifies that the basic license should be used.
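
Once the node is up, you can confirm which license is active. A quick, optional check from the host, assuming ELASTIC_PASSWORD is exported in your shell:

# show the active license type; with this configuration it should report "basic"
curl -k -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/_license?pretty"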

What exactly is X-Pack?

X-Pack is an extension to the Elastic Stack that provides additional features for security, monitoring, alerting, reporting, and more. It supports TLS/SSL encryption for securing communications between nodes in an Elasticsearch cluster, and between Elasticsearch and clients like Kibana and Logstash. Initially, X-Pack was a separate package, but many of its features have since been integrated directly into the Elastic Stack, especially in recent versions.

After setting up Elasticsearch securely, the next step is to configure and deploy the Kibana container, which provides a robust interface for visualizing and interacting with Elasticsearch data.

4-Kibana

The Kibana server is a Node.js application that communicates with the Elasticsearch cluster. The Kibana interface is a web-based user interface that is served to users’ browsers. This interface provides tools for creating and managing visualizations, dashboards, and reports. It includes various apps and tools such as Discover, Visualize, Dashboard, and Management.

Kibana stores visualizations, dashboards, index patterns, and other configurations as saved objects in a dedicated Elasticsearch index, usually named .kibana. This setup allows users to create, share, and manage their custom visualizations and dashboards. These saved objects can be exported and imported, making it easy to share and back up Kibana configurations.

kibana/Dockerfile :

# pull the official kibana image
FROM docker.elastic.co/kibana/kibana:8.14.3

# copy the kibana.yml file to the container
COPY ./conf/kibana.yml /usr/share/kibana/config/kibana.yml

kibana/conf/kibana.yml :

server.name: "kibana"
server.host: "kibana"
server.ssl.enabled: true
server.ssl.certificate: config/certs/ca/ca.crt
server.ssl.key: config/certs/ca/ca.key
monitoring.ui.container.elasticsearch.enabled: true
elasticsearch.hosts: ["https://elasticsearch:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "${KIBANA_PASSWORD}"
elasticsearch.ssl.certificateAuthorities: ["config/certs/ca/ca.crt"]

The kibana.yml file contains configuration settings for Kibana, a powerful visualization tool for Elasticsearch data. Here's a detailed explanation of each configuration setting:

server.name: "kibana"
  • Purpose: Sets the name of the Kibana server.
  • Explanation: This name is used for logging and for identifying the Kibana instance in a cluster.
server.host: "kibana"
  • Purpose: Binds the Kibana server to the specified host.
  • Explanation: Setting server.host to "kibana" ensures that Kibana is accessible via the hostname kibana. This should match the service name used in the Docker setup or the hostname in your network.
server.ssl.enabled: true
  • Purpose: Enables SSL/TLS for the Kibana server.
  • Explanation: This setting ensures that all HTTP communication to and from Kibana is encrypted, enhancing security.
server.ssl.certificate: config/certs/ca/ca.crt
server.ssl.key: config/certs/ca/ca.key
  • Purpose: Specify the SSL/TLS certificate and key for the Kibana server.
  • Explanation: These settings point to the certificate and key files used to encrypt HTTP traffic to and from Kibana. In this case, the certificate and key are from the Certificate Authority (CA) generated during the setup process.
monitoring.ui.container.elasticsearch.enabled: true
  • Purpose: Enables monitoring of the Elasticsearch container.
  • Explanation: This setting allows Kibana to collect and display monitoring data for the Elasticsearch container, providing insights into the performance and health of the Elasticsearch cluster.
elasticsearch.hosts: ["https://elasticsearch:9200"]
  • Purpose: Specify the URL of the Elasticsearch instance that Kibana should connect to.
  • Explanation: This setting points to the Elasticsearch instance running at https://elasticsearch:9200. The https protocol indicates that SSL/TLS is enabled for secure communication.
elasticsearch.username: "kibana_system"
elasticsearch.password: "${KIBANA_PASSWORD}"
  • Purpose: Specify the username and password for Kibana to authenticate with Elasticsearch.
  • Explanation: The kibana_system user is a built-in user in Elasticsearch with the necessary privileges for Kibana to interact with Elasticsearch. The password is set using an environment variable (KIBANA_PASSWORD), which should be securely managed and provided at runtime.
elasticsearch.ssl.certificateAuthorities: ["config/certs/ca/ca.crt"]
  • Purpose: Specify the Certificate Authority (CA) certificate used to verify the Elasticsearch SSL/TLS certificate.
  • Explanation: This setting points to the CA certificate that was used to sign the Elasticsearch SSL/TLS certificate. Kibana uses this CA certificate to verify the authenticity of the SSL/TLS certificate presented by Elasticsearch, ensuring secure communication.
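
After Kibana starts, a quick way to confirm that it is serving over TLS and can reach Elasticsearch is its status API. A minimal check from the host, assuming port 5601 is published as in the Compose file shown later:

# Kibana's status endpoint; -k skips verification of the self-signed certificate
curl -k -s -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:5601/api/status" | head -c 400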

After configuring Kibana for data visualization, the next step is to set up Logstash.

Logstash acts as a powerful data processing pipeline, ingesting data from various sources, transforming it, and sending it to Elasticsearch. This guide will help you configure and deploy the Logstash container.

5-Logstash

Logstash is a stream-based data processing pipeline that ingests data from various sources, transforms it, and then sends it to a destination like Elasticsearch.

Core Components and Structure of Logstash:

Inputs:

  • Inputs are responsible for ingesting data into Logstash. They can pull data from a variety of sources, including files, syslog, HTTP, beats, Kafka, JDBC, and more.
  • Each input plugin is configured to collect data from its source and pass it into the Logstash pipeline for further processing.

Filters:

  • Filters are used to process and transform the data within the Logstash pipeline. They can modify, enrich, and structure the data as needed.
  • Common filter plugins include grok for parsing unstructured data, date for parsing dates, mutate for renaming, removing, or modifying fields, and geoip for adding geographical location information based on IP addresses.

Outputs:

  • Outputs are the destinations where the processed data is sent after passing through the Logstash pipeline. They define where the data should be stored or forwarded.
  • Common output plugins include Elasticsearch, file, HTTP, Kafka, and others. Each output plugin is configured to send the processed data to its designated destination.

Codecs:

  • Codecs are used to encode or decode the data as it is being processed by the inputs and outputs. They help in serializing or deserializing the data format.
  • Examples of codecs include JSON, plain text, and multiline codecs. These codecs can be used within input and output plugins to handle specific data formats.

Logstash operates in real-time, continuously ingesting and processing data as it arrives. This enables near-instantaneous data transformation and forwarding, making it suitable for real-time analytics and monitoring.

Logstash can indeed be considered an ETL (Extract, Transform, Load) pipeline, but it operates within the context of real-time, stream-based data processing rather than traditional batch ETL processes.

logstash/Dockerfile :

# pull the official logstash image
FROM docker.elastic.co/logstash/logstash:8.14.3

# set the user to root
USER root

# copy the logstash configuration file
COPY ./conf/logstash.yml /usr/share/logstash/config/logstash.yml

# copy the pipeline configuration file
COPY ./pipeline /usr/share/logstash/pipeline

logstash/conf/logstash.yml :

http.host: "logstash"
http.port: 9600
xpack.monitoring.enabled: false

The logstash.yml file contains configuration settings for Logstash, which is responsible for ingesting, processing, and forwarding data to Elasticsearch. Here's a detailed explanation of each configuration setting:

http.host: "logstash"
  • Purpose: Sets the HTTP host for Logstash’s internal HTTP server.
  • Explanation: This configuration binds the Logstash HTTP server to the hostname logstash. This should match the service name used in the Docker setup or the hostname in your network. The HTTP server is used for various internal purposes, such as monitoring and API endpoints.
http.port: 9600
  • Purpose: Sets the HTTP port for Logstash’s internal HTTP server.
  • Explanation: This configuration binds the Logstash HTTP server to port 9600. This port is used for accessing Logstash's monitoring and API endpoints. You can use this port to interact with Logstash's API for status and health checks.
xpack.monitoring.enabled: false
  • Purpose: Disables X-Pack monitoring for Logstash.
  • Explanation: X-Pack monitoring is a feature that allows you to collect and visualize metrics from your Logstash instances. Setting this to false disables X-Pack monitoring. If you want to monitor Logstash using X-Pack, you can set this to true and configure the necessary settings for Elasticsearch and Kibana integration.
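
Logstash also exposes a small monitoring API on this port. Since port 9600 is not published to the host in this setup, you can query it from inside the container; a sketch, assuming curl is available in the Logstash image:

# basic node information from Logstash's monitoring API
docker exec logstash curl -s "http://logstash:9600/?pretty"

# per-pipeline event counts (events in, filtered, and out)
docker exec logstash curl -s "http://logstash:9600/_node/stats/pipelines?pretty"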

logstash/pipeline/logstash.conf :

input {
  file {
    path => "/usr/share/logstash/nginx/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    remove_field => ["message"]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    user => "elastic"
    password => "${ELASTIC_PASSWORD}"
    index => "nginx-logs-%{+YYYY.MM.dd}"
    ssl_enabled => true
    ssl_verification_mode => "full"
    ssl_certificate_authorities => ["/usr/share/logstash/certs/ca/ca.crt"]
  }
  stdout { codec => rubydebug }
}

Input Section:

input {
  file {
    path => "/usr/share/logstash/nginx/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
  • path: Specifies the location of the log file. Here, it's set to ("/usr/share/logstash/nginx/access.log"), which means Logstash will monitor this file for new entries. It can be any Nginx or Apache log file, like the one from my inception project.
  • start_position: Controls where Logstash starts reading the file. ("beginning") means Logstash reads from the start of the file, which is useful when you want to process all existing log data. The other option is ("end"), which would start reading only new log entries appended after Logstash starts.
  • sincedb_path: Logstash uses a sincedb file to keep track of where it last read from the log file. Setting this to "/dev/null" disables this tracking, so Logstash reads the file from the beginning every time it starts. This is useful for testing or when you want to ensure all data is reprocessed on every run.
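
If you don't have a real Nginx log handy, you can feed the pipeline a hand-written line in the combined log format that %{COMBINEDAPACHELOG} expects. The entry below is a made-up example; run it from the project root so it lands in the bind-mounted file:

# append a sample combined-format entry to the file Logstash is watching
echo '127.0.0.1 - - [02/Aug/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/7.88.1"' >> nginx/access.log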

Filter Section

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    remove_field => ["message"]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
}
  • grok filter: The grok filter is a powerful plugin used to parse and structure unstructured log data.
  • match: Specifies the pattern used to parse the log entries. %{COMBINEDAPACHELOG} is a predefined pattern in Grok that matches the common format of Apache and Nginx access logs, extracting fields like client IP, timestamp, request method, URL, HTTP status code, and more.
  • remove_field: After parsing the log message, this option removes the original message field from the output. This is often done to reduce data size and focus on the structured fields extracted by Grok.
  • date filter: This filter is used to parse timestamps from log entries and convert them into a format Logstash can understand.
  • match: Specifies the field and format of the timestamp to be parsed. In this case, it uses the timestamp field extracted by the Grok filter and matches it to the dd/MMM/yyyy:HH:mm:ss Z format.
  • target: This indicates that the parsed date should overwrite the @timestamp field, which is a standard field in Logstash/Elasticsearch for timestamping events.

The date filter in Logstash is crucial for accurately timestamping log entries. It parses the timestamp from your logs (e.g., dd/MMM/yyyy:HH:mm:ss Z) and replaces the default @timestamp field with this parsed value. This ensures that logs are indexed in Elasticsearch with the correct original time they were generated, not the time when Logstash processed them. Without this filter, all logs might be incorrectly timestamped, leading to issues when querying data in Kibana, especially with historical logs. The date filter ensures all log data, both new and old, is correctly ordered and displayed.

Output Section

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    user => "elastic"
    password => "${ELASTIC_PASSWORD}"
    index => "nginx-logs-%{+YYYY.MM.dd}"
    ssl_enabled => true
    ssl_verification_mode => "full"
    ssl_certificate_authorities => ["/usr/share/logstash/certs/ca/ca.crt"]
  }
  stdout { codec => rubydebug }
}
  • elasticsearch output plugin: This defines where Logstash should send the processed log data, in this case, to an Elasticsearch cluster.
  • hosts: Specifies the Elasticsearch server's address. Here, it's set to ("https://elasticsearch:9200"), meaning Logstash sends data to this Elasticsearch instance over HTTPS.
  • user and password: These provide the credentials for authenticating with Elasticsearch. The password is referenced from an environment variable ${ELASTIC_PASSWORD} for security.
  • index: Defines the Elasticsearch index where the logs should be stored. Here, the index name is set to ("nginx-logs-%{+YYYY.MM.dd}"), which means a new index is created daily, with the date appended to the index name.
  • ssl_enabled: Indicates that SSL/TLS is enabled for communication with Elasticsearch.
  • ssl_verification_mode: Set to "full", meaning Logstash will fully verify the SSL certificate of the Elasticsearch server to ensure a secure connection.
  • ssl_certificate_authorities: Specifies the path to the certificate authority file used to verify the SSL certificate.
  • stdout output plugin: This sends the processed log data to the console (standard output), mainly for debugging purposes.
  • codec => rubydebug: Formats the output in a readable JSON-like format, making it easier to inspect the processed data.
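
Once Logstash has shipped a few events, you can confirm that the daily index exists and holds documents. A quick check from the host, assuming the stack is running and ELASTIC_PASSWORD is exported:

# list the nginx-logs-* indices with their document counts
curl -k -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/_cat/indices/nginx-logs-*?v"

# fetch a couple of parsed events to inspect the extracted fields
curl -k -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/nginx-logs-*/_search?size=2&pretty"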

After configuring Elasticsearch, Kibana, and Logstash, the next step is to use Docker Compose to orchestrate these components.

6-Docker Compose

Docker Compose enables us to define and run multi-container Docker applications, ensuring seamless integration of all parts of our ELK stack. In this next section, we will focus on setting up the docker-compose.yml file to simplify the deployment and management of the entire ELK stack.

docker-compose.yml :

version: '3.8'

volumes:
  certs:
    name: certs
  elasticsearch:
    name: elasticsearch
  kibana:
    name: kibana

networks:
  elk-stack:
    name: elk-stack

services:
  setup:
    container_name: setup
    build: ./setup
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - /var/run/docker.sock:/var/run/docker.sock
    env_file:
      - .env
    networks:
      - elk-stack
    healthcheck:
      test: ['CMD-SHELL', '[ -f /usr/share/elasticsearch/config/certs/elasticsearch/elasticsearch.crt ]']
      interval: 1s
      timeout: 5s
      retries: 120

  elasticsearch:
    container_name: elasticsearch
    depends_on:
      setup:
        condition: service_healthy
    build: ./elasticsearch
    restart: always
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - elasticsearch:/usr/share/elasticsearch/data
    ports:
      - '9200:9200'
    env_file:
      - .env
    networks:
      - elk-stack
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1

  kibana:
    container_name: kibana
    depends_on:
      setup:
        condition: service_healthy
    build: ./kibana
    restart: always
    volumes:
      - certs:/usr/share/kibana/config/certs
      - kibana:/usr/share/kibana/data
    ports:
      - '5601:5601'
    env_file:
      - .env
    networks:
      - elk-stack
    mem_limit: ${MEM_LIMIT}

  logstash:
    container_name: logstash
    depends_on:
      setup:
        condition: service_healthy
    build: ./logstash
    restart: always
    volumes:
      - certs:/usr/share/logstash/certs
      - ./nginx/access.log:/usr/share/logstash/nginx/access.log
    env_file:
      - .env
    networks:
      - elk-stack

Version and Top-Level Configuration

version: '3.8'
  • version: Specifies the version of the Docker Compose file format. Version 3.8 is used here.

Volumes

volumes:
  certs:
    name: certs
  elasticsearch:
    name: elasticsearch
  kibana:
    name: kibana

volumes:

Defines named volumes for storing persistent data.

  • certs: Volume for storing SSL certificates.
  • elasticsearch: Volume for storing Elasticsearch data.
  • kibana: Volume for storing Kibana data.

Networks

networks:
  elk-stack:
    name: elk-stack
  • networks: Defines a custom network named elk-stack to allow communication between the ELK stack services.

Services

Setup Service

services:
  setup:
    container_name: setup
    build: ./setup
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - /var/run/docker.sock:/var/run/docker.sock
    env_file:
      - .env
    networks:
      - elk-stack
    healthcheck:
      test: ['CMD-SHELL', '[ -f /usr/share/elasticsearch/config/certs/elasticsearch/elasticsearch.crt ]']
      interval: 1s
      timeout: 5s
      retries: 120
  • setup: This service handles the generation of SSL certificates.
  • container_name: Names the container setup.
  • build: Specifies the build context as the ./setup directory.
  • volumes: Mounts the certs volume and the Docker socket inside the container.
  • env_file: Loads environment variables from the .env file.
  • networks: Connects the container to the elk-stack network.
  • healthcheck: Defines a health check to ensure the setup service is completed and the certificates are generated.
  • test: The command that Docker runs to check the health of the container. In this case, it uses a shell command to check if the SSL certificate file elasticsearch.crt exists.
  • interval: Specifies how often to run the health check command. Here, it is set to run every 1 second.
  • timeout: The maximum time that Docker will wait for the health check command to complete before considering it failed. Here, it is set to 5 seconds.
  • retries: The number of consecutive failures required for the container to be considered unhealthy. Here, it is set to 120 retries. This means that if the check fails 120 times in a row, Docker will mark the container as unhealthy.
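
While the stack is starting, you can watch this health check from the host. Two optional commands (note that the setup container removes itself once its script finishes, so they only return something during startup):

# show the health state Docker currently reports for the setup container
docker inspect --format '{{.State.Health.Status}}' setup

# or watch the STATUS column (health: starting / healthy) while the certificates are generated
docker ps --filter name=setup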

Elasticsearch Service

elasticsearch:
  container_name: elasticsearch
  depends_on:
    setup:
      condition: service_healthy
  build: ./elasticsearch
  restart: always
  volumes:
    - certs:/usr/share/elasticsearch/config/certs
    - elasticsearch:/usr/share/elasticsearch/data
  ports:
    - '9200:9200'
  env_file:
    - .env
  networks:
    - elk-stack
  mem_limit: ${MEM_LIMIT}
  ulimits:
    memlock:
      soft: -1
      hard: -1

elasticsearch:

Configures the Elasticsearch service.

  • container_name: Names the container elasticsearch.
  • depends_on: Ensures this service starts only after the setup service is healthy.
  • build: Specifies the build context as the ./elasticsearch directory.
  • restart: Always restart the container if it stops.
  • volumes: Mounts the certs volume and the elasticsearch data volume inside the container.
  • ports: Exposes port 9200 for Elasticsearch.
  • env_file: Loads environment variables from the .env file.
  • networks: Connects the container to the elk-stack network.
  • mem_limit: Limits the container’s memory usage based on the MEM_LIMIT environment variable.
  • ulimits: Sets the memory lock limits to avoid swapping.
  • memlock: Specifically controls the maximum amount of memory that can be locked into RAM, preventing it from being swapped to disk.
  • soft: The soft limit is the value that the kernel enforces for the corresponding resource. This value can be increased up to the hard limit.
  • hard: The hard limit acts as a ceiling for the soft limit and can only be increased by the root user.
  • Setting both the soft and hard limits to -1 allows the container to lock an unlimited amount of memory.

Kibana Service

kibana:
  container_name: kibana
  depends_on:
    setup:
      condition: service_healthy
  build: ./kibana
  restart: always
  volumes:
    - certs:/usr/share/kibana/config/certs
    - kibana:/usr/share/kibana/data
  ports:
    - '5601:5601'
  env_file:
    - .env
  networks:
    - elk-stack
  mem_limit: ${MEM_LIMIT}

kibana:

Configures the Kibana service.

  • container_name: Names the container kibana.
  • depends_on: Ensures this service starts only after the setup service is healthy.
  • build: Specifies the build context as the ./kibana directory.
  • restart: Always restart the container if it stops.
  • volumes: Mounts the certs volume and the kibana data volume inside the container.
  • ports: Exposes port 5601 for Kibana.
  • env_file: Loads environment variables from the .env file.
  • networks: Connects the container to the elk-stack network.
  • mem_limit: Limits the container’s memory usage based on the MEM_LIMIT environment variable.

Logstash Service

logstash:
  container_name: logstash
  depends_on:
    setup:
      condition: service_healthy
  build: ./logstash
  restart: always
  volumes:
    - certs:/usr/share/logstash/certs
    - ./nginx/access.log:/usr/share/logstash/nginx/access.log
  env_file:
    - .env
  networks:
    - elk-stack

logstash:

Configures the Logstash service.

  • container_name: Names the container logstash.
  • depends_on: Ensures this service starts only after the setup service is healthy.
  • build: Specifies the build context as the ./logstash directory.
  • restart: Always restart the container if it stops.
  • volumes: Mounts the certs volume inside the container and maps the nginx/access.log file from the host machine to /usr/share/logstash/nginx/access.log inside the container, allowing Logstash to read this log file.
  • env_file: Loads environment variables from the .env file.
  • networks: Connects the container to the elk-stack network.

This is the source code for the project we built in this blog. I have added my Inception project to the GitHub repository for continuous data generation, processing, and deployment, as well as a custom dashboard, which I upload into Kibana using the setup container.

Finally, this article offers a detailed and organized approach to setting up a strong and secure ELK stack using Docker Compose. It covers initial configurations in the .env file and Makefile, the generation of SSL certificates with the setup container, and the deployment of Elasticsearch, Kibana, and Logstash containers. Each step is carefully designed to ensure seamless integration and optimal performance. Following this guide, you can efficiently collect, process, store, and visualize log data, gaining powerful insights and enhancing your ability to make data-driven decisions.

Thank you for joining the deep dive into the ELK Stack, and if you have any questions or suggestions please leave a comment! 🚀

My LinkedIn account and my GitHub account.
