
Centralised Logging System Using ELK and Filebeat

Shreetheja S N
Jan 29, 2024


In the world of server management, where the volume of data generated by applications and services continues to surge, gaining visibility into system health and performance has become paramount. This is where centralised logging emerges as a beacon of insight, offering a holistic approach to managing and analysing log data across diverse server environments.

In this blog, we delve into the critical importance of centralised logging in modern server infrastructure. From enhancing troubleshooting capabilities to bolstering security measures, centralised logging serves as a linchpin in maintaining the robustness and resilience of server ecosystems.

Let's explore the multifaceted benefits and practical applications of centralised logging, empowering organisations to harness the power of data-driven decision-making and elevate their server management strategies to new heights.

ELK and Filebeat Stack

Enter the ELK Stack — a comprehensive suite of open-source tools designed to streamline the collection, storage, analysis, and visualisation of log data. Born out of the need for a scalable and flexible solution to manage the exponential growth of log data, the ELK Stack has emerged as a cornerstone in modern IT infrastructures, empowering organisations to harness the full potential of their data assets.

At the heart of the ELK Stack lies Elasticsearch — a powerful search and analytics engine renowned for its speed, scalability, and real-time capabilities. Complemented by Logstash, a versatile data processing pipeline, and Kibana, an intuitive visualisation platform, the ELK Stack offers a seamless end-to-end solution for log management and analytics.

Filebeat, an essential component of the ELK Stack, serves as a lightweight shipper that seamlessly collects and forwards log data from various sources to Elasticsearch or Logstash for further processing and analysis. Engineered for simplicity and efficiency, Filebeat excels in its ability to ingest log files, system metrics, and even Docker container logs, offering unparalleled flexibility in data collection. With its real-time capabilities and minimal resource footprint, Filebeat ensures timely delivery of log data, enabling organizations to stay abreast of critical events and trends. Whether it’s monitoring application logs, auditing system activities, or detecting security incidents, Filebeat plays a pivotal role in ensuring the seamless flow of log data within the ELK Stack ecosystem, empowering organizations to derive actionable insights and drive informed decision-making.

In this introductory journey, we’ll delve into the architecture of the ELK Stack, highlighting the role of each component and their synergy in transforming raw log data into actionable insights. From ingesting logs from diverse sources to performing advanced analytics and crafting rich visualisations, the ELK Stack provides organisations with the tools they need to make data-driven decisions and drive operational excellence.

Let's explore the various capabilities of the ELK Stack, showcase real-world use cases, and uncover best practices for implementation and optimisation. Whether you're a seasoned DevOps engineer, a data analyst, or a business leader seeking to leverage data-driven insights, this guide aims to equip you with the knowledge and tools to unlock the transformative power of the ELK Stack.

Assumptions:

In the context of our discussion, we operate under the assumption that our infrastructure comprises multiple microservices deployed in a distributed environment. These microservices are designed to be horizontally scalable and are orchestrated to run in parallel instances to meet varying demands effectively. Leveraging containerisation technologies such as Docker and orchestration platforms like Kubernetes, our microservices architecture enables seamless scaling both up and down based on workload fluctuations, ensuring optimal resource utilisation and high availability.

Each microservice operates independently, serving a specific function or business logic, and communicates with other services via well-defined APIs or messaging protocols. As a result, our system exhibits a high degree of modularity, allowing for rapid development, deployment, and iteration of individual components.

Furthermore, the autoscaling capability inherent in our architecture enables automatic provisioning and de-provisioning of resources in response to changes in workload demand. This elasticity not only ensures consistent performance under varying loads but also helps optimize costs by scaling resources dynamically based on actual usage patterns.

In this dynamic and distributed environment, effective log management and analysis become paramount for maintaining visibility, troubleshooting issues, and optimizing performance across the entire microservices ecosystem. Hence, the integration of robust logging solutions like the ELK Stack, coupled with lightweight log shippers like Filebeat, is essential to gather, centralize, and analyze log data from diverse microservices instances, allowing for comprehensive monitoring, troubleshooting, and performance optimization at scale.

Here we assume the following setup:

  1. There are two servers, S1 and S2, each with its own autoscaling group
  2. Each server runs two environments, production and QA
  3. So we assume the following tags for the servers:

S1_prod, S1_qa, S2_prod, S2_qa

Architectural Design

In this setup, Elasticsearch serves as the central repository for storing all the data required for generating analytics. The process entails a consistent flow of logs being transmitted to Elasticsearch through the Filebeat and Logstash pipeline. Each log undergoes meticulous processing and classification based on specific tags before being systematically indexed within Elasticsearch. This structured approach ensures that the data is efficiently organised and readily available for subsequent analytical procedures and decision-making tasks. In addition to Elasticsearch as the storage backend, Kibana is employed as the visualisation and exploration tool for analysing the logs. Kibana complements Elasticsearch by providing a user-friendly interface for querying, visualising, and interacting with the data stored within Elasticsearch.

Implementation

Folder Structure:

| main.go
| go.mod
| go.sum
| .
| .
| .
| .
| filebeat
| | filebeat.yml
| | logstash-forwarder.crt
|

1. Set up Filebeat on Your Service Server:

When the Docker container starts, the steps below install Filebeat on the app server and start it alongside the application.

a. Generate a certificate and private key for Logstash so that only authenticated servers can send logs to the ELK stack:

openssl req -x509 -batch -nodes -days 3650 -newkey rsa:2048 -keyout logstash-forwarder.key -out logstash-forwarder.crt -config logstash-forwarder.cn

Place the generated logstash-forwarder.crt at <working_directory>/filebeat/logstash-forwarder.crt.

b. Create a shell script with commands to install Filebeat:

echo "# Downloading Filebeat From the Elastic search Distrubute"
wget -qO - <https://artifacts.elastic.co/GPG-KEY-elasticsearch> | apt-key add -

echo "# Install peer dependency apt-transport-https"
apt-get install apt-transport-https

echo "# Save the repository definition"
echo "deb <https://artifacts.elastic.co/packages/8.x/apt> stable main" | tee -a /etc/apt/sources.list.d/elastic-8.x.list

echo "# Updating the Files :"
apt-get update && apt-get install filebeat

echo "# Moving the Actual File to Default config Location:"
mv ./filebeat/filebeat.yml /etc/filebeat/filebeat.yml

echo "# Elevating the file permissions"
chmod go-w /etc/filebeat/filebeat.yml

echo "# Run the actual Command :"
service filebeat start > /dev/null 2>&1 &

echo "# Starting the Server ...."
/usr/src/app/bandit

c. Filebeat configuration (filebeat/filebeat.yml):

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# The log input collects log messages from files.
- type: log

  # Unique ID among all inputs; an ID is required.
  id: s1_api
  tags: ["s1_prod_api"]
  enabled: true
  fields_under_root: true
  json:
    message_key: "message"
    keys_under_root: true
    overwrite_keys: true
    add_error_key: false

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /usr/src/app/logfile.log

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash host running on the ELK server
  hosts: ["<LOGSTASH_IP>:<PORT>"]
  ssl.certificate_authorities: ["/usr/src/app/filebeat/logstash-forwarder.crt"]

In the above configuration:

  • We monitor logfile.log in the root directory of the application (/usr/src/app/logfile.log) within the Docker container.
  • The tags field is set to s1_prod_api to identify the server source on the Logstash server.
  • Substitute <LOGSTASH_IP>:<PORT> with the IP address and port of the Logstash instance the logs should be sent to.
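
Since the Filebeat input above decodes JSON (json.message_key: "message"), the application itself should write one JSON object per line to /usr/src/app/logfile.log. Below is a minimal sketch of how the Go service might do this; the logEntry type and its field names are illustrative assumptions, not part of the original setup, chosen to line up with the fields the Logstash prune filter keeps later on.

// Hypothetical sketch: appends one JSON log line to the file Filebeat tails.
// Adapt the fields to your own logging schema.
package main

import (
	"encoding/json"
	"log"
	"os"
	"time"
)

// logEntry mirrors some of the fields kept by the Logstash prune filter
// (message, log.level, @timestamp).
type logEntry struct {
	Timestamp string `json:"@timestamp"`
	Level     string `json:"log.level"`
	Message   string `json:"message"`
}

func main() {
	// Open (or create) the log file that Filebeat is configured to tail.
	f, err := os.OpenFile("/usr/src/app/logfile.log",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatalf("open log file: %v", err)
	}
	defer f.Close()

	// json.Encoder writes one JSON object followed by a newline,
	// which matches Filebeat's line-oriented JSON parsing.
	enc := json.NewEncoder(f)
	entry := logEntry{
		Timestamp: time.Now().UTC().Format(time.RFC3339),
		Level:     "info",
		Message:   "service started",
	}
	if err := enc.Encode(entry); err != nil {
		log.Printf("write log entry: %v", err)
	}
}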

2. Set up ELK on the Central Server

We will use the following repository to set up the ELK stack on the central server:

GitHub — deviantony/docker-elk: The Elastic stack (ELK) powered by Docker and Compose.

By default, the stack exposes the following ports:

  • 5044: Logstash Beats input
  • 50000: Logstash TCP input
  • 9600: Logstash monitoring API
  • 9200: Elasticsearch HTTP
  • 9300: Elasticsearch TCP transport
  • 5601: Kibana

Bringing up the stack

Clone this repository onto the Docker host that will run the stack with the command below:

git clone https://github.com/deviantony/docker-elk.git
  • Use the private key generated earlier: create a certs folder inside the logstash folder and store logstash-forwarder.key there so that incoming logs can be authenticated.
  • Edit the logstash/pipeline/logstash.conf file as follows

Input Section:

input {
  beats {
    port => 5044
    include_codec_tag => false
    ssl => true
    ssl_certificate => "/etc/ssl/certs/logstash-forwarder.crt"
    ssl_key => "/etc/ssl/private/logstash-forwarder.key"
    client_inactivity_timeout => 36000
  }
}
  • Input Plugin: Specifies the input plugin to be used, which in this case is Beats. Beats is a lightweight shipper that sends data from various sources to Logstash or Elasticsearch.
  • port: Defines the port on which Logstash will listen for incoming Beats connections.
  • include_codec_tag: If set to false, the codec tag won't be included in events. Codecs are used for serializing and deserializing data.
  • ssl: Indicates whether SSL/TLS encryption is enabled for communication between Beats and Logstash.
  • ssl_certificate: Specifies the path to the SSL certificate file used by Logstash for secure communication.
  • ssl_key: Specifies the path to the SSL key file used by Logstash for secure communication.
  • client_inactivity_timeout: Defines the maximum time (in seconds) that Logstash will wait for a message from the Beats client before closing the connection due to inactivity.
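
To sanity check this SSL setup from the client side, you can mimic what Filebeat does when it connects to the Beats input: dial the port over TLS and verify the server certificate against the logstash-forwarder.crt generated earlier. The Go sketch below is a hypothetical troubleshooting helper, not part of the original setup; <LOGSTASH_IP> is a placeholder.

// Hypothetical troubleshooting helper: opens a TLS connection to the Logstash
// Beats input and verifies its certificate against logstash-forwarder.crt.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
)

func main() {
	// Load the certificate that Filebeat is configured to trust.
	caCert, err := os.ReadFile("/usr/src/app/filebeat/logstash-forwarder.crt")
	if err != nil {
		log.Fatalf("read certificate: %v", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caCert) {
		log.Fatal("failed to parse logstash-forwarder.crt")
	}

	// Note: the CN/SAN in the certificate must match the host dialled here;
	// otherwise set ServerName in tls.Config to the name the cert was issued for.
	conn, err := tls.Dial("tcp", "<LOGSTASH_IP>:5044", &tls.Config{RootCAs: pool})
	if err != nil {
		log.Fatalf("TLS handshake with Logstash failed: %v", err)
	}
	defer conn.Close()
	log.Println("TLS connection to the Logstash Beats input succeeded")
}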

Filter Section:

filter {
  prune {
    whitelist_names => [ "_index", "_id", "message", "log.level", "@timestamp", "error", "tags", "log.file.path", "event.original", "host.os.name" ]
  }
}
  • Filter Plugin: Specifies the filter plugin to be used for processing incoming events. In this case, the prune filter is used.
  • prune: Removes fields from events that are not listed in the whitelist_names array. It retains only the specified fields and discards all others.

Output Section:

output {
  if "s1_prod" in [tags] {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      user => "elastic_username"
      password => "elastic_password"
      index => "s1_prod"
    }
  }
}
  • Output Plugin: Specifies the output plugin to be used, which in this case is Elasticsearch.
  • if “s1_prod” in [tags]: Conditional statement that checks whether the s1_prod tag exists in the tags field of the event; make sure this value matches the tag set in filebeat.yml (in our example the input is tagged s1_prod_api, so either use that value here or adjust the Filebeat tags accordingly). If it matches, the event is processed by the subsequent Elasticsearch output plugin.
  • elasticsearch: Defines the Elasticsearch output plugin configuration.
  • hosts: Specifies the Elasticsearch nodes to which Logstash should send the processed events.
  • user: Specifies the username for accessing Elasticsearch.
  • password: Specifies the password for accessing Elasticsearch.
  • index: Specifies the index name where the processed events will be stored in Elasticsearch.

This configuration essentially sets up Logstash to receive data from Beats over a secure connection, filter the incoming events to retain only specified fields, and then forward the processed events to Elasticsearch for indexing, specifically targeting the s1_prod index.

Now, with the docker-elk repository mentioned above (GitHub — deviantony/docker-elk: The Elastic stack (ELK) powered by Docker and Compose) cloned and configured, bring up the stack with Docker Compose as described in its README so it can start receiving the logs.
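
Once the stack is up and Filebeat has shipped a few log lines, you can verify that documents are landing in the s1_prod index. The sketch below assumes Elasticsearch is reachable on port 9200 and reuses the placeholder credentials from the Logstash output above; it calls Elasticsearch's _count API over HTTP and prints the raw response.

// Hypothetical verification helper: asks Elasticsearch how many documents are
// currently stored in the s1_prod index.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// <ELASTICSEARCH_IP> is a placeholder; substitute your central server's address.
	req, err := http.NewRequest(http.MethodGet,
		"http://<ELASTICSEARCH_IP>:9200/s1_prod/_count", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Same credentials as configured in the Logstash elasticsearch output.
	req.SetBasicAuth("elastic_username", "elastic_password")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// A successful response looks roughly like: {"count":1234,"_shards":{...}}
	fmt.Println(string(body))
}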

In conclusion, the implementation of a centralized logging system utilizing ELK (Elasticsearch, Logstash, Kibana) stack along with Filebeat offers a powerful solution for aggregating, processing, visualizing, and analyzing log data from various sources. By leveraging Elasticsearch as the storage backend, Logstash as the data processing pipeline, Kibana as the visualization and exploration tool, and Filebeat as the lightweight log shipper, organizations can achieve comprehensive insights into their system’s performance, errors, and trends.
