Deploying ELK with Kafka on Kubernetes
By: Ashwin Venkatesan
When it comes to logs, one thing is certain: they’re unpredictable. In the midst of a production incident, just when you need them most, log volumes can surge, potentially overwhelming your logging setup. That’s where the power of buffering mechanisms comes in, acting as a safeguard for both Logstash and Elasticsearch. Enter Apache Kafka, the reliable broker solution commonly integrated with the ELK Stack. In the realm of Kubernetes, Kafka serves as a vital entry point, managing the influx of collected data.
In this guide, I’ll walk you through the process of deploying a robust data pipeline using the ELK Stack and Kafka in a Kubernetes environment. Here’s a breakdown of the key components and their roles:
- Filebeat: Your log collector — gathers and sends logs to Kafka, kicking off the data journey.
- Kafka: Data broker — manages flow, queues, and routes, handling unpredictable log surges.
- Logstash: Aggregates and ships — processes data from Kafka, sends to Elasticsearch.
- Elasticsearch: Indexes for searching and analysis, turning raw data into insights.
- Kibana: Unveils insights — explore and analyze collected data for actionable takeaways.
In the upcoming sections, I’ll guide you through the deployment process, ensuring you’re equipped with the knowledge to build your own resilient data pipeline with ELK and Kafka, all within the dynamic “Kubernetes” ecosystem.
My Environment
In this scenario, I’m using a CentOS 7 machine locally within a VMware environment.
We're using Minikube to run a single-node Kubernetes cluster, where the one node acts as the control plane.
For our ELK stack and Filebeat deployment, we’re leveraging Helm charts. As for Apache Kafka, we’re employing the Strimzi Kafka Operator — a combination that promises an intriguing journey.
This serves as a testing and development setup. In real-life scenarios, you’ll likely have these components running on separate machines.
Now, let’s dive right in.
Step 1: Installing Minikube
We'll start by installing Minikube on CentOS 7.
To install the latest Minikube stable release on x86-64 Linux using a binary download:
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
After installing Minikube, the first step is to start the cluster:
minikube start --memory 8096 --cpus 4
You should see output confirming that the cluster has started successfully.
Note that Minikube requires a driver to run the cluster. The driver I recommend using is Docker (see the Minikube driver documentation).
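Once the cluster is up, you can quickly verify it before moving on with two standard checks:
# confirm the Minikube components are running
minikube status
# confirm the single node is in the Ready state
kubectl get nodes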
Deploy Apache Kafka
To deploy the Apache Kafka cluster, we are using the Strimzi Cluster Operator.
Create a namespace called kafka:
kubectl create namespace kafka
Apply the Strimzi install files, including ClusterRoles, ClusterRoleBindings and some Custom Resource Definitions (CRDs). The CRDs define the schemas used for the custom resources (CRs, such as Kafka, KafkaTopic and so on) you will be using to manage Kafka clusters, topics and users.
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
Follow the deployment of the Strimzi cluster operator:
kubectl get pod -n kafka --watch
Make sure the operator pod is in the Running state.
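If you prefer a one-off check instead of watching the pods, you can also query the operator deployment directly (strimzi-cluster-operator is the deployment name created by the install files above):
kubectl get deployment strimzi-cluster-operator -n kafka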
Create an Apache Kafka cluster
Create a new Kafka custom resource to get a small persistent Apache Kafka Cluster with one node for Apache Zookeeper and Apache Kafka:
# Apply the `Kafka` Cluster CR file
kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka
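For reference, kafka-persistent-single.yaml defines a Kafka custom resource roughly like the sketch below: a single Kafka broker and a single ZooKeeper node, each with persistent storage. The exact listener, config and storage blocks may differ in the latest Strimzi example, so treat this as an outline rather than the literal file:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1                    # one broker
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim       # backed by a PersistentVolumeClaim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 1                    # one ZooKeeper node
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}              # manages KafkaTopic resources
    userOperator: {}               # manages KafkaUser resources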
Wait while Kubernetes starts the required pods, services, and so on:
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
The above command might timeout if you’re downloading images over a slow connection. If that happens you can always run it again.
Note: Make sure you see the confirmation message "kafka.kafka.strimzi.io/my-cluster condition met".
Send and receive messages
With the cluster running, run a simple producer to send messages to a Kafka topic (the topic is automatically created):
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.36.1-kafka-3.5.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic
And to receive them in a different terminal, run:
kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.36.1-kafka-3.5.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
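The console producer above relies on automatic topic creation. If you would rather manage the topic declaratively, Strimzi's KafkaTopic custom resource (one of the CRDs installed earlier) can be used instead; a minimal sketch:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster   # ties the topic to our Kafka cluster
spec:
  partitions: 1
  replicas: 1
Save it to a file and apply it with kubectl apply -f <file> -n kafka.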
And yes, we've deployed Apache Kafka on Kubernetes :)
Now the next step is to ship the Kafka topics/logs to the ELK stack using Filebeat. Minimize the current terminals, open a new one, and follow the steps below.
Installing Helm Charts
In our deployment process for the ELK stack and our log shipper, Filebeat, we're using Helm charts to simplify installation. Below are the commands to install Helm itself:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
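You can confirm the Helm client installed correctly with:
helm version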
Deploying the ELK Stack
Once Helm is installed, visit Artifact Hub (a repository for Kubernetes packages) and search for "Elastic." We'll use the official Elasticsearch chart published by Elastic. Below are the installation commands for Elasticsearch:
First add the repository
helm repo add elastic https://helm.elastic.co
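Then refresh the local chart index so the charts from the newly added repository are available:
helm repo update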
# install the chart
helm install my-elasticsearch elastic/elasticsearch --version 7.17.3 \
--set replicas=1
Note: I overrode the chart value so that only a single Elasticsearch replica is deployed. By default the chart creates 3 replicas, which won't all schedule on a single-node Minikube cluster and leads to errors.
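If you prefer a values file over --set flags, a single-node override for the elastic/elasticsearch chart might look roughly like this (minimumMasterNodes is a value exposed by the 7.x charts; the file name is just an example):
# values-elasticsearch.yaml
replicas: 1
minimumMasterNodes: 1
Then install with:
helm install my-elasticsearch elastic/elasticsearch --version 7.17.3 -f values-elasticsearch.yaml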
Likewise, for Logstash, Kibana, and Filebeat:
helm install my-logstash elastic/logstash --version 7.17.3
helm install my-kibana elastic/kibana --version 7.17.3
helm install my-filebeat elastic/filebeat --version 7.17.3
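Note that the Logstash chart ships with a default pipeline. To actually place Logstash between Kafka and Elasticsearch, as described in the introduction, you can pass a custom pipeline through the chart's logstashPipeline value. A rough sketch follows; the Kafka bootstrap and Elasticsearch service names are assumptions based on the releases created in this guide, so adjust them to your cluster:
logstashPipeline:
  logstash.conf: |
    input {
      kafka {
        bootstrap_servers => "my-cluster-kafka-bootstrap.kafka.svc:9092"
        topics => ["my-topic"]
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch-master:9200"]
        index => "kafka-logs-%{+YYYY.MM.dd}"
      }
    }
Save this as a values file and pass it with -f when installing the Logstash chart.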
Yes, you've done it! The ELK stack, Filebeat, and Apache Kafka are now deployed on Kubernetes. Ensure everything is running smoothly by using the following commands:
kubectl get pods
kubectl get pods -n kafka
And there you have it! We’ve successfully deployed the ELK stack, Filebeat, and Apache Kafka. Filebeat, deployed as a daemonset, is now diligently collecting and shipping all the logs to Elasticsearch. To visualize the logs, we have Kibana at our disposal. However, note that by default, Kibana is exposed with a cluster IP, limiting access to within the cluster. To access it from our local machine, we can use the following command:
kubectl port-forward service/my-kibana-kibana 5601:5601
Next, we create an index pattern matching "filebeat-*" in Kibana, and that's all it takes! With this in place, we can seamlessly visualize all the logs, including the Kafka logs.
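If you'd rather script this step than click through the Kibana UI, the saved objects API can create the same index pattern while the port-forward above is active (a sketch, assuming default settings with no authentication enabled):
curl -X POST "http://localhost:5601/api/saved_objects/index-pattern/filebeat" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -d '{"attributes": {"title": "filebeat-*", "timeFieldName": "@timestamp"}}'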
As we observe Filebeat successfully transporting the Kafka logs to Elasticsearch within the Kubernetes environment, we’ve finally achieved our goal!
ENDNOTES
In a Nutshell: Kafka Logs to ELK via Kubernetes
And there you have it — the roadmap to effortlessly route Kafka logs to ELK, all within the Kubernetes universe. With Filebeat, Helm charts, and the magic of Strimzi Kafka Operator, you’ve set the stage for seamless data flow.
As you continue your coding journey, remember: Kubernetes orchestrates, Kafka logs flow, and ELK waits to analyze. Your tech story is just beginning.
Here’s to Kubernetes, Kafka, ELK, and your tech adventures ahead!
Acknowledgement
“A special note of gratitude goes to Sibi Chakkaravarthy S from VIT-AP University and Dr. S. Karthikeyan, Associate Professor, Dept of CSE(AIML), for their valuable insights and contributions that enriched the content of this article.”