Deploying ELK with Kafka on Kubernetes
By: Ashwin Venkatesan
When it comes to logs, one thing is certain: they’re unpredictable. In the midst of a production incident, just when you need them most, log volumes can surge, potentially overwhelming your logging setup. That’s where the power of buffering mechanisms comes in, acting as a safeguard for both Logstash and Elasticsearch. Enter Apache Kafka, the reliable broker solution commonly integrated with the ELK Stack. In the realm of Kubernetes, Kafka serves as a vital entry point, managing the influx of collected data.
In this guide, I’ll walk you through the process of deploying a robust data pipeline using the ELK Stack and Kafka in a Kubernetes environment. Here’s a breakdown of the key components and their roles:
- Filebeat: Your log collector — gathers and sends logs to Kafka, kicking off the data journey.
- Kafka: Data broker — manages flow, queues, and routes, handling unpredictable log surges.
- Logstash: Aggregates and ships — processes data from Kafka, sends to Elasticsearch.
- Elasticsearch: Indexes for searching and analysis, turning raw data into insights.
- Kibana: Unveils insights — explore and analyze collected data for actionable takeaways.
In the upcoming sections, I’ll guide you through the deployment process, ensuring you’re equipped with the knowledge to build your own resilient data pipeline with ELK and Kafka, all within the dynamic “Kubernetes” ecosystem.
My Environment
In this scenario, I’m using a CentOS 7 machine locally within a VMware environment.
We're using Minikube to run a single-node Kubernetes cluster, where the one node acts as the control plane.
For our ELK stack and Filebeat deployment, we’re leveraging Helm charts. As for Apache Kafka, we’re employing the Strimzi Kafka Operator — a combination that promises an intriguing journey.
This serves as a testing and development setup. In real-life scenarios, you’ll likely have these components running on separate machines.
Now, let’s dive right in.
Step 1: Installing Minikube
We'll start by installing Minikube on CentOS 7.
To install the latest Minikube stable release on x86-64 Linux using a binary download:
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
After installing Minikube, the first step is to start the cluster:
minikube start --memory 8096 --cpus 4
You should see output confirming that the cluster has started successfully.
Note that Minikube requires a driver to run the cluster. The driver I recommend using is Docker (see the Minikube driver documentation).
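Once the cluster is up, you can quickly verify it before moving on with two standard checks:
# confirm the Minikube components are running
minikube status
# confirm the single node is in the Ready state
kubectl get nodes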
Deploy Apache Kafka
To deploy the Apache Kafka cluster, we are using the Strimzi Cluster Operator.
Create a namespace called kafka:
kubectl create namespace kafka
Apply the Strimzi install files, including ClusterRoles, ClusterRoleBindings and some Custom Resource Definitions (CRDs). The CRDs define the schemas used for the custom resources (CRs, such as Kafka, KafkaTopic and so on) you will be using to manage Kafka clusters, topics and users.
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
Follow the deployment of the Strimzi cluster operator:
kubectl get pod -n kafka --watch
Make sure the operator pod is in the Running state.
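If you prefer a one-off check instead of watching the pods, you can also query the operator deployment directly (strimzi-cluster-operator is the deployment name created by the install files above):
kubectl get deployment strimzi-cluster-operator -n kafka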
Create an Apache Kafka cluster
Create a new Kafka custom resource to get a small persistent Apache Kafka Cluster with one node for Apache Zookeeper and Apache Kafka:
# Apply the `Kafka` Cluster CR file
kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka
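For reference, kafka-persistent-single.yaml defines a Kafka custom resource roughly like the sketch below: a single Kafka broker and a single ZooKeeper node, each with persistent storage. The exact listener, config and storage blocks may differ in the latest Strimzi example, so treat this as an outline rather than the literal file:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1                    # one broker
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim       # backed by a PersistentVolumeClaim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 1                    # one ZooKeeper node
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}              # manages KafkaTopic resources
    userOperator: {}               # manages KafkaUser resources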
Wait while Kubernetes starts the required pods, services, and so on:
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
The above command might timeout if you’re downloading images over a slow connection. If that happens you can always run it again.
Note: Make sure you see the confirmation message "kafka.kafka.strimzi.io/my-cluster condition met".
Send and receive messages
With the cluster running, run a simple producer to send messages to a Kafka topic (the topic is automatically created):
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.36.1-kafka-3.5.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic
And to receive them in a different terminal, run:
kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.36.1-kafka-3.5.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
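The console producer above relies on automatic topic creation. If you would rather manage the topic declaratively, Strimzi's KafkaTopic custom resource (one of the CRDs installed earlier) can be used instead; a minimal sketch:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster   # ties the topic to our Kafka cluster
spec:
  partitions: 1
  replicas: 1
Save it to a file and apply it with kubectl apply -f <file> -n kafka.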
And yes, we've deployed Apache Kafka on Kubernetes :)
Now the next step is to ship the Kafka topics/logs to the ELK stack using Filebeat. Minimize the current terminals, open a new one, and follow the steps below.
Installing Helm Charts
In our deployment process for the ELK stack and our log shipper, Filebeat, we're using Helm charts to simplify installation. Below are the commands to install Helm itself:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
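You can confirm the Helm client installed correctly with:
helm version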
Deploying the ELK Stack
Once Helm is installed, visit Artifact Hub (a repository for Kubernetes packages) and search for "Elastic." We'll use the official Elasticsearch chart published by Elastic. Below are the installation commands for Elasticsearch:
First add the repository
helm repo add elastic https://helm.elastic.co
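Then refresh the local chart index so the charts from the newly added repository are available:
helm repo update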
# install the chart
helm install my-elasticsearch elastic/elasticsearch --version 7.17.3 \
--set replicas=1
Note: I overrode the chart value so that only a single Elasticsearch replica is deployed. By default the chart creates 3 replicas, which won't all schedule on a single-node Minikube cluster and leads to errors.
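If you prefer a values file over --set flags, a single-node override for the elastic/elasticsearch chart might look roughly like this (minimumMasterNodes is a value exposed by the 7.x charts; the file name is just an example):
# values-elasticsearch.yaml
replicas: 1
minimumMasterNodes: 1
Then install with:
helm install my-elasticsearch elastic/elasticsearch --version 7.17.3 -f values-elasticsearch.yaml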
Likewise, for Logstash, Kibana, and Filebeat:
helm install my-logstash elastic/logstash --version 7.17.3
helm install my-kibana elastic/kibana --version 7.17.3
helm install my-filebeat elastic/filebeat --version 7.17.3
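Note that the Logstash chart ships with a default pipeline. To actually place Logstash between Kafka and Elasticsearch, as described in the introduction, you can pass a custom pipeline through the chart's logstashPipeline value. A rough sketch follows; the Kafka bootstrap and Elasticsearch service names are assumptions based on the releases created in this guide, so adjust them to your cluster:
logstashPipeline:
  logstash.conf: |
    input {
      kafka {
        bootstrap_servers => "my-cluster-kafka-bootstrap.kafka.svc:9092"
        topics => ["my-topic"]
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch-master:9200"]
        index => "kafka-logs-%{+YYYY.MM.dd}"
      }
    }
Save this as a values file and pass it with -f when installing the Logstash chart.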
Yes, you've done it! The ELK stack, Filebeat, and Apache Kafka are now deployed on Kubernetes. Ensure everything is running smoothly by using the following commands:
kubectl get pods
kubectl get pods -n kafka
And there you have it! We’ve successfully deployed the ELK stack, Filebeat, and Apache Kafka. Filebeat, deployed as a daemonset, is now diligently collecting and shipping all the logs to Elasticsearch. To visualize the logs, we have Kibana at our disposal. However, note that by default, Kibana is exposed with a cluster IP, limiting access to within the cluster. To access it from our local machine, we can use the following command:
kubectl port-forward service/my-kibana-kibana 5601:5601
Next, we create an index pattern matching "filebeat-*" in Kibana, and that's all it takes! With this in place, we can seamlessly visualize all the logs, including the Kafka logs.
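If you'd rather script this step than click through the Kibana UI, the saved objects API can create the same index pattern while the port-forward above is active (a sketch, assuming default settings with no authentication enabled):
curl -X POST "http://localhost:5601/api/saved_objects/index-pattern/filebeat" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -d '{"attributes": {"title": "filebeat-*", "timeFieldName": "@timestamp"}}'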
As we observe Filebeat successfully transporting the Kafka logs to Elasticsearch within the Kubernetes environment, we’ve finally achieved our goal!
ENDNOTES
In a Nutshell: Kafka Logs to ELK via Kubernetes
And there you have it — the roadmap to effortlessly route Kafka logs to ELK, all within the Kubernetes universe. With Filebeat, Helm charts, and the magic of Strimzi Kafka Operator, you’ve set the stage for seamless data flow.
As you continue your coding journey, remember: Kubernetes orchestrates, Kafka logs flow, and ELK waits to analyze. Your tech story is just beginning.
Here’s to Kubernetes, Kafka, ELK, and your tech adventures ahead!
Acknowledgement
“A special note of gratitude goes to Sibi Chakkaravarthy S from VIT-AP University and Dr. S. Karthikeyan, Associate Professor, Dept of CSE(AIML), for their valuable insights and contributions that enriched the content of this article.”