Monitor Apache Kafka Using Grafana and Prometheus

One of the problems that many newcomers face when starting with Apache Kafka is how to monitor their Kafka cluster, producers, and consumers. The question comes up repeatedly on Stack Overflow, even recently. Confluent (the company founded by the creators of Apache Kafka) ships Confluent Control Center with its commercial downloads, which can manage and monitor a Kafka cluster, but it is not available in the community version of the Confluent Platform.

TL;DR

Kafka exposes its metrics through JMX. In this article, I want to show you what JMX is, how to use Prometheus to store Kafka's JMX metrics, and how to visualize them with Grafana to monitor your Kafka broker. Here you can see the final result, the dashboard we will build:

If you want to run the whole stack with just one command, take a look at my kafka-grafana repository. You will still need to configure Grafana to use Prometheus, which is described at the end of this article.

Installation

I assume you have already installed Apache Kafka and ZooKeeper and that they are accessible at localhost:9092 and localhost:2181 respectively. For this article, I installed a fresh copy of Ubuntu 18 and assume you installed the Confluent community platform using its Debian package.

I will describe the role of JMX and JMX exporter in the next section, but let's download it first:

wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.9/jmx_prometheus_javaagent-0.9.jar

The JMX exporter is configured with a YAML file. For Kafka, you can download the example config from the jmx_exporter repository:

wget https://github.com/prometheus/jmx_exporter/raw/master/example_configs/kafka-2_0_0.yml

Now you can start ZooKeeper:

sudo systemctl start confluent-zookeeper

Before we run Kafka, we must pass an environment variable named KAFKA_OPTS to Kafka that enables the JMX exporter to expose JMX metrics from the JVM. Because the Ubuntu installation adds a systemd service, we can set KAFKA_OPTS directly in the service file (located at /lib/systemd/system/confluent-kafka.service). Just add the Environment line shown below:

[Unit]
...
[Service]
...
Restart=no
Environment=KAFKA_OPTS=-javaagent:/home/morteza/myworks/jmx_prometheus_javaagent-0.9.jar=7071:/home/morteza/myworks/kafka-2_0_0.yml
[Install]
...
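Because we edited a unit file, systemd must reload its configuration before the change takes effect:

```shell
# Reload systemd unit definitions so the new Environment line is picked up
sudo systemctl daemon-reload
```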

With this environment variable, we are able to view Kafka metrics over HTTP. Note that I passed the paths to my jmx_prometheus_javaagent-0.9.jar and kafka-2_0_0.yml; change them to the correct paths on your computer. Also, I chose port 7071 for the HTTP server. Now run Kafka:

sudo systemctl start confluent-kafka

Before we continue to the next section, let's create a simple topic:

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic
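As a quick sanity check, you can ask Kafka to describe the topic. This assumes the same ZooKeeper-based kafka-topics tool used above (newer Kafka versions use --bootstrap-server instead of --zookeeper):

```shell
# Print partition count, replication factor, leader, and ISR for the topic
kafka-topics --describe --zookeeper localhost:2181 --topic mytopic
```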

View JMX Metrics

Kafka brokers expose their metrics through JMX. As alex.dzyoba.com puts it:

JMX is a common technology in Java world for exporting statistics of running application and also to control it (you can trigger GC with JMX, for example).

We can think of JMX as an API in JVM that allows us to pull metrics of the application that is running on JVM. It is similar to the way we call a web service API to get information about a resource.

The JDK ships with a tool called JConsole that you can use to view the available metrics of a JVM:

sudo jconsole

This brings up the following window:

Select the Kafka process, click the Connect button, and then select the MBeans tab. You can view a list of metrics in the left pane. If you choose a metric from the list, you will see something like this:

As you can see above, I want to view the number of in-sync replicas (ISR) for the topic mytopic; in this image, it has a value of 1. This is one of the JMX metrics Kafka exposes, and as the image shows, you can browse a lot of metrics without any special tool beyond JConsole.

Expose Metrics Using Prometheus JMX Exporter

Although you can view Kafka and ZooKeeper metrics in JConsole, in a real-world scenario you probably want to collect these metrics automatically and show them in an informative dashboard. To collect Kafka metrics for Prometheus, we use the JMX exporter.

The JMX exporter collects metrics from Kafka and exposes them over HTTP in a format that Prometheus can read. The good news is that we have already configured this by setting the KAFKA_OPTS environment variable in the Kafka service file. Now you can view the result:

curl http://localhost:7071/metrics
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 64.0
...
kafka_network_requestmetrics_requestbytes{request="DescribeLogDirs",quantile="0.95",} 0.0
kafka_network_requestmetrics_requestbytes{request="DescribeLogDirs",quantile="0.98",} 0.0
kafka_network_requestmetrics_requestbytes{request="DescribeLogDirs",quantile="0.99",} 0.0
kafka_network_requestmetrics_requestbytes{request="DescribeLogDirs",quantile="0.999",} 0.0
...
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 68.0
...

There are a lot of metrics; I omitted most of them for brevity. Prometheus is now able to pull monitoring data from localhost:7071/metrics.
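Because the endpoint returns thousands of lines, it is often handy to filter for broker-level metrics. The exact metric names depend on the rename rules in kafka-2_0_0.yml, so treat the pattern below as an example:

```shell
# Show only broker-level metrics; names like kafka_server_* come from the exporter's rules
curl -s http://localhost:7071/metrics | grep '^kafka_server'
```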

Store Kafka Cluster Metrics in Prometheus

Download Prometheus:

wget https://github.com/prometheus/prometheus/releases/download/v2.6.1/prometheus-2.6.1.linux-amd64.tar.gz
tar -xzf prometheus-2.6.1.linux-amd64.tar.gz
cd prometheus-2.6.1.linux-amd64/

Edit prometheus.yml in this directory and put the following content in it:

global:
  scrape_interval: 10s
  evaluation_interval: 10s
scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets:
          - localhost:7071
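The Prometheus release archive also ships with a promtool binary, which can validate the config file before you start the server (assuming you are still in the extracted release directory):

```shell
# Check prometheus.yml for syntax errors before launching Prometheus
./promtool check config prometheus.yml
```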

Now run Prometheus with the following command:

./prometheus

Open your browser and go to localhost:9090:

You can choose a metric and click the Execute button to view its data. You can also view a simple graph by choosing the Graph tab. For example, choose the jvm_memory_bytes_used metric and click the Graph tab:

This graph shows you some history of the Kafka JVM's memory usage. For a more complex dashboard, we need to install Grafana and feed it the Prometheus data.
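If you prefer the command line, the same data is available through Prometheus' HTTP query API; jvm_memory_bytes_used is just an example metric here:

```shell
# Run an instant PromQL query against the Prometheus HTTP API
curl -s 'http://localhost:9090/api/v1/query?query=jvm_memory_bytes_used'
```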

Visualize Metrics in Grafana

Grafana lets us create a visual dashboard on top of a data source. Some data sources, such as Prometheus, are supported officially. Let's download and install Grafana:

wget https://dl.grafana.com/oss/release/grafana_5.4.2_amd64.deb
sudo apt-get install -y adduser libfontconfig
sudo dpkg -i grafana_5.4.2_amd64.deb
sudo systemctl start grafana-server

Now go to localhost:3000 in your browser to view Grafana's first page and configure it. You will see the login page first; enter admin for both the username and the password. Grafana then asks you for a new password. Change your password, and you will be redirected to Grafana's main page:

Click the Add data source button. On the next page, choose Prometheus as your data source type. Then enter http://localhost:9090 for the URL and finally click the Save & Test button.
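If you want to automate this step, the same data source can be created through Grafana's HTTP API. This is a sketch that assumes the default admin account (substitute the password you set earlier):

```shell
# Create a Prometheus data source via Grafana's HTTP API (credentials are an assumption)
curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://localhost:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy"}'
```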

After adding the data source, you can create your dashboard. Because dashboards can be defined entirely in configuration, some developers have created their own dashboards and published them on Grafana's site so that others can reuse them. You can download one of the Kafka templates with the following command:

wget https://grafana.com/api/dashboards/721/revisions/1/download -O kafka-dashboard.json

Click the plus button in the left bar and choose Import. Click the Upload .json file button and select the file you just downloaded. In the Select a Prometheus data source drop-down, choose the Prometheus data source we created earlier, then click the Import button. Now you can see a beautiful dashboard:

By default, Grafana does not refresh the page automatically, but you can enable this by clicking the Last 30 minutes button in the top-right corner of the page, then clicking the Refreshing every drop-down list and selecting 10s. Click Apply and your dashboard will refresh every 10 seconds.

Conclusion

We learned how to create a dashboard for Kafka metrics using Grafana, Prometheus, and its JMX exporter. A next article could cover setting up alert rules that send notifications to a channel for metrics that need attention.

Also, if you want to quickly run these tools in Docker, you can clone my kafka-grafana repository.

References

https://alex.dzyoba.com/blog/jmx-exporter/
https://hackernoon.com/simple-chatops-with-kafka-grafana-prometheus-and-slack-764ece59e707
https://www.robustperception.io/monitoring-kafka-with-prometheus
https://medium.com/@danielmrosa/monitoring-kafka-b97d2d5a5434