Kafka hands-on. Part I: development environment

The simplest way to get your Kafka cluster up and running on your workstation

Sergei Beilin
Nov 10, 2018

The best way to learn a new technology is to start working with it, right? So let’s first of all get Apache Kafka running on your workstation. You’ll need Docker, docker-compose, some disk space, and an internet connection. We will use the Kafka Docker images from https://www.confluent.io/.

Kafka is not for absolute beginners, so I expect you to have experience with Python and/or Java, Docker and docker-compose, and the command line. I also assume you have read some introductory articles, for example https://kafka.apache.org/intro or https://hackernoon.com/thorough-introduction-to-apache-kafka-6fbf2989bbc1

We chose docker-compose (instead of, for example, the Confluent CLI tools) because most modern developers are already comfortable with Docker and many use docker-compose on a daily basis. It lets you easily include the Kafka cluster configuration in your existing project, and, as we will see in one of the following articles, you can deploy the cluster to production on Docker Swarm with minimal changes.


Our development cluster will consist of:

  • one Kafka broker,
  • one Zookeeper instance,
  • one schema registry (as we will use Avro later)

One Kafka broker and one Zookeeper instance are not enough for a real production system, but they are fine for development.

We will also configure all three parts to be accessible both from other docker containers and from your host machine.

With the Confluent Kafka Docker images, we do not need to write the configuration files manually. Instead, everything can be configured via environment variables, and we will store Kafka’s environment separately from the container configuration.

1. Zookeeper

Let’s start with Zookeeper. The most important options to provide to Zookeeper are the instance ID, the client port, and the list of all servers in the cluster.
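As a hedged sketch, an env file for the Confluent cp-zookeeper image could look roughly like this (the file name zookeeper-1.env and the exact values are illustrative assumptions, not prescriptions):

    # zookeeper-1.env (hypothetical file name) -- environment for the cp-zookeeper image
    # This instance's ID within the ensemble
    ZOOKEEPER_SERVER_ID=1
    # Port that clients (the Kafka broker) connect to
    ZOOKEEPER_CLIENT_PORT=2181
    ZOOKEEPER_TICK_TIME=2000
    # All servers in the ensemble: host:peerPort:leaderElectionPort
    ZOOKEEPER_SERVERS=zookeeper-1:2888:3888
    # Limit the JVM heap
    KAFKA_HEAP_OPTS=-Xmx256M -Xms128M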

You can see we also limit the JVM heap; what we give it is more than enough for a development environment.

Confluent’s Zookeeper image exports two separate volumes, data and log; we will have to mount both to get persistence. Thus, the minimal Zookeeper service description will look like this:
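(A sketch under the assumptions above; the image tag and the volume names are mine, while the mount points are the ones the cp-zookeeper image declares.)

    # under "services:" in docker-compose.yml
    zookeeper-1:
      image: confluentinc/cp-zookeeper:5.0.1
      hostname: zookeeper-1
      env_file: zookeeper-1.env
      volumes:
        # the two volumes exported by the image: data and transaction log
        - zookeeper-1-data:/var/lib/zookeeper/data
        - zookeeper-1-log:/var/lib/zookeeper/log

The named volumes zookeeper-1-data and zookeeper-1-log also have to be declared under the top-level volumes: key of the compose file.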

Notice that we are very explicit with the hostname and service name; they must match what is in the Zookeeper configuration environment.

2. Kafka (broker)

Let’s configure the Kafka broker. We need, first of all, to specify the broker ID, point it to the previously configured Zookeeper, and configure the listeners and advertised listeners (where the broker listens and which addresses it advertises to connecting clients). Notice that we configure the broker so that we can access it both as kafka-1:9092 from other Docker containers and as localhost:29092 from your host machine.

These are the only things you have to configure. However, we will add a few nice things.

Kafka can create topics automatically the first time you produce to them; that is usually not the best choice for production, but it is quite convenient in development. Similarly, topic deletion is often undesirable in production, while, again, in development it is fine. So we will enable both topic auto-creation and deletion.

All further settings in the following gist are optional and quite opinionated.
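As a rough sketch of such an environment for the Confluent cp-kafka image (the file name kafka-1.env, the heap size and the exact listener names are my assumptions):

    # kafka-1.env (hypothetical file name) -- environment for the cp-kafka image
    KAFKA_BROKER_ID=1
    KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181
    # Two listeners: one for other containers, one for the host machine
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_HOST://0.0.0.0:29092
    KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka-1:9092,PLAINTEXT_HOST://localhost:29092
    KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
    # A single broker cannot satisfy the default replication factor of 3
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
    # Convenient in development, usually not in production
    KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
    KAFKA_DELETE_TOPIC_ENABLE=true
    # Limit the JVM heap
    KAFKA_HEAP_OPTS=-Xmx512M -Xms256M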

The service description only adds a dependency on zookeeper-1 and a volume for the Kafka log (the data log, not application logging!).
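A corresponding service stanza could look roughly like this (the image tag and the volume name are assumptions; /var/lib/kafka/data is where the cp-kafka image keeps its data log):

    # under "services:" in docker-compose.yml
    kafka-1:
      image: confluentinc/cp-kafka:5.0.1
      hostname: kafka-1
      depends_on:
        - zookeeper-1
      env_file: kafka-1.env
      ports:
        # publish the host listener so localhost:29092 works from your machine
        - "29092:29092"
      volumes:
        - kafka-1-data:/var/lib/kafka/data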

3. Schema registry

The Schema registry is the easiest to configure. Here’s the environment:
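(A rough sketch, assuming the cp-schema-registry image; the file name schema-registry.env is mine.)

    # schema-registry.env (hypothetical file name) -- environment for cp-schema-registry
    SCHEMA_REGISTRY_HOST_NAME=schema-registry
    # The registry stores its schemas in a Kafka topic on our broker
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=PLAINTEXT://kafka-1:9092
    SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081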

and the service:
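(A sketch, under the same assumptions as above.)

    # under "services:" in docker-compose.yml
    schema-registry:
      image: confluentinc/cp-schema-registry:5.0.1
      hostname: schema-registry
      depends_on:
        - kafka-1
      env_file: schema-registry.env
      ports:
        # make the REST API reachable from the host machine as well
        - "8081:8081"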

4. Running it all together

Finally, having all three .env files and the following compose file
https://gist.github.com/saabeilin/d2448234d030a93cdec5c1b499857b53
we can start the whole stack with just docker-compose up!
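For example, from the directory containing the compose file and the .env files:

    # start the whole stack in the foreground (Ctrl+C stops it)
    docker-compose up

    # or start it in the background and follow the logs instead
    docker-compose up -d
    docker-compose logs -f

    # check that zookeeper-1, kafka-1 and schema-registry are all running
    docker-compose ps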

It can take some time to boot the stack and you’ll see quite a lot of output. When it’s finally up, we can try our new cluster!

In a second terminal, figure out the broker container ID via docker ps, and open a shell in it with docker exec -ti <containerId> bash. There we will run the “verifiable producer”, which actually waits for acks from the broker.
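A hedged sketch of such an invocation, run inside the broker container (the exact flag values are illustrative, and --repeating-keys, which produces the repeating keys mentioned below, may not be available in very old Kafka versions):

    kafka-verifiable-producer \
      --broker-list kafka-1:9092 \
      --topic test.test \
      --max-messages 1000 \
      --acks 1 \
      --repeating-keys 10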

This will produce 1000 messages to topic test.test with 10 repeating keys. You will see the messages produced, their partition and offset, and finally, the resulting throughput.

You can now consume the messages with kafka-console-consumer:
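For example, inside the broker container (print.key is optional, but handy for seeing the repeating keys):

    kafka-console-consumer \
      --bootstrap-server kafka-1:9092 \
      --topic test.test \
      --from-beginning \
      --property print.key=true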

After consuming all the messages, it will keep polling for new ones, so you can terminate it with Ctrl+C (or start another producer in another shell).

Finally, let’s see what our topic test.test looks like. Remember, it was created automatically.

Running kafka-topics with --list will show you the three expected topics, and --describe will show you the details of each.
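A hedged sketch of both calls, run inside the broker container and assuming Confluent 5.0-era tooling where kafka-topics still talks to Zookeeper (newer Kafka releases take --bootstrap-server instead):

    # list all topics; you should see test.test plus the internal
    # __consumer_offsets and _schemas topics
    kafka-topics --zookeeper zookeeper-1:2181 --list

    # partitions, replication factor and per-topic configuration of test.test
    kafka-topics --zookeeper zookeeper-1:2181 --describe --topic test.test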

So, now you have a Kafka cluster on your workstation and can try writing producers/consumers/streaming applications in your favorite language.


In the following articles we will go through a few simple examples in Python and Java, and discuss running Kafka in production (for your hobby project with no high load so far).

