Kafka hands-on. Part I: development environment
The simplest way to get your Kafka cluster up and running on your workstation
The best way to learn a new technology is to start working with it, right? So let’s first of all get Apache Kafka running on your workstation. You’ll need Docker, docker-compose, some disk space, and an internet connection. We will use Kafka Docker images from https://www.confluent.io/.
Kafka is not for absolute beginners, so I expect you to have experience with Python and/or Java, Docker and docker-compose, and the command line. You should also have read some introductory articles, for example https://kafka.apache.org/intro or https://hackernoon.com/thorough-introduction-to-apache-kafka-6fbf2989bbc1
We chose docker-compose (instead of, for example, the Confluent CLI tools) because most modern developers are already comfortable with Docker, and many use docker-compose daily. This will let you easily include the Kafka cluster configuration in your existing project, and, as we will see in one of the following articles, you can deploy the cluster to production on docker swarm with minimal changes.
Our development cluster will consist of:
- one Kafka broker,
- one Zookeeper instance,
- one schema registry (as we will use Avro later).
One Kafka broker and one Zookeeper instance are not enough for a real production system, but they are fine for development.
We will also configure all three parts to be accessible both from other Docker containers and from your host machine.
With Confluent’s Kafka Docker images, we do not need to write the configuration files manually. Instead, everything can be configured via environment variables, and we will store each service’s environment separately from the container configuration.
1. Zookeeper
Let’s start with Zookeeper. The most important options to provide to Zookeeper are the instance ID, the client port, and the list of all servers in the cluster.
You can see we also cap the JVM heap — a small heap is more than enough for a development environment.
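The original environment gist is not reproduced in this copy; a zookeeper-1.env file along these lines would cover the options just described (the file name, heap size, and tick time are my assumptions, using the environment variables the Confluent cp-zookeeper image understands):

```
ZOOKEEPER_SERVER_ID=1
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_TICK_TIME=2000
ZOOKEEPER_SERVERS=zookeeper-1:2888:3888
KAFKA_HEAP_OPTS=-Xmx256M -Xms256M
```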
Confluent’s Zookeeper image exports two volumes, data and
log, separately; we will have to mount both to have persistence. Thus, the minimal Zookeeper service description will look like this:
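A sketch of such a service description (the image tag and volume names are assumptions; the mount points are the ones the Confluent image declares):

```yaml
zookeeper-1:
  image: confluentinc/cp-zookeeper:latest   # pin a specific tag in practice
  hostname: zookeeper-1
  env_file:
    - zookeeper-1.env
  volumes:
    - zookeeper-1-data:/var/lib/zookeeper/data
    - zookeeper-1-log:/var/lib/zookeeper/log
```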
Notice that we are very explicit with the hostname and service name; they must match what’s in the Zookeeper configuration environment.
2. Kafka (broker)
Let’s configure the Kafka broker. We need, first of all, to specify the broker ID, point it to the previously configured Zookeeper, and configure the listeners and advertised listeners (where the broker listens and what it advertises when clients connect). Notice that we configure the broker so we can access it both via
kafka-1:9092 from other docker containers and
localhost:29092 from your host machine.
These are the only things you have to configure. However, we will add a few nice things.
Kafka can create topics automatically when you first produce to them; that’s usually not the best choice for production, but it’s quite convenient in dev. Likewise, topic deletion is often undesirable in production, while in dev it’s fine. So we will enable both topic auto-creation and topic deletion.
All further settings in the following gist are optional and quite opinionated.
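The original gist is not reproduced here; a kafka-1.env along these lines would implement what is described above (the listener names, heap size, and the replication factor of 1 for the offsets topic are assumptions suited to a single-broker setup, using cp-kafka’s environment variables):

```
KAFKA_BROKER_ID=1
KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
KAFKA_LISTENERS=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:29092
KAFKA_ADVERTISED_LISTENERS=INTERNAL://kafka-1:9092,EXTERNAL://localhost:29092
KAFKA_INTER_BROKER_LISTENER_NAME=INTERNAL
KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
KAFKA_DELETE_TOPIC_ENABLE=true
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_HEAP_OPTS=-Xmx512M -Xms512M
```

With a single broker, the offsets topic replication factor must be lowered to 1, otherwise consumer group coordination will fail.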
The service description only adds a
zookeeper-1 dependency and a volume for the Kafka logs (the data logs, not the application logs!)
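A sketch of the broker’s service description (image tag and volume name are assumptions; /var/lib/kafka/data is the data directory the Confluent image uses, and port 29092 is published for host access):

```yaml
kafka-1:
  image: confluentinc/cp-kafka:latest   # pin a specific tag in practice
  hostname: kafka-1
  depends_on:
    - zookeeper-1
  env_file:
    - kafka-1.env
  ports:
    - "29092:29092"
  volumes:
    - kafka-1-data:/var/lib/kafka/data
```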
3. Schema registry
The Schema registry is the easiest to configure. Here’s the environment:
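The gist itself is missing from this copy; a minimal schema-registry.env could look like this (the file name is an assumption; these are cp-schema-registry’s environment variables, pointing the registry at our Zookeeper):

```
SCHEMA_REGISTRY_HOST_NAME=schema-registry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper-1:2181
SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081
```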
and the service:
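Again, a sketch rather than the original gist (image tag is an assumption; port 8081 is the registry’s default):

```yaml
schema-registry:
  image: confluentinc/cp-schema-registry:latest   # pin a specific tag in practice
  hostname: schema-registry
  depends_on:
    - kafka-1
  env_file:
    - schema-registry.env
  ports:
    - "8081:8081"
```

Once the stack is up, curl http://localhost:8081/subjects should answer with an empty list on a fresh registry.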
4. Running it all together
Finally, having all three
.env files and the following compose file, we can start the whole stack with just
docker-compose up.
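The compose file is not reproduced in this copy; a full sketch tying the three services together (image tags, env-file names, and volume names are assumptions consistent with the fragments above) could be:

```yaml
version: "3.2"
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-1
    env_file:
      - zookeeper-1.env
    volumes:
      - zookeeper-1-data:/var/lib/zookeeper/data
      - zookeeper-1-log:/var/lib/zookeeper/log
  kafka-1:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-1
    depends_on:
      - zookeeper-1
    env_file:
      - kafka-1.env
    ports:
      - "29092:29092"
    volumes:
      - kafka-1-data:/var/lib/kafka/data
  schema-registry:
    image: confluentinc/cp-schema-registry:latest
    hostname: schema-registry
    depends_on:
      - kafka-1
    env_file:
      - schema-registry.env
    ports:
      - "8081:8081"
volumes:
  zookeeper-1-data:
  zookeeper-1-log:
  kafka-1-data:
```

The named volumes at the bottom give Zookeeper and Kafka persistence across container restarts.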
It can take some time for the stack to boot, and you’ll see quite a lot of output. When it’s finally up, we can try out our new cluster!
In a second terminal, figure out the broker container ID via
docker ps, and let’s start a new shell in it with
docker exec -ti <containerId> bash. Here we will run the “verifiable producer”, which actually waits for acks from the broker:
kafka-verifiable-producer --broker-list kafka-1:9092 --max-messages 1000 --repeating-keys 10 --topic test.test
This will produce 1000 messages to topic
test.test with 10 repeating keys. You will see the messages produced, their partition and offset, and finally, the resulting throughput.
You can now consume the messages with
kafka-verifiable-consumer --broker-list kafka-1:9092 --topic test.test --group-id test-1
After consuming all the messages, it will keep polling for new ones, so you can terminate it with
Ctrl+C (or start another producer in another shell).
Finally, let’s see what our topic
test.test looks like. Remember, it was created automatically.
kafka-topics --zookeeper zookeeper-1:2181 --list
will show you the three expected topics: __consumer_offsets (created by the consumer group), _schemas (used by the Schema Registry), and test.test,
and you can investigate the details with the following command:
kafka-topics --zookeeper zookeeper-1:2181 --describe --topic test.test
So now you have a Kafka cluster on your workstation and can try writing producers, consumers, and streaming applications in your favorite language.
In the following articles, we will go through a few simple examples in Python and Java, and discuss running Kafka in production (for your hobby project with no high load so far).