Kafka: All you need to know

Apache Kafka’s popularity is exploding. Learn about what it is, and why it’s becoming a solution of big data and microservices applications

Maria Valcam
May 23, 2019 · 7 min read

What is Kafka

Records are published into topics. Topics are like channels where all the records are stored. Consumers subscribe to one or more topics to read its records.

Kafka has a retention period, so it will store your records for the time or size you specify and can be set by topic.

Consumers label themselves with a consumer group name. In a consumer group, one or more consumers work together to consume a topic. When a new record arrives to the topic, it will be sent just to one consumer instance in the consumer group.


To run Kafka, you need Zookeeper. Zookeeper is used for:

  • Controller election. The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. Zookeeper is used to elect a controller, make sure there is only one and elect a new one it if it crashes.
  • Cluster membership. Zookeeper maintains a list of all the brokers that are functioning and are a part of the cluster at any given moment.
  • Topic configuration — which topics exist, how many partitions each has, where are the replicas, who is the preferred leader, what configuration overrides are set for each topic
  • Quotas . How much data is each client allowed to read and write
  • Access Control Lists (ACL) . Who is allowed to read and write to which topic (old high level consumer) — Which consumer groups exist, who are their members and what is the latest offset each group got from each partition.

Deep Dive

A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition. A partition may be assigned to multiple brokers, which will result in the partition being replicated. This provides redundancy of records in the partition, such that another broker can take over leadership if there is a broker failure.

About consumer groups, In order to avoid two consumer instances within the same consumer groups reading the same record twice, each partition is tied to only one consumer process per consumer group.

Kafka follows the principle of a dumb broker and smart consumer. This means that Kafka does not keep track of what records are read by the consumer. By storing the offset of the last consumed record for each partition, either in Zookeeper or in Kafka itself, a consumer can stop and restart without losing its place.

Each record consists of a key, a value, and a timestamp. This key is assigned by Kafka when producers publish a record. Keys are used when records are to be written to partitions in a more controlled manner. The simplest such scheme is to generate a consistent hash of the key, and then select the partition number for that record by taking the result of the hash modulo, the total number of partitions in the topic. This assures that records with the same key are always written to the same partition.

Note: While records are opaque byte arrays to Kafka itself, it is recommended that additional structure, or schema, be imposed on the record content so that it can be easily understood. Typically, this schema can be JSON or Avro. It is also normal to add some versioning for when it changes.

Note on Kafka Streams

It is normally used for enrichment and transformation. For example, we could have a users topic and an enriched-users topic. So a UserEnricher service could read users events, add information about its company and college and publish the same events but with more information into our enriched-users topic.

Run it locally

$ docker network create kafka
$ docker run --network kafka -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=`docker-machine ip \`docker-machine active\`` --env ADVERTISED_PORT=9092 spotify/kafka

To test that it is working, let’s create a topic and run a consumer and a producer. Kafka exposes an API for these actions. To make it easy, Kafka brings some script that we can use to easily do this. To use these scripts, we can run an image of Kafka in a container and tell this container to execute one of these scripts.

So to create a topic, we can run a container with bitnami/kafka:latest image in interactive mode (-it) using the same network that our Kafka and Zookeeper containers are using ( — network kafka). Let’s check what is the IP of our container:

$ docker network inspect kafka
# Check IPv4Address, for me it shows

And now run kafka-topics.sh script:

docker run -it --network kafka bitnami/kafka:latest kafka-topics.sh --create --zookeeper --replication-factor 1 --partitions 1 --topic amazingtopic

Now let’s check our topic was created properly:

docker run -it --network kafka bitnami/kafka:latest kafka-topics.sh --zookeeper --describe --topic amazingtopic

And now let’s do the cool stuff: create a Consumer and a Producer. Start creating a consumer by typing:

docker run -it --network kafka bitnami/kafka:latest kafka-console-consumer.sh --bootstrap-server --topic amazingtopic --from-beginning

Now open a new terminal and create a producer:

docker run -it --network kafka bitnami/kafka:latest kafka-console-producer.sh --broker-list --topic amazingtopic

When you run this producer, it lets you type words and it publishes a record to Kafka once you click enter. Try to create some records and see how they arrive to the consumer :D

Note: if you kill the consumer, then produce some more records into our amazingtopic and finally run a new consumer, you will see that all the records appear again so there is no lost of information while the consumer was down.


Its interface is really simple, if you access to localhost:9000 using your browser, its first page will load and you will see a list of all Kafka cluster that it is connected to.

If you click in one of the clusters, a new page with the summary of it will appear and also, you will see several options on the top menu:

  • Brokers. In the brokers main page, you can see some basic info for the brokers (like bytes in and bytes out). If you click on one of the brokers ids, you will see information about records and topics handle by this specific broker.
  • Topic. Then go to List, you will be redirected to a topics page. Here you can see the number of partitions and replicas for every topic. You can also click on the topic name to find much more information about it.
  • Consumers. You will get a list of consumers. Click on one of the consumers name to see the list of topics it is reading from and click on a topic name to see information like % of partitions assigned to a consumer instance, total lag and consumer offset.

Thanks for reading!

If you want to see code examples of consumers or producers for Kafka, please let me know in a comment below or send a message to my twitter account @Marvalcam1

Beamery Hacking Talent

Beamery .. where we engineer, design and produce talent

Maria Valcam

Written by

Engineer studying an MBA. I am interested in Business, Sexology, and Engineering. Website: https://marvalero.github.io/Oohmyclit/english

Beamery Hacking Talent

Beamery .. where we engineer, design and produce talent

Maria Valcam

Written by

Engineer studying an MBA. I am interested in Business, Sexology, and Engineering. Website: https://marvalero.github.io/Oohmyclit/english

Beamery Hacking Talent

Beamery .. where we engineer, design and produce talent

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store