Apache Kafka in Action

Understanding the role of different components in Apache Kafka

Shreyas Chaudhari
Aug 17

What is Apache Kafka?

Apache Kafka is all about data: transferring large amounts of it in a reliable, rapid, and scalable way.

In the computing world, transferring data means messaging. Kafka is used in high-throughput use cases to move large amounts of data in a scalable and fault-tolerant way.


Challenges and Limitations in Messaging

Messaging is a fairly simple paradigm for transferring data between applications and data stores.

However, there are multiple challenges associated with it:

  • Limited scalability, because the broker can become a bottleneck.
  • Strain on the message broker as message sizes grow.
  • Consumers must be in a position to consume messages at a reasonable rate.
  • Consumers must be fault-tolerant, ensuring that the messages they consume are not gone forever.

Messaging Limitations

Messaging limitations stem from several factors:

Messaging applications are typically hosted on a single host or node. Hence, the broker can become a bottleneck, because everything depends on that single host and its local storage.

Also, if subscribers consume data slowly, or stop consuming altogether, the broker or publisher can go down, which may result in a complete denial of service.

There is also the possibility of a bug in the subscriber logic, in which case data might not be processed correctly and can end up poisoned or corrupted.

Once the bug in the subscriber is fixed, there must be a way to fetch the old data for reprocessing, and that is only possible if the subscriber has stashed the data somewhere. Reprocessing all the messages after the fix is a significant task in itself.
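Kafka was built with this replay requirement in mind: the broker itself retains the log, so a consumer can rewind its offsets and run old records through the fixed logic. Below is a minimal sketch using Kafka's Java client; the broker address localhost:9092, the orders topic, and the group id are illustrative assumptions, not details from the article.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

class ReplayAfterFix {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "fixed-subscriber");        // illustrative group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Attach to a specific partition and rewind to the oldest retained record.
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(partition));
            consumer.seekToBeginning(Collections.singletonList(partition));

            // Re-read everything the broker still retains, through the fixed logic.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("reprocessing: " + record.value());
            }
        }
    }
}
```

Because the data lives on the broker, the subscriber does not need its own stash of old messages.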

Different applications acting as publishers and subscribers each implement custom logic for writing to the broker, and each handles errors differently. Hence, maintaining data consistency across them is difficult.


Solving Challenges

How does Kafka solve these challenges?

  • Provides high throughput for large volumes of data, in the terabytes or beyond.
  • Is horizontally scalable, able to scale out by adding machines that seamlessly share the load.
  • Provides reliability, so that no data is lost in case of failure.
  • Keeps publishers and consumers loosely coupled; they are involved only in data exchange.
  • Uses pub-sub messaging semantics, where independent applications publish data to a topic and interested subscribers consume data from that topic (see the sketch after this list).
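To make the pub-sub semantics concrete, here is a minimal sketch using Kafka's Java client. The broker address localhost:9092, the orders topic, and the group id are illustrative assumptions:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

// Publisher side: knows only the broker address and the topic name.
class OrderPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a record to the "orders" topic; no knowledge of subscribers.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }
    }
}

// Subscriber side: joins a consumer group and reads from the same topic.
class OrderSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors"); // illustrative group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) { // loop forever; a real app would add a shutdown hook
                // Poll the broker for new records, blocking for up to 500 ms.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

Note that the publisher and subscriber never reference each other; they share only the topic name, which is exactly the loose coupling described above.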

You can watch the video tutorial for:

  1. Setting up an Apache Kafka cluster.
  2. Setting up ZooKeeper and the broker.
  3. Producing messages on a topic.
  4. Consuming the messages from the same topic.

The steps of the video tutorial are listed below:
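For reference, the same four steps map onto the standard quickstart commands shipped with the Kafka distribution. This is a sketch assuming you run each server in its own terminal from the extracted Kafka directory; the topic name test is illustrative, and newer Kafka releases replace the --zookeeper and --broker-list flags with --bootstrap-server.

```sh
# Steps 1 and 2: start ZooKeeper, then a Kafka broker (each in its own terminal)
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

# Create a topic for the demo (single partition, no replication)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test

# Step 3: produce messages on the topic (type one message per line, Ctrl+C to stop)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# Step 4: consume the messages from the same topic, from the beginning of the log
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
```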

