Apache Kafka — 101 & How It Comes into the Picture
I did not understand the importance of Apache Kafka in large-scale systems when I first investigated it with simple examples many years ago. It seemed to have only one feature: running a command to “save” (or publish, or push, actually) a message somewhere, and running another command to “get” (or consume, or pull, actually) that message. I believed I could implement it in an hour, until I started working with it in a real project at my company.
I don’t think I can provide a better explanation of Apache Kafka than its documentation and the many articles on the Internet. This post is mainly for those who were in the same situation as me, and my aim is to give you some information that can hopefully change your view a bit. Obviously, I strongly believe that the best teacher in this case is a real project with a lot of issues and improvement tasks related to Apache Kafka.
Let me start with a light introduction to Kafka in terms of its concepts and related components, before putting it in the context of a large-scale system and telling you about some issues that are challenging to solve. In case you are interested in how Kafka is deployed in real projects, you can check out another story of mine here:
Kafka is a pub-sub messaging system that allows messages to be “saved” by some applications and “gotten” by others. It is said to be fast, scalable, and fault-tolerant, but I won’t focus on those characteristics, as they are hard to appreciate until you face the related issues in a real system.
A Kafka system has three basic components: a broker, a producer, and a consumer. The broker is where the message is “saved” (or persisted, actually) and replicated. The producer pushes the message, whereas the consumer pulls it. Many concepts like ZooKeeper, Kafka topics, partitions, offsets, leaders, and replicas can be found in its documentation.
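To make the broker/producer/consumer roles concrete, here is a minimal in-memory sketch of the idea. All the class and topic names are invented for illustration; real Kafka persists each topic as partitioned, replicated logs across brokers, which this toy version completely ignores.

```python
from collections import defaultdict

class Broker:
    """Toy stand-in for a Kafka broker: one append-only log per topic."""

    def __init__(self):
        # the index of a message in a topic's log is its offset
        self.topics = defaultdict(list)

    def append(self, topic, message):
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1  # offset of the new message

    def read(self, topic, offset):
        # return everything from the given offset onwards
        return self.topics[topic][offset:]

class Producer:
    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        # "save" (push) a message to the broker
        return self.broker.append(topic, message)

class Consumer:
    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # the consumer tracks its own position in the log

    def poll(self):
        # "get" (pull) the messages it has not seen yet
        messages = self.broker.read(self.topic, self.offset)
        self.offset += len(messages)
        return messages

broker = Broker()
producer = Producer(broker)
consumer = Consumer(broker, "orders")

producer.send("orders", "order-1")
producer.send("orders", "order-2")
print(consumer.poll())  # ['order-1', 'order-2']
print(consumer.poll())  # [] -- nothing new since the last poll
```

Note how the consumer, not the broker, remembers the offset: messages stay in the log after being read, which is one of the things that separates Kafka from a classic message queue.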
You might also find other things like the Kafka Producer API, Consumer API, Streams API, and Connector API, which allow a Kafka cluster to interact with different applications. The following figure explains how the APIs are used.
How Kafka comes into the picture
There are many IT projects following a microservice architecture. And as a matter of fact, Kafka has been playing a vital role in enabling the microservices of a system to collaborate with each other.
Let’s look at a simple example (which is actually not so simple in reality) explaining how Kafka comes into the picture. When you click Buy on a website to buy a smartphone, the following things happen:
- An order request is sent to the order service
- The shipping service gets notified and looks up the address to ship to
- The shipping process starts
Straightforwardly, it might look like this:
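In code, this naive request-driven version might be sketched roughly as follows. The function names and the hard-coded address are invented; the point is only the chain of direct calls.

```python
def lookup_address(customer_id):
    # stand-in for a direct call to the customer service
    return {"customer_id": customer_id, "address": "221B Baker Street"}

def start_shipping(order_id, address):
    # stand-in for kicking off the shipping process
    return f"shipping order {order_id} to {address}"

def shipping_service(order):
    # the shipping service looks up the address, then ships
    addr = lookup_address(order["customer_id"])["address"]
    return start_shipping(order["order_id"], addr)

def order_service(order):
    # the order service must know about, and directly call,
    # the shipping service
    return shipping_service(order)

print(order_service({"order_id": 7, "customer_id": 1}))
# shipping order 7 to 221B Baker Street
```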
This flow can also be built with an event-driven approach:
If we look closely, the interaction between the orders service and the shipping service hasn’t changed all that much, other than that they communicate via events rather than calling each other directly. But here is the thing: the orders service has no idea that the shipping service even exists. I believe you can see the benefits of this approach now, especially in the context of large-scale, complex systems.
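The decoupling can be sketched in a few lines, with a plain dict of callbacks standing in for the Kafka topic (topic and service names are made up for the example):

```python
subscribers = {}  # topic -> list of handler callbacks

def subscribe(topic, handler):
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, event):
    # deliver the event to whoever subscribed; the publisher
    # does not know or care who that is
    for handler in subscribers.get(topic, []):
        handler(event)

shipped = []

def shipping_service(event):
    # reacts to the event; it is never called directly by orders
    shipped.append((event["order_id"], event["address"]))

subscribe("order-created", shipping_service)

def orders_service(order_id, address):
    # the orders service only publishes an event and is done
    publish("order-created", {"order_id": order_id, "address": address})

orders_service(42, "221B Baker Street")
print(shipped)  # [(42, '221B Baker Street')]
```

Compare this with the request-driven sketch: `orders_service` no longer imports or calls anything shipping-related.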
But if you are still not sure, then imagine that there is another service, say, a repricing service, which updates the price of goods in real time based on supply and demand. I will leave it to you to work out how to implement the repricing service with a request-driven approach. With an event-driven model, it is just another service plugging into the Kafka system, sending out price updates when necessary.
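Plugging in the repricing service might then look like the sketch below. The pricing rule (a 1% bump per order) and all names are invented; what matters is that the orders service is untouched, it still just publishes events.

```python
subscribers = {}

def subscribe(topic, handler):
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, event):
    for handler in subscribers.get(topic, []):
        handler(event)

demand = {}               # product -> number of recent orders
prices = {"phone": 500.0}

def repricing_service(event):
    # toy rule: bump the price a little every time demand grows
    product = event["product"]
    demand[product] = demand.get(product, 0) + 1
    prices[product] = round(prices[product] * 1.01, 2)

# adding the new service is just one more subscription --
# no existing service had to change
subscribe("order-created", repricing_service)

publish("order-created", {"product": "phone"})
publish("order-created", {"product": "phone"})
print(prices["phone"])  # 510.05
```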
You may have noticed that we left the query for the customer’s address as a direct call between the shipping service and the customer service. In fact, we can use Kafka to replicate customer data from the customer service to the local DB of the shipping service.
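The replication idea can be sketched as the shipping service materializing customer-update events into its own local store, so that at shipping time no call to the customer service is needed. Topic, field, and address values here are all invented:

```python
customer_addresses = {}  # the shipping service's local "DB"

def on_customer_updated(event):
    # upsert the latest address, keyed by customer id
    customer_addresses[event["customer_id"]] = event["address"]

# events the customer service would publish on a "customer-updated" topic
events = [
    {"customer_id": 1, "address": "1 Old Road"},
    {"customer_id": 2, "address": "9 High Street"},
    {"customer_id": 1, "address": "5 New Road"},  # customer 1 moved
]
for e in events:
    on_customer_updated(e)

# at shipping time: a local lookup, not a remote query
print(customer_addresses[1])  # 5 New Road
```

Because Kafka keeps the log of events, a new consumer (or a rebuilt local DB) can replay it from the beginning to arrive at the same state.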
The system I am working on
This is just another example of applying Kafka, this time from a networking perspective. It comes from the real project I am working on. I hope you won’t have the feeling of “hey, it is just an illustrative example, hard to convince me of the importance of the Kafka messaging bus” as you might have had with the example above.
I am not going into the details of each component in the above figure. Just FYI: in the production environment, where each network can contain up to 2000 network elements with different configurations from different vendors, even the simplest use case is already challenging: one collector collects data from one network for one application to use. Issues come every day, from our Git repository, the container platform where the local testing system is deployed, new workflows, and new requirements for TDD, to wrongly collected data (since some folks managing Network 1 are running experiments) and malfunctions in many intermediate services (which are not shown in the figure). And after a couple of years, we are still far from our ultimate goal, which is to allow any application to efficiently use collected data from, humbly, two networks. And yes, I am talking about only two networks, not three as illustrated, and not something like n networks, which you have probably seen in many academic articles.
When you face issues every day in a complex system, it is understandable to look for a third-party solution that can handle as many problems as possible (of course, it requires some management effort, but at least that is not as painful as the other issues). That’s how the Kafka messaging bus comes into the picture, alongside many other solutions.
That is all, folks. This story just gives you a bit of insight into how Kafka is adopted in real projects. But note that Apache Kafka alone does not make up the whole pub-sub system: different systems with different microservices involved may have different designs for their pub-sub.
- Ben Stopford, “Designing Event-Driven Systems: Concepts and Patterns for Streaming Services with Apache Kafka”