Image for post
Image for post

On micro-services architecture as well as on high availability (HA) infrastructures, there is a need to send and receive notifications. In this article, I’d like to cover what is Apache Kafka starting from typical use cases, what it is not and best Getting Started links I found.

Image for post
Image for post
Source: http://kafka.apache.org/documentation.html

A typical use case

  • Imagine you have a system with 2 servers A and B that need to communicate:
Image for post
Image for post

First, a basic REST API can do the job:

Image for post
Image for post

But what about if B becomes very slow to respond? Communication with REST API is synchronous. If B is slow, then A will be.

Image for post
Image for post

Hum, we can make the communication asynchronous with a pub/sub store like Redis:

Image for post
Image for post

But, what happens if B crashes? The message may be lost as only present/available subscribers receive messages when they arrive.

Image for post
Image for post

So maybe it is time to improve the availability (and the scalability) of B by adding a secondary B server:

Image for post
Image for post

But we have a problem, messages may be duplicated and may be treated twice. It may be not suitable on some cases. We may use a queue service instead of a pub/sub store:

Image for post
Image for post

We still have a problem if the message order matter (FIFO): you need to choose your queue store very carefully. For instance Amazon SQS does not guarantee FIFO. Beanstalkd seems to guarantee it. Thus, the complexity of B1 and B2 may increase if messages contain sequences.

So to summarize, here are the constraints on the message bus we have:

  • It should provide asynchronous communication (decouple processes)
  • It should be scalable to support the growth
  • It should be distributed to guarantee high availability (HA)
  • The FIFO (First In First Out) of messages guarantee or at least, a mechanism should exist to manage message order.
  • No messages should be lost on receivers down time (during a short period of time)
  • It should enable load balancing on receivers with 2 modes:
  1. All receivers receive all and same messages (mode pub/sub)
  2. Messages are distributed among receivers (mode queue)

Kafka respond to all this constraints. That why we assist today at the rise of Kafka, especially on micro-services architecture.

What is Kafka ?

Kafka is a distributed, partitioned, replicated commit log service.

Basic messaging terminology (Source):

Kafka maintains feeds of messages in categories called topics.

We’ll call processes that publish messages to a Kafka topic producers.

We’ll call processes that subscribe to topics and process the feed of published messages consumers.

Kafka is run as a cluster comprised of one or more servers each of which is called a broker.

In addition, a message has a key and a value.

Main concepts

  • A topic is divided in partitions.
  • The message order is only guarantee inside a partition
  • Consumer offsets are persisted by Kafka with a commit/auto-commit mechanism.
  • Consumers subscribes to topics
  • Consumers with different group-id receives all messages of the topics they subscribe. They consume the messages at their own speed.
  • Consumers sharing the same group-id will be assigned to one (or several) partition of the topics they subscribe. They only receive messages from their partitions. So a constraint appears here: the number of partitions in a topic gives the maximum number of parallel consumers.
  • The assignment of partitions to consumer can be automatic and performed by Kafka (through Zookeeper). If a consumer stops polling or is too slow, a process call “re-balancing” is performed and the partitions are re-assigned to other consumers.

Concept of partitions

Basically Kafka divides a topic in partitions. Each partition is an ordered, immutable sequence of messages that is continually appended to.

Image for post
Image for post
Source: http://kafka.apache.org/documentation.html

A message in a partition is identified by a sequence number called offset.

The FIFO is only guarantee inside a partition.

When a topic is created, the number of partitions should be given, for instance:

The producer can choose which partition will get the message or let Kafka decides for him based on a hash of the message key (recommended). So the message key is important and will be the used to ensure the message order.

Moreover, as the consumer will be assigned to one or several partition, the key will also “group” messages to a same consumer.

This feature is particularly important as it enables a lot of HA use cases where the message order is important.

Image for post
Image for post

What it is not ?

Kafka is not a true queue service or at least it doesn’t support fine grained control as in Amazon SQS:

  • No dead letter queues
  • No maximum number of delivery attempts

Kafka is also not like Redis:

  • The creation of a Kafka topic is an expensive process, which is not like creating a Redis channel
  • The number of topics (and partitions inside) may be limited.

Kafka is not designed for Windows. Even the development in JAVA with Kafka-client may have some restrictions on Windows. Some streaming features that includes usage of KTable use Rocksdb key/store as a native module via JNI. The native modules are only packaged for Linux and OSX.

Kafka is not really well available on NodeJS. There are 2 main modules:

And more promising but require Java installed:

Getting started

If you are new with Kafka, learn directly the 0.10.0 version. The API has completely changed since 0.8.2. The 0.10.0 version adds new streaming features (KStream, KTable). The 0.10.0 version has merge the concept of High Level Consumer and Simple Consumer.

Docker

The simplest way to start is to use docker with this docker-compose http://wurstmeister.github.io/kafka-docker/

Kafka requires Zookeeper and the docker-compose handles everything. You can even scale the number of nodes to simulate replication.

Kafka is packaged with a CLI tools (in $KAFKA_HOME/bin/) so you can try using the concept few seconds after running the docker.

JAVA

In your Java application, add the following MAVEN dependency:

Image for post
Image for post

If you want to use the streaming, in addition to previous line, add:

Image for post
Image for post

Links

Good links on Kafka in addition to the Apache web site.

Here is a list of companies that using Kafka

To go further with Kafka Streaming

Written by

Entrepreneur, creator of @screenpresso, Software architect at @Groupe_Renault. Passionate about tech, content, design, software & startups.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store