Introduction to Apache Kafka and some examples with Golang.

Kittapa S · Published in Touch Technologies · Jul 7, 2021 · 8 min read

What is Apache Kafka?

Apache Kafka is an open-source framework for storing, reading, and analyzing streaming data. Its advantages are its speed, scalability and durability; therefore, it is often used in real-time data streaming architectures.

Kafka stores data in topics: ordered collections of events that are stored durably. An event can be anything that happened that you want to record. Topics are written to disk and replicated across brokers, so a single failure cannot make the data disappear. Furthermore, topics can be retained for any duration, from a few hours to years, and the events they hold can be small or large.
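Retention is a per-topic setting. As a small sketch using the kafka-configs tool that ships with Kafka (kafka-configs.sh in a plain download; the topic name and broker address here are placeholders), you could keep a topic's events for seven days (604,800,000 ms) like this:

> kafka-configs --bootstrap-server localhost:9092 \
    --entity-type topics --entity-name transactions \
    --alter --add-config retention.ms=604800000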

How does Kafka work?

The Kafka system consists of servers and clients that communicate through Transmission Control Protocol (TCP). This protocol defines how to establish and maintain a network conversation. On a single TCP connection, the Kafka server processes requests in order and returns a response for each request.

Kafka runs as a cluster of one or more servers; the servers that form the storage layer are called brokers. If one of the servers crashes, the others take over its work, which prevents interrupted operations and data loss.

Kafka client libraries are available in many high-level languages such as Go, Python, and C/C++, and there are REST APIs as well.

To manage clusters, Kafka uses ZooKeeper. ZooKeeper performs health checks and notifies Kafka whenever a broker joins or dies, or a topic is added or removed. Consequently, Kafka cannot be used without installing ZooKeeper.
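For instance, with the zookeeper-shell tool that ships with Kafka (zookeeper-shell.sh in a plain download), you can list the broker IDs that ZooKeeper currently knows about; each live broker registers itself under the /brokers/ids path:

> zookeeper-shell localhost:2181 ls /brokers/ids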

Kafka’s main concepts

As mentioned earlier, events are records of anything that happened; they are also simply called records or messages. Kafka writes and reads data in the form of events. An event usually contains a key, a value, a timestamp, and optional metadata headers. For example, the key could be your name, the value might be your shopping bill, and the timestamp could be the time of your purchase.
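To make that anatomy concrete, here is a tiny illustrative Go sketch. The Event struct and its fields are my own invention for illustration, not a type from any Kafka client library:

package main

import (
	"fmt"
	"time"
)

// Event mirrors the anatomy of a Kafka event described above:
// a key, a value, a timestamp, and optional metadata headers.
type Event struct {
	Key       string            // e.g. your name
	Value     []byte            // e.g. your shopping bill
	Timestamp time.Time         // e.g. the time of your purchase
	Headers   map[string]string // optional metadata
}

func main() {
	e := Event{
		Key:       "alice",
		Value:     []byte(`{"total": 42.50, "currency": "THB"}`),
		Timestamp: time.Now(),
		Headers:   map[string]string{"source": "checkout-service"},
	}
	fmt.Printf("%s: %s at %s\n", e.Key, e.Value, e.Timestamp.Format(time.RFC3339))
}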

The clients that publish or write events to Kafka are called producers. On the other hand, the clients that subscribe to, read, or process events are known as consumers. Producers never have to wait for consumers: the two are fully decoupled, and published events stay available in the topic for consumers to read at their own pace.

We can compare topics to folders in a filesystem, with the events as the files in those folders. For instance, the topic name could be “transactions” and the events would be your purchases. Each topic can have zero or any number of producers and consumers writing to it and reading from it.

Furthermore, topics are divided into partitions, which can be located on different Kafka brokers. When a new event is published to an existing topic, it is appended to one of the topic’s partitions.

A Kafka cluster is composed of multiple brokers, and each topic can have a replication factor greater than one. If a broker goes down, another broker serves the data of its topics. This is the general idea behind why Kafka offers strong durability with no data loss.
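As an illustration, the kafka-topics tool that ships with Kafka (kafka-topics.sh in a plain download) lets you pick the partition count and replication factor when creating a topic; the topic name and broker address below are just placeholders:

> kafka-topics --bootstrap-server localhost:9092 \
    --create --topic transactions \
    --partitions 3 --replication-factor 1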

Why use Kafka?

If you are new to messaging systems, getting to know Apache Kafka will not be as difficult as you think, since Kafka is easy to set up, understand, and work with.

Kafka provides excellent performance and a strong feature set: stability, reliable durability, flexibility, both publish-subscribe and queue semantics, robust replication, consistency, and preserved message order.

One of Kafka’s most important characteristics is how fast it moves data around. This is largely because Kafka leans on the OS kernel, relying on the page cache and zero-copy transfers instead of doing the work in user space. Thus, Kafka is used for stream processing, website activity tracking, message replay, real-time analytics, and much more.

Kafka’s architecture in microservices

A Kafka-centric microservices architecture is one where microservices communicate with each other using Kafka as the medium. Kafka is useful here because it helps solve the problems that held older messaging queues back.

Nowadays, the trend is leaning toward a microservices architecture, writing small programs that can talk to each other, instead of a monolithic architecture. With Kafka, the programs talk through topics: each service can consume messages from Kafka topics and may produce to other Kafka topics for the next service to consume.

Kafka follows a publish-subscribe model for writing and reading records. It is an asynchronous communication method in which receivers choose which of the events the senders published they want to consume.

Additionally, Kafka has its own stream-processing API called Kafka Streams. It is a Java API that handles the framework and infrastructure for you, letting you group, filter, aggregate, and enrich your data. Moreover, you can use Kafka Connect for data integration.

Let’s get started with Kafka using Golang!

You can clone https://github.com/proudkittapa/kafkaExample to follow along.

You will need docker-compose for this example. First, I created a file called docker-compose.yml. There will be two services: Kafka and ZooKeeper.

version: '3'
services:
  zookeeper_medium:
    image: confluentinc/cp-zookeeper:6.1.0
    ports:
      - "2181:2181"
    restart: always
    environment:
      ZOOKEEPER_SERVER_ID: "1"
      ZOOKEEPER_CLIENT_PORT: "2181"
      ZOOKEEPER_TICK_TIME: "2000"

  kafka_medium:
    image: confluentinc/cp-kafka:6.1.0
    depends_on:
      - "zookeeper_medium"
    ports:
      - "9094:9094"
      - "9092:9092"
    restart: always
    environment:
      ALLOW_PLAINTEXT_LISTENER: "yes"
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper_medium:2181"
      # LISTENER_INTERNAL is used between containers; LISTENER_LOCAL is
      # published on localhost:9094 for the Go apps running on the host.
      KAFKA_ADVERTISED_LISTENERS: "LISTENER_INTERNAL://:9092,LISTENER_LOCAL://localhost:9094"
      KAFKA_LISTENERS: "LISTENER_INTERNAL://:9092,LISTENER_LOCAL://:9094"
      KAFKA_INTER_BROKER_LISTENER_NAME: "LISTENER_INTERNAL"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "LISTENER_INTERNAL:PLAINTEXT,LISTENER_LOCAL:PLAINTEXT"
      KAFKA_DELETE_TOPIC_ENABLE: "true"
      # The cp-kafka image maps KAFKA_* variables to broker settings; the
      # KAFKA_CFG_ prefix belongs to other images and would be ignored here.
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_NUM_NETWORK_THREADS: "8"
      KAFKA_NUM_IO_THREADS: "16"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: "1"

Next, bring the services up by running “docker-compose up -d” (detached) or “docker-compose up” in a terminal. Wait for a minute until Kafka is fully up.

> docker-compose up -d
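
As an optional sanity check, you can confirm that both containers are up before moving on:

> docker-compose ps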

Then, create a Go file that will act as the producer. I named the file “app1.go”. This file is supposed to create a new broker client and send a message to Kafka on the topic called “my-topic”.

package main

import (
	"context"

	"github.com/touchtechnologies-product/message-broker"
	"github.com/touchtechnologies-product/message-broker/common"
)

func main() {
	// Broker configuration: retry behaviour, the Kafka protocol version,
	// a consumer group name, and the broker address exposed by
	// docker-compose on the host (the LISTENER_LOCAL port 9094).
	conf := &common.Config{
		BackOffTime:  2,
		MaximumRetry: 3,
		Version:      "2.6.1",
		Group:        "my-group",
		Host:         []string{"localhost:9094"},
		Debug:        true,
	}

	broker, err := message.NewBroker(common.KafkaBrokerType, conf)
	if err != nil {
		panic(err)
	}

	// Start the broker in the background; the callback receives any
	// asynchronous errors (ignored here for brevity).
	go broker.Start(func(ctx context.Context, err error) {})

	// Publish one message to the "my-topic" topic.
	msg := []byte("Hi App2 from App1")
	if err = broker.SendTopicMessage("my-topic", msg); err != nil {
		panic(err)
	}
}

Run app1.go. You can use the following command.

> go run app1.go
After running the file, you will not see any message output, only the logs from the Kafka client. That is because you are just producing the message, not consuming it.

So, to see that the message actually got produced to Kafka, I will use an application called Conduktor. It is an all-in-one friendly interface for working with Apache Kafka.

You can use this link to download: https://www.conduktor.io/download/

After you are done installing Conduktor, follow the steps below to see whether the message was sent to Kafka on the “my-topic” topic.

1. Click on the “New Kafka Cluster” button.
2. On the page that appears, fill in the three boxes and press the save button.
3. The new cluster will be added at the top left. Double-click it.
4. On the page that opens, press the “Consumer” button.
5. Click on “my-topic”, then click “Start”.
6. You will see a page waiting for data.

Next, try running app1.go again.

Yay! Now you can see that the message was sent to Kafka on the topic “my-topic”.

Let’s write another file called “app2.go”. This file will read the message from “my-topic”; it will act as the consumer.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/touchtechnologies-product/message-broker"
	"github.com/touchtechnologies-product/message-broker/common"
)

func main() {
	// Same configuration as app1, including the same consumer group.
	conf := &common.Config{
		BackOffTime:  2,
		MaximumRetry: 3,
		Version:      "2.6.1",
		Group:        "my-group",
		Host:         []string{"localhost:9094"},
		Debug:        true,
	}

	broker, err := message.NewBroker(common.KafkaBrokerType, conf)
	if err != nil {
		panic(err)
	}

	// The handler runs for every message consumed from "my-topic";
	// here it simply prints the message body.
	handler := func(ctx context.Context, msg []byte) {
		fmt.Println(string(msg))
	}
	broker.RegisterHandler("my-topic", handler)

	// Start consuming in the background, then keep the process alive
	// long enough to receive messages.
	go broker.Start(func(ctx context.Context, err error) {})
	time.Sleep(10 * time.Second)
}

To see app2 consume the message from app1, you have to run app2 first, then run app1.

> go run app2.go
> go run app1.go
Look at the last line! P.S. You have to wait until app2’s second-to-last log line is printed before running app1!
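
If everything is wired up correctly, the last line app2 prints should be the message body that app1 produced:

Hi App2 from App1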

Conclusion

Unlike traditional messaging systems, Kafka provides you with a message-ordering guarantee: messages are sent by a producer to a topic partition, and consumers see them in the order they were appended to the log. Big companies like LinkedIn, Grab, Pinterest, LINE, Agoda, Twitter, and many more use Kafka. So there is no reason for you to overlook Kafka!

Lastly, I hope this article gives you a basic idea of how Kafka works and why and when you should use Apache Kafka. If I made any mistakes in the article, please let me know. Thank you for reading this far =).

Touch Technologies

“We may not be the most correct, but we show what we do.”

Here is the list of resources I used:

https://dzone.com/articles/what-is-kafka

https://www.indellient.com/blog/intro-to-apache-kafka-a-stream-processing-software-platform/

https://docs.confluent.io/platform/current/clients/index.html
