Apache Kafka Guide #49 Kafka Connect Architecture Design

Paul Ravvich
Apache Kafka At the Gates of Mastery
4 min read · May 16, 2024

Hi, this is Paul, and welcome to part #49 of my Apache Kafka guide. Today we will discuss Kafka Connect architecture design.

We’ve been discussing Kafka Connect Source and Kafka Connect Sink. You might be wondering how these elements fit into your architecture and how they interact with your sources and your Kafka cluster. Before diving into the hands-on exercises, it’s crucial to visualize this setup.

To aid in understanding, I’ve prepared a slide that outlines the architecture of Kafka Connect. Here’s how it looks:

Imagine your Kafka cluster, which consists of multiple brokers. In this example, there are four brokers. Now, consider your source or multiple sources, which could be a database, MongoDB, Twitter, or any other data source you’re using.

Between your source(s) and your Kafka cluster lies your Connect cluster. We will explore what a Connect cluster comprises, but essentially it includes several workers. Each worker runs a connector with a configuration and pulls data from your sources based on logic either provided by an existing connector or, if you build a custom connector, developed by you. The workers then push this data to your Kafka cluster in a subsequent step. This completes the data integration into Kafka.
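To make this concrete, here is what a source connector configuration can look like, using the FileStreamSource connector that ships with Kafka as a simple example (the connector name, file path, and topic are illustrative placeholders):

```json
{
  "name": "file-source-demo",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "demo-topic"
  }
}
```

The workers read each line of the file and push it into the `demo-topic` Kafka topic; a connector for MongoDB, Twitter, or a database follows the same pattern with its own `connector.class` and settings.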

You have your Connect cluster set up to retrieve data from various sources and then write it to your Kafka cluster.

In the third step, as we have discussed previously, you might want to modify the data. This includes performing transformations, aggregations, and joins, among other operations.

For this purpose, you use a Streams application or the Streams API.

Although this course does not cover it in detail, it is important to know that this is the role of the Streams API in the third step: it facilitates the modification and handling of data entering and leaving Kafka.
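As a rough illustration of this third step, a Kafka Streams application reads from one topic, transforms the records, and writes the results back to another topic. This is only a sketch, assuming the `kafka-streams` dependency is on the classpath and using illustrative topic names and broker addresses:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class TransformDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transform-demo");    // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // your brokers
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from the topic the Connect source wrote to, transform, write back to Kafka
        KStream<String, String> source = builder.stream("demo-topic");
        source.mapValues(value -> value.toUpperCase()) // stand-in for your real transformation
              .to("demo-topic-transformed");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The same role can be filled by Spark or another stream-processing technology; the key point is that data both enters and leaves through Kafka topics.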

In the final phase, you’ll want to synchronize your data with your sinks. These sinks can be various types of storage solutions such as databases, Elasticsearch, S3, HDFS, or any other system you choose.

At this point, you might be wondering how to transfer data from Kafka to these sinks. Fortunately, the process is straightforward: configure your Connect cluster, select the appropriate connector, and set up its configuration. The system will then automatically pull data from Kafka and transfer it to your chosen sink.
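A sink connector configuration mirrors the source side. Again using the bundled FileStreamSink connector as a simple stand-in for Elasticsearch, S3, or JDBC sinks (names and paths are placeholders):

```json
{
  "name": "file-sink-demo",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "topics": "demo-topic",
    "file": "/tmp/output.txt"
  }
}
```

Note that sink connectors take a `topics` list to consume from, where source connectors specify the `topic` they write to.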

This method describes the typical architecture of Kafka Connect and Kafka Streams.

Summary

Okay, so just to summarize, data originates from your source and is processed through your Connect cluster, which consists of workers. These workers push the data into a Kafka cluster.

The transformation step can use a variety of stream-processing technologies: Kafka Streams, Spark, or anything else you choose. These read data from Kafka, transform it, and write the results back to Kafka.

Then, once again, your Connect cluster retrieves the data from Kafka and writes it to the configured sinks.

We will explore how to set up a Connect cluster and configure connectors. We’ll learn how to extract data directly from sources into Kafka and, conversely, how to retrieve data from Kafka and direct it into the sinks of your choice, such as Elasticsearch or a JDBC-compatible database.
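As a preview of that setup, workers in a distributed Connect cluster are configured with a properties file along these lines (hostnames, the group id, and topic names are placeholders):

```properties
# Kafka brokers the Connect workers talk to
bootstrap.servers=broker1:9092,broker2:9092
# Workers sharing the same group.id form one Connect cluster
group.id=connect-cluster-demo
# How records are (de)serialized moving in and out of Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Internal topics where the cluster stores connector configs, offsets, and status
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
```

Connectors are then submitted to any worker over the Connect REST API, and the cluster distributes the resulting tasks across workers.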

Thank you for reading until the end.

Paul Ravvich
Apache Kafka At the Gates of Mastery

Software Engineer with over 10 years of XP. Join me for tips on Programming, System Design, and productivity in tech! New articles every Tuesday and Thursday!