Apache Kafka Guide #50 Kafka Connect: Connectors, Configurations, Tasks, Workers

Paul Ravvich
Apache Kafka At the Gates of Mastery
3 min read · May 21, 2024

Hi, this is Paul, and welcome to part #50 of my Apache Kafka guide. Today we will discuss Kafka Connect and what Connectors, Configurations, Tasks, and Workers are.

Kafka Connect — High Level

We’ve already touched on the basic principles and some of the theory behind Kafka Connect. Now we’ll delve deeper into the details, starting with an overview. In summary, Kafka Connect employs source connectors to fetch data from common data sources and sink connectors to publish that data from Kafka into common data stores. We’ve previously discussed a list of these connectors. This greatly simplifies things for developers, including less experienced ones, enabling them to move data into and out of Kafka reliably. Kafka Connect is a natural part of your ETL (Extract, Transform, Load) pipeline, particularly for real-time workloads, and it scales easily: in distributed mode it can grow from a small pipeline to a company-wide one. Moreover, the code is reusable; the main difference between one connector deployment and another is merely the configuration, which is a significant advantage (a minimal configuration example follows the list below).

  • Source Connectors to get data from Common Data Sources
  • Sink Connectors to publish that data in Common Data Stores
  • Make it easy for non-experienced developers to quickly put their data reliably into Kafka
  • Part of your ETL pipeline
  • Scaling made easy from small pipelines to company-wide pipelines
  • Re-usable code!
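
To make the “reusable code, different configuration” point concrete, here is a minimal sketch of a source connector configuration in the properties format a standalone worker accepts. The FileStreamSourceConnector class ships with Apache Kafka; the connector name, file path, and topic below are placeholders made up for illustration.

    # Hypothetical example: stream new lines of a local file into a Kafka topic.
    # Only the configuration changes between deployments; the connector code stays the same.
    name=demo-file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=/tmp/app.log
    topic=demo-logs

Swapping this for, say, a Postgres source connector means changing connector.class and its settings, not writing new code.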

Kafka Connect — Concepts

Connectors

We’ll now discuss the concepts behind Kafka Connect. A Kafka Connect cluster includes multiple connectors. Each connector is a reusable code component, essentially a Java jar file: when you compile and package Java code, you produce jar files, so a connector is a Java class packaged into a jar (a rough sketch of such a class follows the list below). Many connectors are available in the open-source community, and it’s beneficial to use them rather than writing your own. For example, if you want to transfer data from Postgres to Kafka, you can search for “Kafka Connect Postgres Kafka” and find relevant resources.

Kafka Connect Cluster has multiple loaded Connectors

  • Each connector is a reusable piece of code (java jars)
  • Many connectors exist in the open-source world, leverage them!
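
To give a feel for what such a “Java class packaged into a jar” looks like, below is a rough, simplified sketch of a custom source connector built against the Kafka Connect API (org.apache.kafka.connect.source.SourceConnector). The class names are invented for illustration; a real connector would contain actual data-fetching logic.

    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.source.SourceConnector;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    // Hypothetical connector: compiled and packaged as a jar, then placed on the
    // Connect worker's plugin path so the cluster can load it.
    public class DemoSourceConnector extends SourceConnector {

        private Map<String, String> configProps;

        @Override
        public void start(Map<String, String> props) {
            this.configProps = props;          // the user-supplied configuration
        }

        @Override
        public Class<? extends Task> taskClass() {
            return DemoSourceTask.class;
        }

        @Override
        public List<Map<String, String>> taskConfigs(int maxTasks) {
            // Simplest case: one task that reuses the connector configuration.
            return List.of(configProps);
        }

        @Override
        public void stop() { }

        @Override
        public ConfigDef config() {
            return new ConfigDef();            // declare the expected settings here
        }

        @Override
        public String version() {
            return "0.0.1";
        }

        // Minimal task stub so the sketch is self-contained; a real task would
        // poll the external system and return SourceRecords.
        public static class DemoSourceTask extends SourceTask {
            @Override public String version() { return "0.0.1"; }
            @Override public void start(Map<String, String> props) { }
            @Override public List<SourceRecord> poll() { return List.of(); }
            @Override public void stop() { }
        }
    }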

Tasks

A connector combined with a user configuration produces tasks. Each task is linked to a connector configuration, and a single connector configuration may spawn multiple tasks. For instance, applying one configuration to a connector might result in two or three tasks that split the work between them (a runnable sketch of this fan-out follows the list below).

Connectors + User Configuration => Tasks

  • A task is linked to a connector configuration
  • A connector (job) configuration may spawn multiple tasks
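
To illustrate how one configuration can fan out into several tasks, here is a small, self-contained sketch (plain Java, not tied to the Connect API) of the kind of logic a connector typically puts in its taskConfigs(maxTasks) method: the work described by the user configuration, here a made-up “tables” list, is split into one configuration map per task.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Standalone illustration of the fan-out a connector performs in taskConfigs():
    // one user configuration in, several per-task configurations out.
    public class TaskConfigFanOut {

        static List<Map<String, String>> taskConfigs(Map<String, String> connectorConfig, int maxTasks) {
            List<String> tables = List.of(connectorConfig.get("tables").split(","));
            int taskCount = Math.min(maxTasks, tables.size());
            List<Map<String, String>> configs = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                Map<String, String> taskConfig = new HashMap<>(connectorConfig);
                // Round-robin assignment: each task gets its own slice of the tables.
                List<String> slice = new ArrayList<>();
                for (int t = i; t < tables.size(); t += taskCount) {
                    slice.add(tables.get(t));
                }
                taskConfig.put("tables", String.join(",", slice));
                configs.add(taskConfig);
            }
            return configs;
        }

        public static void main(String[] args) {
            Map<String, String> userConfig = Map.of("tables", "orders,customers,payments");
            // One configuration with tasks.max = 2 produces two task configs.
            taskConfigs(userConfig, 2).forEach(System.out::println);
        }
    }

Running it prints two task configurations, each responsible for its own subset of the tables.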

Workers

These tasks are executed by Kafka Connect workers, which are servers. Recall from a previous part that our example Kafka Connect cluster had three workers; these workers are responsible for executing the tasks. Each worker is a single Java process running on a server. Workers can run standalone or in a cluster, and we recommend the cluster configuration for optimal performance (the launch commands for both modes are shown after the list below).

Tasks are executed by Kafka Connect Workers (servers)

  • A worker is a single Java process
  • A worker can be standalone or in a cluster
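
For reference, both modes are started with scripts that ship with Apache Kafka. The worker properties files below are the samples included in the Kafka distribution; my-connector.properties is a placeholder for a connector configuration like the one shown earlier.

    # Standalone mode: one worker process, connector configs passed on the command line
    bin/connect-standalone.sh config/connect-standalone.properties my-connector.properties

    # Distributed mode: start one worker per server with the same group.id;
    # connectors are then submitted to the cluster over the REST API
    bin/connect-distributed.sh config/connect-distributed.properties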

Thank you for reading until the end.

Paul Ravvich
