Published in Nerd For Tech

Kafka Connect: Quick Start

In this article, we’ll look at what Kafka Connect is, why it is needed, what its main components are, and how it helps integrate Kafka with other systems.

Note: new to Kafka? Check “Apache Kafka: Quick Start”.

Kafka Connect

Topics Covered:

  • Kafka Connect
  • Connector
  • Task
  • Worker
  • Converter
  • Transform

What is Kafka Connect?

Kafka Connect is an open-source component of Apache Kafka that provides a scalable and reliable way to stream data between Kafka and other data systems such as databases, file systems, key-value stores, and search indexes. It uses connectors to move large data sets into and out of Kafka.

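Kafka Connect uses producers and consumers as building blocks and provides higher-level functionality on top of them, so that moving data usually means configuring a connector rather than writing producer and consumer code by hand.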

Why is Kafka Connect needed?

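Apache Kafka is widely used in event-driven microservice architectures, CDC (Change Data Capture), and log aggregation. In all of these, data has to move between Kafka and external systems, and Kafka Connect handles this with two kinds of connectors: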

  • Source Connector:

It’s responsible for pulling data from the source system and publishing it to the Kafka cluster. Internally, a source connector uses the Kafka Producer API to do this.

  • Sink Connector:

It’s responsible for consuming data from the Kafka cluster and writing it to target systems. Internally, a sink connector uses the Kafka Consumer API to do this.
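To make the source/sink difference concrete, here is a small sketch that builds configurations for the FileStreamSource and FileStreamSink example connectors distributed with Apache Kafka and wraps each one in the {"name": ..., "config": {...}} JSON shape accepted by the Connect REST API. The connector names, file paths, and topic name are placeholders chosen for illustration. Note that the source connector writes to a single "topic", while the sink connector reads from "topics" (plural).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ConnectorConfigs {

    // Source connector: reads lines from a file and produces them to one topic.
    // File path and topic name are placeholders.
    static Map<String, String> fileSource() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "FileStreamSource");
        config.put("tasks.max", "1");
        config.put("file", "/tmp/input.txt");
        config.put("topic", "connect-demo");
        return config;
    }

    // Sink connector: consumes from one or more topics ("topics", plural)
    // and appends each record to a file.
    static Map<String, String> fileSink() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "FileStreamSink");
        config.put("tasks.max", "1");
        config.put("file", "/tmp/output.txt");
        config.put("topics", "connect-demo");
        return config;
    }

    // Wraps a config in the {"name": ..., "config": {...}} REST payload shape.
    // This naive builder assumes no value needs JSON escaping.
    static String toRestPayload(String name, Map<String, String> config) {
        String body = config.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": \"" + e.getValue() + "\"")
                .collect(Collectors.joining(", "));
        return "{\"name\": \"" + name + "\", \"config\": {" + body + "}}";
    }

    public static void main(String[] args) {
        System.out.println(toRestPayload("file-source", fileSource()));
        System.out.println(toRestPayload("file-sink", fileSink()));
    }
}
```

A payload like this could be sent with an HTTP POST to /connectors on a Connect worker’s REST endpoint (port 8083 by default); the worker then starts the connector and its tasks.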

Kafka Connect use cases:

  • Streaming pipelines from a source system to a target system.
  • Writing application data from Kafka into data stores.
  • Moving data from legacy applications to new systems through Kafka.

Kafka Connect Components:

Kafka Connect — deep dive

Connector:

Connectors define where data should be copied to and from. Each connector instance is a logical job that is responsible for managing the movement of data between Kafka and an external system.

Confluent offers many pre-built connectors that make it easy to integrate Kafka with common systems. If none of them fits our requirements, we can also write a new connector from scratch.

Task:

Tasks play a major role in Kafka Connect’s data model. Each connector instance has a set of tasks that actually copy the data. There are two types of tasks: source tasks and sink tasks.

A source task contains the code that reads data from the source system, and it uses a Kafka producer to push that data to a Kafka topic.

A sink task uses a Kafka consumer to poll data from a Kafka topic, and it contains the code that writes that data into the sink system.
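The division of labour between source and sink tasks can be sketched without a real cluster. In this minimal, self-contained model, a plain list stands in for a topic partition (list index as offset), and the hypothetical classes DemoSourceTask and DemoSinkTask only mirror the roles of poll() on a real SourceTask and put(...) on a real SinkTask. In actual Kafka Connect, the worker framework owns the producer, the consumer, and offset commits; the task code only reads from or writes to the external system.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TaskFlowDemo {

    // Runs source -> topic (producer side) -> sink (consumer side) until the
    // hypothetical source system has no more rows.
    static List<String> run(List<String> sourceRows, int maxBatch) {
        List<String> topic = new ArrayList<>();      // stands in for one topic partition
        DemoSourceTask source = new DemoSourceTask(sourceRows.iterator());
        DemoSinkTask sink = new DemoSinkTask();
        int committedOffset = 0;                     // next offset the sink has not seen

        List<String> batch;
        while (!(batch = source.poll(maxBatch)).isEmpty()) {
            topic.addAll(batch);                                  // like producer.send(...)
            List<String> fetched =
                    new ArrayList<>(topic.subList(committedOffset, topic.size())); // like consumer.poll(...)
            sink.put(fetched);
            committedOffset = topic.size();                       // like committing offsets
        }
        return sink.written();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("row-1", "row-2", "row-3"), 2));
    }
}

// Hypothetical source task: each poll returns up to maxBatch new rows.
class DemoSourceTask {
    private final Iterator<String> sourceRows;

    DemoSourceTask(Iterator<String> sourceRows) {
        this.sourceRows = sourceRows;
    }

    List<String> poll(int maxBatch) {
        List<String> batch = new ArrayList<>();
        while (sourceRows.hasNext() && batch.size() < maxBatch) {
            batch.add(sourceRows.next());
        }
        return batch;
    }
}

// Hypothetical sink task: put() writes consumed records to a "target table".
class DemoSinkTask {
    private final List<String> targetTable = new ArrayList<>();

    void put(List<String> records) {
        targetTable.addAll(records);
    }

    List<String> written() {
        return targetTable;
    }
}
```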

Kafka Connect has built-in support for scalable, parallel data copying: a single job can be broken into many tasks with minimal configuration (for example, by raising the connector’s tasks.max setting).
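How a job is broken into tasks is decided by the connector. In the real API, Connector.taskConfigs(int maxTasks) returns one List entry (a Map<String, String>) per task. The sketch below reproduces only that return shape for a hypothetical database connector that spreads a list of tables over at most maxTasks tasks; the "tables" key and the table names are assumptions for illustration. Kafka’s ConnectorUtils.groupPartitions helper does a similar contiguous grouping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaskSplitter {

    // Splits tables into at most maxTasks contiguous groups, one config map
    // per task. The "tables" key is a hypothetical connector setting.
    static List<Map<String, String>> taskConfigs(List<String> tables, int maxTasks) {
        if (tables.isEmpty() || maxTasks <= 0) {
            return List.of();
        }
        int numTasks = Math.min(tables.size(), maxTasks); // never create idle tasks
        int base = tables.size() / numTasks;              // every task gets at least this many
        int extra = tables.size() % numTasks;             // the first "extra" tasks get one more

        List<Map<String, String>> configs = new ArrayList<>();
        int start = 0;
        for (int task = 0; task < numTasks; task++) {
            int size = base + (task < extra ? 1 : 0);
            Map<String, String> config = new HashMap<>();
            config.put("tables", String.join(",", tables.subList(start, start + size)));
            configs.add(config);
            start += size;
        }
        return configs;
    }

    public static void main(String[] args) {
        List<String> tables = List.of("orders", "customers", "payments", "shipments", "invoices");
        taskConfigs(tables, 2).forEach(System.out::println);
    }
}
```

With five tables and tasks.max set to 2, the first task copies three tables and the second copies two; raising tasks.max to 5 or more would give each table its own task.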

Transforms:

Transforms are also known as SMTs (Single Message Transforms). A connector can be configured with transformations that make simple modifications to individual messages, and multiple transformations can be chained in the connector configuration, where they are applied in the order listed.

A transform is a function that accepts one record as input and returns a modified record. Kafka Connect ships with built-in transforms for simple, common cases, and we can also write our own by implementing the Transformation interface.
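The "one record in, one record out" idea can be sketched in plain Java. Here SimpleTransform and DemoRecord are hypothetical stand-ins modeled loosely on Transformation.apply(record) and on Connect records; the two steps are only loosely similar to the built-in InsertField and MaskField transforms, and the "****" mask value is an arbitrary choice. As in Kafka Connect, returning null from a transform drops the record, and the remaining steps are skipped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransformChainDemo {

    // Hypothetical stand-in for a Connect record: a topic plus a map-shaped value.
    static final class DemoRecord {
        final String topic;
        final Map<String, Object> value;

        DemoRecord(String topic, Map<String, Object> value) {
            this.topic = topic;
            this.value = value;
        }
    }

    // Hypothetical single-method interface: one record in, one record out,
    // or null to drop the record.
    interface SimpleTransform {
        DemoRecord apply(DemoRecord record);
    }

    // Adds a static field; the input record is left unchanged.
    static SimpleTransform insertField(String field, Object fieldValue) {
        return record -> {
            Map<String, Object> copy = new HashMap<>(record.value);
            copy.put(field, fieldValue);
            return new DemoRecord(record.topic, copy);
        };
    }

    // Replaces a field's value with a fixed mask if the field is present.
    static SimpleTransform maskField(String field) {
        return record -> {
            if (!record.value.containsKey(field)) {
                return record;
            }
            Map<String, Object> copy = new HashMap<>(record.value);
            copy.put(field, "****");
            return new DemoRecord(record.topic, copy);
        };
    }

    // Applies the transforms in configured order; a null result drops the record.
    static DemoRecord applyChain(List<SimpleTransform> chain, DemoRecord record) {
        DemoRecord current = record;
        for (SimpleTransform transform : chain) {
            current = transform.apply(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        DemoRecord in = new DemoRecord("customers", Map.of("id", 7, "ssn", "123-45-6789"));
        DemoRecord out = applyChain(List.of(maskField("ssn"), insertField("source", "crm")), in);
        System.out.println(out.value);
    }
}
```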

Converter:

Converters are responsible for serializing and deserializing data as it moves between Kafka Connect and Kafka.

  • When Kafka Connect acts as a source, the converter serializes the data received from the connector (or transform), and the serialized data is then pushed to the Kafka cluster.
  • When Kafka Connect acts as a sink, the converter deserializes the data read from the Kafka cluster and passes it to the transform (or connector).
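The two directions above can be sketched as a round trip. SimpleConverter below is a hypothetical, schema-less simplification: the real Converter interface has methods with the same names (fromConnectData and toConnectData), but they also carry a Schema and return a SchemaAndValue. The UTF-8 implementation behaves roughly like a string converter.

```java
import java.nio.charset.StandardCharsets;

class ConverterDemo {

    // Hypothetical simplification of the converter contract:
    // fromConnectData runs on the source path (data -> bytes for Kafka),
    // toConnectData runs on the sink path (bytes from Kafka -> data).
    interface SimpleConverter {
        byte[] fromConnectData(String topic, Object value);

        Object toConnectData(String topic, byte[] value);
    }

    // Encodes values as UTF-8 text; null stays null in both directions.
    static final class Utf8StringConverter implements SimpleConverter {
        @Override
        public byte[] fromConnectData(String topic, Object value) {
            return value == null ? null : value.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Object toConnectData(String topic, byte[] value) {
            return value == null ? null : new String(value, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        SimpleConverter converter = new Utf8StringConverter();
        byte[] bytes = converter.fromConnectData("demo-topic", "hello"); // source side
        Object back = converter.toConnectData("demo-topic", bytes);      // sink side
        System.out.println(bytes.length + " bytes -> " + back);
    }
}
```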

Common converters:

  • String: org.apache.kafka.connect.storage.StringConverter
  • JSON: org.apache.kafka.connect.json.JsonConverter
  • JSON Schema: io.confluent.connect.json.JsonSchemaConverter
  • Avro: io.confluent.connect.avro.AvroConverter
  • Protobuf: io.confluent.connect.protobuf.ProtobufConverter
  • ByteArray: org.apache.kafka.connect.converters.ByteArrayConverter
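Converters are selected through configuration rather than code. Keys and values are converted independently, using the key.converter and value.converter settings (set on the worker and optionally overridden per connector); converter-specific options are prefixed with the same name. The sketch below renders two common combinations in .properties style. The Schema Registry URL is a placeholder, and value.converter.schema.registry.url applies only to the Confluent converters that use Schema Registry.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ConverterConfigDemo {

    // String keys with plain JSON values.
    static Map<String, String> jsonWithoutSchemas() {
        Map<String, String> props = new TreeMap<>();
        props.put("key.converter", "org.apache.kafka.connect.storage.StringConverter");
        props.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        // JsonConverter wraps each value in a schema envelope unless this is false.
        props.put("value.converter.schemas.enable", "false");
        return props;
    }

    // String keys with Avro values; the registry URL is a placeholder.
    static Map<String, String> avro(String schemaRegistryUrl) {
        Map<String, String> props = new TreeMap<>();
        props.put("key.converter", "org.apache.kafka.connect.storage.StringConverter");
        props.put("value.converter", "io.confluent.connect.avro.AvroConverter");
        props.put("value.converter.schema.registry.url", schemaRegistryUrl);
        return props;
    }

    // Renders settings as key=value lines (sorted, since the maps are TreeMaps).
    static String render(Map<String, String> props) {
        return props.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(render(jsonWithoutSchemas()));
        System.out.println();
        System.out.println(render(avro("http://localhost:8081")));
    }
}
```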

Worker:

Connectors and tasks are logical units of work and must be scheduled to execute in a process. Kafka Connect calls these processes workers, and workers can run in one of two modes: standalone or distributed. We choose the mode based on our requirements.

  • Standalone workers:

As the name indicates, a standalone worker is a single process that runs all connectors and tasks itself. This makes it suitable for developing and testing Kafka Connect on a local machine, and for lightweight, single-agent environments (for example, sending server logs to Kafka).
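A standalone worker is driven by a worker properties file plus one properties file per connector, launched with something like bin/connect-standalone.sh worker.properties connector.properties. The sketch below builds a minimal set of worker settings; the broker address, offsets file, and plugin directory are local placeholders, and missingKeys is a hypothetical helper that checks only the keys this sketch treats as essential. The standalone-specific setting is offset.storage.file.filename: because there is no cluster, source offsets are kept in a local file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StandaloneWorkerConfig {

    // Minimal standalone worker settings; all values are local placeholders.
    static Map<String, String> standalone() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.converter", "org.apache.kafka.connect.storage.StringConverter");
        props.put("value.converter", "org.apache.kafka.connect.storage.StringConverter");
        // Source offsets live in a local file so the single process can
        // resume where it left off after a restart.
        props.put("offset.storage.file.filename", "/tmp/connect.offsets");
        props.put("plugin.path", "/opt/connect-plugins");
        return props;
    }

    // Hypothetical helper: reports which of the keys this sketch expects are absent.
    static List<String> missingKeys(Map<String, String> props) {
        return List.of("bootstrap.servers", "key.converter", "value.converter",
                        "offset.storage.file.filename").stream()
                .filter(key -> !props.containsKey(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> props = standalone();
        props.forEach((key, value) -> System.out.println(key + "=" + value));
        System.out.println("missing: " + missingKeys(props));
    }
}
```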

  • Distributed workers:

Distributed mode brings the following benefits:

  1. It is recommended for production environments because of its high availability and scalability: we can add or remove nodes as requirements change.
  2. Kafka Connect is more fault tolerant: if a node dies or leaves the cluster for any reason, Kafka Connect automatically redistributes that node’s workload to the other nodes in the cluster. When a new node joins, the work is rebalanced again across the cluster.
  3. Distributed mode runs Connect workers on multiple nodes, which together form a Connect cluster, and Kafka Connect distributes the running connectors and their tasks across that cluster.
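Workers in a distributed cluster share the same group.id and keep connector configs, offsets, and status in Kafka topics (config.storage.topic, offset.storage.topic, status.storage.topic), which is what lets any worker pick up another worker’s tasks. The redistribution described in point 2 can be illustrated with a deliberately simplified round-robin assignment; real Connect clusters use a more incremental rebalancing protocol that tries to move fewer tasks, and the worker and task names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RebalanceDemo {

    // Assigns tasks to workers round-robin. A simplified model only, not the
    // protocol Kafka Connect actually uses.
    static Map<String, List<String>> assign(List<String> workers, List<String> tasks) {
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String worker : workers) {
            assignment.put(worker, new ArrayList<>());
        }
        if (workers.isEmpty()) {
            return assignment; // nobody to run the tasks
        }
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(workers.get(i % workers.size())).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("jdbc-source-0", "jdbc-source-1", "s3-sink-0", "s3-sink-1");
        System.out.println(assign(List.of("worker-a", "worker-b", "worker-c"), tasks));
        // worker-b leaves the cluster: its work is spread over the remaining workers.
        System.out.println(assign(List.of("worker-a", "worker-c"), tasks));
    }
}
```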

Summary:

In this article, we went through what Kafka Connect offers and how its connectors help integrate external systems with Kafka.

We also looked at the main Kafka Connect components and what each one is responsible for when reading data from and writing data to a Kafka cluster.

Shashir

Middleware Chapter Lead & Java Developer — Java/Kotlin/Spring/Microservices/Kafka/Kubernetes(K8S) https://www.linkedin.com/in/shashi999