Apache Kafka Guide #48 Kafka Connect Introduction

Paul Ravvich
Apache Kafka At the Gates of Mastery
May 14, 2024

Hi, this is Paul, and welcome to the #48 part of my Apache Kafka guide. Today we will discuss what Kafka Connect is.

Introduction to Kafka Connect

In this lecture, we will explore Kafka Connect, including its historical context and its significance.

Brief History of Kafka Connect

Kafka Connect was introduced partway through Kafka’s development timeline. In 2013, Kafka version 0.8 was released, introducing several new features: topic replication, log compaction, and a significantly simplified producer client API.

Evolution Through Versions

Subsequently, Kafka version 0.9 was released in November 2015. This version brought a simplified, high-level consumer API that operated without a Zookeeper dependency, and it added security features such as encryption and authentication. This release also marked the debut of the first version of the Kafka Connect APIs.

Further Developments

By May 2016, Kafka had progressed to version 0.10, a release notable for introducing the Kafka Streams API. The subsequent 0.10.1 and 0.10.2 updates further improved the Kafka Connect APIs and introduced the Single Message Transforms API.

Impact and Improvements

As of this writing, Kafka Connect has been available for over a year and a half since its introduction. Throughout this period it has seen continuous improvement, and these enhancements have enabled many companies and developers to build robust applications on top of the capabilities Kafka Connect offers.

2013 Kafka 0.8.x:

  • Topic replication, Log compaction
  • Simplified producer client API

November 2015 Kafka 0.9.x:

  • Simplified high-level consumer APIs, without Zookeeper dependency
  • Added security (Encryption and Authentication)
  • Kafka Connect APIs

May 2016: Kafka 0.10.0:

  • Kafka Streams APIs

End 2016 to March 2017, Kafka 0.10.1, 0.10.2:

  • Improved Connect API, Single Message Transforms API

We’re going to connect to Wikimedia’s streams, so it’s essential to understand how Kafka is used in this context.

Primarily, Kafka is used for various common scenarios. If you have a source, you can push data to Kafka using the Producer API. For managing data within Kafka, if you want to create a new topic from an existing one, you use both the Consumer and Producer APIs. If your goal is to transfer data from a Kafka topic to a target store, the Consumer API comes into play. Similarly, if you want your application to consume data from Kafka, you would also use the Consumer API.

Now, where does Kafka Connect fit into this?

This is where the Kafka Connect Source API comes in: it makes it easy to take an external source and pull all of its data into Kafka. Kafka Streams handles transformations from one Kafka topic to another. And, crucially, the Kafka Connect Sink API makes it easy to push data out of Kafka to various destinations.
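
To make the Source API concrete, here is a minimal sketch of a source connector configuration, based on the FileStreamSource connector that ships with Kafka’s quickstart. It streams lines from a file into a topic with no code at all; the file path and topic name below are placeholder values.

```properties
# connect-file-source.properties -- a minimal source connector sketch.
# "file" and "topic" are placeholder values for illustration.
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/input.txt
topic=connect-test
```

A worker started in standalone mode with this file will tail `/tmp/input.txt` and publish each new line as a message on the `connect-test` topic.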

For the remaining use case, an application consuming data directly from Kafka, the plain Consumer API has proven highly effective, so we continue to use it in that manner.

Kafka Connect plays a vital role by simplifying and enhancing the process of transferring data into and out of Kafka. We’ll explore numerous connectors that facilitate this integration, which I find particularly useful.

Kafka Connect

Kafka Connect solves problems for programmers like you and me. It lets us import data from the same sources we have always wanted to access: databases via JDBC, Couchbase, GoldenGate, Blockchain, Cassandra, DynamoDB, FTP, IoT, MongoDB, and many others, including Salesforce, SQS, Solr, Twitter, and more. The set of technologies hosting your source data is fairly limited; it isn’t infinite, and the options can typically be grouped.

Similarly, we tend to store data in the same repositories: S3, Elasticsearch, HDFS (Hadoop clusters), JDBC databases, SAP HANA, HBase, Cassandra, DynamoDB, and so forth. These storage options illustrate the diverse needs Kafka Connect caters to.

Kafka Connect has shown that achieving fault tolerance, proper distribution, and ordering can be quite challenging. However, many programmers have already created effective solutions for these issues, developing sources, sinks, producers, and consumers. Kafka Connect essentially provides a set of connectors. All the connectors I mentioned in this slide are existing tools that allow you, for example, to transfer data directly from your database to Kafka, and then from Kafka to Elasticsearch.

The essence of Kafka Connect is not to rewrite code that has already been written by others. Instead, it encourages you to use an existing connector and simply bring your configuration. This approach highlights the strength and utility of Kafka Connect.
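
As an illustration of “bring your configuration”: with an existing connector such as Confluent’s JDBC source connector installed, registering it is a matter of posting a JSON configuration to the Connect REST API. This is a sketch; the connection URL, table, and topic prefix below are hypothetical values.

```json
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "pg-"
  }
}
```

Posting this body to a Connect worker’s `/connectors` endpoint is all it takes: no producer code is written, and the connector begins copying new rows from the `orders` table into the `pg-orders` topic.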

Thank you for reading until the end.


Software Engineer with over 10 years of XP. Join me for tips on Programming, System Design, and productivity in tech! New articles every Tuesday and Thursday!