7 (More) Free Awesome Apache Kafka Resources for Beginners — 2024

Vanessa Wang
Published in Confluent
7 min read · May 22, 2024

co-authored with Peter Moskovits

If you haven’t read it yet, be sure to check out our Top 7 Free Apache Kafka Learning Resources for Beginners in 2023 blog post.

In the second part of our series, we’ve compiled a list of resources that dive deeper into the world of Apache Kafka®, which we found helpful after building a foundation through the resources we shared in part one. This time, we’re focusing on material that takes us into essential and more advanced Kafka fundamentals. Understanding concepts like setting up schemas and using Kafka Connect will help you build durable and scalable data streaming applications, and it’s important to understand these early on, so that you set up your application for success. Later, we’ll check out ksqlDB and learn more concepts that are essential to building an application, including using the CLI and addressing common issues like maintaining the order of messages, preventing message duplication, and handling multiple event types in a topic.

As always, we found that the best way to learn was to watch the videos, then go through the hands-on exercises — we encourage you to do the same.

Seeing the keen interest in educational content on Kafka, we’ve also created a GitHub repo with an extensive list (60+ and counting) of curated Kafka learning resources. This collection encompasses a wide range of materials organized by and suited to different learning styles and skill levels. From in-depth guides and documentation to interactive courses and tutorials, all the way to articles and books, we’ve gathered resources to cater to a broad range of learners. If you have a resource that could benefit others in their understanding of Apache Kafka, we encourage you to contribute; information is in the repo on how to do so.

Ready to take your Kafka knowledge to the next level? Let’s get right to it!

The list of resources (at a glance):

  1. Schema Registry 101 (12 video modules)
  2. Kafka Connect 101 (15 video modules)
  3. ksqlDB 101 (29 video modules)
  4. What is the simplest way to write messages to and read messages from Kafka using CLI?
  5. How can I maintain the order of messages and prevent message duplication in a Kafka topic partition?
  6. How can I have multiple event types in a topic and maintain topic-name subject constraints?
  7. How do I get started building my first Kafka Streams application?

1. Schema Registry 101

YouTube Playlist (12) | Hands-on exercises with videos & written instructions (4)

Once you’ve learned how to build and deploy your first Kafka application, a natural next step is learning about the importance of schemas. Implementing schemas over your data is essential for any enduring event streaming system, particularly ones that share data between different microservices or teams. Schemas enforce the implied contract between applications that produce your data and downstream applications that consume your data. In this engaging course with Danica Fine, you can learn how to use schemas to establish this contract. The course includes several hands-on exercises that let you experience working with schemas.

Screenshot of a person presenting a hands-on exercise for Schema Registry in Apache Kafka
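To make the idea of a schema as a contract more concrete, here is a minimal sketch in Python. It is a toy, not the Schema Registry client: `ORDER_SCHEMA` follows Avro's record layout, but the `Order` record, its field names, and the `conforms` helper are hypothetical, and the check only covers top-level field names and a few primitive types.

```python
# Toy illustration of a schema acting as a producer/consumer contract.
# ORDER_SCHEMA uses Avro's record layout; the record and field names are made up.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

# Map a few Avro primitive type names to Python types (a simplification).
AVRO_TO_PYTHON = {"string": str, "double": float, "int": int, "boolean": bool}


def conforms(record: dict, schema: dict) -> bool:
    """Return True if record has exactly the schema's fields with matching types."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    if set(record) != set(fields):
        return False
    return all(isinstance(record[name], AVRO_TO_PYTHON[t]) for name, t in fields.items())
```

A producer that drops `amount` or sends `order_id` as a number would fail this check, which is the kind of breakage a shared schema is meant to catch before a downstream consumer sees it.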

2. Kafka Connect 101

YouTube Playlist (15) | Hands-on exercises with videos & written instructions (5)

According to Confluent’s documentation, Kafka Connect is a free, open-source component of Apache Kafka that serves as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems. Take this video class to learn about the architecture, deployment modes, and differences between fully managed and self-managed connectors. You can also learn how to run a Connect cluster in Docker and manage connector instances using the Confluent Connect API, the Confluent CLI, and the Kafka Connect REST API, so that you can focus more on application development and less on the tools.

A diagram showing Confluent Cloud with a Kafka cluster inside of it, connecting bidirectionally with a self-managed Connect cluster that has two workers, each with a connector. The worker on the left has source systems with an arrow pointing to its connector. The worker on the right has an arrow pointing from its connector to target systems. The self-managed Connect cluster has a bidirectional connection with a second Kafka cluster.

From Confluent’s documentation on Kafka Connect, which is also super helpful as you navigate this component.
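As a rough idea of what managing a connector through the Kafka Connect REST API looks like, the sketch below builds a `POST /connectors` request in Python. The worker address (`localhost:8083`), the connector name, the file path, and the topic are assumptions for illustration; the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: a self-managed Connect worker listening on localhost:8083.
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical connector: streams lines of a local file into a topic.
request = build_create_connector_request(
    "demo-file-source",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo.txt",
        "topic": "demo-topic",
    },
)
# To send it against a running worker: urllib.request.urlopen(request)
```

The course covers the full set of endpoints and the fully managed alternatives; this only shows the general shape of a connector definition as JSON.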

3. ksqlDB 101

YouTube Playlist (29) | Hands-on exercises with videos & written instructions (11)

ksqlDB has emerged as part of the Apache Kafka ecosystem as the best way to process streams of events. This is a great course to take after you’ve learned Kafka Connect, because ksqlDB integrates with Kafka Connect, which lets you manage connectors from your ksqlDB applications using the same SQL syntax. If you’ve taken Apache Kafka 101 (which we suggested in part 1 of this blog series), you were introduced to ksqlDB in the Apache Kafka 101 > ksqlDB video and the accompanying ksqlDB hands-on exercise. ksqlDB 101 gives a complete overview of ksqlDB, how it works, and how to use it, and in our view it is hands down the best comprehensive resource on ksqlDB. Topics covered in this course include how to filter and transform data streams, how to query tables and streams with pull and push queries, and how to understand the difference between a stream and a table. You can also learn common utility functions like flattening out nested records, changing serialization formats, and merging and splitting streams.

Try this out yourself by following the step-by-step exercises.
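If the difference between a stream and a table is new to you, this toy model in plain Python may help before you start the exercises. It is not ksqlDB code, and the keys and values are made up: a stream keeps every event, while a table keeps only the latest value per key.

```python
# Toy model of the stream/table distinction; this is plain Python, not ksqlDB.
# Each event is a (key, value) pair; the keys and values are made up.
events = [
    ("alice", "Paris"),
    ("bob", "Berlin"),
    ("alice", "Rome"),
]

# A stream keeps every event in arrival order (append-only).
stream = list(events)


def materialize_table(stream_events):
    """Fold a stream into a table: the latest value seen for each key."""
    table = {}
    for key, value in stream_events:
        table[key] = value
    return table


table = materialize_table(stream)
# Filtering a stream yields another stream, similar in spirit to a WHERE clause.
alice_stream = [(k, v) for k, v in stream if k == "alice"]
```

Here the stream still holds both of alice's locations, while the table only reports the most recent one.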

4. What is the simplest way to write messages to and read messages from Kafka using CLI?

Tutorial — Hosted Cloud Environment | Tutorial — Basic Kafka

Building on your foundational knowledge of Kafka Connect, ksqlDB, and the critical role of the Schema Registry, our next tutorial is a logical and essential progression. This tutorial is specifically designed for you if you’re looking to deepen your understanding of Kafka without the complexity of writing any code. By focusing on using the command line interface (CLI), you will gain practical skills in producing and consuming messages. This skill also comes in handy when troubleshooting, performing quick data manipulation, and understanding the basics of real-time message flow.

screenshot of a producer with sample code “confluent kafka topic produce orders --parse-key” and a consumer with sample code “confluent kafka topic consume orders --print-key --delimiter "-" --from-beginning”.
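The key-related flags in the screenshot are easier to reason about with a small model of what they do to each line of text. The Python sketch below is only an illustration, not the CLI's implementation: both helper functions are hypothetical, and the default delimiters and the split-at-first-delimiter behavior are assumptions.

```python
def parse_keyed_line(line: str, delimiter: str = ":"):
    """Split one line of producer input into (key, value) at the first delimiter."""
    key, sep, value = line.partition(delimiter)
    if not sep:
        raise ValueError(f"no {delimiter!r} delimiter in {line!r}")
    return key, value


def format_keyed_record(key: str, value: str, delimiter: str = "-") -> str:
    """Render a consumed record the way a key-printing consumer might."""
    return f"{key}{delimiter}{value}"
```

For example, `parse_keyed_line('1:{"id": 1}')` separates the key `1` from the JSON value, and `format_keyed_record("1", "created")` joins them back with the consumer-side delimiter.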

5. How can I maintain the order of messages and prevent message duplication in a Kafka topic partition?

Tutorial — Basic Kafka

Diving deeper into the intricacies of Apache Kafka, this tutorial addresses a key concern in managing data streams: ensuring message order and avoiding duplication. If data integrity and sequencing are critical to your application, this tutorial is for you. When you enable idempotency in your Kafka producer, a unique ID is assigned to each producer behind the scenes and a sequence number is tagged onto each message, so the broker can detect and discard duplicates caused by retries. Understanding and implementing this technique is crucial for maintaining the integrity and reliability of your streams. This tutorial is a great resource for anyone looking to enhance their Kafka expertise, particularly in scenarios where data consistency and order are critical.

Screenshot that says Short Answer. “Set the ProducerConfig configuration examples relevant to the idempotent producer: enable.idempotence=true acks=all”
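To see why a producer ID plus a per-message sequence number is enough to prevent duplicates, consider this toy model in Python. The two settings come from the tutorial's short answer; `ToyPartitionLog` and the sample values are hypothetical and greatly simplify what a real broker does.

```python
# Settings named in the tutorial's short answer.
producer_config = {"enable.idempotence": "true", "acks": "all"}


class ToyPartitionLog:
    """Toy partition that drops retried messages; not Kafka's real broker logic."""

    def __init__(self):
        self.records = []
        self._last_sequence = {}  # producer ID -> highest sequence number appended

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        """Append unless this producer already wrote this sequence number."""
        if sequence <= self._last_sequence.get(producer_id, -1):
            return False  # duplicate caused by a retry; ignore it
        self.records.append(value)
        self._last_sequence[producer_id] = sequence
        return True


log = ToyPartitionLog()
log.append(producer_id=7, sequence=0, value="order-created")
log.append(producer_id=7, sequence=1, value="order-paid")
# The producer never saw the ack for sequence 1, so it retries the same message.
retry_accepted = log.append(producer_id=7, sequence=1, value="order-paid")
```

The retry is rejected, so the log holds each message once and in the order it was first sent.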

6. How can I have multiple event types in a topic and maintain topic-name subject constraints?

Tutorial — Hosted Cloud Environment | Tutorial — Basic Kafka

You have distinct but related event types and you want to produce them to the same topic, but you also want to maintain topic-name subject constraints. Why produce different events to the same topic? One reason is that you have low-traffic topics and you’d like to consolidate them to reduce overhead for the brokers. Another is that you need the exact order of different events; by producing them to the same topic, you are guaranteed correct ordering per partition.

To produce multiple event types while keeping topic-name subject constraints, you’ll need to use schema references: a schema that contains a field whose type is a reference to another schema. Follow along with this tutorial to learn about using schema references with both Protobuf and Avro.

Screenshot from the tutorial showing an event application
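For Avro, one way to picture a schema reference is a top-level union that names the event schemas instead of embedding them. The Python sketch below builds a registration request body of that shape; the schema names, subjects, and versions are assumptions for illustration, and the tutorial remains the authority on the exact steps.

```python
import json

# Hypothetical fully qualified names of two event schemas registered separately.
PURCHASE = "com.example.Purchase"
PAGE_VIEW = "com.example.PageView"

# Top-level Avro schema: a union, so a record may be either event type.
all_events_schema = [PURCHASE, PAGE_VIEW]

# Request body for registering the union under the topic's value subject.
# Subject names and versions below are assumptions for illustration.
registration_payload = {
    "schemaType": "AVRO",
    "schema": json.dumps(all_events_schema),
    "references": [
        {"name": PURCHASE, "subject": "purchase", "version": 1},
        {"name": PAGE_VIEW, "subject": "pageview", "version": 1},
    ],
}

print(json.dumps(registration_payload, indent=2))
```

Because the topic's subject holds only the union, the topic-name constraint stays intact while each event type keeps its own independently versioned schema.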

7. How do I get started building my first Kafka Streams application?

Tutorial — Hosted Cloud Environment | Tutorial — Basic Kafka

Embarking on your journey with Kafka Streams can be an exciting yet daunting task. This tutorial is crafted to set you on the right path. It is ideal for beginners, providing a straightforward, step-by-step guide to building a basic stream processing application. By the end, you will have not only developed your first Kafka Streams application but also gained the skills needed to delve even deeper into real-time data processing applications.

You’d like to get started with Kafka Streams, but you’re not sure where to start. In this tutorial, you’ll build a small stream processing application and produce some sample data to test it. After you complete this tutorial, you can go more in-depth in the Kafka Streams 101 course.

screenshot of a stream processing topology defining business logic from the tutorial
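A stream processing topology is essentially a chain of steps applied to each record as it arrives. The sketch below mimics that shape in plain Python; it is not the Kafka Streams API (the tutorial uses Java), and the filter rule and sample records are made up.

```python
# Toy topology in plain Python; the real tutorial uses the Kafka Streams Java API.
def build_topology(records):
    """Read (key, value) records, keep the matching ones, transform and emit them."""
    for key, value in records:          # source: read from an input topic
        if value.startswith("order"):   # filter: hypothetical business rule
            yield key, value.upper()    # transform the value, then emit downstream


input_topic = [("1", "order-created"), ("2", "heartbeat"), ("3", "order-shipped")]
output_topic = list(build_topology(input_topic))
```

In a real application, the source and sink would be Kafka topics and the processing would run continuously rather than over a fixed list.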

We hope these resources help you continue your journey with Apache Kafka. Don’t forget to check out Confluent’s community Slack and forum to get help if you run into any issues while learning and building, as well as to meet others who are learning Kafka.

As always, your feedback is encouraged. Did you find these resources helpful? Are there other topics you’d like to see included? Let us know!

The views expressed in this article are those of the author and do not necessarily reflect the position of Confluent.

Vanessa Wang
Confluent

Developer Education at Confluent. Author. Cat caretaker.