Learning Kafka for Free on Confluent Cloud: An Introductory Guide

Pradosh K · DataUniverse · Jul 3, 2024
https://docs.confluent.io/_images/kafka-intro.png

This is a tutorial on getting started with Kafka on Confluent Cloud.

Before we dive into the tutorial: what is your understanding of Kafka?

Apache Kafka is a distributed streaming platform developed by the Apache Software Foundation. It is designed for building real-time data pipelines and streaming applications. Kafka’s architecture is based on a distributed commit log, which enables it to handle high-throughput, fault-tolerant, and low-latency data streams.

Key components of Kafka include:

  1. Producers: These are the entities that publish data to Kafka topics.
  2. Consumers: These are the entities that subscribe to topics and process the data.
  3. Topics: These are categories or feed names to which records are sent by producers and from which records are received by consumers.
  4. Partitions: Each topic is split into partitions, allowing Kafka to parallelize data processing and scale horizontally.
  5. Brokers: These are the servers that form the Kafka cluster, each responsible for handling data and requests.
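As a quick illustration of how keys relate to partitions, here is a tiny sketch of my own (real clients use their own partitioners, for example murmur2 in the Java client, but the idea is the same): records with the same key always land on the same partition, which preserves their relative order.

from hashlib import md5

def pick_partition(key: str, num_partitions: int) -> int:
    # Hash the key and map it onto one of the topic's partitions
    return int(md5(key.encode("utf-8")).hexdigest(), 16) % num_partitions

# A topic with 6 partitions: "order-18" is routed to the same partition both times
for key in ["order-18", "order-42", "order-18"]:
    print(key, "-> partition", pick_partition(key, 6))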

In this tutorial, we will focus primarily on doing things hands-on. We will use Confluent Cloud so we don't have to run our own Kafka cluster and can focus on client development. We will also build Python client applications that produce and consume messages from an Apache Kafka cluster.

What is Confluent? How is it different from Kafka?

Confluent Cloud is the industry’s only fully managed, cloud-native data streaming platform powered by Apache Kafka and Apache Flink. Apache Kafka is an open source, distributed streaming platform that enables 100,000+ organizations globally to build event-driven applications at scale.

Confluent Kafka is a commercial distribution of Apache Kafka that includes additional enterprise capabilities such as multi-datacenter replication, schema management, and security enhancements. That said, building real-time data pipelines and streaming applications can also be done with open-source Apache Kafka alone.

Let’s set the objective

  • Create an account in Confluent
  • Create a Kafka cluster
  • Create a topic and produce a message
  • Generate API keys so external applications can connect to the cluster
  • Create a source connector to produce messages
  • Produce and consume messages programmatically

Let’s sign up at https://www.confluent.io/get-started/

Fill in these details to create an account on Confluent. Choose:

  • Cloud service provider of your choice: Azure, AWS, or GCP
  • Region name
  • Kafka Cluster Name

Filling out payment information is optional. If you want to avoid service disruption and plan to do some serious work on this cluster, you may choose to add payment details; otherwise, skip this step.

Voilà, Kafka is ready to serve you.

After signing up, you get $400 of free credit to learn and practice.

Copy the bootstrap server address; you will need it while configuring your producer and consumer.

You can find it under Cluster Settings.

Keep this information with you:

Endpoints
Bootstrap server
pkc-p11ssdsxxxm.us-east-1.aws.confluent.cloud:9092

REST endpoint
https://pkc-p11ssxxxm.us-east-1.aws.confluent.cloud:443

Now we are ready to create a topic.

Create Topic

Provide a name for your topic, the number of partitions, and the retention period.
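If you prefer doing this from code instead of the UI, here is a minimal sketch using the confluent-kafka AdminClient (the topic name, partition count, retention, and credentials are placeholders; the API key is the one you will create later in this tutorial):

from confluent_kafka.admin import AdminClient, NewTopic

# Same connection settings you will later use for producers and consumers
admin = AdminClient({
    'bootstrap.servers': '<your-bootstrap-server>:9092',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': '<api-key>',
    'sasl.password': '<api-secret>',
})

# 6 partitions, 7-day retention (retention.ms is in milliseconds)
topic = NewTopic('sales', num_partitions=6,
                 config={'retention.ms': str(7 * 24 * 60 * 60 * 1000)})

# create_topics() returns a dict of futures; result() raises if creation failed
for name, future in admin.create_topics([topic]).items():
    future.result()
    print(f"Created topic {name}")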

Now it’s time to produce a message.

Produce Message

Hit Produce new message

Here’s the message that was created:

[
  {
    "partition": 4,
    "offset": 0,
    "timestamp": 1720032949818,
    "timestampType": "CREATE_TIME",
    "key": 18,
    "value": {
      "ordertime": 1497014222380,
      "orderid": 18,
      "itemid": "Item_184",
      "address": {
        "city": "Bengaluru",
        "state": "Karnataka",
        "zipcode": 560067
      }
    },
    "headers": [],
    "exceededFields": null
  }
]
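The same kind of record can also be produced from code. Here is a minimal sketch (assuming a topic named sales and the API key you will create in the next step) that serialises the order as JSON and uses the order id as the message key:

import json
from confluent_kafka import Producer

# Placeholder connection details -- replace with your own cluster and API key
producer = Producer({
    'bootstrap.servers': '<your-bootstrap-server>:9092',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': '<api-key>',
    'sasl.password': '<api-secret>',
})

order = {
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Bengaluru", "state": "Karnataka", "zipcode": 560067},
}

# Keys and values travel as bytes; json.dumps() turns the dict into a JSON string
producer.produce('sales', key=str(order["orderid"]), value=json.dumps(order))
producer.flush()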

Create API keys

This is the credential that a producer or consumer needs in order to access your Kafka cluster.

Download the key after creating it.
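Rather than hardcoding the downloaded key and secret in your scripts, you can keep them in environment variables. A small sketch (the variable names below are my own choice, not a Confluent convention):

import os

# Build the shared client configuration from environment variables
# (hypothetical variable names -- set them however you prefer)
config = {
    'bootstrap.servers': os.environ['CONFLUENT_BOOTSTRAP_SERVERS'],
    'sasl.username': os.environ['CONFLUENT_API_KEY'],
    'sasl.password': os.environ['CONFLUENT_API_SECRET'],
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
}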

Produce Sample Data Using a Connector

Create sample data using the Datagen Source connector.

Topic Selection

Kafka Credential

Now select the topic to which we want to send the data.

For the Datagen Source connector to write to the topic, it needs to authenticate itself using the API keys created earlier.

Set Configuration

Sizing

Review and Launch
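For reference, the configuration you end up reviewing for the fully managed Datagen Source connector looks roughly like the sketch below. Treat it as indicative only: the field names follow Confluent's Datagen quickstart, but your connector name, topic, and API key will differ.

{
  "name": "DatagenSourceConnector_0",
  "config": {
    "connector.class": "DatagenSource",
    "name": "DatagenSourceConnector_0",
    "kafka.auth.mode": "KAFKA_API_KEY",
    "kafka.api.key": "<api-key>",
    "kafka.api.secret": "<api-secret>",
    "kafka.topic": "sales",
    "output.data.format": "JSON",
    "quickstart": "ORDERS",
    "tasks.max": "1"
  }
}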

Let’s click the Messages tab.

The steps above showed us how to create a topic and how to set up a connector that produces messages and writes them to the topic.

Now it’s time to write some code.

You can also produce messages to a topic and consume them programmatically.

First, you need to install the confluent-kafka library.

mkdir kafka-python-getting-started && cd kafka-python-getting-started

# create and activate a virtual environment (Windows shown; on Linux/macOS use: source venv/bin/activate)
python -m venv venv
venv\Scripts\activate

# install the Confluent Kafka Python client
(venv) C:\personalwork\sparkproject\kafka-python-getting-started>pip install confluent-kafka

We have already created a sales topic. Let’s write to this topic.

#!/usr/bin/env python

from random import choice
from confluent_kafka import Producer

if __name__ == '__main__':

    config = {
        # User-specific properties that you must set
        'bootstrap.servers': 'pkc-xxxxxxxxxx.us-east-1.aws.confluent.cloud:9092',
        'sasl.username': '6VUTxxxxxxxxxxJ45',
        'sasl.password': 'kNc4BR+jtxxxxxxxxxxxxxxs+zYTUxwtmBIAfgddouLEIxPFyzav8QWj8OoRwKSS594z/',

        # Fixed properties
        'security.protocol': 'SASL_SSL',
        'sasl.mechanisms': 'PLAIN',
        'acks': 'all'
    }

    # Create Producer instance
    producer = Producer(config)

    # Optional per-message delivery callback (triggered by poll() or flush())
    # when a message has been successfully delivered or permanently
    # failed delivery (after retries).
    def delivery_callback(err, msg):
        if err:
            print('ERROR: Message failed delivery: {}'.format(err))
        else:
            print("Produced event to topic {topic}: key = {key:12} value = {value:12}".format(
                topic=msg.topic(), key=msg.key().decode('utf-8'), value=msg.value().decode('utf-8')))

    # Produce data by selecting random values from these lists.
    topic = "sales"
    user_ids = ['eabara', 'jsmith', 'sgarcia', 'jbernard', 'htanaka', 'awalther']
    products = ['book', 'alarm clock', 't-shirts', 'gift card', 'batteries']

    count = 0
    for _ in range(10):
        user_id = choice(user_ids)
        product = choice(products)
        producer.produce(topic, product, user_id, callback=delivery_callback)
        count += 1

    # Block until the messages are sent.
    producer.poll(10000)
    producer.flush()

Let’s execute this script to produce some messages.
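If everything is configured correctly, the delivery callback prints one line per event, roughly like this (your keys and values will differ because they are picked at random):

Produced event to topic sales: key = jsmith       value = gift card
Produced event to topic sales: key = htanaka      value = book
Produced event to topic sales: key = eabara       value = batteries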

Now, if you want to consume the messages, you need to define a consumer.

#!/usr/bin/env python

from confluent_kafka import Consumer

if __name__ == '__main__':

    config = {
        # User-specific properties that you must set
        'bootstrap.servers': 'pkc-xxxxxxxx.us-east-1.aws.confluent.cloud:9092',
        'sasl.username': '6VUTxxxxxxxxxxVE7GOJ45',
        'sasl.password': 'kNc4BR+jxxxxxxxxxxxxYTUxwtmBIAfgddouLEIxPFyzav8QWj8OoRwKSS594z/',

        # Fixed properties
        'security.protocol': 'SASL_SSL',
        'sasl.mechanisms': 'PLAIN',
        'group.id': 'kafka-python-getting-started',
        'auto.offset.reset': 'earliest'
    }

    # Create Consumer instance
    consumer = Consumer(config)

    # Subscribe to the same topic the producer wrote to
    topic = "sales"
    consumer.subscribe([topic])

    # Poll for new messages from Kafka and print them.
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                # Initial message consumption may take up to
                # `session.timeout.ms` for the consumer group to
                # rebalance and start consuming
                print("Waiting...")
            elif msg.error():
                print("ERROR: {}".format(msg.error()))
            else:
                # Extract the (optional) key and value, and print.
                print("Consumed event from topic {topic}: key = {key:12} value = {value:12}".format(
                    topic=msg.topic(), key=msg.key().decode('utf-8'), value=msg.value().decode('utf-8')))
    except KeyboardInterrupt:
        pass
    finally:
        # Leave group and commit final offsets
        consumer.close()

Execute consumer.py.

Rerun the producer to see more events, or feel free to modify the code as necessary to create more or different events.

Once you are done with the consumer, press Ctrl-C to terminate the consumer application.

I will write more such practical guides in future blogs, where we will go through more interesting Kafka concepts and cover brokers, topics, consumer groups, and partitions in detail. Please follow and leave a note of appreciation if you find my blogs helpful. Happy learning!
