In this guide, we’ll walk you through the process of setting up Apache Kafka on macOS and Linux (including Ubuntu). Apache Kafka is a powerful stream-processing platform widely used for building real-time data pipelines and streaming applications. This guide will cover the prerequisites, installation steps, and basic configuration to get Kafka up and running on your system.
This article assumes you have already configured the role-based access control (RBAC) required for the Snowflake Kafka connector. Refer to the Snowflake documentation below for the installation prerequisites and supported versions:
https://docs.snowflake.com/en/user-guide/kafka-connector-install.html#installation-prerequisites
Prerequisites
Before we begin, ensure you have the following installed:
- Apache Kafka
- Snowflake Kafka connector
- Scala 2.12 or newer
- Java Development Kit (JDK): Kafka requires Java to run. Ensure the JDK is installed and the JAVA_HOME environment variable is set.
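As a quick sanity check, the sketch below verifies that a `java` binary is on the PATH and reports the current JAVA_HOME. The path derivation is illustrative, not authoritative; adjust it for your system.

```shell
# Check whether a JDK is available and report JAVA_HOME (POSIX sh sketch).
if command -v java >/dev/null 2>&1; then
  JAVA_OK=yes
else
  JAVA_OK=no
fi
echo "java installed: $JAVA_OK"

# If java exists but JAVA_HOME is unset, derive it from the binary's location.
# (readlink -f is GNU coreutils; on macOS use /usr/libexec/java_home instead.)
if [ "$JAVA_OK" = yes ] && [ -z "${JAVA_HOME:-}" ]; then
  JAVA_HOME="$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")"
  export JAVA_HOME
fi
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
```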
Step 1: Download Kafka Binary
Download the Kafka binary that matches your Scala version. Here, we’ll download Kafka 3.2.1 built for Scala 2.13 (kafka_2.13-3.2.1).
wget https://archive.apache.org/dist/kafka/3.2.1/kafka_2.13-3.2.1.tgz
Extract the downloaded tar file:
tar -xzf kafka_2.13-3.2.1.tgz
cd kafka_2.13-3.2.1
Step 2: Start Zookeeper and Kafka Services
Kafka uses ZooKeeper to coordinate its distributed brokers. First, start the ZooKeeper service:
# Start Zookeeper
./bin/zookeeper-server-start.sh config/zookeeper.properties
Open a new terminal window or tab and start the Kafka broker service:
# Start Kafka Server
./bin/kafka-server-start.sh config/server.properties
Step 3: Create Kafka Topics
Create a topic in Kafka to publish and consume messages:
# Create a Topic
./bin/kafka-topics.sh --create --topic your_topic_name --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
Step 4: Produce and Consume Data
Produce Data
Use Kafka’s producer to send data to your topic:
# Start Producer
./bin/kafka-console-producer.sh --topic your_topic_name --bootstrap-server localhost:9092
You can now type messages into the console to send to the topic.
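Besides typing interactively, you can pipe a file of messages into the console producer, one message per line. In the sketch below, the file name and sample messages are illustrative, and the producer command assumes the broker from Step 2 is running:

```shell
# Write a few sample messages, one per line.
printf '%s\n' "event-1" "event-2" "event-3" > messages.txt

# Pipe them into the console producer (requires a running broker):
# ./bin/kafka-console-producer.sh --topic your_topic_name \
#     --bootstrap-server localhost:9092 < messages.txt
```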
Consume Data
Use Kafka’s consumer to read data from your topic:
# Start Consumer
./bin/kafka-console-consumer.sh --topic your_topic_name --from-beginning --bootstrap-server localhost:9092
This command consumes messages from the specified topic and prints them to the console.
Step 5: Monitor and Manage Kafka
For basic monitoring and management, you can list and describe topics:
# List all topics
./bin/kafka-topics.sh --list --bootstrap-server localhost:9092
# Describe a specific topic
./bin/kafka-topics.sh --describe --topic your_topic_name --bootstrap-server localhost:9092
Example: Using Kafka for Real-Time Data Streaming
To illustrate Kafka’s real-time streaming capabilities, consider an e-commerce company that tracks user activity in real time to recommend products based on browsing behavior.
1. User Activity Tracking
Kafka producers send data about user activities to a topic named user_activity.
2. Real-Time Processing
A Kafka consumer subscribed to the user_activity topic processes the data in real time, analyzing user behavior and generating recommendations.
3. Recommendation Generation
Based on the processed data, the system generates personalized product recommendations for each user.
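The pipeline above can be sketched with the console tools already covered. The topic name user_activity comes from the example; the JSON field names and partition count are illustrative assumptions, not a prescribed schema:

```shell
TOPIC=user_activity

# One sample click event a producer might publish (field names are made up):
EVENT='{"user_id": "u123", "action": "view_product", "product_id": "p456"}'
echo "$EVENT" > sample_event.json

# Create the topic and publish the event (requires a running broker):
# ./bin/kafka-topics.sh --create --topic "$TOPIC" \
#     --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
# ./bin/kafka-console-producer.sh --topic "$TOPIC" \
#     --bootstrap-server localhost:9092 < sample_event.json

# A consumer subscribed to the same topic would then read and process it:
# ./bin/kafka-console-consumer.sh --topic "$TOPIC" --from-beginning \
#     --bootstrap-server localhost:9092
```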
Best Practices for Apache Kafka Deployment
- Proper Capacity Planning: Understand workload requirements and plan Kafka cluster capacity considering message throughput, retention policies, and storage needs.
- High Availability Configuration: Deploy multiple brokers across different availability zones or data centers for automatic failover and replication.
- Optimized Topic Design: Design topics with appropriate partitioning, replication factors, and retention policies for optimal performance.
- Effective Monitoring and Alerting: Implement monitoring and alerting to track Kafka cluster health, throughput, and latency.
- Security Hardening: Secure Kafka clusters using encryption, authentication, and authorization mechanisms to protect data confidentiality, integrity, and availability.
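To make the topic-design and capacity points concrete, here is a sketch of creating a topic with explicit partitioning, replication, and retention settings. The topic name `orders` and all values are illustrative, and `--replication-factor 3` assumes a cluster with at least three brokers:

```shell
# 7 days expressed in milliseconds, used for retention.ms below:
RETENTION_MS=$((7 * 24 * 60 * 60 * 1000))
echo "retention.ms=$RETENTION_MS"

# Create the topic (requires a running multi-broker cluster):
# ./bin/kafka-topics.sh --create --topic orders \
#     --bootstrap-server localhost:9092 \
#     --partitions 6 --replication-factor 3 \
#     --config retention.ms=604800000 \
#     --config min.insync.replicas=2
```

Setting min.insync.replicas below the replication factor lets the topic tolerate one broker outage while still acknowledging writes.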
Conclusion
Setting up Apache Kafka on macOS and Linux is a straightforward process. By following the steps outlined in this guide, you can have Kafka up and running, ready to handle real-time data streaming and processing tasks. Kafka’s robustness, scalability, and performance make it an excellent choice for building data-intensive applications. Happy streaming!
For more detailed information, refer to the official Apache Kafka Quickstart and Snowflake Kafka Connector documentation.
Editor’s Note: ProSpexAi is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.