Shubham Kumbhar
3 min readJul 14, 2023

Certainly! Here's a concise Kafka cheat sheet that covers the essential concepts and commands:

Kafka Cheat Sheet:

1. Kafka Basics:
- Apache Kafka: A distributed streaming platform.
- Topics: Streams of records that Kafka organizes data into.
- Partitions: Divisions of a topic for scalability and parallelism.
- Producers: Applications that write data to Kafka topics.
- Consumers: Applications that read data from Kafka topics.
- Brokers: Kafka server instances that store and replicate messages.

2. Kafka Command Line Tools:
- Create a topic:
```
kafka-topics.sh --create --topic <topic_name> --bootstrap-server <bootstrap_servers> --partitions <num_partitions> --replication-factor <replication_factor>
```
- List topics:
```
kafka-topics.sh --list --bootstrap-server <bootstrap_servers>
```
- Produce messages:
```
kafka-console-producer.sh --topic <topic_name> --bootstrap-server <bootstrap_servers>
```
- Consume messages:
```
kafka-console-consumer.sh --topic <topic_name> --bootstrap-server <bootstrap_servers> [--from-beginning]
```

3. Kafka Configuration:
- server.properties: The main Kafka configuration file.
- zookeeper.connect: ZooKeeper connection string.
- listeners: Network listeners for Kafka.
- log.dirs: Directory where Kafka stores log data.
- num.partitions: Default number of partitions for new topics.

4. Kafka Consumer Groups:
- Consumer group: A group of consumers that work together to consume data.
- Group ID: Identifier for a consumer group.
- Start a consumer group:
```
kafka-console-consumer.sh --topic <topic_name> --bootstrap-server <bootstrap_servers> --group <group_id>
```
- Reset consumer group offsets:
```
kafka-consumer-groups.sh --bootstrap-server <bootstrap_servers> --group <group_id> --reset-offsets --to-earliest --all-topics --execute
```

5. Kafka Producers:
- Message key: Optional key used for partitioning and message ordering.
- Message value: The actual data being produced.
- Message serialization: Convert data to bytes before sending.
- Asynchronous send: Non-blocking producer send.
- Synchronous send: Blocking producer send.

6. Kafka Connect:
- Connect API: Framework for connecting external systems to Kafka.
- Source connectors: Import data from external systems into Kafka.
- Sink connectors: Export data from Kafka to external systems.
- Connector configurations: Define connector-specific properties.

7. Kafka Streams:
- Streams API: Stream processing library in Kafka.
- Streams DSL: High-level stream processing abstraction.
- Windowed operations: Aggregations over time-based windows.
- Stateful operations: Maintain and update state in stream processing.

8. Kafka Security:
- Authentication: Verifying the identity of clients and servers.
- Authorization: Granting or denying access to Kafka resources.
- SSL encryption: Securing data transmission between clients and brokers.
- SASL (Simple Authentication and Security Layer): Framework for pluggable authentication mechanisms.

1. Kafka Producers:
- Producer acknowledgments: Configure `acks` parameter for controlling message reliability.
- Message compression: Enable compression to reduce network bandwidth usage.
- Batch size: Adjust `batch.size` to optimize the producer's message batching.
- Message serialization: Consider using a lightweight serialization format like Avro or JSON.

2. Kafka Consumers:
- Consumer offsets: Understand how Kafka tracks offsets and manages message consumption.
- Polling duration: Configure `poll()` timeout to balance between responsiveness and efficiency.
- Offset management: Choose between manual offset management and automatic offset commit.
- Rebalancing: Know how consumer groups handle rebalancing when adding or removing consumers.

3. Kafka Performance Optimization:
- Monitor Kafka metrics: Utilize tools like Kafka's built-in metrics and external monitoring solutions.
- Partitioning strategy: Choose an appropriate key for message distribution to avoid hotspots.
- Replication factor: Balance between fault tolerance and performance when setting replication factor.
- Message size: Optimize message size to avoid network congestion and disk I/O bottlenecks.
- Hardware considerations: Provision sufficient resources (CPU, memory, disk) for your Kafka deployment.

4. Fault Tolerance and High Availability:
- Replication factor: Set a replication factor greater than 1 for data redundancy.
- In-sync replicas (ISR): Understand the concept of ISR and its importance for fault tolerance.
- Leader election: Familiarize yourself with the leader election process in Kafka.
- Monitoring and alerting: Implement robust monitoring and alerting systems to detect and handle failures promptly.

5. Kafka Ecosystem:
- Kafka Connect converters: Use converters to handle data serialization and deserialization.
- Schema Registry: Employ the Schema Registry for managing schema evolution and compatibility.
- Kafka Streams state stores: Understand how to define and utilize state stores for interactive queries.
- Exactly Once Semantics: Study how Kafka ensures end-to-end exactly once message processing.

6. Additional Tips:
- Familiarize yourself with Kafka's design principles, such as horizontal scalability and fault tolerance.
- Practice hands-on exercises, build simple Kafka applications, and explore real-world use cases.
- Stay up-to-date with the latest Kafka releases, features, and community discussions.
- Be prepared to explain common Kafka use cases, such as real-time analytics, log aggregation, and event sourcing.

Remember, this cheat sheet is meant to be a quick reference guide, so it's essential to dive deeper into each topic to gain a thorough understanding. Good luck with your Kafka interview preparation!

Remember, this cheat sheet provides a brief overview of key Kafka concepts and commands. For more detailed information, consult the official Kafka documentation and explore other learning resources.

Shubham Kumbhar

Passionate software engineer. Expert in Java Microservices, Spring Boot, Kafka, Apache Camle, Red Hat, Openshift. 5+ years exp. Let's innovate together!