Kafka Use Cases and Applications

Mahesh Saini
Published in The Life Titbits
4 min read, May 8, 2023

1. Publish-subscribe

  • Pub/Sub (or Publish/Subscribe) is an architectural design pattern used in distributed systems for asynchronous communication between different components or services.
  • Pub/Sub moves messages between components of the system without the components being aware of each other’s identity; that is, publishers and subscribers are decoupled.
  • Pub/Sub provides a framework for exchanging messages between publishers (components that create and send messages) and subscribers (components that receive and consume messages).
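To make the decoupling concrete, here is a minimal in-memory sketch of the pattern. It is not Kafka's API: the `InMemoryBroker` class, the `orders` topic, and the two inboxes are hypothetical names chosen for illustration, and delivery here is synchronous, whereas a real broker such as Kafka delivers asynchronously.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Hypothetical stand-in for a message broker (not the Kafka API)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # The publisher only names a topic; it never sees who is subscribed.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
billing_inbox: List[str] = []
email_inbox: List[str] = []

# Two independent subscribers register interest in the same topic.
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", email_inbox.append)

delivered = broker.publish("orders", "order-1 created")
```

Neither subscriber knows about the other or about the publisher; adding a third consumer would only require another `subscribe` call.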

2. Log aggregation

  • Log aggregation deals with large volumes of log-structured events, typically emitted by application or infrastructure components. Logs may be generated at burst rates that significantly outstrip the ability of databases to keep up with ingestion and indexing, which is typically an expensive operation.
  • Kafka can act as a buffer, offering an intermediate, durable data store. The ingestion process will act as a sink, eventually collating the logs into a read-optimized database (for example, Elasticsearch).
  • The log aggregation pipeline may also contain intermediate steps, for example, to normalize the logs into a canonical form, or sanitize the logs — scrubbing them of personally identifiable information.
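The buffer, normalize, sanitize, and sink steps can be sketched in a few lines. This is only an illustration under stated assumptions: a `deque` stands in for the Kafka topic, a plain list stands in for the read-optimized store, the log format ("level message") is invented, and the email regex is a deliberately simple example of PII scrubbing, not a complete one.

```python
import re
from collections import deque

# Deliberately simple email pattern, used only to illustrate PII scrubbing.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalize(raw: str) -> dict:
    """Turn a raw 'level message' line into a canonical record."""
    level, _, message = raw.partition(" ")
    return {"level": level.upper(), "message": message.strip()}


def scrub(event: dict) -> dict:
    """Return a copy of the event with email addresses redacted."""
    return {**event, "message": EMAIL.sub("<redacted>", event["message"])}


# The buffer absorbs bursts; producers append without waiting for indexing.
buffer = deque()
for line in ["info user alice@example.com logged in", "Error  disk full"]:
    buffer.append(line)

# The sink drains the buffer at its own pace into the read-optimized store.
store = []
while buffer:
    store.append(scrub(normalize(buffer.popleft())))
```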

3. Log shipping

  • While sounding vaguely similar to log aggregation, the shipping of logs is a vastly different concept. Essentially, this involves the real-time copying of journal entries from a master system to one or more replicas.
  • Assuming state changes are fully captured in the journal records, replaying these records allows the replicas to accurately mimic the state of the master, albeit with some lag.
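A small sketch of replaying a journal may help. The `Master` and `Replica` classes and the `limit` parameter are hypothetical; the journal is a plain list here, where in practice it could be a Kafka topic that the replicas consume.

```python
class Master:
    """Records every state change in a journal before applying it."""

    def __init__(self):
        self.state = {}
        self.journal = []

    def put(self, key, value):
        self.journal.append(("put", key, value))
        self.state[key] = value

    def delete(self, key):
        self.journal.append(("delete", key, None))
        self.state.pop(key, None)


class Replica:
    """Rebuilds state by replaying journal entries in order."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of journal entries already replayed

    def catch_up(self, journal, limit=None):
        # 'limit' caps how many entries are replayed, to simulate lag.
        end = len(journal) if limit is None else min(len(journal), self.applied + limit)
        for op, key, value in journal[self.applied:end]:
            if op == "put":
                self.state[key] = value
            else:
                self.state.pop(key, None)
        self.applied = end


master = Master()
replica = Replica()

master.put("a", 1)
master.put("b", 2)
replica.catch_up(master.journal, limit=1)
lagging_state = dict(replica.state)  # replica has seen only the first entry

master.delete("a")
replica.catch_up(master.journal)     # replay everything that is outstanding
```

After the second `catch_up`, the replica's state matches the master's; in between, it trails by however many entries it has not yet replayed.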

4. Staged Event-Driven Architecture (SEDA) Pipelines

  • Staged Event-Driven Architecture (SEDA) applies pipelining to event-oriented data. Events flow unidirectionally through a series of processing stages linked by topics, each one performing a mapping operation before publishing a transformed event to the next topic.
  • Intermediate stages simultaneously act as both consumers and producers and may scale autonomously and independently of one another to match their unique load demands. By breaking a complex problem into stages, SEDA improves the modularity of the system. As a pattern, SEDA is readily found in data warehousing, data lakes, reporting, analytics, and other Business Intelligence applications, and is often an element of Big Data applications.
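The shape of such a pipeline can be sketched as follows. The topic names (`raw`, `parsed`, `enriched`) and the `run_stage` helper are hypothetical, and in-memory queues stand in for Kafka topics; in a real deployment each stage would be a separate consumer/producer process that scales on its own.

```python
from collections import defaultdict, deque

# Each named topic is a FIFO queue linking one stage to the next.
topics = defaultdict(deque)


def run_stage(source, target, transform):
    """Consume everything on 'source', map it, and publish to 'target'."""
    while topics[source]:
        event = topics[source].popleft()
        topics[target].append(transform(event))


# Stage 0: raw events arrive on the first topic.
topics["raw"].extend(["  12 ", "7", " 30"])

# Stage 1: parse text into integers.
run_stage("raw", "parsed", lambda s: int(s.strip()))

# Stage 2: enrich each value with a derived flag.
run_stage("parsed", "enriched", lambda n: {"value": n, "large": n >= 10})

result = list(topics["enriched"])
```

Each stage depends only on its input and output topics, so a stage can be replaced or scaled without touching its neighbours.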

5. Complex Event Processing (CEP)

  • Complex Event Processing (CEP) extracts meaningful information and patterns from a stream of discrete events, or from a set of disjoint event streams.
  • CEP processors tend to be stateful, as they must be able to efficiently recall prior events to identify patterns that might span a broad time-frame, ranging from milliseconds to days, depending on the context.
  • CEP is heavily employed in such applications as algorithmic stock trading, security threat analysis, real-time fraud detection, and control systems.
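As a rough illustration of why CEP processors are stateful, the sketch below flags a user after several failed logins inside a sliding time window. The `FailedLoginDetector` class, the threshold of 3, and the 60-second window are all assumptions made for the example, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class FailedLoginDetector:
    """Flags a user once 'threshold' failures fall within 'window_seconds'."""

    def __init__(self, threshold=3, window_seconds=60):
        self.threshold = threshold
        self.window = window_seconds
        # Per-user state: timestamps of recent failed attempts.
        self._recent = defaultdict(deque)

    def on_event(self, user, timestamp, success):
        if success:
            return False  # successful logins are ignored in this sketch
        recent = self._recent[user]
        recent.append(timestamp)
        # Forget failures that have slid out of the time window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold


detector = FailedLoginDetector()
events = [
    ("bob", 0, False),
    ("bob", 10, False),
    ("amy", 15, False),
    ("bob", 20, True),
    ("bob", 50, False),   # third failure for bob within 60 seconds
    ("bob", 200, False),  # earlier failures have expired by now
]

alerts = [(user, ts) for user, ts, ok in events if detector.on_event(user, ts, ok)]
```

No single event is suspicious on its own; the pattern only emerges because the detector remembers earlier events per user.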

6. Event Sourcing — CQRS

  • CQRS is by far the most common way that event sourcing is implemented in real-world applications. A CQRS system always has two sides: a write side and a read side.
  • On the write side (shown on the left side of the diagram), you send commands or events, which are stored reliably. The read side (shown on the right side of the diagram) is where you run queries to retrieve data. (If you are using Apache Kafka, it provides the segregation between the two sides.) So unlike in vanilla event sourcing, the translation between event format and table format in CQRS happens at write time, albeit asynchronously.
  • Separating reads from writes has some notable advantages: you get the benefits of event-level storage, but also much higher performance, since the write and read layers are decoupled. Of course, there is also a tradeoff: the system becomes eventually consistent, so a read issued immediately after an event is written may not yet reflect that event. This must be taken into account when designing a system.
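The write side, the read side, and the eventual-consistency gap between them can be sketched like this. All names (`event_log`, `handle_deposit`, `BalanceView`) are hypothetical, and a plain list stands in for the Kafka topic that would separate the two sides.

```python
# Stand-in for the durable event log (for example, a Kafka topic).
event_log = []


def handle_deposit(account, amount):
    """Write side: record the event and do nothing else."""
    event_log.append({"type": "deposited", "account": account, "amount": amount})


class BalanceView:
    """Read side: a table-like projection built from the event log."""

    def __init__(self):
        self.balances = {}
        self.offset = 0  # position of the next unprocessed event

    def project(self, log):
        # Translate new events into the query-friendly table format.
        for event in log[self.offset:]:
            account = event["account"]
            self.balances[account] = self.balances.get(account, 0) + event["amount"]
        self.offset = len(log)

    def balance(self, account):
        return self.balances.get(account, 0)


view = BalanceView()

handle_deposit("acc-1", 100)
stale = view.balance("acc-1")   # projection has not run yet

view.project(event_log)         # runs asynchronously in a real system
fresh = view.balance("acc-1")
```

Between the write and the projection, a query sees the old balance; that window is the eventual consistency the last bullet warns about.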

For more summarized articles on Apache Kafka, see my previous articles: Foundational Concepts of Kafka and Its Key Principles, Topic Log Compaction in Apache Kafka, Kafka — Data Durability and Availability Guarantees, and Why is Apache Kafka fast?
