Understanding the Basics of Stream Processing

Kushagra Kesav · Published in CodeX · Jul 28, 2024

In this article, we’ll cover some fundamental concepts of stream processing that are essential for anyone looking to learn about it. Whether you’re new to stream processing or seeking a refresher, here’s a concise overview of what I’ve learned:

Fundamentals of Stream Processing

Stream processing revolves around three core concepts:

  • Events
  • Event Streams
  • Tables

1. Events

An event is a discrete piece of data that records something that has happened. For example:

  • A rendering completed
  • A user submitted a form
  • An order was placed

Events can also be referred to as records or messages. They serve as the building blocks for data operations and are crucial for tracking and responding to changes in real time.
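As a rough sketch in Python, an event can be modeled as a small immutable record; the `Event` class and its field names here are purely illustrative, not a specific library’s API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: an event is a recorded fact and never changes
class Event:
    entity_id: str  # which entity the event is about, e.g. an order ID
    action: str     # what happened, e.g. "order_placed"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Example events, mirroring the list above
placed = Event("order-42", "order_placed")
submitted = Event("user-7", "form_submitted")
```

Marking the dataclass frozen captures the defining property of an event: once recorded, it never changes.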

2. Event Streams

An event stream is essentially a sequence of events recorded over time. It provides a chronological record of all the events that have occurred. For example, consider the lifecycle of an order:

  • Order placed
  • Order packed
  • Order shipped
  • Order in transit
  • Order delivered

Each of these stages represents an event in the stream, capturing the progression from placement to completion.
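Continuing the illustrative sketch, an event stream is simply an ordered, append-only sequence of these events:

```python
# An event stream: an ordered, append-only sequence of events,
# reusing the Event class from the sketch above.
stream: list[Event] = []

for action in ["order_placed", "order_packed", "order_shipped",
               "order_in_transit", "order_delivered"]:
    stream.append(Event("order-42", action))  # new events are only ever appended

# Reading the stream back gives the full chronological history
for event in stream:
    print(event.timestamp.isoformat(), event.action)
```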

3. Tables

In contrast to event streams, tables represent the state of data at a specific point in time. They are static snapshots of data, allowing us to view the most current state. For example, in the order process above, a table would store only the most recent status of each order, such as “delivered”.

Tables are particularly useful for querying and analysing the current state of our data, providing a clear picture of what’s happening at any given point in time.
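In the same sketch, a table can be pictured as a mapping from each entity to its latest state, materialized by replaying the stream:

```python
# A table: the latest known state of each entity, keyed by its ID.
table: dict[str, str] = {}

for event in stream:
    table[event.entity_id] = event.action  # later events overwrite earlier state

print(table)  # {'order-42': 'order_delivered'}; only the current status survives
```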

Operations

The operations we can perform differ between event streams and tables:

  • Event Streams: Operations are limited to inserts (append-only). This makes sense, as streams are designed to capture a continuous record of what happened; we cannot modify or delete past events.
  • Tables: We have the flexibility to perform various operations, such as inserting, updating, and deleting records. This allows us to query and analyse the data at any given point in time, as the sketch after this list illustrates.
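Concretely, in the running sketch the difference looks like this:

```python
# Streams are insert-only: past events are never modified or removed.
stream.append(Event("order-43", "order_placed"))  # OK: append a new event
# Rewriting or deleting earlier entries has no place in the model.

# Tables support the full set of operations:
table["order-43"] = "order_placed"   # insert a new row
table["order-43"] = "order_shipped"  # update: overwrite the current state
del table["order-42"]                # delete: remove the entity entirely
```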

Transformations

One of the powerful aspects of stream processing is the ability to transform data between streams and tables, also known as stream-table duality.

  • Stream to Table: You can aggregate an event stream to produce a table. For instance, using SQL functions like COUNT() or SUM(), you can summarize data from the stream into a table format.
  • Table to Stream: Conversely, changes made to a table can be streamed out to reflect updates in real time. This process is often called change data capture (CDC).

In fact, a table is fully defined by its underlying change stream: as new events arrive, the table continuously updates itself.
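Here is a minimal illustration of both directions, continuing the sketch above; the `update_table` helper is hypothetical, and real systems handle this changelog machinery for you:

```python
from collections import Counter

# Stream -> Table: aggregate the stream, analogous to a SQL COUNT() per key.
counts = Counter(event.action for event in stream)
print(counts)  # how many times each action has occurred so far

# Table -> Stream: every change applied to a table can also be emitted
# as a change event, which is the idea behind change data capture (CDC).
changelog: list[tuple[str, str]] = []

def update_table(table: dict[str, str], key: str, value: str) -> None:
    table[key] = value               # apply the change to the table...
    changelog.append((key, value))   # ...and emit it to the change stream

update_table(table, "order-44", "order_placed")
update_table(table, "order-44", "order_delivered")
print(changelog)  # replaying these changes rebuilds the table exactly
```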

Stream processing is a dynamic and powerful way to handle real-time data. Whether you’re building real-time analytics systems or managing data pipelines, understanding these basics will provide a strong foundation for more advanced techniques and applications.

Feel free to dive deeper into each of these topics and explore how they can be applied to your own projects!
