Kafka use cases in Pega

Bindu Rani · Cloud Workers · Nov 16, 2023

Kafka is an open-source, distributed event-streaming platform for processing high volumes of event data. It is highly scalable and well suited for building real-time data pipelines and data integrations. It follows a publish-subscribe design pattern: one software component produces events (messages) and another component consumes and processes them.

Kafka's architecture consists of the following core components:

  • Producer — an application or service that publishes message streams to a Kafka broker.
  • Consumer — an application that subscribes to a topic and consumes and processes the published events.
  • Topic — a logical stream to which messages are published and from which they are consumed.
  • Partitions — every Kafka topic is divided into multiple partitions, each a linearly ordered sequence of the topic's records. Partitions are distributed across the brokers in a cluster, and it is this capability that makes the platform highly scalable under varying data volumes and fault tolerant.
  • Broker — a server that manages incoming requests and events from producers and consumers.
Kafka architecture

Each message in a partition is labeled with an offset. Each consumer thread processes messages from its assigned partitions and commits the offset of the last message it has processed.
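As an illustration, here is a minimal consumer sketch using Kafka's standard Java client; the broker address, group id, and topic name (`orders`) are placeholders for this example, not anything Pega-specific:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "demo-group");              // assumed consumer group id
        props.put("enable.auto.commit", "false");         // commit offsets explicitly below
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record carries the partition it came from and its offset within it.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // persist the latest processed offsets
            }
        }
    }
}
```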

The Kafka client library exposes several core APIs that applications use to write, read, and process messages. Some of these APIs are:

  • Producer API — for creating and publishing events (see the sketch after this list)
  • Consumer API — for consuming and processing events
  • Streams API — for processing and analyzing streams of data within Kafka
  • Connect API — for building connectors that stream data between Kafka and external systems, such as relational databases like PostgreSQL
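To make the Producer API concrete, here is a minimal Java sketch; the broker address and topic name are assumptions for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key always land on the same partition,
            // which preserves per-key ordering.
            producer.send(new ProducerRecord<>("customer-events", // hypothetical topic
                    "customer-42", "{\"event\":\"page_view\",\"page\":\"/home\"}"));
            producer.flush();
        }
    }
}
```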

Kafka is widely used in modern data architectures because it handles real-time streams efficiently. Some of the major reasons for its popularity are:

  1. Highly reliable and fault tolerant — data in a Kafka cluster is replicated across multiple brokers, so multiple copies of the same data exist, eliminating the risk of a single point of failure.
  2. High throughput — Kafka optimizes how event streams move between its compute and storage layers (the brokers), and its distributed architecture around topic partitions maximizes the volume of data processed. It can handle millions of records per second and trillions of records per day.
  3. Highly scalable — a Kafka cluster can grow to a few thousand servers across data centers, and the number of partitions for a given topic can be scaled to match the load (see the topic-creation sketch after this list).
  4. Modern architecture — decoupled communication between producer and consumer applications makes it an ideal fit for integrating disparate real-world applications.
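Replication and partitioning are both set when a topic is created. Here is a minimal sketch using Kafka's Java AdminClient; the broker address, topic name, and counts are illustrative assumptions:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions spread consumption across consumers; replication factor 3
            // keeps a copy of each partition on three brokers for fault tolerance.
            NewTopic topic = new NewTopic("customer-events", 12, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```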

Pega utilizes this event-streaming platform across a spectrum of use cases. In releases prior to Infinity '23, all external services, including Kafka, were packaged as part of the Pega Infinity software distribution. With the Infinity '23 release, Kafka must now be managed as a separate service.

Within Pega, events are produced to and consumed from Kafka topics using stream datasets. A stream dataset instance is executed through either:

  • the Execute-DataSet method in an activity, or
  • a data flow that references the stream dataset as its input source.
Stream Dataset referencing Kafka server instance

Pega extensively leverages Kafka in various scenarios, some of which include:

  1. Analyzing and processing customer events

Pega can analyze streams of customer events, draw meaningful insights, and determine the next best action to deliver optimal business value.

It can read customer events from multiple Kafka streams using real-time data flows with a stream dataset as the input source. These incoming events are then analyzed by a predefined event strategy.

Event Strategy instance in Pega
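Pega's event strategies are configured visually rather than in code, but the same pattern of aggregating a customer event stream can be sketched with Kafka's Streams API. This is an analogy under assumed topic names, not Pega's implementation:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class CustomerEventCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-event-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Key = customer id, value = raw event payload (hypothetical topic names).
        KStream<String, String> events = builder.stream("customer-events");
        events.groupByKey()
              .count() // running count of events per customer
              .toStream()
              .to("customer-event-counts", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

A real event strategy would typically apply windows and filters on top of an aggregation like this before handing results to a decision strategy.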

The insights drawn by the event strategy are then used to deliver next-best actions for the customer through a decision strategy.

Real-time Data Flow executing an Event Strategy and a Decision Strategy

2. Background processing in Queue Processors

Queue Processors, both Standard and Dedicated, use Kafka topics to process messages in the background. A request for asynchronous processing is written to a Kafka topic, which the Queue Processor then reads and processes in the background.

A stream dataset and a real-time data flow are used behind the scenes to write messages to and read them from the Kafka topic.

Queue Processor execution flow

Each Queue Processor on the platform reads from a single Kafka topic. Queue Processors support multithreading, allowing each thread to consume messages concurrently from its own partitions, which increases overall performance and throughput under high load (a conceptual sketch follows).
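Pega's Queue Processor internals are not exposed as code, but the underlying consumer-group mechanics can be sketched with the plain Java client: several threads share one group id, and Kafka spreads the topic's partitions across them. The topic and group names here are assumptions:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class QueueWorkerPool {
    public static void main(String[] args) {
        int threads = 4; // one consumer per thread; Kafka assigns partitions across them
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(QueueWorkerPool::runWorker);
        }
    }

    private static void runWorker() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "queue-worker-group");      // same group => partitions are shared
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // KafkaConsumer is not thread-safe, so each thread owns its own instance.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("work-items")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    process(record.value()); // background work, analogous to a queued task
                }
                consumer.commitSync();
            }
        }
    }

    private static void process(String payload) {
        System.out.println(Thread.currentThread().getName() + " processed " + payload);
    }
}
```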

3. OOTB User Notifications

Standard OOTB notifications across channels such as Web Gadget, Email, and Mobile use Kafka topics to deliver notifications. Notification requests are published to Kafka topics, and these requests are subsequently read and processed by the OOTB Queue Processor pyProcessNotification to deliver the notification.

4. Building real-time applications

UI subscription channels in Pega serve as a tool for building real-time applications, such as a stock-exchange dashboard that receives a few hundred to a few thousand updates every second.

A producer module or application publishes updates to the configured UI channel, created via Configure → User Interface → Notification Channels.

UI Subscription Channel in Pega

Upon publication, subscribers with the relevant context process the message and render the update on screen using the On-Load method configured on the UI.

UI configuration for rendering the events
