Casper Kafka Event Store Pt 3

Adding Producers and Consumers to the Kubernetes Cluster

Carl Norburn
Casper Association R & D
4 min read · Dec 8, 2022

DALL-E representation of the Stormeye development team

Now that we have a cluster with a Kafka broker and ZooKeeper ensemble running, we can start producing and consuming emitted events.

Some important decisions had to be made here:

  • How many nodes will we connect to?
  • How many producer and consumer replicas?
  • Maximum size of an event?
  • Where is the data going?
  • What’s the disaster recovery (DR) strategy?

Before we discuss these decisions, let’s first look at how we approached writing the Kafka consumers and producers.

At project inception we had separate, standalone GitHub repos for Consumers and Producers. As the project matured we moved over to a single repo with shared classes, and we leveraged the Java SDK functionality.

Relevant GitHub repositories are here:

The Producer

The producer has two jobs:

  • Connect to one or more nodes and start an event stream
  • Wrap events in metadata and fire them to Kafka topics as messages

To connect to a node and stream its events we use the functionality of the Java SDK.
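
The actual producer uses the SDK’s event streaming support; the snippet below is only a rough stand-in, using the JDK HTTP client against a node’s SSE endpoint, to show the shape of it: a retry loop plus the option of resuming from a known event id (class names and the query parameter are illustrative, not the project’s code).

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.function.BiConsumer;

    // Illustrative stand-in for the SDK call: stream server-sent events from a
    // node, retrying on failure and optionally resuming from a known event id.
    public class NodeEventStreamer {

        private final HttpClient client = HttpClient.newHttpClient();

        public void stream(URI emitter, long startFromEventId, BiConsumer<URI, String> onEvent) {
            while (true) {
                try {
                    HttpRequest request = HttpRequest.newBuilder(
                            // e.g. http://<node>:9999/events/main?start_from=<id>
                            URI.create(emitter + "?start_from=" + startFromEventId))
                            .GET()
                            .build();

                    // ofLines() hands us the raw SSE lines as they arrive
                    client.send(request, HttpResponse.BodyHandlers.ofLines())
                          .body()
                          .filter(line -> line.startsWith("data:"))
                          .forEach(line -> onEvent.accept(emitter, line.substring(5).trim()));
                } catch (Exception e) {
                    // connection dropped: back off briefly, then reconnect
                    try {
                        Thread.sleep(5_000L);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }
    }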

The above code will connect to a node with retries. DR also comes into play here, with the option of replaying the stream from a specific event id.

For each event in the stream, the producer’s send method receives the URI of the emitting node and a deserialised event. The topic is extracted from the event, a key is generated, a partition is set, and the event is then fired to a Kafka topic as a message.
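
Roughly, and with illustrative names rather than the project’s own (the real code also wraps the event in metadata first), the send looks like this:

    import java.net.URI;
    import java.util.UUID;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Illustrative sketch: fire an event at the topic derived from the
    // emitting node's endpoint (/main, /sigs, /deploys).
    public class EventSender {

        private final KafkaProducer<String, String> producer;

        public EventSender(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        public void send(URI emitter, String event) {
            // topic name taken from the endpoint path, e.g. ".../main" -> "main"
            String path = emitter.getPath();
            String topic = path.substring(path.lastIndexOf('/') + 1);

            // a unique key per message; the partition is left to Kafka
            String key = UUID.randomUUID().toString();

            producer.send(new ProducerRecord<>(topic, key, event));
        }
    }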

Topics are determined from the emitting node’s endpoint: /sigs, /main and /deploys.

After trial and error with setting a Kafka partition explicitly, we decided to let Kafka determine which partition to send to. We had hoped that by adding a unique key and writing to a specific partition we could prevent duplicate events and thus run multiple producers, but at this moment we have yet to prove this. The Kafka partitions and replicas per topic are set at the Kubernetes cluster deployment:
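
In our cluster these values live in the deployment manifests (covered in the earlier Kubernetes articles). Purely as an illustration of the same idea in code, the equivalent with Kafka’s admin client would look something like this (topic names, partition and replica counts here are examples, not the production values):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    // Example only: create the event topics with a chosen partition and
    // replica count. In the project these are set in the cluster deployment.
    public class TopicSetup {

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                admin.createTopics(List.of(
                        new NewTopic("main", 3, (short) 3),
                        new NewTopic("sigs", 3, (short) 3),
                        new NewTopic("deploys", 3, (short) 3)
                )).all().get(); // wait for the broker to acknowledge
            }
        }
    }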

The producer configuration is set in the following code:
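
The real configuration lives in the repo; a stripped-down sketch of the relevant properties (class and constant placement are illustrative) looks like this:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Sketch of the producer configuration. PRODUCER_BYTES raises the maximum
    // message size above Kafka's 1mb default, because some Casper events are
    // far larger than that.
    public class ProducerSettings {

        // 256mb, as discussed below
        public static final int PRODUCER_BYTES = 256 * 1024 * 1024;

        public static KafkaProducer<String, String> buildProducer(String bootstrapServers) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, PRODUCER_BYTES);
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            return new KafkaProducer<>(props);
        }
    }

Note that the broker-side message.max.bytes needs a matching bump, otherwise the broker will reject oversized messages.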

Most settings speak for themselves, but what is interesting is the PRODUCER_BYTES constant, which sets the maximum message size. The default for a Kafka message is 1mb; we saw Casper events of more than 20mb.

This caused a philosophical and existential discussion amongst the team. Should we even be using Kafka if the messages can be this size? What are the implications of handling events this size?

In the end, after discussion with Casper Labs, we decided that as the frequency of large events is low, it’s acceptable to increase the maximum message size in Kafka. We’ve set it to 256mb.

The Consumers

We are using two consumer groups in the project:

  • consumer-audit — writes the raw events to mongo for replays and DR
  • consumer-store — writes to Postgres for downstream consumers

Consumer groups are a very useful feature of Kafka: they allow multiple consumers to read the same messages. Each group tracks its own offset, so every registered group reads the full stream independently, and messages remain available according to the topic’s retention policy rather than being removed once read. This is where Kafka’s scalability becomes an advantage. More consumer groups can be added later, and as we have the replay functionality through the audit service, they can consume events from a set point in time.

The consumer-audit consumer group is a straight pass-through process which just writes the raw event to mongo. We had to use GridFS to allow messages larger than 16mb, Mongo’s document size limit.

Kafka config details are left to the annotations; the audit service simply saves the raw event:
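
A cut-down sketch of what that looks like with Spring Kafka and Spring Data’s GridFS support (bean, topic and filename choices are illustrative):

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;
    import org.springframework.data.mongodb.gridfs.GridFsTemplate;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Service;

    // Pass-through audit consumer: every raw event is written to GridFS so it
    // can be replayed later (DR) without any transformation.
    @Service
    public class AuditConsumer {

        private final GridFsTemplate gridFs;

        public AuditConsumer(GridFsTemplate gridFs) {
            this.gridFs = gridFs;
        }

        @KafkaListener(topics = {"main", "sigs", "deploys"}, groupId = "consumer-audit")
        public void consume(String rawEvent) {
            // GridFS chunks the payload, so events over Mongo's 16mb limit are fine
            gridFs.store(
                new ByteArrayInputStream(rawEvent.getBytes(StandardCharsets.UTF_8)),
                UUID.randomUUID().toString());
        }
    }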

The consumer-store consumer group will apply Casper business logic to any message it reads and store it in the relational Postgres db.

This is the meat of the project, and work is ongoing here. This is where block validation and consensus, amongst other analytics, will happen.

The following code snippet shows how this happens:
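
The real listener is in the repo; a sketch of the pattern, with illustrative names, is:

    import java.util.Map;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Service;

    // Store consumer: pick the storage factory for the event type and let it
    // apply the Casper business logic before persisting to Postgres.
    @Service
    public class StoreConsumer {

        // one factory class per event type (DeployProcessed, BlockAdded, ...)
        public interface StorageFactory {
            void store(String rawEvent);
        }

        private final Map<String, StorageFactory> factories;

        public StoreConsumer(Map<String, StorageFactory> factories) {
            this.factories = factories;
        }

        @KafkaListener(topics = {"main", "sigs", "deploys"}, groupId = "consumer-store")
        public void consume(String rawEvent) {
            factories.get(eventType(rawEvent)).store(rawEvent);
        }

        private String eventType(String rawEvent) {
            // the event JSON is keyed by its type, e.g. {"DeployProcessed": {...}};
            // simplified here, the real code deserialises via the SDK
            int first = rawEvent.indexOf('"');
            return rawEvent.substring(first + 1, rawEvent.indexOf('"', first + 1));
        }
    }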

The storage factory decides where to send the event for processing. Each event type has its own factory class:
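
For example, a simplified (illustrative, not verbatim) DeployProcessed factory, implementing the interface from the previous sketch and using Jackson plus a JdbcTemplate in place of the project’s real persistence layer:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Component;

    // Illustrative DeployProcessed handler: pull the deploy and block hashes
    // out of the event and insert them into Postgres (table and column names
    // are examples, not the project's schema).
    @Component("DeployProcessed")
    public class DeployProcessedStorage implements StoreConsumer.StorageFactory {

        private final JdbcTemplate jdbc;
        private final ObjectMapper mapper = new ObjectMapper();

        public DeployProcessedStorage(JdbcTemplate jdbc) {
            this.jdbc = jdbc;
        }

        @Override
        public void store(String rawEvent) {
            try {
                JsonNode deploy = mapper.readTree(rawEvent).path("DeployProcessed");

                jdbc.update(
                    "INSERT INTO deploy (deploy_hash, block_hash) VALUES (?, ?)",
                    deploy.path("deploy_hash").asText(),
                    deploy.path("block_hash").asText());

                // transfers, withdrawals and bids sit inside the execution result
                // and are extracted and inserted in the same way
            } catch (Exception e) {
                throw new IllegalStateException("Could not store DeployProcessed event", e);
            }
        }
    }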

The above method handles the DeployProcessed event, extracting the deploy data and any transfers, withdrawals and bids.

The other storage factory classes cover blocks, eras and rewards.

An API has been added to view the live Postgres data here.

So where does that leave us? Let’s have a look at the current architectural diagram:

Everything in the circle is in the Kubernetes cluster and ready to be used.

The applications on the right outside the cluster are potential applications waiting to be developed.

The node events pass into the Kafka brokers, where they wait until the consumers read and process them.

We’re currently running one replica of the Producer in the cluster, connecting to one Casper node; this prevents event duplication. A single producer can connect to multiple nodes, and more producers can be added to connect to more nodes. This will be explained further in a later article.

At the moment, two replicas of each consumer group are running.

There is also an Audit API endpoint here.

The id store mongo db is used for DR: it stores the id of the last event processed, which sets the restart point when reconnecting.
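
A minimal sketch of the idea, assuming a Spring Data Mongo repository (names are illustrative): the producer saves the id of each event it has processed and reads it back on start-up to build the resume-from-event-id reconnection described earlier.

    import org.springframework.data.annotation.Id;
    import org.springframework.data.mongodb.core.mapping.Document;
    import org.springframework.data.mongodb.repository.MongoRepository;

    // Last processed event id per emitter, used to pick the restart point
    // when the producer reconnects after a failure.
    @Document("event_ids")
    class LastEventId {
        @Id
        String emitter;   // node URI
        long eventId;     // id of the last event processed from that node
    }

    interface LastEventIdRepository extends MongoRepository<LastEventId, String> {
        // findById(emitter) gives the restart point for that node
    }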

The next article will walk through the Kubernetes cluster dashboard and logs.

Previous articles are here:

Casper Kafka Event Store Pt 1

Casper Kafka Event Store Pt 2

Carl Norburn is a Software Engineer with Stormeye. He also knows a thing or two about DevOps. Public repos, twitter
