Sizing your Event Hub (Apache Kafka) Cluster on Oracle Cloud

Kunal Rupani
Oracle Developers · Nov 17, 2017 · 4 min read

The intention of this blog post is to provide a high-level guideline on how to size your Event Hub Cloud Service cluster on Oracle Cloud. The exact cluster size and configuration depend greatly on your use case and requirements, so a post on this topic (pun intended) can only aim to provide a general direction for the decision-making process.

Sizing a cluster is explained in this post across two dimensions:

  • Sizing for Throughput
  • Sizing for Storage

Sizing for Throughput

Producer/Consumer throughput

Before trying to size your Event Hub cluster, the first question to ask is the expected throughput of your Producer(s) and Consumer(s). Your system will only be as fast as its weakest link.

At what rate will the producers be producing? What is the size of the produced messages? How many consumers will be consuming from the Topics/Partitions, and at what rate can they process each message given its size?
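To get a feel for your weakest link, a quick probe of raw producer throughput helps. Below is a minimal sketch using the native Kafka Java producer; the broker address, topic name, message size, and message count are illustrative assumptions, not values from this post.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProducerThroughputProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:6667"); // hypothetical broker address
        props.put("acks", "1");
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        int messageSize = 1024;       // assumed 1 KB messages
        int messageCount = 100_000;   // assumed sample size
        byte[] payload = new byte[messageSize];

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < messageCount; i++) {
                producer.send(new ProducerRecord<>("sizing-test", payload)); // hypothetical topic
            }
            producer.flush(); // wait until all buffered records are sent
            double seconds = (System.currentTimeMillis() - start) / 1000.0;
            double mbPerSec = messageCount * (double) messageSize / (1024 * 1024) / seconds;
            System.out.printf("Producer throughput: ~%.1f MB/s%n", mbPerSec);
        }
    }
}
```

Running the same kind of probe from a consumer gives you the other half of the picture, and the smaller of the two numbers is where sizing should start.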

Number of Brokers (and Zookeepers)

Increasing the number of brokers and configuring replication across them is a mechanism to achieve not just parallelism and higher throughput, but also high availability. HA may not be a factor when running Dev and Test workloads on Event Hub, in which case multiple brokers may not be needed. For production, however, it is strongly recommended to deploy multiple brokers (3+).
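For illustration, here is a minimal sketch of creating a replicated topic with the Kafka AdminClient; the broker address, topic name, and partition count are assumptions, with a replication factor of 3 in line with the 3+ broker recommendation above.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:6667"); // hypothetical broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 10 partitions, replication factor 3: each partition is copied to 3 brokers,
            // so the topic survives the loss of up to 2 of them.
            NewTopic topic = new NewTopic("orders", 10, (short) 3); // hypothetical topic name
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```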

ZooKeeper plays a critical role in broker cluster management: keeping track of which brokers leave the cluster and which new ones join, leader election, and configuration management. This makes it necessary to deploy ZooKeeper in an HA cluster as well.

The recommendation for sizing your ZooKeeper ensemble is to use 1 instance for Dev/Test environments, 3 instances to tolerate 1 node failure, and 5 instances to tolerate 2 node failures.

CPU Shape of your Broker(s)

As of writing this post, the available CPU shapes for the Event Hub Cloud Service on Oracle Cloud are:

  • OC1m: 1 OCPU, 15 GB RAM
  • OC2m: 2 OCPUs, 30 GB RAM
  • OC3m: 4 OCPUs, 60 GB RAM
  • OC4m: 8 OCPUs, 120 GB RAM

An OCPU provides CPU capacity equivalent to one physical core of an Intel Xeon processor with hyper-threading enabled. Each OCPU corresponds to two hardware execution threads, known as vCPUs.

This is a sample benchmarked throughput:

Configuration:

  • Producers/Consumers: 10/10
  • Topics: 10
  • Partitions: 10 per Topic
  • Replication factor: 3
  • Broker Nodes: 5 (OC3m)
  • Zookeepers: 3 (OC1m)
  • No compression
  • acks=1

Note: This throughput is achieved via the native Kafka APIs. REST API throughput is known to be much lower, at around 1/4 of the native Kafka API throughput. The test was run from Producers and Consumers in VMs on Oracle Cloud, with minimal latency between VMs.
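On the consumer side, a similar probe with the native Kafka consumer API shows how fast a single consumer can drain its partitions. This is only a sketch; the broker address, group id, topic name, and one-minute sampling window are assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ConsumerThroughputProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:6667");  // hypothetical broker address
        props.put("group.id", "sizing-probe");           // hypothetical consumer group
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        long bytesRead = 0;
        long sampleMs = 60_000; // assumed one-minute sampling window
        long start = System.currentTimeMillis();

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("sizing-test")); // hypothetical topic
            while (System.currentTimeMillis() - start < sampleMs) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    bytesRead += record.value().length;
                }
            }
        }
        double mbPerSec = bytesRead / (1024.0 * 1024.0) / (sampleMs / 1000.0);
        System.out.printf("Consumer throughput: ~%.1f MB/s%n", mbPerSec);
    }
}
```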

Sizing for Topics and Partitions

The choice of the number of partitions depends on the desired throughput and the degree of parallelism that your Producer/Consumer ecosystem can support. Generally speaking, increasing the number of partitions on a given topic increases your throughput roughly linearly; however, the bottleneck could end up being the rate at which your Producers can produce or the rate at which your Consumers can consume.

The simple math used by Kafka users and published in a few Kafka blogs is as follows:

Let's say the desired throughput is "t", the maximum producer throughput on a single partition is "p", and the maximum consumer throughput on a single partition is "c".

# of Partitions = max(t/p, t/c)

A rule of thumb often used is to have at least as many partitions as the number of consumers in the largest consumer group.
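As a worked example of that formula and rule of thumb, here is a small sketch; all input numbers are assumed for illustration and are not figures from the benchmark above.

```java
public class PartitionCountEstimate {
    public static void main(String[] args) {
        double t = 100.0; // desired topic throughput, MB/s (assumed)
        double p = 20.0;  // max per-partition producer throughput, MB/s (assumed)
        double c = 25.0;  // max per-partition consumer throughput, MB/s (assumed)
        int consumersInLargestGroup = 4; // assumed

        // # of Partitions = max(t/p, t/c), rounded up to a whole partition
        int fromThroughput = (int) Math.ceil(Math.max(t / p, t / c));

        // Rule of thumb: at least as many partitions as consumers in the largest group
        int partitions = Math.max(fromThroughput, consumersInLargestGroup);

        System.out.println("Suggested partitions: " + partitions); // prints 5 for these inputs
    }
}
```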

Sizing for Storage

Factors to consider when sizing your storage on a Broker:

  1. # Topics, # Partitions per topic
  2. Desired Replication factor
  3. Message Sizes
  4. Retention period
  5. Rate at which messages are expected to be published to and consumed from the Event Hub Broker

Some paper-napkin math with the above factors gives you a rough estimate of the required storage per broker in your Event Hub cluster, as sketched below.
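Here is one version of that napkin math; every input is an assumed value to replace with your own, and the message size and publish rate are folded together into an average ingress rate in MB/s.

```java
public class StorageNapkinMath {
    public static void main(String[] args) {
        // Every input below is an assumed value; plug in your own numbers.
        double ingressMBPerSec = 1.0;  // average publish rate across all topics
        int replicationFactor = 3;     // each message is stored on 3 brokers
        int retentionDays = 7;         // how long messages are kept before deletion
        int brokerCount = 5;           // brokers sharing the data
        double headroom = 1.3;         // ~30% extra for indexes, bursts, uneven partition spread

        double secondsRetained = retentionDays * 24 * 60 * 60;
        double totalGB = ingressMBPerSec * secondsRetained * replicationFactor / 1024.0;
        double perBrokerGB = totalGB * headroom / brokerCount;

        System.out.printf("Total retained data across the cluster: ~%.0f GB%n", totalGB);
        System.out.printf("Storage to provision per broker:        ~%.0f GB%n", perBrokerGB);
    }
}
```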

Summary

In summary, this post provides rough guidelines on how to approach sizing your Event Hub cluster.

Monitoring your cluster's CPU, memory, and storage, with alerting thresholds, lets you start small and learn over time how your cluster is performing and trending. Expect to fine-tune your cluster as your needs and your system grow. Managed scaling is part of the value that Event Hub Cloud Service (managed Kafka) provides.
