Kinesis vs. Kafka

Henadz Varantsou
Published in Flo Health UK
Aug 4, 2020 · 7 min read

Backstory

Flo needs to understand how users interact with the app and which features they use most frequently. Based on this information, we decide how to improve our product. We gather this information from different parts of our system in the form of analytic events, which are stored in a message broker while they wait to be processed. For the past two years, we’ve used AWS Kinesis as our internal message broker. However, the more we’ve worked with it, the more pitfalls we’ve found. At some point, we realized that we would like to consider alternative message brokers without the drawbacks of Kinesis. The most promising replacement candidate for us was Kafka. During our investigation, one question arose: is Kafka actually better than Kinesis from a latency/throughput perspective? We decided to find out through benchmarks.

Benchmark framework

We developed a small benchmark framework based on the Akka Streams library. Why Akka Streams? For two reasons. First, we were already using it in our backend services. Second, there are Akka Streams integrations for both Kinesis and Kafka (the Alpakka libraries). The library versions are as follows (a sketch of the corresponding sbt dependencies follows the list):

  • akka-streams — 2.5.31
  • akka-stream-kafka — 2.0.0
  • akka-stream-alpakka-kinesis — 2.0.0
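
For reference, a minimal sketch of the corresponding sbt dependencies (the group/artifact coordinates are the standard ones for these libraries):

// build.sbt (sketch): the three streaming libraries used by the benchmark framework
libraryDependencies ++= Seq(
  "com.typesafe.akka"  %% "akka-stream"                 % "2.5.31",
  "com.typesafe.akka"  %% "akka-stream-kafka"           % "2.0.0",
  "com.lightbend.akka" %% "akka-stream-alpakka-kinesis" % "2.0.0"
)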

How do we simulate a realistic event stream? Instead of generating random synthetic events, we took 100,000 real events in JSON format from our production environment. Each event producer (Kinesis or Kafka) samples with replacement from this 100,000-event pool, which gives us a realistic, infinite event stream.

Each event is marked with a timestamp when it’s selected from the event pool, the so-called ‘created_at’ field. We can interpret this timestamp as the moment of event creation. Both Kinesis and Kafka attach an internal ‘stored_at’ timestamp, which indicates the moment when the event was successfully stored; we use it as the event storing time. Finally, when we consume an event, we take the current timestamp, which becomes the event receiving time. In addition, we record the size in bytes of each event. All of this data is used to calculate the latency (write, read, and total) and throughput (write and read) metrics.
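
To make this concrete, here is a minimal sketch of the sampling and timestamping logic (the class and field names are illustrative, not the actual framework code):

import scala.util.Random

// One benchmarked event: the raw JSON payload plus the timestamps described above.
// 'createdAt' corresponds to created_at, 'storedAt' to the broker-side stored_at,
// and 'receivedAt' is taken on the consumer side.
final case class BenchmarkEvent(
  payload: String,         // real production event in JSON format
  createdAt: Long,         // set when the event is drawn from the pool
  storedAt: Long = 0L,     // filled in from the broker after a successful write
  receivedAt: Long = 0L    // set when the consumer receives the event
) {
  def sizeBytes: Int = payload.getBytes("UTF-8").length
}

// Sampling with replacement from the 100,000-event pool gives an infinite event stream.
def eventSource(pool: IndexedSeq[String]): Iterator[BenchmarkEvent] =
  Iterator.continually(
    BenchmarkEvent(
      payload   = pool(Random.nextInt(pool.size)),
      createdAt = System.currentTimeMillis()
    )
  )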

A short remark about the Kafka cluster and topic configuration. Since we are hosted on AWS, instead of deploying our own Kafka cluster from scratch we decided to use AWS MSK (a Kafka cluster managed by Amazon). We used the following settings for the Kafka topics used in benchmarking (a client-side sketch follows the list):

  • SSL is on
  • Replication factor is set to 3 (emulating reliability)
  • Acknowledgement during writing to Kafka is set to “all” (the broker waits until the data is written to all in-sync replicas). This also contributes to reliability and fault tolerance.
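
As a rough client-side sketch of this setup (the topic name and broker addresses are placeholders, and the MSK truststore details are omitted):

import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

// Create the benchmark topic with replication factor 3 over an SSL listener.
val adminProps = new Properties()
adminProps.put("bootstrap.servers", "b-1.msk.example:9094,b-2.msk.example:9094,b-3.msk.example:9094")
adminProps.put("security.protocol", "SSL")
val admin = AdminClient.create(adminProps)
admin.createTopics(Collections.singleton(new NewTopic("benchmark-events", 9, 3.toShort))) // 9 or 4 partitions
admin.close()

// On the producer side, acks=all makes the broker wait until all in-sync replicas have the record.
val producerProps = new Properties()
producerProps.put("acks", "all")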

Calculating metrics

Let’s first introduce some notations:

tc_n – timestamp when the n-th event was created (milliseconds)

ts_n – timestamp when the n-th event was stored (milliseconds)

tr_n – timestamp when the n-th event was received (milliseconds)

s_n – n-th event size (bytes)

M – sliding window size, used for read and write throughput calculation. The bigger the value we use for this parameter, the smoother the throughput metric will be. For this benchmark, we set this parameter to 10,000.

Now we can describe which metrics we’re going to calculate and how.

Write latency (milliseconds): write_latency_n = ts_n - tc_n

Read latency (milliseconds): read_latency_n = tr_n - ts_n

Total latency (milliseconds): total_latency_n = tr_n - tc_n

Write throughput (MB per second): write_throughput_n = (s_(n-M+1) + … + s_n) / (ts_n - ts_(n-M+1)), converted from bytes per millisecond to MB per second

Read throughput (MB per second): read_throughput_n = (s_(n-M+1) + … + s_n) / (tr_n - tr_(n-M+1)), converted from bytes per millisecond to MB per second
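
A minimal sketch of how these formulas translate into code, assuming each consumed event already carries the three timestamps and its size (the names are illustrative, not the benchmark framework itself):

// tc, ts, tr are in milliseconds, sizeBytes in bytes; events are processed in consumption order.
final case class Measured(tc: Long, ts: Long, tr: Long, sizeBytes: Long)

def writeLatency(e: Measured): Long = e.ts - e.tc // ts_n - tc_n
def readLatency(e: Measured): Long  = e.tr - e.ts // tr_n - ts_n
def totalLatency(e: Measured): Long = e.tr - e.tc // tr_n - tc_n

// Sliding-window throughput over the last M events: bytes in the window divided by the
// time the window spans, converted from bytes per millisecond to MB per second.
def throughputMBps(window: Seq[Measured], timestamp: Measured => Long): Double = {
  val bytes  = window.map(_.sizeBytes).sum.toDouble
  val spanMs = math.max(timestamp(window.last) - timestamp(window.head), 1L).toDouble
  bytes / spanMs * 1000.0 / (1024.0 * 1024.0)
}

// Write throughput uses the stored_at timestamps, read throughput the receiving timestamps.
def writeThroughput(window: Seq[Measured]): Double = throughputMBps(window, _.ts)
def readThroughput(window: Seq[Measured]): Double  = throughputMBps(window, _.tr)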

Benchmarking strategy

We developed 3 main test cases:

  • Default config. We just take the consumers and producers for both message brokers and use them without any specific setup (all parameters are left at their defaults). This case simulates a developer using the libraries as-is, without any tuning.
  • Latency first. We will try to configure the parameters of both message brokers to improve latency.
  • Throughput first. We will try to configure the parameters of both message brokers to improve throughput.

In order to adjust between the “Latency first” and “Throughput first” cases, we will use the following settings (configuration sketches for both brokers follow these lists).

Kafka producer:

  • batch.size (default is 16384) — The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This configuration controls the default batch size in bytes. Increasing this setting improves throughput at the cost of higher latency.
  • linger.ms (default is 0) — The maximum time the producer will buffer data before sending a batch. For example, instead of sending immediately, you can set linger.ms to 5 and send more messages in one batch. Increasing this setting improves throughput at the cost of higher latency.

Kafka consumer:

  • fetch.min.bytes (default is 1) — The minimum amount of data that the server should return for a fetch request. If there is not enough data, the request will wait for that much data to accumulate before answering. Increasing this setting improves throughput at the cost of higher latency.
  • fetch.max.wait.ms (default is 500) — The maximum amount of time the server will block before answering the fetch request if there isn’t sufficient data to immediately satisfy the requirement given by fetch.min.bytes. Increasing this setting improves throughput at the cost of higher latency.
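
For Kafka, these knobs map directly onto Alpakka’s ProducerSettings and ConsumerSettings. A sketch with the default-config values (String keys and values, placeholder bootstrap servers):

import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, ProducerSettings}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

implicit val system: ActorSystem = ActorSystem("benchmark")

// Producer: batch.size and linger.ms control the latency/throughput trade-off on the write side.
val producerSettings =
  ProducerSettings(system, new StringSerializer, new StringSerializer)
    .withBootstrapServers("b-1.msk.example:9094") // placeholder
    .withProperty("acks", "all")
    .withProperty("batch.size", "16384")
    .withProperty("linger.ms", "0")

// Consumer: fetch.min.bytes and fetch.max.wait.ms control the same trade-off on the read side.
val consumerSettings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("b-1.msk.example:9094") // placeholder
    .withGroupId("benchmark-consumer")
    .withProperty("fetch.min.bytes", "1")
    .withProperty("fetch.max.wait.ms", "500")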

Kinesis producer:

  • putRecords() maxBatchSize (default is 500, maximum is 500) — The number of records that will be sent to Kinesis as one batch. Decreasing this setting improves latency at the cost of lower throughput.

Kinesis consumer:

  • getRecords() maxBatchSize (default is 10,000, maximum is 10,000) — The number of records that will be read from Kinesis as one batch. Decreasing this setting improves latency at the cost of lower throughput.
  • idleTimeBetweenReadsInMillis (default is 1000) — The delay in milliseconds between getRecords() calls. Decreasing this setting improves latency at the cost of lower throughput.
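
On the Kinesis side the equivalent knobs live in different places: the producer batch size is simply how many records we group into one putRecords() call, while the consumer settings belong to the KCL polling configuration used under the hood. A rough sketch, assuming KCL 2.x classes (the stream name is a placeholder):

import software.amazon.awssdk.services.kinesis.KinesisAsyncClient
import software.amazon.kinesis.retrieval.polling.PollingConfig

// Producer side: 500 is both the default batch size and the hard limit of the putRecords() API.
val putRecordsMaxBatchSize = 500

// Consumer side: maxRecords corresponds to the getRecords() maxBatchSize, and
// idleTimeBetweenReadsInMillis is the pause between consecutive getRecords() calls.
val kinesisClient = KinesisAsyncClient.create()
val pollingConfig =
  new PollingConfig("benchmark-stream", kinesisClient) // placeholder stream name
    .maxRecords(10000)
    .idleTimeBetweenReadsInMillis(1000L)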

Based on our Kafka cluster cost, we derived Kinesis configurations that cost roughly the same: either 9 shards with a 24-hour retention policy or 4 shards with a 7-day retention policy. Therefore, we are going to use two configurations of the Kinesis stream: 4 shards and 9 shards. To equalize the level of parallelization, we’ll also use two configurations for the Kafka topic: 4 partitions and 9 partitions. Putting it all together, we have 12 test cases in total (3 tuning cases × 2 brokers × 2 parallelization levels). In each case, we gather metrics for 500,000 events.

Default config case

Kafka producer configuration:

  • batch.size — 16384
  • linger.ms — 0

Kafka consumer configuration:

  • fetch.min.bytes — 1
  • fetch.max.wait.ms — 500

Kinesis producer configuration:

  • putRecords() maxBatchSize — 500

Kinesis consumer configuration:

  • getRecords() maxBatchSize — 10000
  • idleTimeBetweenReadsInMillis — 1000

Kinesis 9 shards stream

Detailed results (benchmark plots)

Summary for 9-shards Kinesis stream

Kinesis 4 shards stream

Detailed results (benchmark plots)

Summary for 4-shards Kinesis stream

Kafka 9 partitions topic

Detailed results (benchmark plots)

Summary for 9-partitions Kafka stream

Kafka 4 partitions topic

Detailed results (benchmark plots)

Summary for 4-partitions Kafka stream

Default config case comparison

Summary for default config case

Latency first case

With Kafka, the default settings are already latency-oriented. As such, we weren’t able to improve latency significantly.

Kafka producer configuration:

  • batch.size — 8192
  • linger.ms — 0

Kafka consumer configuration:

  • fetch.min.bytes — 1
  • fetch.max.wait.ms — 100

Kinesis producer configuration:

  • putRecords() maxBatchSize — 1

Kinesis consumer configuration:

  • getRecords() maxBatchSize — 1000
  • idleTimeBetweenReadsInMillis — 1

Kinesis 9 shards stream

Detailed results (benchmark plots)

Summary for 9-shards Kinesis stream

Kinesis 4 shards stream

Detailed results (benchmark plots)

Summary for 4-shards Kinesis stream

Kafka 9 partitions topic

Detailed results (benchmark plots)

Summary for 9-partitions Kafka stream

Kafka 4 partitions topic

Detailed results (benchmark plots)

Summary for 4-partitions Kafka stream

Latency first case comparison

Summary for latency first case

Throughput first case

Kafka producer configuration:

  • batch.size — 262144
  • linger.ms — 5000

Kafka consumer configuration:

  • fetch.min.bytes — 262144
  • fetch.max.wait.ms — 5000

Kinesis producer configuration:

  • putRecords() maxBatchSize — 500

Kinesis consumer configuration:

  • getRecords() maxBatchSize — 10000
  • idleTimeBetweenReadsInMillis — 2000

Kinesis 9 shards stream

Detailed results (benchmark plots)

Summary for 9-shards Kinesis stream

Kinesis 4 shards stream

Detailed results (benchmark plots)

Summary for 4-shards Kinesis stream

Kafka 9 partitions topic

Detailed results (benchmark plots)

Summary for 9-partitions Kafka stream

Kafka 4 partitions topic

Detailed results (benchmark plots)

Summary for 4-partitions Kafka stream

Throughput first case comparison

Summary for throughput first case

Final results

Kafka beats Kinesis in every test case and on every metric. Kafka is also more flexible when it comes to trading off latency against throughput. By contrast, almost the only way to tune latency and throughput for Kinesis is to change the shard count, which is quite expensive. So the winner is Kafka.
