31 Mar 2016
Note: This post is open to suggestions that could help make these benchmarks fairer. It is a versioned post, and I will increment its version whenever anything related to these benchmarks changes. Changes can be tracked here as well.
In my previous post, I wrote about how we can interface Jython with Kafka 0.8.x and use Java consumer clients directly from Python code. As a follow-up, I got curious about the performance difference between Java, Jython and Python clients. In this post, I am publishing some of the benchmarks I have run with Java, Jython and Python producers.
Acknowledgement: I was further inspired by Jay Kreps’s amazing blog post on Kafka 0.8.1 benchmarks, and much of the context for my own benchmarks is borrowed from it. It is a good idea to read his post before going through the rest of this one.
All of these benchmarks were run on AWS EC2 instances with the following specifications:
- m4.large, shared tenancy Kafka broker servers forming a cluster, each with 128 GiB of EBS Magnetic storage.
- m4.xlarge, shared tenancy Zookeeper server for the whole Kafka cluster, with 128 GiB of EBS Magnetic storage.
I am not aware of any reason why a Zookeeper cluster would perform significantly better than a single Zookeeper server, particularly for this configuration, so I went with a single server. For these benchmarks, the Zookeeper server was also used to run the producers for each language.
I kept the Kafka and Zookeeper server configurations mostly at the defaults defined in
zookeeper.properties respectively. In particular, none of the server configurations were tweaked for these benchmarks.
You can see the server configurations here.
For each of these benchmarks, the message size was kept small at 100 bytes, for exactly the same reason Jay Kreps cites: small messages are the harder case for a messaging system, because per-message overhead dominates.
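To make the reported metrics concrete, here is a minimal sketch of how a single-threaded producer benchmark might turn timings into records/sec, MB/sec and latency figures for fixed 100-byte messages. It is not the harness used for these benchmarks: `run_benchmark` and the `send` callable are hypothetical, and treating MB as 1024 × 1024 bytes is an assumption. A real run would replace the stub with a call into the producer client under test.

```python
import time
from typing import Callable, Dict

MESSAGE_SIZE = 100            # bytes per record, matching these benchmarks
BYTES_PER_MB = 1024 * 1024    # assumption: MB/sec reported in binary megabytes


def run_benchmark(send: Callable[[bytes], None], num_records: int) -> Dict[str, float]:
    """Push num_records fixed-size payloads through `send` and summarize timings.

    `send` is a hypothetical stand-in for one producer call. Latency here is
    simply how long each call blocks, so its meaning depends on the client.
    """
    if num_records <= 0:
        raise ValueError("num_records must be positive")
    payload = b"x" * MESSAGE_SIZE
    latencies_ms = []
    start = time.perf_counter()
    for _ in range(num_records):
        t0 = time.perf_counter()
        send(payload)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    # Guard against a zero timer delta on extremely fast stub senders.
    records_per_sec = num_records / elapsed if elapsed > 0 else float("inf")
    return {
        "records": float(num_records),
        "records_per_sec": records_per_sec,
        "mb_per_sec": records_per_sec * MESSAGE_SIZE / BYTES_PER_MB,
        "avg_latency_ms": sum(latencies_ms) / num_records,
        "max_latency_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    buffered = []
    # A list's append method acts as a do-nothing "producer" for a dry run.
    print(run_benchmark(buffered.append, 10000))
```

With a real client, `send` would wrap the client's send call; whether that call returns immediately or blocks until the broker acknowledges determines what the latency numbers actually measure.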
The following build versions of the tools were used to run these benchmarks:
- Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
- Python futures module:
I divided the benchmarks into three cases, each recording results for all the runs. Before going into the results, let's clarify the following terms:
- Case: A case is the configuration used for running the benchmarks. For example, producing on a topic with no replication is a case.
- Run: The choice of language for running a benchmark is a run. For example, a Java producer benchmark is a run.
- Pure Java: Java code using the official Apache Kafka producer API. See implementation.
- Java interfaced Jython: Jython code is only used to call the producer code; the actual implementation is still in Java. See implementation.
- Pure Jython: Where possible, everything is written in Python using the Python standard library, except the actual producer implementation, which is imported from the official Kafka libraries. See implementation.
- Pure Python: Pure Python code that uses the kafka-python client for Kafka. See implementation.
These benchmarks were run for the following cases:
- Single producer (single thread) producing on a topic with no replication and 6 partitions.
- Single producer (single thread) producing asynchronously on a topic with replication factor of 3 and 6 partitions.
- Single producer (single thread) producing synchronously on a topic with replication factor of 3 and 6 partitions.
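The three cases differ only in the topic's replication factor and in the producer's acknowledgment mode. As a minimal sketch, assuming the topics were created with the Zookeeper-based `kafka-topics.sh` tool of this Kafka era, the topic side of each case could be encoded like this. The topic names, the `BenchmarkCase` type, `create_topic_command` and the `zk-host:2181` address are hypothetical placeholders, not taken from the benchmark code. Note that a replication factor of 3 requires at least three brokers in the cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BenchmarkCase:
    topic: str               # hypothetical topic name for this case
    replication_factor: int  # 1 means no replication
    partitions: int


CASES = [
    BenchmarkCase("bench-no-replication", 1, 6),
    BenchmarkCase("bench-replicated-async", 3, 6),
    BenchmarkCase("bench-replicated-sync", 3, 6),
]


def create_topic_command(case: BenchmarkCase, zookeeper: str) -> List[str]:
    """Build the argv for creating the case's topic with kafka-topics.sh."""
    return [
        "bin/kafka-topics.sh", "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", str(case.replication_factor),
        "--partitions", str(case.partitions),
        "--topic", case.topic,
    ]


if __name__ == "__main__":
    for case in CASES:
        print(" ".join(create_topic_command(case, "zk-host:2181")))
```

Returning an argv list rather than a shell string keeps the sketch easy to pass to `subprocess.run` without quoting issues.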
In this context, async mode means that the producer's `acks` setting (the number of acknowledgments the producer requires the leader to have received before considering a request complete) is 0, so the producer does not wait for any acknowledgment. Sync mode is where the leader waits for all of its followers to acknowledge a write before acknowledging it to the producer.
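For the Java producer API, the async/sync distinction described above comes down to the `acks` producer property. The sketch below builds producer settings for each mode as a plain Python dict, assuming async maps to `acks=0` and sync to `acks=all` per that description. `producer_properties`, `ACKS_BY_MODE` and the `broker1:9092` address are hypothetical; `bootstrap.servers`, `acks`, `key.serializer`, `value.serializer` and the `ByteArraySerializer` class are standard Kafka producer settings.

```python
from typing import Dict

# Hypothetical mapping from the benchmark's mode names to Kafka `acks` values:
#   "0"   -> the producer does not wait for any acknowledgment (async here)
#   "all" -> the leader waits for all in-sync replicas to acknowledge (sync here)
ACKS_BY_MODE = {"async": "0", "sync": "all"}

BYTES_SERIALIZER = "org.apache.kafka.common.serialization.ByteArraySerializer"


def producer_properties(mode: str, bootstrap_servers: str) -> Dict[str, str]:
    """Return Kafka producer settings for the given benchmark mode."""
    if mode not in ACKS_BY_MODE:
        raise ValueError("mode must be 'async' or 'sync', got %r" % mode)
    return {
        "bootstrap.servers": bootstrap_servers,
        "acks": ACKS_BY_MODE[mode],
        "key.serializer": BYTES_SERIALIZER,
        "value.serializer": BYTES_SERIALIZER,
    }
```

From Jython, one could copy these entries into a `java.util.Properties` and pass it to `org.apache.kafka.clients.producer.KafkaProducer`. Keep in mind that with `acks=0` the client generally learns of no broker-side failures, so higher throughput comes at the cost of delivery guarantees.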
Single producer producing on a topic with no replication and 6 partitions.
Results for this case are tabulated below. I was shocked to see such a stark difference between the kafka-python and official Java producer clients. As expected, Java and Java interfaced Jython producers showed very similar results.
Single producer producing asynchronously on a topic with replication factor of 3 and 6 partitions.
Not surprisingly, Java and Java interfaced Jython producers fared similarly here as well. It is also interesting that the results for Pure Jython and Pure Python producers are not much different from their respective runs in the previous case. My assumption is that in this case they are mostly limited by the speed of code execution rather than by producing to a topic with 3x replication. What surprises me is that the Java producer in this case produced half as many records/sec as in the previous case, even though it ran in async mode. This is in contrast with the results for Kafka 0.8.1 in Jay Kreps’s post.
Single producer producing synchronously on a topic with replication factor of 3 and 6 partitions.
For this case, I increased the
64000 for each run to accommodate sync mode.
Again, Pure Python and Pure Jython producers showed similar throughput, although max latencies are slightly higher than in the previous run. It seems that throughput in this case is mostly limited by code execution or by the kafka-python producer client implementation. Throughput for Pure Java and Java interfaced Jython has almost halved compared to the previous case, which makes sense since these producers ran in sync mode, where the leader waits for all followers to acknowledge each request.
You can find everything related to these benchmarks here. I would love to hear your feedback while I work on publishing consumer benchmark results in an upcoming post.
Originally published at mrafayaleem.com on March 31, 2016.