Apache Kafka: How to Test Performance for Clients Configured with SSL Encryption

Paul Robu
METRO SYSTEMS Romania
Nov 4, 2020

Apache Kafka can add business value because it is built for real-time data ingestion, data integration and messaging at scale. It handles very large amounts of data arriving at high velocity admirably, making it possible to achieve high throughput, high durability and high availability. This is why more and more of the products that we build for Metro have Apache Kafka and event streams at the heart of their architecture.

Depending on the business requirements, we have different use cases and different Kafka configuration settings to optimize. This article describes a method that helps make the right design choices, by showing how to test and measure the performance of an Apache Kafka cluster configured with SSL encryption.

By default, data in Kafka is transmitted as plaintext. Encryption ensures that the data exchanged between clients and brokers remains confidential in transit:

Plaintext and SSL Encrypted Data Transfers in Apache Kafka

To access Kafka over SSL, both producers and consumers need to be configured with the security.protocol, a truststore that contains the brokers' certificate chain and, if the brokers require client authentication, a keystore. So, the first step is to create a config file (named ssl-perf-test.properties in this example) with settings like the ones below:

security.protocol=SSL
ssl.truststore.location=/path/to/your/kafka.client.truststore.jks
ssl.keystore.location=/path/to/your/kafka.client.keystore.jks
ssl.truststore.password=your_truststore_password
ssl.keystore.password=your_keystore_password
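The same settings are often needed by programmatic clients, which typically take them as a plain dictionary rather than a properties file. As a small sketch, a helper that loads such a Java-style properties file could look like this (the demo content is just the placeholder config from above):

```python
import tempfile

def load_properties(path):
    """Parse a Java-style .properties file (key=value pairs, '#' comments)."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Demo with settings like the ones above (paths are placeholders).
cfg = "\n".join([
    "security.protocol=SSL",
    "ssl.truststore.location=/path/to/your/kafka.client.truststore.jks",
    "# passwords omitted here",
])
with tempfile.NamedTemporaryFile("w", suffix=".properties", delete=False) as f:
    f.write(cfg)
props = load_properties(f.name)
print(props["security.protocol"])
```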

Now, there are specialized tools to test Kafka’s performance, such as the Pepper-Box JMeter plugin or OpenMessaging, but any Kafka installation comes out of the box with command-line tools designed to do just that. To learn more about them, run:

$ bin/kafka-producer-perf-test.sh --help
$ bin/kafka-consumer-perf-test.sh --help

As supplementary info, Kafka also has a built-in test framework named Trogdor, which can be used to perform fault injection or to execute workloads against a test system. The source repository also includes an extensive test suite for system integration and performance tests.

Test the Producer Performance

First let’s create a topic for the test data. The following example will result in a topic named ssl-perf-test with 16 partitions, a replication factor of 3 and a retention time of 24 hours. Replace broker0 as needed for the Kafka cluster in your environment:

$ bin/kafka-topics.sh \
--create \
--topic ssl-perf-test \
--partitions 16 \
--replication-factor 3 \
--config retention.ms=86400000 \
--config min.insync.replicas=2 \
--bootstrap-server broker0:9093 \
--command-config /path/to/ssl-perf-test.properties
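Before producing millions of records it is worth estimating how much disk the test will consume on the cluster. A back-of-the-envelope calculation for the workload used below (3 million records of 1 KiB each, replication factor 3), ignoring per-record broker overhead and compression:

```python
# Rough storage footprint of the producer test defined below:
# 3 million records of 1 KiB each, replicated 3 times.
num_records = 3_000_000
record_size = 1024        # bytes (1 KiB)
replication_factor = 3

total_bytes = num_records * record_size * replication_factor
print(f"~{total_bytes / 1024**3:.2f} GiB across the cluster")
```

With the 24-hour retention configured above, this data will also stay on disk for a full day unless the topic is deleted after the test.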

Then run the producer performance test script with different configuration settings. The following example uses the topic created above to store 3 million messages with a size of 1 KiB each. The -1 value for --throughput means that messages are produced as quickly as possible, with no throttling limit. Kafka producer configuration properties such as acks and bootstrap.servers are passed via the --producer-props argument, while the SSL config file we created earlier is referenced via the --producer.config argument:

$ bin/kafka-producer-perf-test.sh \
--topic ssl-perf-test \
--throughput -1 \
--num-records 3000000 \
--record-size 1024 \
--producer-props acks=all bootstrap.servers=broker0:9093,broker1:9093,broker2:9093 \
--producer.config /path/to/ssl-perf-test.properties

The output, which is pretty much self-explanatory, will look similar to this:

141634 records sent, 28326.8 records/sec (27.01 MB/sec), 629.6 ms avg latency, 1758.0 ms max latency.
179121 records sent, 35795.6 records/sec (34.14 MB/sec), 812.8 ms avg latency, 2772.0 ms max latency.
352998 records sent, 70500.9 records/sec (67.23 MB/sec), 532.0 ms avg latency, 2499.0 ms max latency.
453196 records sent, 90639.2 records/sec (86.44 MB/sec), 361.9 ms avg latency, 1240.0 ms max latency.
500870 records sent, 100174.0 records/sec (95.53 MB/sec), 328.4 ms avg latency, 1139.0 ms max latency.
259630 records sent, 51647.1 records/sec (49.25 MB/sec), 603.1 ms avg latency, 1814.0 ms max latency.
228931 records sent, 45504.1 records/sec (43.40 MB/sec), 612.3 ms avg latency, 2626.0 ms max latency.
178849 records sent, 35577.7 records/sec (33.93 MB/sec), 1082.5 ms avg latency, 4023.0 ms max latency.
198005 records sent, 39601.0 records/sec (37.77 MB/sec), 871.6 ms avg latency, 3182.0 ms max latency.
255892 records sent, 51076.2 records/sec (48.71 MB/sec), 643.8 ms avg latency, 1948.0 ms max latency.
230971 records sent, 46194.2 records/sec (44.05 MB/sec), 678.1 ms avg latency, 3144.0 ms max latency.
3000000 records sent, 52723.150735 records/sec (50.28 MB/sec), 586.06 ms avg latency, 4023.00 ms max latency, 370 ms 50th, 2032 ms 95th, 3089 ms 99th, 3707 ms 99.9th.

In this example, roughly 53k messages were produced per second on average, with a maximum latency of approximately 4 seconds.
The 95th percentile latency of 2032 ms means that for 95% of the messages (2.85 million out of 3 million) it took less than about two seconds from when they were produced until they were written to the filesystems of all in-sync brokers (since acks=all).
In the same way, the 3707 ms value for the 99.9th percentile means that 1 in 1000 messages experienced a delay of ~3.7 seconds from when it was produced until it was published and committed to the brokers’ log segments.

The command above will produce randomly generated byte records of the specified size. Another option is to read test data randomly from a file, by specifying the --payload-delimiter and --payload-file arguments. See the script’s help output for more details.
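A payload file with newline-delimited records (newline is the default --payload-delimiter) can be generated with a short script. A sketch, assuming JSON-like order events of roughly 1 KiB each; the field names are made up for illustration:

```python
import json
import random
import string
import tempfile

def random_event():
    """One synthetic order event, roughly 1 KiB once serialized."""
    return json.dumps({
        "order_id": random.randrange(10**9),
        "status": random.choice(["NEW", "PAID", "SHIPPED"]),
        "note": "".join(random.choices(string.ascii_letters, k=900)),
    })

# Write 1000 newline-delimited records to a temp file for --payload-file.
payload = tempfile.NamedTemporaryFile("w", suffix=".payload", delete=False)
with payload as f:
    for _ in range(1000):
        f.write(random_event() + "\n")

with open(payload.name) as f:
    lines = f.readlines()
print(f"{len(lines)} records written to {payload.name}")
```

Realistic payloads matter when compression is enabled, since random bytes compress far worse than real message bodies.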

Test the Consumer Performance

As an example, the following run of the consumer performance test script reads 3 million messages from our same topic ssl-perf-test, via the brokers’ SSL port 9093, with the configs from our earlier ssl-perf-test.properties file. Additional parameters can be specified; see the script’s help output for details.
For better output readability, use jq to transpose the rows and columns:

$ bin/kafka-consumer-perf-test.sh \
--topic ssl-perf-test \
--broker-list broker0:9093,broker1:9093,broker2:9093 \
--messages 3000000 \
--consumer.config /path/to/ssl-perf-test.properties | \
jq -R .|jq -sr 'map(./",")|transpose|map(join(": "))[]'
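If jq is not at hand, the same transposition can be done in a few lines of Python. A sketch, assuming the tool's CSV output of one header row followed by one value row (the sample below is shortened):

```python
import csv
import io

# Shortened sample of the two-line CSV printed by kafka-consumer-perf-test.sh.
raw = (
    "start.time, end.time, data.consumed.in.MB, MB.sec\n"
    "2020-10-26 09:21:39:216, 2020-10-26 09:22:00:973, 2907.0630, 133.6151\n"
)
header, values = list(csv.reader(io.StringIO(raw)))
report = [f"{k.strip()}: {v.strip()}" for k, v in zip(header, values)]
print("\n".join(report))
```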

The output (for Apache Kafka 2.6.0, the latest version at the time of this writing) shows the total amount of data consumed (in MB and in number of messages) and the throughput (MB.sec and nMsg.sec), which are the figures of interest. As can be inferred from the source code, it currently prints the fetch and rebalance fields as Unix epoch timestamps rather than durations. The expectation, according to this KIP, is to show the total rebalance time for the consumer group and the total fetching time for the group, excluding the rebalance time. For example:

start.time: 2020-10-26 09:21:39:216
end.time: 2020-10-26 09:22:00:973
data.consumed.in.MB: 2907.0630
MB.sec: 133.6151
data.consumed.in.nMsg: 3000270
nMsg.sec: 137899.0670
rebalance.time.ms: 1603704103558
fetch.time.ms: -1603704081801
fetch.MB.sec: -0.0000
fetch.nMsg.sec: -0.0019
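The last four fields above are clearly off: rebalance.time.ms holds an absolute epoch timestamp and fetch.time.ms is negative. At least in this run, the two misreported values still add up to the real elapsed time, so the total can be sanity-checked against start.time and end.time:

```python
from datetime import datetime

# start.time and end.time as printed by the tool.
fmt = "%Y-%m-%d %H:%M:%S:%f"
start = datetime.strptime("2020-10-26 09:21:39:216", fmt)
end = datetime.strptime("2020-10-26 09:22:00:973", fmt)
elapsed_ms = round((end - start).total_seconds() * 1000)

# The two misreported fields from the output above.
rebalance_time_ms = 1603704103558
fetch_time_ms = -1603704081801

print(elapsed_ms, rebalance_time_ms + fetch_time_ms)  # both are 21757
```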

Test the End-To-End Latency

End-to-end latency is the time between when a message is produced and when it is consumed. This is especially important for real-time applications.

If we look at the content of the kafka-consumer-perf-test.sh script, for example, we will see that it calls the kafka-run-class.sh script with the kafka.tools.ConsumerPerformance class as argument. In the same way, we can try running:

$ bin/kafka-run-class.sh kafka.tools.EndToEndLatency --help

which will show:

USAGE: java kafka.tools.EndToEndLatency$ broker_list topic num_messages producer_acks message_size_bytes [optional] properties_file

So, for example, to produce and consume 10k messages of 1 KiB each in our ssl-perf-test topic, with the acks value set to 1 (leader acknowledgment only) and encrypted data transfer via SSL port 9093, we run:

$ bin/kafka-run-class.sh kafka.tools.EndToEndLatency \
broker0:9093,broker1:9093,broker2:9093 \
ssl-perf-test 10000 1 1024 /path/to/ssl-perf-test.properties

which will produce an output similar to the following:

0 216.022156
1000 2.475187
2000 2.6165819999999997
3000 3.147
4000 2.4254409999999997
5000 2.512002
6000 1.867163
7000 2.291893
8000 2.103498
9000 2.441616
Avg latency: 2.8909 ms
Percentiles: 50th = 2, 99th = 13, 99.9th = 36
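When comparing several runs, the summary lines can be parsed programmatically instead of being read by eye. A sketch for the output format above:

```python
import re

# Summary lines as printed by kafka.tools.EndToEndLatency above.
summary = "Avg latency: 2.8909 ms\nPercentiles: 50th = 2, 99th = 13, 99.9th = 36"

avg_ms = float(re.search(r"Avg latency: ([\d.]+) ms", summary).group(1))
percentiles = {p: int(v) for p, v in re.findall(r"([\d.]+)th = (\d+)", summary)}
print(avg_ms, percentiles)
```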

Further Reading 📚

For better understanding of end-to-end latency versus producer and consumer latencies, a good read is https://www.confluent.io/blog/configure-kafka-to-minimize-latency/.
