How fast is Apache Pulsar’s client

Oct 23, 2023

Before we jump into the main topic, it’s worth giving a quick explanation of what Apache Pulsar is. In simple terms, it’s a message bus that encompasses functionalities of both Apache Kafka and RabbitMQ, and has a long list of useful features. If you want more details, there are many blog posts and videos on YouTube from StreamNative (the company that relates to Pulsar as Confluent does to Kafka). If you want to know who uses Pulsar, then check out the videos from various Pulsar Summits (one of them starts in just 3 days).

In recent months, as the maintainer of the .NET client for Apache Pulsar, I’ve come across two similar issues (one, two). Both hinted at dissatisfaction with its performance. The problem was: how could I verify these claims? The standard test, run against a local Pulsar cluster, didn’t show much difference from the official Java client, which was reported to be much faster. Because of this, I didn’t see a need to dig deeper.

But then an article appeared that claimed a big performance boost for .NET 8 (the only shame being that F# appeared exactly zero times in it). However, when I upgraded .NET in the test client project, I saw no performance improvement, which was a clear sign that the bottleneck was in the local Docker environment. That was the moment I decided to move forward and write a specialized Pulsar server, optimized to run at maximum speed purely for client testing. So, meet https://github.com/Lanayx/PulsarClientPerfTester ! Making it was not a big deal, since most of the code was taken from the Pulsar.Client implementation and reduced to support just the test scenario.

So, let’s move on to the test scenarios. There are two of them:

Produce data at maximum speed (default settings)
Consume data at maximum speed (default settings)

Why default settings? Because they are what is used the most, and they also show the quality of the defaults themselves. The only exception is the blockIfQueueFull setting, which should be set to true to avoid failures. As for the data itself, I took the example from one of the issues above: an array of 750 bytes, which is a fairly typical size in real-world use.
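To illustrate why blockIfQueueFull matters in a tight send loop, here is a small analogy in Python built on the standard library's bounded queue. It is not Pulsar.Client code: the queue size, the function names and the drain thread are all assumptions made for illustration. With a non-blocking put, a fast producer fails as soon as the pending queue fills up, while a blocking put simply waits for the slower side to free a slot.

```python
import queue
import threading

PENDING_LIMIT = 4  # hypothetical size of the pending-message queue


def send_without_blocking(n):
    """Analogue of blockIfQueueFull=false: stop at the first full queue."""
    pending = queue.Queue(maxsize=PENDING_LIMIT)
    sent = 0
    try:
        for i in range(n):
            pending.put_nowait(i)  # raises queue.Full when no slot is free
            sent += 1
    except queue.Full:
        pass  # a real client would surface this as a send failure
    return sent


def send_with_blocking(n):
    """Analogue of blockIfQueueFull=true: wait for a free slot instead."""
    pending = queue.Queue(maxsize=PENDING_LIMIT)
    delivered = []

    def drain():
        # Stand-in for the broker side taking messages off the queue.
        for _ in range(n):
            delivered.append(pending.get())

    worker = threading.Thread(target=drain)
    worker.start()
    for i in range(n):
        pending.put(i)  # blocks while the queue is full
    worker.join()
    return len(delivered)
```

In this toy model the non-blocking variant gets only as far as the queue limit when nothing drains the queue, which is the kind of failure the benchmark avoids by enabling the setting.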

The benchmark code for producer is very simple:

var n = 10000000;
var bytes = new byte[750];
for (var i = 0; i < n; i++)
{
    await producer.SendAndForgetAsync(bytes);
    if (i % 100000 == 0)
        Console.WriteLine($"Sent {i} messages");
}

Likewise for the consumer:

var n = 10000000;
for (var i = 0; i < n; i++)
{
    var message = await consumer.ReceiveAsync();
    await consumer.AcknowledgeAsync(message.MessageId);
    if (i % 100000 == 0)
        Console.WriteLine($"Received {i} messages");
}

With that, I established the baseline for Pulsar.Client on .NET 6:

# Pulsar.Client 2.13.2, .NET 6
Sent 10000000 messages in 52542ms. Speed: 190K msg/s
Received 10000000 messages in 53967ms. Speed: 185K msg/s

The results seemed satisfactory at first: almost 200K messages per second, which I presumed was decent performance. However, my perspective shifted when I put the Java client to the test.

# Java pulsar-client 3.1.0, OpenJDK 21
Sent 10000000 messages in 20766ms. Speed: 481K msg/s
Received 10000000 messages in 21129ms. Speed: 473K msg/s

That means the Java client was more than twice as fast! Let’s see what the upgrade from .NET 6 to .NET 8 gave me:

# Pulsar.Client 2.13.2, .NET 8 RC2
Sent 10000000 messages in 39390ms. Speed: 253K msg/s
Received 10000000 messages in 51287ms. Speed: 194K msg/s

That was an improvement for sure, but certainly not good enough. So I started exploring ways to improve performance and found that the key factor was the hash calculation for messages, namely CRC32C. Modern processors support this operation natively; however, the intrinsic for it is only supported starting from .NET 8, so the decision was obvious: the client should target .NET 8. This is a breaking change, so Pulsar.Client 3.0 (a new major version) will be released as soon as .NET 8 is officially released.
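For readers who haven't met CRC32C before, here is a minimal software sketch of the checksum in Python. It is a plain table-driven version written for illustration only, not the code used by Pulsar.Client or by any .NET intrinsic; the function and constant names are my own. It shows the per-byte work that a software implementation has to do for every message, which a native CPU instruction can do for several bytes at once.

```python
# Bit-reversed form of the Castagnoli polynomial 0x1EDC6F41.
CASTAGNOLI_REFLECTED = 0x82F63B78


def _build_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CASTAGNOLI_REFLECTED
            else:
                crc >>= 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Table-driven CRC32C, consuming the input one byte per step.

    Pass a previous result as `crc` to continue a running checksum.
    """
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    payload = bytes(750)  # same size as the benchmark message
    print(hex(crc32c(payload)))
```

A 750-byte payload costs 750 loop iterations per message here, so at hundreds of thousands of messages per second the checksum alone becomes a noticeable share of the work.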

Together with moving to a better hash calculation, I’ve made several other improvements without sacrificing readability: reduced the amount of byte array copying through heavy use of MemoryStream and RecyclableMemoryStream, changed several tuples and discriminated unions to structs, removed logging allocations in hot spots, and a few others. So, here are the final results:

# Pulsar.Client 3.0.0-beta.2, .NET 8 RC2
Sent 10000000 messages in 11580ms. Speed: 863K msg/s
Received 10000000 messages in 12096ms. Speed: 826K msg/s
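The "Speed" figures in the result blocks are just the message count divided by the elapsed time. The short Python sketch below redoes that arithmetic with the elapsed times copied from the logs above and derives the relative speedups. The helper names are mine, and the truncation to whole thousands is an assumption that happens to match every reported figure.

```python
MESSAGES = 10_000_000

# Elapsed milliseconds copied from the result blocks: (send, receive).
ELAPSED_MS = {
    "Pulsar.Client 2.13.2, .NET 6": (52542, 53967),
    "Java pulsar-client 3.1.0, OpenJDK 21": (20766, 21129),
    "Pulsar.Client 2.13.2, .NET 8 RC2": (39390, 51287),
    "Pulsar.Client 3.0.0-beta.2, .NET 8 RC2": (11580, 12096),
}


def k_msgs_per_sec(messages, elapsed_ms):
    """Thousands of messages per second, truncated to a whole number.

    Messages per millisecond is numerically the same as K messages per second.
    """
    return messages // elapsed_ms


def speedup(baseline_ms, improved_ms):
    """How many times faster the improved run is than the baseline."""
    return baseline_ms / improved_ms


if __name__ == "__main__":
    for name, (send_ms, receive_ms) in ELAPSED_MS.items():
        send = k_msgs_per_sec(MESSAGES, send_ms)
        receive = k_msgs_per_sec(MESSAGES, receive_ms)
        print(f"{name}: send {send}K msg/s, receive {receive}K msg/s")
```

On these numbers, the 3.0.0 beta sends roughly 4.5 times faster than the .NET 6 baseline and roughly 1.8 times faster than the Java client.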

I also wanted to compare the other official clients, but this turned out not to be an easy task.

The DotPulsar client doesn’t support batching yet, so I couldn’t use it for the producer test (it would not be a fair comparison), and it didn’t work for me even in the consumer test: for some reason it failed to connect to my perf test server.
The Go client is pretty good at consuming: it shows 690.49K msg/s, which is better than Java. However, I’m not very familiar with goroutines, so I couldn’t write exactly the same producer test code in Go; the best I reached was 81.96K msg/s.
The Node client refused to work for me if I tried to produce or consume more than 10K messages, and showed a speed of 50K msg/s on 10K messages.
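The fairness concern about batching can be made concrete with a toy model in Python. It only counts how many network frames a producer would need with and without grouping messages; the batch limit of 1000 used in the usage lines is an assumed value for illustration, not a setting read from any of the clients, and the function names are hypothetical.

```python
def batches(messages, max_batch_size):
    """Group messages into lists of at most max_batch_size items."""
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def frames_needed(message_count, max_batch_size):
    """Number of frames if up to max_batch_size messages share one frame."""
    full, remainder = divmod(message_count, max_batch_size)
    return full + (1 if remainder else 0)


if __name__ == "__main__":
    print(frames_needed(10_000_000, 1))     # no batching: one frame per message
    print(frames_needed(10_000_000, 1000))  # assumed batch limit of 1000
```

A client without batching pays the per-frame overhead for every single message, so comparing its producer throughput with batching clients would mostly measure that difference rather than the quality of the client.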

You can find all the examples and benchmark code in the repo and try reproducing the results yourself. The results above were obtained on my Lenovo laptop with 32GB of memory and an AMD Ryzen 5 processor. At the time of writing, it seems that Pulsar.Client gives the best client speed for Apache Pulsar.

Before wrapping up, I’d like to highlight several things:

I’m happy that the project is written in F#. Modeling message types as discriminated unions, internal workers (like the producer or consumer) as actors, and entities as records has greatly simplified the code. Together with the clever type system and low clutter, this made it easy to refactor code or add new features.
I’d like to say thank you to the F# and .NET communities on Telegram for their support and performance hints; hopefully that collaboration was useful for both sides.
Feel free to fix the issues mentioned above or add code for other clients to the PulsarClientPerfTester repository. The only requirement is that the code should use the same settings as the clients above.
I currently work full time and don’t use Pulsar at work, so I don’t actively add new features to the client in my free time. If you or your company are interested in new features, you can consider supporting such work through the GitHub sponsorship program.

That’s it. You can find me in the Pulsar Telegram channel to discuss related topics, or follow me on Twitter.

Written by Vladimir Shchur