Released: Stormpot 2.4

Version 2.4 of Stormpot, my Java object pooling library, has been released to Maven Central. You can add it as a Maven dependency to your projects like this:

<dependency>
  <groupId>com.github.chrisvest</groupId>
  <artifactId>stormpot</artifactId>
  <version>2.4</version>
</dependency>

This version is fully backwards compatible with the previous 2.x versions. It is a performance release that also adds a few features. You can find the documentation here: http://chrisvest.github.io/stormpot/

Performance

The key performance advancement in this release is that releasing objects back to the pool now uses lazySet, instead of the compareAndSet operation that was used previously. This chart shows how the performance of Stormpot has evolved from 2.0 to 2.4:

The benchmarks were run on a machine with 4 Xeon E5-4610 v2 CPUs running at 2.30GHz. Each CPU has 8 cores, and each core can support 2 hyper-threads, for a total of 64 logical cores.

The chart shows that the BlazePool implementation has improved with every version, with the greatest improvement in 2.4. Depending on the CPU, a claim+release cycle now takes only ~15–30 nanoseconds. The QueuePool has also improved, but not enough to stand out: all QueuePool versions lie on essentially the same line, just above the X-axis.
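As a back-of-the-envelope check, assume that every logical core completes one claim+release cycle in the ~15–30 ns quoted above, all in parallel and without interfering with each other. That is an idealization, since two hyper-threads share one physical core. Under that assumption, the aggregate throughput works out as:

```latex
\begin{aligned}
\text{logical cores} &= 4 \times 8 \times 2 = 64 \\
\text{throughput} &\approx \frac{64}{30\,\text{ns}} \approx 2.1 \times 10^{9}\ \text{cycles/s}
\quad\text{to}\quad \frac{64}{15\,\text{ns}} \approx 4.3 \times 10^{9}\ \text{cycles/s}
\end{aligned}
```

Real throughput is likely to fall short of the upper end because of the shared cores, but the lower end lines up with the figure of over 2 billion operations per second given for 2.4.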

The first version of BlazePool, introduced in 2.1, suffered from false sharing. This was fixed in 2.2, which explains the performance gain in that version. Performance did not improve noticeably in 2.3. In 2.4, making release use the lazySet API cut the number of lock cmpxchg instructions needed per claim+release cycle from 2 to 1. With this change, Stormpot can now perform over 2 billion claim+release operations per second, and scales linearly as long as the hardware has more oomph to give.
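To illustrate why release can get away with a cheaper operation than claim, here is a minimal sketch assuming each slot's state lives in a single AtomicInteger. The class SlotSketch and its two states are hypothetical; this is not Stormpot's actual slot code, which tracks more states. Claiming races with other threads and needs compareAndSet, while only the claiming thread ever releases, so an ordered store through lazySet is enough.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, heavily simplified slot; not Stormpot's real implementation.
final class SlotSketch {
    static final int FREE = 0;
    static final int CLAIMED = 1;

    private final AtomicInteger state = new AtomicInteger(FREE);

    // Several threads may race for the same slot, so claiming must be an
    // atomic read-modify-write. On x86 this is a lock cmpxchg instruction.
    boolean tryClaim() {
        return state.compareAndSet(FREE, CLAIMED);
    }

    // Only the thread holding the claim releases it, so there is no race to
    // resolve. lazySet is an ordered (release) store: writes made while the
    // slot was claimed are not reordered after it. On x86 HotSpot it is
    // typically a plain mov, with no lock prefix and no StoreLoad fence.
    void release() {
        state.lazySet(FREE);
    }

    boolean isClaimed() {
        return state.get() == CLAIMED;
    }
}
```

The trade-off is that the releasing thread does not wait for its store to become globally visible, so another thread may briefly still see the slot as claimed. In this sketch that only costs the other thread a failed attempt, not correctness.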

Since the introduction of BlazePool, Stormpot has been the fastest open source object pool implementation for Java. This chart shows how different object pool implementations stack up:

Notice that the Y-axis is logarithmic. If it weren't, only BlazePool and the ConcurrentBag from HikariCP would rise visibly above the X-axis.

It is clear that BlazePool is in a league of its own. The benchmark for the ConcurrentBag used in HikariCP is not representative of the performance of HikariCP itself, because the way the ConcurrentBag is integrated into the benchmark differs from how it is integrated into the HikariCP library. It is, however, indicative of the quality of the engineering that goes into that connection pool implementation. All other implementations that cross the 10M ops/sec threshold do so only with a single-threaded workload, and scale negatively from there. BlazePool and the ConcurrentBag from HikariCP are the only implementations that get faster as more threads are added.

Both of these benchmarks measure the throughput of the pools when they have more objects available than the benchmarks have worker threads. This means that, in principle, no thread ever has to wait for another thread to release an object before it can continue. When we limit the number of objects to, say, 10, and thereby introduce contention, an ideal pool implementation should stay at its 10-thread performance even as more threads are thrown at it. Specifically, it should not scale negatively and get slower. This chart shows how the different pool implementations, the same ones as above, behave in this situation:
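The shape of the contended experiment can be sketched roughly as follows. This is a hypothetical harness, not the actual benchmark code: an ArrayBlockingQueue stands in for "a pool of N objects" and is not one of the benchmarked implementations, and a real measurement would use a proper harness such as JMH.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a bounded queue plays the role of a pool of poolSize objects.
final class ContendedPoolSketch {
    // Starts `threads` workers that each do `cycles` claim+release round trips
    // against only `poolSize` objects. Returns the total number of cycles done.
    static long run(int poolSize, int threads, int cycles) {
        BlockingQueue<Object> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new Object());
        }
        LongAdder completed = new LongAdder();
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try {
                    for (int i = 0; i < cycles; i++) {
                        Object obj = pool.take(); // waits when every object is claimed
                        // ... work with obj would go here ...
                        pool.put(obj);            // never blocks: capacity == poolSize
                        completed.increment();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return completed.sum();
    }
}
```

Timing runs with poolSize fixed at 10 and an increasing thread count shows whether a pool holds steady or degrades once threads outnumber objects; the sketch only illustrates the setup and says nothing about any particular pool's results.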

Most implementations appear to either maintain their performance or drop only slightly. The ConcurrentBag in HikariCP, however, drops off quickly and by a large amount when it experiences contention. The QueuePool implementation gets into trouble at around 29 threads. I have not investigated the reason, but my guess is that it crosses a spinning threshold in the underlying LinkedTransferQueue, and starts to block the worker threads instead of letting them spin. Since this benchmark is all about claiming and releasing objects as fast as possible, spinning is much more likely to be a winning strategy than blocking. The BlazePool implementation also uses LinkedTransferQueue, but its thread-local cache allows threads to hog objects and drive up the cycle count. In other words, BlazePool performs so well in this benchmark because it is extremely unfair.
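Why a thread-local cache makes a pool unfair can be shown with a simplified, hypothetical design; CachingPoolSketch and its Slot class are made up for illustration and are not BlazePool's actual code. In this sketch every slot stays in a shared LinkedTransferQueue, except while it is claimed through the queue, and each thread remembers the last slot it took from the queue and tries that one first.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a thread-local cache in front of a shared queue.
final class CachingPoolSketch<T> {
    static final int FREE = 0, CLAIMED = 1;

    static final class Slot<V> {
        final V value;
        final AtomicInteger state = new AtomicInteger(FREE);
        boolean claimedViaQueue; // written by the claimer after its successful CAS

        Slot(V value) { this.value = value; }
    }

    private final LinkedTransferQueue<Slot<T>> shared = new LinkedTransferQueue<>();
    private final ThreadLocal<Slot<T>> cache = new ThreadLocal<>();

    CachingPoolSketch(Iterable<T> objects) {
        for (T obj : objects) shared.offer(new Slot<>(obj));
    }

    Slot<T> claim() {
        // Fast path: retry the slot this thread took from the queue last time.
        // That slot is still in the shared queue; we claim it without touching it.
        Slot<T> local = cache.get();
        if (local != null && local.state.compareAndSet(FREE, CLAIMED)) {
            local.claimedViaQueue = false;
            return local;
        }
        // Slow path: pull slots off the shared queue until one can be claimed.
        // A real pool would bound this loop and block or time out instead.
        while (true) {
            Slot<T> slot = shared.poll();
            if (slot == null) {
                Thread.yield(); // every slot is currently claimed via the queue
                continue;
            }
            if (slot.state.compareAndSet(FREE, CLAIMED)) {
                slot.claimedViaQueue = true;
                cache.set(slot);
                return slot;
            }
            shared.offer(slot); // held through another thread's cache; put it back
        }
    }

    void release(Slot<T> slot) {
        boolean requeue = slot.claimedViaQueue; // read before giving up the claim
        slot.state.lazySet(FREE);
        if (requeue) shared.offer(slot); // it was taken off the queue when claimed
    }
}
```

A thread that releases and immediately re-claims keeps winning its cached slot through the fast path, while other threads have to take whatever the shared queue hands them. That is great for raw cycle counts and poor for sharing objects evenly.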

Features

Most of the work in 2.4 went into the performance improvement mentioned above, but a few new features have also snuck in.

It is now possible to explicitly expire a claimed object using the expire method. This is useful if objects can expire while they are claimed. For instance, you might choose to relax your expiration policy for performance reasons and claim objects optimistically. When the optimistic assumptions fail, you can explicitly expire the object, claim a new one, and redo any work that was started with the previous object.
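The optimistic pattern described above might look roughly like this. Everything here is hypothetical: SketchPool, DequePool, StaleObjectException and withRetry are made-up names for illustration, not Stormpot's API, and the real expire method has its own signature that is documented with the library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical minimal pool API used only for this sketch.
interface SketchPool<T> {
    T claim();
    void release(T obj);
    void expire(T obj); // the object is discarded instead of reused
}

// Tiny single-threaded implementation so the example runs on its own.
final class DequePool<T> implements SketchPool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> allocator;
    int expiredCount;

    DequePool(Supplier<T> allocator, int size) {
        this.allocator = allocator;
        for (int i = 0; i < size; i++) free.push(allocator.get());
    }

    public T claim() { return free.isEmpty() ? allocator.get() : free.pop(); }
    public void release(T obj) { free.push(obj); }
    public void expire(T obj) { expiredCount++; } // dropped; a real pool would deallocate it
}

// Thrown by work that discovers its pooled object has gone bad.
final class StaleObjectException extends RuntimeException {
    StaleObjectException(String msg) { super(msg); }
}

final class Optimistic {
    // Claim optimistically. If the object turns out to be stale, expire it,
    // claim another one, and redo the work from the start.
    static <T, R> R withRetry(SketchPool<T> pool, Function<T, R> work, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            T obj = pool.claim();
            try {
                R result = work.apply(obj);
                pool.release(obj);
                return result;
            } catch (StaleObjectException e) {
                pool.expire(obj); // never hand a known-bad object to the next claimer
                if (attempt >= maxAttempts) throw e;
            } catch (RuntimeException e) {
                pool.release(obj); // unrelated failure: the object itself is fine
                throw e;
            }
        }
    }
}
```

Expiring rather than releasing the stale object is the important part: releasing it would put a known-bad object back into circulation.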

A new CompoundExpiration has been contributed by Guillaume Lederrey. Thanks! It combines two Expiration instances into one, calling them in order. This way, you can easily combine a TimeSpreadExpiration with a domain-specific Expiration instance, without worrying about the nitty-gritty details of correctly implementing time-based expiration.
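The combination logic can be sketched like this, using a simplified, hypothetical SimpleExpiration interface rather than Stormpot's real Expiration interface, whose method has a different signature. In this sketch the second policy is consulted only when the first one reports the object as still valid.

```java
// Hypothetical, simplified stand-in for an expiration policy.
interface SimpleExpiration<T> {
    boolean hasExpired(T obj, long ageMillis);
}

// Combines two policies, asking them in order and short-circuiting
// as soon as one of them says the object has expired.
final class CompoundSketch<T> implements SimpleExpiration<T> {
    private final SimpleExpiration<T> first;
    private final SimpleExpiration<T> second;

    CompoundSketch(SimpleExpiration<T> first, SimpleExpiration<T> second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean hasExpired(T obj, long ageMillis) {
        return first.hasExpired(obj, ageMillis) || second.hasExpired(obj, ageMillis);
    }
}
```

For example, a cheap age check (say, older than 10 minutes) can go first and a domain-specific check second, so the potentially more expensive domain check only runs for objects that are still young enough to matter.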

Guillaume also contributed improved variance bounds on the generic types of the existing Expiration implementations, making them more widely usable without running into trouble with the type checker. Again, thanks!
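The general idea behind such variance bounds can be illustrated with Java wildcards. The types below (ExpiryCheck, PoolConfig, Resource, DbConnection) are hypothetical, and this does not reproduce the specific signatures that changed in Stormpot. It only shows why a "? super T" bound lets a policy written for a supertype be reused for a pool of a subtype.

```java
// Hypothetical check type and class hierarchy, for illustration only.
interface ExpiryCheck<T> {
    boolean test(T obj);
}

class Resource {
    boolean closed;
}

class DbConnection extends Resource { }

final class PoolConfig<T> {
    private ExpiryCheck<? super T> expiration = obj -> false;

    // Accepting ExpiryCheck<? super T> means a check written for any
    // supertype of T (Resource, or even Object) can be used for a pool of T.
    PoolConfig<T> setExpiration(ExpiryCheck<? super T> expiration) {
        this.expiration = expiration;
        return this;
    }

    boolean isExpired(T obj) {
        return expiration.test(obj); // a T is always a valid argument for "? super T"
    }
}
```

If setExpiration took a plain ExpiryCheck<T> instead, passing an ExpiryCheck<Resource> to a PoolConfig<DbConnection> would be rejected at compile time, even though the check is perfectly safe to apply to any DbConnection.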

Up Next

The next release of Stormpot will be 3.0. This will be a backwards-incompatible release that will clean up some of the APIs, make Java 8 the minimum requirement, and add some lambda love and some convenience APIs. As usual, there is no planned date for the next release; it will be out whenever it is ready.