Benchmarking Object Pools
Object Pooling is a creational design pattern in which a set of initialized objects is kept in a “pool” and reused, rather than created and destroyed on demand. This pattern is useful when a particular object is expensive to create and you want to improve the performance of your software by avoiding repeated on-demand creation.
Object Pooling can improve performance significantly. However, it has to be used with care, as pooling changes the usual object lifetime and you have to make sure that each object can be safely reused the next time it is borrowed.
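As a minimal illustration of the pattern (this is just a sketch, not any of the library implementations benchmarked below), a pool can be built on a blocking queue: objects are created once up front, borrowed for use, and returned for reuse instead of being discarded.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal object pool sketch: pre-creates `size` objects and recycles them.
// Only an illustration of the pattern, not a production implementation.
final class SimplePool<T> {
    private final BlockingQueue<T> queue;

    SimplePool(int size, Supplier<T> factory) {
        queue = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            queue.add(factory.get()); // pay the creation cost once, up front
        }
    }

    T borrow() {
        try {
            return queue.take(); // blocks until an object is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while borrowing", e);
        }
    }

    void release(T object) {
        queue.offer(object); // return the object for reuse
    }
}

public class SimplePoolDemo {
    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        StringBuilder sb = pool.borrow();
        sb.setLength(0); // the caller must reset state before reuse
        sb.append("hello");
        System.out.println(sb);
        pool.release(sb);
    }
}
```

Note the last point: because the same instance comes back on a later borrow, the caller (or the pool) must reset any leftover state, which is exactly why pooling changes the usual object lifetime.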
Why did I want to benchmark object pools?
In currently released versions of WSO2 API Manager, the StackObjectPool is used in a few places. Last year, a customer reported a performance issue when a high number of concurrent users were accessing the APIs: some response times were longer than expected. We were able to reproduce the issue in a development environment, and we took a profiling recording using the Java Flight Recorder. From the recording, we identified that the StackObjectPool used for key validation clients was causing a bottleneck. The main reason is that StackObjectPool creates new objects when all the objects in the pool are in use by other threads, and any objects returned to the pool are discarded if the pool is already full. The method that creates a new object is synchronized, which caused many thread contentions. As a workaround for this performance issue, we increased the pool size. The time spent waiting to acquire the lock then dropped, and the response times were much better.
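The failure mode described above can be sketched as follows. This is not the Commons Pool source, just a simplified model of the reported semantics: a synchronized factory method that becomes a contention point, on-demand creation whenever the pool is empty, and discarding of returned objects once the pool is full.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the StackObjectPool behavior described in the text:
// when the stack is empty, every borrower calls a synchronized factory
// (a contention point under load), and objects returned to a full pool
// are discarded instead of being kept for reuse.
final class StackPoolModel {
    private final Deque<byte[]> stack = new ArrayDeque<>();
    private final int maxIdle;
    private int created = 0;

    StackPoolModel(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    // Synchronized creation: under high concurrency, threads queue up here.
    synchronized byte[] makeObject() {
        created++;
        return new byte[1024]; // stand-in for an expensive object
    }

    synchronized byte[] borrow() {
        byte[] obj = stack.poll();
        return (obj != null) ? obj : makeObject();
    }

    synchronized void release(byte[] obj) {
        if (stack.size() < maxIdle) {
            stack.push(obj);
        }
        // else: discarded, so a later borrow pays the creation cost again
    }

    synchronized int createdCount() {
        return created;
    }
}
```

Under this model, increasing the pool size (maxIdle) means fewer discarded objects and therefore fewer trips through the synchronized factory, which matches the workaround that improved the response times.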
While working on this issue, I wanted to see how other Java object pool implementations behave and find out which object pool performs best. The easiest way to find out is to write a benchmark using JMH.
Benchmarking Object Pools with JMH
JMH is the tool to use for writing a proper Java benchmark that produces accurate results. For more details, read Avoiding Benchmarking Pitfalls on the JVM.
The benchmark code performs the following steps.
- Borrow an object from a Java object pool implementation. (The object is expensive to create: it first consumes CPU using the Blackhole.consumeCPU() method with 10,000 tokens and then creates a String object of 10,000 characters.)
- Simulate a delay using the Blackhole.consumeCPU() method. The main reason is that an object is usually taken from a pool to do some work, and it is used for some time before being returned to the pool. I used 1,000 tokens.
- Use the object by calling a method to get its data and passing the data to a JMH black hole to consume it.
- Release the object.
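Put together, one benchmark iteration looks roughly like this. This is a plain-Java sketch of the steps, not the actual JMH benchmark: JMH's Blackhole.consumeCPU() is approximated here by a simple CPU-burning loop, and a blocking queue stands in for the pools that were actually benchmarked.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PoolBenchmarkSketch {
    // Stand-in for JMH's Blackhole.consumeCPU(tokens): burn a deterministic
    // amount of CPU work that the JIT cannot easily optimize away.
    static volatile long sink;
    static void consumeCpu(long tokens) {
        long v = System.nanoTime();
        for (long i = 0; i < tokens; i++) {
            v += (v << 1) ^ (v >> 3);
        }
        sink = v;
    }

    // The "expensive" pooled object: creation burns 10,000 CPU tokens and
    // builds a 10,000-character String, as in the benchmark description.
    static final class ExpensiveObject {
        final String data;
        ExpensiveObject() {
            consumeCpu(10_000);
            StringBuilder sb = new StringBuilder(10_000);
            for (int i = 0; i < 10_000; i++) {
                sb.append('x');
            }
            data = sb.toString();
        }
        String getData() { return data; }
    }

    public static void main(String[] args) throws InterruptedException {
        // Any pooled implementation could sit here; a blocking queue is the
        // simplest stand-in.
        BlockingQueue<ExpensiveObject> pool = new ArrayBlockingQueue<>(10);
        for (int i = 0; i < 10; i++) {
            pool.add(new ExpensiveObject());
        }

        // One benchmark iteration:
        ExpensiveObject obj = pool.take();   // 1. borrow
        consumeCpu(1_000);                   // 2. simulate using it for a while
        int length = obj.getData().length(); // 3. use the object's data
        pool.offer(obj);                     // 4. release
        System.out.println(length);
    }
}
```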
I used the following Java object pool implementations in the benchmark.
- Commons Pool Generic Object Pool 1.6: Released 20 June 2017
- Commons Pool Stack Object Pool 1.6: Released 20 June 2017
- Commons Pool 2 Generic Object Pool 2.4.3: Released 28 October 2017
- Fast Object Pool 2.1.0: Released 24 August 2017
- Furious Object Pool 1.1.2: Released 13 March 2013
- Stormpot Blaze Pool 2.4.1: Released 02 June 2016
- Stormpot Queue Pool 2.4.1: Released 02 June 2016
- Vibur Object Pool 21.2: Released 29 November 2017
Commons Pool and Commons Pool 2 also have Java SoftReference based object pool implementations. See Commons Pool Soft Reference Object Pool and Commons Pool 2 Soft Reference Object Pool. These SoftReference based Object Pools were excluded from the benchmark.
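For context, the idea behind a SoftReference-based pool can be sketched as follows. This is a minimal illustration using the JDK's java.lang.ref.SoftReference, not the Commons Pool code: idle objects are held only through soft references, so the garbage collector may reclaim them under memory pressure, and the pool recreates them on demand.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal sketch of a SoftReference-based pool: idle objects may be
// reclaimed by the GC under memory pressure, so every borrow must handle
// the case where the referent has already been cleared.
final class SoftReferencePool<T> {
    private final Deque<SoftReference<T>> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    SoftReferencePool(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T borrow() {
        SoftReference<T> ref;
        while ((ref = idle.poll()) != null) {
            T obj = ref.get();
            if (obj != null) {
                return obj; // still alive, reuse it
            }
            // referent was reclaimed by the GC; try the next reference
        }
        return factory.get(); // pool empty (or all cleared): create anew
    }

    synchronized void release(T obj) {
        idle.push(new SoftReference<>(obj));
    }
}
```

Because the GC can empty such a pool at any time, its performance depends heavily on memory pressure, which is one reason to benchmark it separately from the fixed-size pools.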
While working on the benchmarks, I found out that the Stormpot author Chris Vest has also written similar object pool benchmarks. He also has a Medium story on the Stormpot 2.4 release and its benchmark results. Even though Chris Vest has done a comprehensive benchmark, I wanted to continue with my project to analyze the results myself and improve my knowledge of JMH. I also improved my code by looking at the Stormpot benchmarks.
Source code and running the benchmark.
My Object Pool Benchmarks code is at https://github.com/chrishantha/object-pool-benchmarks
There is a script named benchmark.sh to run the benchmarks. Please build the benchmarks using "mvn clean install" before running them.
The benchmarks were executed on my Lenovo ThinkPad X1 Carbon 4th Generation laptop.
Following are some details about the hardware and software.
- CPU Model: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
- CPUs: 4
- Threads per core: 2
- Cores per socket: 2
- Sockets: 1
- System Memory: 8GiB
- L1 Cache (Instructions): 64KiB
- L1 Cache (Data): 64KiB
- L2 Cache: 512KiB
- L3 Cache: 4MiB
- OS: Ubuntu 17.10
- Kernel: 4.13.0-16-generic
- Java: 1.8.0_152
- JMH: 1.19
The latest versions of all object pool implementations were used. I ran the command mvn versions:display-dependency-updates to check for the latest updates to all dependencies in the project.
I used the following command to run the benchmarks.
time ./benchmark.sh 2>&1 | tee benchmark.log
The script also ran the benchmarks using different number of threads for pools with different sizes.
- Forks: 2 (A new JVM is started for each fork)
- Warm-up: 5 iterations, 1 second each
- Measurement: 5 iterations, 1 second each
- VM Options: -Xms4g -Xmx4g (4GB Heap)
- Threads: 10, 50, and 100
- Pool Sizes: 10, 50, 100, and 150
- Benchmark Modes: “thrpt” and “sample”
- Time Unit: ms (milliseconds)
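In JMH, the settings above map onto annotations roughly like this (a configuration sketch; the annotation values correspond to the list above, while thread counts, pool sizes, and the heap settings are usually passed on the command line rather than hard-coded):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@Fork(2) // a new JVM is started for each fork
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode({Mode.Throughput, Mode.SampleTime}) // "thrpt" and "sample"
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class ObjectPoolBenchmark {
    // Thread counts (10, 50, 100), pool sizes (10, 50, 100, 150), and the
    // -Xms4g -Xmx4g heap would typically be supplied via JMH command-line
    // options such as -t and -jvmArgs, driven here by the benchmark script.
}
```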
Different thread counts and pool sizes were used to understand how the object pool implementations behave under low to high contention.
The GC profiler was also included to measure GC allocation rates. When using an object pool, the allocation rate should be low.
The benchmark took almost 3 hours to run, as shown by the output of the time command.
Visualizing the results
The benchmark script created three result files (one for each execution with a different number of threads), and I used Python to visualize the results.
I used an amazing Python library called “pandas”, which has easy-to-use data structures for analyzing data. I used pandas to concatenate all the CSV files from the benchmarks and analyze the data. For visualization, I used Seaborn, a Python library based on matplotlib.
Summary of the results
Let’s compare the throughput score (ops/ms) for all object pools.
From this chart, the key observation is that the Stormpot Blaze Pool has the highest throughput when the number of threads is greater than the object pool size.
Let’s compare the average sample time (ms/op) results.
The Commons Pool Generic Object Pool has the highest average sample time when the number of threads is greater than the object pool size. That means the Commons Pool is not very good at handling many concurrent threads.
When considering the overall results, the Vibur Object Pool and the Fast Object Pool have managed to perform well in terms of average sample time.
As I mentioned earlier, the GC profiler was included in the benchmarks to compare the allocation rates (MB/sec).
The allocation rates for the Commons Pool 2 Generic Object Pool seem to be very high when the number of threads is higher than the object pool size. The allocation rates for the Commons Pool Generic Object Pool are also high.
The Stack Object Pool also shows a considerable allocation rate when there are more threads than the object pool size. This is exactly the behavior observed with the WSO2 API Manager performance issue mentioned earlier: the Stack Object Pool creates objects on demand when the pool is exhausted.
The Fast Object Pool and the Stormpot Blaze Pool have very low allocation rates. This is a good sign, as we expect fewer object allocations when using a pool of objects. I didn’t investigate why the other pools have high allocation rates.
I compared the performance of different Java object pool implementations, and looking at all the results, I can see that the Stormpot Blaze Pool performs much better under high contention in terms of throughput and GC allocation rate.
The Commons Pool Generic Object Pool and the Commons Pool 2 Generic Object Pool performed poorly in terms of throughput and sample time when compared with the other object pools.
The Vibur Object Pool, Fast Object Pool, and Stack Object Pool performed better in terms of throughput and average sample time in all scenarios. However, the Stack Object Pool has high allocation rates when there are more threads than the object pool size.
Looking at the overall results, it seems better to have more concurrent threads than the object pool size when accessing the object pool.
Considering the overall results, there is no clear winner. Nevertheless, the Vibur Object Pool, Fast Object Pool, and Stormpot Blaze Pool have shown better results in terms of throughput and average sample time. However, all of these object pools exhibit long tail latencies, which cannot be completely ignored.
See Appendix for more charts analyzing the average sample time and sample time percentiles.
The Object Pools Benchmark code is tagged as v1.1.0 in GitHub and the benchmark results are uploaded to the v1.1.0 release: https://github.com/chrishantha/object-pool-benchmarks/releases/tag/v1.1.0
It is always better to do your own benchmarks considering your use case. You can use the code in GitHub and modify it depending on your requirements.
Appendix
This section compares the average sample time and sample time percentiles.
Let’s compare the sample time percentiles (ms/op). Please make sure to look at the Y-axis scale, as it’s different for each figure.
The following charts compare the average sample time for the individual object pools.
The following charts compare the sample time percentiles of each object pool.