Traditionally, in high performance systems, repeatedly allocating and deallocating memory has been found to be costly. (i.e a malloc vs free cycle). Hence, people resorted to building their own memory pool on top of the OS, dealing with fragmentation/free list maintenance etc. One of the popular techniques to doing this being slab allocators.
This post is about doing a reality check about the cost of explicitly doing an
free() cycle, given that most popular OS es, specifically Linux gotten better at memory allocation recently. Along the way, I will also compare the JVM memory allocations (which should be faster since we pay a premium for the freeing of memory via garbage collection).
So, all set here we go; The following is a comparison of native c allocations, java jvm based allocation, java direct buffer allocation. For each of them we measure the following.
Allocation/free rate (rate/s): This gives you an upper bound on the single threaded throughput of your application. You cannot possibly do more operations than this, without building your own memory pools.
Cost of allocation in mircoseconds (cost-μs): This gives you a lower bound on the latency. You absolutely have to pay atleast this cost per operation without building your own memory pool.
I make a million allocations (except for 100kb case,where I do 100K allocations) varying the object size. The machine is a beefy linux server running Linux 2.6 kernel. For java, I use the following jvm options.
-server -Xms22g -Xmx22g -XX:+AlwaysPreTouch -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2
Swap is turned off and pretouch flag ensures jvm heap is warmed up.
The 10kb and 100kb runs for Java heap allocation had garbage collections, * while direct allocation does not have any collections.
- 10kb run : 5 ParNews, ~2ms
- 100kb run : 95 ParNews, 3ms avg, 30ms max
Here are some observations.
- In general, all allocation costs increase with object size
- Small object allocations are faster in java as expected than malloc()
- For some reason free() seems to be cheaper/faster than malloc(). Fragmentation tests will tell the real story.
- Java heap allocations considerably slow down with larger objects, when GC kicks in. i.e under memory pressure. native c based allocation don’t seem to degrade that drastically from 10kb to 100kb
- Java heap allocations are always faster than direct allocations, which is expected.
- Java direct allocations are supposed to be just a JNI malloc() call. But with 100kb, it seriously slows down and might become a real bottleneck. If you have smal objects mostly, you may be able to get away.
You can get the C program and java program used for this here : malloctest.c, AllocTest.java. Next, we will test access speed by basically copying memory (copy rate/s & copy cost-μs) and writing a specific index in the byte array (write rate/s & write cost-μs). We will be using 100kb byte buffers and average across 1000 operations.
- index updates are fastest with c and slowest in java direct buffer
- No huge difference between accessing/copying direct buffers vs jvm heap byte