The Cost of Virtual Machine

Bartek Andrzejczak

May 17, 2015

In one of my recent posts I wrote about the cost of garbage collection. Inspired by Nikita Salnikov-Tarnovski's talk at Warsaw JUG titled "Heap, off you go", I've decided to compare C++ and Java performance based on the code written by Nikita and my C++ port. In this second and last part of the series on performance, I will tackle the issue of highly optimized code.

The Java code for this blog post uses almost no garbage collection, thanks to the Unsafe class, which allocates memory off the heap without any type or bounds checking. The C++ code also uses a natively allocated buffer of bytes to store all its data. This is the clash of the titans. How much CPU time do we lose every day by choosing bytecode over assembly code? Can the JIT compiler make Java comparable to C++? Let's find out!

How to make Java go faster

In the last blog post we saw that Java code adding 50M objects to a list and then calculating something based on that list is pretty slow compared to C++, but the difference wasn't that frightening. On average C++ was 18% faster. What shocked me was the throughput of the Java application, that is, the ratio of time spent executing our code to the whole program execution time: it was around 10%. It means that for 9 out of every 10 seconds our application stops completely and hands execution over to the garbage collector. This looks like a great field for optimization. Who needs garbage collection anyway? We can take care of our garbage ourselves. All we need is a nice clean space allocated in native memory. Type checking? Constraints on buffer length? Who needs those?! It's faster to take care of all that stuff ourselves.

The scary Unsafe

In the C++ world you access native memory on a daily basis. Just write int number = 5 and voila. In Java, on the other hand, all your memory allocation and cleanup is handled by the virtual machine. It finds a place to put your object in, cleans it, prepares it, initializes everything and gives you a ready object. Finally, it will clean up this object too. But do we have to use its services? As it turns out, there are ways of bypassing the garbage collector altogether and messing with the bytes ourselves.

Accessing native memory doesn't have to be that dangerous. There is a class that can handle it pretty gracefully: it's called ByteBuffer. A ByteBuffer can be direct or non-direct; with a direct one, the JVM will do its best to perform native I/O operations on it directly (without copying through intermediate buffers). ByteBuffer lets you unbuckle your belt, but still keeps all the airbags. It will zero the space you're allocating, it will make index 0 the start of your buffer and it will throw an exception when you try to access data outside the buffer. A direct ByteBuffer stores its contents outside the garbage-collected heap, but every access still goes through the JVM's checks.
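To make the airbags concrete, here is a minimal sketch of a direct ByteBuffer. The class and method names are made up for illustration; only the java.nio.ByteBuffer calls are real API. It shows the three guarantees mentioned above: zeroed memory, indexing from 0, and an exception on out-of-bounds access.

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // A freshly allocated direct buffer is zero-filled.
    static int readFromFreshBuffer() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16);
        return buffer.getInt(0);
    }

    // Writes an int at index 0 and reads it back.
    static int roundTrip(int value) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16);
        buffer.putInt(0, value);
        return buffer.getInt(0);
    }

    // Returns true if reading past the end is rejected with an exception.
    static boolean outOfBoundsIsRejected() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16);
        try {
            buffer.getInt(16); // would need bytes 16..19, capacity is 16
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromFreshBuffer());
        System.out.println(roundTrip(42));
        System.out.println(outOfBoundsIsRejected());
    }
}
```

With Unsafe, none of these three checks exist, which is exactly where both the speed and the danger come from.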

Do you want to go really hardcore? Let me discourage you with the following output of one of my programs:

Yup. That's a Segmentation Fault. You can access native memory with no limits. You can read values from wherever the OS allows you to. If you make a mistake, be prepared to get some garbage result in the best-case scenario, and an error such as the one above in the worst-case scenario. That's the Unsafe realm, buddy.

The benefits of Unsafe

Now let's talk about the bright sides of using the Unsafe class. Who in their right mind would use such a thing in a JVM environment? If you want to squeeze everything you can out of the JVM, you'll know where to turn. Is it worth using Unsafe? It must be! After all, when Oracle wanted to remove Unsafe from the JVM and asked companies whether they were using it, a significant number of teams, including the Apache Cassandra team, protested against the proposed changes.

Let’s look at the basic code provided by Nikita:

This is an inner class representing a single object that will be written into memory. As you can see, you have to explicitly define the offsets of your class fields. Of course there are no "real fields" here. All you've got are getters and setters which access native memory. I know it's a pretty contrived example: you could move objectOffset into the constructor, make the getters and setters private and create some nice public API, but that's not the point of this example. The point is that you could have objects from your business domain saved into memory with Unsafe.
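This is not Nikita's listing; for readers without it at hand, a minimal sketch in the same spirit might look like the one below. The record layout (an int id and a long value), the class names and the fillAndSum helper are all assumptions made for illustration. Only the sun.misc.Unsafe calls are real API, and that class is unsupported and deprecated on recent JDKs, so expect compiler warnings.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class OffHeapRecordDemo {
    // Hypothetical layout: explicit byte offsets stand in for real fields.
    static final long ID_OFFSET = 0;     // int, 4 bytes
    static final long VALUE_OFFSET = 8;  // long, 8 bytes, kept 8-byte aligned
    static final long RECORD_SIZE = 16;

    static final Unsafe UNSAFE = loadUnsafe();

    // Grabs the singleton through reflection (assumes the field is named theUnsafe).
    static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not available", e);
        }
    }

    // A view over one record: it holds no data, only a native address.
    static final class Record {
        long objectOffset; // address of the record currently being viewed

        int getId() { return UNSAFE.getInt(objectOffset + ID_OFFSET); }
        void setId(int id) { UNSAFE.putInt(objectOffset + ID_OFFSET, id); }
        long getValue() { return UNSAFE.getLong(objectOffset + VALUE_OFFSET); }
        void setValue(long value) { UNSAFE.putLong(objectOffset + VALUE_OFFSET, value); }
    }

    // Stores `count` records off-heap, then sums id + value over all of them.
    static long fillAndSum(int count) {
        long base = UNSAFE.allocateMemory(count * RECORD_SIZE);
        try {
            Record record = new Record();
            for (int i = 0; i < count; i++) {
                record.objectOffset = base + i * RECORD_SIZE;
                record.setId(i);
                record.setValue(10L * i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                record.objectOffset = base + i * RECORD_SIZE;
                sum += record.getId() + record.getValue();
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(base); // nobody else will free it for us
        }
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(3));
    }
}
```

Note that a single Record instance is reused for every element by moving objectOffset, so the loop allocates nothing on the Java heap.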

Let's follow the Unsafe lifecycle. First you need to allocate a buffer, and for that you need an Unsafe instance. As you'll see below, getting one is not that easy. You can't just create it with new Unsafe(). Unsafe is a singleton with a private constructor and an accessor method called getUnsafe. The problem is with this method's code:

As you can see, it'll throw a SecurityException if your class loader isn't the system domain loader. So far I've seen two ways of getting hold of an Unsafe instance:

By accessing the private field with reflection

By making its constructor accessible with reflection
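Both approaches can be sketched as follows. This is an illustration rather than the original listing: the class and method names are made up, and the code assumes the private static field is called theUnsafe and that the private no-argument constructor exists, as is the case in the OpenJDK sources I know of.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UnsafeAccessDemo {
    // Way 1: read the private static singleton field.
    static Unsafe viaField() throws ReflectiveOperationException {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        return (Unsafe) field.get(null); // static field, so no receiver object
    }

    // Way 2: call the private constructor, creating a second instance.
    static Unsafe viaConstructor() throws ReflectiveOperationException {
        Constructor<Unsafe> constructor = Unsafe.class.getDeclaredConstructor();
        constructor.setAccessible(true);
        return constructor.newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(viaField() != null);
        System.out.println(viaConstructor() != null);
    }
}
```

The field-based variant is the one you will most often meet in library code, since it reuses the existing singleton and does not create another instance.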

Here you go. Now you can freely use it. Other than the thrill of getting a segfault in production, that's pretty much all that differs here from the ByteBuffer. This is how you allocate memory with Unsafe:

And, as shown before, this is how you can write to the buffer, and read from it:

Remember that the fields are not initialized by the JVM. If you call get before set, you'll most likely get some garbage.
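The whole lifecycle (allocate, optionally zero, write, read, free) can be sketched like this. The class and method names are hypothetical; allocateMemory, setMemory, putLong, getLong and freeMemory are real sun.misc.Unsafe methods. The explicit setMemory call does by hand what ByteBuffer does for you, so reads before writes return 0 instead of garbage.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class OffHeapBufferDemo {
    static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not available", e);
        }
    }

    // Reads a long from memory that was zeroed by hand but never written.
    static long readAfterZeroing() {
        Unsafe unsafe = loadUnsafe();
        long address = unsafe.allocateMemory(Long.BYTES); // contents are undefined
        try {
            unsafe.setMemory(address, Long.BYTES, (byte) 0); // manual cleanup
            return unsafe.getLong(address);
        } finally {
            unsafe.freeMemory(address);
        }
    }

    // Writes the values 0..count-1 as longs, then reads them back and sums them.
    static long writeThenSum(int count) {
        Unsafe unsafe = loadUnsafe();
        long bytes = (long) count * Long.BYTES;
        long address = unsafe.allocateMemory(bytes);
        try {
            for (int i = 0; i < count; i++) {
                unsafe.putLong(address + (long) i * Long.BYTES, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += unsafe.getLong(address + (long) i * Long.BYTES);
            }
            return sum;
        } finally {
            unsafe.freeMemory(address); // forgetting this is a native memory leak
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterZeroing());
        System.out.println(writeThenSum(5));
    }
}
```

Nothing stops you from passing an index equal to count or a negative one here; the address arithmetic will happily point outside the allocation, which is how segfaults like the one mentioned earlier happen.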

Performance

Let's skip the C++ code, because it's not really important here and, frankly, it's very similar to the Java code above. Let's jump straight to the performance evaluation.

I'll use the same example as before: adding 50M objects to the list (here it's a buffer), and then computing some values for all of them. Here are the results:

The results are pretty depressing for me as a JVM user. You could even use the word knockout. But is it really that bad? Highly optimized JVM code is still more than 20 times slower than optimized C++ code. So what? Does the JVM really need such a level of optimization? I think it's clear by now that Java, and the JVM in general, doesn't suit real-time low-level systems or integrated circuits.

The other thing is that while Java with Unsafe is 20x slower than optimized C++, it's still much faster than regular Java code operating on an ArrayList. This is great. It gives us an opportunity to optimize our applications if every other option fails.

Disclaimer

I don't guarantee that the code Nikita wrote is the fastest code for this problem on the JVM. I also cannot guarantee that my optimized C++ code is as fast as it can be. All the sources can be found on GitHub, so if you find a way to make it faster, don't be shy and send me a message.

Nikita's original Java JMH benchmark: https://github.com/iNikem/offheap
My C++ port built with Google’s benchmark: https://github.com/bandrzejczak/cpp-benchmark