Comparing Kotlin and Go implementations of the Monkey language II: Raiders of the Lost Performance

Mario Arias
6 min read · Sep 24, 2021


Previously….

In our previous episode, we compared two implementations of the Monkey language, one in Go, the other in Kotlin. We also discussed performance: Go was slower running Monkey code in interpreter (Eval) mode (26.73s vs 11.93s), but faster in VM mode (5.89s vs 11.36s).

The Go state

Something I forgot to mention in the previous post is that I was compiling my Go code with version 1.16.4. Just by changing my Go version to 1.17.1, I can see a significant increase in performance. Interpreter mode now takes 19.38s, and VM mode takes 5.18s. An outstanding performance increase just by changing the compiler, with no changes to the code.

The big unknown: Memory

Raw performance isn’t the only concern; memory is important too. But how can we know, efficiently, how much memory our application consumes?

On OSX you can use the /usr/bin/time -l command:

/usr/bin/time -l ./fibonacci -engine=vm

It will show you many statistics from your application:

9695232 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
2389 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
3059 signals received
292 voluntary context switches
18429 involuntary context switches
64464600283 instructions retired
21829906372 cycles elapsed
6828032 peak memory footprint

For our case we only need the first one, maximum resident set size. It shows the figure in bytes, but we can translate it to MB.
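The conversion itself is just dividing by 1024 twice. As a quick sketch (the 9695232 figure is the one reported above for the Go VM binary):

```kotlin
import java.util.Locale

// Convert a byte count (as reported by `time -l`) to mebibytes.
fun bytesToMB(bytes: Long): Double = bytes / 1024.0 / 1024.0

fun main() {
    // 9695232 is the `maximum resident set size` figure from the output above
    println(String.format(Locale.ROOT, "%.2f MB", bytesToMB(9_695_232)))  // prints "9.25 MB"
}
```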

Is that Python?

In summary, the state of Go at this point:

Let’s see how we can improve Kotlin:

The Kotlin state

I took several steps to try to improve Kotlin’s performance. Disclaimer: I’m using JDK 11.0.12 (Zulu) for these tests.

Using copyOfRange

One of the comments that I received (on a GitHub issue, no less) was that I could increase the performance of Monkey by using some of the JVM facilities for managing arrays, like System.arraycopy. The problem with that approach is that it doesn’t play that well with UByteArray, but it gave me a good idea of how to tackle it.

Kotlin comes with an extension function for arrays named copyOfRange that includes an implementation for UByteArray. And it works: the code is now significantly faster. But, of course, now we’re evaluating memory as well, so here we go.
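As a minimal sketch of the API (not the repository’s actual VM code), copyOfRange on a UByteArray copies a half-open range into a fresh array, the way a VM might slice a frame’s bytecode:

```kotlin
// Sketch: slicing a bytecode array with copyOfRange instead of a manual loop.
@OptIn(ExperimentalUnsignedTypes::class)
fun sliceDemo(): List<Int> {
    val code: UByteArray = ubyteArrayOf(0x00u, 0x01u, 0x02u, 0x03u, 0x04u)
    // copyOfRange allocates a new UByteArray holding elements [1, 4)
    val slice: UByteArray = code.copyOfRange(1, 4)
    return slice.map { it.toInt() }
}

fun main() {
    println(sliceDemo())  // prints "[1, 2, 3]"
}
```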

Interpreter mode doesn’t have any significant improvement, but VM mode jumped from 11.36s to 8.05s.

The state of Kotlin at this point:

Eval mode on Kotlin is already very fast, and VM mode is getting closer… but the memory consumption. Yikes. Go is almost 40 times more efficient on memory usage.

Using GraalVM as JDK

What is GraalVM?

From their website:

… a high-performance JDK distribution. It is designed to accelerate the execution of applications written in Java and other JVM languages while also providing runtimes for JavaScript, Ruby, Python, and a number of other popular languages. GraalVM’s polyglot capabilities make it possible to mix multiple programming languages in a single application while eliminating any foreign language call costs.

With no code modification, just by using GraalVM, we can see an increase in performance.

Eval mode from 11.44s to 7.34s (Now even faster than the previous VM version) and VM mode from 8.05s to 6.78s. But there is no free lunch; you’ll pay for it with memory consumption.

The performance increased, but memory consumption is now twice what it was before.

GraalVM Native Image

One of the nice tricks from GraalVM is that you can compile your JVM application into a native one:

With GraalVM you can compile Java bytecode into a platform-specific, self-contained, native executable — a native image — to achieve faster startup and smaller footprint for your application.

This change isn’t a straightforward one. First, I modified my build.gradle.kts file and installed a couple of tools (more info in the README.md of my repository).
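The exact setup lives in the repository’s README.md; as a rough, hypothetical sketch (the plugin version, Kotlin version, and main class below are assumptions for illustration, not the repo’s actual values), the Gradle side of a native-image build looks something like:

```kotlin
// build.gradle.kts — hypothetical sketch, not the repository's actual file
plugins {
    kotlin("jvm") version "1.5.30"
    // official GraalVM native-image Gradle plugin
    id("org.graalvm.buildtools.native") version "0.9.4"
}

graalvmNative {
    binaries {
        named("main") {
            imageName.set("monkey-native")
            mainClass.set("MainKt") // assumed entry point
        }
    }
}
```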

When I tried to compile it, I saw this message:

[monkey-native:45744] classlist: 1,358.91 ms, 0.96 GB
[monkey-native:45744] (cap): 4,357.96 ms, 0.96 GB
[monkey-native:45744] setup: 6,389.07 ms, 0.96 GB
[monkey-native:45744] (clinit): 306.39 ms, 1.74 GB
[monkey-native:45744] (typeflow): 3,980.27 ms, 1.74 GB
[monkey-native:45744] (objects): 7,425.63 ms, 1.74 GB
[monkey-native:45744] (features): 450.96 ms, 1.74 GB
[monkey-native:45744] analysis: 12,531.56 ms, 1.74 GB
[monkey-native:45744] universe: 1,974.49 ms, 1.76 GB
Warning: Reflection method java.lang.Class.getEnclosingMethod invoked at kotlin.jvm.internal.ClassReference$Companion.getClassSimpleName(ClassReference.kt:169)
Warning: Reflection method java.lang.Class.getEnclosingConstructor invoked at kotlin.jvm.internal.ClassReference$Companion.getClassSimpleName(ClassReference.kt:170)
[monkey-native:45744] Warning: Aborting stand-alone image build due to reflection use without configuration.
[monkey-native:45744] [total]: 22,467.11 ms, 1.76 GB
Warning: Use -H:+ReportExceptionStackTraces to print stacktrace of underlying exception
[...]
[monkey-native:46140] classlist: 1,319.11 ms, 0.96 GB
[monkey-native:46140] (cap): 2,394.58 ms, 0.96 GB
[monkey-native:46140] setup: 4,369.70 ms, 0.96 GB
[monkey-native:46140] (clinit): 247.87 ms, 1.74 GB
[monkey-native:46140] (typeflow): 3,542.73 ms, 1.74 GB
[monkey-native:46140] (objects): 3,276.68 ms, 1.74 GB
[monkey-native:46140] (features): 510.05 ms, 1.74 GB
[monkey-native:46140] analysis: 7,961.70 ms, 1.74 GB
[monkey-native:46140] universe: 799.56 ms, 1.74 GB
[monkey-native:46140] (parse): 850.97 ms, 1.74 GB
[monkey-native:46140] (inline): 1,020.52 ms, 1.77 GB
[monkey-native:46140] (compile): 6,896.97 ms, 2.41 GB
[monkey-native:46140] compile: 9,687.63 ms, 2.41 GB
[monkey-native:46140] image: 2,016.34 ms, 2.41 GB
[monkey-native:46140] write: 602.54 ms, 2.41 GB
[monkey-native:46140] [total]: 27,036.43 ms, 2.41 GB
[...]
Warning: Image 'monkey-native' is a fallback image that requires a JDK for execution (use --no-fallback to suppress fallback image generation and to print more detailed information why a fallback image was necessary).

In this case, because somewhere we use reflection in our codebase, it cannot create a native image directly; it creates a fallback image, i.e. an image that still requires the JDK.

There are three ways to fix this. I tested all three, and they are more or less equivalent (at least in my case), though I didn’t track compilation time.

  1. Delete every use of reflection in your source code. In my case it’s just one line, but in other cases it can be impossible.
  2. Add a --no-fallback argument. It’ll take your reflection code and try to generate static code.
  3. Add a -H:ReflectionConfigurationFiles=<list of JSON files separated by commas> argument. It’ll replace specific reflection calls with static code. For more information, check the documentation. For my particular case, I used this configuration.
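For illustration, a reflection configuration file is a JSON array of class descriptors telling native-image which classes may be accessed reflectively. A minimal, hypothetical example (the class name here is made up; your own config must list the classes your code actually reflects on):

```json
[
  {
    "name": "monkey.MainKt",
    "allDeclaredConstructors": true,
    "allDeclaredMethods": true
  }
]
```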

Now, using option number two, I have an adorable native image. Let’s test it… and it’s worse!

Well, not all of it. Memory consumption is a lot better, and Eval mode actually isn’t far from Go’s Eval mode.

I have mixed feelings about GraalVM: as a JDK it is fantastic, if you can afford the extra memory, but for native applications it is pretty underwhelming. On the other hand, it is a testament to how good the JVM optimisations are. It could also be that my particular use case isn’t a good fit for GraalVM native images.

I hear stories of people using it in backend environments with pleasing results. It makes sense, as backends are a common use case for JVM applications (the market for Monkey implementations on the JVM is very tame these days).

A not-so-obvious fix

One thing that I realised at some point is that we can write an implementation of Fibonacci that runs better on Monkey. So, let’s have a look at some Monkey bytecode.

Our original implementation
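(The embedded snippet is missing from this export. The original implementation is the classic doubly recursive Fibonacci used as the benchmark in Writing a Compiler in Go; reconstructed from the book, it looks along these lines:)

```monkey
let fibonacci = fn(x) {
    if (x == 0) {
        0
    } else {
        if (x == 1) {
            1
        } else {
            fibonacci(x - 1) + fibonacci(x - 2);
        }
    }
};
fibonacci(35);
```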

It generates a pseudo-bytecode like this:

When it runs, it executes 425,098,998 instructions.

Let’s write a faster implementation.
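(This embedded snippet is also missing. One plausible shape, a guess rather than the repository’s actual code, folds the two base cases into a single branch, which needs fewer constants and fewer jumps per call:)

```monkey
let fibonacci = fn(x) {
    if (x < 2) {
        x
    } else {
        fibonacci(x - 1) + fibonacci(x - 2);
    }
};
fibonacci(35);
```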

And look at the pseudo-bytecode:

It has fewer constants, and the bytecode is shorter.

When it runs, it executes 328,467,734 instructions.

Let’s see how it runs using our fastest implementations.

With all this trickery, we made Kotlin get very close to Go on performance.

Conclusion

We had a lot of fun comparing implementations and tweaking things here and there to make them faster. But in the end, Go is faster in VM mode (by a fraction now), and it consumes a lot less memory than Kotlin (no surprises here). On the other hand, Kotlin still has an advantage in Eval mode, getting close to VM mode.

In the next post, we’ll have a look at Kotlin Native. See you later.
