About good programmers, good programs and performance

Alexander Nozik
12 min read, Feb 6, 2019

At first I wanted to write an article about high-performance computation in Kotlin, but after all the discussions going around lately, it became much more philosophical. Don’t worry, there will be something about performance, and yes, you can use Kotlin for high-performance tasks.

So, here is what they say about good programmers:

A good programmer could write a good program in any language.

Well, we all know that, but there is still always a question of choice. And while we can’t always choose the programmer, we can choose the language, so the main question is what we should call a good language. There is no accepted definition, but in my opinion there are two criteria:

  1. A good language is one that a good programmer will choose for writing a good program.
  2. A good language is one in which a not-so-good programmer can still write a good program.

The first criterion seems more objective, because who can judge the quality of a language better than a professional? Professionals usually prefer expressive languages, which give them freedom and numerous tools to write code their own way. Of course, in many cases the opinion of a specific expert is colored by previous experience, but that is not a problem. The problem arises from criterion number two. In real life, we do not have a lot of good programmers. For each programming genius who can write complicated frameworks in C and read machine code from the screen, there are thousands of people who will use and, gods save us, maintain the libraries made by those selected few. For those thousands, simplicity and tooling are much more important than expressiveness and flexibility.

So we come to the conclusion that we need two things from a programming language: expressiveness and simplicity.

Also, one needs to remember that not all users of a language use it in the same way. We can separate them into three groups:

  1. Language developers. Those are actually good programmers (usually).
  2. Library developers.
  3. Users.

In fact, since Users are the most numerous group, simplicity usually beats expressiveness. Look at Python, Go, or JavaScript. Those languages are simplified to the bare bones, with almost no tools for complex programming, and people love them. Of course, we are talking about Users here (not the good programmers). Developing a new complicated library in Python or JavaScript can be really painful.

But what about performance?

Performance over convenience?

We’ve talked about good programmers and good languages (from the language point of view), but what about good programs? And what about performance? If you read discussions about programming languages (and particularly the never-ending discussion about which language is better), you will notice that sooner or later it is all about performance. “Language A crunches numbers 20% faster than language B.” “No, you are wrong, because the program in B could be optimized, and you are not using the correct benchmarking procedure.” And so on. Performance has become the holy symbol of all those discussions, but let me tell you a blasphemous thing:

Performance does not matter.

Let me explain. Do you remember my first statement, the one about good programmers? A good programmer can write super-high-performance code in any language. If you want an example, take Python. It is known as one of the slowest languages, but the numpy and scipy libraries are treated as a performance standard everywhere and are used in most performance-critical calculations. Those parts are not written in Python, but does it really matter? Python library developers and users can use them as if they were part of the language. The user does not care about language purity; the user cares about usability.

Also, one needs to remember that it is the performance of the program that is important, not the abstract performance of the language compiler. There is an important distinction here. Performance comes in two pieces: algorithm optimization (one needs to write efficient algorithms and avoid costly solutions) and compilation optimization (done automatically by the compiler). While the second is important, it is the first that is usually far from ideal. Personally, I work on scientific software and have plenty of examples where people write in C++ for performance’s sake (it is rumored to be faster than, say, the JVM) and produce code that could be rewritten to run 100 times faster. The language is important here. Hundreds of language features usually mean hundreds of ways to do everything wrong. It is much better to have limited features and good libraries. Again, look at C++ and look at numpy. With numpy it is quite easy to remember not to call plain Python functions in the hot path to keep it fast. In C++ with manual memory management you can make the code super-fast, but you need to be a specialist for that. In most cases, the result will be less than optimal.
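To make the distinction between the two kinds of optimization concrete, here is a minimal Kotlin sketch. The moving-average task and both function names are made up for illustration and are not taken from any real project. Both functions return the same values for this input; the second one simply does less work, and that is a change the programmer has to make, not the compiler.

```kotlin
// Naive version: every window is summed from scratch, O(n * window) additions.
fun movingAverageNaive(data: DoubleArray, window: Int): DoubleArray {
    require(window in 1..data.size) { "window must be between 1 and data.size" }
    return DoubleArray(data.size - window + 1) { start ->
        var sum = 0.0
        for (i in start until start + window) sum += data[i]
        sum / window
    }
}

// Sliding-window version: a running sum is updated in place, O(n) additions.
fun movingAverageSliding(data: DoubleArray, window: Int): DoubleArray {
    require(window in 1..data.size) { "window must be between 1 and data.size" }
    val result = DoubleArray(data.size - window + 1)
    var sum = 0.0
    for (i in 0 until window) sum += data[i]
    result[0] = sum / window
    for (start in 1 until result.size) {
        // Add the element entering the window, subtract the one leaving it.
        sum += data[start + window - 1] - data[start - 1]
        result[start] = sum / window
    }
    return result
}

fun main() {
    val data = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
    println(movingAverageNaive(data, 2).joinToString())   // 1.5, 2.5, 3.5, 4.5
    println(movingAverageSliding(data, 2).joinToString()) // 1.5, 2.5, 3.5, 4.5
}
```

On general floating-point input the two versions may differ in the last digits, because the running sum accumulates rounding error differently.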

So Python is good?

Python is super popular. It is simple, fast (if one sticks to scipy), and easy to use, yet it is very limited. So far I have been talking about the Users group and about general acceptance. But what Developers think is also important. Not for usage, but for evolution. The Developer, and especially the library developer, needs to be comfortable with the language in order to work productively and create new exciting features and libraries. If the language is just too demanding, it is not worth it. One example is C. You can write anything in C, but the amount of work can be astronomical for any more or less complicated piece of code. You can, of course, use previously developed libraries, but since C does not have its own packaging system, it is very hard to make the build reliable without a “build everything yourself” instruction in the documentation. Also, since the package ecosystem is fragile, nobody wants to change how existing libraries work for fear of breaking backward compatibility somewhere. It means that you are stuck with code and architectural solutions from the last decade. It works, but could it work better?

Python has the same problems and some others too. Dynamic typing and a REPL-like environment allow you to create simple programs very fast, but exactly the same features make it hard to develop large pieces of code. You need to write tests for each of your functions and classes, and in good Python code the test code is the same size as or larger than the code it tests. The package system relies on semi-manual local environment setup, which means you have to trust the user to set everything up correctly. But, as frequently happens, Python’s main problem lies in its main power: numpy. You can’t create new data structures or new functions that are not written in C/Fortran without falling into a very deep performance pit. You are forced to use ndarray everywhere and can’t use plain functions. In fact, this limitation is rather bad for teaching. After Python, people do not believe that it is possible to do something without matrices and ndarrays.

And let’s not forget about tooling. If you really want to create high-performance code, you can’t do it without a high-quality profiler. Usually it is not enough to know that each of the functions you call is fast; you need to know how they work together. It is quite easy to write super-duper numpyish code only to miss one native Python call, which will run 30 times slower and will determine the total performance of the program.

What about Kotlin?

This article is actually about Kotlin, or was intended as such. Kotlin is a candidate for a good language. It keeps the balance between expressiveness and simplicity. On the one hand, one can write Python-like procedural code in it. On the other hand, it allows almost Haskell- or Scala-like type constructs without introducing additional language features. In most cases simplicity is achieved by replacing language features with library factory methods or by using functions with receivers (also known as contexts or scopes). The multiplatform feature allows Kotlin code to be compiled for several runtime platforms (JVM, JavaScript, and native). With the best possible tooling, type safety, and a fast-growing community and library coverage, Kotlin seems to be one of the best candidates for the next language for complex scientific applications (I work in science, so it is important for me, but of course it is not used only in science). It probably won’t replace Python in the near future, just because Python is much simpler and, as we already discussed, simplicity is what the end user wants. But in most cases it could replace C++, which is very hard to write and maintain.
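As a rough illustration of factory methods and functions with receivers, consider the sketch below. All names in it (`VectorSpace`, `vectors`, `vector`, `dot`) are hypothetical and invented for this example; they are not the API of kmath or of any other library. The point is that the vector operations exist only inside the scope, and no new language feature is needed to get that.

```kotlin
// Hypothetical "context": the operations below are visible only where
// VectorSpace is the receiver.
object VectorSpace {
    // Factory method instead of a special vector literal in the language.
    fun vector(vararg values: Double): DoubleArray = values

    operator fun DoubleArray.times(scale: Double): DoubleArray =
        DoubleArray(size) { i -> this[i] * scale }

    infix fun DoubleArray.dot(other: DoubleArray): Double {
        require(size == other.size) { "vectors must have the same length" }
        var sum = 0.0
        for (i in indices) sum += this[i] * other[i]
        return sum
    }
}

// Function with receiver: `block` runs with VectorSpace as `this`.
fun <R> vectors(block: VectorSpace.() -> R): R = VectorSpace.block()

fun main() {
    val result = vectors { (vector(1.0, 2.0) * 2.0) dot vector(3.0, 4.0) }
    println(result) // 22.0
}
```

Outside of `vectors { ... }` the expression `doubleArrayOf(1.0) * 2.0` does not compile, which is exactly what makes such scopes safe to combine.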

I won’t discuss Kotlin’s advantages. There are a lot of articles about them, and developers just love it. The only question that still needs to be settled is performance. There is a well-known prejudice against JVM languages (and Kotlin’s main platform is the JVM).

JVM is slow! Well, not really.

I won’t present any benchmarking results here for two reasons:

  1. There are a lot of them out there. Just use Google.
  2. They are not informative anyway. Those tests usually use either some kind of optimized numerical algorithm or some code that does nothing.

Even in unfavorable conditions (the JVM is just not meant for short-running tasks), you can see that a recent JVM (especially JDK 11, which includes some nice vectorization optimizations) performs very close to natively compiled languages like C++ or Rust, which makes it one of the fastest compilers ever and definitely the fastest JIT compiler (we mean, of course, the speed of the compiled code, not the compilation speed, which is terrible for, say, C++). The myth about the slowness of the JVM dates back to the early 2000s, when compiler optimization was indeed much worse. Current tests, done for example during development of the kmath library, show about a 10% performance increase compared to similar numpy operations. Yet there are still some problems. For example, boxing.

Thinking outside of the box

Kotlin eliminates a lot of performance problems known from Java. Inline functions allow functional operations to be performed without creating intermediate objects, thus removing the infamous stream processing overhead. Inline classes in some cases allow unnecessary allocations to be avoided (though handling inline classes is a bit tricky at the moment, since it is not always obvious whether the class will actually be inlined). Most importantly, coroutines avoid the hidden cost of thread creation, which has always been one of the major performance pitfalls of parallel programming. Modern JDKs are very good at inlining method calls, so even a complicated object structure does not introduce additional runtime costs. The major problem that is left is boxing. Let me explain it for those who do not know what I am talking about.
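The effect of inline functions can be sketched as follows. The helper `sumTransformed` is a hypothetical name introduced for this example. The first function goes through an intermediate `List<Double>`; the second one is compiled into a plain loop at the call site, with no function object and no intermediate collection.

```kotlin
// Pipeline style: `map` allocates an intermediate List<Double>,
// whose elements are boxed on the JVM.
fun sumOfSquaresPipeline(data: DoubleArray): Double =
    data.map { it * it }.sum()

// Hypothetical inline helper: the body and the lambda are copied into the
// call site, so the loop works directly on primitive doubles.
inline fun DoubleArray.sumTransformed(transform: (Double) -> Double): Double {
    var sum = 0.0
    for (x in this) sum += transform(x)
    return sum
}

fun sumOfSquaresInlined(data: DoubleArray): Double =
    data.sumTransformed { it * it }

fun main() {
    val data = doubleArrayOf(1.0, 2.0, 3.0)
    println(sumOfSquaresPipeline(data)) // 14.0
    println(sumOfSquaresInlined(data))  // 14.0
}
```

Whether the difference is measurable depends on the data size and on what the JIT manages to optimize, so it is worth checking with a profiler rather than assuming.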

As you probably know (or should know), in most modern languages there are two kinds of variables: primitive types like numbers and booleans, and reference types, also known as classes. Different languages treat them differently, but on the JVM primitives have separate types and are always passed by value, while reference types are passed by reference. It is possible to wrap a primitive in an object that holds it and thus create a reference type, frequently called a box. Using boxes has both pros and cons. On the plus side, one can hide an abstraction behind the reference and use the same generic reference for different implementations; for example, a Number could hold a double, an int, or even a complicated structure like a BigDecimal. Also, one can use boxes in structures that work with references, like Lists. The minus is that each access to the value inside the box requires an additional dereference operation and heap access, which is rather expensive compared to operations on primitives. Java 1.5 and later has a feature called autoboxing, which allows passing a boxed value like Integer to places that require a primitive int and vice versa. The compiler automatically inserts the code that puts the value into the box or extracts the primitive from the box. The bad thing is that performance still suffers from these operations, especially if one performs multiple boxing-unboxing operations in a row.
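The difference between primitive and boxed storage can be observed directly on the JVM. The sketch below is JVM-only, and the helper `storesPrimitives` is a hypothetical name invented for this example. It asks the runtime class of an array what its element type is.

```kotlin
// JVM-only: returns true if `array` is an array whose elements are stored
// as primitives, false for arrays of references and for non-arrays.
fun storesPrimitives(array: Any): Boolean =
    array.javaClass.componentType?.isPrimitive ?: false

fun main() {
    val primitive = doubleArrayOf(1.0, 2.0) // double[]: values stored in place
    val boxed = arrayOf(1.0, 2.0)           // Double[]: references to boxes on the heap

    println(storesPrimitives(primitive)) // true
    println(storesPrimitives(boxed))     // false
}
```

Both arrays look almost the same in Kotlin source code, which is part of the reason boxing is easy to miss.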

From the developer’s point of view, it means that to get good performance on primitive operations, one needs to write specialized code that deals with primitives and primitive arrays, like double[] in Java or DoubleArray in Kotlin. And it is really hard to write code that will work fast on generic numbers. The boxing problem is also present for structures, but there it does not have such a dramatic performance impact as for primitive operations.

Kotlin is both good and bad in terms of boxing. It is good because there is no distinction between primitive numbers and boxed numbers in the language; the compiler makes the decision automatically. It will use the unboxed variant if possible and the boxed one if not. On the bad side, it is much harder to tell just by looking at the code whether boxing happens or not. The same goes for inline classes. They can be used to avoid object boxing locally, but if an instance is passed somewhere else, it may become boxed, and it is really hard to find out where that happens without decompiling the bytecode. Also, one needs to remember that Kotlin function types are generic by nature, so any primitive or inline class value passed through them will be boxed if the function is not inlined. Those problems can be avoided by the Developer, but they require careful handling. In the future, the problem of boxing on the JVM will probably be partially solved by the introduction of Valhalla value types and better escape analysis in GraalVM (even now Graal shows a very promising boost on boxed array evaluation). The Kotlin language team is also working on language-specific solutions.
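One possible way to handle the function-type case, when inlining is not an option, is a specialized functional interface. This is a sketch of a common workaround, not something the Kotlin team prescribes, and `DoubleTransform` is a hypothetical name (Java has a similar `DoubleUnaryOperator`). The interface method has a primitive signature on the JVM, so no box is needed to call it.

```kotlin
// Generic function type: on the JVM this is Function1<Double, Double>,
// so the argument and the result pass through boxes unless the JIT
// manages to remove them.
fun applyGeneric(data: DoubleArray, f: (Double) -> Double): DoubleArray =
    DoubleArray(data.size) { i -> f(data[i]) }

// Hypothetical specialized interface: `apply` takes and returns
// a primitive double on the JVM.
fun interface DoubleTransform {
    fun apply(x: Double): Double
}

fun applySpecialized(data: DoubleArray, f: DoubleTransform): DoubleArray =
    DoubleArray(data.size) { i -> f.apply(data[i]) }

fun main() {
    val data = doubleArrayOf(1.0, 2.0, 3.0)
    val generic = applyGeneric(data) { it * it }
    val specialized = applySpecialized(data) { it * it }
    println(generic.contentEquals(specialized)) // true
}
```

The call sites look identical, so the choice between the two stays hidden from the user of the library.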

The boxing problem is not specific to Kotlin and Java; it arises in one form or another in all languages. Yes, even in those that have specialized solutions like value types. Python, for example, solves it by “brute force”: it just determines the dynamic type and then uses a specialized native implementation for this type. This solution is available in Kotlin as well. You can just write specialized versions for mathematics and they will work really fast (similar to or in some cases even faster than a native implementation). Or you can just use JNI to connect to your favorite native implementation. Access to native libraries is more cumbersome on the JVM than it is in Python, but in fact not by much.
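The “brute force” approach described above might look like this in Kotlin. It is a minimal sketch; the function `sumAll` is hypothetical and only shows the idea: check the runtime type once, then run a loop specialized for that type, with a slower boxed fallback for everything else.

```kotlin
// Dispatch on the runtime type once, then use a specialized loop.
fun sumAll(values: Any): Double = when (values) {
    is DoubleArray -> {
        // Fast path: primitive doubles, no boxing.
        var sum = 0.0
        for (x in values) sum += x
        sum
    }
    is IntArray -> {
        // Fast path: primitive ints, widened to double.
        var sum = 0.0
        for (x in values) sum += x
        sum
    }
    is Iterable<*> -> {
        // Fallback: elements are boxed Numbers of unknown concrete type.
        var sum = 0.0
        for (x in values) {
            require(x is Number) { "only numbers can be summed" }
            sum += x.toDouble()
        }
        sum
    }
    else -> throw IllegalArgumentException("unsupported container: ${values.javaClass.name}")
}

fun main() {
    println(sumAll(doubleArrayOf(1.0, 2.0, 3.0))) // 6.0
    println(sumAll(listOf(1, 2.5)))               // 3.5
}
```

The type check costs almost nothing compared to the loop, so the dispatch pays off for any array of non-trivial size.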

The verdict

  • You do not need a good language to write a good program if you are a good programmer. And yes, good means fast as well.
  • You still want to use a good language even if you are a good programmer, because a good language will allow you to write the program faster and more safely. It also means that your program will evolve faster, and tooling is important.
  • The user does not need a good language; the user wants a simple language. And the user does not want a good program, but a working program. It is OK to write simple working programs, but in the long run, the evolution will be slow and painful.
  • Language performance is a myth. What matters is the ability of the program to solve problems. If the problem does not need high performance, you do not need to optimize the program. If the program needs to be fast, that can be done in any language. So the language should be selected for its convenience, not for mythical performance. In real life, there are only a limited number of places where performance matters, and those places can easily be written once and hidden inside libraries.
  • The JVM ecosystem, and Kotlin in particular, are mature and comfortable enough to be used for high-performance problems. And the language is really a good compromise between simplicity and flexibility. It has some JVM-based limitations, but performance problems are mostly solved, and the one actual problem, boxing, can be avoided (and in fact will be solved in the future by Valhalla, GraalVM, or both). So yes, you can write performance-critical applications in the nice Kotlin language without a native back-end.

Afterword

I intended this article to be much more technical, but in the end, it does not matter. There are a lot of articles about performance out there, and most of them do not make sense. What I wanted to say is that you can make your program run fast in any language. What matters is that you make it fast without sacrificing simplicity and language features. You also need to keep a balance in all things.

For illustration I used snapshots from the popular Russian cartoon Смешарики (there is an English translation out there, but the original is better). The plot of this specific episode is that there is a race in the desert and everyone builds a bizarre-looking race car. In the end, everyone crashes: one is too fast and can’t turn in time, one blows up, one goes flying, and after all that the winner is the simplest and slowest one (and that is not the conclusion I want you to draw from the article above).

Alexander Nozik

Senior research scientist at MIPT, (ex) team lead at JetBrains Research.