Comparing Python, Java and Go performance
I am a compiled-language chauvinist and always have been. While interpreted languages may help a developer start writing and testing code faster, I see a compiler as the best possible long-term investment. The way I see it, you get at least two huge advantages when your code is compiled:
- You get all your code checked every time you change it, even before you can start using it.
- You get much faster execution times. Depending on the particulars of the task at hand, this may translate into significantly lower infrastructure costs, perhaps even an order of magnitude lower for installations with really high traffic.
Finding out how much faster compiled code is than interpreted code is one of the main reasons I am writing this article.
Given my acknowledged preference for compiled over interpreted languages, I am now facing a challenge: I have inherited a vast amount of code I have a vested interest in, written entirely in Python, whose best-known and most tested implementations are based on interpreters, not compilers. What should I do with it? Rewrite all of it? Part of it? None?
In this post I examine some of my preconceptions by comparing the performance of the same task in three programming languages: Java, Go and Python. Why Python is obvious: it is the language I am considering replacing. Java, because I have been a fan of it for almost two decades and have watched it mature while its performance and functionality got better and better. Finally, over the last two years I have started using and really liking Go. While there are elements of Java I still miss in Go, such as class inheritance, the language syntax is clean and compact, compilation and execution are fast, the generated binaries are compact, and its out-of-the-box, goroutine-based approach to concurrency is really elegant.
Some of the preconceptions I mentioned are:
- Compiled code runs at least one order of magnitude faster than interpreted code. I base this opinion on previous experience comparing the performance of Java code before and after the JIT has compiled the bytecode. I found the ratio to be approximately 30 to 1.
- Go code runs a bit faster than Java. I remember some tests done at my previous job that found Go to be up to 30% faster than Java in some tasks, but I have read recent blogs claiming Java to be a bit faster than Go.
I took advantage of the Java code from another article I wrote recently: https://email@example.com/understanding-java-jit-compiler-787f87daa110. I rewrote it in Python and Go and compared the results. The code measures and prints the execution time in nanoseconds of 200 batches, each computing the Fibonacci number of 100 fifty times.
The code used to produce the results can be found at: https://github.com/rodrigoramirez/fibonacci.
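For readers who do not want to open the repository, here is a minimal Python sketch of the shape of the benchmark described above. It is not the code from the repository: the iterative fibonacci function and the names run_batches, batches and calls_per_batch are my own assumptions for illustration.

```python
import time


def fibonacci(n):
    """Return the n-th Fibonacci number iteratively (assumed implementation)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def run_batches(batches=200, calls_per_batch=50, n=100):
    """Time each batch and return the per-batch durations in nanoseconds."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter_ns()
        for _ in range(calls_per_batch):
            fibonacci(n)
        timings.append(time.perf_counter_ns() - start)
    return timings


if __name__ == "__main__":
    # One line of output per batch, as in the description above.
    for elapsed in run_batches():
        print(elapsed)
```

The absolute numbers this sketch prints depend on the machine and on how the timing is taken, so they should not be expected to match the tables in this post.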
The output for Java, Go and Python, skipping the initial lines of each, looked like this (times in nanoseconds):
Java   Go     Python
122    123    11683
119    107    11539
123    104    11358
120    115    11926
119    118    11973
120    104    11377
109    103    12960
127    122    15683
112    106    11482
The average times in nanoseconds, again ignoring the initial lines, are:
Java   Go     Python
130    105    10050
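The averaging step, which drops the initial lines so that warm-up effects such as Java's JIT compilation do not skew the result, could be sketched as follows. The function name steady_state_average, the warmup_batches parameter and the sample timings are all hypothetical; how many lines to discard is a judgment call made by looking at the output.

```python
from statistics import mean


def steady_state_average(timings_ns, warmup_batches):
    """Average the timings after discarding the first warmup_batches entries."""
    steady = timings_ns[warmup_batches:]
    if not steady:
        raise ValueError("no measurements left after discarding warm-up")
    return mean(steady)


# Made-up timings: the first two batches are slow, as they would be
# while a JIT compiler is still compiling the hot code.
sample = [9500, 4200, 130, 120, 125, 125]
print(steady_state_average(sample, warmup_batches=2))
```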
So, when calculating Fibonacci numbers, Java code is a bit slower than Go code, about 24% slower this time. Python, on the other hand, is almost 100 times slower than Go: 9458% slower according to my test.
This test confirmed my original ideas about Java and Go. But what a nasty surprise it was to see how much worse interpreted Python performs: not one order of magnitude worse, as I suspected, but two.
With results like this, I wonder why Python is used so much.
The first thought that comes to mind is that a lot of people value code simplicity and the ability to get results quickly over performance and long-term costs. I believe this to be the case for data scientists: since Python is well known for its vast number of libraries, why bother looking around? Optimization can certainly come later.
A second idea is simply that many people never measure their results or compare them with alternative implementations. With so many startups and fierce competition to get something out, perhaps the optimization phase never arrives.
A third option may be that there are ways to make the same code written in Python perform much faster.
What About Compiled Python
After some research I decided to test the same Python code using PyPy, a Python implementation written in RPython (a restricted subset of Python) that includes a JIT compiler, just like Java does. As with Java, we need to ignore the initial output and skip the part produced while the code is still being JIT compiled. Cutting to the chase, the averages table including a PyPy column looks like this:
Java   Go     Python   PyPy
130    105    10050    1887
While the average time using PyPy is more than 5 times faster than with interpreted Python, it is still about 18 times slower than Go.
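The ratios can be checked with a few lines of Python. This is only a sketch working from the rounded averages in the table above; the helper name slowdown_vs is my own.

```python
# Averages in nanoseconds, copied from the table above.
averages_ns = {"Java": 130, "Go": 105, "Python": 10050, "PyPy": 1887}


def slowdown_vs(baseline, averages):
    """Return how many times slower each entry is than the baseline."""
    base = averages[baseline]
    return {name: value / base for name, value in averages.items()}


ratios = slowdown_vs("Go", averages_ns)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the Go time")
```

With these rounded averages, Java comes out at about 1.24 times the Go time, interpreted Python at almost 96 times, and PyPy at roughly 18 times.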
The Need for Additional Tests
The test above is a rather generic one focused on integer arithmetic. Going back to the Python code I mentioned in the introduction, I will also need to pay attention to:
- I/O via Kafka, HTTP listeners and databases
- JSON message parsing
This article shows that, when processing simple arithmetic operations, code written in Go is a bit faster than its Java counterpart and two orders of magnitude faster than interpreted Python code.
Personally, I would not replace well-known and tested Java code with its Go equivalent based on these results alone.
On the other hand, the test shows that Python code on a high-traffic critical path may not be a good option. If you face this situation, consider using a Python compiler such as PyPy as a short-term workaround.
To make the final decision about what should be rewritten and when, other factors not explored in this post should be considered, such as how I/O-intensive versus CPU-intensive the code to be rewritten is.
From the Comments
The comment from Shaun Dream made me realize that the code I wrote for Go and Java, which uses 64-bit integers, can only compute Fibonacci numbers up to the 92nd before the results are corrupted by arithmetic overflow. Having said that, it is my opinion that the reasoning and conclusions of this article remain valid.
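The overflow limit is easy to verify. Python integers have arbitrary precision, so the sketch below computes the exact values and then simulates the wraparound of a signed 64-bit integer, which is what a Java long or a Go int64 does on overflow. The helper wrap_int64 is my own illustration, not part of the benchmark code.

```python
INT64_MAX = 2**63 - 1


def fibonacci(n):
    """Return the exact n-th Fibonacci number (Python ints never overflow)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def wrap_int64(value):
    """Simulate two's-complement wraparound of a signed 64-bit integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


for n in (92, 93, 100):
    exact = fibonacci(n)
    fits = exact <= INT64_MAX
    print(f"fib({n}) = {exact}, fits in int64: {fits}, wrapped: {wrap_int64(exact)}")
```

Because addition modulo 2^64 gives the same result whether the wrap is applied at every step or once at the end, wrapping the exact value reproduces what a 64-bit loop would hold. The 92nd number still fits; from the 93rd onwards the stored value no longer equals the true Fibonacci number, although the amount of arithmetic performed per call stays the same.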