How to Solve the Same Numerical Problem in 7 Different Programming Languages

Andreaskuhn
11 min read · May 23, 2022


All logos are under a creative commons license and taken from their respective Wikipedia pages: Python, C++, Rust, Perl, R, Julia, FORTRAN

Finding the right tool to solve a problem can often be a big problem by itself. In the case of programming, this translates to choosing the right programming language. Obviously, a lot of factors come into play, like:

  • How skillful are you in a language?
  • Are there packages that provide most of the functionality needed?
  • Or do you have to write a lot from scratch?
  • Is execution speed a concern?
  • Should the code be easily readable for other programmers/users?

The first two factors are beyond my control and typically already constrain you to one or two languages. Therefore, to make things more interesting, let’s assume an idealistic scenario where you want to do a small project in which almost all code has to be written from scratch. Additionally, you possess no prior knowledge of any programming language.

Therefore, I will show you a simple numerical problem together with its solution in 7 different programming languages. As a fair comparison between programming languages is quite a complicated endeavor, you will have to endure a lot of generalizations and simplifications in the next sections. Nevertheless, I will work out some major rules of thumb that might be very useful for your future (real) projects.

The problem we will be looking at is a numerical solution of the logistic differential equation. This equation describes the time evolution of a quantity U. It typically represents a population of some kind (e.g. the number of rabbits/viruses in a given system). Such a population shows exponential growth until external factors (limited food, space, hosts) force it into a saturated state, where the population growth asymptotically approaches zero. The equation has this form:
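(The original equation image is missing; reconstructed here in its normalized form, with growth rate and carrying capacity set to 1, matching the function f(U) = U*(1-U) used later in the article:)

$$\frac{\mathrm{d}U}{\mathrm{d}t} = U\,(1 - U)$$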

And its solution with a very small starting population looks like this:

time evolution of population U

One possible way to solve such a differential equation numerically is to discretize it and calculate it for small time steps iteratively (Euler method). The discretized version of the equation looks like this:
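(The original image is missing; the discretized equation, reconstructed from the Euler method applied to the normalized logistic equation:)

$$U(t + \Delta t) = U(t) + \Delta t \cdot U(t)\,\bigl(1 - U(t)\bigr)$$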

The next step is to choose a starting value U(t = 0), calculate the value of U(0+Δt), and then iteratively insert the calculated solution again into the same equation:
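(The original image is missing; the iteration it showed can be reconstructed as:)

$$U(\Delta t) = U(0) + \Delta t \cdot U(0)\,\bigl(1 - U(0)\bigr)$$
$$U(2\Delta t) = U(\Delta t) + \Delta t \cdot U(\Delta t)\,\bigl(1 - U(\Delta t)\bigr)$$
$$\ldots$$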

This is done until the final timestep t = T is reached. We have to do this calculation N = T/Δt times.

The most natural way to express such a pattern in a programming language is a loop. I will start with the most popular programming language right now: Python. After that, I will show you what would be different, better and/or worse if I had used R, Perl, Julia, Rust, C++ or FORTRAN instead.

An exemplary solution in Python looks like this:
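The original code listing is not included here; a minimal sketch of what it could have looked like (the step size, number of steps, and starting value are assumptions, as the article does not state them):

```python
import numpy as np

# Euler method for the logistic equation dU/dt = U*(1-U).
def solve_logistic(n=1000, dt=0.02, u0=1e-4):
    f = lambda u: u * (1 - u)      # right-hand side of the equation
    u = np.zeros(n)
    u[0] = u0                      # very small starting population
    for i in range(n - 1):         # insert the last value back into the equation
        u[i + 1] = u[i] + dt * f(u[i])
    return u

u = solve_logistic()
```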

Python lives up to its reputation for readability, as it is possible to express the problem in a very compact format. Aside from the somewhat strange lambda syntax for defining a compact function, it is very clear what is going on. The only external package used is numpy, which provides fast arrays and various functions associated with them. The other most important property of a program is its runtime. In this case, the function took 714 μs to execute. For a human, such a timespan is unbelievably fast, but for a modern CPU, where a single clock cycle takes under a nanosecond, 714 μs is not really that impressive. Let’s see how the other languages perform here.

Note: Precise runtime measurements are complicated, as the runtime depends on many factors (hardware, OS, background load, cached data, energy-saving modes, …). We tried to make it as comparable as possible by always using the same machine (Intel® Core™ i5–10310U) with the same OS (Ubuntu 20.04 LTS) and without any software running in the background. The function was executed 1000 times and the mean runtime is given. All runtimes given in this post have been determined in that way.

Another very popular language, especially in the space of data science, is R. An implementation in R looks like this:
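The original listing is not included here; a sketch of what it could have looked like (parameter values are assumptions, as in the Python version):

```r
# Euler method for the logistic equation dU/dt = U*(1-U).
solve_logistic <- function(n, dt, u0) {   # curly brackets delimit the block
  f <- function(u) u * (1 - u)            # no one-line lambda form as in Python
  u <- numeric(n)
  u[1] <- u0                              # R arrays are 1-indexed
  for (i in 1:(n - 1)) {
    u[i + 1] <- u[i] + dt * f(u[i])
  }
  u
}

u <- solve_logistic(1000, 0.02, 1e-4)
```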

As R is another high-level language, the code can be expressed in a similarly compact way. One major difference is the use of curly brackets to delimit code blocks, compared to indentation in Python. In addition, arrays are 1-indexed instead of 0-indexed, and there is no similarly compact way of defining a function. Judging by this small example, the syntax is a little bit “uglier” and harder to read than the Python version, but it is still quite easy to grasp what’s happening. The runtime (R: 510 μs) is in the same order of magnitude (Python: 714 μs).

A quite “new” contender (first release 2012) in the space of data science and numerical computing is the programming language Julia.

Fun fact: The name Jupyter, from Jupyter notebook (which we mainly used for this project), stems from the three languages Julia, Python and R.

A Julia implementation of the problem looks like this:
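The original listing is not included here; a sketch of what it could have looked like (parameter values are assumptions, the short function definition f(U) = U*(1-U) is quoted from the article):

```julia
# Euler method for the logistic equation dU/dt = U*(1-U).
f(U) = U * (1 - U)                # compact, math-style function definition

function solve_logistic(n, dt, u0)
    u = zeros(n)
    u[1] = u0                     # Julia arrays are 1-indexed, as in R
    for i in 1:n-1                # blocks are closed with the `end` keyword
        u[i+1] = u[i] + dt * f(u[i])
    end
    return u
end

u = solve_logistic(1000, 0.02, 1e-4)
```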

As mentioned before, Julia is the youngest language of the three and took major inspiration in its design from MATLAB, Python and R. This results in a similarly “beautiful” syntax as Python, but some things are different as well. There is a whole section of the documentation about how Julia differs from other languages ;). Just a few examples: arrays are 1-indexed as in R, and the end keyword is used for code structuring. One thing that’s better (at least in my opinion) in Julia than in Python is its mathematically inspired way of defining a short function: f(U) = U*(1-U).

But enough about syntax: one of the big selling points Julia promises is high execution speed compared to other high-level languages like Python, R, Ruby, … And we can confirm that: the Julia implementation took 6.39 μs to execute, which is two orders of magnitude faster than the comparable Python and R implementations (714 μs, 510 μs). So far, we have a clear winner in terms of combined syntax and speed score.

Note: All the following implementations have not been written solely by me, but were hacked together at a social event called “Hackyhour” at our institute, the CCTB at the University of Würzburg. Special thanks go to iimog and CaptainSifff and all other participants for their contributions.

So let’s think a bit outside the box and take a look at a language that was very popular in the early days of the internet and still has a lot of hardcore fans, but at least as many people who profoundly hate it: Perl.

There is actually a Perl kernel for Jupyter notebooks called IPerl, which we used here; there are also many more kernels for all kinds of programming languages.
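The original listing is not included here; a sketch of what it could have looked like, based on the features described below (parameter values are assumptions):

```perl
use strict;
use warnings;
use PDL;   # provides zeroes and xlinvals as a numpy-linspace equivalent

# Euler method for the logistic equation dU/dt = U*(1-U).
sub solve_logistic {
    my ($n, $dt, $u0) = @_;       # arguments arrive in the special array @_
    my @u = (0) x $n;
    $u[0] = $u0;
    for my $i (0 .. $#u - 1) {    # $#u is the last index of @u
        $u[$i + 1] = $u[$i] + $dt * $u[$i] * (1 - $u[$i]);
    }
    return \@u;                   # a backslash creates a reference to the array
}

my $n  = 1000;
my $dt = 0.02;
my $t  = zeroes($n)->xlinvals(0, $n * $dt);  # linearly spaced time axis via PDL
my $u  = solve_logistic($n, $dt, 1e-4);
print "final population: $$u[-1]\n";
```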

In the blink of an eye, we can see that we are dealing with a totally different beast compared to the previous languages. The first thing that catches the eye is that statements have to be ended with a semicolon. This is somewhat ancient, but we could live with it; however, there is a lot more strange stuff going on. For example, the first symbol of a variable name defines its type: $ for scalars (single values like integers or floats) and @ for arrays…

Also, Perl makes heavy use of special variables. Instead of defining the parameters of a function (`sub`) explicitly, they automatically end up in the special array `@_`. The last index of the array `@u` is stored in the associated special variable `$#u`, which can be used not only for reading the length but also for changing it. A backslash is used to create a reference to an array (e.g. `\@u`). For a short and convenient variant of numpy’s `linspace`, the PDL module is used, which provides `zeroes` and `xlinvals`.
Being aware of Perl’s peculiarities, it becomes clear that the code behaves identically to the Python, R, and Julia versions.

Overall, the syntax is not very welcoming, as a lot of stuff appears unnecessarily complicated. Of course, Perl fans would say that this is exactly why it is so powerful… Nevertheless, Perl has lived up to its reputation of being very hard to read for someone not too familiar with the language.

But let’s look at performance: how does this beast of a language compare to the others? Not too bad: the Perl implementation took 235 μs to execute, which beats Python (714 μs) by a factor of 3 and R (510 μs) by a factor of 2. Compared to Julia (6.39 μs), however, it is over 30 times slower.

Enough of the high-level languages; let’s go more low-level and take a look at languages that use fewer abstractions but also promise higher speeds. The first one we are looking at is Rust:

Rust also has a kernel for Jupyter notebooks called evcxr, which we used here.
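The original listing is not included here; a sketch of what it could have looked like (parameter values are assumptions):

```rust
// Euler method for the logistic equation dU/dt = U*(1-U).
fn solve_logistic(n: usize, dt: f64, u0: f64) -> Vec<f64> {
    let f = |u: f64| u * (1.0 - u); // right-hand side of the equation
    let mut u = vec![0.0_f64; n];   // element type is declared explicitly
    u[0] = u0;
    for i in 0..n - 1 {
        u[i + 1] = u[i] + dt * f(u[i]);
    }
    u
}

fn main() {
    let u = solve_logistic(1000, 0.02, 1e-4);
    println!("final population: {:.5}", u[u.len() - 1]);
}
```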

Compared to the previously shown languages, one different concept can be spotted instantly: static types. We have to tell the compiler explicitly what type the arguments of a function or the elements of an array have. Declaring types manually can be inconvenient in some cases, but it offers more control over what is actually happening in the program. The rest of the syntax looks a bit “clunkier” than Python or Julia, but far more readable than Perl.

The performance is great, as the Rust implementation took only 5.24 μs to run, beating the previous champion Julia (6.39 μs).

Rust is of course not the only low-level language. Let’s take a look at probably the most used language in this domain: C++.

There is also a kernel for C++ in Jupyter notebooks called xeus-cling, which we used here.

Unfortunately, there is no function in the C++ standard library (or at least none that we could find) to create a linearly spaced array. Therefore, I had to copy one from Stack Overflow. Like Rust, C++ is a statically typed language. A new concept here are iterators: variables that point to an element in a range of elements (like an array). To get the actual value of the element, the dereference operator * is used. This allows quite compact syntax in the body of the loop, but the additional complexity also makes reading/understanding the code harder.

This results in an execution time of 37.97 μs, which is surprisingly slow compared to Rust (5.24 μs). But some C++ experts quickly pointed out that this is probably due to the xeus-cling kernel.

As the C++ experts couldn’t stand seeing their language losing, they quickly came up with an optimized C++ implementation compiled with g++:

The optimisation mainly consisted of deleting unnecessary code and focusing solely on solving the differential equation. The biggest timesaver in that regard was initializing the u array outside the function. The array is passed to the function as an argument and only gets modified, not reinitialized, every time the function is called.

Additionally, a lot of runtime optimization is possible by choosing the right compiler flags for g++. All compiler flags come with costs and benefits in compile time, execution speed, memory usage, portability, … But as the goal was to beat Rust in execution speed, we settled on these:

-O3 -march=native -ffast-math

  • -O3 is the highest optimization level for execution speed
  • -march=native allows the binary to be specifically optimized for the CPU generation it is compiled on (which limits portability)
  • -ffast-math enables the compiler to use a number of optimizations in floating-point arithmetic that can cause a loss in precision (in this case there was no difference).

The results were impressive, as the C++ implementation now took only 2.21 μs to run. This made it the leader, comfortably beating Rust (5.24 μs) by a factor of 2.

As one of the C++ experts happened to know FORTRAN as well, he wrote an implementation for gfortran.
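The original listing is not included here; a sketch of what it could have looked like (parameter values are assumptions):

```fortran
! Euler method for the logistic equation dU/dt = U*(1-U).
program logistic
    implicit none                      ! disable implicit typing (see below)
    integer, parameter :: n = 1000
    real(8) :: u(n), dt
    integer :: i
    dt = 0.02d0
    u = 0.0d0
    u(1) = 1.0d-4
    do i = 1, n - 1
        u(i + 1) = u(i) + dt * u(i) * (1.0d0 - u(i))
    end do
    print *, 'final population:', u(n)
end program logistic
```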

I don’t want to go into too much detail about the syntax of FORTRAN, as the code speaks for itself. In short: today, nobody should ever use FORTRAN outside of existing legacy projects.

But there is one oddity of FORTRAN that is too strange not to mention here: the implicit none statement.

It disables a “feature” stemming from a time long ago, when the file size of the source code itself could be a limiting factor. To save type declarations like integer :: in front of variables, all variable names starting with the letters i, j, k, l, m and n were treated as integers and all other variables as reals/floats. Nowadays, file size is never a concern, and automatically assigned types based on variable names are a nightmare to debug. Therefore, the implicit none statement can be found in every “modern” FORTRAN program. Why it never became part of the standard definition of the language is another question…

Fun fact: As g is treated as a real variable, god is real in FORTRAN ;).

But enough of the hate; let’s look at the bright side of FORTRAN: the speed. And it does not disappoint here. The compiled program, using the same compiler flags as for C++, runs in 2.24 μs, which is basically identical to C++ (2.21 μs).

I wasn’t satisfied with seeing such old languages win; there has to have been some progress in language design. So I took a second look at the Julia version, as it was the one that showed the biggest potential. Using all the tricks the C++ guys showed me, I settled on this optimized Julia version:
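The optimized listing is not included here; a sketch of what it could have looked like, using the two macros described below (parameter values are assumptions):

```julia
# Optimized Euler loop: the array is allocated outside the function, and
# @fastmath / @inbounds relax floating-point rules and bounds checks
# only inside the loop they annotate.
function solve_logistic!(u, dt)
    n = length(u)
    @fastmath @inbounds for i in 1:n-1
        u[i+1] = u[i] + dt * u[i] * (1 - u[i])
    end
    return u
end

u = zeros(1000)
u[1] = 1e-4          # initialized once, outside the timed function
solve_logistic!(u, 0.02)
```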

This function also solely calculates the solution of the differential equation and does not initialize any arrays. The macros @fastmath and @inbounds were used, which work somewhat similarly to the compiler flags in g++:

  • @fastmath allows the just-in-time compiler of Julia to use similar optimization pathways in floating point arithmetic as the -ffast-math flag of g++.
  • @inbounds disables bound checking of arrays.

The big difference between these macros and compiler flags is that these optimizations only happen inside the scope of the macros, allowing more fine-grained optimization.

The results are impressive. The optimized Julia implementation now runs in 1.99 μs, narrowly beating C++ (2.21 μs) and FORTRAN (2.24 μs).

To sum this all up, here are all measured runtimes in one overview:

  • Python: 714 μs
  • R: 510 μs
  • Perl: 235 μs
  • C++ (xeus-cling): 37.97 μs
  • Julia: 6.39 μs
  • Rust: 5.24 μs
  • FORTRAN (gfortran, optimized): 2.24 μs
  • C++ (g++, optimized): 2.21 μs
  • Julia (optimized): 1.99 μs

Now we can conclude with the general trends we have seen here:

Python and R are easy to use but slow.

Perl is hard to use and slow.

Rust is hard to use and fast.

C++/FORTRAN are even harder to use, but if used correctly can be very fast.

Julia is easy to use and fast and can become very fast with a bit of tweaking.

Therefore, if you need a fast and easy-to-use programming language, you should definitely take a look at Julia.


Andreaskuhn

MSc in Physics. Slowly transitioning into Bioinformatics/Data Science. PhD student at the CCTB of the University of Würzburg.