Meeting Julia, a great new alternative for numerical programming, Part II: a high-level perspective

The Sloan Foundation 2.5m Telescope at Apache Point Observatory.

A couple of months ago I started learning Julia and wrote an article investigating its renowned performance. I wanted to see with my own eyes what everybody was talking about, since I was still a little skeptical, and I was pleasantly surprised. Julia really does a great job of generating high-performance machine code, on par with languages such as Java, Go, Fortran or C++.

That article covered low-level aspects of the language and its performance. Today I deliver on the promise I made back then to write a new article with a high-level perspective. What does Julia feel like as a language, and what is it like to work with it? I’ll mostly talk about how I understand Julia in the context of modern programming, compare Julia with other languages and discuss some of its unique features.


Recap

Let’s first review some fundamental facts about Julia. The language was built on top of LLVM, putting Julia in the same generation of languages as Swift and Rust, and giving it something in common with other projects such as Clang, Numba, Scala Native and the Haskell GHC. Of these projects, Julia is the only one created from scratch with a clear commitment to enabling scientific and high-performance numerical work, while also being suitable for generic applications, unlike, say, Matlab or R.

Julia performs ahead-of-time compilation at runtime: each method is compiled to machine code for the concrete argument types it is called with, right before it first runs. This is not exactly JIT (just-in-time) compilation as done by the JVM, but it means your code gets compiled interactively, unlike, say, C++. Julia is not an interpreted language in the sense of Python or Ruby, and it does not have a virtual machine. It does have a command prompt that lets you interact with the system in the same fashion as Python, Matlab, LISP, and your good old Unix shell.

Julia is very terminal-friendly

Interface

The Julia creators did a great job of making it possible to develop in different ways, so you can freely pick what is most convenient for you. It can be used from a terminal, as the figure above shows, and you can run programs from a text file created with your favorite text editor. I have mostly been using Julia from within Emacs, and I’m quite happy with how it works. I only miss more sophisticated features such as code reformatting and go-to-definition, but I bet these are coming eventually.

If you are more into graphical environments, the best option seems to be Juno, a Julia IDE built on top of Atom. The figure below shows a screenshot of it, with an embedded plot.

Juno IDE on Atom, showing a nice plot

Just to be clear: you don’t need a graphical IDE to have graphical plots. If you plot from the command line using the Matplotlib back-end, it will open up a window, pretty much like using Python in the terminal. The Julia Plots package supports many back-ends, including Matplotlib, text mode, Plotly.js in the browser, GR, and even PGFPlots for great integration with LaTeX.

Another great alternative for working with Julia is Jupyter Notebooks. Julia put the Ju in Jupyter! I, for one, am very excited to have access to such an amazing technology, which lets me work with a compiled language on a remote machine from a browser window. The next figure shows a JuliaBox session where I plot my favorite transcendental function and then check out the assembly code used to calculate it. The callq and vdivsd instructions perform a function call to calculate the sine and then a double-precision division.

Easy plots and disassembly on a web app, is this heaven?

One cool feature worth mentioning, even at the risk of sounding silly, is that all of these interfaces, from Juno to Jupyter, the Emacs julia-mode and even the terminal, support entering special characters by typing \theta<TAB>. It’s amazing.

A new interpretation

Perhaps one of the most popular ways to describe programming languages is to classify them as compiled versus interpreted, static versus dynamic, and also as strongly or weakly typed, or even “typed” at all. Stefan Karpinski made his own attempt at comparing Julia to other languages in these terms in one of his always excellent presentations. You can also read more of his thoughts in this StackOverflow answer.

A typedness-pureness-dynamicness language chart by Stefan Karpinski

I think the mark of a good attempt at making these classifications, observable in Stefan’s, is that it recognizes how inaccurate they actually are, and that a proper understanding of how different languages work requires more information than a chart like that can tell you. I personally think we should really try to come up with a more modern way of understanding languages.

Interpreted languages are just a kind of lazy compilation, and a compiler is just an eager interpreter. And what we call dynamic typing can be implemented in any static language by just rolling your own virtual method table or something similar. Modern languages simply offer this kind of thing implicitly, which can be nice. It is just a language feature, not unlike supporting native string or matrix types, or even for-loops to replace go-tos. It’s all just “syntactic sugar”…
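To make the “roll your own virtual method table” remark concrete, here is a minimal sketch in Python, used only as notation. The names `Boxed`, `ADD_TABLE` and `dyn_add` are hypothetical. Each value carries a runtime type tag next to its payload, and a hand-written table picks the implementation from the tags. That is roughly what a dynamic language does implicitly behind every `+`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

@dataclass(frozen=True)
class Boxed:
    """A hypothetical 'boxed' value: a runtime type tag plus the raw payload."""
    tag: str
    payload: Any

# Hand-rolled dispatch table: (left tag, right tag) -> implementation.
ADD_TABLE: Dict[Tuple[str, str], Callable[[Any, Any], Boxed]] = {
    ("int", "int"): lambda a, b: Boxed("int", a + b),
    ("float", "float"): lambda a, b: Boxed("float", a + b),
    ("int", "float"): lambda a, b: Boxed("float", float(a) + b),
    ("float", "int"): lambda a, b: Boxed("float", a + float(b)),
    ("str", "str"): lambda a, b: Boxed("str", a + b),
}

def dyn_add(x: Boxed, y: Boxed) -> Boxed:
    """Choose the implementation at runtime from the tags of both operands."""
    try:
        impl = ADD_TABLE[(x.tag, y.tag)]
    except KeyError:
        raise TypeError(f"no method for add({x.tag}, {y.tag})") from None
    return impl(x.payload, y.payload)

print(dyn_add(Boxed("int", 2), Boxed("float", 0.5)))
```

Dispatching on the tags of both operands, rather than only the first one, is closer in spirit to Julia’s multiple dispatch than to a classic C++ vtable.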

It may be hard to see Python as a lazy compiler, but the point is that whatever it does has to turn into machine-level instructions at some point. What different compilers and interpreters do is just try to be smarter about it, by anticipating or optimizing the code generation. From this point of view, languages only differ in how long it takes for the ‘/’ you type on your keyboard to become a vdivsd instruction in your CPU. In interpreters like Python this happens at the last moment, and in a strictly eager C++ compiler it happens before runtime, as it always has to produce a standalone executable binary. Julia can produce standalone executables too, currently using PackageCompiler.jl, but most often it is used with interactive compilation at runtime.
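CPython’s standard `dis` module gives a glimpse of this “lazy” end of the spectrum. In the sketch below, the `ratio` function is just an illustration. The `/` is compiled ahead of time into a single generic bytecode instruction, whose name differs across CPython versions. Which concrete division actually runs is only decided each time that instruction executes, based on the operand types.

```python
import dis
from fractions import Fraction

def ratio(a, b):
    # The source-level '/' we want to follow towards the machine.
    return a / b

# The body was compiled to bytecode before ratio is ever called.
division_ops = [
    ins for ins in dis.get_instructions(ratio)
    if ins.opname == "BINARY_TRUE_DIVIDE"                   # CPython <= 3.10
    or (ins.opname == "BINARY_OP" and ins.argrepr == "/")   # CPython >= 3.11
]
dis.dis(ratio)

# The same instruction performs very different work depending on runtime types.
print(ratio(1, 4))                # int / int gives a float
print(ratio(Fraction(1, 3), 2))   # dispatches to Fraction.__truediv__
```

Julia’s @code_native, seen in the JuliaBox screenshot, instead shows the division already lowered to a vdivsd for Float64 arguments.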

For years the JVM has challenged the concept of compiled versus interpreted languages by continually optimizing the machine code at runtime. And we haven’t really recovered from the invention of this technology yet, as we are still trying to understand languages in 1980s terms. Java and friends are conspicuously missing from the chart above.

I believe the big question today when comparing languages for high-performance numerical programming is how well the language can optimize your code to produce high-quality machine code handling basic native numerical types such as 64-bit floating-point numbers and integers. And the main reasons a language may fail to do that are either that it is a naïve interpreter that does not even try, or that your values are boxed to provide dynamic features you probably don’t want in the middle of your numerical processing.
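The cost of boxing is easy to see in CPython, which the sketch below uses purely for illustration. A list of floats holds pointers to separate heap objects, and each of those objects carries an object header in addition to its 8-byte double. The standard `array` module, by contrast, stores the raw doubles contiguously. The exact object size is a CPython implementation detail.

```python
import sys
from array import array

n = 1_000
values = [float(i) for i in range(n)]   # list of pointers to boxed float objects
packed = array("d", values)             # contiguous raw 64-bit doubles, unboxed

# Each Python float is a full heap object (header, type pointer, then the double).
boxed_float_size = sys.getsizeof(1.0)   # typically 24 bytes on 64-bit CPython
# The array stores only the 8-byte IEEE 754 payload per element.
unboxed_float_size = packed.itemsize

print(f"boxed float: {boxed_float_size} bytes, unboxed double: {unboxed_float_size} bytes")
```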

Java can actually be pretty good at numerical code. The main reason, I think, that prevents its more widespread adoption in this area is just that numerical work was never a priority for its creators and users during its development. In practice, that means fewer libraries, tools, and references, but also that you may too easily produce slow code due to lingering dynamic programming features such as boxed types. So it is not so much that it is impossible, but that it is hard to be sure of what is going on. And if it prevents you from using the nicest features of the language, what is the point of trying Java if your code ends up looking like C anyway?

More than questioning what we mean by our language classifications in a post-JVM world, I think it is important to realize that languages have been evolving in a way that makes them all pretty much alike. Modern C++ with template functions and type inference looks an awful lot like duck typing. Python, on the other hand, has been investing more and more in optional type annotations, checked by tools like mypy. The experience in Haskell, this bastion of static typing, is no different: it is considered good practice to write your functions as generically as possible. Also, it is statically typed but has a REPL, and things like dynamic typing and class inheritance can usually be replaced with Haskell-y things like sum types and typeclasses. So the meaning of these language description terms is really getting less accurate, dare I say more generic.
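Here is a small illustration of how optional those Python annotations are. The `total` function is hypothetical. A checker such as mypy reads the generic hints, but the interpreter ignores them at runtime. The call with strings therefore runs fine even though mypy would flag it.

```python
from typing import Sequence, TypeVar

# T may be int or float: a generic signature, in the spirit of a C++ template.
T = TypeVar("T", int, float)

def total(xs: Sequence[T]) -> T:
    """Sum a non-empty sequence; the annotations are only hints."""
    acc = xs[0]
    for x in xs[1:]:
        acc = acc + x
    return acc

print(total([1, 2, 3]))      # 6
print(total([1.5, 2.5]))     # 4.0
print(total(["a", "b"]))     # ab  (mypy would reject this call; CPython does not care)
```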

The situation with Julia is that it is very capable of creating generic and dynamic code using boxed types, not unlike Java and Haskell, but it puts a strong emphasis on being able to produce optimized code appropriate for numerical processing, and it gives you tools to confirm that this is happening.

A word about garbage collection

The past section did not mention garbage collection, which deserves a section of its own. Memory management and garbage collection must be one of the most regrettably misunderstood topics in modern computing. Talking about Julia seems like a great opportunity to offer some clarifications about how GC works, especially nowadays, and what it all means to a programmer.

Ironically, many programmers sometimes seem to despise technological innovation. Maybe that is because technology in computing moves too fast for anyone to keep up with, making it a daunting task to continually learn new tools. Or maybe programming is just difficult enough with the tools you already know, and sticking to familiar and effective techniques may often seem like a better choice than putting your trust in the unproven and unknown.

A wise programmer does well to maintain a scientific attitude of simultaneous skepticism and open-mindedness. There is a time for pragmatism at work, and the programming activity can easily breed conflict and futile arguments. But it is important to make sure we don’t let ourselves get dragged into that, and that productivity and the pursuit of truth and intellectual enlightenment remain our main priorities at the end of the day.

Most advances in computing have been passionately challenged by defenders of the good old ways. It happened when moving from batch processing to interactive systems, from single-user to multi-user systems, from mainframes to microcomputers, from eager to lazy evaluation, from assembly to programming languages like BASIC and Fortran, from GOTO to structured programming, and then from for-loops to functional combinators like map and filter. It seems no good idea in programming has ever been proposed without an army of detractors who thought it was the worst thing ever. It’s as if, for most people, learning to program is a traumatic experience to which they never find closure.

Automatic memory management is just one such technological innovation. Present since the first LISP interpreters, automatic memory management exists to solve very real and challenging problems that come up when you start building large and complex applications that simply cannot be implemented with trivial stack allocation or other obvious rules for deleting unused objects.

To believe that manual memory management code will often outperform automatic management is no different from thinking that assembly programmers will often obtain better performance than users of a compiler, or that a Fortran compiler will often beat a C compiler. This is just a fiction. In the case against GOTO, some people made good points that “there are things you cannot do like this…” But that was less a good argument in defense of GOTO, and more a missed opportunity to understand what high-level abstractions and compiler optimizations could be introduced to replace those uses of the instruction.

In the case of GOTO this is perhaps a more subjective matter, but at least since Zorn 92 the argument for automatic memory management has been much more objective. Of course, the first compilers may not have been that great at generating optimized code, but they got there with time. The first garbage collectors were probably not amazing when they arrived either, but they had time to mature, and a bad garbage collector is most certainly not worse than bad manual memory management code. At least a bad GC already has the advantage of letting you focus on coding your actual application instead of writing boring bookkeeping code.

One of the most important myths to dispel about garbage collection is that it necessarily implies horribly long pauses in your program. This may have been true for some early generations of the JVM, which were not developed with highly interactive applications in mind. But in general GC pauses are no different from normal operating system pauses, and you don’t often hear people complain about how awful it is to have an operating system because it keeps pausing your program and you have no control over it.

Running into problems with your GC is not unlike having some problem with your OS. The solution often involves something like tuning your system. Or you just have to change something in your application, such as making sure you are not holding unnecessary references, not making old objects refer to young objects, and not creating reference loops… All of these also create problems with manual memory management. And if you are really desperate, to the point that you are already using a real-time OS because you cannot afford the jitter, there is such a thing as hard real-time garbage collection. And I doubt manual management is a trivial matter in such applications either. There is nothing in the C or C++ standards saying that malloc and free should be hard real-time operations, by the way. What the C++11 standard does call for is minimal support for garbage collection, although compiler implementers have not really followed through on it.
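Reference loops are a good example of where tracing collection earns its keep. In CPython, which the sketch below uses for illustration, plain reference counting can never free a cycle, so a separate cycle collector has to find it. The `Node` class is hypothetical, and the exact behavior is specific to CPython.

```python
import gc
import weakref

class Node:
    """A hypothetical object that can point at another Node."""
    def __init__(self, name):
        self.name = name
        self.other = None

was_enabled = gc.isenabled()
gc.disable()                      # keep the automatic cycle collector out of the way
try:
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a       # a reference loop: a -> b -> a
    probe = weakref.ref(a)        # observes 'a' without keeping it alive
    del a, b
    # Each node is still referenced by the other, so refcounts never reach zero.
    alive_after_del = probe() is not None
    gc.collect()                  # the tracing cycle collector finds the loop
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect)   # True False on CPython
```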

GC is very handy for abstracting memory allocation, and it allows you to treat memory use as an aspect of your application separate from the domain logic. And it can do that with great performance. But on top of that, a generational compacting garbage collector can also provide excellent features such as moving objects around in memory to improve locality, so that cache misses and page faults are minimized, something that can be very important for top performance. This is a fact most GC critics omit from their arguments, and a feature that is practically unattainable with manual management, unless you are effectively implementing your own GC.

How to make fast Julia code

Well, but what does all of this mean for Julia? Julia has a custom garbage collector with some generational features, which allows it to offer most of the same convenience and goodness derived from GC in languages such as Python, LISP, and Java. Objects can really be passed here and there intuitively, without wasteful copies or any of the weirdness of non-garbage-collected languages. And what does that mean for high-performance programming? Does it mean you just go on writing beautiful high-level code, rely on the compiler and the GC, and that’s it?

The answer is… of course not! You don’t want to be garbage-collecting anywhere near the critical parts of your code! But that’s not because GC is bad: you don’t want to be doing any memory management whatsoever in the core of your numerical code, be it manual or automatic. This is true for any language, and manual management cannot save you from this. What would be the point of doing a lot of manual deallocation in your critical loops? Programmers should be aware of how their programs use memory, regardless of how it is managed, and do the right thing in each language to avoid the overhead of allocation and deallocation.

Usually, when we talk about garbage collection, the main issue is memory deallocation. This is only really critical in interactive applications where an object may be referred to by multiple other objects and threads, and memory is a precious resource. Complex computer graphics applications such as games seem to be a good example. This is not very frequent in scientific programming, where you often can and should pre-allocate, and deallocation may only happen at the end of the program. Effectively, that means that even though we have automatic memory management, we still end up doing this manual pre-allocation. There are excellent garbage collectors out there, but figuring out that pre-allocation for you is an area where automatic memory management has not made a lot of progress yet, with the notable exception of tensor compilers.
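The pre-allocation pattern itself is language-independent. Here is a minimal Python sketch with hypothetical function names. The draft version builds a new array on every call, while the optimized one writes into a buffer that the caller allocated once, outside the critical loop. In CPython the arithmetic still creates temporary boxed floats, so this only shows the shape of the pattern. In Julia, with concrete `Float64` arrays, the equivalent loop can run without allocating at all.

```python
from array import array

def step_allocating(x):
    """Draft style: returns a brand-new array on every call."""
    return array("d", (0.5 * v + 1.0 for v in x))

def step_into(out, x):
    """Optimized style: overwrites a caller-provided, pre-allocated buffer."""
    for i in range(len(x)):
        out[i] = 0.5 * x[i] + 1.0
    return out

n = 1_000
x = array("d", range(n))
out = array("d", [0.0]) * n       # pre-allocate once: n zeroed doubles

for _ in range(100):               # the critical loop reuses the same buffer
    step_into(out, x)
```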

Until the day comes when compilers are even smarter about memory management, let me tell you about my experience. When I write high-performance code in Julia, I start by making some draft code where I freely use things like comprehensions and create arrays on the fly. I keep doing that until I am sure my expressions are right and I know which part of my code is critical and should be optimized, and what its dependencies are. At that moment you might already have all sorts of tests and I/O in place, and you can even start profiling.

A screenshot from ProfileView.jl

All of this is not unlike having a Python or Scala prototype that you later optimize by moving some of its parts to a faster, lower-level language such as C++ or Java. The difference is that in Julia you are still using the same language, and thus avoid the two-language problem, something often mentioned in articles promoting Julia. I have worked with language mixtures like this for many years and never realized I had a two-language problem until I started using Julia. Today I see this is a very real thing, and having a single modern, powerful and efficient language that can do everything by itself is much better.

And what does that faster Julia code look like? There are a few guidelines. Basically, you have to make sure that you pre-allocate big arrays, and that memory access only happens when strictly necessary. One very helpful tool that allows you to keep using small arrays in your program while generating fast code is StaticArrays.jl. You should also make sure your code only refers to local variables, avoiding access to the global scope, and that your functions get specialized for basic numerical types.
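Python can illustrate the global-scope guideline, with a caveat. In Julia, the real cost of a non-constant global is that the compiler cannot infer its type, an issue that does not arise in CPython. Still, the standard `dis` module shows that a global is resolved by name at runtime while a local lives in a fast slot. The function names below are hypothetical.

```python
import dis

FACTOR = 2.0   # module-level (global) value

def scale_global(x):
    return x * FACTOR          # a global: resolved by name via LOAD_GLOBAL

def scale_local(x, factor):
    return x * factor          # a local: read from a fast slot

def opnames(fn):
    """Set of bytecode instruction names used by fn."""
    return {ins.opname for ins in dis.get_instructions(fn)}

print("LOAD_GLOBAL" in opnames(scale_global))  # True
print("LOAD_GLOBAL" in opnames(scale_local))   # False
```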

A couple of tools can be very helpful here. One is the @btime macro from BenchmarkTools.jl, which measures not only time but also how much memory gets allocated. The others are the @code_warntype and @code_native macros. @code_warntype lets you know whether your code is using good, concrete native types or not. @code_native is basically a disassembler. You don’t really need to understand assembly in detail to analyze its output. You can just get a feeling for whether the code is doing mostly numerical operations, whether it uses special instructions, whether inlining is happening, whether StaticArrays is doing its job, and whether there are bad things happening, such as undesirable calls and memory I/O.
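For readers coming from Python, a rough stand-in for the time-plus-allocations report of @btime can be assembled from the standard `time` and `tracemalloc` modules. This is only a sketch, and the helper names are hypothetical. It times a single call, whereas @btime repeats the call and reports the minimum. Tracing also slows the code down, so treat the timing as merely indicative.

```python
import time
import tracemalloc
from array import array

def measure(fn, *args):
    """Return (seconds, peak_bytes_allocated) for a single call of fn(*args)."""
    tracemalloc.start()
    try:
        t0 = time.perf_counter()
        fn(*args)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return elapsed, peak

def doubled_copy(x):
    # Allocates a temporary list of boxed floats, then a new array.
    return array("d", [2.0 * v for v in x])

def double_in_place(x):
    # Reuses the existing storage of x.
    for i in range(len(x)):
        x[i] = 2.0 * x[i]

n = 100_000
data = array("d", range(n))
for fn in (doubled_copy, double_in_place):
    seconds, peak = measure(fn, data)
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms, peak {peak} bytes")
```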

One of the best reasons for doing numerical work in Python and Java instead of C is that it is so much easier to profile, inspect and disassemble the code. Julia definitely offers the same advantage, even though its tools may naturally not be as mature and developed yet as those of older platforms.

One great talk on Julia for high performance is this one about real-time robot control. The speaker reports being able to run a 1 ms bipedal robot control loop written in Julia, quite a feat. It is perhaps a small-scale but functionally complementary feat to the often-advertised use of Julia in the big-data, petaflop-scale project to analyze data from the Sloan Digital Sky Survey, whose telescope can be seen in the first image illustrating this article.

Going meta

In some ways, the previous sections only show Julia features that we might call pretty conventional, even if very well done. Take the package manager, for instance. It’s excellent, especially its seamless integration with Git. But in 2019 this doesn’t look so much like an impressive, innovative feature to be excited about. It is one of those things we spoiled users have come to expect from any “decent” language… Isn’t there anything new and exciting about Julia as a language?

Julia does offer something quite unique, and that is its macro system. Julia macros are said to be inspired by LISP, and some people go as far as calling Julia a homoiconic language.

Julia seems to be very much a product of MIT, where LISP was born, and its parser is actually written in Scheme and runs on top of Jeff Bezanson’s FemtoLisp. That’s why some might argue that it is an interpreted language, although I don’t quite agree, given that you can even build object files. It’s more like a runtime-compiled language whose parser happens to be written in an interpreted language, that’s all. There is also some talk about moving to a Julia-based parser, but this is hardly considered a priority.

When I first heard that Julia had a LISP interpreter inside it, I immediately thought of Greenspun’s tenth rule, and I thought this was very cool. But much more than that, right before starting with Julia I had been studying Racket, the rebranded PLT Scheme, and was quite impressed with its proposal of allowing different languages to be implemented on top of a LISP engine. I eventually started wondering if one could write such a language appropriate for numerical computing.

In a way, maybe Julia realized this idea. But in my current understanding, LISP doesn’t have as central a role in how Julia works as it would in a Racket-based language. I think this LISP interpreter plays a role similar to the one partially taken by tools like Flex and yacc in other languages. But LISP is still a big influence on how the language works, especially regarding macros.

A Julia macro works by taking Julia expressions as data that can be manipulated like anything else, such as strings and numeric arrays. You can then write Julia code to transform these expressions and hand the result back to the Julia interpreter/compiler to run.
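Python’s standard `ast` module offers a loose analogy for “expressions as data”, though it is not a macro system. The transformation below runs at runtime on source text, whereas a Julia macro receives the parsed expression during macro expansion, before compilation. Rewriting every `a / b` into a call to a hypothetical `safe_div` helper is just an illustration.

```python
import ast

class SafeDivision(ast.NodeTransformer):
    """Rewrite every `a / b` in an expression into `safe_div(a, b)`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)                  # transform nested sub-expressions first
        if isinstance(node.op, ast.Div):
            call = ast.Call(
                func=ast.Name(id="safe_div", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node

def safe_div(a, b):
    """Hypothetical helper: division that returns inf instead of raising on zero."""
    return a / b if b != 0 else float("inf")

source = "(x + 1) / y"
tree = ast.parse(source, mode="eval")             # code -> data (an Expression node)
print(ast.dump(tree.body))                        # inspect it like any other data
new_tree = ast.fix_missing_locations(SafeDivision().visit(tree))
code = compile(new_tree, "<transformed>", "eval") # data -> code again
print(eval(code, {"safe_div": safe_div, "x": 3, "y": 0}))  # inf
```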

The best usage examples of Julia macros are things like polynomial evaluation with @evalpoly, code generation for traversing tensors, and other cool things. And of course, we have already seen macros used for things like @btime. By the way, if you want to do that in Python, you might end up using the IPython magic %timeit instead of relying on the language itself.
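Julia’s documentation describes @evalpoly(z, c...) as taking the coefficients in ascending order of power and expanding to inline code that uses Horner’s method for real arguments. The Python sketch below mimics that idea. It generates the unrolled Horner expression as source text once and compiles it into a function. `make_horner` is a hypothetical name, the coefficients are assumed to be plain numbers whose `repr` is valid Python, and `eval` here stands in for what the Julia macro does at expansion time.

```python
def make_horner(coeffs):
    """Return (function, source) evaluating c0 + c1*x + c2*x**2 + ...

    Coefficients are given lowest order first, like @evalpoly's arguments.
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    # Build the nested form c0 + x*(c1 + x*(c2 + ...)) from the inside out.
    expr = repr(coeffs[-1])
    for c in reversed(coeffs[:-1]):
        expr = f"({c!r} + x * {expr})"
    src = f"lambda x: {expr}"
    return eval(src), src

p, src = make_horner([1, 2, 3])   # 1 + 2x + 3x^2
print(src)                        # lambda x: (1 + x * (2 + x * 3))
print(p(2))                       # 17
```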

I wanted to understand macros better and came up with some examples that I’m not sure are very representative, but illustrate what I at least wish macros could do. The idea is that we often have things we want to do on top of our code that end up becoming a secondary thing intermingled with the real code, and I would like this to happen more like a kind of alternative interpretation of the code. For instance, I can just write a single definition of a function but can obtain from it an alternative implementation that also prints logging messages.

https://gist.github.com/nlw0/bd698d2785d3cf1d98e2ac1e6e2a7d31
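The gist above has the actual Julia examples. As a rough Python analogy of the same idea, assuming the function is available as source text, the sketch below derives a second, logging version from a single definition. It does this by inserting a print after every assignment in the syntax tree. `SOURCE`, `AddLogging` and `build` are hypothetical names.

```python
import ast

SOURCE = '''
def hypot2(a, b):
    a2 = a * a
    b2 = b * b
    return a2 + b2
'''

class AddLogging(ast.NodeTransformer):
    """After each `name = ...` statement, insert `print('name', '=', name)`."""
    def visit_Assign(self, node):
        stmts = [node]
        for target in node.targets:
            if isinstance(target, ast.Name):
                log = ast.parse(f"print({target.id!r}, '=', {target.id})").body[0]
                stmts.append(log)
        return stmts          # a list replaces one statement with several

def build(source, logged=False):
    """Compile source into a namespace, optionally with logging inserted."""
    tree = ast.parse(source)
    if logged:
        tree = ast.fix_missing_locations(AddLogging().visit(tree))
    namespace = {}
    exec(compile(tree, "<generated>", "exec"), namespace)
    return namespace

plain = build(SOURCE)["hypot2"]
logged = build(SOURCE, logged=True)["hypot2"]
print(plain(3, 4))    # 25
print(logged(3, 4))   # prints "a2 = 9", "b2 = 16", then 25
```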

I am really not too sure if this is something macros could be good for, but the examples definitely show something that is not at all usual in other programming paradigms and is definitely more like LISP than anything else. I am not too sure Julia is as powerful as LISP or Prolog, though, where one of the main things is the ability to implement meta-interpreters. But there is at least one Julia project called Cassette.jl that seems to be trying to achieve something like this.

Another language we might compare to Julia regarding macros is C++. Of course, the #define macros of the C preprocessor are a joke compared to what Julia does. But I believe a lot of what is being done today with template metaprogramming (TMP) in C++ might be done in Julia using macros. And one might argue that if TMP is effectively a language different from C++, then this would be another instance of Julia beating the two-language problem.

Conclusion

Compared to C++, Julia feels like using a lot of template functions, auto and TMP, which is how many C++ programmers like to do things nowadays. Compared to Python, it has all the conveniences of a modern syntax and a REPL, and the same ability to plot and to write generic programs like web servers, except that it can effortlessly achieve the performance of a compiled language. There is a price to pay in waiting for compilation sometimes, something that is high on the Julia developers’ list of priorities (the “time to first plot” problem). Compared to Matlab, it is a much more sensible language, and its native array support is better in my opinion. No magic stuff, just a really sensible language design, and of course, free of charge.

Is there anything I do not like in Julia?… I think the main thing is just that I wish they had been a little more influenced by the more type-strict functional languages like Haskell and Scala. Functional programming really changed how I think, but I can live with Julia the way it is by regarding it as a better alternative to Python and C++.

This was a pretty long article that spent a lot of time talking not about Julia itself, but about programming in general. I think this is a good sign: it shows Julia is not about doing what other languages do with different keywords. Julia is about letting you do more with computers, and not about bikeshedding and tired computer science debates from the 20th century. It is a fresh new look at what programming can be like in the 21st century. Julia is about taking you as far as modern technology can go, so you can reach beyond.

I suggest you try it now!