Crystal has a compilation issue

My company has lately investigated the possibility of using the Crystal programming language for some our new products and I was responsible to investigate its feasibility in regards to the development experience.

Unfortunately, I found out that Crystal has long compilation times, and I believe this is intrinsic to the design choices they did in the compiler and the type checking mechanism.

Since I will be running some benchmarks, note I am running them on an Early 2015 MacBook Pro with 2.7 GHz Intel Core i5, using two logical cores and 16 GB of RAM.

Benchmarks

The benchmark here is very simple. I pick a MVC web framework, I generate 100 scaffolds (i.e. source code with models, views and controllers) and then I measure the time to compile the application in development. If the language does not require compilation, we measure the time to run a simple command, enough for the application to boot.

While I have originally benchmarked only Amber (in Crystal), I decided to measure Phoenix and Rails as reference points.

Amber v0.8 (Crystal v0.25)

Amber is a web framework for Crystal. Getting started with Amber is really straight-forward. Once installed, I ran:

amber new demo
cd demo
shards install
amber generate scaffold User0 name:string age:integer
...
amber generate scaffold User99 name:string age:integer

That is enough to scaffold 100 resources. Then I ran time crystal build lib/demo.cr. For this application, I had to wait 6 minutes and 48 seconds for it to compile. Obvious to say, this is not acceptable by any metric.

Furthermore, Amber does not seem to perform any caching or partial recompilation of your application build. Running any command, such as amber watch or crystal spec, forced me to wait almost 7 minutes once again.

Phoenix v1.3 (Elixir v1.7)

Curious to see how other frameworks would perform, I have decided to do the same test for the Phoenix web framework for the Elixir programming language:

mix phx.new demo
cd demo
mix phx.gen.html Accounts0 User0 name:string age:integer
...
mix phx.gen.html Accounts99 User99 name:string age:integer

Then I ran time mix compile --force and the whole affair ended in 10.4 seconds. Not only that, Phoenix seems to cache its results. Running any command, such as booting the app or loading your test suite takes only 1.2 seconds. Recompiling the application after changing a single file is also reasonably fast, as Elixir seems to perform partial recompilation.

Rails v5.1 (Ruby v2.5)

Finally I decided to compare those frameworks with Rails, since both claim Rails is one of their inspirations:

rails new demo
cd demo
rails generate scaffold User0 name:string age:integer
...
rails generate scaffold User99 name:string age:integer

Ruby is not a compiled language, so instead I decided to benchmark the time to run bin/rails runner "puts Rails.env"which returned in 3.0 seconds. Running other commands, such as booting the app and loading the test environment, take roughly 3.0 seconds too.

Should I care?

If you are planning to use Crystal, you should definitely evaluate the complexity of your project before getting started, as compilation times seem to get higher and higher as your codebase grows.

I am aware most don’t expect to write an application with one hundred resources from day one. But I have to confess I have already worked with way larger applications than this. There are many situations that lead to large codebases, such as business demands, fast growth, etc. Some teams even do it intentionally. If you look at the most popular open source Rails applications, such as Discourse and GitLab, you can quickly see they are much bigger than the ones we generated here.

It is also worth considering how much of this is the fault of the Amber web framework, after all, a framework that leverages meta-programming is likely to introduce its own overhead during compilation.

For instance, the Crystal compiler, which is written in Crystal itself, has around 50k lines of code and it compiles in 30 seconds. 30 seconds is a relatively long time for 50k LOC. For reference, the biggest Elixir application I could find was Ecto, which has 26k LOC and compiles in 3.8 seconds.

Another technique we could use to measure compilation times is to measure the initial build of a project, which has to take into account the code of the project and its dependencies and where the usage of macros is kept to a minimum. A fresh Amber application brings in 44k lines of code just in its dependencies. The initial build of a newly generated Amber application takes 25 seconds. That’s aligned with the 30 seconds taken to compile the Crystal compiler. After the initial build, dependencies are cached, reducing the build time to 3.1 seconds, but the compilation of your application itself does not seem to be cached. In other words, once I generated 100 resources, I had to wait almost 7 minutes every time I started the application or every time I ran tests.

Such numbers are also reproducible in other places. A fresh application created by the Lucky web framework has an initial build of 29.5 seconds with each subsequent command taking 5.0 seconds. In both cases, expect compilation times to ramp up as the codebase grows.

What does the future hold?

Crystal is still a young language. Its first commit is from September 2012. That’s 20 months after Elixir’s first commit and likely about two decades after Ruby’s.

Crystal has not reached 1.0. Nor has Amber. I wouldn’t doubt if there are low-hanging fruits optimizations to be done in Amber to reduce the compilation time. Macros could likely be optimized to generate less code. The compiler can likely be optimized too.

However, I also have reservations around the current decisions done in Crystal’s static type checker and its compiler. I have spent the last week studying Crystal’s source code and, from my understanding, it seems to perform global inference. This brings intrinsic limitations to compilation time. Over the next paragraphs I will outline my findings. It may have mistakes, so corrections are appreciated.

The benefits of global inference is that you do not need to explicitly list the types of your methods in many situations. The downside is that the type checking mechanism requires, in the most naïve approach, to evaluate the source code in order to find type errors.

Let’s see an example. Imagine that you have method1(arg) that calls method2(arg) that calls method3(arg) up to method100(arg) where method100(arg) performs arg + 1. In order to type check a code like method1(0), the type system will execute method1(Integer) that then executes method2(Integer) up to method100(Integer) which then checks for arg + 1. In order words, it needs to evaluate the code using types as arguments. If you invoke method1("foo"), then it does the same traversal, which consequently finds a type error.

Of course, there is one obvious solution here: caching! Imagine that your codebase has M methods and N types. For simplicity, let’s assume all M methods are invoked with all N types during type checking. Your cache size needs to be M * N.

However, it gets more complex if a function receives more than one argument. Imagine that you have a method that receives two arguments and you receive two types: Integer and String. Now the possible evaluations for caching are 4:

  • some_method(Integer, Integer)
  • some_method(Integer, String)
  • some_method(String, Integer)
  • some_method(String, String)

A method with 3 arguments has possible combinations, where N is the number of types. While it is very unlikely that all of your methods will be invoked with all possible types, this shows the theoretical complexity of the problem at hand in terms of space.

So as your codebase grows, the amount of code to be evaluated during type checking grows too. While this is expected in typed languages, the complexity here is bound to the system size and the relationship of its components. For instance, a type system that relies on local inference bounds its complexity to the size of each individual component. With local inference, once method1(arg) is compiled, a type is assigned to it and you do not need to traverse all the way down to method100(arg)when you invoke it. Unfortunately moving to local inference will either lead to a more complex type system, or require developers to annotate methods more frequently.

Note those decisions also put a damper on some compilation improvements, such as partial recompilation. That’s because once method100 changes, you need to type check all callers of method100, all the way up to method1. In practice this means that, even if partial recompilation is in place, you may frequently end-up type checking large chunks of your codebase, and paying the time penalty again.

It may also be that local inference is completely unfit to the type of code that the Crystal compiler generates. For example, imagine that we have a similar scenario, where method1(arg) invokes method2(arg) up to method100(arg). Except that now method100(arg) is implemented as arg * 1 instead of arg + 1. The difference here is the following:

  • 1 + 1 returns 2
  • "foo" + 1 raises an error

while

  • 1 * 1 returns 1
  • "foo" * 1 returns “foo” as it duplicates the string once

In other words, method100 becomes a polymorphic site as it accepts strings or integers. If you want to optimize the invocation cost of this polymorphic cache with ahead of time compilation, the type checking mechanism needs to be able to find all possible uses of method100 and emit efficient native code that checks for either a string or an integer. The space complexity of the polymorphic site also depends on the number and types of its arguments.

Polymorphic calls are the bread and butter of programming in Ruby and it is understandable Crystal wants to support similar idioms.

With this in mind, it is no surprise that Crystal’s compilation takes a while and why it is capable of generating optimized code. It may also explain why the binary generated by the Amber framework in an application with mostly boilerplate code is so large: 65MB, without statically linking its dependencies.

Designing a programming language is not easy. It requires a series of compromises. The compromises currently made by the Crystal team makes it a productive language for compiling tiny applications into safe and performant executables. Crystal is often compared to Rust and Go, as they all generate native code, but they serve very different purposes, especially as the last two were designed with building large systems in mind. The Go programming language is known precisely by the trade-offs made in its type system to keep compilation times low and the runtime efficient.

The current compilation times may also limit Crystal’s usage in the web space, as at the moment is does not seem suited for medium or large applications like Phoenix and Rails are. However, opposite to Ruby and Elixir, Crystal does include a type system. The type guarantee may make it a good fit for small services, but beware of the costs associated to growth.

I suspect that in order to achieve reasonable compilation times, compromises will have to be made on the type checking and performance departments. For now we have decided Crystal is not a good fit for us. Caution is naturally expected for adopting languages that are before 1.0 and I will be cautiously waiting for 1.0.