From Kentucky Mule to faster Scala compiler— project brief
I can’t understand why people are frightened of new ideas. I’m frightened of the old ones — John Cage
What you find below is a fun exercise in pitching a risky software project to a hypothetical MoneyedCorp’s director-level decision maker. Inspired by musings on software ∩ business by patio11.
Technology tends to progress incrementally with rare leaps. This proposal is for a non-incremental leap in Scala’s 2 compiler performance yielding an estimated 5x-10x speedup. In broad business terms, for a company employing 1000 software engineers this amounts to $1.2 million in savings in gross labour cost per month. Relatively large amount stems from a rare opportunity: Scala compiler is an exceptionally low performer and is a critical bottleneck for a large number of highly paid engineers.
The proposal describes a project of high-risk/high-reward nature.
Scala compilation performance has been an unsolved problem for more than 6 years — since the time bigger companies adopted the language. Although cumulative gains from small optimizations over the last few years amount to about 2x improvement, bigger savings haven’t been found. Despite no shortage of deeply experienced and talented engineers looking.
Recently, I reached an important milestone in a research project called Kentucky Mule, where I prototyped an alternative architecture of the Scala compiler inspired by Facebook’s Hack. The milestone greatly reduces the technical risk of finding Scala’s language features that make the architecture impossible to implement. The prototype validates the main ideas on a non-trivial code sample: Scala’s standard library.
Before reaching this milestone, the risk of the architecture collapsing in a non-recoverable way was high. There was no practical way to reduce it other than to implement a prototype.
Why nobody has done it before
As informs’ essay elegantly puts it, the innovation builds on the body of knowledge accumulated in incremental steps. The very realization that Scala compiler needs a reworked architecture comes from years of experience working with the current one, optimizing it and encountering its limitations. The thought that there might be a better way is directly inspired by Facebook’s Hack that has been designed to work at a massive scale of Facebook’s code base and with industrial performance expectations. However, Hack’s architecture doesn’t transfer directly to Scala due to Scala’s unique language features.
Secondly, the compilation performance hasn’t been a priority to the original creators of Scala. Scala has academic roots and, sadly, academia doesn’t see compiler performance to be a fruitful research area. As noted by one of the members of the team that worked on the Scala compiler, Scala’s compilation represents “a messy data-dependency problem” with no clear opportunity for an improvement originated in an insight publishable as a paper.
Lastly, the proposed new architecture is non-obvious. Without going into unnecessary details, the problem underlying the new architecture seems to be circular. Breaking the tie requires devising a scheme that seems disconnected from Scala’s core language design. Only careful study and deep understanding of Scala’s specification convinces one that this scheme is workable.
Evidence for the opportunity
A popular argument for Scala’s slow compilation speed is that Scala’s compiler simply has to do more work. The language has sophisticated features and supporting them is computationally expensive. A simple experiment refutes that argument. One can generate an equally basic Scala and Java code side-by-side and compare compilation times between Scala and Java. The difference in compilation speed is 6x in favor of Java. None of the performance analysis work done to date explains that gap.
Another piece of evidence is performance of Kentucky Mule. Kentucky Mule implements a non-trivial subset of Scala’s type checking (the most expensive step in compilation) and performs at 40x the speed of the regular compiler. The logical conclusion is to expand Kentucky Mule’s architecture to regular Scala compiler and hope that performance gap (thus gain) remains large.
Duration: 3 months
Deliverable: Kentucky Mule successfully runs on 10 open source projects
Requires adding support for binary class file parsing and Java types support
Duration: 2 months
Deliverable: Validation of KM architecture’s compatibility with Scala’s As Seen From
This milestone addresses the central risk in adopting KM’s architecture. One possibility is that As Seen From can be supported but in an expensive way. Being able to test the cost on a real projects will be key in assessing that option.
Duration: 1 month
Deliverable: Validation of KM on MoneyedCorp’s code base
This milestone ties the work to the code base that’s ultimately validating both feature completeness and performance
Duration: 2 months
Deliverable: Integration of KM into Scala 2 with a dry run mode
Once this milestone is complete, Kentucky Mule runs as a separate phase in the compiler but its results are not yet consumed by other phases
Duration: 3 months
Deliverable: Broken up the coupling between namer & typer; namer is replaced by KM phase
Currently, the namer and typer phases are closely coupled together through an intricate web of lazy computations. In fact, the separation is more of a fictional aspiration. The result is known as “big ball of mud”. The tight coupling is one of the reasons why world experts of JVM performance profiling and tuning have a hard time making substantial leaps in improving Scala’s compilation performance.
Duration: 2 months
Deliverable: Profiling and optimization of the type checker
The new architecture will make the time distribution spent within the type checking phase more apparent and will unlock deep optimizations and scoping of future work
The first six months are intentionally dedicated to facing the biggest risks in the project. If the scheme were to collapse, it’s better to learn about it early saving both time and money.
The second part of the timeline is all about execution on a fairly straightforward plan with some wiggle room for Scala compiler’s internal complexity. By the time project reaches the second part, the risk of failure will be low.
Importance of architecture
One of Akin’s Laws of Spacecraft Design says
“(Shea’s Law) The ability to improve a design occurs primarily at the interfaces. This is also the prime location for screwing it up.”
The overreaching viewpoint behind this proposal is that some of the core interfaces in Scala compiler are ineffective at serving a language of complexity such as Scala. Fixing this problem yields a couple of benefits:
- much more clear attribution of cost by breaking the “big ball of mud” that takes up around 40–50% of the total compilation time into smaller pieces that can be analyzed separately; this is an application of the classic divide&conquer engineering approach to the problem of compilation performance
- the newly-found interfaces lead to code patterns that Java’s Virtual Machine (JVM) is much better at optimizing; the new architecture is better at leveraging the 20 years of engineering that went into JVM’s performance optimization
- the new architecture would enable massive parallelism as a future work
The proposed work is not only about the direct benefit of an immediate speed up but also about a second-order effect of unlocking future work such as parallelizing the compiler or simply an increased development velocity of the compiler.
Put another way, the proposal is to re-architect the compiler to make fast performance more natural by representing the work the compiler needs to do in a better form. Bret Victor makes this great point of importance of the right representation:
Have you ever tried multiplying Roman numerals? It’s incredibly, ridiculously difficult. That’s why, before the 14th century, everyone thought that multiplication was an incredibly difficult concept, and only for the mathematical elite. Then Arabic numerals came along, with their nice place values, and we discovered that even seven-year-olds can handle multiplication just fine. There was nothing difficult about the concept of multiplication — the problem was that numbers, at the time, had a bad user interface.
In similar vein, the current architecture makes Scala compiler performance optimization almost equally elite task.
- (moderate to high) the project might collapse due to technical reasons; mitigated by intentionally taking on these risks in the first phase of the project
- (low) re-architecting doesn’t really give any insights into performance, even once the compiler is broken into smaller, more understandable pieces it’s not clear where the time is spent.
- (low) the changes are too disruptive to be merged back to the main-line Scala compiler and result in a technology fork; mitigation is through value: if changes speed up the compiler in substantial way everybody will be motivated to get them. It would be the biggest news for Scala in years.
It’s important to point out that the integration of the work doesn’t require any company-wide changes. All work would preserve existing compiler external interfaces and can be consumed as a drop-in replacement.
Two more people would be involved full-time in the project. Both would be deeply experienced engineers/researchers with programming language design and implementation experience.
Although, I have specific individuals in mind, I leave them out from a public version of this proposal.
Except for direct improvements in Scala’s compiler performance, a few strategic upsides are worth enumerating:
- the new architecture would address some long-standing issues in internals of the compiler that greatly hinder its development velocity
- the insights from this work would transfer to Scala 3.0 that is currently under development
- In an event that the project fails, the most probable reason would be some deep technical reason related to Scala’s language design; the time to inform the design of the next version of Scala and fix the mistake is closing down in about 18 months
- lastly, if project succeeds it would generate a great deal of excitement in Scala’s community and likely increase Scala’s adoption which is a strategic asset to MoneyedCorp’s long-term technology choice
This proposal is written in spirit similar to how a technology startup would pitch for venture capitalist investment. The logic behind venture capital funding is that some technologies require a sizable amount of funding to be developed to a viable product. While the scale of funding asked in this proposal is different, the spirit of risk minimization through development with a high degree of conviction remains.
This project for me personally is an opportunity to capitalize on years of acquiring knowledge in this space and to solve an important problem. I don’t see it as lifestyle business opportunity and my priority would be speed of execution. There are many interesting things to work on.
If the project works out, people funding it and working on it will receive a significant career bump and recognition. If the project fails, both the capital and the time commitment will be modest at the scale of a large enterprise company. With the caliber of people involved in the project, the amount of thinking that already went into the idea behind the proposal, even in retrospect, it won’t be seen as a careless endeavour.
 Assuming 30 minutes per day is spent waiting for builds (either actively or passively), 5x speedup and an average $300k total cost per engineer (around $160 per hour). This yields 24 minutes saved per day per engineer = 400 hours across all engineers per day = $64k per day in labor cost. The estimate is of the cost, which includes but is not limited to the gross salary (includes office space expenses, etc).