A note on Kentucky Mule and Twitter’s Scala compiler announcement

At the top Hawk Hill, and in the middle of my after-work bike ride last Friday, a friend of mine observed “Greg, you seem to have much more fun riding today”. Undeniably, Twitter’s announcement that builds on ideas I explored in Kentucky Mule made my week.

There’s a real thrill in watching a fresh concept graduating from being an uncommon interest of one person to becoming a project other people want to work on.

Kentucky Mule’s checkpoint

In December 2016 I wrote a summary of what I learnt of exploring the limits of Scala’s typechecking speed. In the post I described a prototype that I built to validate my ideas for an alternative architecture for the Scala compiler that is focused on parallelism of the typechecking phase. Twitter’s announcement brought up my experiments to much wider audience and I noticed some confusion in comments online what Kentucky Mule means.

I drew a picture to give an overview of the overall architecture and point at the piece I researched in Kentucky Mule:

Kentucky Mule’s role in typechecking Scala, I left out the dependency arrows between phases for clarity

For comparison, in the current scalac implementation all these little boxes are arranged in one long sequence that is really slow to run.

The original idea behind massively parallel architecture is to carefully break down typechecking process into two major phases:

  1. symbol table computation (method/class signature typechecking)
  2. method bodies typechecking

Symbol table is the global resource that, once available, makes it trivial to parallelize the second phase.

Furthermore, the symbol table computation (phase 1) would be also parallelized internally in a lock-free fashion thanks to an explicit tracking of dependencies between entries in the symbol table. The dependency information allows for batch scheduling of method signature typechecking with a guarantee that all other information in a partially computed symbol table is available already. If you’re curious about the details, check my previous posts and notes in the Kentucky Mule repository.

The critical assumption in this architecture is that symbol table computation can be effectively parallelized. In other words, all arrows between boxes in the “symbol table” phase could be computed ahead of time. This is a very non-obvious premise because the problem appears to be circular at first glance: for parallelizing signature typechecking one needs dependency information and the dependency information is available only once typechecking is fully done.

To break that cycle, I came up with the notion of outline types that approximate real types in a carefully chosen fashion that allows for dependency extraction. Here’s the caveat: outline types must be orders of magnitude cheaper to compute than regular types to be useful as a vehicle for structuring real typechecking in a highly parallel manner.

The work I’ve done in Kentucky Mule is to explore the limits of how quickly I can compute outline types. I decided to cap my exploration at 15 days of deeply focused work. The capped time was enough for me to gain the confidence that the architecture I was considering didn’t have any obvious design flaws. And the performance numbers I observed exceeded my expectations.

Kentucky Mule’s design principles

Going into prototyping the outline types computation, I knew I had to be unusually thoughtful and disciplined in how I conducted my experiments and explored the design space. Compilers are known to be notoriously hard to understand the performance of. And I was about to explore fresh ideas.

Before I wrote any code, I came up with four design principles driving the development of Kentucky Mule:

  1. Transparent pricing
  2. Only the paranoid survive
  3. It’s a voyage
  4. Loss aversion driven development

I described what each one of them means in the README of the project.

The fourth principle is probably the most interesting one because it informed my decision to select a subset of language features I’m going to support at each step of prototyping.

I applied these rules to a pretty narrow problem: finding a performant outline types implementation. Outline types coincide with the regular Scala types for a small subset of the language. An interesting realization is that the design principles I listed could be applied to a different problem: finding a performant, single-threaded implementation of Scala type checking.

Next steps

I’m very curious how the Reasonable Scala compiler is going to unfold. In particular, I’m awaiting Eugene Burmako to share more plans for the project he is leading. I’d love learn more how Kentucky Mule inspired his thinking.

Meanwhile, let me comment on the status of Kentucky Mule. I mentioned that I gained confidence that the architecture Kentucky Mule explores doesn’t have any obvious flaws. I managed to implement enough Scala features to get Kentucky Mule process a real code base: the scalap project (2k lines of code) . However, I haven’t implemented all Scala languages features required for outline types to work. There’s one area were my prototype is really cutting corners: Scala’s As Seen From.

In my original blog post on highly parallel Scala typechecking I speculated how I’d deal with As Seen From algorithm but I never get around to try these ideas. The As Seen From is both core to Scala and it’s a tricky algorithm. Kentucky Mule’s architecture might still collapse in a subtle way, though. And the reason would be that my speculative treatment of As Seen From doesn’t pan out.

Adding support for processing Scala’s standard library in Kentucky Mule would be a good trial for the As Seen From treatment. The standard library makes a heavy use of mix-ins, abstract types that are refined multiple times in the inheritance chain and type constructors. These are the features that remain the most risky for the architecture to support.

I think adding support for Scala standard library to Kentucky Mule should take another 15 days of deeply focused effort. And I think it would be a really captivating project to work on. Once that is done, I think Kentucky Mule would become a complete proof of concept for a highly parallel Scala compiler architecture.

Recently, I started working on spinning up a new team dedicated to supporting the growing use of Scala at Stripe. I’m focused on building the team and I’m not readily sure whether I can combine it with getting Kentucky Mule to finish line. I’ll mull over it more, though.