Published in The Recon

The move to LLVM for major performance and usability improvements

We’re pioneering new ways for developers and data scientists to accelerate their applications. Working at the cutting edge means being ready to change and innovate so that we’re always optimizing performance and improving our user experience. The compiler engineering team is currently reworking our core technology to use the LLVM compiler infrastructure: “a collection of modular and reusable compiler and toolchain technologies”.

Incorporating LLVM is set to deliver significant performance and user-experience improvements, and to broaden the range of use cases for which our service is valuable. Our service takes user code, written in a subset of Go, and compiles and optimizes it to create an image that programs an FPGA instance. The move to LLVM will let users write more natural Go within our subset, and will add new language features. Users will be able to rely more heavily on our compiler to exploit the fine-grained parallelism available in the accelerator, taking some of the pressure off users to introduce parallelism by hand through concurrent programs. We will also be able to provide improved tooling and bring the way we work closer in line with the language our users write: Go. LLVM is used widely both commercially and by open source projects, and the time and effort that have gone into its development make it extremely valuable to startups like ours.
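To make “fine-grained parallelism without concurrent code” concrete, here is the kind of plain-Go numeric kernel a user of such a subset might write (our own illustration, not code from the service):

```go
package main

import "fmt"

// DotProduct is a simple numeric kernel written as an ordinary
// sequential loop. The multiplies are independent of each other, so a
// hardware compiler can unroll and pipeline them across clock cycles
// without the user writing any goroutines or channels.
func DotProduct(a, b []int) int {
	sum := 0
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(DotProduct([]int{1, 2, 3}, []int{4, 5, 6})) // prints 32
}
```

The point is that the source stays sequential and idiomatic; extracting the parallelism is the compiler’s job.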

Performance blockers and choosing LLVM

In the early part of this year we were hitting some performance ceilings with our existing compiler — long compile times, difficulty supporting language features and complex optimizations — and so the team began looking for ways to solve these problems. Our CTO, Josh Bohde, had been aware of LLVM since the company was founded, but it wasn’t originally chosen as our core technology because it’s primarily designed with CPUs in mind, rather than FPGAs. For the challenges we were hitting in early 2018, however, it became clear that the benefits of LLVM would more than make up for the engineering costs of adopting it, while also opening up improvements elsewhere through better end-user tooling, performance measurement and application scope.

Fundamentally, LLVM’s optimizer is extremely good at what it does. It considers your program, written in your source language, and finds the best way to express it as a whole. It’s so good, in fact, that even though it optimizes for a CPU, and so has a very different world view to our previous specialised FPGA-focussed compiler, its intermediate representation is extremely valuable to us as we translate user code into an FPGA image.

Comparing old and new

Our compiler, Rio, in its previous form was highly specialised for its FPGA target. This led to compounding problems, from long compilation times to incompatibility with some valuable Go language constructs. Previously, the first stage in our compilation process produced a graph-based, concurrent intermediate representation, made up of data and control flow information in a non-linear format. We then translated this into a dataflow language called Teak, on which we performed optimizations.

The LLVM intermediate representation is a more linear set of instructions operating on our data, compared with what we were dealing with in our previous compiler model, and it’s this that we now use to form our dataflow graph. This means we can optimize scheduling for designs, ensuring we know how much computation we can fit within a clock cycle. In our previous model this level of control was not available; it was possible to optimize graphs to the extent that they would not meet timing on the target hardware.
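As a toy sketch of that scheduling question — how much computation fits within one clock — the following packs a linear sequence of operation delays into clock cycles so that no cycle exceeds the clock period. This is our own illustration with made-up numbers, not Rio’s actual scheduling algorithm:

```go
package main

import "fmt"

// schedule greedily packs a linear list of operation delays (in ns)
// into clock cycles of the given period, starting a new cycle whenever
// the next operation would overrun the timing budget. Real schedulers
// also consider data dependencies; this sketch only shows the
// "fit within a clock" constraint.
func schedule(delays []float64, period float64) [][]float64 {
	var cycles [][]float64
	var current []float64
	used := 0.0
	for _, d := range delays {
		if len(current) > 0 && used+d > period {
			cycles = append(cycles, current)
			current, used = nil, 0
		}
		current = append(current, d)
		used += d
	}
	if len(current) > 0 {
		cycles = append(cycles, current)
	}
	return cycles
}

func main() {
	// A 4 ns clock period; op delays in ns.
	fmt.Println(schedule([]float64{1.5, 2.0, 1.0, 3.0, 0.5}, 4.0))
	// Three cycles: [[1.5 2] [1 3] [0.5]]
}
```

A linear instruction stream makes this kind of budgeting straightforward; on an arbitrary optimized graph the equivalent timing question is much harder to answer, which is how designs could previously fail to meet timing.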

The intermediate representation that LLVM gives us opens up possibilities for new front-end tooling that lets users get timing information on their designs with fast turnaround. Moving to LLVM also cleans up our control flow, allowing a much wider scope of use cases.

Compiler development

We’re following a well-trodden path, and the compiler team is working at a high pace. LLVM’s technologies are pluggable: they naturally expose an intermediate representation that we can integrate into our existing system. This meant that, once work started in earnest, it took our small team only a couple of weeks to get our existing example code library working with the reengineered compiler. The new compiler architecture is simple and refined compared to our previous structure, in part because we’re using Go’s SSA (static single assignment) form to convert to LLVM, which is itself an SSA-based representation, so we can use Go’s existing machinery for the process.

Using LLVM makes our compilation process more harmonious with the Go language; there is a high level of correspondence between Go’s representation and LLVM’s representation of language concepts. This leads to simpler processes on many levels: type checking and package resolution are handled automatically in our new compiler model, because the semantics of the Go language are rigorously understood by the existing Go SSA machinery.
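As a small illustration of how much of this the Go toolchain provides natively, the standard library’s go/parser and go/types packages can parse and fully type-check a source fragment in a few lines. This is our own sketch of the standard machinery, not our compiler’s actual front end:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/importer"
	"go/parser"
	"go/token"
	"go/types"
)

// kernelSrc is a small, self-contained Go fragment to check.
const kernelSrc = `package kernel

func Scale(xs []int, k int) []int {
	out := make([]int, len(xs))
	for i, x := range xs {
		out[i] = x * k
	}
	return out
}`

// typeOf parses src and runs the standard go/types checker over it,
// returning the fully resolved type of the named top-level declaration.
func typeOf(src, name string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "kernel.go", src, 0)
	if err != nil {
		return "", err
	}
	conf := types.Config{Importer: importer.Default()}
	pkg, err := conf.Check("kernel", fset, []*ast.File{f}, nil)
	if err != nil {
		return "", err
	}
	obj := pkg.Scope().Lookup(name)
	if obj == nil {
		return "", fmt.Errorf("%s not found", name)
	}
	return obj.Type().String(), nil
}

func main() {
	sig, err := typeOf(kernelSrc, "Scale")
	if err != nil {
		panic(err)
	}
	fmt.Println(sig) // func(xs []int, k int) []int
}
```

Building on machinery like this, rather than reimplementing Go’s semantics, is what removes whole classes of work from a compiler front end.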

The new model

A lot of the benefits to our users from this work won’t require any workflow changes, but, as we touched on before, users will be able to write their FPGA-side Go code in a more natural, familiar way, and the new compiler structure will reliably deliver higher-performance designs. For example, we will be introducing a simple static pointer model into our FPGA-side Go subset, which is something we couldn’t previously implement. This gives users a way to write functions that mutate their arguments by passing pointers. We’re also reworking how we handle goroutines on the FPGA side, and this looks set to be an area of significant change and higher performance. Because Go and LLVM are more closely aligned, we foresee supporting a wider range of concurrency patterns, giving higher performance and parallelism in the resulting designs: higher parallelism means more acceleration and throughput.
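In ordinary Go terms (plain Go here as an illustration, not our FPGA-side API), those two features look like this: the first function mutates a value through a pointer argument, and the second uses goroutines to apply it across a slice concurrently:

```go
package main

import (
	"fmt"
	"sync"
)

// scaleInPlace mutates its argument through a pointer — the kind of
// static pointer use a pointer model in a Go subset would allow.
func scaleInPlace(x *int, k int) {
	*x *= k
}

// parallelScale fans the work out across goroutines, one per element,
// and waits for all of them — a simple Go concurrency pattern of the
// sort that maps onto parallel hardware.
func parallelScale(xs []int, k int) {
	var wg sync.WaitGroup
	for i := range xs {
		wg.Add(1)
		go func(p *int) {
			defer wg.Done()
			scaleInPlace(p, k)
		}(&xs[i])
	}
	wg.Wait()
}

func main() {
	xs := []int{1, 2, 3, 4}
	parallelScale(xs, 3)
	fmt.Println(xs) // [3 6 9 12]
}
```

On a CPU, goroutines this small aren’t worth their overhead; on an FPGA, each one can in principle become its own piece of hardware running every cycle, which is why goroutine handling matters so much for performance.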

Internally, this reworking means it will be much more straightforward to keep in line with things like language updates, and features such as Go modules, which are due to land soon, will be easy to integrate.

In terms of use cases, our existing cloud service is a great choice for applications that need to get through big computation pipelines, fast. With our reengineered compiler, more low-latency, control-flow-sensitive use cases will be a good fit too, e.g. applications requiring network data processing or heavy IO.

Want to try it out?

Our reworked compiler will be available for testing within the next few weeks, but we’ll keep supporting both platforms for a good while yet. If you would like to be one of the first to try it out, drop us a line on our forum.




Rosie Yohannan

Documentation and UX at