Crash Course on the Kotlin Compiler | K1 + K2 Frontends, Backends

Quick detour on why there are so many versions and how it affects data transformations in the compiler

Amanda Hinchman
Google Developer Experts

--

Perhaps you’ve watched the recent KotlinConf 2023 Keynote on updates for the K2 compiler. But what is the K2 compiler?

Perhaps you are waiting on Part 2 of Crash Course on the Kotlin Compiler. Before we can continue, we take a step back to cover a high-level overview of the different Kotlin compiler frontends and backends and their differences. A brief introduction to the kinds of data transformations that occur during compilation will be an important primer for the next part.


The basics

Source code is submitted to the Kotlin compiler, which turns human-readable source code into machine-executable code for whichever target machine is designated.

If we were to grossly oversimplify the compiler, we can think of it as doing two things: compiling and lowering. Compiling changes one data format into another data format, while lowering generally simplifies or optimizes within the existing data format.
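To make that distinction concrete, here is a toy sketch in Kotlin. All of the names and structures below are invented for illustration and look nothing like the real compiler's internals: "compiling" changes the representation, while "lowering" rewrites within the same representation.

```kotlin
// A toy expression format — not the real compiler's IR.
sealed interface Expr
data class Num(val value: Int) : Expr
data class Pow(val base: Expr, val exp: Int) : Expr
data class Mul(val left: Expr, val right: Expr) : Expr

// "Compiling": change one data format (a string) into another (an Expr tree).
// Only the "base ^ exponent" shape is handled, for illustration.
fun compile(source: String): Expr {
    val (base, exp) = source.split("^").map { it.trim().toInt() }
    return Pow(Num(base), exp)
}

// "Lowering": simplify within the same data format — Expr in, Expr out.
// Pow(x, 2) becomes Mul(x, x), which targets can handle more directly.
fun lower(expr: Expr): Expr = when (expr) {
    is Pow -> if (expr.exp == 2) Mul(lower(expr.base), lower(expr.base)) else expr
    else -> expr
}

fun main() {
    val tree = compile("3 ^ 2")
    println(lower(tree)) // Mul(left=Num(value=3), right=Num(value=3))
}
```

Note that `lower` takes an `Expr` and returns an `Expr`: the format stays the same, only the shape of the tree gets simpler.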

When Kotlin code is compiled, a set of configurations determines how the Kotlin compiler runs: which environment to run in (e.g. CLI or the Analysis API), processor options, which plugins hook into the compiler, and which frontend and backend to use.

The Kotlin compiler has 2 frontends — K1 and K2 — and 4 backends: JVM, JS, Native, and an experimental WASM.

K1/K2 Frontends

The Kotlin compiler has two frontends — K1 frontend (denoted in the source code with Fe10-) and K2 frontend (sometimes called FIR frontend, and sometimes denoted in the source code with Fir-). Choosing a frontend determines what information is sent to the backend for IR generation and subsequent target generation.

Both the K1 and K2 frontends share similar stages in the beginning; the difference is that the FIR (Frontend Intermediate Representation) frontend introduces an additional data format before sending the transformed code off to the backend, which then immediately changes the data format to IR, or Intermediate Representation, for further processing.

K1 Frontend

The K1 frontend (Fe10-) takes human-readable source code, breaks the text into lexical tokens, creates a PSI tree, and performs resolution to create additional data structures, such as descriptors and a BindingContext, for the backend.

K1 Frontend data format changes before it is sent to the backend: source code → tokens → AST/PSI
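The "tokens" step above can be sketched with a toy tokenizer. The real Kotlin lexer is far more sophisticated, and the token names here are invented:

```kotlin
// A toy tokenizer — the real Kotlin lexer handles far more token kinds.
data class Token(val kind: String, val text: String)

fun tokenize(source: String): List<Token> =
    Regex("""\bval\b|[A-Za-z_][A-Za-z0-9_]*|\d+|=""").findAll(source).map { match ->
        val text = match.value
        val kind = when {
            text == "val" -> "KEYWORD"
            text == "=" -> "EQ"
            text.first().isDigit() -> "INT_LITERAL"
            else -> "IDENTIFIER"
        }
        Token(kind, text)
    }.toList()

fun main() {
    // The flat token stream is what later gets structured into an AST/PSI tree.
    println(tokenize("val x = 42"))
}
```

The flat list of tokens coming out of this step is exactly what the parser then structures into the AST/PSI tree.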

PSI stands for Program Structure Interface, a layer from the IntelliJ Platform that helps parse files and build the syntactic and semantic code models used later in compilation.

In the case of the K1 frontend, resolution is performed on the PSI tree to generate descriptors and the BindingContext, all of which is sent over to the backend to be transformed into IR.

  • Descriptors, depending on the element type, may hold context, scope, containing information, overrides, companion objects, and so on.
  • BindingContext is a big map keyed by PSI elements, holding descriptors and other information used to reason about the code later on.
The K1 compiler sends the PSI and BindingContext to the backend, which uses Psi2Ir to transform this information into IR for further processing.
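Conceptually, you can picture the BindingContext as a map keyed by tree nodes. The classes below are hypothetical stand-ins, not the real compiler API:

```kotlin
// Hypothetical stand-ins for PSI elements and descriptors — not real compiler classes.
data class PsiElementStub(val text: String)
data class DescriptorStub(val name: String, val type: String)

// BindingContext, conceptually: resolution results keyed by tree nodes.
class BindingContextStub {
    private val map = mutableMapOf<PsiElementStub, DescriptorStub>()

    // Resolution records what it learned about a node.
    fun record(element: PsiElementStub, descriptor: DescriptorStub) {
        map[element] = descriptor
    }

    // Later stages look the node back up to recover that information.
    operator fun get(element: PsiElementStub): DescriptorStub? = map[element]
}

fun main() {
    val context = BindingContextStub()
    val element = PsiElementStub("val x = 42")
    context.record(element, DescriptorStub(name = "x", type = "Int"))
    println(context[element]) // DescriptorStub(name=x, type=Int)
}
```

The key idea is the lookup pattern: any stage that holds a PSI node can ask the map what resolution concluded about it.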

However, sending over PSI and BindingContext like this has led to performance issues in the compiler.

As explained by Dmitriy Novozhilov in the Kotlinlang Slack, resolution results were also stored in the BindingContext, so the CPU could not cache objects quickly enough. All descriptors were lazy, which resulted in the compiler jumping between different parts of the code and killed a number of JIT optimizations.

K2 (FIR) Frontend

To improve compiler performance, JetBrains created the new K2 frontend (also sometimes interchangeably called the FIR frontend), which replaces the existing frontend. Instead of just sending the PSI and BindingContext off to the backend, an additional data format is produced to offload some of the work the backend would have done anyway.

The K2 frontend takes raw PSI as input and produces raw FIR, which is transformed in different stages, filling the tree with semantic information. Then, the resolved FIR is sent to the backend.

K2 frontend data format changes before it is sent to the backend: source code → tokens → AST/PSI → Raw FIR → FIR

FIR is a mutable tree built from the result of the parser, the generated PSI tree. After raw FIR is built, we pass it to a number of processors, each of which resolves the code and represents a different stage of the compiler pipeline. The details of resolution will be further covered in Part 2 of Crash Course on the Kotlin Compiler (WIP).
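As a rough sketch, with entirely made-up names, the processor pipeline can be pictured as a list of stages applied in order to a mutable tree, each stage filling in more semantic information:

```kotlin
// Hypothetical sketch of a mutable FIR-like node — not real compiler classes.
class FirNodeStub(val text: String) {
    var resolvedTarget: String? = null
    var resolvedType: String? = null
}

// Each "processor" stands in for one resolution stage of the pipeline.
fun interface ProcessorStub {
    fun process(node: FirNodeStub)
}

val pipeline = listOf(
    ProcessorStub { it.resolvedTarget = "kotlin.io.println" }, // e.g. a name-resolution stage
    ProcessorStub { it.resolvedType = "kotlin.Unit" },         // e.g. a type-resolution stage
)

fun resolve(node: FirNodeStub): FirNodeStub {
    // Stages run in order, mutating the same tree as they go.
    pipeline.forEach { it.process(node) }
    return node
}

fun main() {
    val node = resolve(FirNodeStub("println(\"hi\")"))
    println("${node.resolvedTarget}: ${node.resolvedType}")
}
```

The point of the sketch is the mutability: unlike the K1 approach of a separate side-table, the tree itself carries the semantic information once the stages have run.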

Last, a checkers stage runs, which takes the FIR and reports different diagnostics for warnings and errors. If there are errors, compilation stops, since there is no need to send broken code to the backend. If there are no errors, the resolved FIR is transformed into IR for the backend.

The K2 compiler sends resolved FIR to the backend, which uses Fir2Ir to transform it into IR for further processing.

Kotlin Compiler Backends

We’ve made it to the backend! Remember, there are 4 that we can choose from: JVM, JS, Native, and the new experimental WASM. For this article, we will discuss the more stable backends — JVM, JS, and Native.

Svetlana Isakova’s talk What Everyone Must Know about the NEW Kotlin K2 Compiler best explains how the backend generally works:

In her diagram, Kotlin source code goes through the frontend, which outputs a syntax tree and semantic information; from there, three arrows point to the three possible choices: the JVM IR backend, the JS IR backend, and the Native IR backend.

What’s important to notice is that no matter which backend you go with, the IR generator and optimizer are always the same starting point for backend processing. Then the selected configuration runs the code generation it needs:

  • JVM IR backend uses the JVM bytecode generator + optimizer to produce .class files
  • JS IR backend uses the JavaScript generator + optimizer to produce .js files
  • Native IR backend uses the LLVM bitcode generator + optimizer to produce .so files
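The mapping from backend to artifact can be summarized in a few lines of Kotlin. This is purely illustrative bookkeeping, not compiler code:

```kotlin
// Illustrative mapping of backend choice to the artifact it produces.
enum class Backend { JVM, JS, NATIVE }

fun artifactFor(backend: Backend): String = when (backend) {
    Backend.JVM -> ".class files (JVM bytecode)"
    Backend.JS -> ".js files (JavaScript)"
    Backend.NATIVE -> ".so files (via LLVM bitcode)"
}

fun main() {
    // Same IR in, different code generators out.
    Backend.values().forEach { println("$it -> ${artifactFor(it)}") }
}
```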

Assuming we run with the previous K2 example being sent off to the backend, we can continue this example for data formats:

frontend: source code → tokens → AST/PSI → Raw FIR → FIR | backend: IR → Lowered IR → Target Code

After resolved FIR is sent off to the backend, FIR is transformed to IR, or Intermediate Representation. IR is another abstracted representation, closer to CPU-level architecture. Analysis is done on the IR for control flow and call stacks, and machine-dependent optimizations are performed to create Lowered IR. Usually this means simplifying operations (e.g. 3^2 might become 3*3), improving the performance and quality of the produced machine code, and making resource and storage decisions.
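In the same toy spirit as before (invented names, not real backend code), one such simplification is constant folding, where operations on known constants are computed ahead of time during lowering:

```kotlin
// Toy IR nodes — not the real backend IR.
sealed interface IrExprStub
data class IrConst(val value: Int) : IrExprStub
data class IrMul(val left: IrExprStub, val right: IrExprStub) : IrExprStub

// A lowering pass: fold constant multiplications before code generation.
fun fold(expr: IrExprStub): IrExprStub = when (expr) {
    is IrConst -> expr
    is IrMul -> {
        val l = fold(expr.left)
        val r = fold(expr.right)
        if (l is IrConst && r is IrConst) IrConst(l.value * r.value) else IrMul(l, r)
    }
}

fun main() {
    // 3 * 3 folds to 9 before any target code is generated.
    println(fold(IrMul(IrConst(3), IrConst(3)))) // IrConst(value=9)
}
```

As with lowering in general, the pass maps IR to IR; the tree simply gets cheaper to turn into machine code.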

Target code is generated in the end, and optimizations are performed on the generated target code so that the resulting optimized target code is sent to the designated machine for execution.

This is a super high-level overview, but I hope this article will be helpful as we continue Crash Course on the Kotlin Compiler and dive back into the deep end! See you in the next one.

Want more on the Kotlin Compiler?

--

Amanda Hinchman
Google Developer Experts

Kotlin GDE and Android engineer. Co-author of O'Reilly's "Programming Android with Kotlin: Achieving Structured Concurrency with Coroutines"