Interpreters, JITs, & Compilers (Oh My!)

Contrasting Code Executors & Translators via Concrete Examples

Gabriel Lebec
Fullstack Academy

--

“I had a running compiler and nobody would touch it. …They carefully told me, computers could only do arithmetic; they could not do programs.” —Grace Hopper

Students learning high-level programming for the first time often struggle to differentiate the concepts of compilation and interpretation. This comes up frequently in the context of learning that Node.js and Chrome share the same engine, V8, which “runs” JavaScript code. A common (though not entirely correct) beginner understanding reads as: “interpreters run the code right away, but compilers output a binary which can be executed later.” This first-pass attempt at categorization suffices early on, but confusion may ensue when the student hears of a Just-in-Time (JIT) compiler: it’s a compiler which runs the code immediately? How is that different from an interpreter? And V8 used to feature two compilers, but now it has an interpreter too? Oh dear.

There are plenty of excellent resources in print and online for learning compilation theory. Their focus is usually on viable techniques for handling the translation of nontrivial languages: lexing a string into a token stream → parsing tokens into a concrete or abstract syntax tree, according to a formal grammar → generating or linearizing output from the tree.

Writing a working compiler using these techniques is satisfying and illuminating, but this article takes a different tack. We will define a trivial language “QuipScript,” which can be compiled with similarly trivial logic — no parse tree in sight. We will then compare & contrast running and/or translating a QS program via JS using:

  • An interpreter
  • A Just-in-Time (JIT) compiler
  • An Ahead-of-Time (AOT) compiler
  • An optimizing AOT compiler

With concrete and deliberately simple code examples, the intent is to make the differences between these approaches as intuitive as possible. Code for this article is located at https://github.com/glebec/int-jit-comp; let’s begin.

A Toy Language

QuipScript is a tiny language invented for this article that builds up, modifies, and prints a single stateful string value. Here’s an example QS program:
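
(The repo contains the canonical source file for this program; the listing below is a reconstruction consistent with the explanation that follows, so the exact keyword spellings are assumptions.)

concat hello
concat  you
print
remove 3
concat them
remove 4
concat world!
remove 1
print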

Bear in mind, a program is just a dead text file. It doesn’t necessarily do anything on its own — we need either a machine capable of using it directly, or else a system able to translate the text into something a machine can use.

This particular QS program is intended to first print “hello you”, followed by “hello world”. Take a look at the code and reason through how we get that result; understanding it will be necessary for the next steps. When you’re ready, move on for a brief explanation.

Every QS program has a single implicit stateful string which begins empty. In this example, we first concatenate hello and then  you (note the leading space) into the string. Next, we print the current state of the string. Afterwards, we remove the last three letters, concat the word them, remove those four letters, concat world!, and remove the last letter. Our string state is now hello world — which we print.

Interpreter

Our QS code is certainly quite lovely (cough), but it’s useless without a way to make a computer execute it. For knowledgeable JavaScript programmers, however, it shouldn’t be too difficult to write a program which:

  1. Reads in the text file as a string
  2. Separates the string into lines
  3. Identifies the intent of each line
  4. Executes the specified logic in JavaScript, live

This is a QS interpreter, written in JS. Every statement in the QS code is read in and identified. When the JS code has determined the purpose of a QS statement, it executes the logic itself, in situ.
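
(The full interpreter.js lives in the repo; the sketch below reconstructs its essential shape, inferred from the error output shown shortly, so details may differ. The source filename is an assumption.)

const fs = require('fs');

// 1. Read in the text file as a string
const sourceCode = fs.readFileSync('hello-world.qs', 'utf8');

let string = ''; // the single implicit stateful QS string

// 2. Separate into lines; 3. identify each line; 4. execute its logic live
sourceCode.split('\n').forEach(line => {
  const [command, ...rest] = line.split(' ');
  const arg = rest.join(' '); // preserves any leading space in the argument
  switch (command) {
    case 'concat': string += arg; break; // build up the string
    case 'remove': string = string.slice(0, -Number(arg)); break; // drop the last n chars
    case 'print': console.log(string); break; // output the current state
    case '': break; // ignore blank lines
    default: throw Error('unexpected token: ' + line);
  }
});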

Running node interpreter.js produces the following output in our terminal:

hello you
hello world

Just as we wanted.

What if our source QS file had an error partway through — but after the first call to print? We might see output like this:

hello you
/Users/glebec/dev/compilers-article/interpreter.js:25
default: throw Error('unexpected token: ' + line);
^
Error: unexpected token: ERROR I AM AN INVALID LINE OF GSCRIPT CODE
at sourceCode.split.forEach.line (/Users/glebec/dev/compilers-article/interpreter.js:25:18)

Notice that hello you was printed before we reached the erroneous line of QS code. Interpreters merrily execute portions of source code as they go, meaning that they might perform some work before failing.

Interpreters generally have the following aims:

  • Begin running immediately
  • Produce results while analyzing source code
  • Fail “close to” the error location

Caveat

Our QuipScript interpreter, written in JavaScript, can run statement by statement. This was easy since every QS statement conveniently maps 1-to-1 with an equivalent JS statement. A less contrived source language, however, might first require parsing into an intermediate language or tree representation in order to be correctly and/or efficiently interpreted.

Just-in-Time (JIT) Compiler

Our interpreter worked by figuring out the semantic intent of our QS code, and then realizing that intent itself as it read through the source. But that is not the only way we could have generated our final result.

As an alternative, we could compile (translate) our source QS code into target (or object) JS code — building up a program string. We won’t run each statement live, but rather wait until we have generated an entire JS program:
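
Here is a matching sketch of jit.js, under the same assumptions as the interpreter sketch above. The scaffolding is identical, but each case appends a line of JS to a program string instead of executing the logic directly:

const fs = require('fs');

const sourceCode = fs.readFileSync('hello-world.qs', 'utf8');

let js = `let string = '';\n`; // the target JS program, built up as a string

sourceCode.split('\n').forEach(line => {
  const [command, ...rest] = line.split(' ');
  const arg = rest.join(' ');
  switch (command) {
    case 'concat': js += `string += '${arg}';\n`; break;
    case 'remove': js += `string = string.slice(0, -${Number(arg)});\n`; break;
    case 'print': js += `console.log(string);\n`; break;
    case '': break;
    default: throw Error('unexpected token: ' + line);
  }
});

eval(js); // no source logic runs until this very line
console.log('FYI, here is the compiled program:' + js);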

Provided there are no errors, the entire QS program will be converted into a JS program, before any source logic is actually executed. We finally use JavaScript’s (very dangerous) eval function to run our generated JS code.

Running the above code with node jit.js logs the following to our terminal:

hello you
hello world
FYI, here is the compiled program:let string = '';
string += 'hello';
string += ' you';
console.log(string);
string = string.slice(0, -3);
string += 'them';
string = string.slice(0, -4);
string += 'world!';
string = string.slice(0, -1);
console.log(string);

It’s important to understand that the main output (hello you…) is not printed until the eval line runs. The generated JS program is also printed here just for demonstration purposes — in a real JIT we obviously would not embed the compiled code in the program output.

The main takeaway is that a compiler converts the input program into an equivalent target language program. A JIT compiler usually has these aims:

  • To compile the entire source code to object code ASAP
  • To subsequently run the object code, when it is possible to do so

Because our example compiler does not execute any source logic itself, if our source code includes a syntax error, the compiler will fail before any results are output (since the object code is never run):

/Users/glebec/dev/compilers-article/jit.js:31
default: throw Error('unexpected token: ' + line);
^
Error: unexpected token: ERROR I AM AN INVALID LINE OF GSCRIPT CODE
at sourceCode.split.forEach.line (/Users/glebec/dev/compilers-article/jit.js:31:18)

Notice, no hello you was printed this time.

Logically, it makes sense that an interpreter begins outputting results earlier than a JIT compiler. However, once the JIT has finished compiling, the generated JS program can be expected to run faster than the interpreter, which has to loop through and analyze each line anew. In our toy language example this might not hold, but for more complex languages, interpreting the code will likely be slower than executing the compiled code.

Caveat

In reality, there are various more advanced flavors of JIT which combine the concepts of compilers & interpreters, compiling and running target code while parsing the source code. Some may also be capable of monitoring the running code and re-compiling source as necessary.

Many definitions of JIT outright require some or all of these additional capabilities. This article, however, uses a more primitive distinction, separating JIT from AOT (see next section) based solely on whether the target code is executed as soon as compilation finishes.

Ahead-of-Time (AOT) Compiler

If we wanted to re-run our QS code, we could pass it through either the interpreter or JIT again. But then we’d be re-doing all the work from scratch. Wouldn’t it make sense to compile once, ahead of time, and simply save the generated JS program for future use?

This code is identical to the JIT above, except at the very end. Instead of immediately evaling our constructed JS program string, we save it to a file. So, running node compiler.js creates a new file (hello-world.js) for us to use in the future.
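
The tail of compiler.js might therefore look like the following sketch, swapping the JIT's eval call for a file write:

// instead of eval(js), persist the object code for future runs:
fs.writeFileSync('hello-world.js', js);
console.log('FYI, here is the compiled program:' + js);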

Subsequently entering node hello-world.js produces the expected output:

hello you
hello world

(Debug output truncated, as it is identical to the JIT example above.)

The advantage is clear: from now on, we don’t repeat work. We can run this output JS code as often as we want, and it will both start up immediately (like an interpreter) and run quickly (like a JIT). The downside is we had to produce the JS file before actually using it.

Optimizing AOT Compiler

If we are front-loading the work of compiling our source code (QS) to object code (JS), why are we allowing the JS to be so inefficient? Remember, our primitive compilers above generate the following JS program string/file:

let string = '';
string += 'hello';
string += ' you';
console.log(string);
string = string.slice(0, -3);
string += 'them';
string = string.slice(0, -4);
string += 'world!';
string = string.slice(0, -1);
console.log(string);

It isn’t hard to see that we could write a much simpler program in JS that accomplishes the same task. If only our compiler could intelligently do some of the work in advance, optimizing away inefficiencies:
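
Below is a sketch of such an opt-compiler.js, under the same assumptions as before. Rather than emitting one JS statement per QS statement, it evaluates the string state at compile time and only emits code for the observable print effects:

const fs = require('fs');

const sourceCode = fs.readFileSync('hello-world.qs', 'utf8');

let string = ''; // the string state, evaluated at compile time
let js = ''; // we only emit code for observable effects (the prints)

sourceCode.split('\n').forEach(line => {
  const [command, ...rest] = line.split(' ');
  const arg = rest.join(' ');
  switch (command) {
    case 'concat': string += arg; break;
    case 'remove': string = string.slice(0, -Number(arg)); break;
    case 'print': js += `console.log('${string}')\n`; break;
    case '': break;
    default: throw Error('unexpected token: ' + line);
  }
});

fs.writeFileSync('hello-world.js', js);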

This compiler is doing a bit of the work itself, finding ways of simplifying the program. When we use it via node opt-compiler.js, it saves the following code as hello-world.js:

console.log('hello you')
console.log('hello world')

Ah, much better! Running this code via node hello-world.js will give us exactly the results we want, with no superfluous string manipulation.

Doing the extra logic to generate this simplified code means an optimizing compiler might take longer than a JIT to compile, but the resultant JS code is even more efficient when actually run. In general, AOT compilers have the following aims:

  • Analyze the code as deeply as possible, looking for optimizations.
  • Generate output code which is fast (to run), small (on disk), and/or lean (in terms of memory use); priorities are not always mutually compatible.
  • Save the generated code for future use, baking in improvements.

Optimizations

The “optimization” shown in this example was practically interpretation itself, but optimizations take many forms, including:

  • Dead code elimination — if some source code could logically never be useful, remove it from the output. For example, if we did some more concatenation after the final print, that would be pointless work.
  • Register allocation — the most used variables might be stored closer to the processor (e.g. in explicit registers), so they do not have to be retrieved from higher layers (e.g. RAM).
  • Loop unrolling — if the number of passes of a loop can be analyzed ahead of time, a loop can be converted into a simple sequence of statements with no expensive “jump” commands or control flow statements.
  • Inlining — if a variable is used just to name some constant data reused across the app, the compiler can simply replace the variable with the data in the output code (saving a memory lookup).

That’s just a small sampling; modern compilers analyzing well-designed languages can perform other sophisticated optimizations and static checks.
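
As a hand-written illustration (not output from our toy compiler), inlining plus loop unrolling might transform a small JS fragment like so:

// before optimization:
const GREETING = 'hi';
for (let i = 0; i < 3; i++) {
  console.log(GREETING);
}

// after inlining GREETING and unrolling the three-pass loop:
console.log('hi');
console.log('hi');
console.log('hi');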

Case Study: V8

Google’s V8 Engine is the core of the Chrome browser and Node.js runtime environment. It is the code which reads your JS file and executes the logic within. So, how does V8 work? Is it an interpreter, a JIT, an AOT compiler, or something else entirely?

As it turns out, this is an area in which V8 is actively evolving. Let’s take a simplified look at both how it used to work and also the more recent pipeline.

Old Way: JIT + Optimizing Compiler

From 2010 to 2017, V8 used two compilers: a lazy JIT named Full-Codegen, and an optimizing compiler (either the earlier Crankshaft or later Turbofan). When V8 executed some JS, it would go through the following steps:

  1. The JIT would analyze the JS and compile it to machine code for your particular platform. This takes a short period of time — longer than an interpreter takes to start, but generally short enough for humans not to care, often because the browser is still downloading other page elements. Interestingly, Full-Codegen would compile lazily — not converting everything at once, but targeting individual functions as needed.
  2. The machine code would then be immediately executed by the runtime environment. Running machine code is much more efficient than interpreting JS. In the interest of getting your code to run ASAP, however, the JIT made little to no optimizations, so the machine code is not as fast as it could be.
  3. As the machine code ran, V8 would profile it and identify hotspots — functions that are called frequently, potential bottlenecks to performance.
  4. V8’s optimizing compiler (Crankshaft or Turbofan) would then kick in and re-translate those functions (from JS to machine code) in a more intensive way. This process is slower than JIT compilation, but the resulting machine code is faster to run.
  5. V8 finally live-substituted the newly-optimized machine code straight into your running program. The performance of a script would improve while it was being used. Quite a neat trick!

New Way: Compiler + Interpreter + Optimizing Compiler

In 2017, V8 switched over to a new pipeline prominently featuring (drumroll)… an interpreter!? Yes, it may seem surprising, but the newest component, named Ignition, has some key advantages.

The trick is that Ignition does not interpret JavaScript per se, but rather a compiled intermediate language between JS and machine code — a platform-agnostic bytecode. Bytecode is essentially a formal pseudocode, taking the form of a low-level instruction set for a purely hypothetical processor. Because this deliberately generic language is similar to the assembly languages used in most modern computers, it is easy to take one last step and convert it into actual machine code for a given platform. One only needs to tweak the text to account for the local dialect, as it were.
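
You can even peek at Ignition's bytecode yourself, via a V8 flag exposed by Node (the add function here is an arbitrary example, and the dump is abbreviated; exact registers and feedback slots vary by V8 version):

$ node --print-bytecode --print-bytecode-filter=add -e "function add(a, b) { return a + b }; add(1, 2)"

Ldar a1
Add a0, [0]
Return

Ignition is an accumulator-based machine: Ldar loads an argument into the accumulator register, Add adds another argument to it (recording type feedback in slot 0), and Return returns the accumulator's value.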

So, V8’s newest pipeline actually features an (unnamed) JS-to-bytecode compiler, the Ignition bytecode interpreter, and the Turbofan optimizing compiler. When this system receives a JS file, the following occurs:

  1. V8 compiles the source JS into generic bytecode, identical for every platform (not specific to a given OS or processor). The compiler uses a few “easy” optimizations in this step, so the bytecode is already fairly efficient.
  2. Now, the Ignition interpreter begins executing the compiled bytecode. It may be slightly slower than running compiled machine code directly, but is still pretty fast.
  3. Again, hotspots are identified and re-compiled using the Turbofan optimizing compiler. However, this time the optimization doesn’t start from scratch with JS, but proceeds from the already-generated bytecode.

At first glance, it is perhaps not obvious what advantages this new system confers. Can compiling to semi-optimized bytecode, and then interpreting that bytecode, be comparable to just doing a JIT compilation to machine code?

In fact, Ignition is quite close in execution speed to Full-Codegen, and the V8 developers reap the following rewards:

  • Faster startup times, mostly because it is quicker to compile to bytecode than it is to fully compile the source JS to machine code.
  • Significantly less memory use, because most of the source JS can now be “thrown away” after conversion to bytecode. This is especially good for mobile platforms, a major motivation behind Ignition’s development.
  • Simpler integration with the Turbofan optimizing compiler, both speeding up the optimization step and making it easier to implement new JS language features (from ECMAScript) as they are published.

The moral of this story is that the archetypal view of interpreters resulting in slow runtime execution and compilers generating fast runtime execution can become muddied, or at least nuanced, when more complex hybrid solutions are considered.

Aside: WebAssembly

Having examined the differences between AOT and JIT compilation, you may very well ask: why don’t people compile their JS ahead of time, with an optimizing compiler? Wouldn’t that yield the best possible results?

The short answer is that it would be nice, and that’s how native apps work. A user downloads a precompiled binary for their OS and runs it directly. However, it’s not entirely feasible for the web, for the following reasons:

  • Not all browsers are built on Chrome’s virtual machine. If websites had to serve up pre-compiled JS code to clients, they’d need multiple versions for different browsers and operating systems. Instead, all that browsers currently need is a single JS file. Different engines, e.g. SpiderMonkey (Firefox), Chakra (MS Edge), and JavaScriptCore (Safari) may interpret or JIT compile that JS, and they might compile it to bytecode or machine code. But in every case, JS is the lingua franca from which more vendor- and platform-specific results can be generated.
  • For security reasons, it would be a bad idea to let browsers execute pre-compiled native code, which might read or write private data in the file system or perform other nefarious acts. By limiting browsers to JS, developers are restricted to only the APIs and capabilities that JS as a language affords (and which the browser grants).

However, the potential performance benefits of compiled code have not been overlooked. The W3C has designed a universal binary object code language for the internet, called WebAssembly. The promise of WASM is to provide nearly-native performance in a low-level, pre-compiled format, which is still secure and safe to be run by browsers (due to restricted capabilities).

Modern browsers already support WASM, though JS is still needed e.g. to manipulate the DOM. Developers can write code in languages like C or Rust and compile to WASM modules which are loaded and invoked by JS. WASM therefore can greatly speed up certain kinds of calculations, working alongside JS to provide a more performant web.
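
For a taste of the JS side of that arrangement, here is a minimal sketch; it assumes a hypothetical add.wasm module on the server which exports an integer add function:

// fetch the binary, compile & instantiate it, then call an export:
fetch('add.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.instantiate(bytes))
  .then(({ instance }) => {
    console.log(instance.exports.add(2, 3)); // logs 5
  });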

Conclusion

This article is intended as a high-level introduction to lessen confusion among those encountering these concepts for the first time. The code snippets are not meant to demonstrate recommendable or effective techniques for use in real interpreters or compilers. However, it is my hope that these deliberately simple scripts might illuminate the core differences between these entities. Here is a summary of what we illustrated through examples:

  • Interpreter: reads source code and executes its logic directly, producing results (and potentially failing) partway through the program.
  • JIT compiler: translates the entire source into target code, then immediately runs the result; nothing is output if compilation fails.
  • AOT compiler: translates the source and saves the target code for later, repeated execution.
  • Optimizing AOT compiler: spends additional compile time analyzing the source in order to emit leaner, faster target code.

It should also be evident that performing optimizations is not limited strictly to AOT compilers. A JIT can also perform some optimizations, so long as they do not slow down the compilation process too much. However, an AOT compiler can afford to take as much time as it needs to generate the best possible result.

We also saw that more sophisticated pipelines may involve techniques such as lazy compilation, compilation to intermediate representations, dynamic re-compilation, and other hybrid approaches.


--

Gabriel Lebec
Fullstack Academy

Instructor at Fullstack Academy of Code. Passionate about functional programming, nihontō (traditional Japanese arms), and anything that combines art and tech.