When pigs fly: optimising bytecode interpreters

Vladimir Kazanov
Bumble Tech


“No matter how hard you try, you can’t make a racehorse out of a pig. You can, however, make a faster pig.”
— A comment in the Emacs source code.

Everyone knows that pigs can’t fly — just like everyone thinks they know that bytecode interpreters, as a technology for executing high-level languages, can’t be sped up without resorting to labour-intensive dynamic compilation.

This is part two in my series of articles about bytecode interpreters, based on a small stack virtual machine which I have nicknamed PM (The Piglet Machine). In it, I will attempt to show you that, in the case of ambitious, hardworking “piglets” and working within the confines of standard C (for the most part), it is entirely possible to speed up the work of such interpreters by a factor of at least 1 ½.

Part 1 - Home-grown bytecode interpreters
Part 3 - Regex bytecode interpreter: looking for needles in session haystacks

Piglet Machine

Let’s start with some introductions.

The “Piglet Machine” is a middle-of-the-road stack machine based on the example I gave in part one of this series. Our “pig” only understands one kind of data — a 64-bit word — and performs all calculations (involving integers only) on a stack with a maximum depth of 256 words. Besides the stack, this piglet has a working memory with a capacity of 65,536 words. The result of running the program, namely a single machine word, can either be put in the result register or simply printed to standard output (stdout).

Everything in the Piglet Machine is stored in a single structure:

static struct {
/* Current instruction pointer */
uint8_t *ip;
/* Fixed-size stack */
uint64_t stack[STACK_MAX];
uint64_t *stack_top;
/* Operational memory */
uint64_t memory[MEMORY_SIZE];
/* A single register containing the result */
uint64_t result;
} vm;

The items listed above allow this machine to be categorised as a low-level virtual machine, in which almost all overheads are spent on servicing…