Tree-shaking versus dead code elimination

I’ve been working (albeit sporadically of late, admittedly) on a tool called Rollup, which bundles together JavaScript modules. One of its features is tree-shaking, by which I mean that it only includes the bits of code your bundle actually needs to run.

Axel Rauschmayer asked where the term came from, and Amjad Masad said that it’s really just a different name for dead code elimination, which Sebastian McKenzie thinks is silly.

But they are in fact different things, even if they have the same goal (less code).

Dead code elimination is silly

Bad analogy time: imagine that you made cakes by throwing whole eggs into the mixing bowl and smashing them up, instead of cracking them open and pouring the contents out. Once the cake comes out of the oven, you remove the fragments of eggshell, except that’s quite tricky so most of the eggshell gets left in there.

You’d probably eat less cake, for one thing.

That’s what dead code elimination consists of — taking the finished product, and imperfectly removing bits you don’t want. Tree-shaking, on the other hand, asks the opposite question: given that I want to make a cake, which bits of what ingredients do I need to include in the mixing bowl?

Rather than excluding dead code, we’re including live code. Ideally the end result would be the same, but because of the limitations of static analysis in JavaScript that’s not the case. Live code inclusion gets better results, and is prima facie a more logical approach to the problem of preventing our users from downloading unused code.
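To make the ‘include live code’ idea a bit more concrete, here’s a minimal sketch. It assumes a toy model in which each top-level declaration simply lists the other declarations it references; the names (declarations, includeLive and so on) are made up for illustration and this is not how Rollup represents things internally.

```javascript
// Toy model: each top-level declaration lists the declarations it references.
// All names are hypothetical; this is not Rollup's actual data structure.
const declarations = {
  main: ['cube'],
  cube: ['multiply'],
  square: ['multiply'],
  multiply: [],
  unusedHelper: ['square'],
};

// Live code inclusion: start from the entry point and work outwards,
// only ever touching declarations that something live refers to.
function includeLive(entry, graph) {
  const included = new Set();
  const queue = [entry];
  while (queue.length > 0) {
    const name = queue.pop();
    if (included.has(name)) continue;
    included.add(name);
    for (const dep of graph[name]) queue.push(dep);
  }
  return included;
}

const live = includeLive('main', declarations);
console.log([...live].sort()); // [ 'cube', 'main', 'multiply' ]
```

Note that square and unusedHelper are never visited at all. Nothing gets removed after the fact, because nothing unneeded was added in the first place.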

It’s an idea that only really makes sense if you’re thinking in terms of modules and bundles though (since the contents of the ‘entry file’ can be thought of as roughly equivalent to the contents of the main method in other languages).

(As it happens, Rollup isn’t perfect — more on this below — so the best results are to be had by doing both steps, i.e. Rollup then UglifyJS or Webpack 2 with the Uglify plugin.)

Yeah, it’s probably not a great name

For one thing, it implies that you are indeed shaking off dead branches. But it has been used by people other than me to describe this ‘start with what you need, and work outwards’ technique (as opposed to ‘start with everything, and work backwards’), mostly in the Dart community (though people have found other references). I can’t remember exactly where I picked up the term.

I thought about using the ‘live code inclusion’ phrase with Rollup, but it seemed that I’d just be adding even more confusion seeing as tree-shaking is an existing concept. Maybe that was the wrong decision?

Limits of Rollup’s tree-shaking

Rollup currently works on top-level AST nodes (e.g. function declarations, the sort of thing it’s likely to have to make decisions about whether or not to include), rather than at a more granular level. So it might still include more code than is needed.

It also doesn’t currently remove things like unused methods from objects that are used, and there are times when it is forced to assume the worst in order to ensure the resulting program is correct. That’s because static analysis is difficult in a dynamic language like JavaScript. Look out for improvements in 2016 as we add type tracking and other techniques.
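Here’s a small, hypothetical example of the kind of code that forces a bundler to assume the worst. The utils object and the apply function are invented for illustration; the point is only that the method name isn’t known until the program runs.

```javascript
// A hypothetical object with two methods.
const utils = {
  double(x) { return x * 2; },
  triple(x) { return x * 3; },
};

// The method name is only known at run time, so static analysis can't
// prove that `triple` is never called. To stay correct, both are kept.
function apply(methodName, value) {
  return utils[methodName](value);
}

console.log(apply('double', 21)); // 42
```

Even though this program only ever calls double, a tool that can’t see what string will be passed to apply has to keep triple around as well.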

Rollup is about more than tree-shaking

The goal of Rollup is to produce maximally efficient bundles that look like a human wrote them. Tree-shaking is part of that, but there are other things Rollup does (or more correctly, doesn’t do) — it doesn’t wrap modules in functions, it doesn’t put a module loader at the top of your bundle, and it doesn’t generate the resulting code from an intermediate AST, but rather preserves your original code as far as possible. Because of that it’s a particularly good choice for writing libraries (it can be used for apps, though Webpack for example has a ton of features Rollup doesn’t, so YMMV), especially ones with few non-ES6 dependencies (though it can use CommonJS modules).

Anyway, that’s enough sales pitch. Hope this post sheds some light on the whole deal.