Tree-shaking? Let’s implement it!

Flo Sloot
Punching Performance
8 min readMay 29, 2018

I mean. Let’s literally implement the concept of live code inclusion!

What? This is Tree-shaking, right?

If you had been developing JavaScript applications lately you might have encountered an optimization technique called Tree-shaking.

Though let us tackle a few questions about it first:

  • What is it?
  • Why is it?
  • Why does it work?
  • How does it work?

TL:DR; in case you already know what Tree-shaking is and only want to implement the mechanism itself, you can skip the first sections! Start at how.
Or simply have a look at the
code.

What is it?

“The concept and name were popularized by the ES2015 module bundler Rollup.js” — Webpack

We should first get a good idea about the definition of Tree-shaking;
it’s a process in which the bundler includes only the code that is actually used.

What would that look like in code?
Well, have a glance at the following example:

// module_one.js
export const useMe = () => console.log('Yep, in the code! 😎');
export const dontUseMe = () => console.log('Urgh, didnt make it to the bundle');
// entry.js
import { useMe } from 'module_one';
someFunction();
useMe(); // Yep, in the code! 😎
// 🧙🏼‍main.bundle.js
import { useMe } from 'module_one';
someFunction();
useMe();

Dayum! Our bundle only contains those thing from the source code it actually will need at runtime! 🤯

Why is it?

Now, yeah, it is really cool… but why even bother to do the magic?

Well, imagine the following:

  • You are building some super duper frontend application with the latest framework.
  • You have some use-cases where you have to create some logic that you know somebody else has created before and put to npm.
  • You install this package into your project.
  • This package you just installed is offering about 100 functionalities each of them being super useful in different contexts.
  • But, you only need 3 of these functionalities offered by the very same library.
  • Guess what; prior to tree-shaking you would have included them ALL!

Makes sense to shake the tree and include only what you need, right?

Why does it work?

You want the short answer? Because of ES2015 Modules.
You want the long answer? Read here, it’s a small introduction into modules I gave at my company.
You want a really deep dive? Go here.

How does it work?

Finally let’s build something!!

# create folder and jump into it
$ mkdir myOwnShaker && cd $_
# create entry point, "logic file" and 3 modules
$ touch shake.js ast.js
$ mkdir modules
$ touch module1.js module2.js module3.js
# create package.json and use default answers
$ npm init -y
# install all the package we're going to use
$ npm i esprima escodegen

Things we imported:

We’ll be using Esprima to generate an AST (abstract syntax tree) from the code we write and then turn it back into code using Escodegen just after we excluded the things we won’t use at runtime.

we’ll parse code into a tree structure, do some elimination and turn in back to code

At this point (when turning JS into an AST) it’ll become obvious why static analyzable code (aka ES6 Modules) is so important to the concept of live code inclusion.
Let’s have a look at what the tree looks like for import vs require

// cjs modules
const x = require('module');
const y = anyFunction('');
// its AST as JSON representation
{
"type": "Program",
"body": [
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "x" // here we defined the const x...
},
"init": {
"type": "CallExpression", // ...using a simple fn()
"callee": {
"type": "Identifier",
"name": "require"
}
}

}
]
},
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "y"
},
"init": {
"type": "CallExpression",
"callee": {
"type": "Identifier",
"name": "anyFunction"
}
}

}
]
}
]
}

This is what your code looks like after parsing!

As you can see the x variable, which is supposed to be something from a external module is initialized NO DIFFERENT than any other variable initialized via a CallExpression!

What does an import look like then?

// Native ES Module
import { x } from 'moduleName';
// AST as JSON
{
"type": "Program",
"body": [
{
"type": "ImportDeclaration",
"specifiers": [
{
"type": "ImportSpecifier",
"imported": {
"type": "Identifier",
"name": "x"
},
"local": {
"type": "Identifier",
"name": "x"
}
}
],
"source": {
"type": "Literal",
"value": "moduleName",
"raw": "'moduleName'"
}

}
]
}

HELL YEAH! We can actually tell when the parser hits an import statement!

If you want to explore more and play around with what AST’s look like using different parsers, try out the ASTexplorer

Alrighty, now let us use this knowledge to our advantage

Suppose we’re building some sort of JS application.
We start writing all of our source code and happily importing all of the functionalities we might need.
Which might look something like:

// module1.js
import { namedExport1, namedExport2 } from './module2';
// module2.js
import { x } from './module3';
export const namedExport1 = () => console.log('I am an exported function');
export const namedExport2 = 'Get me imported!';
export const namedExport3 = 'whatever';
// module3.js
import { fn } from '';
export const x = () => console.log('foo');
export const y = () => console.log('bar');
export const z = () => console.log('baz');

Now we’re done and want to get rid of all the code that is somewhere hidden in the trillion modules we’ve installed.

To do so, we will:

  • create a Parser class that extracts the imports and exports of a module and follow the sources of its imports to get all the moving parts
  • create a Tree-Shaker class that will then remove everything that is exported but never exported

note: we will only implement a rudimentary, not efficient mechanism!!
This is only for demonstration.
One should implement proper tree traversal to visit all nodes in an efficient manner.

The next lines of code will get pretty meaty.
Let me give you a gentle introduction what’s going to happen!

  1. we need something to parse the entry module and traverse its nodes.
  2. if we find an ImportDeclaration, we will recursively parse its content and then start at #1 again.
  3. whenever we encounter a node with type: ImportDeclaration we naïvely take its imported name and store it in a Map.

The code could look something like this:

// ast.js
const esprima = require('esprima');
const escodegen = require('escodegen');
const fs = require('fs');
// parser class
class Parser {
// simply set some default values
constructor(entryModule) {
this.importedVals = new Map();
this.exportedVals = [];
this.modulesSet = [];
this.module = entryModule;
// bind bc no transform plugin
this.followImportSources = this.followImportSources.bind(this);
}
// pull in the path and parse its content
parseModule(relPath) {
const codeBuffer = fs.readFileSync(__dirname + relPath);
return esprima.parseModule(codeBuffer.toString());
}
// traverse the tree of module
// look for ImportDeclaration type
// follow imports recursively

extractImports(module) {
const extractedImports = this.traverseSyntaxTree({
AST: this.parseModule(`/modules/${module}.js`),
extractType: 'ImportDeclaration',
recursiveCaller: this.followImportSources,
extractor: (node) => {
// look for the imported key and return its name
return node.specifiers
.map(val => val.imported.name);
}
});
// put the extracted import into our hashmap
extractedImports
.forEach(imp => this.importedVals.set(imp, imp.toString()))
return this.importedVals;
}
// define the function to follow import sources
// either push the module name into the Modules Map
// or don't do anything

followImportSources({ source }) {
const followModule = source.value.replace('./', '');
followModule.length
? (() => {
this.extractImports(followModule);
this.modulesSet.push({
name: followModule,
module: this.parseModule(`/modules/${followModule}.js`)
});
})()
: undefined;
}
// traverse the AST and do whatever
//extractImports function told us to do

traverseSyntaxTree({
AST,
extractType,
extractor,
recursiveCaller = noop => noop
}) {
const { body } = AST;
let extractedNodes = [];
body.forEach(node => {
if (extractType === node.type) {
const extractedVals = extractor(node);
extractedNodes = [...extractedNodes, ...extractedVals];
recursiveCaller(node);
}
})
return extractedNodes;
}
// either return importedVals if we had them already
// or trigger the extractImports fn
get Imports() {
return this.importedVals.length
? this.importedVals
: this.extractImports(this.module);
}
}

uhi… a lot of stuff right? I’m also sorry for not streamlining things 😟

The only thing missing now is the Shaker!!

// right beneath the Parser class
class TreeShaker {
// store the unshaked modules
// shake the modules when initializing

constructor({ Imports, modulesSet }) {
this.unshaked = modulesSet;
this.modules = TreeShaker.shake(modulesSet, Imports);
}
// do static because... well 🤷🏻‍
static shake(modules, importedVals) {
// get all the values from the module map defined in Parser
// and turn them into an array

// btw make a copy of modules otherwise
// you won't be able to compare old vs new

return Array.from(modules.entries())
.map(([, { module: m, name }]) => {
const module = { ...m };
const { body } = module;
const shakedBody = [];
// traverse every body of every module
// look for export declarations &&
// if the exported name is in the array of imports

// we include it in our new body
// also we include everything else thats not an export
body.forEach(node => {
if (node.type === ‘ExportNamedDeclaration’) {
node.declaration.declarations
.forEach(({ id }) => {
if (importedVals.has(id.name)) {
shakedBody.push(node);
}
});
} else {
shakedBody.push(node);
}
})
module.body = shakedBody;
return module;
})
}
// make the original modules accessible
get Unshaked () {
return this.unshaked;
}
// and the optimized modules are of course accessible as well
get Modules() {
return this.modules;
}
}

Holy cow.. this is so much unreadable code 🤬 But in fact the only reason why it look so verbose is that the AST tree is so verbose itself.

Showtime!

// shake.js// create a new instance of Parser and TreeShaker
// initalized with our entry point: module1

const shakeItBaby = new TreeShaker(new Parser('module1'));
// make it one big bundle with new modules
const moduleStringOptimized = shakeItBaby.Modules
.map(m => escodegen.generate(m))
.join('');
// also make one bundled version with the old modules
// note: we have to specifiy the module prop on the module object

const moduleStringUnshaked = shakeItBaby.Unshaked
.map(u => escodegen.generate(u.module))
.join('');
// have a look at how different they look
console.log(moduleStringOptimized);
console.log(moduleStringUnshaked);
// let's count the characters
// do a naive comparison => in percent

const impr = Math.floor(
(
1 -
moduleStringOptimized.length /
moduleStringUnshaked.length
) * 100
);
console.log('IMPROVEMENT: ', impr, '% 🎉');
// IMPROVEMENT: 36 % 🎉

F%&$ YEAH! more than 1/3 improvement with only our very naïve optimization.

Image what we could’ve done to the bundle size if we had put some serious effort into our tree-shaker!

As you can see very powerful optimizations don’t need to be rocket science 🚀
You could go from this rather simple implementation to really awesome algorithms and build tools that will make the web even more efficient!

Thank you for reading!
Please let me know if there is something unclear or wrong. 🤓

--

--