Demystifying JavaScript Bundlers

The good, the bad, and the bundler.

Raphaël Tahar
8 min read · Feb 18, 2023
DALL-E: Factory

In the previous post (Part 4: Module Resolution & Task runners), we went back to the future (actually in the past) and retraced the history of how the industry incrementally fixed the issues around stitching a bunch of modules together.

One might wonder what else the JavaScript ecosystem could possibly invent, now that module resolution looks like a solved, simple problem.

Here are a few ideas: Hot Module Replacement, code splitting, lazy loading, tree shaking, and module federation.

All these features revolve around module resolution, but they are executed either at build time or at run time, depending on their objectives.

Some, like Hot Module Replacement, run at build time and serve development purposes: every code change triggers a smart recompilation.
Others, like lazy loading, require both a build-time and a run-time action: bundles are split beforehand and wrapped in code that loads them under particular circumstances at runtime.

This is where the bundlers kick in.

️️🛠 Bundlers

To better grasp what bundlers are and what they provide, let’s dive into the internals and architecture of one of the best-known and most widely used bundlers: Webpack.

Most other bundlers implement a similar architecture, so even if you use a tool other than Webpack, knowing this should at least help you build a clearer mental model of how your bundler works.

Webpack Architecture

Webpack has an event-driven architecture where most internal pieces can be customized through plugins or loaders (we’ll detail those later). Most of the Classes used in Webpack internals extend a Tapable class that implements hooks that can be triggered at almost any step of the compilation lifecycle.
So keep in mind that each internal built-in step we’ll highlight can be enhanced with custom behavior through pub/sub events (and that’s one of the reasons why Webpack has been so popular).
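
To make this concrete, here is a minimal sketch of a custom plugin tapping into those hooks (the BuildTimerPlugin name and the timing logic are purely illustrative; compile and done are real compiler hooks):

// A minimal, illustrative plugin that measures build duration
class BuildTimerPlugin {
  apply(compiler) {
    let start;

    // Fired right before a new compilation starts
    compiler.hooks.compile.tap('BuildTimerPlugin', () => {
      start = Date.now();
    });

    // Fired once the compilation has completed
    compiler.hooks.done.tap('BuildTimerPlugin', () => {
      console.log(`Build finished in ${Date.now() - start}ms`);
    });
  }
}

module.exports = BuildTimerPlugin;

Registered in the plugins array of the configuration file, this class is notified through the same pub/sub mechanism Webpack uses internally.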

Webpack architecture

The tool has been implemented with a “configuration over code” philosophy. Indeed, task runners previously required you to code the intended behaviors yourself. Here, the event-driven plugin/loader architecture is coupled with a configuration-based approach in which a single file defines the desired behaviors and build steps.

Webpack expects a webpack.config.js configuration file to be found at the root of the repository.

From there, let’s break down Webpack’s building steps:

  • Validation: Checks if the provided configuration file is valid and respects its predefined JSON schema.
  • Compiler: Initiates a compile process, with the configured entry-point path given as input (compile(entrypoint)).
  • Path resolver: The provided path is resolved with additional metadata (context, request …).
  • Module resolution: Injects additional metadata.
  • Module Factory: Collects the source code from the file found by the path resolver and creates module objects.
  • Lexer/Parser: An Abstract Syntax Tree is built from that source code.
  • Template: The AST is consumed to generate the final JavaScript bundles through several types of templates (bundle, module, dependency).

Let’s walk through all of that a bit more slowly

To untangle all this, we’ll follow the whole process with a tangible example.

Consider a source code composed of the following files:

  • fileOne.js
import { functionTwo } from './fileTwo'
import { data } from './data.json'

function functionOne() {
  return functionTwo(data)
}

export default functionOne
  • fileTwo.js
import styles from './styleSheet1.css'

export function functionTwo(data) {
  return <h1 className={styles.h1}>{data}</h1>
}
  • styleSheet1.css
.h1 { font-size: large; }
  • data.json
{
  "data": "lorem ipsum"
}
  • webpack.config.js
const path = require('path');
const HtmlWebpackPlugin = require('html-webpack-plugin');

module.exports = {
  entry: {
    main: "./fileOne.js"
  },
  output: {
    chunkFilename: 'bundle[id].[contenthash].js',
    filename: 'main.[contenthash].js',
    path: path.resolve(__dirname, "dist")
  },
  plugins: [
    new HtmlWebpackPlugin({
      title: "My shiny app"
    })
  ]
};

Here’s what happens to those sources:

  1. webpack.config.js is validated against Webpack’s config JSON schema
  2. A Compiler instance is spawned and is given the entry (main: “./fileOne.js”) as input
  3. The Path resolver transforms it into an absolute path: “complete/path/to/fileOne.js”
  4. The Module resolver adds metadata to that path: “{ path: complete/path/to/fileOne.js, context: Context, request: Request…}”
  5. The Module Factory loads the file’s source code:
    import { functionTwo } from './fileTwo'
    import { data } from './data.json'
    function functionOne() { return functionTwo(data) }
    export default functionOne
  6. It then parses the file to find imports; in our example:
    import { functionTwo } from './fileTwo'
    import { data } from './data.json'
  7. The Module Resolver now recursively re-iterates this process until the whole dependency graph is built up. Skipping a few round trips, this process ends up with the following list of imports:
    import { functionTwo } from './fileTwo' -> complete/path/to/fileTwo.js
    import { data } from './data.json' -> complete/path/to/data.json
    import styles from './styleSheet1.css' -> complete/path/to/styleSheet1.css
  8. And the following dependency graph:
    [fileOne.js -> [fileTwo.js -> [styleSheet1.css], data.json]]
  9. Note that this Module resolution step uses Loaders to inspect and extract the correct metadata from files that match the rules defined in the configuration file (see the loader configuration sketch right after this list).
  10. At this step, the Compiler (think of it as the orchestrator of the whole build process) has access to the dependency graph and the modules list, out of which it builds up an Abstract Syntax Tree.
    Since Webpack only understands JavaScript, any file with an extension other than .js will be translated into JS and injected into the AST (for instance, images are cast into base64 objects and inserted in the AST as such).
  11. Three types of templates (chunk, module, and dependency templates) are used to generate the output code, which is then written to the /dist folder.
    Depending on the module type, the corresponding template is hydrated with that module’s source code.
  12. This is also the step where Plugins are executed.
    In our case, the HtmlWebpackPlugin creates an index.html file in the /dist folder and dynamically injects the newly created JavaScript main bundle (quite powerful to always have an up-to-date index.html when the hashes are dynamic):
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>My shiny app</title>
  </head>
  <body>
    <script src="main.4hijz093jfe.js"></script>
  </body>
</html>
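
As referenced in step 9, loaders are declared through module.rules in the configuration file. The config above omits them for brevity, so here is a hedged sketch of what rules for our example’s CSS and JSX imports could look like (css-loader, style-loader, and babel-loader are real community loaders, but the exact rules below are only illustrative):

module.exports = {
  // ...same entry, output, and plugins as in webpack.config.js above
  module: {
    rules: [
      {
        // Resolve imported stylesheets; modules: true enables the
        // styles.h1 object syntax used in fileTwo.js
        test: /\.css$/,
        use: [
          'style-loader',
          { loader: 'css-loader', options: { modules: true } }
        ]
      },
      {
        // Transpile modern JavaScript and JSX into browser-compatible JS
        test: /\.js$/,
        exclude: /node_modules/,
        use: 'babel-loader'
      }
    ]
  }
};

Each rule tells the Module Factory which loader chain to run on matching files before their output is turned into a module.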

And voilà, after that eventful journey, the build process outputs a shippable program runnable by a web browser.

But what about the other features we mentioned earlier, that are provided by modern build systems?

Code Splitting

This feature aims to cut a front-end application into multiple pieces of different sizes, called bundles.

Code Splitting

These bundles form a directed graph of interconnected bundles, which optimizes browser cache invalidation: only the bundles that actually changed between two production deployments need to be re-fetched.

Another benefit of code splitting lies in page loading time enhancement (see how code splitting is awesome when combined with HTTP2 response multiplexing).

From there, you can easily define loading priorities depending on how critical each bundle’s content is, and drastically improve your frontend app’s vital metrics, such as Time To Interactive (TTI) and First Contentful Paint (FCP). Be creative, and adapt your loading-priority strategy to your application’s key features.
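
Webpack exposes this through its optimization.splitChunks option. Here is a rough sketch of what opting in could look like (the vendors cache group below is just one possible strategy, not a recommended setup):

module.exports = {
  // ...same entry, output, and plugins as before
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        // Keep rarely-changing third-party code in its own long-cached bundle
        vendors: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors'
        }
      }
    }
  }
};

Combined with [contenthash] in the output filenames (as in the config above), unchanged vendor code keeps the same hash across deployments, so browsers can keep serving it from cache.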

Lazy Loading

The upgraded version of code splitting is the ability to load bundles only when required: on-the-fly lazy loading.

Indeed, code splitting is great, but in many cases it will fetch (and pre-fetch) much more code than is actually needed.

When users interact with your application, they usually use a limited set of features as they’ll have a specific goal. They rarely click on every button or navigate to every page of an application during a single session.
This means that all the code that represents the unused features is loaded in the browser for nothing. Multiply this by the number of users of your platform and you’ll get a glimpse of the magnitude of wasted network traffic and CPU usage.

To fix that issue, lazy loading loads bundles on demand, when they are requested by a user action.

Lazy Loading triggered by user actions

In the above example, a route-based splitting strategy is used. Each route of the app is a bundle, and those routes are then subdivided into components.

At load time the main router is loaded, and it loads the default route1. When a user navigates to route2 by clicking a button, the bundle2.route2.js bundle is lazy loaded by the browser and immediately executed.

Once route2 is rendered (let’s say it contains a form), the last bundle (bundle3.table.js) is loaded only on form submission, and won’t be loaded at all if the user decides not to complete the form or cancels the action.

Code is loaded on-the-fly when needed.
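
In Webpack, this on-demand behavior is typically expressed with the dynamic import() syntax, which the bundler turns into a split point. A minimal sketch, assuming a hypothetical ./routes/route2 module and a navigation button:

// The import() call tells Webpack to emit a separate chunk for ./routes/route2
// and to fetch it only when the handler actually runs.
const button = document.querySelector('#go-to-route2');

button.addEventListener('click', async () => {
  const { renderRoute2 } = await import(
    /* webpackChunkName: "route2" */ './routes/route2'
  );
  renderRoute2();
});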

Tree Shaking

Another awesome feature delivered by bundlers is Tree Shaking.

Long story short, it prevents dead code from being included in a bundle. During the AST build-up step, the compiler only keeps the functions and variables that may actually be executed.
In other words, if a function is only reachable through an if statement that is always false, Tree Shaking will remove it from the final bundle code (of course, there are much more advanced use cases than if checks).

This keeps your application’s bundles highly optimized in size.
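
A simple illustration, with a hypothetical math.js module: if only one of its two exports is ever imported, the unused one never makes it into the bundle (provided the module is free of side effects).

// math.js
export function square(x) {
  return x * x;
}

export function cube(x) {
  return x * x * x; // never imported anywhere, so it gets shaken out
}

// index.js
import { square } from './math';
console.log(square(4)); // only square() ends up in the final bundle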

Development Server

Webpack also provides a development server out of the box. It performs every step of the bundler except the last one, which persists the generated JavaScript bundles to disk. Instead, it keeps the whole Abstract Syntax Tree in memory and serves the bundled source code as an API.

During development, this allows surgical re-compilation on any file change while keeping the rest of the app’s compilation cached. It enables a very short feedback loop, which is key to quality and creativity.
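
Enabling it mostly comes down to installing the webpack-dev-server package and adding a devServer section to the configuration; a minimal sketch, with arbitrary values:

module.exports = {
  // ...same entry, output, and plugins as before
  devServer: {
    static: './dist', // serve static assets from the output folder
    hot: true,        // enable Hot Module Replacement
    port: 8080        // arbitrary local port
  }
};

Running webpack serve then compiles in memory and refreshes only what changed, which is exactly the Hot Module Replacement behavior mentioned earlier.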

Module Federation

This feature deserves its own post because it is (in my humble opinion) revolutionary.
I’ll surely write something about it soon, but in the meantime take a look, it’s worth the ride! (Did you say MicroFrontend?)

Conclusion

Bundlers were born as a way to fix the module resolution and inclusion order but ended up providing much more than that.

Lately, for every new tool appearing in the ecosystem, an avalanche of benchmarks pops up the next day to evaluate how well it performs and whether it can compete with the top dogs (and, by the way, they often compare apples with oranges…).

Webpack does a lot for a single-threaded piece of software, and the overall build process would greatly benefit from running in a multi-threaded environment. I’m sure you’ve noticed the release of a fair number of tools built with Go or Rust (Turbopack, esbuild); these languages haven’t been picked at random.

Other tools propose clever optimizations through internal architecture upgrades, like ViteJS, which compiles the required modules on the fly.

Let’s take a look at the tooling landscape in the next and last post of this series! Part 6: Tooling Landscape

Thanks for reading!
👏🏻 Give me a clap and “follow” if you enjoyed this series.


Raphaël Tahar

Staff Engineer, Sociotechnical Architect, author and philosophy enthusiast. Proud dog father 🐶. Opinions are my own.