Bundle / code splitting revised

Kim Gysen
6 min read · May 16, 2019


Although these optimization patterns were popularized by bundling tools like Webpack, the idea is not library-specific. This article aims to clarify the patterns and their use cases; once those are clear, the implementation is simply a matter of syntax.

Default case: without splitting

By default, without any splitting, the entire JavaScript app is generated and downloaded as a single bundle. In general, the process looks something like this:

Default bundling process

Bundling usually starts from a defined entry: a single file from which the dependency tree is walked to determine which files need to be included in the bundle.
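To make the entry-based walk concrete, here is a minimal sketch, not how any particular bundler is implemented. It assumes a hypothetical in-memory `graph` in which each file already lists the files it imports (a real bundler obtains this by parsing `import`/`require` statements), and it collects every file reachable from the entry exactly once.

```typescript
// Hypothetical project layout: each file maps to the files it imports.
type ModuleGraph = Record<string, string[]>;

// Depth-first walk from the entry: every reachable file is included exactly once,
// and each file is listed after the files it depends on.
function collectBundleFiles(graph: ModuleGraph, entry: string): string[] {
  const visited: Record<string, boolean> = {};
  const ordered: string[] = [];

  const visit = (file: string): void => {
    if (visited[file]) return; // already included (this also stops import cycles)
    visited[file] = true;
    for (const dependency of graph[file] ?? []) {
      visit(dependency);
    }
    ordered.push(file);
  };

  visit(entry);
  return ordered;
}

const graph: ModuleGraph = {
  "src/index.js": ["src/app.js", "node_modules/react/index.js"],
  "src/app.js": ["src/utils.js", "node_modules/react/index.js"],
  "src/utils.js": [],
  "node_modules/react/index.js": [],
  "src/unused.js": [], // never imported from the entry, so it stays out of the bundle
};

const bundleFiles = collectBundleFiles(graph, "src/index.js");
// → ["src/utils.js", "node_modules/react/index.js", "src/app.js", "src/index.js"]
```

Note that source and vendor files alike end up in the same single output, which is what makes this approach simple and also what causes the limitations listed below.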

The benefit of this approach is that a single bundle requires minimal configuration and is usually easy to set up. For small apps with a limited number of JavaScript source and vendor files, this approach may be just fine.

For larger apps that require more source / vendor files, this approach has its limitations:

  • bundle.js will be cached by the browser (we’ll review how caching generally works in the next section). Even a slight change in the source code produces a new bundle that invalidates the cached version, so visitors of your web app have to download the entire bundle again.
  • When bundle.js contains a multi-page (and thus multi-route) client app, the initial page load downloads the code for many routes (rendered components) that the user is not currently visiting.

Caching revisited

Let’s quickly review the general flow of browser (HTTP) caching, as it helps explain how bundle splitting works.

On initial page visit, an HTTP request is sent to the server to download the bundle (bundle.js).

Request bundle.js

In the HTTP response, the server sends headers that instruct the browser how to cache the resource (e.g. a Cache-Control: max-age directive stating how long the response may be reused).
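As a rough illustration of how such a max-age value is used, the sketch below (hypothetical helper names) extracts max-age from a Cache-Control header and checks whether a stored response is still fresh. It is deliberately simplified: it ignores the Age header, directives such as no-cache, and the heuristic freshness rules browsers apply when no max-age is given.

```typescript
// Extract the max-age directive (in seconds) from a Cache-Control header value.
function maxAgeSeconds(cacheControl: string): number | undefined {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : undefined;
}

// A stored response is fresh while its age is below max-age.
// Simplification: without max-age, the copy is treated as stale (needs revalidation).
function isFresh(cacheControl: string, storedAtMs: number, nowMs: number): boolean {
  const maxAge = maxAgeSeconds(cacheControl);
  if (maxAge === undefined) return false;
  const ageSeconds = (nowMs - storedAtMs) / 1000;
  return ageSeconds < maxAge;
}
```

For example, a response stored with `Cache-Control: public, max-age=3600` can be reused without contacting the server for the next hour; after that, the browser has to check with the server again.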

The ETag that the server sends in the response headers acts as an identifier for this particular version of the bundle: it doesn’t change as long as the bundle content stays the same. If you change the source code, a new bundle containing the modification has to be generated, and that new bundle gets a new ETag.
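Servers are free to choose how they generate ETags (some combine file size and modification time, for instance). Below is a minimal sketch of one common approach, deriving a strong ETag from a hash of the content. It uses the small, non-cryptographic FNV-1a hash purely so the example has no dependencies; the helper names are hypothetical.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units, as 8 hex characters.
function fnv1a32Hex(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime (mod 2^32)
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// A strong ETag is a quoted opaque string; identical content yields an identical ETag.
function makeEtag(bundleContent: string): string {
  return `"${fnv1a32Hex(bundleContent)}"`;
}

const etagV1 = makeEtag("console.log('v1');");
const etagV1Again = makeEtag("console.log('v1');");
const etagV2 = makeEtag("console.log('v2');"); // changed content, so a different ETag
```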

On a subsequent page visit, if the cached copy has expired, the browser revalidates it: it sends the stored ETag back to the server in the If-None-Match request header (and, if the server provided a Last-Modified date, that date in an If-Modified-Since header):

Take the requested bundle from browser cache

Based on these request headers, the server decides how to respond. If the ETag still matches the current bundle, it replies with 304 Not Modified and no body, and the browser takes bundle.js from its cache. Otherwise, it replies with 200 and the new bundle (with a new ETag), which the browser downloads and caches.
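The revalidation exchange can be sketched as follows. This is a simplified, hypothetical model with no real HTTP server or browser API involved: the server compares the incoming If-None-Match value with the current bundle’s ETag, and the browser reuses its cached body on a 304. Real servers also handle lists of ETags, `*` and weak validators, which the sketch leaves out.

```typescript
interface Bundle {
  etag: string;
  body: string;
}

interface BundleResponse {
  status: 200 | 304;
  headers: { ETag: string };
  body?: string; // a 304 response carries no body
}

// Server side: answer a (possibly conditional) request for bundle.js.
function handleBundleRequest(current: Bundle, ifNoneMatch?: string): BundleResponse {
  if (ifNoneMatch === current.etag) {
    // The browser's copy is still current: tell it to reuse its cache.
    return { status: 304, headers: { ETag: current.etag } };
  }
  return { status: 200, headers: { ETag: current.etag }, body: current.body };
}

// Browser side: revalidate an expired cached copy and keep whichever version is current.
function revalidate(cached: Bundle, server: Bundle): Bundle {
  const response = handleBundleRequest(server, cached.etag);
  if (response.status === 304) {
    return cached; // 304 Not Modified: reuse the cached bundle
  }
  return { etag: response.headers.ETag, body: response.body ?? "" };
}
```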

Bundle splitting

Let’s see what all this has to do with bundle splitting.

Bundle splitting means splitting your app’s code into multiple smaller bundles, because some parts of your app change far less often than others and therefore benefit from being cached separately.

For example:

Source & vendor files are split into different bundles

In the example above, we assume that vendor libraries are relatively stable and rarely need to be re-downloaded after the initial page load. Source files, on the other hand, may change more frequently (new features, bug fixes), so their caching requirements are different. In the context of Webpack, these split bundles are referred to as chunks. Splitting adds an extra HTTP request, but this is rarely much of an issue: browsers fetch several resources in parallel, and HTTP/2 multiplexes many requests over a single connection.
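In Webpack 4 and 5, this kind of source/vendor split is typically configured through `optimization.splitChunks`. The fragment below is a sketch rather than a recommended setup: it assumes a single entry named `source` at a hypothetical `./src/index.js`, moves everything imported from `node_modules` into a chunk named `vendor`, and uses the `[chunkhash]` filename placeholder (discussed in the next section) to produce names like `bundle-source.<hash>.js` and `bundle-vendor.<hash>.js`. Loading a `.ts` config file also requires `ts-node` (or a similar loader) to be installed.

```typescript
// webpack.config.ts: a sketch of a source/vendor split with hashed filenames.
const config = {
  mode: "production",
  entry: { source: "./src/index.js" }, // hypothetical entry file
  output: {
    // [name] becomes "source" or "vendor"; [chunkhash] changes when the chunk changes.
    filename: "bundle-[name].[chunkhash].js",
  },
  optimization: {
    splitChunks: {
      cacheGroups: {
        vendor: {
          test: /[\\/]node_modules[\\/]/, // modules resolved from node_modules
          name: "vendor",
          chunks: "all",
        },
      },
    },
  },
};

export default config;
```

Newer Webpack versions also provide a `[contenthash]` placeholder, which is often preferred for this purpose.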

About chunkhashes

Build tools like Webpack can add a hash to each bundle’s filename. Referring to the previous example, the filenames would then look like this instead:

bundle-source.d587bbd6e38337f5accd.js

bundle-vendor.dc746a5db4ed650296e1.js

This hash, which Webpack calls a chunkhash (its length is configurable), is the result of applying a hashing function to that bundle’s (chunk’s) content. When the content of the bundle changes, the hash changes as well, and so does the bundle’s filename.
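The effect of content hashing on filenames can be sketched without Webpack. The helper below uses hypothetical names, and FNV-1a stands in for the bundler’s hash function (Webpack lets you choose that via `output.hashFunction`). It builds `bundle-<name>.<hash>.js` from a chunk’s content; its hashes are only 8 hex characters long, shorter than the ones in the example filenames above.

```typescript
// 32-bit FNV-1a, 8 hex characters (a stand-in for the bundler's real hash function).
function fnv1a32Hex(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Hypothetical helper: derive a cache-friendly filename from a chunk's content.
function hashedFilename(chunkName: string, chunkContent: string): string {
  return `bundle-${chunkName}.${fnv1a32Hex(chunkContent)}.js`;
}

// Two builds: the vendor code is untouched, the source code gets a bug fix.
const vendorCode = "/* vendor libraries */ export const lib = 1;";
const build1 = {
  source: hashedFilename("source", "export const total = 1 + 1;"),
  vendor: hashedFilename("vendor", vendorCode),
};
const build2 = {
  source: hashedFilename("source", "export const total = 1 + 2;"),
  vendor: hashedFilename("vendor", vendorCode),
};
// build1.vendor === build2.vendor, while build1.source !== build2.source
```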

Page render 2 has an updated bundle-source.<hash>.js file

When these bundles are retrieved on initial page render (first image above), they are stored in the browser cache.

When a user visits the web page for the second time, the browser looks for these bundles in its cache. If a bundle’s filename hasn’t changed (like the vendor bundle in the second image above) and the cached copy hasn’t expired according to its Cache-Control max-age, the browser uses the cached file without a round trip to the server. If the copy has expired, the browser revalidates it and, on a 304 response, still reuses the cached file. In the second image, however, the bundle containing the source code has changed: the page now references a new filename that the browser has never cached, so it has to download that file from the server.

In short, the source bundle is downloaded again on the second visit because its contents have changed, whereas the vendor bundle is retrieved from the cache because its contents remained unchanged.

Note that bundle splitting does not perform any optimization on the initial page load, as all bundles will still need to be downloaded. The actual benefit lies with reduced resource downloads on subsequent page visits.
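The trade-off can be put into numbers with a small, hypothetical calculation. It assumes a 900 KB vendor bundle and a 100 KB source bundle (made-up sizes and hashes), a cache that still holds fresh copies of everything from the first visit, and a deploy that changes only the source code.

```typescript
interface BuiltFile {
  filename: string;
  sizeKb: number;
}

// KB the browser must download: every file whose filename is not already cached.
function kbToDownload(cachedFilenames: string[], files: BuiltFile[]): number {
  return files
    .filter((file) => cachedFilenames.indexOf(file.filename) === -1)
    .reduce((total, file) => total + file.sizeKb, 0);
}

// Single bundle: any source change produces a new bundle (hypothetical hashes).
const singleV1: BuiltFile[] = [{ filename: "bundle.aaaa.js", sizeKb: 1000 }];
const singleV2: BuiltFile[] = [{ filename: "bundle.bbbb.js", sizeKb: 1000 }];

// Split bundles: only the source chunk's filename changes.
const splitV1: BuiltFile[] = [
  { filename: "bundle-source.1111.js", sizeKb: 100 },
  { filename: "bundle-vendor.2222.js", sizeKb: 900 },
];
const splitV2: BuiltFile[] = [
  { filename: "bundle-source.3333.js", sizeKb: 100 },
  { filename: "bundle-vendor.2222.js", sizeKb: 900 },
];

const names = (files: BuiltFile[]): string[] => files.map((file) => file.filename);

const firstVisitSingle = kbToDownload([], singleV1); // 1000
const secondVisitSingle = kbToDownload(names(singleV1), singleV2); // 1000
const firstVisitSplit = kbToDownload([], splitV1); // 1000
const secondVisitSplit = kbToDownload(names(splitV1), splitV2); // 100
```

Both approaches cost the same on the first visit; the saving only appears on later visits, when the unchanged vendor bundle is served from the cache.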

With code splitting

Code splitting means that chunks of JavaScript code are fetched asynchronously over the network on the fly (on demand, at runtime), so they don’t need to be included in the initial bundle that is downloaded on page load.

Lazy-load ‘some-module’ from the server

You want to use this when you have scripts that you may require at some point, but that are not immediately needed when the page is loaded. This could concern vendor libraries, but also custom components.

An example is a component that your web app needs, but only rarely: say, an informational popup. Your app’s initial bundle (or bundles, in the case of bundle splitting) doesn’t need to include the popup component, which reduces its size; instead, the component is fetched asynchronously from the server when the user clicks a button to show the popup.

When you use a bundler like Webpack, it recognizes the dynamic import() syntax and creates a separate bundle for the lazy-loaded resource (with a chunkhash in its filename, if you configure one). Webpack emits that bundle to the output directory defined in the Webpack configuration file, which the web server then makes available to clients.
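The on-demand behaviour can be modelled in a few lines. This is a hypothetical, self-contained simulation rather than Webpack’s runtime: `fetchPopupChunk` stands in for the network request that fetches the popup’s chunk, and `loadPopup` caches the pending promise so that repeated clicks trigger at most one download. In a real Webpack project you would instead write `import("./popup")` (assuming a hypothetical `./popup` module), and Webpack’s runtime would take care of fetching and tracking the chunk.

```typescript
// Shape of the lazily loaded module (hypothetical).
interface PopupModule {
  showPopup: () => string;
}

// Counts simulated network requests so we can see when the chunk is fetched.
let popupChunkRequests = 0;

// Stand-in for downloading the popup chunk from the server.
function fetchPopupChunk(): Promise<PopupModule> {
  popupChunkRequests++;
  return Promise.resolve({ showPopup: () => "Showing informational popup" });
}

// Nothing is fetched at page load; the first call starts the download and
// later calls reuse the same promise. (A failed request would also stay cached
// here; a production loader would clear it to allow a retry.)
let popupChunk: Promise<PopupModule> | undefined;
function loadPopup(): Promise<PopupModule> {
  if (popupChunk === undefined) {
    popupChunk = fetchPopupChunk();
  }
  return popupChunk;
}

// What a button's click handler might call.
async function onShowPopupClick(): Promise<string> {
  const popup = await loadPopup();
  return popup.showPopup();
}
```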

Unlike pure bundle splitting without lazy loading, code splitting (as the name suggests) does reduce the initial bundle size.

Basically, though, code splitting builds on bundle splitting (your code has to be split into separate bundles if you want to download some of them dynamically), so you can use the same caching policy and chunkhashes to force downloading new versions when the code has changed.

Conclusion

If you’re starting a new project, this is definitely something to look into if you care about optimization. Whether or not you need it depends on different factors, and so does the strategy that you would choose to implement.

I hope this article has helped your conceptual understanding of the topic, and that it helps you find the strategy that best suits your needs.
