Vendor and code splitting in webpack 2

Webpack is an ambitious, powerful tool for bundling modern web applications. Unfortunately, its complexity can make it daunting to learn. While the team has made incredible strides improving the docs, there are still a few places which remain counter-intuitive. In this post I’ll introduce and discuss those features which I found most difficult to learn, namely code splitting, and managing bundle sizes and contents. Then I’ll finish with a smattering of webpack tips, and solutions to some of the things which tripped me up when first getting started.

This post will only superficially cover things like minification, transpilation, the webpack development server, etc. These areas have been covered in depth elsewhere, in particular the fantastic SurviveJS book.

Bundling and code splitting

Let’s start by getting webpack to bundle an application, and see how we can tweak and optimize its default code splitting mechanism.

Preliminary setup

As with my prior posts, this will use my Booklist project for the code. See the repo’s readme for a fuller description, but essentially it’s a fully-functioning book-tracking web application I use to try out new tools.

The site listens for changes to the URL hash, and loads the part of the application the user is on; for example, the user might click from the books-list section, to the section where she can enter new books, etc. As the user browses to these different sections of the application, the required code is loaded via System.import. It looks like this
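The original snippet is not reproduced here. As a rough sketch of the idea only, hash-driven loading might look something like the following. The section names, the module paths, and the fallback section are all hypothetical and do not come from the Booklist repo.

```javascript
// Sketch only: section names, module paths, and the fallback are hypothetical.

// Pure helper: '#books/edit' -> 'books', '' -> 'home'
function sectionFromHash(hash) {
  const section = hash.replace(/^#/, '').split('/')[0];
  return section || 'home';
}

// Each System.import call with a literal path becomes a webpack split point.
function loadSection(section) {
  switch (section) {
    case 'books':
      return System.import('./modules/books/books');
    case 'scan':
      return System.import('./modules/scan/scan');
    default:
      return System.import('./modules/home/home');
  }
}

// Only wire up the listener when running in a browser.
if (typeof window !== 'undefined') {
  window.addEventListener('hashchange', () => {
    loadSection(sectionFromHash(window.location.hash)).then(module => {
      // What to do with the loaded module is up to the application.
      console.log('loaded section module', module);
    });
  });
}
```

Keeping the paths as string literals matters: webpack finds split points by statically scanning for these calls.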

Note that System.import has since been deprecated in favor of the import() syntax, but since much of the tooling hasn’t caught up, and since webpack still supports either, I’m sticking with the deprecated form for now.

When webpack builds the application, it’ll search out all calls to System.import, and create split points for each; each created split point will only be loaded when the corresponding System.import is executed in code. In addition to these split points, webpack will of course also start at the application’s entry point, and walk all its dependencies, creating a single, bundled entry point. This will be the primary code for your application, which will need to be loaded with a regular script tag in the main htm page.

Before I start with the webpack code, note that I won’t go over every path to every npm utility I use. Check out the package.json file in the react-redux directory to see everything that’s needed.

Basic build

What would the simplest possible code look like that would bundle this application to something that could be executed?

Let’s break that down:

  • entry — this is the entry point of the application. It’s where the application starts. It’s what pulls everything else in, and gets the application doing things. For me, that’s reactStartup.js.
  • output — this tells webpack where to create the resulting bundle. Again, you’ll need to create a script tag in your site to load this script.
  • resolve — this is some basic housekeeping. My simple-react-bootstrap has a main field in its package.json with un-transpiled files with .es6 extensions. I should probably fix this, since it confuses some tooling, but for now I’ll alias that away. And I’m also aliasing an old — not-on-npm — JavaScript color picker called jscolor.
  • resolve.modules — this tells webpack where to go searching to resolve import statements it finds while parsing the application. node_modules is of course for npm utilities, but more interesting is that I needed to add ./. This allows webpack to find modules with an absolute path. For example, rather than link to ../../applicationRoot/components/button I just link to applicationRoot/components/button. Unfortunately I was unable to find a single configuration item that would simply tell webpack “this is the base for all absolute paths.”
  • module.loaders — this sets up my Babel transpilation.

What does the output look like?

We can see bundle.js at the very bottom, and the files 0–5.bundle.js representing the 6 split points I added before. Ignoring for the moment that the main bundle is huge (remember there’s no minification), it’s not really clear what’s in our bundles.

Improving the output

Let’s add the webpack-bundle-analyzer, like this

(note the new require at the top, and the new plugins section at the bottom)
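The original listing is missing here, so the following is only a sketch of the two additions, assuming the webpack-bundle-analyzer package is installed and used with its default options. The rest of the config is elided.

```javascript
// New require at the top of webpack.config.js
const BundleAnalyzerPlugin = require('webpack-bundle-analyzer').BundleAnalyzerPlugin;

module.exports = {
  // ...entry, output, resolve, and module as before

  // New plugins section at the bottom
  plugins: [
    // With default options the plugin serves the report and opens it in a browser tab
    new BundleAnalyzerPlugin()
  ]
};
```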

Now, just like that, after we run webpack, a new browser tab should open showing this

This is a fully interactive data visualization of our build. It shows the bundles in order of size, with our massive main bundle up top, followed by our async chunks 1, 0, and 2; chunks 3–5 are too small to see. The visualization lets you zoom in as close as you want to see every individual module in any bundle, and displays info for each, such as size, gzipped size, path, etc.

Breaking up our main bundle

As a crude, simple, easy win for making our main bundle smaller, I’ll pull out all npm modules with the CommonsChunkPlugin — see the plugins array at the bottom, in particular the minChunks function, in which I filter in everything that comes from node_modules.
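As a sketch of what such a minChunks filter could look like, the predicate and plugin options are written here as standalone values so they can be read and tested without webpack installed. The filename is an assumption; the plugin call in the trailing comment shows where the options would go.

```javascript
// minChunks is called with each candidate module and the number of chunks using it.
function isFromNodeModules(module) {
  const context = module.context; // directory containing the module, when known
  return typeof context === 'string' && context.indexOf('node_modules') >= 0;
}

const nodeStaticOptions = {
  name: 'node-static',        // name of the new static chunk
  filename: 'node-static.js', // assumed filename
  minChunks: isFromNodeModules
};

// In webpack.config.js:
//   const webpack = require('webpack');
//   plugins: [new webpack.optimize.CommonsChunkPlugin(nodeStaticOptions)]
```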

Which turns the build into this

So now the node-static bundle is the largest, by far. Admittedly this is because we’re running an un-minified version of React. Things would look much better with minification, but since this isn’t preventing us from seeing how webpack works, I’ll just press on.

Why are there still npm items in the code split bundles?

Notice that async chunks, like 0.bundle.js, still have things from node_modules hanging around. Why didn’t our prior bundle pick them up?

CommonsChunk will pick up modules either from the initial bundle, or from the code split bundles, but never both. This makes perfect sense when you think about it: 0.bundle.js uses things like react-dnd. There’s absolutely no reason to have this library loaded initially, before that code split bundle is even loaded; in fact, the user may never choose to load this part of the application, so preloading it could turn out to be a complete waste.

So how DO we use CommonsChunk with code split modules?

The CommonsChunk config has an async property. We’ll use that property to provide the name, which will cause webpack to only search through our code’s async split modules.

The code below creates an async commons chunk with react-dnd, and its helpers.
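The original listing did not survive, so here is a sketch of one way to write it. The package list is an assumption about which npm utilities count as react-dnd "helpers", and the path check is my own cross-platform variant rather than the author's code.

```javascript
// Assumed list of packages that should land in the react-dnd async chunk.
const reactDndPackages = ['react-dnd', 'react-dnd-html5-backend', 'dnd-core'];

// True when the module lives directly inside one of the packages above.
function isReactDndModule(module) {
  const context = module.context;
  if (typeof context !== 'string') return false;
  // Split on both separators so Windows and POSIX paths behave the same.
  const segments = context.split(/[\\/]/);
  const i = segments.lastIndexOf('node_modules');
  return i >= 0 && reactDndPackages.indexOf(segments[i + 1]) >= 0;
}

const reactDndOptions = {
  async: 'react-dnd', // a string here names the async commons chunk
  minChunks: isReactDndModule
};

// In webpack.config.js:
//   plugins: [new webpack.optimize.CommonsChunkPlugin(reactDndOptions)]
```

Because the check looks at the segment after the last node_modules, nested dependencies of these packages are left out unless they are added to the list.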

Which produces

Excellent. The react-dnd chunk was created with what we wanted. Notice how I chose to build the react-dnd chunk based on specific npm utilities. Lastly, I’ll add a catch-all async chunk, which bundles everything that’s used in two different code-split bundles by simply checking the count argument passed to minChunks.
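A sketch of that catch-all, again as standalone options. The chunk name used-twice matches the bundle name in the build output; the threshold of "two or more" is my reading of "used in two different code-split bundles".

```javascript
// count is the number of chunks that contain the module.
function usedInAtLeastTwoChunks(module, count) {
  return count >= 2;
}

const usedTwiceOptions = {
  async: 'used-twice',
  minChunks: usedInAtLeastTwoChunks
};

// In webpack.config.js:
//   plugins: [new webpack.optimize.CommonsChunkPlugin(usedTwiceOptions)]
```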

Which produces

Which works as expected. Our react-dnd bundle has what we asked for, and the used-twice bundle has everything that would otherwise sit in two separate code split bundles.

What’s especially interesting here is that these same results are obtained no matter the order the async CommonsChunk plugins are listed. Specifically, if a module is used in two locations, and one CommonsChunk instance grabs it via context-path, other CommonsChunks will be smart enough to, conceptually at least, treat that module as having count 1, no matter the order these CommonsChunks are listed. What’s even more interesting is that if you manually create two async CommonsChunks containing the same module, which you pull in by path, webpack will automatically de-dupe it, and leave that module in the last CommonsChunk that asked for it.

This means, as far as I can tell, that webpack runs the list of these CommonsChunks in order, resets the count to 1 on each module it adds, and yanks qualifying modules from any prior CommonsChunk that currently has it.

Refining some things

Rather than providing a manual filename for each CommonsChunk, webpack allows you to just specify a general pattern, which is then applied by name, automatically. It’ll look something like this.

So each static bundle will have a file name of [name]-bundle.js where name is the name we provide in CommonsChunk. Async splits will be given the name [name]-chunk.js. Unfortunately, for these async chunks the name will be auto-generated numbers, so you’ll have 0-chunk.js, etc. There are currently open cases to improve on this.

Also note the publicPath property. This tells webpack what path to look for async chunks in, at runtime. For example, the 0-chunk.js chunk will be requested from react-redux/dist/0-chunk.js.

Splitting out the webpack runtime

There’s a neat trick listed in the webpack docs here, whereby we can extract the webpack runtime, which contains references to all bundles and chunks anywhere in the build, into a separate bundle. We might want to do this because the runtime changes frequently, whenever anything in the app changes, and if our React code resides in the same file, we’d unnecessarily invalidate that file’s cache.

It looks like this

new webpack.optimize.CommonsChunkPlugin({ name: 'manifest' }),

Be sure to actually load your static build files

Be sure to add script tags for each of these static build files, with the manifest file being listed first (the webpack runtime needs to load before anything else). Just add regular script tags to whatever htm file is the root of your SPA.

Where to, from here?

As it is, my static, shared node_modules bundle has every npm utility the main, initial bundle of my application needs. My code split bundles are all separate already, with anything that’s used in more than one place automatically pulled into a shared chunk. And I have react-dnd, and its dependencies pulled into its own async chunk.

As this application grows, I’ll periodically just run the build and analyze the bundle visualization. As my static node_modules bundle grows (RxJS, D3, react-router, who knows) I’ll likely break it apart further using the same approach as above. The same goes for the async chunks, particularly the catch-all used-twice chunk. I imagine more and more things will wind up in there until it, too, needs to be split up further. Webpack and the nice visualization plugin make this simple.

Odds and ends

Here’s a smattering of tips I had to sort through, which may be helpful if you’re just starting out with webpack.

babelHelpers is undefined

I eventually switched Babel from the es2015 preset to es2015-rollup to make the most of my ES6 modules. This caused the external-helpers plugin to be applied automatically, which presumes that a babelHelpers global variable exists, with various helpers like classCallCheck, etc. Follow the instructions here to create this file, and be sure to load it from a script tag (first) as well.

webpack-dev-server

Be sure to install it globally. Starting it from a local installation didn’t seem to work, no matter what I tried.

Be sure to set up proxies for it in your webpack config. The dev server will serve your (webpack-created) js assets from port 8080 by default, but you’ll still want all your ajax requests, requests for css files, etc., to be processed as normal. If your “real” application is running on, say, port 3000, just proxy the relevant requests like this.
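A sketch of such a proxy section, assuming the "real" application runs on port 3000 as in the example above. The proxied paths are hypothetical; use whatever prefixes your ajax and css requests actually share.

```javascript
// Fragment of webpack.config.js; the proxied paths are hypothetical.
const devServer = {
  proxy: {
    // Requests whose path starts with the key are forwarded to the target.
    '/api': 'http://localhost:3000',
    '/static': 'http://localhost:3000'
  }
};

module.exports = { /* ...rest of the config */ devServer };
```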

Bundling for production

Webpack ships with a nice -p flag that will flip the process.env.NODE_ENV flag to 'production' (libraries like React use this to produce a production build), as well as minify for you.

Note that -p only defines process.env.NODE_ENV inside the bundled code; it doesn’t set the actual environment variable, so process.env.NODE_ENV was always undefined in my webpack.config file, and I needed to give it a nudge. My Windows-based production npm build script looks like this

"react-redux-build": "cd react-redux && rm -rf dist && set NODE_ENV='production' && webpack -p"

Why did I want to read process.env.NODE_ENV in my webpack config file? For now it was just to prevent the bundle visualization from running. How did I do that? …

Remember, the webpack.config file is “just JavaScript”

If you’re wondering how to do something in the webpack.config file, remember, it’s just JavaScript. If you want that BundleAnalyzerPlugin to run only when not doing a production build, there’s nothing stopping you from just throwing a ternary operator in the array of plugins, and then filtering out the possible null at the end.
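Here is a small sketch of the ternary-then-filter idea. To keep it runnable without the analyzer package, the plugin is created through a hypothetical makeAnalyzerPlugin factory; in a real config that would simply be () => new BundleAnalyzerPlugin().

```javascript
// Sketch: makeAnalyzerPlugin is a hypothetical stand-in for
// () => new BundleAnalyzerPlugin().
function buildPlugins(isProduction, makeAnalyzerPlugin) {
  return [
    // ...plugins that always run would be listed here
    isProduction ? null : makeAnalyzerPlugin()
  ].filter(plugin => plugin !== null); // drop the null left by the ternary
}

// In webpack.config.js this might be used as:
//   plugins: buildPlugins(process.env.NODE_ENV === 'production',
//                         () => new BundleAnalyzerPlugin())
```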

I think the define plugin may automate this, but for a simple one-off, don’t be afraid to just use some simple JavaScript. Besides, in the end, my check wound up being more complex than this: I don’t want the visualization if NODE_ENV is 'production', or if there’s a -p command line argument, or if I’m running from webpack-dev-server. I don’t know if there are native webpack configuration tools that can handle all this, but even if so, for me, a few lines of JavaScript are simpler and cleaner.

This advice also applies to the rest of the config file. If you have repetitive bits of configuration — webpack-dev-server proxy entries, CommonsChunk plugins, etc — there’s nothing stopping you from creating helper functions which generate these bits of configuration, and calling them inline, just as you’d refactor any other bit of repetitive code. It’s just JavaScript.
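For instance, repetitive dev-server proxy entries could be generated by a tiny helper. The helper name, the paths, and the port below are hypothetical.

```javascript
// Hypothetical helper: builds a devServer.proxy object that sends
// every listed path to the same target.
function proxyAll(paths, target) {
  return paths.reduce((proxy, p) => {
    proxy[p] = target;
    return proxy;
  }, {});
}

const devServer = {
  proxy: proxyAll(['/api', '/static', '/login'], 'http://localhost:3000')
};
```

The same approach works for a list of CommonsChunk plugins: map over an array of names and return one plugin instance per entry.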

Final Webpack Config

Here’s my final webpack config. It’ll probably change some more — check out webpack.config.js in the react-redux directory of the github repo to see the absolute latest.

Building on the above, I’ve already made a few changes: I added stage 1 and above transpilation; I added a separate Babel entry just for my simple-react-bootstrap, so I could integrate the raw ES6 code from the project rather than the bloated transpiled version (which included all the Babel helpers discussed above); and I refined the static node build to just have React in it.

Conclusions

I’ll end by briefly describing what this application’s build was like prior to webpack. I was using SystemJS to load my scripts on demand, and I was using its related bundler for production deployment. While ostensibly more convenient, SystemJS as a loader meant that I could only use npm utilities by manually setting up paths to UMD builds, which were often far, far larger than needed. As to the manual building, it was an immense pain for even an application as small and simple as my booklist. Here’s what it looked like just prior to switching to webpack

That build was my best attempt at creating bundles for all split points, while keeping shared utilities in their own, on-demand bundles. How did all this manual work compare to webpack?

The 30–40K figure admittedly includes the 18K for systemjs itself (which is no longer needed).

Still though, without webpack I was pushing down 18K for a script loader just for the privilege of pushing down an extra 12–22K of code to my users.

What was that 12–22K? Part of it was the more efficient builds that webpack creates. Because of how SystemJS works, it would create bundles that looked like this

System.registerDynamic('full/path.js',['full/path/A','full/path/B']

while webpack would create something more like

webpack=webpackJsonp([4,10],

With lots of small modules, that can add up. Also, my hand-crafted attempt to pull out shared code into on-demand utilities was terribly naive, and resulted in far too much code being prematurely sent down.

However frustrating webpack can be, I promise you the alternatives are worse.

Did I miss anything?

If you know of a better way to do any of this, please either leave a comment, or send me a tweet.

If you have questions about how something works, or why your code won’t work, you’ll be far better off asking on Stack Overflow. I’m still a beginner at webpack, so I likely won’t even know the answer, to say nothing of how much faster you’ll get a response there.

Happy coding!