By: Regis Wilson
Webpack can be used to bundle assets for websites so that browsers can load resources at runtime to display data. There are many things that Webpack can do for you, including the following:
- Loaders can transform, compile, transpile, minify, uglify, and compress static assets.
- Plugins can do almost anything beyond loaders, such as optimize bundles, manage assets, and inject environment variables.
This GitHub page includes a subset of the most popular and “awesome” loaders and plugins available for Webpack, to give you a flavor of what’s available.
For anyone who isn’t a Webpack expert, the difficulty is understanding what it does, what it is capable of, and how to apply it to your own project, and then finding the magical incantations required to make it work the way you want.
To make an analogy, let’s say that we really like to drink boba milk tea. If you say to your friends, “Let’s go drink some boba,” they’ll understand what you mean. But when you get to the restaurant, you might say something like, “I’d like a large boba without boba.” This confusion stems from the fact that an actual order of, say, “large black milk tea, no ice, 30% sugar, and without toppings” gets shortened to simply “boba.” Expanding the short order “boba” to “boba milk tea” now makes sense when adding the exclusion “without boba.” The same thing happens with computer configuration files.
At TrueCar, we use Webpack with our React front-end to manage assets for our web pages. We have been struggling with several issues related to our bundle and asset management, including the following:
- Every build generated unique asset names, which broke the cache in production, drove up our data transfer costs, and increased download times for end users.
- Static assets that never change, such as logos, images, and fonts, were being regenerated each time, breaking the cache.
- Bundle sizes were quite large: there was seemingly a lot of duplication of code in the bundles, there didn’t seem to be any rhyme or reason to the bundle splits, and there wasn’t any reproducibility of splits between builds for comparison.
- Any changes we made to upgrade to the latest versions and features of Webpack ran into serious errors and crashes that we couldn’t diagnose.
- Build times were slow.
Step 1: Use [contenthash] instead of [hash]
The first step we took to fix our issues was to upgrade Webpack to the latest version. At the time of this writing, we have upgraded to Webpack 4.41.2. We’ve found it is important to stay up to date with the latest version to get all the good features we want. But we also found that upgrading Webpack itself isn’t enough. You also need to keep all of your loaders and plugins up to date. This seems simple, but we found it enough of a challenge that we were stumped for months trying to use the latest features available.
There’s a very subtle catch to putting a hash in asset filenames, however: the [hash] variable includes a lot of inputs that make it unique even when the contents of the file have not changed. What this meant in practice for us was that each build of our CI/CD pipeline created completely unique filenames for every file bundle, regardless of whether the contents had changed. Given that we have hundreds of file bundles and assets, multiplied by the number of software engineers pushing changes every work hour of every workday, we had thousands and thousands of assets being written to S3 and downloaded by all the visitors to our site every single time we pushed out a deployment.
This is like saying, “I’d like a boba” and having to answer questions like, “What size?”, “With or without milk?”, “How much sugar?”, and “Black or green tea?” There’s a lot of hidden complexity in the defaults, and sometimes it’s better to be explicit.
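To make the difference concrete, here is a small illustration in plain Node.js. It is not Webpack’s actual hashing code: both helper functions and the `buildId` value are hypothetical stand-ins for “the bytes of the file” versus “the bytes of the file plus things that change on every build.”

```javascript
const crypto = require("crypto");

// Hypothetical helper: digest of the file contents only,
// which is the idea behind [contenthash].
function contentOnlyHash(contents) {
  return crypto.createHash("sha256").update(contents).digest("hex").slice(0, 8);
}

// Hypothetical helper: digest that also mixes in a per-build value,
// standing in for the extra inputs that make [hash] change between builds.
function buildWideHash(contents, buildId) {
  return crypto
    .createHash("sha256")
    .update(buildId)
    .update(contents)
    .digest("hex")
    .slice(0, 8);
}

const logo = "<svg>unchanged logo</svg>";

const stableA = contentOnlyHash(logo);
const stableB = contentOnlyHash(logo);
const churnA = buildWideHash(logo, "build-1001");
const churnB = buildWideHash(logo, "build-1002");

console.log(stableA === stableB); // same contents, same name
console.log(churnA === churnB); // same contents, but the names differ
```

An unchanged logo keeps its name under the first scheme and gets a new one on every build under the second, which is the cache-busting behavior described above.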
After digging through a lot of outdated, cryptic, and sometimes misleading blog posts, we found an excellent document on enhancing cacheability with Webpack. These steps were clear, simple, and straightforward, and we eagerly applied them.
And then, we immediately ran into problems:
We were very confused, because we were using the latest version of Webpack and even the latest versions of a few plugins. We tried every trick we could think of to make NPM update the packages related to Webpack. We scoured the issues related to the error message and couldn’t find any link between the error and our use case. Eventually, by reading between the lines and piecing together the bigger picture, we worked out what the problem was. Through trial and error, we discovered that it came from the older version of file-loader we were using: nothing flagged it as an outdated dependency, but older versions of the loader were indeed incompatible with the [contenthash] directive.
The way we considered this problem was similar to ordering boba for two people and having to answer questions for one order versus the other, even though they are not directly related. An example is if I order “Without ice, 30% sugar” and then have to answer “You want ice?” and “How much sugar?” for the next order. Then, to pour salt into the wound, the first drink might come with ice (instead of no ice) and the second order might come with 30% sugar (instead of full sugar), only because the order was input together in one batch, allowing for contamination of specification labels for parameters in the bundled order.
It genuinely took us a few months of investigation, trial and error, talking to people, scouring the internet, and ignoring the problem before we found the ridiculously, blindingly obvious solution:
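Going by the diagnosis above, the fix comes down to upgrading file-loader itself to a release that understands [contenthash] (for example, with `npm install --save-dev file-loader@latest`). As a hedged sketch, the corresponding rule might look like the following; the file extensions and the name template are illustrative assumptions, not our exact settings.

```javascript
// Sketch of a file-loader rule for static assets, assuming a file-loader
// release recent enough to understand [contenthash].
// The file extensions and name template are illustrative.
const assetConfig = {
  module: {
    rules: [
      {
        test: /\.(png|jpe?g|gif|svg|woff2?|ttf|eot)$/,
        use: [
          {
            loader: "file-loader",
            options: {
              // Logos, images, and fonts keep the same name until their bytes change.
              name: "[name].[contenthash].[ext]",
            },
          },
        ],
      },
    ],
  },
};

module.exports = assetConfig;
```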
Results for Step 1
With the first step of using only the contents of each bundle in the unique identifier, we brought the cache busting caused by our CI/CD pipeline down to reasonable levels. We carefully watched the traffic outflows from the S3 bucket and CloudFront distribution that served the assets for any changes. There was a dramatic difference once the changes were deployed, both in bytes served from cache misses (which dropped by over 90%) and in the cache hit ratio (which went up from 98.5% to 99.9%).
Improving the cache hit ratio is important for several reasons:
- Browsers get faster time-to-first-byte by hitting the CloudFront cache closest to them wherever they are in the United States.
- End users do not have to download all new bundles and assets every time we make a one-line change to our site.
- We save money on transfer charges and avoid delays fetching from the S3 buckets where the images live.
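A move from 98.5% to 99.9% can look small, so it helps to restate it in terms of misses. The short calculation below uses only the two hit ratios quoted above. It counts requests rather than bytes, so it is a rough cross-check against the "over 90%" drop in bytes served from misses, not the same measurement.

```javascript
// Cache hit ratios quoted above, in percent.
const hitBefore = 98.5;
const hitAfter = 99.9;

// The miss ratio is whatever is left over.
const missBefore = 100 - hitBefore; // about 1.5% of requests
const missAfter = 100 - hitAfter; // about 0.1% of requests

// Relative reduction in misses, as a percentage.
const missReduction = (1 - missAfter / missBefore) * 100;

console.log(missReduction.toFixed(1) + "% fewer cache misses"); // 93.3% fewer cache misses
```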
Step 2: Make Bundles Smaller and Reduce Churn
Getting the first step out of the way was a lot of progress over our previous config, but we still had some outstanding concerns about our bundling process. By analyzing the bundle sizes with the remarkable Node.js bundle analyzer tool, we spotted a lot of opportunities to make individual bundles smaller by using chunk deduplication. You can see from the CI/CD tool that reports on bundle sizes that we had few bundles, but they were very large:
The reason was obvious when we inspected the contents of the bundles: the vendor packages were being bundled together with our actual code. This means that we still had an opportunity to cut down on churn in our bundles by separating out vendor packages (which rarely change) from our own code packages (which do change often). We tweaked the configuration somewhat from the documentation to arrive at a solution that we believe strikes a good balance between more bundles (generally, bad) and smaller sized bundles (generally, good):
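As a rough sketch of the vendor split described above (not our exact configuration), the following assumes the usual `node_modules` test from the Webpack documentation. The `vendors` group name and the priority value are illustrative.

```javascript
// Sketch: split third-party code (anything under node_modules) into its
// own chunks so that it is cached separately from application code.
const vendorSplitConfig = {
  optimization: {
    splitChunks: {
      chunks: "all",
      cacheGroups: {
        vendors: {
          // Matches both POSIX and Windows path separators.
          test: /[\\/]node_modules[\\/]/,
          name: "vendors",
          priority: -10,
        },
      },
    },
  },
};

module.exports = vendorSplitConfig;
```

Because vendor packages rarely change, their content hashes rarely change either, so the chunks in this group tend to stay cached across deployments.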
The few differences from the documentation that we implemented, and the effects they had, are discussed below:
- minSize: We found that this number doesn’t really affect anything for us, but it does give a hint to Webpack that we prefer not to make too many chunks that are too small (it will split them into chunks smaller than this number if it needs to anyway). Setting it too small (like 8KB, well below Webpack 4’s documented default of 30000 bytes) fragmented the chunks too much, but mostly we found that this setting didn’t actually do much for us.
- maxSize: Similarly, this metric seems to be largely ignored after a certain size by Webpack (at least in our testing), but it does give a hint that we don’t want chunks to be too large. With splitting, most of the chunks were not large anyway in our testing. Making the number too large seems to do nothing, but making it too small seems to increase chunk count significantly more than we expected. This is similar to ordering boba and specifying “no ice” and then getting penalized with a smaller serving to make up the difference where the ice would be.
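- reuseExistingChunk: This setting is on during production mode builds. The Webpack documentation describes it as reusing an existing chunk that already contains the modules in question instead of generating a new one, but we could not observe any difference between having it on and off.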
- minChunks: This option has a dramatic effect if you touch it, so be careful, and test when you do. We tested with the default value of 1 and found that there were way too many chunks being split out. Having too many small chunks is just as bad as having chunks that are too big individually, and maybe even worse. Setting the value to a higher number than 2, however, seems to make Webpack go in the original, larger-chunk direction again. So we set it to 2 and told everyone not to touch this value again. Please test it for your own use case.
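For orientation, this is roughly where the four settings above live in `optimization.splitChunks`. Only the `minChunks` value of 2 comes from our testing; the `minSize` and `maxSize` numbers are placeholders, and placing `reuseExistingChunk` in the `default` cache group is an assumption that follows the layout in the Webpack documentation.

```javascript
// Sketch of the tuning knobs discussed above.
const tuningConfig = {
  optimization: {
    splitChunks: {
      chunks: "all",
      minSize: 30000, // placeholder: hint for the smallest chunk, in bytes
      maxSize: 250000, // placeholder: hint for the largest chunk, in bytes
      minChunks: 2, // the value we settled on; test before changing it
      cacheGroups: {
        default: {
          reuseExistingChunk: true,
          priority: -20,
        },
      },
    },
  },
};

module.exports = tuningConfig;
```

A reasonable way to evaluate a change to any of these values is to rebuild and compare chunk counts and sizes against the previous build before committing to it.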
Results for Step 2
The first great signs of an improvement came when we rebuilt our branch with these changes several times. We found that each build was identical, and all chunk names and contents were the same each time. This was excellent, and meant that the cache hit benefits from Step 1 were preserved. Additionally, as we rebased our branch off of new code from the default branch over the course of a week, we were seeing only very small bundle changes based solely on changes to our own code!
This meant that vendor modules and untouched chunks of code that were the same continued to be unchanged. It also meant that our hit ratios would continue to get better, because churn was dramatically reduced. Lastly, visitors to our site who might have some bundles already cached in their browser will not have to download whole new bundles when they return, because most of the chunks would be unchanged. This is like going to a boba restaurant and getting your boba drink served in the perfect way, exactly like you like it in every way.
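One simple way to check this kind of reproducibility is to compare the asset filenames emitted by two builds. The helper below is a hypothetical sketch, and the filenames are made up for illustration. Because each name embeds a content hash, an unchanged name means an unchanged file.

```javascript
// Compare the asset filenames emitted by two builds and report churn.
// Filenames embed a content hash, so an unchanged name means an
// unchanged file that caches can keep serving.
function compareBuilds(previousAssets, currentAssets) {
  const previous = new Set(previousAssets);
  const unchanged = currentAssets.filter((name) => previous.has(name));
  const changed = currentAssets.filter((name) => !previous.has(name));
  const reuseRatio =
    currentAssets.length === 0 ? 1 : unchanged.length / currentAssets.length;
  return { unchanged, changed, reuseRatio };
}

// Hypothetical filenames from two consecutive builds.
const buildA = ["vendors.3f2a9c1d.js", "main.77b0e4aa.js", "logo.9c1e55f0.png"];
const buildB = ["vendors.3f2a9c1d.js", "main.c41d20be.js", "logo.9c1e55f0.png"];

const report = compareBuilds(buildA, buildB);
console.log(report.changed); // only the application bundle was renamed
console.log(report.reuseRatio); // fraction of assets that caches can keep
```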
However, most of the other results can best be described as “mixed bag” to “neutral.” Everything is a tradeoff, and sometimes you make gains in one area only to take losses in another. On the gain side, we have many more chunks, but they are significantly smaller, and the waterfall charts from the browsers show an ever-so-slight advantage in rendering above-the-fold content. On the loss side, the total size of all chunks increases because of some duplication and lost compression efficiency, so complete page load times might increase. This is what our bundle analysis looks like after the code-splitting changes.
Before you jump to conclusions, though, think carefully about the tradeoffs involved. For example:
- More chunks: bad.
- Smaller chunks: GOOD!
- Less churn in chunks: GOOD!
- Chunk cache efficiency increases: good.
- Compression efficiency is lost: bad.
- Total byte size overall actually increases: BAD!
- Dependencies are staggered: good.
- Complex, opaque, and volatile settings in Webpack: BAD!
- Above-the-fold content loads slightly faster: good.
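The compression and total-size items in the list above can be illustrated with a contrived Node.js example. It gzips one bundle that contains the same code twice, and then two separate chunks that each carry their own copy. The generated text is a deterministic stand-in for real code, and the sizes are illustrative only.

```javascript
const zlib = require("zlib");

// Deterministic pseudo-random text, so the example is repeatable.
function pseudoRandomText(length, seed) {
  let state = seed;
  let out = "";
  for (let i = 0; i < length; i++) {
    state = (state * 48271) % 2147483647; // small linear congruential generator
    out += String.fromCharCode(97 + (state % 26));
  }
  return out;
}

const sharedCode = pseudoRandomText(8000, 42);

// One bundle: the compressor sees the second copy as a repeat of the first.
const combinedBytes = zlib.gzipSync(sharedCode + sharedCode).length;

// Two chunks: each is compressed on its own, so nothing is shared.
const splitBytes =
  zlib.gzipSync(sharedCode).length + zlib.gzipSync(sharedCode).length;

console.log({ combinedBytes, splitBytes });
```

The split total comes out larger because each chunk is compressed in isolation. This is the cost side of getting smaller, more cacheable chunks.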
Step 3: Build Times
We did see a decrease in build times (lower is better) when upgrading from Webpack 3 to Webpack 4, as advertised. However, the decrease was not dramatic enough, in absolute or percentage terms, for us to notice any significant overall improvement to our deploy velocity. In our testing, none of the settings changes listed above affected build times for our CI/CD pipeline. Overall, Webpack builds are still a large portion of our total build time and take significantly longer than we would like, with no relief in sight. We wouldn’t be happy if we didn’t have more problems to solve, though.