Optimizing Turo’s web-app build and deploy process

Victor Mora
Published in Turo Engineering
Apr 18, 2018
Header image by Francesco Crippa (DSC_7175.jpg), CC BY 2.0, via Wikimedia Commons

TL;DR: We made some improvements to our webapp's build and deploy process, leveraging some of Webpack 3's new features and focusing on generating consistent, long-term cacheable builds.

Turo's web client is a React / Redux based application to which we are gradually migrating from our legacy web application written in JSP + jQuery. Our React application is rendered in a minimal container inside the legacy application, which simplifies dealing with some migration concerns such as authentication. Our first React page was deployed in Spring 2016 and had a simple build and deploy process to match.

Fast-forward to today: it is a much larger app that houses the majority of our core product flows, including search, checkout, reservations, and the listing flow. That initial production build, originally meant to serve a few private pages, no longer meets the needs of our current web app.

With our previous build process, we deliberately traded efficiency for simplicity because performance was not a major requirement at the time. Performance is still not a first-class citizen on our roadmap, but there are some small changes we can make that could bring big benefits to users. It was high time for a pit stop to tune our build and deploy process.

Before digging into what has changed in the deployment process, it is interesting to know what the state was prior to the changes.

Old build and deployment process

The React build

The webapp code base is mainly composed of four things:

  • JS files that contain the application logic.
  • CSS files.
  • JSON files with translations.
  • Images (PNG, JPG and SVG files).

All these files are assembled, compiled and minified using Webpack, which is one of the de facto module bundlers for the web.

This compilation process is done by Team City every time we deploy the webapp.

The build process generates a bundle folder and an assets folder.

The bundle folder will contain a CSS file and a set of JS files:

  • A common.css file that contains all the style rules for the web app. It also contains the Base64 representation of certain images that are too small to be worth serving as individual files (that would require an extra HTTP call for just a few bytes). This file is not being minified.
  • A chunk.<id>.js file that contains the code for a route in the application. We configured Webpack to create a different file for every route. This way we keep a balance between the number of files a user has to download on the first load to run the app and the amount of unused code there is on those files. We treat each route as an independent and isolated micro webapp.
  • A common.js file that is the entry point to the application. It contains a set of common libraries such as react or redux and part of the code that is shared among different routes.

The translation files end up embedded in the JS code of the route they belong to, or in the common file if they are generic locales.

The assets folder contains all the images that the application is going to use (those that are not inlined in the CSS file). These images are optimized for production and most of the time combined into sprites to reduce HTTP calls. The names of these files contain the hash of their contents, allowing browsers to cache them; if a new version of an image is created, its hash will change.

Once the build has been created, it will be uploaded to an S3 bucket. Depending on the environment (production or test servers) a different bucket will be used. The structure of this bucket is as follows:

There is an assets folder where all the assets for all the builds are placed. Since file names contain hashes, if two builds produce a file with the same name, the file is identical.

There will be one folder for every build that will contain the JS and CSS files mentioned before. The name of this folder contains the build id from Team City and the commit hash that this build represents.

The JS files cannot be placed together because they don't have hashes in their names, which means that every build is going to create files with the same names but probably different content.

These S3 buckets contain the code of every build that Team City has produced since the React application was first deployed. In the production bucket, the only builds in use are the currently deployed one and the build generated during a deployment, before it gets promoted.

This is what the build looks like:

A close up of the common, checkout and listing flow modules where we can see repetition of some node modules and some of the UI components:

The build size:

If we look at those images we can spot a couple of things:

  • It is quite hard to know what each chunk represents.
  • There are parts of the UI module that are spread across all chunks.

Loading from our legacy application

We now have a new build present in S3, but that by itself doesn't do much. As of today, the React application is loaded from our back-end Java application through a JSP page. This means that there must be some code in the legacy application that knows where to find the latest webapp version.

In order to do this, there is one more asset that the Team City build process will upload to S3. After promoting the latest build, a version.json file will be added to the root of the S3 bucket. This file contains the name of the folder associated with the latest valid React build.

Whenever a React page is loaded, the first thing that the JS code in the JSP container does is download this JSON file in order to construct the correct URL from which to download the rest of the webapp files, namely the common.js and common.css files (a hypothetical sketch of this bootstrap follows). The chunk download is handled internally by Webpack, so we don't have to worry about doing that manually.
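To make the mechanism concrete, here is a hypothetical sketch of that bootstrap, not the actual JSP code: the S3_ROOT constant and the shape of version.json are assumptions for illustration.

// Hypothetical sketch of the old bootstrap; S3_ROOT and the version.json shape are assumptions.
var S3_ROOT = 'https://s3.amazonaws.com/our-webapp-bucket'; // hypothetical bucket URL

var xhr = new XMLHttpRequest();
xhr.open('GET', S3_ROOT + '/version.json', true);
xhr.onload = function() {
  var build = JSON.parse(xhr.responseText); // e.g. {"folder": "<buildId>-<commitHash>"}
  var base = S3_ROOT + '/' + build.folder + '/bundle/';

  var css = document.createElement('link'); // load common.css
  css.rel = 'stylesheet';
  css.href = base + 'common.css';
  document.head.appendChild(css);

  var js = document.createElement('script'); // load common.js, the webapp entry point
  js.src = base + 'common.js';
  document.head.appendChild(js);
};
xhr.send();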

Deployment issues

This deployment process has served us well for the past year and a half. It is quite simple, and it is easy to deploy and switch versions in production. The configuration files that create this build are also pretty straightforward.

In spite of all this, this deployment process has some issues that are becoming more important now that we have some public-facing pages in React. This matters because, as you will see, most of the issues are related to long-term caching:

  • The latest React build will always point to a different URL after every deploy, making it impossible for browsers to cache any of the JS files.
  • The chunk file names are not consistent; they just use an auto-generated id. If a new chunk is created, nothing guarantees that the chunk.2.js that represents the checkout page today will still be chunk 2 in the next build.
  • The file names never change. We are not using hashes in the file names, so the same source file with different content in two builds will produce the same output file name, again making it impossible for browsers to cache that data.
  • The common.js contains both third-party libraries and a subset of our source code. The third-party libraries are very unlikely to change over time, except for version updates. Even so, we are forcing every user to download heavy libraries such as react or moment every time we deploy, even if these libraries didn’t change at all.
  • The builds are inconsistent. One change in a locale file will make all the JS files in the build change (more about this in the next sections).
  • The React and legacy applications are tightly coupled. The script that takes care of loading the webapp hardcodes the names of the files that need to be loaded, which means that if we want to change the React build, we have to make changes to the JSP pages.
  • The way we do code splitting doesn't allow us to test a route definition. This piece of code uses a proprietary Webpack API to indicate when to generate a new chunk. Our unit tests don't use Webpack (and this should stay that way), and thus we cannot test this part of the application (we've had some bugs inside these files that we were only able to catch in production).
  • Even though having a revert process would be pretty straightforward (just updating the version.json file to point to the previous build), we don’t have one yet.

New build and deployment process

Because of all the aforementioned issues, it was clearly high time to rethink the React webapp build and deployment process.

Using dynamic imports

One of the coolest features of Webpack is that it allows code splitting, that is, it gives the developer the ability to programmatically specify points in the code where a new JavaScript file should be created in the final bundle.

Not only does this avoid having a single build file with our whole codebase, but we don't even have to care about loading these files when execution reaches the split point: Webpack does it for us.

Until Webpack version 3, the only way to do this was via a Webpack proprietary API based on callbacks (require.ensure). However, with the latest version of the library, they switched to the dynamic imports ECMAScript proposal.

This new way of specifying dynamic imports is standardized and no longer a proprietary API, but its behavior changes a bit, so we decided not to switch to the new syntax at the same time we upgraded to Webpack 3.

The new dynamic import syntax is Promise based and only allows you to dynamically load one dependency as the entry point. The previous version allowed you to specify all the needed dependencies in the code split callback.

This is what a sample route code split using the previous version would look like in our React project:

export default store => ({
  load(cb) {
    require.ensure(
      [],
      require => {
        const {checkout, paymentMethodPlugin} = require('./CheckoutReducer');
        const {CHECKOUT_PAYMENT_FORM} = require('./CheckoutForms');
        injectAsyncReducer(store, 'checkout', checkout);
        injectAsyncFormPluginReducer(store, CHECKOUT_PAYMENT_FORM.NAME, paymentMethodPlugin);
        const CheckoutView = require('./CheckoutView').default;
        cb(CheckoutView);
      },
      'checkout'
    );
  },
  path: '/checkout',
  private: true,
  routes: [onboardingRoute(store)],
});

One of the main issues with this approach is that any code that is inside the require.ensure callback cannot be easily tested, because we don’t use Webpack when we run our tests.

Inside this callback, we are also forced to use require instead of import, which is the way we load modules in the rest of our code base.

Every call to require inside the callback is telling Webpack that the imported module should belong to the split code chunk.

When using dynamic imports, we can only specify one dependency, so what we do is move all the callback code into a load.js file:

import {injectAsyncReducer, injectAsyncFormPluginReducer} from '../store';
import {checkout, paymentMethodPlugin} from './CheckoutReducer';
import {CHECKOUT_PAYMENT_FORM} from './CheckoutForms';
import CheckoutView from './CheckoutView';

export default function load(store) {
  injectAsyncReducer(store, 'checkout', checkout);
  injectAsyncFormPluginReducer(store, CHECKOUT_PAYMENT_FORM.NAME, paymentMethodPlugin);
  return CheckoutView;
}

The code is very similar, but we now have the advantage that this can be tested.

Now the route file becomes very simple:

import onboardingRoute from './onboarding';

export default {
  loader: () => import(/* webpackChunkName: "checkout" */ './load'),
  path: '/checkout',
  private: true,
  routes: [onboardingRoute],
};

The functionality is still the same, but now we have a few very important advantages:

  • We can test the route initialization code.
  • We can call import('/checkout/load.js') from any other part of the application, giving us the ability to intelligently prefetch content ahead of time and reduce the perceived loading time for end users (see the sketch below).
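As a minimal sketch of what that prefetching can look like (the selector, trigger, and chunk path are illustrative assumptions, not our actual code):

// Illustrative only: warm up the checkout chunk when the user hovers the button that leads there.
const bookButton = document.querySelector('[data-prefetch-checkout]');

if (bookButton) {
  bookButton.addEventListener('mouseenter', () => {
    // The same dynamic import Webpack already knows how to split; the browser caches the chunk.
    import(/* webpackChunkName: "checkout" */ './routes/checkout/load');
  });
}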

Analyzing the Webpack dependency tree

Before even thinking about how to improve our previous build process, we needed more detail about what was going on with it: how much space the final build takes and where every dependency and file ends up.

Webpack can be configured to create a JSON file with metadata about the build process. There are a few visualization tools that can help navigate and understand this file (webpack-bundle-analyzer is one of my favorites).
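One way to wire this up (a sketch, assuming we run the analyzer as a Webpack plugin behind an environment flag; the exact setup may differ) is:

const {BundleAnalyzerPlugin} = require('webpack-bundle-analyzer');

// Only run the analyzer when explicitly asked for, e.g. ANALYZE=1 npm run build.
if (process.env.ANALYZE) {
  plugins.push(
    new BundleAnalyzerPlugin({
      analyzerMode: 'static',              // write an HTML report instead of starting a server
      reportFilename: 'bundle-report.html',
      generateStatsFile: true,             // also emit the raw stats JSON for other tools
      statsFilename: 'webpack-stats.json',
    })
  );
}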

This is what our previous build looked like:

We can observe a few things:

  • The common.js file, which is the entry point of our application, is HUGE, and most of what it contains is third-party libraries that don't change that much.
  • The components inside src/ui are distributed across all the chunks.
  • Even though we have code splitting, there is some availability code that still gets added to the common.js file.

There is definitely room for improvement. So let’s get to it!

Putting the build in one folder

One curious thing about our previous deployment process is that the assets of every build are stored in a single shared place, while the JS code goes to a different folder every time. It would be better if every build generated one folder with all the content it needs. If assets don't change, their names and hashes won't change either, and as long as you guarantee that the latest version of your code is under the same URL (e.g. s3-bucket/react/latest), the caching of the assets is not affected.

When we compile the webapp the following folders are created:

  • dist, the root folder for the build.
  • dist/assets, contains the build assets.
  • dist/bundle, contains the build JS files.

Instead, we could have the same JS files added directly to dist and copy that single folder to S3 in one operation, instead of having to copy two folders to two different places.

The existing deployment process also forces us to reference assets differently in production than in the local environment:

name: `${isPROD ? '../' : ''}assets/[hash].[ext]`,
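For context, a minimal sketch of what that rule could look like once the whole build lives under a single dist folder (assuming the images go through url-loader; our actual rule may differ). The prod-only '../' prefix goes away:

module.exports = {
  module: {
    rules: [
      {
        test: /\.(png|jpe?g|gif|svg)$/,
        loader: 'url-loader',
        options: {
          limit: 8192,                 // inline tiny images as Base64 data URIs
          name: 'assets/[hash].[ext]', // same relative path in every environment, no isPROD branch
        },
      },
    ],
  },
};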

Also, the name we give to the entry point of the application doesn't feel totally right. It should be something like main.js or index.js. The name common.js sounds like it points to some kind of library rather than the entry point of the application.

I believe in the principle of least surprise and that things should be named after what they actually do. Reading common.js could be confusing.

A vendor chunk

As we saw before, the entry point of the app didn't contain just the bootstrap code and the backbone needed to run our webapp. It also contained a lot of third-party libraries. These libraries are used all over the application, and they hardly ever change. It makes a lot of sense to combine all of them (with the exception of some libraries, such as D3 or Google Maps, that are used in very specific parts of the app) into a separate, cacheable file. This is how vendor.js is born.
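A minimal sketch of how such a vendor chunk can be extracted with Webpack 3's CommonsChunkPlugin (the exclusion pattern for page-specific libraries is illustrative, not our exact list):

const webpack = require('webpack');

plugins.push(
  new webpack.optimize.CommonsChunkPlugin({
    name: 'vendor',
    minChunks: module =>
      module.context &&
      module.context.indexOf('node_modules') !== -1 &&
      // keep heavy, page-specific libraries out of the shared vendor chunk
      !/node_modules[\\/](d3|google-map)/.test(module.context),
  })
);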

Extracting the UI package

Another thing we discovered after analyzing the build is that the components inside src/ui are spread all over the chunks. This folder contains our React UI library, that is, the set of components that we build the app on top of. In practice, it could be treated as an external library, and that is what I ended up doing: telling Webpack to put all the files in this folder into a ui.js file in the final build.
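A sketch of the idea (the exact chunk-selection options depend on the entry setup and may need tuning, e.g. with children or async, so that components used only by code-split routes are pulled in too):

const path = require('path');
const webpack = require('webpack');

plugins.push(
  new webpack.optimize.CommonsChunkPlugin({
    name: 'ui',
    minChunks: module =>
      // move anything that lives under src/ui into its own ui.js chunk
      module.resource && module.resource.indexOf(path.join('src', 'ui')) !== -1,
  })
);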

Name all the things

Up to this point what we have is:

  • a main.js, a file with the bare minimum to run the webapp.
  • a vendor.js, a file with the most important third-party libraries.
  • a ui.js, a file with our component library.
  • a chunk.<id>.js, a file for every code split point that we define.

This is starting to look good. I didn’t mention it in the file names, but in the final build every file name gets its SHA1 hash appended to it, so we can enable long-term caching.

However, there is still one problem with the names we give to each chunk: they are not consistent. Webpack assigns them an id which we can't control and which can change from one build to another, and that is not good at all. We want things to be consistent. Imagine the checkout page gets id 3 in our build, but we then add a new code split point (a.k.a. a new page route) and all of a sudden checkout gets id 4. Even though we didn't change the checkout code at all, after deploying the new version a user who had the checkout code cached can no longer use it, because it now has a different id.

We should be using names instead of ids. It seems that the only advantage of ids is that we 'obfuscate' our file structure, but sacrificing long-term caching just to show chunk.4.js instead of chunk.checkout.js is a really bad tradeoff. What's more, this chunk is going to be downloaded when the user is at the /checkout URL, so we are not really disclosing anything new, and we get a huge win in terms of efficiency.

Webpack offers a way to keep the assignment of ids to chunks consistent, via the recordsPath setting, but it requires us to keep track of the output state of the previous build, keeping our builds from being stateless. Having to track previous state means that our simple build process would get complicated, and it introduces a new set of issues to solve a problem that we can solve with named chunks and modules anyway, which keeps the compilation process as simple as it was before.

Also, it looks like the recordsPath option has some known issues (here and here).
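The named approach, combined with hashed file names, can be expressed roughly like this (the file name patterns are illustrative):

const webpack = require('webpack');

module.exports = {
  output: {
    filename: 'chunk.[name].[chunkhash].js',      // e.g. chunk.main.<hash>.js
    chunkFilename: 'chunk.[name].[chunkhash].js', // code-split routes keep their webpackChunkName
  },
  plugins: [
    new webpack.NamedModulesPlugin(), // stable module ids based on file paths instead of numbers
    new webpack.NamedChunksPlugin(),  // stable chunk ids based on chunk names instead of numbers
  ],
};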

The Webpack manifests

We are getting there and things are starting to look great, but there is one important detail that was initially really hard to catch. With our configuration so far, if we compile the code, then change a file that belongs to a chunk and compile again, we will see that the hashes of all the JS files in the build have changed. How is that possible?

Doing a little bit of research about achieving long-term caching in Webpack will tell you that the first thing one has to do is split the manifests from your code.

If you don't do that, every time you make a change to a file the hashes of all the chunks in the entry point get updated, so you are not really taking advantage of long-term caching, as these chunks still get new names every time.

There are three things that we have to take out of our build in order to have true long-term caching:

  • The Webpack manifest. This is a very tiny piece of code that bootstraps Webpack and that changes with every build. It should be independent of our vendor chunk and our entry point (which doesn't change that much):
new webpack.optimize.CommonsChunkPlugin({
  name: 'manifest',
  minChunks: Infinity,
}),

This file needs to be loaded before main.js so everything works fine.

  • Since we use code splitting, inside our build there is going to be some code that references the names of the files to download whenever a new route is visited. Until now our chunk file names were consistent; now, however, they contain hashes, which means that if we make changes to a chunk, our main.js file is going to change too. To fix this, we create a JSON file that maps each chunk name to its hashed file name. This way, main.js always references the same names and uses this chunk manifest to resolve the real file. So before loading any JS file, we have to load this manifest into a global variable.

This is the webpack-chunk-manifest.json

{
  "vendor": "chunk.vendor.ddd5c4c8c6199efc58ab.js",
  "main": "chunk.main.61ff6dfece4ed049bb3b.js",
  "checkout": "chunk.checkout.9bbb45b23d1ef386cf88.js",
  "dashboard": "chunk.dashboard.4072144f4c31886f1f5c.js",
  "earnings": "chunk.earnings.f6d305555a1d252f1a83.js",
  "error": "chunk.error.8a045c2dfef43e0b7548.js",
  "host": "chunk.host.41ce2b2537f03ab133e4.js",
  ...
  "yourCar": "chunk.yourCar.8c7fe298f8289871ba3e.js",
  "yourCarDetails": "chunk.yourCarDetails.2329c8d4930cfe55e21f.js"
}

This is the initialization of the global variable:

window.webpackManifest = webpackManifest;
  • In our previous build process with non-hashed files, referencing the React entry point from a JSP page was rather easy: we just pointed to common.js and common.css and it worked every time. That approach tightly couples React to the legacy application: we can't change the structure of the React application without modifying a JSP page. Now that we have to load several JS files to run React, and given that the names of these files change with every build (because of the hashes), the current approach doesn't work anymore. The solution is to create an asset manifest for every build that indicates the name and order of every file that needs to be downloaded for the webapp to work. This way the legacy application is less coupled to React, as it only depends on a manifest file and its structure, and since this file is created by the webapp build, we can change it however we want.

This is what the webpack-asset-manifest.json looks like:

{
  "js": [
    "manifest.e0d6aa4d32e9a707a2ba.js",
    "vendor.ddd5c4c8c6199efc58ab.js",
    "main.61ff6dfece4ed049bb3b.js"
  ],
  "css": [
    "styles.2584da067bb2877cbc3030511e75ab8a.css"
  ]
}
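For reference, one way to produce the chunk manifest and rewire the Webpack runtime to read file names from the global variable is the chunk-manifest-webpack-plugin package; the asset manifest can be emitted by a similar plugin or a small custom one. This is only a sketch of the approach, not necessarily the exact plugins we use:

const ChunkManifestPlugin = require('chunk-manifest-webpack-plugin');

plugins.push(
  new ChunkManifestPlugin({
    filename: 'webpack-chunk-manifest.json', // the chunk-name to hashed-file-name map shown above
    manifestVariable: 'webpackManifest',     // the global the runtime reads instead of inlined names
  })
);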

We now have everything in place to achieve long-term caching.

Minimizing the CSS

During this process, I realized that the webapp's common.css file wasn't being minified, so adding a few lines to our config file saved us a couple of KBs:

if (isProd) {
  plugins.push(
    new OptimizeCssAssetsPlugin({
      cssProcessorOptions: {discardComments: {removeAll: true}},
    })
  );
}

Loading from the legacy application

Now we no longer have common.js and common.css files that we can always reference from a JSP page. Instead, we have two manifest files that contain all the information needed to load React. We had to update the JSP code that bootstraps the webapp so it reads these two files and loads all the dependencies from there:

function load(base) {
  window.webpackPublicPath = base;
  loadJsonFile(base, 'webpack-chunk-manifest.json', function(webpackManifest) {
    loadJsonFile(base, 'webpack-asset-manifest.json', function(webpackAssets) {
      window.webpackManifest = webpackManifest;
      webpackAssets.css.forEach(loadStyleSheet(base));
      loadJSAssets(base, webpackAssets.js);
    });
  });
}

load(getClientBase(VERSION, CLIENT_URL));
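The helpers referenced above live in our JSP container code; hypothetical implementations could look like this (the names are kept, the bodies are assumptions for illustration):

// Hypothetical: fetch a JSON file relative to the build base URL.
function loadJsonFile(base, name, cb) {
  var request = new XMLHttpRequest();
  request.open('GET', base + '/' + name, true);
  request.onload = function() {
    if (request.status >= 200 && request.status < 300) {
      cb(JSON.parse(request.responseText));
    }
  };
  request.send();
}

// Hypothetical: returns a function that appends a <link> tag for a stylesheet.
function loadStyleSheet(base) {
  return function(href) {
    var link = document.createElement('link');
    link.rel = 'stylesheet';
    link.href = base + '/' + href;
    document.head.appendChild(link);
  };
}

// Hypothetical: load the scripts sequentially so manifest.js runs before vendor.js and main.js.
function loadJSAssets(base, files) {
  files.reduce(function(previous, file) {
    return previous.then(function() {
      return new Promise(function(resolve, reject) {
        var script = document.createElement('script');
        script.src = base + '/' + file;
        script.onload = resolve;
        script.onerror = reject;
        document.head.appendChild(script);
      });
    });
  }, Promise.resolve());
}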

This code still runs in the user's browser. We could move this logic to the server side by fetching the manifests there, avoiding two extra HTTP requests for users. Instead of storing the manifests as files, we could keep them in a database so the server can easily read them from there.

This is an interesting problem that we could try to solve once the new build system and deployment process are in place.

The new deployment process

All the changes described in this post are incompatible with the previous deployment process in Team City, and as I mentioned before, that process doesn't allow us to have long-term caching, since every new build points to a different URL (except for the assets).

Another thing that we have been missing in the webapp is a revert process that allows us to go back to the previous version after a failed / invalid deploy.

Because of these two facts, we also updated our deployment process.

Our previous deployment process consisted of two jobs in Team City:

  • Build and deploy. This job would check out the latest webapp version from GitHub, compile it and upload it to a folder in the S3 bucket (the name of this folder is the build id plus the commit hash). It would also put the images in a shared assets folder in the bucket, and generate a version.json file with the name of this folder. This file is exposed as a Team City artifact.
  • Update version. This job would upload the version.json file to the root of the S3 bucket. This file is what tells the legacy application where the latest build of the webapp is.

Our new Team City deployment jobs look like this:

  • Build and deploy. This job is very similar to the old one, with the difference that it uploads the whole build to a single folder in the S3 bucket (there is no distinction between images and JS code). The builds are saved in the /builds folder. At the same time, this job uploads the Webpack manifest files to the /manifest/next folder, along with a version.json file so we know which git commit and Team City build this code represents.
  • Promote. This job replaces the Update version job. The files for every React build will always be in the /builds folder of the S3 bucket; since every file has a unique name (the hash), there will not be any conflicts. This job takes the contents of the manifest/current folder and places it in the manifest/previous folder. Once that is done, it puts the contents of the manifest/next folder into the manifest/current folder (see the sketch after this list).
  • Revert. If something is not working correctly after promoting, we can always go back to the previously deployed version. This job takes the contents of the manifest/previous folder and places it in the manifest/current folder. The manifest/current folder is where the legacy application looks for the webapp manifests by default.

This process allows us to easily go back to the previous stable version and enables long-term caching as the latest webapp build will always be under the same URL.
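To make the promote and revert steps concrete, here is a rough sketch of what they boil down to, using the AWS SDK for JavaScript; the bucket name is hypothetical and the real Team City jobs may work differently:

const AWS = require('aws-sdk');

const s3 = new AWS.S3();
const BUCKET = 'turo-webapp-builds'; // hypothetical bucket name

// Copy every object under one prefix to another (e.g. manifest/next -> manifest/current).
async function copyPrefix(from, to) {
  const {Contents = []} = await s3.listObjectsV2({Bucket: BUCKET, Prefix: from}).promise();
  await Promise.all(
    Contents.map(({Key}) =>
      s3
        .copyObject({
          Bucket: BUCKET,
          CopySource: `${BUCKET}/${Key}`,
          Key: Key.replace(from, to),
        })
        .promise()
    )
  );
}

async function promote() {
  await copyPrefix('manifest/current/', 'manifest/previous/');
  await copyPrefix('manifest/next/', 'manifest/current/');
}

async function revert() {
  await copyPrefix('manifest/previous/', 'manifest/current/');
}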

Conclusion

This blog post has described the thought process behind the implementation of the new webapp build and deploy process:

  • We now have a more organized build that is less coupled to our legacy application and embraces long-term caching.
  • We can now test the load process of every route, a critical part of the webapp code base that we could not test until now.
  • We have a revert step in our deployment process.
  • We have support for prefetching content ahead of time, improving the user experience.

This is what the final build looks like:

And this is a closeup where we can see that there is less repetition between checkout and the listing flow, as now we have a vendor and a UI module:

The build size stays almost the same, but the size of the modules gets redistributed:

Resources

While working on this, I used a few articles and resources that were really helpful and provided a lot of insight. Apart from all the libraries’ documentation, I found these three articles very useful:
