Speeding Up NerdWallet 🚗💨

Our recently spun-up Frontend Infrastructure team now has more than 6 months under its belt, and site performance has been one of our main focuses.

This post summarizes a handful of the macro-level things we’ve done to improve and maintain the page load performance of nerdwallet.com.

Huge shout out to the product teams that did the bulk of the work to consume these changes as well as implement their own optimizations on individual parts of the site.

Time To Visually Complete (3G)

Removing Unused Global CSS/JS

I’m not going to point fingers but ☝ is a problem. A real, real problem.

NerdWallet was originally created as a PHP monolith. During my time here, one of the largest overarching initiatives has been to move from this monolithic stack to microservices mostly built in Python and micro-apps built with Node/React.

The large amount of CSS we used to serve globally was intertwined with these micro-apps and even with our base React components. Removing it was tedious tech debt cleanup, but here’s the net impact of no longer serving this unused code globally:

  • 50kb (gzipped) of JS (190kb parsed) removed
  • 30kb (gzipped) of CSS (200kb parsed) removed

Web Font Optimization

I wrote a full blog post about this so you can read more about it there.

TL;DR: Font optimization is quite complicated nowadays, but a concerted effort can yield a meaningful reduction in time-to-first-paint (in our case, as much as a 30% drop). We subset critical fonts and load them as soon as possible via <link rel="preload">.

Server Side React Render Cache

Pseudocode of a least-recently-used cache around renderToString

When we first migrated to React, one of the must-haves was that we couldn’t risk losing search ranking by client-side rendering our applications. We had to use server-side rendering to produce HTML that Googlebot can consume as easily as possible.

However, the React render can take a non-trivial amount of time. If you have a really large tree, it’s not uncommon for the render function to approach 100ms. And 100ms per request spent on a synchronous, blocking call means a maximum of 10 requests per second per instance. That’s not good.

Since React’s render is a pure function, we can memoize based on all the inputs (e.g. redux’s store state and the react-router location). On a cache hit, this can cut render time by an order of magnitude or more, improving both the performance of that page and our overall throughput.
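Here is a minimal sketch of that idea, not our production code. It assumes the store state is JSON-serializable with a stable key order, and it uses a stub `slowRender` function as a hypothetical stand-in for the real call to renderToString. The names `LRUCache`, `createCachedRender`, and `cachedRender` are made up for illustration.

```javascript
// Minimal LRU cache built on Map, which iterates keys in insertion order.
class LRUCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    // Re-insert so this key becomes the most recently used.
    const value = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

// Wraps a pure render function so identical inputs are rendered only once.
// `render` stands in for a server render such as renderToString.
function createCachedRender(render, maxEntries = 100) {
  const cache = new LRUCache(maxEntries);
  return function cachedRender(storeState, location) {
    // Assumption: store state is JSON-serializable with stable key order.
    const key = JSON.stringify([location, storeState]);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const html = render(storeState, location);
    cache.set(key, html);
    return html;
  };
}

// Usage with a stub renderer (hypothetical; swap in the real server render).
let renderCount = 0;
const slowRender = (state, location) => {
  renderCount += 1;
  return `<h1>${state.title} at ${location}</h1>`;
};
const cachedRender = createCachedRender(slowRender, 2);
cachedRender({ title: "Cards" }, "/credit-cards"); // miss: renders
cachedRender({ title: "Cards" }, "/credit-cards"); // hit: served from cache
```

The cache key has to cover every input that affects the markup. Anything left out of the key (a cookie, a feature flag) would cause one user's HTML to be served to another.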

Nearly 50% reduction in overall server side render pre/post

Centralized Build Tool

Webpack configuration can be fairly complex. Since we have dozens of individual web-apps that all run webpack themselves, we devised a way to share webpack configs across all of these applications so they could be centrally managed. We landed on using a light wrapper around Electrode’s webpack-config-composer because it can be expressed as a plain object.

With this tool in place, whenever we make a performance optimization via our webpack config, we just update in one place and the benefits are propagated out to all apps when they upgrade / redeploy.

Example of the interface producing a ready-to-go browser webpack config

The above would produce a webpack config with sane defaults to be consumed in a browser, while allowing complete customization, in this case specifying a custom value for the minChunks option to be passed into CommonsChunkPlugin.
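To make the shape of this concrete, here is a rough sketch of "shared defaults plus per-app overrides" using plain objects. It is not the webpack-config-composer API: `createBrowserConfig`, `mergeConfig`, the `commonsChunk` key, and the default values are all hypothetical and only illustrate how a single minChunks override can sit on top of centrally managed defaults.

```javascript
// Shared defaults, maintained in one place (values are illustrative).
const browserDefaults = {
  devtool: "source-map",
  output: { filename: "[name].[chunkhash].js" },
  // Hypothetical key: options that would be handed to CommonsChunkPlugin.
  commonsChunk: { name: "vendor", minChunks: 2 },
};

const isPlainObject = (v) =>
  v !== null && typeof v === "object" && !Array.isArray(v);

// Recursively merges `overrides` into a copy of `base`; overrides win.
function mergeConfig(base, overrides) {
  const result = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    result[key] =
      isPlainObject(value) && isPlainObject(base[key])
        ? mergeConfig(base[key], value)
        : value;
  }
  return result;
}

// Hypothetical entry point that each app would call.
function createBrowserConfig(overrides = {}) {
  return mergeConfig(browserDefaults, overrides);
}

// An app overrides a single option and inherits everything else.
const config = createBrowserConfig({ commonsChunk: { minChunks: 3 } });
```

Because each app only states its differences, a change to the shared defaults reaches every app the next time it upgrades and redeploys.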

Upgrading Webpack

As a part of creating this centralized webpack interface, we upgraded our version of webpack. One challenge we ran into when upgrading from webpack 1 to webpack 3 (this was prior to webpack 4 being released) is that the dedupe plugin had been removed, and thus our bundles got bigger.

This was unexpected since we assumed the later version would be better for performance. We ended up rolling our own webpack dedupe plugin to restore the same functionality.

Babel

When transpiling with Babel, we use babel-preset-env, browserslist, and our site’s Google Analytics data to compile our JavaScript for supported browsers based on traffic usage.

When we update our traffic usage data from Google Analytics, each app picks up the new list of supported browsers the next time it rebuilds and redeploys, and its JS is automatically transpiled to match.
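As a rough illustration of the idea (not our actual pipeline), the sketch below turns traffic data into browserslist-style queries and feeds them to babel-preset-env through its `targets.browsers` option. The `traffic` rows, the 1% threshold, and the `supportedBrowsers` helper are all assumptions made up for this example.

```javascript
// Hypothetical traffic shares by browser and major version, e.g. exported
// from an analytics report. Shares are percentages of sessions.
const traffic = [
  { browser: "chrome", version: 64, share: 41.0 },
  { browser: "chrome", version: 49, share: 0.2 },
  { browser: "safari", version: 11, share: 22.5 },
  { browser: "safari", version: 9, share: 1.4 },
  { browser: "ie", version: 11, share: 2.1 },
  { browser: "ie", version: 9, share: 0.1 },
];

// Returns browserslist-style queries for the oldest version of each browser
// whose traffic share meets the threshold.
function supportedBrowsers(rows, minShare) {
  const oldest = new Map();
  for (const { browser, version, share } of rows) {
    if (share < minShare) continue;
    const current = oldest.get(browser);
    if (current === undefined || version < current) {
      oldest.set(browser, version);
    }
  }
  return [...oldest].map(([browser, version]) => `${browser} >= ${version}`);
}

// Babel options using babel-preset-env's `targets.browsers` setting.
const babelOptions = {
  presets: [["env", { targets: { browsers: supportedBrowsers(traffic, 1) } }]],
};
```

With the sample data, versions below the 1% threshold (Chrome 49, IE 9) drop out of the target list, so Babel no longer has to transpile for them.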

Images

Lazyloading an image with a low-res placeholder inlined.

We built a React Image component to codify best practices (e.g. <picture>, srcset, sizes) and support lazy loading to improve perceived performance.

By lazy loading we can ensure we are only downloading images when the user will see them, and by using an aspect ratio box we can avoid image reflows.
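The two pieces can be sketched as small pure functions. This is an illustration under assumptions, not our component: `aspectRatioBoxStyle` and `shouldLoad` are hypothetical helpers, and the 200px margin is an arbitrary choice.

```javascript
// Style for an aspect ratio box. Percentage padding is resolved against the
// containing block's width, so a full-width box reserves the image's height
// before the image itself has loaded.
function aspectRatioBoxStyle(width, height) {
  return {
    position: "relative",
    height: 0,
    paddingBottom: `${((height / width) * 100).toFixed(2)}%`,
  };
}

// Decides whether an element is close enough to the viewport to start loading.
// `rect` has the shape returned by element.getBoundingClientRect().
function shouldLoad(rect, viewportHeight, margin = 200) {
  return rect.top < viewportHeight + margin && rect.bottom > -margin;
}

const style = aspectRatioBoxStyle(1600, 900); // a 16:9 image
const nearViewport = shouldLoad({ top: 950, bottom: 1400 }, 800);
const farBelow = shouldLoad({ top: 3000, bottom: 3450 }, 800);
```

In a real component the visibility check would typically run from a scroll listener or an IntersectionObserver, and the image would be absolutely positioned inside the box.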

Codesplitting

Godspeed those who attempt the server-rendered, code-split apps

This is a real quote from react-router that was live on their site until very recently. Codesplitting a server side rendered app is not as battle tested and only recently has there been solid tooling to do such a thing.

We ended up leveraging the suite of react-universal-component packages to handle CSS and JS codesplitting that works both server side and client side.

We’ve had some challenges around setting this up, codesplitting from within a nested package, and codesplitting / css modules not playing nicely. However this has allowed us to do route-based or component-based codesplitting, and split out things like large visualization libraries to their own bundles that are lazy-loaded.
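The core mechanic behind lazy-loading a large library is a dynamic import that is only triggered on first use. Below is a bare-bones sketch of that mechanic. It is not the react-universal-component API: `lazyChunk` is a hypothetical helper, and a stub loader stands in for the real `import()` call so the example runs on its own.

```javascript
// Wraps a loader so the chunk is requested on first use and only once.
// In an app the loader would be a dynamic import, e.g.
//   const loadChart = lazyChunk(() => import("./HeavyChart")); // hypothetical path
// which lets webpack emit HeavyChart as a separate bundle.
function lazyChunk(loader) {
  let pending = null;
  return function load() {
    if (pending === null) {
      pending = Promise.resolve(loader()).catch((error) => {
        pending = null; // allow a retry after a failed load
        throw error;
      });
    }
    return pending;
  };
}

// Stub loader standing in for import(); counts how often it is invoked.
let requests = 0;
const loadChart = lazyChunk(() => {
  requests += 1;
  return Promise.resolve({ default: "ChartComponent" });
});

const requestsBeforeUse = requests; // nothing is fetched up front
const first = loadChart();
const second = loadChart(); // reuses the in-flight promise
```

Server-side rendering adds the hard part this sketch skips: the server has to resolve the chunk synchronously and tell the client which bundles to load, which is what the tooling above takes care of.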

At this point `CodeSplitRoute` is just a normal React component 👍

CDN in front of nerdwallet.com

This was probably the single biggest optimization that improved our page load performance site-wide.

Thanks to a large effort from our DevOps team, we put Amazon’s CloudFront in front of all traffic on our entire website. This was a huge win because CloudFront operates from many data centers, and our customers now open connections with the CloudFront location closest to them rather than going all the way to nerdwallet.com, which is not hosted in nearly as many locations. Additionally, even after the connection has been established, since nerdwallet.com is hosted in Amazon’s S3/ECS, we are able to leverage Amazon’s internal routing rather than the public internet to get content to users as fast as possible. Overall, this helped drop our site-wide time to first byte by ~20%.

Time to first byte (milliseconds)

The second large change associated with this is that we were able to consolidate all of our assets onto our top level domain. For example, instead of referencing assets on the CDN at cdn.nerdwallet.com, we can now use www.nerdwallet.com/cdn. This results in faster downloads of assets in the critical render path over HTTP/2, as we don’t need a new DNS lookup, TCP connection, and SSL handshake: the browser can reuse the existing connection opened to the top level domain.

All critical assets are downloaded faster 🚗💨

Asset Prioritization

CSS / Fonts / images

<link rel="preload" as="font" type="font/woff2" href="..."/>

For assets in the critical render path impacting SpeedIndex, we leverage <link rel="preload"> where we can, and we have moved our JavaScript out of the <head> so the browser can paint as much of the page as possible without having to make many requests in serial.
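If the server renders the <head>, those tags can be generated from a list of critical assets. The sketch below is one hypothetical way to do it. `preloadTags` and the asset paths are made up for illustration. One detail worth knowing: fonts are fetched in anonymous CORS mode, so a font preload needs the `crossorigin` attribute even on the same origin, otherwise the font is downloaded twice.

```javascript
// Escapes characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Builds <link rel="preload"> tags for critical assets.
// Font preloads get `crossorigin` because fonts are fetched in CORS mode.
function preloadTags(assets) {
  return assets
    .map(({ href, as, type }) => {
      const attrs = [`rel="preload"`, `as="${escapeAttr(as)}"`];
      if (type) attrs.push(`type="${escapeAttr(type)}"`);
      attrs.push(`href="${escapeAttr(href)}"`);
      if (as === "font") attrs.push("crossorigin");
      return `<link ${attrs.join(" ")}>`;
    })
    .join("\n");
}

// Hypothetical asset paths for illustration.
const tags = preloadTags([
  { href: "/cdn/fonts/body.woff2", as: "font", type: "font/woff2" },
  { href: "/cdn/css/main.css", as: "style" },
]);
```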

JavaScript

<script src="..." defer />

We use defer for our core JS bundles over async because defer guarantees execution order and guarantees the scripts won’t run until the document has been parsed, just before the DOMContentLoaded event. In practice we saw async scripts sometimes being evaluated much earlier than we’d like, ahead of image paint or similar.

Challenges

We have seen some unexpected behavior with the HTTP/2 implementation of Chrome and/or CloudFront. In particular, Chrome places lower-priority assets in a linear dependency chain. As an example, this means there might be tiny image files waiting for much larger JavaScript files to completely finish downloading: these assets download in serial, not in parallel.

This required us to not preload our JavaScript bundles since they can be large and we didn’t want them to block the loading of much smaller images, and Chrome gives a higher priority to preloaded JS assets than images.

Building a Culture of Performance

In order to maintain and improve our site performance, we needed to build a culture that values site performance and measures it on a regular basis.

SpeedCurve

This is probably the biggest change we’ve made to ensure we are on top of our performance. We’ve adopted SpeedCurve and made SpeedIndex one of the key metrics we monitor across all of our frontend teams. If you don’t use SpeedCurve, we highly recommend it.

Site-wide 3G SpeedIndex (via https://github.com/parshap/speedcurve-data)

Automated Analysis of Performance Impacts of PRs

Vlad Silin, an intern on the FEI team for the past quarter, built an awesome tool that provides performance-related feedback in the form of a GitHub comment when developers make PRs. Read more about this in Vlad’s post.

Culture

We’ve held performance-related workshops and talks, captured performance data in our data warehouse to correlate page performance with business performance, and in general advocated wherever possible for making performance a core part of frontend engineering at NerdWallet.

Upcoming

We still have a long list of things we want to do. Some things we plan to work on in the near future:

  • Inline CSS under a minimum threshold via Google’s Nginx pagespeed module
  • Improve our image processing pipeline: support requiring an image, e.g. require('../my-image.png'), from within JS and generate responsive image sizes either at build time or on the fly
  • Service Workers for better offline support
  • Upgrade versions of some of our key packages (webpack 4, React 16)

Enjoy this type of work? We have an opening on the Frontend Infrastructure team particularly to work on our internal design system.

Thanks to Parsha and Vlad Silin