Twitter Lite and High Performance React Progressive Web Apps at Scale
A look into removing common and uncommon performance bottlenecks in one of the worlds largest React.js PWAs, Twitter Lite.
Creating a fast web application involves many cycles of measuring where time is wasted, understanding why it’s happening, and applying potential solutions. Unfortunately, there’s never just one quick fix. Performance is a continuous game of watching and measuring for areas to improve. With Twitter Lite, we made small improvements across many areas: from initial load times, to React component rendering (and prevention re-rendering), to image loading, and much more. Most changes tend to be small, but they add up, and the end result is that we have one of the largest and fastest progressive web applications.
Before Reading On:
If you’re just starting to measure and work toward increasing the performance of your web application, I highly recommend learning how to read flame graphs, if you don’t know how already.
Each section below includes example screenshots of timeline recordings from Chrome’s Developer Tools. To make things more clear, I’ve highlighted each pair of examples with what’s bad (left image) versus what’s good (right image).
Special note regarding timelines and flame graphs: Since we target a very large range of mobile devices, we typically record these in a simulated environment: 5x slower CPU and 3G network connection. This is not only more realistic, but makes problems much more apparent. These may also be further skewed if we’re using React v15.4.0’s component profiling. Actual values on desktop performance timelines will tend to be much faster than what’s illustrated here.
Optimizing for the Browser
Use Route-Based Code Splitting
After much wrangling, we were finally able to break up common areas by routes into separate chunks (example below). The day finally came when this code review dropped into our inboxes:
Adds granular, route-based code-splitting. Faster initial and HomeTimeline render is traded for greater overall app size, which is spread over 40 on-demand chunks and amortized over the length of the session. — Nicolas Gallagher
Our original setup (above left) took over 5 seconds to load our main bundle, while after splitting out code by routes and common chunks (above right), it takes barely 3 seconds (on a simulated 3G network).
This was done early on in our performance focus sprints, but this single change made a huge difference when running Google’s Lighthouse web application auditing tool:
Avoid Functions that Cause Jank
Over many iterations of our infinite scrolling timelines, we used different ways to calculate your scroll position and direction to determine if we needed to ask the API for more Tweets to display. Up until recently, we were using react-waypoint, which worked well for us. However, in chasing the best possible performance for one of the main underlying components of our application, it just wasn’t fast enough.
Waypoints work by calculating many different heights, widths, and positions of elements in order to determine your current scroll position, how far from each end you are, and which direction you’re going. All of this information is useful, but since it’s done on every scroll event it comes at a cost: making those calculations causes jank–and lots of it.
But first, we have to understand what the developer tools mean when they tell us that there is “jank”.
Most devices today refresh their screens 60 times a second. If there’s an animation or transition running, or the user is scrolling the pages, the browser needs to match the device’s refresh rate and put up 1 new picture, or frame, for each of those screen refreshes.
Each of those frames has a budget of just over 16ms (1 second / 60 = 16.66ms). In reality, however, the browser has housekeeping work to do, so all of your work needs to be completed inside 10ms. When you fail to meet this budget the frame rate drops, and the content judders on screen. This is often referred to as jank, and it negatively impacts the user’s experience. — Paul Lewis on Rendering Performance
Over time, we developed a new infinite scrolling component called VirtualScroller. With this new component, we know exactly what slice of Tweets are being rendered into a timeline at any given time, avoiding the need to make expensive calculations as to where we are visually.
By avoiding function calls that cause extra jank, scrolling a timeline of Tweets looks and feels more seamless, giving us a much more rich, almost native experience. While it can always be better, this change makes a noticeable improvement to the smoothness of scrolling timelines. It was a good reminder that every little bit counts when looking at performance.
Use Smaller Images
We first started pushing to use less bandwidth on Twitter Lite by working with multiple teams to get new and smaller sizes of images available from our CDNs. It turns out, that by reducing the size of the images we were rendering to be only what we absolutely needed (both in terms of dimensions and quality), we found that not only did we reduce bandwidth usage, but that we were also able to increase performance in the browser, especially while scrolling through image-heavy timelines of Tweets.
In order to determine how much better smaller images are for performance, we can look at the Raster timeline in Chrome Developer Tools. Before we reduced the size of images, it could take 300ms or more just to decode a single image, as shown in the timeline recording below on the left. This is the processing time after an image has been downloaded, but before it can be displayed on the page.
When you’re scrolling a page and aiming for the 60 frame-per-second rendering standard, we want to keep as much processing as possible under 16.667ms (1 frame). It’s taking us nearly 18 frames just to get a single image rendered into the viewport, which is too many. One other thing to note in the timeline: you can see that the Main timeline is mostly blocked from continuing until this image has finished decoding (as shown by the whitespace). This means we’ve got quite a performance bottleneck here!
Now, after we’ve reduced the size of our images (above, right), we’re looking at just over a single frame to decode our largest images.
Make use of the
Here’s an example of a component that was always updating: When clicking the heart icon to like a Tweet in the home timeline, any
Conversation component on screen would also re-render. In the animated example, you should see green boxes highlighting where the browser has to re-paint because we’re making the entire
Conversation component below the Tweet we’re acting on update.
Below, you’ll see two flame graphs of this action. Without
shouldComponentUpdate (left), we can see its entire tree updated and re-rendered, just to change the color of a heart somewhere else on the screen. After adding
shouldComponentUpdate (right), we prevent the entire tree from updating and prevent wasting more than one-tenth of a second running unnecessary processing.
Defer Unnecessary Work until componentDidMount
This change may seem like a bit of a no-brainer, but it’s easy to forget about the little things when developing a large application like Twitter Lite.
We found that we had a lot of places in our code where we were doing expensive calculations for the sake of analytics during the
componentWillMount React lifecycle method. Every time we did this, we blocked rendering of components a little more. 20ms here, 90ms there, it all adds up quickly. Originally, we were trying to record which tweets were being rendered to our data analytics service in
componentWillMount, before they were actually rendered (timeline below, left).
By moving that calculation and network call to the React component’s
componentDidMount method, we unblocked the main thread and reduced unwanted jank when rendering our components (above right).
In Twitter Lite, we use SVG icons, as they’re the most portable and scalable option available to us. Unfortunately, in older versions of React, most SVG attributes were not supported when creating elements from components. So, when we first started writing the application, we were forced to use
dangerouslySetInnerHTML in order to use SVG icons as React components.
For example, our original HeartIcon looked something like this:
Analyzing the flame graphs above, our original code (left) shows that it takes about 20ms on a slow device to mount the actions at the bottom of a Tweet containing four SVG icons. While this may not seem like much on its own, knowing that we need to render many of these at once, all while scrolling a timeline of infinite Tweets, we realized that this is a huge waste of time.
Since React v15 added support for most SVG attributes, we went ahead and looked to see what would happen if we avoided
dangerouslySetInnerHTML. Checking the patched flame graph (above right), we get about an average of 60% savings each time we need to mount and render one of these sets of icons!
Now, our SVG icons are simple stateless components, don’t use “dangerous” functions, and mount an average of 60% faster. They look like this:
Defer Rendering When Mounting & Unmounting Many Components
On slower devices, we noticed that it could take a long time for our main navigation bar to appear to respond to taps, often leading us to tap multiple times, thinking that perhaps the first tap didn’t register.
Notice in the image below how the Home icon takes nearly 2 seconds to update and show that it was tapped:
No, that wasn’t just the GIF running a slow frame rate. It actually was that slow. But, all of the data for the Home screen was already loaded, so why is it taking so long to show anything?
It turns out that mounting and unmounting large trees of components (like timelines of Tweets) is very expensive in React.
At the very least, we wanted to remove the perception of the navigation bar not reacting to user input. For that, we created a small higher-order-component:
Once applied to our HomeTimeline, we saw near-instant responses of the navigation bar, leading to a perceived improvement overall.
const DeferredTimeline = deferComponentRender(HomeTimeline);
Avoid Storing State Too Often
While controlled components seem to be the recommended approach, making inputs controlled means that they have to update and re-render for every keypress.
While this is not very taxing on a 3GHz desktop computer, a small mobile device with very limited CPU will notice significant lag while typing–especially when deleting many characters from the input.
In order to persist the value of composing Tweets, as well as calculating the number of characters remaining, we were using a controlled component and also passing the current value of the input to our Redux state at each keypress.
Below (left), on a typical Android 5 device, every keypress leading to a change could cause nearly 200ms of overhead. Compound this by a fast typist, and we ended up in a really bad state, with users often reporting that their character insertion point was moving all over the place, resulting in jumbled sentences.
By removing the draft Tweet state from updating the main Redux state on every keypress and keeping things local in the React component’s state, we were able to reduce the overhead by over 50% (above, right).
Batch Actions into a Single Dispatch
In Twitter Lite, we’re using redux with react-redux to subscribe our components to data state changes. We’ve optimized our data into separate areas of a larger store with Normalizr and combineReducers. This all works wonderfully to prevent duplication of data and keep our stores small. However, each time we get new data, we have to dispatch multiple actions in order to add it to the appropriate stores.
With the way that react-redux works, this means that every action dispatched will cause our connected components (called Containers) to recalculate changes and possibly re-render.
The best way to illustrate the benefits of batching actions is by using the Chrome React Perf Extension. After the initial load, we pre-cache and calculate unread DMs in the background. When that happens we add a lot of various entities (conversations, users, message entries, etc). Without batching (below left), you can see that we end up with double the number of times we render each component (~16) versus with batching (~8) (below right).
While Service Workers aren’t available in all browsers yet, they’re an invaluable part of Twitter Lite. When available, we use ours for push notifications, to pre-cache application assets, and more. Unfortunately, being a fairly new technology, there’s still a lot to learn around performance.
Unfortunately, this can be a burden when users come back to the application and need to re-download a bunch of script files just to view a single Tweet.
In ServiceWorker-enabled browsers, we get the benefit of being able to have the worker automatically update, download, and cache any changed files in the background, on its own, before you come back.
So what does this mean for the user? Near instant subsequent application loads, even after we’ve deployed a new version!
As illustrated above (left) without ServiceWorker pre-caching, every asset for the current view is forced to load from the network when returning to the application. It takes about 6 seconds on a good 3G network to finish loading. However, when the assets are pre-cached by the ServiceWorker (above right), the same 3G network takes less than 1.5 seconds before the page is finished loading. A 75% improvement!
Delay ServiceWorker Registration
In many applications, it’s safe to register a ServiceWorker immediately on page load:
While we try to send as much data to the browser as possible to render a complete-looking page, in Twitter Lite this isn’t always possible. We may not have sent enough data, or the page you’re landing on may not support data pre-filled from the server. Because of this and various other limitations, we need to make some API requests immediately after the initial page load.
Normally, this isn’t a problem. However, if the browser hasn’t installed the current version of our ServiceWorker yet, we need to tell it to install–and with that comes about 50 requests to pre-cache various JS, CSS, and image assets.
When we were using the simple approach of registering our ServiceWorker immediately, we could see the network contention happening within the browser, maxing out our parallel request limit (below left).
By delaying the ServiceWorker registration until we’ve finished loading extra API requests, CSS and image assets, we allow the page to finish rendering and be responsive, as illustrated in the after screenshot (above right).
Overall, this is a list of just some of the many improvements that we’ve made over time to Twitter Lite. There are certainly more to come and we hope to continue sharing the problems we find and the ways that we work to overcome them. For more about what’s going on in real-time and more React & PWA insights, follow me and the team on Twitter.