Performance Optimizing A React Single Page App — Part 2
This is a brief followup to my past article on the same subject.
I’ve been working with React for nearly 12 months, full-time for the last 8, and over that time have learned a ton about performance optimization and how it relates to functional programming concepts. In my past article, I chose to leave out these ideas to make the article specific to React, but lately have been itching to write about this stuff. Hopefully you will find this article and the linked resources useful!
Immutable Data
In general, immutable data structures are less performant than their mutable brethren. However, there are several incredible performance enhancements that are completely dependent on the use of immutable data structures.
According to Lee Byron, the author of Immutable.js, a mutable push operation with a million items was benchmarked at 83 ms, whereas the same operation on an immutable data structure (without CS magic) was benchmarked at 288 ms.
It might seem like immutable data structures are doomed, but I assure you they are not. The idea here is not to think in terms of micro-optimizations, but rather to attempt to eliminate the most computationally expensive operations in a UI-based application, which of course are rendering and change detection. Below we will examine how immutable data structures can be used to our benefit to optimize the performance of the React rendering pipeline.
Structural Sharing
Six months ago, I came across a very famous white paper on Persistent Immutable Data Structures, written as a thesis by Chris Okasaki. This paper, written in September of 1996, is way ahead of its time. It took me a whole 6 months to read through it and understand it, but it was well worth it. This white paper holds many of the secrets that I will discuss here, so I recommend it if you want to take a deep dive into this subject.
Thanks to Lee Byron, we now have access to persistent immutable data structures in JavaScript. In Lee’s Using Immutable with React talk at devConf, he breaks down a key architectural choice of his Immutable.js library: structural sharing.
By using a Directed Acyclic Graph (DAG) data structure, namely an indexed Trie, an update to an immutable data structure can share the structure of the initial data and copy only the nodes along the path to the change. One of the main benefits of using a Trie is that a lookup only has to walk a single path from the root down to a leaf, and because each node is wide the tree stays shallow, which gives us essentially the same performance as an Array lookup.
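To make structural sharing a little more concrete, here is a minimal sketch using plain objects instead of a real Trie. The `setIn` helper is a hypothetical name of my own and is not how Immutable.js is implemented; it only shows the path-copying idea, where the nodes on the way to the change are copied and everything else is reused by reference.

```javascript
// Hypothetical helper (not part of Immutable.js): returns a new tree with the
// value at `path` replaced, copying only the nodes along that path.
function setIn(node, path, value) {
  if (path.length === 0) return value;
  const [key, ...rest] = path;
  const child = node == null ? undefined : node[key];
  // Shallow copy of this node; sibling branches are reused by reference.
  const copy = Array.isArray(node) ? node.slice() : { ...node };
  copy[key] = setIn(child, rest, value);
  return copy;
}

const before = {
  left: { a: 1, b: 2 },
  right: { c: 3, d: { e: 4 } },
};

const after = setIn(before, ["right", "d", "e"], 5);

console.log(before.right.d.e); // 4, the original is untouched
console.log(after.right.d.e); // 5
console.log(after.left === before.left); // true, the untouched branch is shared
console.log(after.right === before.right); // false, this node is on the changed path
```

Only three small objects were copied here (the root, `right`, and `d`), no matter how large `left` happens to be. A real persistent data structure applies the same idea to a wide, shallow tree.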
Lee’s Immutable library uses these performance enhancements behind the scenes while maintaining a similar API to JavaScript Arrays. Using similar data structures, such as a Hash Trie, we can also mimic the behavior of JavaScript objects with Immutable data structures.
We have yet to talk about how this benefits a React application, but if you bear with me we will get there shortly. The important piece to take away is that using computer science, we can optimize immutable data structures to have a similar performance overhead to their mutable counterparts.
Memoization
Another fantastic performance optimization that we can deploy is memoization. There is one important concept that you must grasp before memoization can be used effectively, and that is referential transparency. A function is said to be referentially transparent if a call to it can be replaced with its corresponding return value without changing the software’s behavior; such functions are often referred to as pure functions. Put differently, given the same input, you will always get the same output.
By maintaining this rule, we get a very convenient side effect. Not only do pure functions make it easier for humans to reason about their functionality, but they also make it easier for tooling to reason about them. Memoization is one of the main performance enhancements that we can use when we employ pure functions.
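A small sketch may help show the difference between a function that is referentially transparent and one that is not. The function names and numbers below are made up for illustration.

```javascript
// Impure: the result depends on a variable outside the function.
let discount = 10;
function priceAfterDiscountImpure(price) {
  return price - discount;
}

// Pure: everything the result depends on arrives as an argument.
function priceAfterDiscount(price, discountAmount) {
  return price - discountAmount;
}

const first = priceAfterDiscountImpure(100); // 90
discount = 25; // some other code changes the shared variable
const second = priceAfterDiscountImpure(100); // 75, same input but a different output

// The pure version can be replaced by its value wherever it appears:
// priceAfterDiscount(100, 10) is always 90.
const pureFirst = priceAfterDiscount(100, 10);
const pureSecond = priceAfterDiscount(100, 10);

console.log(first === second); // false
console.log(pureFirst === pureSecond); // true
```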
Memoization, according to Wikipedia, is defined as an
…optimization technique that can be used to speed up computer programs by storing the results of expensive function calls and returning the cached results when the same input occurs again.
With pure functions we can always be assured that given the same input, a function will return the same output, so we can use our tooling to speed up our applications by storing the results of these functions in memory. Any time the same function is called with the same input, instead of re-running the function, a potentially expensive operation, we can instead use the cached value.
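Here is a minimal sketch of that caching idea. The `memoize` helper is hypothetical and deliberately simple: it assumes a pure function of a single argument and never evicts anything from its cache. The `calls` counter exists only so we can observe how often the real computation runs.

```javascript
// Hypothetical helper: caches the results of a single-argument pure function.
function memoize(fn) {
  const cache = new Map();
  return function memoized(arg) {
    if (cache.has(arg)) return cache.get(arg);
    const result = fn(arg);
    cache.set(arg, result);
    return result;
  };
}

let calls = 0;
function slowSquare(n) {
  calls += 1; // demonstration-only counter, not part of the technique
  return n * n;
}

const fastSquare = memoize(slowSquare);

fastSquare(12); // computed
fastSquare(12); // served from the cache
fastSquare(7); // computed, because the input is new

console.log(calls); // 2
```

Note that a `Map` compares object keys by reference, so this kind of cache only works well for objects if a changed value always arrives as a new object, which is exactly what immutable data gives us.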
The Problems with Mutable State
The main problem with mutable state is that it makes it very hard to keep track of how your values change over time. It makes it very hard to keep all of your code in your head, which leads to bugs and the loss of performance.
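As a small illustration of the kind of bug I mean, consider two pieces of code that share one object. The names and data here are hypothetical.

```javascript
const defaults = { theme: "light", tags: ["news"] };

// Mutating version: the caller's object is changed as a side effect.
function addTagMutating(settings, tag) {
  settings.tags.push(tag);
  return settings;
}

// Non-mutating version: returns a new object and leaves the input alone.
function addTag(settings, tag) {
  return { ...settings, tags: [...settings.tags, tag] };
}

const userA = addTag(defaults, "sports");
console.log(defaults.tags.length); // 1, the defaults are intact

const userB = addTagMutating(defaults, "music");
console.log(defaults.tags.length); // 2, the shared defaults changed behind our back
console.log(userB === defaults); // true, both names point at the same object
```

Nothing at the call site of `addTagMutating` hints that `defaults` will change, which is what makes this class of bug hard to track down.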
For me, when I gave in to immutable data, I had a pink cloud over my head for months. I told all of my friends and pushed it on everyone, trying to get them to switch to this paradigm. The reason for this is that I had eliminated the biggest source of bugs and complexity, all in one fell swoop. It made my code easier to write, easier to maintain, and cut runtime exceptions almost completely out of my programs.
All this is really theoretical, so why don’t we take a look at how React uses these optimization techniques and also talk about how these ideas can be used in other domains.
Putting it to Good Use
There are a few significant performance gains that we can get right out of the box by employing immutable data structures and memoization.
For one, using React.PureComponent (formerly the pure-render-mixin), along with immutable data, we can optimize the rendering of our entire component hierarchy. The default implementation of the shouldComponentUpdate lifecycle method in React is to always return true.
What this means is that by default, React will re-render a component, and every component beneath it, any time its parent re-renders or its own state is set, whether or not the data it displays actually changed. In fact, keeping the UI in sync with changing data has been a very difficult problem for web frameworks in the past. Angular 1 had the digest cycle, along with two-way data binding. With React, our views are pure functions of data, so why do we have to be constantly re-rendering? Can’t we use memoization?
By comparing the data (props and state) in the shouldComponentUpdate function, we can optimize the entire rendering cycle of our React applications. The only problem, however, is that a cheap equality check to determine if our data has changed at the component level is only reliable with immutable data. With mutable data, we miss out on an incredible performance optimization because it is impossible to guarantee that our shallow check is accurate.
According to the React Documentation, the React.PureComponent
only shallowly compares the objects. If these contain complex data structures, it may produce false-negatives for deeper differences. Only mix into components which have simple props and state, or use forceUpdate() when you know deep data structures have changed. Or, consider using immutable objects to facilitate fast comparisons of nested data.
With immutable data structures, we can avoid expensive re-renders by doing a shallow comparison of the old vs. new state within components and avoiding re-rendering entire branches of the component hierarchy when no data has changed.
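The sketch below shows why the shallow comparison needs immutable data. Both `shallowEqual` and `shouldUpdate` are hypothetical stand-ins written for this article; they are in the spirit of the check described above, but they are not React's internal implementation.

```javascript
// Hypothetical helper: compares two objects one level deep, by reference.
function shallowEqual(a, b) {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || a === null) return false;
  if (typeof b !== "object" || b === null) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(b, key) && Object.is(a[key], b[key])
  );
}

// Stand-in for shouldComponentUpdate: re-render only when props differ shallowly.
function shouldUpdate(prevProps, nextProps) {
  return !shallowEqual(prevProps, nextProps);
}

const todos = [{ id: 1, done: false }];
const prevProps = { todos, filter: "all" };

// Mutation: the array is changed in place, so its reference stays the same.
todos.push({ id: 2, done: false });
const mutatedProps = { todos, filter: "all" };
console.log(shouldUpdate(prevProps, mutatedProps)); // false, the change is missed

// Immutable update: a new array is created, so the reference differs.
const immutableProps = {
  todos: [...todos, { id: 3, done: false }],
  filter: "all",
};
console.log(shouldUpdate(prevProps, immutableProps)); // true, the change is detected
```

With the mutated array the component would silently show stale data, while the immutable update is caught by a comparison that never has to look inside the array.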
Another huge performance gain can be seen when using Redux, along with Reselect. Before we get into that, let me first just say that when using Redux, it’s incredibly important to structure your state correctly. According to the Redux documentation
For maximum rendering performance in a React application, state should be stored in a normalized shape, many individual components should be connected to the store instead of just a few, and connected list components should pass item IDs to their connected child list items (allowing the list items to look up their own data by ID). This minimizes the overall amount of rendering to be done.
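As a rough illustration of what a normalized shape can look like, here is a small sketch. The data, the `byId`/`allIds` layout, and the `normalizePosts` and `selectPostById` helpers are my own assumptions for the example, not something prescribed by the quoted passage.

```javascript
// Nested shape, as an API might return it (hypothetical data).
const posts = [
  { id: "p1", title: "Hello", author: { id: "u1", name: "Ada" } },
  { id: "p2", title: "World", author: { id: "u1", name: "Ada" } },
];

// Hypothetical helper: flattens the nested posts into id-keyed tables.
// It builds up a local object, then hands it back as the finished state.
function normalizePosts(list) {
  const result = { posts: { byId: {}, allIds: [] }, users: { byId: {} } };
  for (const post of list) {
    result.users.byId[post.author.id] = post.author;
    result.posts.byId[post.id] = {
      id: post.id,
      title: post.title,
      authorId: post.author.id,
    };
    result.posts.allIds.push(post.id);
  }
  return result;
}

const state = normalizePosts(posts);

// A connected list passes only IDs down; each item looks up its own data.
function selectPostById(currentState, id) {
  return currentState.posts.byId[id];
}

console.log(state.posts.allIds); // [ 'p1', 'p2' ]
console.log(Object.keys(state.users.byId).length); // 1, the author is stored once
console.log(selectPostById(state, "p2").title); // World
```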
Now, back to our big performance optimization. By combining the idea of having a normalized state tree with the subject of memoization, we can seriously optimize the rendering pipeline of our React Redux applications. Because much of the performance drain in an application comes from the expensive rendering cycle of the UI, especially when complex computation to the state needs to occur, it’s not hard to see how memoization can be a killer performance enhancement.
By applying the Reselect library to our React Redux applications, we can avoid complex re-rendering cycles by caching the output of the derived data manipulations. It is often the case that the structure of our data in the Redux store doesn’t completely match what we need for the UI. For example, we may need to combine values into a template string, add labels, or filter or sort a list in the UI. Rather than calculating this derived data for every render, we intelligently cache it using a selector.
I am not going to give an example here, but I implore you to read the Redux documentation and also take a look at this article on the same subject, as it contains some awesome examples.
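Short of a full example, the core idea behind a memoized selector fits in a few lines. The `createMemoizedSelector` helper below is hypothetical and only loosely modeled on that idea; it is not Reselect's actual implementation or its full API, and the state shape is made up.

```javascript
// Hypothetical helper: recomputes only when one of its inputs changes reference.
function createMemoizedSelector(inputSelectors, compute) {
  let lastInputs = null;
  let lastResult;
  return function selector(state) {
    const inputs = inputSelectors.map((select) => select(state));
    const unchanged =
      lastInputs !== null &&
      inputs.every((input, i) => input === lastInputs[i]);
    if (!unchanged) {
      lastResult = compute(...inputs);
      lastInputs = inputs;
    }
    return lastResult;
  };
}

let computations = 0; // demonstration-only counter
const selectVisibleTodos = createMemoizedSelector(
  [(state) => state.todos, (state) => state.showDone],
  (todos, showDone) => {
    computations += 1;
    return todos.filter((todo) => showDone || !todo.done);
  }
);

const state1 = {
  todos: [{ id: 1, done: true }, { id: 2, done: false }],
  showDone: false,
  user: "ada",
};
const a = selectVisibleTodos(state1);

// An unrelated part of the state changes; `todos` keeps its reference.
const state2 = { ...state1, user: "grace" };
const b = selectVisibleTodos(state2);
console.log(a === b); // true, the cached result is reused
console.log(computations); // 1

// An immutable update to `todos` creates a new reference and a recompute.
const state3 = { ...state2, todos: [...state2.todos, { id: 3, done: false }] };
const c = selectVisibleTodos(state3);
console.log(c.length); // 2
console.log(computations); // 2
```

Because the cached result keeps the same reference, a component that receives it as a prop also passes the shallow comparison and can skip its render.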
The final benefit, of course, to using Redux and Reselect is how darn easy it is to keep track of what is happening in your application. A picture tells a thousand words.
Summing it Up
Hopefully this article has convinced you that it’s worth it to use immutable data and memoization, along with pure functions. Not only do these concepts make your programs easier and more fun to write, but they have exceptional performance benefits.
Up until a year ago, I had never considered all of the implications of using functional paradigms in my projects. Now, there is just no way that I can go back to writing applications with mutable state and imperative mechanisms. If you are working in a domain that is still predominantly imperative, I implore you to consider all of the benefits of the functional paradigm. If the JavaScript ecosystem can transition, any language or domain can.
Coming Next
In Part 3, I will take a deep dive into how we can optimize server round-trips using GraphQL and some intelligent tooling. The gist is that by collocating data-fetching with GraphQL, we can optimize data-fetching to the extreme. I will look into how Facebook does this with Relay and the similarities and differences to how ApolloStack optimizes GraphQL data-fetching.