Improving the performance of updates to large objects in a Redux store

Russell Dunphy · Published in Accurx · Nov 17, 2023

Recently at Accurx we faced an interesting performance challenge: how can we keep frequent updates to an object in a Redux store performant as the number of values in that object grows large (on the order of tens of thousands)? We came up with a solution that we’re pretty happy with and wanted to share. But first, some context.

A “Unified Inbox”

Accurx is a communication platform for healthcare. Our products are used by healthcare professionals to communicate with their patients, and with each other, across a variety of care settings. An important part of any communication platform is some form of “inbox”. For historical reasons, a result of the way the company has grown, Accurx has three:

  • a desktop application, written in WPF, used by GPs to communicate with their patients;
  • a second, written in React, used by healthcare professionals in secondary care and other care settings to communicate with their patients;
  • and a third, used by healthcare professionals to communicate with each other.

Having to maintain three separate inboxes is obviously far from ideal, so we set out to build a single “Unified Inbox” that would satisfy the needs of healthcare professionals across all care settings. This inbox was to be written in React and made available as a desktop application via WebView2.

The challenges of managing state

Our new “Unified Inbox” has a lot of state to manage: some of our users can have tens of thousands of unread conversations, all of which — due to our backend architecture — we have to keep in client state. (To minimise risk, a constraint of the project was that we had to make as few changes to our backend architecture as possible. This will likely change in future.)

Not only could the amount of state grow large, but — as the inbox could be shared between all users at an organisation, all of whom can send, reassign, or respond to messages — the frequency of updates (coming in via both API responses and websocket events) can be high.

We had already seen the performance degradation this could cause with our WPF desktop application: certain actions in inboxes with very large numbers of unread conversations could be noticeably more sluggish. We needed to avoid replicating this problem.

We needed a way of managing state that would be performant as the number of conversations in the state grew.

Why Redux?

I think it’s fair to say that Redux has gone out of fashion. React has better tools for managing state in useState and React Context, and libraries like React Query provide a much cleaner way of interacting with APIs. At Accurx, we are, in fact, moving away from Redux in most of our codebase. So why did we opt to use Redux for managing our inbox state?

Initially, we didn’t. We wrote our own code for state management using RxJS — an Observables library similar to the one used in our WPF desktop application. Unfortunately we found that this led to some very complex frontend code that was both hard to reason about and hard to onboard new engineers to. The code’s behaviour was already inherently complex; the addition of a significant chunk of accidental complexity pushed it into the “this is going to be a maintenance nightmare that people are scared to make changes to” category.

How can we make complex logic easier to reason about? One great way is to model it as pure functions over immutable state, which is exactly what Redux asks you to do.
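To make that concrete, here’s a minimal sketch of the Redux model (the names and types are illustrative, not our real inbox code): the reducer is a pure function from the current state and an action to a new state, and it never mutates its input.

type Conversation = { id: string; unread: boolean };
type InboxState = { conversations: Record<string, Conversation> };
type InboxAction = { type: "conversationRead"; id: string };

function inboxReducer(state: InboxState, action: InboxAction): InboxState {
  switch (action.type) {
    case "conversationRead": {
      const conversation = state.conversations[action.id];
      if (!conversation || !conversation.unread) return state;
      // no mutation: build a new state containing the updated conversation
      return {
        ...state,
        conversations: {
          ...state.conversations,
          [action.id]: { ...conversation, unread: false },
        },
      };
    }
    default:
      return state;
  }
}

Because it’s a pure function, testing it is just a matter of calling it with a state and an action and asserting on the result.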

Redux also has the advantage of being familiar to most frontend engineers, having a simpler mental model, and having great dev tools that make inspecting the state much easier.

Even though it felt like a good fit, we were a bit hesitant to use Redux at all. We were, after all, trying to move away from Redux elsewhere, and it has a tendency to get its tendrils into everything. However, we had recently discovered that it was possible to create private Redux stores in such a way that parts of our app could use Redux internally without this implementation detail leaking out to the rest of the app. It felt worth a spike, at least.
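For the curious, here’s roughly what a “private” store can look like (a sketch assuming react-redux v8-style APIs, not our exact code): react-redux lets you pass a custom context to its Provider and create dispatch/selector hooks bound to that context, so the store never touches the app’s default Redux context.

import React, { createContext } from "react";
import { configureStore } from "@reduxjs/toolkit";
import {
  Provider,
  createDispatchHook,
  createSelectorHook,
  type ReactReduxContextValue,
} from "react-redux";

const inboxStore = configureStore({
  // placeholder reducer; the real inbox reducer goes here
  reducer: (state = { conversations: {} }, _action: { type: string }) => state,
});

// A dedicated context: only hooks created from it can see this store
const InboxContext = createContext<ReactReduxContextValue | null>(null);

export const useInboxDispatch = createDispatchHook(InboxContext);
export const useInboxSelector = createSelectorHook(InboxContext);

export const InboxProvider = ({ children }: { children: React.ReactNode }) => (
  <Provider context={InboxContext} store={inboxStore}>
    {children}
  </Provider>
);

Components inside InboxProvider use useInboxSelector and useInboxDispatch; everything else in the app carries on using the default hooks, unaware the store exists.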

The spike ended up being a success, and we went on to rewrite our new inbox’s API interaction code to use Redux, removing RxJS along the way. We’ve ended up with significantly less code that is significantly easier to reason about, and we were able to fix some bugs and performance issues in the process.

However, one challenge still remained: due to the Redux model of pure reducer functions operating on immutable state, certain operations — in particular, updating values in a large object — did not have the performance characteristics we needed.

The problem

The core of the inbox’s state is a Javascript object containing all the conversations the client knows about, keyed on ID — which I’ll describe as our “map” of conversations. As described above, the number of conversations the client knows about can get very large — into the tens of thousands. Each time an update to one of these conversations is received, we update its value in this map. This can happen multiple times a second as a result of websocket events coming in in the background.

The data in the Redux store is immutable (or at least we have to treat it as if it is, because Javascript doesn’t have proper immutable collections 😢). This means that whenever we add or update anything in the map of conversations, we have to create a new copy of that map. We don’t have to copy everything inside it — e.g. the content of the conversations — because they haven’t changed. We just have to create a new, empty map, and one by one copy the keys from the old map into the new one. In Javascript, that looks like this:

const newConversations = {
  ...oldConversations,
  [updatedConversation.id]: updatedConversation,
};

What’s the problem here? Well, because that spread operator copies every element from the old map into the new one, one by one, updates to a single conversation have O(n) time-complexity¹ as the total number of conversations in the store grows.

Why is this bad? It means that updating the map of conversations will take twice as long when there are twice as many conversations in the state, ten times as long when there are ten times as many, and so on. The inbox will become sluggish for users with lots of conversations in their state, and it will get more sluggish as more conversations are added. Even though most updates add or change only a single conversation, the time each one takes is proportional to the total number of conversations in the store.
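You can get a feel for this with a quick, unscientific timing loop in Node (the IDs and shapes here are made up, not our real data):

import { performance } from "node:perf_hooks";

for (const size of [1_000, 10_000, 100_000]) {
  // build a flat map of `size` dummy conversations, keyed on ID
  const conversations = Object.fromEntries(
    Array.from({ length: size }, (_, i) => [`c-${i}`, { id: `c-${i}` }]),
  );

  const start = performance.now();
  // "update" a single conversation by spread-copying the whole map
  const updated = { ...conversations, "c-0": { id: "c-0" } };
  console.log(size, Object.keys(updated).length, `${(performance.now() - start).toFixed(2)}ms`);
}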

Here’s an example — imagine that the circle labelled store is the map of conversations, and the circles containing an ID of the form c-xxx are individual conversations.

When an update for c-125 comes in (denoted as c-125') we have to create a new store object (store') and copy each of the conversations over to it in turn.

That’s 1, 2, 3… 8 operations to copy the conversations from the old store to the new. Not a big deal at this scale, but it becomes a big deal at a large, busy organisation, when you have tens of thousands of conversations in the store and websocket updates are coming in every few seconds (or even more frequently).

What can we do about this? As the vast majority of the map is unchanged when a single conversation comes in, it would be nice if we could just leave it in place and not have to copy it at all.

How do functional languages deal with this?

In functional languages like Clojure and Haskell, where collections are fully immutable, this is pretty much exactly how maps (and vectors) work. When you add or update an element in a Clojure map, it leaves the original map in place, and creates a new map that contains just the key and value you added along with a pointer to the old map for the rest of the keys (this is a very simplified version, but the gist of it is true).

Clojure can do this because its collections are immutable — the contents of the old map can never change, so it’s always ok for the new map to point directly to it. This also makes Clojure’s immutable datastructures (called “persistent” datastructures) much more memory-efficient.

If only we had something like that in Javascript 😢.

Well, we don’t², but we can kind of hand-roll something that sort of, if you squint, gets some of the same benefits in a kind of semi-adjacent way except not really. But it works.

A nested/“sharded” map

Thinking about all of this, we landed on an idea: instead of keeping one big map of conversations keyed on ID, we could keep a nested map, two levels deep, where the keys are two-digit strings and the values are maps containing all conversations whose IDs end with those two digits, keyed on ID.

{
  "23": { "c-123": {...}, "c-223": {...} },
  "24": { "c-124": {...}, "c-224": {...} },
}

How does this help? Well, here’s what the example above looks like with this change (using just the last digit of the conversation ID as the top level key for simplicity):

This time, when an update comes in and we have to create a new copy of the store, we only have three keys to copy. However, we also have to make a copy of the nested map labelled 5. When we create a new copy of 5 (denoted 5’), we have another three keys to copy. 3 + 3 = 6, which, yes, is less than 8, but it’s not a huge difference.

But this is for a very small map of conversations. As the number of conversations in the map grows and grows, the savings of this approach become bigger and bigger. This is because the new approach is (kinda, sorta) O(√n) rather than O(n): each update copies one shard of roughly √n keys, plus the roughly √n keys of the top-level object. (Add more levels of nesting and you get closer to the O(log n) of a true persistent map.)

With 10,000 conversations in the store, we’ll have an object with ~100 keys pointing to objects which themselves contain ~100 keys. Updating one conversation in this nested object requires copying the keys of the shard it’s in (~100 keys) as well as the keys of the top-level object (another ~100 keys). That’s a total of ~200 keys copied, instead of the 10,000 that would need to be copied in a single-level object of conversations.
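In code, the idea looks something like this (a sketch with made-up names and types; our real code differs): the “sharded” map is just a plain two-level object, and an update copies only the touched shard plus the top level.

type Conversation = { id: string };
type ShardedMap = Record<string, Record<string, Conversation>>;

const shardKey = (id: string) => id.slice(-2);

export function getConversation(
  map: ShardedMap,
  id: string,
): Conversation | undefined {
  return map[shardKey(id)]?.[id];
}

export function setConversation(
  map: ShardedMap,
  conversation: Conversation,
): ShardedMap {
  const key = shardKey(conversation.id);
  // only the touched shard and the top level are copied; every other
  // shard is carried over by reference, untouched
  return {
    ...map,
    [key]: { ...map[key], [conversation.id]: conversation },
  };
}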

And it works! Here’s a graph to prove it. The y axis is the time it takes to update 100 conversations in the store. The x axis is the number of conversations already in the store when we perform these updates (ranging from 10 to 1000). The red line is the original approach, with one big map of conversations, and the blue line is the improved approach, with a nested map.

Key takeaways

There are a couple of takeaways from this graph: one is that we start to see the benefits of this new datastructure very quickly, with only a couple of hundred conversations in the store. The other is that our new datastructure is also more memory-efficient — see how the red line starts getting very jagged about halfway through? That’s because after that point, it’s using up enough space to trigger a garbage collection, which slows everything down. (We also manually trigger a garbage collection in between each run to try to make the timings as consistent as possible). No such problems on the blue line — at least not yet.

The only real downside of this new nested datastructure is that it makes our code a bit more complicated and harder to understand. This is often the case with performance improvements — and it’s why we should be sure that our performance improvement is worth it before sacrificing the simplicity and understandability of our code. In this case it’s clearly worthwhile, and we’ve minimised the cognitive cost of the complexity by creating some helper functions for manipulating our nested map that hide its implementation details from all but the curious.

One other nice thing about this approach: we don’t do it by creating a custom collection class. It’s just normal Javascript objects (albeit in a different structure), plus some functions to make working with them easier. Because they’re normal Javascript objects, they’re serializable, so our Redux state remains serializable, which keeps the door open to SSR and the like later down the line.

Why the last two digits?

Why do we use the last two digits of the conversation ID? Because this is most likely to give us a uniform distribution of conversations — i.e. a roughly equal number of conversations in each group. If you take the numbers between 100 and 250 and group them by their first digit, you’ll have 2 groups, one of 100 and one of 50. If you group them by their last digit, you’ll have 10 groups, each with 15.

We decided to take the last two digits based on a back-of-a-napkin calculation that with 10,000 conversations in the store, using the last two digits would give you 100 maps of 100 conversations each. This feels pretty balanced and roughly maps onto our expected larger usage sizes. If we were to use the last three digits, we’d have 1000 maps of 10 conversations, which gives us much less benefit. We’d need 1,000,000 conversations in the store for the tree to be balanced — i.e. 1000 maps of 1000 conversations each. As this seemed very unlikely, we figured that using the last two digits was the sweet spot.
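A quick one-off script can sanity-check that distribution (the IDs here are made up and zero-padded; real IDs would need a little more care if their numeric part can be shorter than two digits):

const ids = Array.from(
  { length: 10_000 },
  (_, i) => `c-${String(i).padStart(5, "0")}`,
);

const counts: Record<string, number> = {};
for (const id of ids) {
  const key = id.slice(-2); // the last two digits = the shard key
  counts[key] = (counts[key] ?? 0) + 1;
}

console.log(Object.keys(counts).length); // 100 shards...
console.log(Math.min(...Object.values(counts))); // ...of 100 conversations each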

How did we measure performance?

We didn’t find any great tools to help us figure out the time complexity of our code, so the graph above was generated in a very rough-and-ready fashion, using a jest “test” that repeatedly called our reducer with increasingly large store sizes, printing how long each run took. Then we copy-pasted these numbers into Google Sheets and used it to draw a graph. Quick and dirty, but it worked for our purposes.

Two things of note we ran into while doing this:

We use factory functions to generate test data based on our zod schemas (using the zod-mock library). Our tests initially ran very, very slowly, even though the timings they were logging were much quicker. We eventually realised that generating huge amounts of data with the factory functions was what was taking up most of the time, so we changed our code to read a pre-generated set of factory data that we had written to a JSON file. (Having all our factory functions automatically write to/read from JSON files might be an interesting avenue for us to explore in the future if we find they’re significantly slowing down our test suite as a whole.)

Our initial attempts at drawing a graph were very jagged and all-over-the-place, because it was unpredictable when garbage collection would be triggered (which would dramatically slow down the run on which it happened). To mitigate this, we ran the test with the --expose-gc flag like this: node --expose-gc node_modules/.bin/jest and then manually triggered garbage collection before each test run using global.gc();. This made our graphs much smoother.
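Put together, the skeleton of that timing “test” looked something like this (heavily simplified, with a stand-in reducer and stand-in data builders rather than our real ones):

import { performance } from "node:perf_hooks";

type Conversation = { id: string };
type State = Record<string, Conversation>;

// stand-in reducer: spread-copies the whole map, like our original store shape
const reducer = (state: State, c: Conversation): State => ({ ...state, [c.id]: c });

const buildConversations = (n: number, offset = 0): Conversation[] =>
  Array.from({ length: n }, (_, i) => ({ id: `c-${i + offset}` }));

it("times 100 updates at increasing store sizes", () => {
  for (const size of [10, 100, 1_000]) {
    let state: State = Object.fromEntries(
      buildConversations(size).map((c) => [c.id, c]),
    );

    (global as any).gc?.(); // available when run with --expose-gc

    const start = performance.now();
    for (const c of buildConversations(100, size)) {
      state = reducer(state, c);
    }
    console.log(`${size} conversations: ${(performance.now() - start).toFixed(2)}ms`);
  }
});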

Conclusion

Redux (and more broadly, modelling things as pure functions over immutable data) can make complex logic easier to reason about and easier to test. However, in languages without first-class support for immutable datastructures, this can come at the cost of a hit to performance. Most of the time in frontend code this isn’t an issue. In the cases where it is, a few tweaks to how you structure your store can have a big impact. In particular, “sharding” a large object on an evenly distributed key can improve update performance significantly at minimal cost to read performance.

  1. For those who haven’t come across Big O notation before, it’s a high-level way of talking about the performance characteristics of code, rather than the raw numbers (which aren’t as helpful, as they’ll be, e.g., very machine-dependent). It describes the shape of the graph where the x-axis is the size of the input and the y-axis is either the length of time it takes your code to run with that input (for time-complexity) or the amount of memory used by your code with that size of input (for space-complexity). Copying a Javascript object has O(n) time-complexity, which means that the graph of time taken as the object size grows will be linear: a straight line.
  2. Well, technically we do, via Immutable.js, a library that implements immutable, persistent data structures in Javascript. We did consider using Immutable.js, but decided against it for a few reasons. First, Immutable.js requires you to use its own collection types, which would either bleed through the whole app or require clear boundaries at which they were converted to and from native Javascript data types (a process which could itself be slow). Our assessment was that the cognitive load of a whole new set of datastructures would likely be greater than that of understanding a few key performance optimisations where they’re needed. Second, it’s good practice to keep everything in the Redux store serializable (it enables SSR, and makes things simpler when using Redux DevTools), so we’d have to either give that up or convert the Immutable.js datatypes to Javascript objects every time we added them to the store, defeating their purpose. And third, Redux Toolkit already uses a library for immutable updates in Javascript called Immer, which works by wrapping normal Javascript objects in a proxy so that you can write updates that look like they mutate the object but are transformed into immutable updates under the hood (see the sketch below). It felt better to go with the grain of Immer than against it by either disabling it and using Immutable.js in its place, or trying to use both at once…
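A tiny illustration of what Immer’s produce does (the conversation shape here is made up):

import { produce } from "immer";

const state = { conversations: { "c-123": { id: "c-123", unread: true } } };

// this *looks* like a mutation, but Immer records it on a proxy draft
// and returns a brand new state object, leaving the original untouched
const next = produce(state, (draft) => {
  draft.conversations["c-123"].unread = false;
});

console.log(state.conversations["c-123"].unread); // true
console.log(next.conversations["c-123"].unread); // false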
