Optimizing Memory Usage in Single Page Apps: A Kustomer Case Study

Matt Dylan G
Kustomer Engineering
11 min read · Oct 17, 2023

How high your frontend web application’s memory usage climbs, and how that impacts your users, comes down to questions such as:

  1. Does navigating throughout your application force users to fetch, process, and store large amounts of data?
  2. Does your application allow users to customize their experience with third-party integrations, custom content, etc.?
  3. How much time do users spend in your application? Is it 8 minutes or 8 hours?
  4. Do your users typically have a large amount of available RAM on their machine (8 GB+)?

For Kustomer’s frontend monolith SPA built with React and Redux, the answers to the first two questions are YES. Our users depend on Kustomer to perform their customer support duties throughout their shifts, often on machines that do not have a plethora of RAM available.

We were receiving more and more complaints from users that our web application was gradually getting slower as their work day went on. We even had some reports of the app freezing and users encountering the dreaded “Page(s) Unresponsive” pop-up.

We first thought that if we could just find and fix one or two leaks, then all our users should see an immediate improvement. We took hundreds of heap snapshots and scrutinized all of the event listeners, timeouts, global variables and other common culprits of memory leaks in our JavaScript codebase. However, in our case, we ended up finding memory optimizations that were not so obvious to us in the beginning. Read on to find out what those optimizations are and how we went about discovering them.

Getting Data for Analysis

Before embarking on the hunt for memory leaks, we first needed to establish a reliable method for quantifying the impact of our code changes. To achieve this, we planned to measure our app’s memory usage over the course of a user’s session and log that data for subsequent analysis. At this initial stage, pinpointing the source of any leaks wasn’t our primary concern; rather, we sought to gather raw numbers that would serve as a baseline for assessing user experience in terms of memory usage.

How did we track these metrics? We used Datadog RUM (Real User Monitoring) for monitoring web performance stats and sent each memory log as a “custom event”. We then defined custom attributes on these events that allow us to aggregate and segment the data by company, browser, available RAM, etc. In theory, any other RUM tool should do, but here is an example code snippet using Datadog’s RUM SDK:

const FIVE_MINUTES = 1000 * 60 * 5;
const MEASURE_INTERVAL = FIVE_MINUTES;

export function measureMemory() {
  if (window.navigator?.deviceMemory) {
    // note: this estimate is capped by the browser at 8 GiB to protect the privacy of users with high-end devices
    // @ts-expect-error this is just a safeguard for unexpected browser behavior
    const memEst = parseFloat(window.navigator.deviceMemory);
    // eslint-disable-next-line no-restricted-globals
    if (!isNaN(memEst)) {
      const jsHeapSizeLimit = window.performance?.memory?.jsHeapSizeLimit;
      const totalJSHeapSize = window.performance?.memory?.totalJSHeapSize;

      const usagePct = totalJSHeapSize && jsHeapSizeLimit ? (totalJSHeapSize / jsHeapSizeLimit) * 100 : undefined;
      const memoryObj = {
        ram: memEst * 1024 * 1024, // RAM in KiB
        heap: {
          jsHeapSizeLimit,
          totalJSHeapSize,
          usagePct,
        },
      };

      window.DD_RUM.addAction('MemoryMeasureEvent', memoryObj);
    }
  }
}

// any function that your app calls when it initializes
const initializeApp = () => {
  // DD_RUM attached to window via cdn script
  window.DD_RUM.setDatadogGlobalContext('usr.id', 'someUserId');
  setInterval(() => measureMemory(), MEASURE_INTERVAL);
};

Now we have a function that logs some browser memory metrics every 5 minutes during a user’s session using the Performance API (note: this API has since been marked as deprecated, but its replacement, measureUserAgentSpecificMemory, is still experimental. Without going too deep into it, it’s not compatible with our app’s configuration).

To summarize the attributes that we were logging:

  • jsHeapSizeLimit (bytes) — total bytes available to your application in the user’s browsing context.
  • totalJSHeapSize (bytes) — total heap size allocated by your application in the browsing context.
  • usagePct — equal to totalJSHeapSize / jsHeapSizeLimit. This was useful since we could not reliably log when a browser tab crashed due to memory; however, if a user logged a usagePct >= 90%, we knew they were getting close to a crash (see the worked example below).
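To make the usagePct threshold concrete, here is a worked example with made-up numbers:

const jsHeapSizeLimit = 2.0 * 1024 ** 3; // ~2 GiB heap limit reported by the browser
const totalJSHeapSize = 1.9 * 1024 ** 3; // ~1.9 GiB currently allocated by the tab
const usagePct = (totalJSHeapSize / jsHeapSizeLimit) * 100; // 95 — this user is close to an out-of-memory crash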

With our logging deployed, we were able to build dashboards in Datadog that charted these metrics over time. Here are examples of the success metrics that we used:

  • median (p50), p75, p90, p95 and p99 aggregates for totalJSHeapSize:
    – This raw byte number was the metric that we started with. It would peak in the late afternoon/early evening, which made sense given the reports of our app getting slower as the day went on.
    – Rolled up for a given day, our p75 and p90 values started at ~300 MiB and ~500 MiB respectively in the morning. Max values were over 1 GiB by the evening.
    – As we made our initial memory patches, we saw improvements in the lower percentiles, but the p90+ aggregates remained very high.
  • Number of users that logged a MemoryMeasureEvent with usagePct > 90. We also referred to this as the “number of potential crashes” since a high percentage indicates the user is close to experiencing an out of memory crash.
  • Percentage of users that logged a MemoryMeasureEvent with totalJSHeapSize greater than a goal number.
    – For example, if more than 60% of your users are logging heap sizes above 410 MiB each day, then you should expect that percentage to gradually decrease as you release optimizations.
    – Tracking this percentage was an effective way to confirm that we were at least making incremental improvements in our application’s memory usage.
    – You might also consider segmenting your users by your own custom attributes to isolate whether a specific group of users with a specific configuration is using more memory than others (a sketch of how such attributes can be attached follows this list).
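As a rough sketch of that kind of segmentation (the attribute names below are hypothetical, not our actual schema), the addAction call from the earlier snippet can simply carry extra attributes that Datadog will then let you filter and group by:

// inside measureMemory, alongside the heap numbers
window.DD_RUM.addAction('MemoryMeasureEvent', {
  ...memoryObj,
  company: currentCompanyId, // hypothetical: whatever identifies the tenant/org
  browser: window.navigator.userAgent,
  deviceMemoryGiB: memEst, // the (capped) RAM estimate from navigator.deviceMemory
});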

Diagnostic Tools

We had established our metrics and now had live data indicating how much memory our app was using. But what did we do to actually find out what was taking up that memory?

Heap Snapshots

Taking heap snapshots while testing in your development environment is the classic approach to diagnosing memory issues. Unfortunately, when you are dealing with a highly customizable application that fetches a lot of data and loads third-party integrations, manually taking heap snapshots becomes less practical for a few reasons:

  • Taking a single heap snapshot after initially loading our application took about 5 minutes.
  • If we wanted to take multiple snapshots in sequence and then compare them, the whole process could take 15+ minutes.
  • Snapshots are unlikely to be consistent unless you can guarantee the same navigation and access patterns you are trying to test.
  • It’s even harder to simulate the exact conditions that your users are under such as network performance, navigation patterns, custom app configuration, how much data they are fetching, etc.
  • We had spent hours diving into 300 MB+ snapshots, and it was very difficult to determine if something was being held in memory due to a leak or if a feature in our application depended on that data.

Analyzing heap snapshots in DevTools is a legitimate way to diagnose memory issues, but it was not practical for the scale of our application.

Memlab

We tried memlab next, an open source CLI tool for heap snapshot analysis. Memlab was a game changer since it greatly simplified heap analysis for us. The features we used were:

  • Passing the heap snapshots we had already taken to the memlab find-leaks command. The CLI then returns a list of leaked objects, their allocation stack, and how much memory those objects occupy:
MemLab found 1 leak(s)
--Similar leaks in this run: 4--
--Retained size of leaked objects: 2.3MB--
[Window] (native) @33651 [2.3MB]
--20 (element)---> [InternalNode] (native) @216691968 [2.3MB]
--8 (element)---> [InternalNode] (native) @216691168 [2.3MB]
--1 (element)---> [EventListener] (native) @216563936 [2.3MB]
--1 (element)---> [V8EventListener] (native) @216563776 [2.3MB]
--1 (element)---> [eventHandler] (closure) @160711 [2.3MB]
--context (internal)---> [<function scope>] (object) @176463 [2.3MB]
--myArrayOfElementsWithListeners (variable)---> [Array] (object) @176465 [2.3MB]
--elements (internal)---> [(object elements)] (array) @176490 [2.3MB]

It’s recommended that you take the snapshots in a non-minified build of your application so that the allocation stack uses names that are recognizable in your code or dependencies.

  • You can define E2E test scenarios that open an initial page in your application, navigate to another page and/or perform some action, and navigate back to the initial page. Memlab takes the heap snapshots and runs the leak analysis for you (a minimal scenario sketch follows this list). You can also run the tests in CI and potentially catch new leaks somewhere in your release pipeline.
  • Memlab also comes with analysis plugins that go beyond “leaks” and check your heap snapshots for the largest objects, duplicated objects, large arrays, global variables and much more.
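For reference, a memlab scenario is just a small module that exports url, action, and back steps, which the CLI drives through a headless browser while taking snapshots. Here is a minimal sketch assuming a hypothetical /settings page (run with memlab run --scenario scenario.js):

// scenario.js
function url() {
  // initial page to load (hypothetical URL)
  return 'https://app.example.com/';
}

async function action(page) {
  // navigate somewhere that should not retain memory afterwards (page is a Puppeteer page)
  await page.click('a[href="/settings"]');
}

async function back(page) {
  // return to the initial page so that objects leaked by `action` can be detected
  await page.click('a[href="/"]');
}

module.exports = { url, action, back };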

With memlab, we were able to spot not just leaks but also inefficient memory usage in general. We will expand on what we mean by “inefficient memory usage” in a later section.

queryObjects

Chrome’s Console Utilities API contains the queryObjects(constructor) function, which returns an array of objects created with that constructor. If you know or suspect which class of objects might be leaking, then you can open your Chrome console and run this function to verify if the array output is growing.

For example, if you suspect IntersectionObserver objects are being created and not garbage collected later on, you can run queryObjects(IntersectionObserver) in the console after the point where you expect those IntersectionObservers to be disconnected. If the array keeps growing, then you might have a leak. If you are adding new IntersectionObservers or ResizeObservers to your code, then queryObjects can also help you manually test that these objects are being garbage collected later on.
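As an illustration of the kind of code we verify this way, here is a minimal sketch of a hypothetical component (not one of ours) that disconnects its IntersectionObserver on unmount. Without the cleanup, queryObjects(IntersectionObserver) in the console would keep growing as the component mounts and unmounts:

import { useEffect, useRef } from 'react';

function LazyThumbnail({ onVisible }) {
  const ref = useRef(null);

  useEffect(() => {
    const observer = new IntersectionObserver((entries) => {
      if (entries.some((entry) => entry.isIntersecting)) onVisible();
    });
    if (ref.current) observer.observe(ref.current);

    // This cleanup is what lets old observer instances be garbage collected;
    // queryObjects(IntersectionObserver) should stay flat as this component mounts and unmounts.
    return () => observer.disconnect();
  }, [onVisible]);

  return <img ref={ref} alt="" />;
}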

You do have to know which constructor you are trying to fix a leak for, but this is a great option when you need a faster feedback loop while manually testing, instead of painstakingly taking more heap snapshots.

Tip: Clear your console before running this function. Objects from previous queryObjects calls are still retained for inspection in the console.

Results

Using the diagnosis and analysis tools described above, we were able to find and fix a melange of memory issues and observe their impact. We did not see an immediate improvement in our metrics, but as we chipped away at it, we did start to see the metrics slowly come down.

The first leaks we solved

  • Turning off lazy loading for email attachment thumbnails and user avatars in a Conversation thread because of this Chromium bug. We later used this as motivation to change how we were fetching avatars when users opened a Conversation thread.
  • Setting an interval to clear the console every 10 minutes (a sketch follows this list). Objects logged to the console are retained in memory for later inspection, even if the user never opens the console. Although we avoid deploying console.logs in our own code, we allow our users to embed custom integrations into their instance of Kustomer, which might produce large logs of their own.
  • Removing unneeded data from large API responses stored in the Redux store. Cleaning up Redux helped, but the users reporting the highest heap sizes ironically had relatively small Redux store sizes (< 30 MiB).
  • Updating our API responses to omit fields that the web application did not need, so that users were downloading less data over the network. This also helped improve the latency of these APIs, so win-win.
  • Upgrading react-intersection-observer to the latest version. We were on an older version of this library, and upgrading made the retained IntersectionObservers go away.
  • We had a single Google Charts chart mounting on every Customer load that was never unmounted. Calling clearChart when unmounting this component fixed the leak.
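The console-clearing fix from the list above, for example, boils down to something like this (a minimal sketch; only the 10-minute interval comes from our actual implementation):

const TEN_MINUTES = 1000 * 60 * 10;

setInterval(() => {
  // DevTools retains references to logged objects for later inspection,
  // even if the user never opens the console; clearing releases them.
  console.clear();
}, TEN_MINUTES);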

We saw a modest decrease of about 75 MiB in p90 aggregate heap usage and a 10 percent decrease in users going above our goal of 410 MiB in a given week.

For all the work we put in, we were expecting a much sharper decline, but our data still showed users reporting heap sizes way above 500 MiB.

Discovering Inefficient Memoization

At this point, memlab traces did not report any obvious leaks. Going beyond the leak analysis, we tried running memlab’s analysis plugins, particularly the plugin for finding duplicated strings in a heap snapshot, and found something new that piqued our interest. We recognized the duplicated strings as either stringified slices of our Redux store or parameters from our usage of lodash memoize in some of our helper functions.

We also memoize slices of our Redux store using the reselect library when deriving state data in our components, so memoization as a culprit for our memory issues made sense. We memoize functions in our SPA for two reasons: to cache the result of potentially long-running operations like scanning an array, or to prevent rerenders in our React components by returning objects that are equal by reference and passing them as component props. So where did we go wrong?

Memoizing functions with high cardinality. We had a few older memoized functions whose input parameters included the text of an email body. There’s no point in memoizing on a parameter that will have a different value every time a user opens a different email in our platform. In these cases, we were able to refactor the function to use fewer parameters or to remove memoization entirely.
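To illustrate with a hypothetical helper (not our actual code): lodash memoize keys its cache on the first argument and never evicts, so memoizing on an email body retains every distinct body a user opens.

import memoize from 'lodash/memoize';

// High-cardinality memoization: the internal cache grows with every distinct
// email body passed in, and those strings are never released.
const extractMentions = memoize((emailBody) => emailBody.match(/@[\w.-]+/g) ?? []);

// The fix in cases like this was simply to drop the memoization:
const extractMentionsPlain = (emailBody) => emailBody.match(/@[\w.-]+/g) ?? [];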

Creating a memoized Redux selector when we did not need it. Let’s take this example Redux store:

{
  users: {
    user1: {
      name: "Leonardo",
      lastLogin: "2023-06-10T00:00:00Z",
      // ...bunch of attributes for users
    },
    user2: {
      name: "Donatello",
      lastLogin: "2023-06-12T00:00:00Z",
    },
    // ...potentially thousands of more users
  },
  // ...more state slices
}

Our codebase had dozens of selectors like this:

import { createSelector } from 'reselect';

const selectUsers = (state) => state.users;
const selectPropUserId = (_, props) => props.userId;

const selectUserById = createSelector(selectPropUserId, selectUsers, (userId, users) => users?.[userId]);

By using createSelector, we were memoizing the last result and its input parameters, which means we were essentially duplicating the entire slice in memory with each of these selectors. The memoization is unnecessary since getting a value in an object by its key takes constant time (O(1)), and the underlying result is already the same object (not a copy) when passed to a component as props. For some businesses using our application, this slice of the store was a few megabytes to begin with, so our theory was that all of these selectors together were making memory usage snowball.

Our solution was to go through our codebase and simply rewrite selectors without createSelector if we did not need the memoization:

export const selectUserById = (state, props) => {
  const userId = selectPropUserId(state, props);
  const users = selectUsers(state);

  return users?.[userId];
};

After deploying these changes, we saw significant decreases in memory usage across the board. A single one of these changes produced a 40 MiB decrease in aggregated heap sizes across our users. By the end of one of our development cycles, we had brought p90 memory usage down by around 120 MiB total. The “number of potential crashes” metric that we mentioned earlier dropped from hundreds per day to fewer than 50. There was no memory leak bug in reselect or lodash memoize. These libraries were working as expected and just doing what we told them to do, which was to store large, unbounded parameters in memory when called.

While we solved some relatively small memory leaks, we later learned that these leaks were eclipsed by the inefficient memory usage stemming from our use of memoization libraries. Kustomer is a single page application with many brilliant features released over the years. In parallel, customers onboarding to our system were handling higher volumes of support inquiries while pushing the limits of what Kustomer can do. It made sense that some inefficiencies did not become apparent until later on. We also learned a lot from our memory investigations, including new memory analysis techniques and which metrics made sense for measuring our efforts’ success, so that we were not flying blind. With our new tools and experience, we have a stronger understanding of browser memory usage than ever before.

Further reading and resources about browser memory and web performance that helped us write this article:

  • Memlab
  • Chrome Console Utilities
  • Memory leaks: the forgotten side of web performance
  • How to Metric
  • Memoization Forget-Me-Bomb
