Reacting to Change: Tale of a Web Developer Improving Startup Performance for a large React Native App | udaan

Two years into my stint at udaan, I was tasked with “improving the startup performance” of our react-native app. I asked myself, what does that even mean and how do I even begin tackling this?

Divjot Singh
engineering-udaan
29 min read · Aug 15, 2023


AI-Generated image of a person scratching their head while looking at a large billboard with React’s atom logo at night with a starry sky

Join me on a journey of understanding react-native’s startup flow, identifying bottlenecks, applying simple optimization techniques, discovering common pitfalls with startup time telemetry, and realizing the importance of perceived performance. This isn’t a “one trick to boost your app performance by 10x” post, but rather a long journey of figuring things out, finding patterns in a mess, measuring impact, and delivering a much improved experience to your users. My hope is that this helps you build intuition and tackle daunting problems with your software, and at the same time learn from the larger community about similar challenges faced by developers in the react-native ecosystem.

It’s a long one, but don’t worry, there’s a tl;dr at the bottom.

Background

I know, as a web developer, what optimization of page load performance feels like. You go to web.dev/measure, get a score for various metrics, and follow the recommendations mentioned over there. The usual suspects are large asset files, lack of good compression (brotli/gzip), lack of code splitting, lack of appropriate resolutions for images, lack of font compression, lack of inlining of assets, lack of inlining of HTML (SSR), lack of preloading/prefetching, lack of a CDN, lack of optimization of waterfalls, and so on.

If you squint hard enough, everything I just mentioned is basically about delivering as much relevant information for that page’s initial render in the smallest possible network footprint. Just taking care of that usually takes care of everything else. Even alternatives to ReactJS, like Preact, are touted for their 3kb size, and not necessarily for their runtime performance improvements.

Fast 3kB alternative to React with the same modern API
— Preact’s tag line

Of course, there are caveats to the above: fine-grained reactivity has been winning ground lately thanks to the efforts of the teams behind Svelte and SolidJS. You also see a plethora of ReactJS-like alternatives that regularly outperform ReactJS’s runtime performance. In fact, ReactJS without hooks stands at around the 95th position on Krausest’s js-framework-benchmark as of the time of writing, while ReactJS with hooks is further behind, yet that’s the default choice.

Speaking of React, React Server Components, the next big thing in the React world, also primarily try to solve for network bandwidth and bundle sizes.

Put another way, the server, far more powerful and physically closer to your data sources, deals with compute-intensive rendering and ships to the client just the interactive pieces of code.
— Understanding React Server Components

This tells us that, in the web world, improving roundtrip performance matters far more than improving runtime/interactive performance. Maybe it’s thanks to browser engine vendors who consider the real-world performance of these frameworks when optimizing their hot code paths. Maybe it’s the fact that web apps are usually very simple compared to native apps in terms of interactivity.

Needless to say, react-native is completely different (or is it? vsauce.mp3). There is no one “engine” per se. Host platforms have their own complex web of APIs that introduce breaking changes annually. React Native adds its own layers of abstraction. And finally, there’s your application code using all sorts of third-party libraries. All the assets are bundled with the app and are accessible locally, so there’s no network roundtrip optimization to consider here.
When I was tasked with this, very little of my web performance knowledge could easily translate, or so I thought. I had to dive deeper.

The article is divided into three parts that mark three distinct phases of this journey. They deal with identifying the root cause via measurements and benchmarking, deploying fixes and measuring the impact, and lastly, further improving the perception of startup loading times.

Part 1: Let’s dive deep

Quick Google searches left me with even more doubts. There’s no one answer to why your particular app is slow to start. Could it be the react-native version we were using (we already had Hermes enabled, so everything after that wouldn’t necessarily bring large improvements)? Could it be the react-navigation version we were using? Should we migrate to react-native-navigation instead (yes, they’re two different things)? Maybe we should move to Flutter? What about running a PWA inside a native shell? All of these were wild guesses with heavy costs. Unlike websites, you can’t just upgrade every single thing about your app and expect not to have to worry about your old code base at all. Even though we heavily use CodePush, upgrading the react-native version and pushing a new native release to the respective application stores would still mean we would have to push bugfixes and maybe even major features to the old version until the new version is fully adopted, which can take anywhere from 1–3 months. Upgrading native dependencies would come with similar caveats, as their JS APIs may not always be backward compatible. And moving to Flutter is a different story altogether. Even after going through all these steps, there’s no guarantee that we would’ve solved the problem.

Here’s how I see it: you seemingly have very little control over how things boot up when you use a complex stack like this. You can’t really remove all your modules, you can’t really delete all your JS logic, and you can’t really remove all your UI code. Largely, what you do between the JavaScript VM kicking in and your first component rendering is where your control lies as someone who’s proficient only in JavaScript.

An Excalidraw screenshot that shows 3 rectangles, “MainActivity”, “react-native” and “App Code”, left to right. “react-native” has two more rectangles within it; “modules” and “JS Bundle” left to right, implying that “react-native” phase has a module initialization phase and a “JS Bundle” execution phase. “App Code” has two rectangles within it; “Boot” and “<UI />”, implying that “App Code” has a bootstrap phase where things happen before React components are mounted, and a “<UI />” phase where React components and their lifecycles (componentDidMount, useEffects) dictate where time is spent.

A more formal graphic would be the one shared by react-native’s own Ram Narasimhan, who has excellent videos on the same in his Chain React talks (2018) (2019) (2020).

A screenshot from the “Chain React 2019 — Ram Narasimhan — Performance in React Native” video around the 9-minute mark, where Ram is describing all the steps that take place to go from app launch to the first View being rendered. It has a yellow bar, denoting the app startup time timeline. Beneath it there are two bars laid out horizontally; “native” in blue and “JavaScript” in green, implying the first bar is composed of these two bars. Then there are further bars beneath “native” and “JavaScript”, like a flame chart. “native” has “init” taking 1/4th of the space, “native modules” taking half the space, and “jsvm” taking the remaining quarter. “JavaScript” has “init”, “network” and “components”, all taking roughly a third of the space. “components” has unlabeled bars underneath, implying various application components would be taking their individual times.
A screenshot from React Native EU 2020: Parashuram N — React Native Performance — Take 2 around the 2-minute mark. Parashuram is recapping his previous talk with the same screenshot mentioned above, this time it has labeled arrows to summarize the previous talk. The yellow bar is labeled as “app start”, this time around the bar is much shorter from its original length, and the original length is expressed as a dotted rectangle, implying we have reduced the startup time from what it was originally. The leftmost edge of the bar has an arrow labeled “ReactMarker”, denoting that it can be measured using React Markers, as explained in the previous talk. The rightmost edge has two arrows pointing to it. One says “End Marker”, denoting when we stop measuring the “startup time”, this could be the presence of a particular <View nativeId=’1234’ /> component, measured by an event listener on the native side where the nativeId matches. The other arrow says “Better App Startup time”, implying that we’ve reduced the bar length via optimizations discussed in the previous talk. This time around “network” has moved from being underneath “JavaScript” to being right beneath the “native” bar, but starting at the same time as “init”. The arrow pointing to “network” says “Native network, Mock Data, Native Loading Screen”, implying we can move the “network” required for app launch from “javascript” and do it in “native” land, to prevent stalling. “init”, “native modules” and “jsvm” are right where they were in the previous screenshot, just that “network” is now parallel to these three, implying concurrency. “TurboReactPackage” labeled arrow points to “native modules”, and “Hermes, JS Profiler” points to “jsvm”, implying strategies to improve these phases of startup. “init” underneath “javascript” has an arrow pointing towards itself labeled as “inline requires”, a strategy used to reduce module resolution times. “React Profiler” points towards the bars underneath “components”, implying “React Profiler” can be used to optimize these.

If I’m being honest, I don’t understand every single thing here. It was a bit intimidating at first. Don’t get me wrong, Ram explains it really well and it makes great sense. However, I could see large gaps in my understanding of the stack and the domain knowledge required to tackle this. And I think that’s okay. It’s okay to be intimidated by something that runs your regular-looking JavaScript and ReactJS code on a custom, highly optimized JavaScript VM, on a framework that glues native platform APIs to said VM, with fancy stuff like TurboModules, Hermes, the Bridge, or JSI, in a handful of languages that you may have very rusty familiarity with. Yup, it’s supposed to be hard.

However, one clear takeaway for me was that I need to measure before I can optimize anything. I need to understand how my app actually boots up, and where all the time is spent. Before I understand ReactMarkers, maybe I can take a hard look at the diagram I drew above and find the bottlenecks there. At least we’ll be somewhere in the ballpark. And trust me, the kinds of optimizations our app would require might not really reside on the native side of things, otherwise I’d be having a hard time writing this post.

That’s my first learning to share. If your solution to the problem is to upgrade various dependencies, then the locus of control is never on your side. Sure, you may find great improvements every now and then (Hermes single-handedly saved so many react-native projects!), but at what opportunity cost? What if you wait years for react-native’s new architecture, and even when it lands, you still don’t necessarily get a 5x improvement in startup performance? Was it worth the wait?

Measure twice, cut once

This is when I realized I shouldn’t approach this problem by finding packages to update in my package.json. I need to understand how our app even boots up. I was also tasked with evaluating react-native’s viability for our use case, and seeing whether moving to alternate technologies would indeed be a smart move for us or not.

Before I looked at any code, I wanted to truly see how slow our app was. Previous developers (Kartik Ukhalkar, Bhavya Rawal) had already done great work by adding excellent telemetry. We already knew how much time our app spent in the real world to go from MainActivity to the first screen. The numbers looked something like this: 4.2 seconds at P50, 12.2s at P75, 59s at P90, and ~7 hours at P99 (press F for our poor users waiting for hours to load our app). Yeah, something isn’t right here. Let’s put a pin in it for now.

Now that I knew the real-world performance, I wanted to compare our app to other apps. I launched every remotely comparable app, and even some highly performant ones, and compared their performance Digital Foundry style by making a video and marking the start and end manually. I warmed up my phone by opening them all once after a fresh install and bypassing their onboarding steps. Not the most scientific, but the closest to real user behavior, I would say.

This is what I found on my mid-range Android phone (Realme X, Snapdragon 710);

A video with 9 side by side screen recordings of an android phone launching individual apps. “Amazon” 1.91s, “BigBasket” 2.86s, “Slack” 2.86s, “udaan” 3.31s, “Flipkart” 3.52s, “BlinkIt” 4.65s, “Swiggy” 4.87s, Ajio 10.45s, Notion 10.54s. There’s a dropdown with “JFM 2022” selected, implying these tests were performed in January-February-March 2022.

Never expected our app to actually be near the top of the list! I was surprised to learn that Swiggy was actually technically slower to load, even though I use it regularly and never had any issues with its loading times. Clearly, the designers and engineers at Swiggy understand the importance of perceived performance, something we’ll talk about later in the article. Anyway, coming to our question of the viability of react-native: I know from my previous stint at Swiggy that it is not a react-native app, and if our app could actually “beat” it, then we can say that react-native is a fairly safe bet. Flipkart is also known for using react-native, and its incredible engineering team has built some of the finest solutions in the react-native world, so if our app is able to match its performance as-is in my unscientific test, surely we can eke out a bit more performance while staying with react-native. Note that, unlike most b2c apps, our users are business owners, not necessarily with high-end phones. Another thing to note is that not all apps are comparable. BigBasket shows a relatively simple first page, Notion’s shell is interactive much earlier, it’s just that its webview takes its time to load the page, and Slack shows a list of channels by default (which is much easier to cache). The user’s intent also plays a role here: when you open Swiggy, you probably have a very high intent to order something and your fingers tap the right buttons right away, while when you open Slack, you are probably dreading a list full of red bubbles. So please don’t draw incorrect conclusions from this; the goal is simply to understand how apps load, how they feel, and where udaan stands, at least in this unscientific test.

Coming to udaan: despite the absolute numbers showing udaan to actually be quite competent, why is it that our team and users feel our app is really slow? I guess it’s a mix of perception, our user base, and a lack of optimizations. No matter how fast the app loads in this test, the real-world median is still 4.2s, some ~30% slower than what we observed above, which probably means the majority of our users have even slower phones than my mid-range one.

Taking matters into your own hands

So far I was convinced our app is actually fairly performant, and that our bottlenecks are probably specific to our app code rather than the technologies we are using. This meant I needed to dig deeper. I spent the next few days adding logs, right from MainActivity to the first meaningful screen we show to our users. I could have figured out how to integrate several debugging tools, learned about systrace, and made sense of Flipper, but nothing beats the simplicity of the mighty logger. I can add a log between two timestamps in any language and call it a day. By understanding the mechanics of how our app loads, I could truly understand the bottlenecks in the startup journey. This is what I came up with in debug mode:

A Gantt Chart with the title “App Startup Timeline (Before)”. The raw data can be found under “Appendix” at the bottom of the post in tabular format.

Legend:

  • Native Package Registration is the time spent for each packages.add() in react-native template code for non-autolinked packages.
  • JS Bundle is a step in a debug mode where Metro bundler would build and stream a bundle to the emulator. This would be a file-read operation in the production build. This took ~15s on my setup, but I’ve trimmed it to 0.5s to prevent it from overshadowing everything else.
  • CodePush Wrapper + Redux is one of the first react components we render; the time is representative of react-native handing control to JS execution and the time spent going from AppRegistry.registerComponent to the render function of these components.
  • <AppBuilder />, <App />, <CodePush /> etc. are just small react components in the initial react tree.
  • <LoadingBanner /> is the first react screen visible to user after splash screen. It is visible while we “load” our app.
  • Configuration.init() is the first operation outside of ReactJS where we await various promises and refresh our session tokens with our auth service.
  • <ImagePerfTracker /> is another small react component.
  • <NavigationContainer /> is the component that builds our react-navigation stack, tabs, drawers navigators.
  • <InitialRoute /> is the first react-navigation screen. Its componentDidMount waits for the navigation focus event and for DeepLinkHelper.init() to finish before navigating to the actual first screen (HomePage/Login).

This activity, while tedious, helped me truly understand our app better. It doesn’t even matter if it’s perfectly accurate; we only need to find relatively large tasks that we could move around or even eliminate. And I could see that most of the time is spent AFTER control flow reaches JS, another sign that the problem lies on our app-code side. Most of these components are part of the critical path, and it doesn’t seem like we’re doing anything wasteful here. CodePush and Redux need to be initialized before anything else, a LoadingBanner is shown right when a long-running task is executed, NavigationContainer has to set up all our screens during boot, and we need to handle deep links before showing the first screen to avoid a flash of the incorrect screen. Everything looks relatively alright.

Next, I tried to break down Configuration.init() and InitialRoute’s mounting phase.

A Gantt Chart with a title “Configuration.init() Breakdown”. The raw data can be found under “Appendix” at bottom of the post in tabular format.
A Gantt Chart with a title “<InitialRoute /> breakdown”. The raw data can be found under “Appendix” at bottom of the post in tabular format.

I have simplified the code to just a single component with a useEffect; in reality this was buried behind 5 components and different helper functions. So depending on your codebase and its age, don’t expect finding such a pattern to be easy and obvious.

/**
 * Imagine this is how Configuration.init,
 * NavigationBuilder and LoadingBanner are
 * used in the app startup
 */
function App() {
  const [status, setStatus] = useState({
    loading: true,
    config: null,
    error: null,
  });

  useEffect(() => {
    let mounted = true;
    Configuration.init()
      .then((config) => {
        if (!mounted) return;
        setStatus({ loading: false, config, error: null });
      })
      .catch((error) => {
        if (!mounted) return;
        setStatus({ loading: false, config: null, error });
      });
    return () => {
      mounted = false;
    };
  }, []);

  if (status.loading) return <LoadingBanner />;

  if (status.error) throw status.error;

  return <NavigationBuilder config={status.config} />;
}

Hmm, at this point these charts started resembling the Network Panel of a website. I asked myself, can’t I just “prefetch” or “preload” some of these tasks and reduce the stalling phases in the pipeline? Clearly Configuration.init() is a critical task that blocks everything else; doing it as the very first step would speed up everything. Similarly, DeepLinkHelper.init() can be done before React components are even mounted. These functions return promises, so I figured it would be easy to fire them up and “cache” the promises in their pending state, so that by the time they’re actually required, they might as well already be resolved. There are other obvious optimizations, such as loading the session on the native side and sending it over the bridge right during boot, or considering TurboModules to reduce package registration times, or refactoring how we use react-navigation. All of these appeared to be large breaking changes that may come with their own set of bugs, and may require segmenting our users and annoying the development team with maintaining two branches of our app for months. And they didn’t even seem to be that impactful. Starting Configuration.init() ~500ms sooner seems like a much larger win than shortening a ~180ms <NavigationBuilder /> step to, say, ~100ms. So I shortlisted these two init functions to begin with. Here’s how you can go about “caching” these promises early:

// before
export class Configuration {
  static init() {
    return somePromiseReturningFunction();
  }
}

// after
export class Configuration {
  private static promiseValue = null;

  static init() {
    if (this.promiseValue) {
      return this.promiseValue;
    }
    this.promiseValue = this.actualInit();
    return this.promiseValue;
  }

  static actualInit() {
    return somePromiseReturningFunction();
  }
}

The above changes allow us to call Configuration.init() at any moment. So we can do this right before ReactJS is even in the picture, without changing its actual usage in App.

/**
* Warm up Configuration.init for later use in <App />.
* Make sure it is idempotent to avoid side-effects.
*/
Configuration.init();

AppRegistry.registerComponent(appName, () => cp(Root));

Say what you may, but classes are sometimes good tools for encapsulating data and behaviour. If you’re more of a functional purist, I guess one could write it like this:

// before
export function initConfiguration() {
  return somePromiseReturningFunction();
}

// after
/**
 * Should be called once per session
 */
function createConfigurationInitializer() {
  let promiseValue = null;

  return () => {
    if (promiseValue) {
      return promiseValue;
    }
    promiseValue = somePromiseReturningFunction();
    return promiseValue;
  };
}

export const initConfiguration = createConfigurationInitializer();

Moving on, we can do the same with DeepLinkHelper.init(). Even though it depends on the Session being loaded, it’s still possible to “warm up” this async task before it’s required in <InitialRoute />. It was also waiting on three DeepLink handlers (CleverTap, AppsFlyer and react-native’s own Linking package), but sequentially. I made them concurrent to optimize it further. Though keep in mind, the first few seconds of app launch tax the hardware quite a bit; CPU, GPU, RAM, everything is kick-started to move huge payloads all over the place, so it may not always be easy to find gains via concurrency.

I’ve simplified the code once again, so expect it to be more nested and not so neatly laid out in the real world. Nobody wants to write unoptimized code; in this particular case it probably started off with just Linking.getInitialURL(), where using async/await made sense, but with new requirements, developers simply followed the already-established pattern and added one more awaited call.

It’s simply impossible to anticipate future problems and prematurely write complex, optimized code that actually scales well, but it is relatively easy to identify simple, unoptimized code once a problem actually arises as you scale.

// before
const linking = await Linking.getInitialURL();
const cleverTap = await CleverTap.getInitialUrl();
const appsFlyer = await AppsFlyer.getInitialUrl();

/* once all promises resolve, one with a url gets picked */
const url = linking ?? cleverTap ?? appsFlyer;
if (url) {
  handler(url);
}

// after
let handled = false;
const handleOnce = (url?: string) => {
  if (!url) return;
  if (handled) return;
  handled = true;
  handler(url);
};

/* first promise to resolve with a url gets handled */
Linking.getInitialURL().then(handleOnce);
CleverTap.getInitialUrl().then(handleOnce);
AppsFlyer.getInitialUrl().then(handleOnce);

And yup, just these two pipeline-reordering steps were enough to get substantial performance wins in my local tests. We went from 2335ms to 1862ms, a ~20% improvement, without actually touching any business logic! Also observe how Configuration.init() and DeeplinkHelper.init() are now running concurrently with other tasks. Even though some of the task durations have actually increased, there’s still a net improvement in overall startup time. A ~470ms win is quite noticeable even in debug mode and should stand out in production as well. Though I wouldn’t read too much into the absolute figures from a local setup in debug mode; the story may be different in production. Foreshadowing.

A Gantt Chart with a title “App Startup Timeline (After)”. The raw data can be found under “Appendix” at bottom of the post in tabular format.

How to deploy scary changes?

I was afraid of releasing it as is. What if some race condition somewhere broke something completely different? Sure, our app currently loads ~500ms slower than it could, but at least it loads! Also, how do I truly quantify these results in production? Can we do some sort of A/B?

Well, this all happens before the session is loaded, so we need to bucket our users, or more accurately user sessions, on device. Fear not, Math.random() is here!

I refactored the above code to something like this, to begin with:

/**
* Housekeeping before we mount and render the root react-native component.
* Use this to eagerly prefetch/initialize/warm critical I/O sources.
* Make sure these initializations aren't blocking main-thread,
* as that can lead to delays in startup times.
*/
bootstrap.init();

AppRegistry.registerComponent(appName, () => cp(Root));

This allowed me to do my shenanigans in an encapsulated bootstrap.init() method. I created a getLocalExperimentByPercent utility function that uses Math.random() to segment each session. I pushed my changes with conservative experiment values like 1% and regularly monitored the impact until I was confident about their stability. I actually found some bugs along the way, like the optimization breaking when the app is started by a notification, due to some race condition. It wasn’t hard to disable the optimization for that case, and we don’t care about startup performance for background notifications anyway.

/**
 * This is where we bootstrap all startup-related dependencies.
 * Goal is to fire things asynchronously ASAP and then use them when they're actually needed.
 * To do this, we'll save the reference to the promise and keep them idempotent.
 */
export class bootstrap {
  static config = {
    /**
     * warms up Configuration.init and DeeplinkHelper.init.
     * Observed benefit has been around 20-40%
     */
    shouldWarmUp: getLocalExperimentByPercent(1),
  };

  static init() {
    // ... other init tasks like setting up the telemetry module, error reporting module, etc.
    const { shouldWarmUp } = this.config;
    /**
     * We don't need to optimize background app launches, or when shouldWarmUp is disabled
     */
    if (AppState.currentState !== "active" || !shouldWarmUp) {
      return;
    }
    /**
     * Warm up Configuration.init for later use in <App />.
     * Make sure it is idempotent to avoid side-effects.
     */
    Configuration.init();
    /**
     * Warm up DeeplinkHelper.init for later use in <InitialRoute />.
     * Make sure it is idempotent to avoid side-effects.
     */
    DeeplinkHelper.init();
  }
}

/**
 * Simple util to provide a local experiment system using Math.random.
 * This won't be sticky, i.e. each session is unique as opposed to each user.
 */
function getLocalExperimentByPercent(percent: number) {
  return Math.floor(Math.random() * 100) < percent;
}

Was it worth it?

So far, whatever I’ve shared with you has very little to do with react-native, platform APIs or third-party dependencies. One may even wonder if any of this was worth it; after all, how efficacious could these ~100 lines of code truly be? Thanks to the telemetry already put in place by my predecessors, it wasn’t hard to put that to the test. The changes were deployed in late May 2022 as a CodePush update (no native release required!). And I couldn’t have been happier with the results. We actually saw a doubling of the performance at P75. This directly translated to a 35% increase in conversions to our search page!

When I think about it, I guess for our users every latency would have exaggerated effects. If my auth handshake took ~400ms on my WiFi connection, it might well take ~1200ms in their highly congested marketplaces with appalling 4G performance. Similarly, if my mid-range phone took sub-500ms for most tasks, their phones would’ve greatly benefited from this pipeline reordering to better utilize the constrained hardware.

A bar graph showing msApp for 75th Percentile. Jan 2022–12.59K, Feb 2022–12.98K, Mar 2022–13.15K, Apr 2022–10.37K, May 2022–12.49K, Jun 2022–6,637, Aug 2022–5,841, Sep 2022–5,722, Oct 2022–5,443, Nov 2022–5,354, Dec 2022–5,335.

I performed my Digital Foundry-esque test once again, and yup, it was faster. We jumped a spot and were now in the top 3 apps from my weirdly curated list, with a ~400ms improvement. Although other apps also improved and some even regressed, and the two tests were done a few weeks apart, so it’s probably not wise to read too much into it. However, our app still “feels” slow, despite being objectively faster. Clearly we need to work on perceived performance at some point.

A video with 9 side by side screen recordings of an android phone launching individual apps. “Amazon” 1.91s, “BigBasket” 3.6s, “Slack” 3.1s, “udaan” 2.93s, “Flipkart” 2.47s, “BlinkIt” 4.67s, “Swiggy” 4.26s, Ajio 9.45s, Notion 7.62s. There’s a dropdown with “AMJ 2022” selected, implying these tests were performed in April-May-June 2022.

Promising stuff. Now let’s look at the P90 numbers, which were already more than a minute! These changes could’ve saved those users minutes of their lives every day. As I excitedly selected the P90 aggregate group on Mixpanel, the startup performance of my frowns was beyond measure. I was utterly disappointed.

A bar graph showing msApp for 90th Percentile. Jan 2022–59.27K, Feb 2022–59.47K, Mar 2022–62.74K, Apr 2022–54.53K, May 2022–77.38K, Jun 2022–90.89K, Jul 2022–79.05K

“No, this can’t be right, I made it worse for them?! Shouldn’t my hypothesis only get proven right at the fringes of our user base with even slower devices and slower network types?” I said to myself in despair. It was also curious to see that the July numbers were better than June; this was my hint that these numbers were probably caused by events during June and not by the code directly.

Part 2: Question everything, even the evidence

I couldn’t believe it, so I dug deeper. I knew that the P99 numbers were absolutely nonsensical. Nobody would wait for our app to load for hours, so something must be wrong with our measurements. I approached this from multiple directions.

  1. One of my colleagues recommended that I call some of the folks from this cohort. I requested access to their business phone numbers and shortlisted ~5 people who had witnessed >5 minute startup times in the last week. They had all sorts of concerns regarding promotions, delivery times, and discovery of past orders, but they never mentioned app performance. When I specifically asked about app loading times, some said it’s alright, some said it’s okay, it loads up in 10–20 seconds. One said that even if it loads after 30 seconds it’s okay, but the order details page was loading very slowly for them. This was my first hint that our telemetry was reporting wrong numbers.
  2. I looked at P90 closely over the span of a day, hour by hour, in a tier 3 city. I was expecting to see a smooth line of ~60s, but that wasn’t the case. I saw numbers like 2.5s, 3.7s, 3.6s in most hours, but at irregular intervals the number would spike up to 21s, and even 187000s (51 hours). These sporadic large spikes were turning an otherwise ~3s P90 into an aggregated 60s. 51 hours, damn, that’s over 2 days. These high peaks were not consistent: 12pm on one day, 3pm on another. All the cities I gathered data for had similar trends.
  3. We have other react-native apps under udaan that were forked from the b2b app at some point in the past. I looked at their P90 numbers, which are measured similarly, and they were actually much better. Something had to be unique about our app.
  4. I also computed order conversion rates broken down by app launch time. I made cohorts of 0–1s, 2–3s, 3–5s, 10–60s, and 60s+ startup times. One would assume that if the app loads in 3–5 seconds vs 100–200 seconds, the user just might not care to place an order in the latter case. But I didn’t find any correlation. In fact, conversion was better in the 10–60s group, which was also where the majority of the reported startup times were. This was my second hint that these numbers are all nonsense.
A screenshot showing a line graph for Mar 20, 2022 — Apr 5, 2022 for msApp, P90. It’s largely near 0 but with peaks going as high as 500M. There’s a table beneath it, showing numbers ranging from 3000–5000 and some around ~100 million. The average is 12.39M.

I started to wonder what could make a simple subtraction so wrong. The way we measure this msApp value is that we capture the time on the native side (t0) and pass it over the bridge to the JavaScript side. Then, once we reach a particular phase of our startup timeline, we subtract the previously captured native time (t0) from the current system time (t1) and send the difference to our telemetry service.

🚨 React Native 0.72 adds performance.reactNativeStartupTiming with similar measurements, though the pitfalls shared ahead might still be relevant.
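To make the mechanics concrete, here’s a minimal sketch of that measurement flow; StartupTimer, getNativeStartTime and telemetry are hypothetical names used for illustration, not the actual module names in our codebase:

import { NativeModules } from "react-native";

// Hypothetical native module that exposes the timestamp captured in MainActivity (t0).
const { StartupTimer } = NativeModules;

// Hypothetical telemetry client; stands in for whatever analytics SDK is in use.
declare const telemetry: { track: (event: string, props: Record<string, unknown>) => void };

export async function reportStartupTime(phase: string) {
  // t0 is captured on the native side and handed over the bridge
  const t0: number = await StartupTimer.getNativeStartTime();

  // t1 is captured on the JS side once a particular startup phase is reached
  const t1 = Date.now();

  // msApp is just this subtraction; the pitfalls below all boil down to
  // t0 or t1 not being what we assume they are
  telemetry.track("msApp", { phase, durationMs: t1 - t0 });
}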

How can this number be so huge?

  1. What if t0 is from a previous app session? This can explain a number like 51 hours.
  2. What if t1 occurs much after the app actually starts? This can explain sporadic events.
  3. How come my changes regressed this number? What if t1 is actually that huge in certain cases?

Case 1

After various experiments and logging, I found that each time a CodePush update is applied and the JS context refreshes, the startup time is counted again. The issue is that the native side never truly restarted, so t0 was still stale. So if I started the app at 1pm, received an update at 1:30pm, and actually applied it at 2pm, t1 would be ~1 hour ahead of t0. This was easily fixed by adding an alreadyInvoked boolean on the native side that we flip to true after sending the initial t0 payload over the bridge. If JS requests t0 again, say due to a CodePush update, we simply return early when alreadyInvoked is true, as we don’t need to measure that restart as a startup. I did a small A/B using the above Math.random() system and yes, there was a substantial difference.

A bar graph comparing two sets of data labeled “Without Fix (seconds)” and “With Fix (seconds)” for Sep 6, 2022 — Sep 13, 2022. “Without Fix” has erratic numbers, minimum value of 73.83 and maximum value of 784.7. “With Fix” is largely stable with a downward trend, with minimum value of 15.38 and maximum value of 23.06.

Case 2

The second point also turned out to be true. I got this hint from the previously mentioned race condition I fixed by disabling the optimization when our app was launched by a notification. So, t1 is measured when a particular component is mounted, but that component doesn’t get mounted in the case of a push notification. However, the native system and some parts of our bootstrap timeline do get executed. Imagine I receive a notification at 1pm and actually open my app the next day. This would make t1 roughly 24 hours ahead of t0. This explains the 51 hours. Users may receive notifications over the weekend but open the app only when they open their shops on Monday, for example. The fix was simple: we skipped these measurements if the app was in the background, using AppState.currentState !== "active".

Case 3

What if t0 and t1 were truly that far apart? None of the above cases explained the regression that I might have caused with my simple optimizations. It was surely interesting to see that right when I was performing all these comparisons and aggressively logging the app, the P90 numbers increased. Wait a minute, do we send these telemetry events in debug mode!? Yup. That was it. Can we please have a moment of laughter at the stupidity of this. Each time I was investigating our app performance, or a developer was running the emulator, we were inflating the P90 numbers! If only we hadn’t worked this hard, the numbers would’ve been so much better! The fix was simple: I just added an early return using __DEV__.
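Putting the three cases together, the reporting path ends up with a few cheap guards. Here’s a minimal sketch of the idea, reusing the hypothetical reportStartupTime from earlier; note that Case 1 is really fixed on the native side (the alreadyInvoked flag), since a CodePush refresh re-evaluates JS modules and would reset any JS-side flag:

import { AppState } from "react-native";
// Hypothetical helper from the earlier sketch
import { reportStartupTime } from "./reportStartupTime";

// Guards duplicate reports within a single JS context (e.g. accidental re-mounts).
// The CodePush case is handled natively, where a stale t0 is never handed out twice.
let alreadyReported = false;

export async function maybeReportStartupTime(phase: string) {
  // Case 3: debug builds (developers, emulators) were inflating P90, so skip them
  if (__DEV__) return;

  // Case 2: launches triggered in the background (e.g. push notifications) never
  // mount the screen that marks t1, so skip them too
  if (AppState.currentState !== "active") return;

  if (alreadyReported) return;
  alreadyReported = true;

  await reportStartupTime(phase);
}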

Did we get it right this time?

We pushed all these fixes under similar A/B experiments and took them to 100% by late September 2022. Now that we had spent so much time fixing our telemetry, it was time to observe the impact, and it was once again incredible. Note that these improvements are largely on paper, as users were never actually observing 51-hour-long app launches, but it doesn’t hurt to have accurate telemetry the next time we do another round of optimizations. Also note that the P75 numbers immediately improved in June 2022, but improvements in P90 only started to appear around August-September 2022. This is partly due to slow adoption rates at the fringes. The number came down to ~11s by the end of the year.

A bar graph showing msApp for 90th Percentile. Jan 2022–59.27K, Feb 2022–59.47K, Mar 2022–62.74K, Apr 2022–54.53K, May 2022–77.38K, Jun 2022–90.89K, Jul 2022–79.05K, Aug 2022–39.02K, Sep 2022–20.33K, Oct 2022–13.86K, Nov 2022–11.71K, Dec 2022–11.66K. A clear downwards trend, settling around ~11 seconds.

This also shows that the P75 improvements weren’t the result of fixing the telemetry. However, we did see the numbers slightly improve (~200ms) and stabilize as these fixes were rolled out, making them much more deterministic. And as you can see, we have been consistently performant for almost a year now.

A bar graph showing msApp for 90th Percentile. Jun 2022–194.8K, Jul 2022–78.6K, Aug 2022–39.06K, Sep 2022–20.26K, Oct 2022–13.5K, Nov 2022 to May 2023 are all around ~11K. A sign of the stability and consistency of the data.

Fun fact: we did upgrade from react-native 0.63.5 to 0.70.6 sometime in December 2022. My incredibly talented teammate Mihir Karandikar handled that migration for our app with ~300 screens. The adoption took ~4 months, during which he and Anupam Prakash, another exceptional developer, maintained two tracks of CodePush, one for the 6.x and the other for the 7.x versions of our app. This meant most of the business logic had to be deployed twice, for both Android and iOS. Luckily, we have a layer of separation between our business logic (ui-screens) and the app shell (udaan-fe/app), so it was as simple as updating a version in the package.json of the two git branches (maybe we should write a blog post on that).

As you can imagine, this was a huge collaborative effort between release managers and developers, and we finally made it by April 2023, when both the iOS and Android apps had reached ~95% adoption. We benefited a lot from modernizing our codebase and several third-party dependencies. This added much more polish to otherwise neglected parts of our app. It was definitely much needed, as 0.63.5 had been released more than 2 years before the upgrade.

However, simply updating react-native didn’t translate into substantial startup performance gains (~400ms). This only validated my initial gut feeling that simply updating package.json dependencies ain’t gonna cut it. Hermes was definitely the exception though.

Part 3: Sometimes perception can matter more than reality

The proverbial hotel elevator story comes to mind when we talk about perceived performance. This is how it goes:

A hotel manager was receiving complaints from guests that the elevators were running too slowly. She looked into speeding them up and installing destination dispatch, a multi-elevator optimization system, but the cost prohibited her from implementing either. The complaints continued to pour in and she was at a loss for what to do. She knew that she had to come up with a solution before the guests started leaving negative reviews. After much thought and consideration, she finally came up with an idea. She installed mirrors in the elevators. The guests, busy admiring themselves, were actually quite satisfied with the elevator’s speed; if anything, they felt it was a bit too fast.

I don’t know the validity of this story, but it seems reasonable. In fact, this is common almost everywhere you have to wait: magazines at the dentist’s, cheery music in office lobbies, useful tips and player stats on loading screens, or sometimes even squeezing through tunnels in video games while the engine loads up the next level.

Software usually masks loading with splash screens and animations. This was quite evident in BlinkIt’s startup timeline. Kudos to their team for doing a fantastic job. Though BlinkIt’s actual startup time is slower in absolute terms, the experience actually feels quite delightful and reasonably fast.

A screen recording of an Android phone opening the “BlinkIt” app. It opens to the splash screen, with an eye-blinking animation concealing the “BlinkIt” text as if an eyelid is blinking. It’s quite delightful.

I teamed up with our design head, Gaurav Sharma, and we deployed a simple yet effective startup animation. The effort was primarily in fine-tuning the durations. We kept it long enough to cover our P75 startup duration, with an extended loading screen in case even more time elapsed. Apologies for using an emulator for the comparison and for the minor stutters in the video; I couldn’t find time to dig up the old APK to do a side-by-side comparison at the same resolution on a real device, but you get the idea I guess.

A video showing two Android Emulators side by side launching the udaan app, running Pixel_5_API_30 and Pixel_5_API_31 (i.e. Android 11 and Android 12). Pixel_5_API_30 has no animation, just a simple red background and white udaan logo, while the Pixel_5_API_31 one has a satisfying animation of udaan logo coming into view from a small dot and the text slowly revealing itself left to right.

It’s really hard to measure the impact of perceived performance, for it’s purely subjective, but the general consensus within the team was that the app felt really smooth and fast after this change. One of our Program Managers shared this with me:

“The startup animation is really slick. The app genuinely feels so much faster. This was a great change!” — A happy udaan user

I think there’s a lot one can do with perceived performance. As we saw above, some of the apps loaded a second or two later than udaan, yet the perception was that the udaan app feels slower. Fixing this bias is very difficult, but there’s definitely a strong argument that investing time in a better loading experience can pay better dividends than eking out a few hundred milliseconds with complex engineering. At the same time, it’s not a silver bullet. Ajio Business, for example, tries to mask its ~10s startup time with an elaborate animation, and I wonder how far from ideal that must be for someone opening the app multiple times a day.

Coming back to udaan, we achieved the above by converting an After Effects animation into an Animated Vector Drawable on Android, along with react-native-bootsplash. You can actually use different splash screens for different Android versions by simply naming your “values” folders differently. As a web developer I found this pattern pretty cool.

A screenshot of VS Code’s folder tree view. It shows two expanded folders named “values” and “values-v31”. Both of them contain a “styles.xml” file. “values” folder has 2 other files “colors.xml” and “strings.xml”, implying these are used by default, and “styles.xml” from “values-v31” overrides the “styles.xml” from “values” when API_31 is detected, i.e. on Android 12 and above “values-v31/styles.xml” is used and on Android 11 or below “values/styles.xml” is used.

Here’s a part of our styles.xml for Android. Pretty straightforward. If and when we wish to improve this animation, all we have to do is update the drawable/animated_logo.xml file and push a native release. It would be fully backwards compatible and wouldn’t require segmenting users.

<!-- values-v31/styles.xml on Android 12 and above, use .avd file -->
<style name="BootTheme" parent="Theme.SplashScreen">
  <item name="windowSplashScreenBackground">@color/bootsplash_background</item>
  <item name="windowSplashScreenAnimatedIcon">@drawable/animated_logo</item>
  <item name="windowSplashScreenAnimationDuration">3000</item>
  <item name="postSplashScreenTheme">@style/AppTheme</item>
</style>

<!-- values/styles.xml on Android 11 and below, use a PNG -->
<style name="BootTheme" parent="Theme.SplashScreen">
  <item name="windowSplashScreenBackground">@color/bootsplash_background</item>
  <item name="windowSplashScreenAnimatedIcon">@mipmap/bootsplash_logo</item>
  <item name="postSplashScreenTheme">@style/AppTheme</item>
</style>
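On the JS side, the splash still has to be dismissed once the first real screen is ready. A minimal sketch of what that can look like, assuming react-native-bootsplash’s hide({ fade }) API; where exactly you call hide depends on your startup flow, and InitialRoute here is just an illustrative stand-in:

import { useEffect } from "react";
import RNBootSplash from "react-native-bootsplash";

function InitialRoute() {
  useEffect(() => {
    // Keep the animated splash visible until we actually have something to show,
    // then fade it out instead of cutting to a blank frame.
    RNBootSplash.hide({ fade: true });
  }, []);

  // ...render the actual first screen (HomePage/Login) here
  return null;
}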

What’s next?

There’s a lot that we can still do, though I think we were lucky to discover the above pipeline optimization.

For starters, we have a lot of cleanup to do. A fast-paced, high-velocity culture naturally accumulates a non-trivial amount of tech debt. As previously mentioned, we have roughly ~300 screens, though not all of those might be in use, or they could be clubbed together and refined. Such tasks usually have a high cost with seemingly low impact per screen, but doing it for hundreds of screens definitely adds up. Reducing the JS bundle should make JS VM init times faster. Reducing the number of screens should also improve <NavigationBuilder />’s performance (react-navigation). Lazy loading those screens might help us further, as in the sketch below.
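For instance, react-navigation can defer evaluating a screen’s module until it is first rendered. A minimal sketch, assuming react-navigation v6’s getComponent prop and a hypothetical QRScannerScreen; combined with Metro’s inline requires, the module wouldn’t be loaded at startup at all:

import { createNativeStackNavigator } from "@react-navigation/native-stack";
// Hypothetical critical-path screen, loaded eagerly
import { HomeScreen } from "./screens/HomeScreen";

const Stack = createNativeStackNavigator();

function AppNavigator() {
  return (
    <Stack.Navigator>
      <Stack.Screen name="Home" component={HomeScreen} />
      {/* Rarely visited screen: the module is only required when first rendered */}
      <Stack.Screen
        name="QRScanner"
        getComponent={() => require("./screens/QRScannerScreen").default}
      />
    </Stack.Navigator>
  );
}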

Similarly, using TurboModules should in theory lazily initialize every native package that isn’t part of the critical path. Imagine a QR code scanner package that’s buried 5 screens deep after launch. The ~330ms of package registration was only for non-autolinked packages; the autolinked ones are in the majority.

Streamlining all the necessary network calls (the auth session and the data for the first screen, for example) could also help us a lot. I am not sure how fragile moving this to the native side would be, because we would suddenly lose the benefits of colocating data requirements with UI/business logic, but it is definitely something to think about.

And we haven’t touched any third-party or business logic yet. What if we looked closely at how CodePush works and optimized its code paths? What if we explored an even better way to register our screens by priority (critical screens registered immediately, the rest registered after the fact)? What if udaan’s first page wasn’t an infinitely loading feed but a “launcher” screen, akin to what BigBasket or Swiggy does? What if the feed were pre-populated with cached data so that the user instantly gets to use our app? There’s so much one can do!

Closing Thoughts

We have reached the end. I started with a poor understanding of React Native and ended up with substantially faster startup times achieved with minimal business logic changes, reliable deployments via experiments, and even a no-longer-broken telemetry system with consistent reporting. So if you too are scratching your head trying to improve your app’s performance, I hope this article offers some insight into how to tackle the problem beyond the generic “enable this flag, use that package, delete that code” advice. All of this was achieved by an intimidated and scared web developer who simply added log statements all over the place to understand where all the time was spent.

Here’s your tl;dr

  1. Comparative analysis suggested that our app wasn’t actually that slow to begin with, but it “felt” slower.
  2. Major bottlenecks were found in the JS side of our react-native app by simply adding logs, everywhere from MainActivity to first screen we rendered.
  3. Two of those were asynchronous tasks, each ~500ms long, that were running far later than they needed to.
  4. Moving them to run right at the very start of JS’s control flow and reusing “cached” promises helped us reorder the startup pipeline. This is more akin to preloading assets in a network waterfall on a website. The change was ~100 lines without impacting any business logic.
  5. Math.random() was used to segment user sessions on device, for a reliable, safe, and gradual rollout. The improvements were pushed as CodePush updates, without any native release.
  6. Several issues in the telemetry were found by analyzing hourly data for a city at P90. High peaks were causing inflated P90s:
  7. CodePush was skewing our startup times due to rerunning JS while native timers were still intact.
  8. Push notifications were messing up our startup times, as the “start” and “end” were recorded when the notification was delivered and when the user actually opened the app (sometimes days later), respectively.
  9. We were recording startup telemetry even in the debug mode, adding noise to high percentile figures.
  10. Perceived performance was improved by using an animated splash screen. An argument can be made that this is at least as important, if not more so, since raw optimizations start seeing diminishing returns after a certain threshold.

I also wanted to say that this wasn’t possible without the environment created by my peers. I learned to build things in a way we can measure, to release things behind experiments, to do cost-benefit analysis before jumping into code, to problem-solve one step at a time, to be driven towards excellence and to reason from first principles, all thanks to the tech culture created by the folks at udaan. If I hadn’t been pushed in the right direction (shout out to Kaushik Mukherjee for the same), we might have blindly gotten into black-hole tasks without any meaningful improvements. So if you are in a position to lead others, make sure you create a safe place where your peers can work and learn by experimentation without fear of judgement, and come up with tangible, measurable solutions that don’t require you to perform an engine transplant while the car is still running.

Finally, I wanted to exclaim how awesome web development tooling is. You can just drop any URL into web.dev/measure and come away with more solutions than you can count on your hands in a matter of seconds. While React Native tooling may not be as robust as web tooling, it’s only improving, thanks to the efforts of the team behind Maestro (Flashlight is the mobile Lighthouse equivalent) and other community members. I don’t have a tool to share, but I hope the caveats and general approaches I shared might help you solve problems with your app, and maybe even build a tool for the same!

Appendix

Here’s the raw data for all the Gantt charts above. Note that the absolute values don’t matter much; the order and relative lengths of tasks are what’s critical in identifying spots for optimization.

App Startup Timeline (Before)
Configuration.init() Breakdown
<InitialRoute /> breakdown
App Startup Timeline (After)
