For an athlete, there are two ways to improve performance: build strength or improve efficiency. If you train consistently, you start to see improvements. The path to improving software performance is similar in one respect: you might see steady gains over time, only to hit a temporary plateau before breaking through to a huge improvement. Where the two diverge is in the options available for improving software performance. Three potential options are:
- Perform the same work faster than it was done previously
- Reorder work such that less important work happens after critical work completes
- Remove work entirely
The first is familiar and something all athletes and engineers can appreciate — figure out a way to get faster at doing the same work. Although athletes don’t have the luxury of the latter two options, engineers often provide the greatest efficiency improvements by employing those strategies.
What measurements we are tracking
In order to track performance and improvements, we defined three statistics as a team to measure our progress. The first is application initialization time, which is considered complete when our splash screen’s onResume method is called. The second measures the time until our Feed screen appears. The third measures when Feed content is displayed. Our metrics only apply to cold start launches, meaning Strava isn’t already running in the background on the device. We set a goal only for the display of Feed content; the other two were used as data points to inform and guide our work.
Third party dependencies
Relying on third party dependencies has major advantages and disadvantages. Using an external library allows you to conserve engineering resources and focus on business-specific functionality. The downside is that you are at the mercy of whatever is included in that dependency. Relinquishing control and knowledge of the implementation can introduce conflicting transitive dependencies or instability in the library. For example, one of our third party dependencies introduced instability in our app in the form of crashes, as well as significant performance overhead. An entire blog post could be written on the process of evaluating an external dependency. In our case, we ultimately built our own version to replace the library, improving performance and stability while tailoring it to our exact needs. Often, building your own version of a dependency is not a realistic option. However, for every dependency introduced, it is wise to evaluate its effect on performance.
Measuring App Start
A lot needs to happen when starting a large mobile application. We quickly learned a single measurement wasn’t going to be sufficient. To better understand the performance implications of each piece, we introduced granular measurements. From this, we learned the lion’s share of initialization time was spent on dependency injection in our application class.
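The granular measurement approach can be sketched as a small utility that records named marks relative to a single start point. This is an illustrative sketch; `StartupTracer` and its methods are hypothetical names, not Strava’s actual tracing API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of granular startup tracing: each phase records a
// named mark, measured in milliseconds from a single begin() call.
public class StartupTracer {
    private static final Map<String, Long> marks = new LinkedHashMap<>();
    private static long startNanos;

    public static void begin() {
        startNanos = System.nanoTime();
    }

    // Record elapsed milliseconds since begin() under a label,
    // e.g. "application_init" or "feed_visible".
    public static void mark(String label) {
        marks.put(label, (System.nanoTime() - startNanos) / 1_000_000);
    }

    // Snapshot of all marks, in the order they were recorded.
    public static Map<String, Long> report() {
        return new LinkedHashMap<>(marks);
    }
}
```

Calling `mark()` at each phase boundary (application created, Feed screen shown, Feed content displayed) yields an ordered timeline rather than one opaque total.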
Dependency injection makes an engineer’s job significantly easier. It is the equivalent of dining at a restaurant versus cooking at home. You show up, sit down, order a meal, and it arrives some time later. You are either blissfully unaware of the work needed to create your dish, or perfectly aware and keen to avoid the effort. In our case, objects are provided to our application class, but we were blind to the trail of objects those objects depend on.
To improve the accuracy of the granular measurements, we removed all injected objects from our application class. By transitioning the injection into smaller classes serving a single purpose, we finally had a clear picture of our initialization timeline.
Even this wasn’t perfect, as the order of operations influenced measurements. When two classes depend on a shared resource, the first object created incurs the cost of initializing it. For example, suppose ObjectA and ObjectB both depend on a shared ObjectC, with ObjectA measuring 100ms and ObjectB measuring 50ms. The natural decision would be to optimize ObjectA. Assume we find a way to eliminate ObjectA entirely. Rather than realizing a 100ms win, ObjectB now measured 125ms, as it had assumed responsibility for creating ObjectC.
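This measurement trap can be reproduced in a few lines. In the hypothetical sketch below, the shared ObjectC is created lazily, so whichever consumer is constructed first pays its cost; the numbers mirror the example above (ObjectC ≈75ms, ObjectA ≈25ms of its own work, ObjectB ≈50ms).

```java
// Hypothetical demo of cost attribution with a lazily created shared
// dependency: the first consumer constructed pays ObjectC's cost.
public class InitCostDemo {
    // Shared dependency with ~75ms of simulated initialization work.
    public static class ObjectC {
        public ObjectC() { spin(75); }
    }

    private static ObjectC sharedC;

    // Lazily created: the first caller incurs the 75ms cost.
    public static ObjectC sharedC() {
        if (sharedC == null) sharedC = new ObjectC();
        return sharedC;
    }

    public static class ObjectA {
        public ObjectA() { sharedC(); spin(25); } // ~100ms when created first
    }

    public static class ObjectB {
        public ObjectB() { sharedC(); spin(50); } // ~50ms when created second
    }

    // Busy-wait stand-in for real initialization work.
    public static void spin(long ms) {
        long end = System.nanoTime() + ms * 1_000_000L;
        while (System.nanoTime() < end) { }
    }

    public static long timeMs(Runnable block) {
        long start = System.nanoTime();
        block.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Timing ObjectA then ObjectB reports roughly 100ms and 50ms; remove ObjectA and the 75ms silently migrates into ObjectB’s measurement.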
The onCreate method of an application is the de facto place to put code that needs to happen each time an app is launched. Numerous things fall into this bucket: third party library initialization, database migrations, purging resources saved to disk, and more. Many of these things don’t need to happen immediately when the application is launched; so long as they happen at some point after launch, that is sufficient. Until your application reaches a certain size, this isn’t a concern.
In order to push back certain launch tasks, we created a single object responsible for all delayed initialization. This included cleaning cached photos and pruning stale database entries, among other tasks. Because these changes were not A/B tested, we were unable to quantify the gains. A simple measure of how long our delayed initializer takes to run wouldn’t account for background tasks and network calls competing for resources during startup.
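A minimal sketch of such a delayed initializer, assuming hypothetical names: tasks are registered during startup but only executed later, off the critical path, on a single background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: one object owns every launch task that can wait.
// Real tasks would clean photo caches, prune stale database rows, etc.
public class DelayedInitializer {
    private static final List<String> names = new ArrayList<>();
    private static final List<Runnable> tasks = new ArrayList<>();
    private static final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Register a task during startup without running it yet.
    public static void register(String name, Runnable task) {
        names.add(name);
        tasks.add(task);
    }

    // Invoked once launch-critical work is done (e.g. after the first frame),
    // rather than from Application.onCreate().
    public static void runAll(Consumer<List<String>> onDone) {
        executor.execute(() -> {
            for (Runnable task : tasks) task.run();
            onDone.accept(new ArrayList<>(names));
        });
    }
}
```

The key design choice is that registration is cheap and synchronous while execution is deferred, so adding a new cleanup task never lengthens the startup path.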
One of the first rules of Android programming is not to perform long-running operations on the main thread. Network calls and database operations are the most common tasks to avoid if you want to prevent users from seeing the dreaded ANR (Application Not Responding) dialog.
Our granular traces gathered during application start helped identify tasks that might provide improvement if moved to a background thread. However, we learned not to underestimate the cost of thread scheduling and to A/B test every possible change, as one change had a negative impact from moving code into the background.
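To get a feel for that scheduling cost, the handoff overhead can be approximated on the JVM by timing the gap between submitting work to an executor and the work actually starting. This is a hypothetical measurement harness for illustration, not the tooling we used:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Measures average thread-handoff latency: the delay between submitting a
// task to a background executor and the task beginning to run.
public class SchedulingOverhead {
    public static double averageHandoffMs(int submissions) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        AtomicLong totalNanos = new AtomicLong();
        AtomicLong completed = new AtomicLong();
        for (int i = 0; i < submissions; i++) {
            final long submitted = System.nanoTime();
            executor.execute(() -> {
                totalNanos.addAndGet(System.nanoTime() - submitted);
                completed.incrementAndGet();
            });
            // Wait for this task to start so submissions don't overlap.
            while (completed.get() <= i) { }
        }
        executor.shutdown();
        return totalNanos.get() / (double) submissions / 1e6;
    }
}
```

If the work being moved off the main thread is cheaper than this handoff, backgrounding it is a net loss, which is exactly why every such change deserves an A/B test.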
A big part of improving performance is understanding what affects it. This can be difficult, especially when there are multiple variables affecting performance in different ways and some only occurring under certain situations.
Trace logging provided measurements for individual pieces of work. However, we noticed gaps in time that were unaccounted for. Here, we used the CPU profiler to visualize how our code executes and identify code that can be improved.
In one case, we observed nearly a half-second delay between the time we expected our feed content to be rendered and when it actually was. In the Activity that renders our feed content, we subscribe to a shared RxJava observable that loads the feed in the background via a network request. We noticed that even when the request finished before we subscribed, there were instances showing a sizeable delay between that point and when the content was rendered. The profiler showed other chunks of work running in that window. Attempting to pinpoint the delay, we discovered it came from the shared observable’s thread scheduling. Interestingly, the trace measurements showed an uneven distribution. Looking at the user percentiles, the delay was reasonable (under 20ms) for the top half of users; beyond that, it quickly shot up:
55% — 52ms
60% — 93ms
65% — 134ms
70% — 176ms
75% — 221ms
80% — 271ms
Although it only affected half of our users, those users were significantly impacted. The solution wasn’t anything special: we replaced the shared observable with basic event listeners, as we already controlled the threads involved elsewhere. This goes back to A/B testing as much as possible, as we found something seemingly innocuous had a fairly negative impact on performance.
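A minimal sketch of the listener-based approach, using a hypothetical `FeedLoader`: if the network result has already arrived, the listener fires synchronously on the caller’s thread, with no scheduler hop in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical listener-based replacement for a shared observable.
// A listener added after the result arrives is invoked immediately,
// on the caller's thread, avoiding any scheduling delay.
public class FeedLoader {
    private String result;
    private final List<Consumer<String>> listeners = new ArrayList<>();

    public synchronized void addListener(Consumer<String> listener) {
        if (result != null) {
            listener.accept(result); // deliver synchronously, no scheduler hop
        } else {
            listeners.add(listener); // deliver later, when the result arrives
        }
    }

    public void onNetworkResult(String feed) {
        List<Consumer<String>> toNotify;
        synchronized (this) {
            result = feed;
            toNotify = new ArrayList<>(listeners);
            listeners.clear();
        }
        for (Consumer<String> listener : toNotify) listener.accept(feed);
    }
}
```

The trade-off is explicit thread management: unlike an observable with a scheduler, the listener runs on whatever thread delivers the result, which only works because we controlled those threads.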
While looking for performance improvements, nothing was off limits. One idea was replacing our ViewPager with a single Fragment where the content is swapped. But first, we needed to gather analytics to understand how our athletes were interacting with our Feed.
By tracking tab clicks and swipes between pages, we learned clicks accounted for 90% of navigation between feed types. We dug even deeper into swiping events, asking how often an athlete swipes from one tab to another only to immediately swipe back, as well as how often a swipe is initiated but abandoned so that it returns to the original tab. It turned out only about a third of all swipe events fell outside those accidental patterns.
When we introduced the newly designed feed containing a swipeable image carousel, we discussed the potential conflict between swiping pages and swiping images. This included work necessary to avoid what we considered user experience issues. We now had data showing that swiping was causing inadvertent navigation between tabs, and that only 3% of all navigation between tabs was by swiping.
Beyond the fact that swiping was uncommon, the ViewPager has a larger memory footprint and is slower to initialize because it creates and holds three Fragments in memory.
One of the main things that takes up time during initialization is rendering and setting up your UI. This includes inflating layouts and putting the UI into its initial state. As your app grows and supports more features, this setup gets complicated. Oftentimes, this complexity comes from supporting use cases or features that aren’t immediately necessary to the user.
For example, certain failure states and edge cases occur infrequently and require special views. These special cases make a screen’s view hierarchy unnecessarily complex. To avoid this, use ViewStubs and only inflate the views when necessary. An added benefit of ViewStubs is that they break your XML layouts into smaller, easier-to-understand pieces. We also used AsyncLayoutInflater, as it supports inflating layouts without blocking the UI thread or interrupting animations.
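As a sketch, a rarely shown error state can sit behind a ViewStub in the layout; the stub is a cheap, zero-size placeholder until it is inflated. Layout and ID names here are illustrative:

```xml
<!-- Illustrative layout: the error view costs nothing until inflated. -->
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <androidx.recyclerview.widget.RecyclerView
        android:id="@+id/feed_list"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />

    <!-- Not inflated until inflate() is called or visibility is set. -->
    <ViewStub
        android:id="@+id/error_state_stub"
        android:layout="@layout/feed_error_state"
        android:layout_width="match_parent"
        android:layout_height="wrap_content" />
</FrameLayout>
```

At the point of failure, the real view is created on demand with `ViewStub.inflate()`, so the common path never pays for it.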
There is no silver bullet that magically fixes performance. It is a grind filled with incremental wins, and it is a never-ending process. Similar to athletics, you get out of shape much faster than you get into it. Ongoing performance monitoring is crucial to the long-term success of avoiding regressions in your application’s performance.