You’ve received reports of poor performance from users of your mobile app, and you’re not able to reproduce the issue in a development environment. Where do you start debugging a problem on a device that you have no physical access to? Your application, like many others, has hundreds of views and user flows, and any one of them could be responsible for the issue. This is further complicated by the possibility that the issue only reproduces under specific conditions: a particular app version, OS version, or device model, to name just a few variables in a large matrix.
We started out by building support for tracing: instrumenting sections of code by automatically collecting performance statistics on CPU, memory, network requests, and more. We aggregated data from these traces to provide a bird’s-eye view of how your application was performing, and how performance changed between app versions. However, traces alone were too limited to diagnose specific performance issues that occur in production, which is why we have added spans as a new instrumentation primitive on our platform to target these use cases.
The case study in this post presents one kind of issue that Specto is designed to solve. If these scenarios sound all too familiar and you are interested in trying a full suite of tools for improving the performance of your iOS or Android app, request access to Specto at https://specto.dev.
Spans help break down larger tasks like rendering a scrolling feed of content into smaller sub-components that can be analyzed individually to isolate the root cause of a performance problem. Spans are created by adding start and end markers to sections of code, which are then measured and visualized on our dashboard.
Unlike traces, spans can be concurrent and nested, and multiple spans with the same name can exist within the same trace. We recommend using spans to mark finer-grained operations within a traced interaction in your app, for example fetching data from the network, reading data from a cache, or performing resource-intensive tasks like image processing.
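To make nesting, concurrency, and repeated names concrete, here is a toy span model in Swift. It is a hypothetical sketch, not the Specto SDK's API: the Span and Trace types, their methods, and the span names used with them are made up for illustration, and times are passed in as plain millisecond values so the logic is easy to follow.

```swift
// Toy, hypothetical span model (NOT the Specto SDK API).
// Times are milliseconds relative to the start of the trace.
struct Span {
    let name: String
    let start: Double
    let end: Double

    var duration: Double { end - start }

    // Nested: `other` lies entirely within this span's time range.
    func contains(_ other: Span) -> Bool {
        other.start >= start && other.end <= end
    }

    // Concurrent: the two spans overlap in time.
    func overlaps(_ other: Span) -> Bool {
        start < other.end && other.start < end
    }
}

// A trace holds many spans; the same name may appear more than once.
struct Trace {
    var spans: [Span] = []

    // Sum of durations across every span that shares `name`.
    func totalTime(named name: String) -> Double {
        spans.filter { $0.name == name }.reduce(0.0) { $0 + $1.duration }
    }
}
```

For example, a render-feed span from 0 to 100 ms contains a fetch-page span from 10 to 40 ms, which overlaps an analyze-image span from 30 to 50 ms; a second analyze-image span from 60 to 75 ms brings that name's total to 35 ms.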
The Trace Dashboard
At a glance, the dashboard’s scatter plot can be used to identify clusters of traces with unexpectedly short or long durations. The filter bar lets you narrow the list of traces by attributes like timestamp, duration, device model, and more. Together, these tools help you reduce the set of traces you have to inspect to find an occurrence of a performance issue.
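The kind of narrowing the filter bar does can be sketched in a few lines of Swift. This is only an illustration of the idea: the TraceSummary type, its fields, and the slowTraces function are hypothetical and do not reflect Specto's data model or dashboard implementation.

```swift
// Hypothetical trace metadata, loosely mirroring two filter-bar attributes
// (device model and duration). Not Specto's actual data model.
struct TraceSummary {
    let id: String
    let deviceModel: String
    let durationMs: Double
}

// Keep traces from one device model that exceed a duration threshold,
// sorted slowest first so the worst occurrences are inspected first.
func slowTraces(_ traces: [TraceSummary],
                deviceModel: String,
                thresholdMs: Double) -> [TraceSummary] {
    traces
        .filter { $0.deviceModel == deviceModel && $0.durationMs > thresholdMs }
        .sorted { $0.durationMs > $1.durationMs }
}
```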
Let’s take a look at a real-world scenario where using span data can help us visualize a performance issue and understand how to fix it. We have an iOS app we use to test the Specto SDK internally — this app renders a feed of upcoming movies, and users can open a movie to see the artwork, a description, the cast, and other information. We’ve captured the following trace while loading the detail screen for a movie:
The spans in green represent instrumented network calls that are captured automatically by the Specto SDK, once you opt into network instrumentation and group your requests with a tag. The spans in blue are spans that have been explicitly added to the code using start and end markers — this is as easy as:
let span = Specto.startSpan("analyze-image")
From the timeline, we can see that we have a few operations kicking off in parallel to fetch movie information like artwork, videos, and credits. It looks like most of the operations end around the 130 ms mark in this trace, but there are a few operations, like load-movie-credits, that start later and cause the trace to take 319 ms in total.
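Assuming independent operations can run fully in parallel, a trace's total duration is set by its longest chain of dependent operations (its critical path), not by how many operations it contains. The Swift sketch below models this: each operation lists the operations it must wait for, and its earliest finish time is its own duration plus the latest finish among its dependencies. The Operation type, the traceDuration function, and the durations in the usage note are hypothetical, not values measured in this trace.

```swift
// Hypothetical model of operations in a trace and what they wait on.
struct Operation {
    let durationMs: Double
    let dependsOn: [String]
}

// Earliest possible end of the whole trace, assuming unlimited parallelism.
// Assumes every dependency name exists in `ops` and there are no cycles.
func traceDuration(_ ops: [String: Operation]) -> Double {
    var finish: [String: Double] = [:]  // memoized earliest finish times

    func earliestFinish(_ name: String) -> Double {
        if let cached = finish[name] { return cached }
        let op = ops[name]!
        // An operation can start only after all of its dependencies finish.
        let start = op.dependsOn.map(earliestFinish).max() ?? 0
        let end = start + op.durationMs
        finish[name] = end
        return end
    }

    return ops.keys.map(earliestFinish).max() ?? 0
}
```

With made-up durations of 100 ms for movie-details and 90 ms for movie-credits, chaining credits behind details gives 190 ms, while starting both at once gives 100 ms.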
From looking at the source code of the app, I can see that we kick off the fetches for movie-credits after the movie-details request has returned a response, which means the rendering for that data also starts later. This is unnecessary, as the fetches for videos and credits do not depend on the data returned in the response of movie-details, which just fetches basic movie metadata like the title and description. We can come up with two paths for optimization:
- Start the fetches for videos and credits immediately at the start, at the same time we start the
- The API we are using to fetch movie information supports batch fetches, so we could batch fetch the video and credit information within the same movie-details request and avoid making multiple requests altogether.
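The first option could look something like the following Grand Central Dispatch sketch: all three fetches start at once, and a DispatchGroup tracks when every one has called back. The Fetch type, the loadMovieScreen function, and the string results are hypothetical stand-ins for the app's real API client. The blocking group.wait() only keeps the example short; on the main thread you would use group.notify(queue:execute:) instead.

```swift
import Dispatch

// Hypothetical fetch shape: performs a request and calls back with a result.
typealias Fetch = (@escaping (String) -> Void) -> Void

// Start every fetch immediately instead of chaining videos/credits
// behind the details response, then collect all results.
func loadMovieScreen(details: @escaping Fetch,
                     videos: @escaping Fetch,
                     credits: @escaping Fetch) -> [String: String] {
    let group = DispatchGroup()
    let lock = DispatchQueue(label: "movie-screen.results")  // serializes writes
    var results: [String: String] = [:]

    let fetches: [(String, Fetch)] = [("details", details),
                                      ("videos", videos),
                                      ("credits", credits)]
    for (key, fetch) in fetches {
        group.enter()
        fetch { value in
            lock.sync { results[key] = value }
            group.leave()
        }
    }

    group.wait()  // illustration only: blocks until every fetch has called back
    return lock.sync { results }
}
```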
We choose the second option, as our API client for the movies service already supports batch fetching. The code diff looks like this:
After making this change, we release a new version of the app and collect some more traces. Here’s a trace for the same screen, from the new version:
I chose a trace from the same device model with roughly similar network request durations, since variance in request latency can skew the overall duration of a trace.
We immediately notice that the separate fetches for videos and credits are gone, since they have been batched into the single movie-details request. As a result, this trace takes 143 ms, a 55% decrease from the original 319 ms!
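The 55% figure follows directly from the two trace durations:

```latex
\frac{319\,\text{ms} - 143\,\text{ms}}{319\,\text{ms}} = \frac{176}{319} \approx 0.552 \approx 55\%
```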
Diving Deeper with Profiles
In addition to providing higher-level constructs like spans, we also run a sampling profiler to collect function-level performance data. Unlike tools like Instruments and Android Profiler, which require you to physically connect a development device and capture a profile on your computer, Specto collects all of its profiling data in production on real user devices. You can focus on shipping your app without having to simulate the multitude of performance scenarios that may occur in the real world.
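A sampling profiler periodically records the current call stack, and a call tree is built by merging those stacks: each node counts how many samples passed through that frame, and multiplying by the sampling interval gives an approximate inclusive time. The Swift sketch below shows that aggregation step in general terms. The CallTreeNode class and the frame names are hypothetical, and Specto's actual sampling interval and data format are not described in this post.

```swift
// Minimal sketch: merge sampled stacks (outermost frame first) into a tree.
final class CallTreeNode {
    let name: String
    var sampleCount = 0                      // inclusive sample count
    var children: [String: CallTreeNode] = [:]

    init(name: String) { self.name = name }

    // Count this sample here, then descend into (or create) the next frame.
    func add(stack: [String]) {
        sampleCount += 1
        guard let first = stack.first else { return }
        let child = children[first] ?? CallTreeNode(name: first)
        children[first] = child
        child.add(stack: Array(stack.dropFirst()))
    }

    // Approximate inclusive time, given an assumed sampling interval.
    func estimatedMs(intervalMs: Double) -> Double {
        Double(sampleCount) * intervalMs
    }
}
```

With an assumed 10 ms interval, a frame that appears in 2 samples would be estimated at about 20 ms.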
In any trace, you can open the Call Trees menu in the top right of the trace detail page to view a tree representation of samples collected during the execution of a trace. For example, we can see this call tree in a trace captured in the startup path of the same app described in the prior example:
This runs during the initialization of a widely used third-party SDK, which executes somewhat expensive (~170 ms) code that writes to a SQLite database on a background queue. This isn’t an immediate cause for concern since the code is not running on the main thread, but if it were, we would want to move it out of the startup path to make the application more responsive.
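One common way to keep expensive setup work off the main thread is to dispatch it to a global queue with a low-priority quality-of-service class. The sketch below assumes the SDK's setup is safe to call off the main thread, which must be verified against that SDK's own documentation. initializeThirdPartySDK and startSDKInBackground are hypothetical names, since the post does not identify the SDK or its API.

```swift
import Dispatch

// Hypothetical stand-in for an expensive SDK setup call.
// ASSUMPTION: the real SDK permits initialization off the main thread.
func initializeThirdPartySDK() -> Bool {
    // ... expensive work, e.g. writing to a local SQLite database ...
    return true
}

// Run setup on a utility-QoS background queue so it does not block
// the main thread during startup; report the outcome via a callback.
func startSDKInBackground(completion: @escaping (Bool) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        let ok = initializeThirdPartySDK()
        completion(ok)
    }
}
```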
To see even more detail, click the View Trace button to open a flame graph, which is constructed using all of the samples collected during the trace, separated by thread. In this other trace, we can see how long our image processing operations on the movie poster image take:
(Shout out to Speedscope, an excellent open-source flame graph viewer)
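Speedscope can import, among other formats, the "collapsed stack" text format popularized by Brendan Gregg's FlameGraph scripts: one line per unique stack, frames joined by semicolons, then a space and a sample count. The Swift sketch below produces that format and prefixes each stack with its thread name so threads stay separate in the resulting flame graph. It only illustrates the format; the post does not say how Specto exports data, and the Sample type and frame names are hypothetical.

```swift
// Hypothetical sample: the thread it came from and its stack,
// outermost frame first.
struct Sample {
    let thread: String
    let stack: [String]
}

// Emit collapsed ("folded") stacks: "thread;frame1;frame2 count".
func foldedStacks(_ samples: [Sample]) -> String {
    var counts: [String: Int] = [:]
    for sample in samples {
        let key = ([sample.thread] + sample.stack).joined(separator: ";")
        counts[key, default: 0] += 1
    }
    // Sorting only makes the output stable across runs.
    return counts.keys.sorted()
        .map { "\($0) \(counts[$0]!)" }
        .joined(separator: "\n")
}
```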