Why should we care about app performance?
How can we continuously improve our users’ experience based on reliable data?
In the last couple of years, smartphones have become the true kings of the technology market. No one is surprised that Android has overtaken Windows as the most widely used OS in the world, nor that Android and iOS together run on more than half of the devices on the market.
This reality has a number of implications for Fever, such as the fragmentation that is most pronounced on Android. With more than 3 billion devices worldwide, we face a multitude of different hardware and software specifications, manufacturer customization layers, and numerous versions of the Android OS. This fragmentation, together with the differences in IT infrastructure that may exist, means that the experience with the same app can be very different in each of the more than 100 cities across the world where we are present.
At Fever, we try to reach as many users as possible regardless of the hardware, software, or network they have, minimizing as much as possible the effect of these differences on the user experience. To achieve this, it is essential to measure and control the behaviour of your app, and especially to capture metrics at strategic points of the flow. For example, a new feature we want to add may work very well under optimal conditions but may be overly demanding for older devices. In that case, we could decide whether or not to go ahead with the feature based on the results of our previously captured performance metrics. This is where crash and error reporting tools, such as Crashlytics, and performance monitoring tools come in. In this article, we will discuss the latter in more detail.
“What is not defined cannot be measured. What is not measured cannot be improved. What is not improved, is always degraded.” — William Thomson, Lord Kelvin
Application performance monitoring tools
These tools collect critical performance data from a system in order to monitor it and derive valuable metrics or KPIs. These KPIs allow us to reduce and prevent errors and to optimize key flows based on the data obtained.
Although there are many solutions on the market that address this, some of the best known are New Relic, Datadog, Sentry and Firebase Performance Monitoring (FPM). For an early stage of monitoring, in which we want to track basic or simple indicators, any of the four previously mentioned (or any other) would be more than enough to cover our needs. When our requirements become more complex, it becomes worthwhile to assess whether there is a solution that suits us better: for example, if we need to capture more detailed metrics, if we need to integrate the tool with the rest of our project's ecosystem, or even whether to develop an in-house solution ourselves.
Both the Sentry and New Relic SDKs include crash reporting in the app and support integration with many other libraries that have become a standard on Android (e.g. OkHttp) and on iOS (URLSession). However, in our case we opted for FPM, given how trivial it was to add to the project, as we were already using other Firebase modules such as App Distribution and Crashlytics.
Crashlytics is precisely the tool we use for the crash and controlled-exception reporting mentioned earlier, a capability also offered by Sentry and New Relic. This, together with the fact that those two tools come at a higher price (both FPM and Crashlytics are completely free), and the existence of automatic traces that provide valuable information about your app, such as app startup time or the duration of the different network requests made, without any additional development effort, were the determining factors for our choice. In any case, all three options would be perfectly valid and entirely recommendable depending on each team's needs.
First steps: Automatic traces with FPM
One of the most positive things we found in FPM is that once the SDK is integrated into the project, it starts collecting information about critical processes in the app automatically, without the need to write a single line of code. Some of these automatic metrics include the app's startup time, network request times, and the app's time in foreground and background.
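Adding the SDK usually amounts to a small Gradle change. A minimal sketch along these lines should be enough to enable the automatic traces (the BoM version is illustrative, and we assume the Google Services plugin and the plugin versions are already declared at the project level):

```kotlin
// app/build.gradle.kts — minimal sketch; plugin versions are assumed to be
// declared at the project/settings level, and the BoM version is illustrative.
plugins {
    id("com.android.application")
    id("com.google.gms.google-services")     // Firebase configuration
    id("com.google.firebase.firebase-perf")  // enables automatic performance traces
}

dependencies {
    // The Firebase BoM keeps the individual Firebase library versions aligned.
    implementation(platform("com.google.firebase:firebase-bom:32.7.0"))
    implementation("com.google.firebase:firebase-perf")
}
```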
We are going to focus on one of the most important flows of any app: its start-up flow. More specifically, we are going to focus on a screen that performs certain setup and configuration tasks for our user, such as the Splash screen. Let's see what information FPM automatically provides us, and what conclusions we can draw from it.
The first automatic metric offered by FPM is the app start-up time, i.e. from the moment the app process is created (or brought to the foreground if it already exists) until the user can interact with it. The tool returns this duration in milliseconds, along with a percentage value comparing the current start with the average of the last X days.
Another possibility FPM offers is automatically detecting how well the different screens within our app render. In this case, for example, we could be interested in the Splash screen, which performs certain initial user configuration tasks with several requests running in parallel and is therefore more demanding in terms of processing. We can see how it renders and reacts to changes on different devices (an older one such as the Redmi 9 and a more modern and powerful one such as the OnePlus 9 Pro 5G) and how it responds in both cases.
In this case, it is easy to see that the most modern and powerful phone has hardly any slow frames, while the older phone has a much higher percentage, so we could investigate how to reduce this load on those devices. It remains to be seen whether this impact is due to poor management of the interface, with a more complex view tree than desired (unlikely in a Splash screen) or the need to redraw too many times, or whether it is due to the heavy processing inherent to that screen (in the next step we will see how to use a custom trace to check this).
Finally, FPM also automatically collects information about the network requests we make in the application. For each of these requests, we have an initial dashboard that shows us some basic metrics such as the average response time, the response size, the request payload size (if any), and the success rate of the request. Let's take a look at the data we get from the feed request, an expensive request that is made at the end of the app's start flow, on our main screen.
Several conclusions can be drawn from this dashboard. Regarding the size of the response, there is nothing particularly remarkable in this case. However, the success rate does show very positive data: this request has a reasonably low error rate, which is crucial in a marketplace. If we go into detail, we can also see which errors are the most recurrent (in this case 504 and 401) and, as always, filter by OS version, device, app version, country, etc.
Finally, the last piece of data that we can observe in this initial dashboard is the average time of the request. At first glance it may seem a high value, so we're going to go into detail on this metric, again filtering by several sessions with different characteristics.
One session comes from an older, lower-end device such as the Oppo A74 5G (CPH2197), and another from a more modern, premium terminal such as the Samsung Galaxy S22 Ultra. In this case, both sessions are without WiFi and share the same country and carrier (Movistar).
Next, we will compare this data with a session from a high-end device such as the Samsung Galaxy S21 Ultra 5G, but with a different country (Australia) and carrier (Singtel Optus), and therefore with a different latency from the two previous ones.
As the stats reflect, the times for this request are spread over a fairly wide spectrum, with values ranging between 936 ms and 2.51 s for the Australian sessions, between 337 ms and 1.23 s for the Spanish sessions with the low-mid range device, and between 148 ms and 1.24 s for the Spanish sessions with the premium device. It is clear that, although there are differences between sessions of different device ranges and technical specifications, these are practically negligible, and latency is a much more decisive factor for performance.
Regardless of all this, it is equally clear that this is a very demanding request. But why such large differences for the same terminal and in similar conditions? Precisely because of the high cost of this request and given how little the Fever feed changes in a short period of time, we use a server cache that lightens subsequent feed requests from the same user within that time window. So if we wanted to evaluate this request without the cache, we should focus on the 75th percentile and above. In that case, and looking at the captured times, FPM would be showing that we have some room for improvement regarding this request, and we could start working on it.
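As a side note, FPM captures these network metrics automatically for requests made with OkHttp or HttpURLConnection. If a request goes through a client it does not instrument, it can still be recorded manually with an HttpMetric. A minimal sketch (the URL, payload sizes and response code are placeholders):

```kotlin
import com.google.firebase.perf.FirebasePerformance
import com.google.firebase.perf.metrics.HttpMetric

// Sketch of a manually recorded network metric. The URL, sizes and response
// code below are placeholders; in our real app the feed request is captured
// automatically through OkHttp.
fun trackCustomFeedRequest() {
    val metric: HttpMetric = FirebasePerformance.getInstance()
        .newHttpMetric("https://api.example.com/feed", FirebasePerformance.HttpMethod.GET)

    metric.start()
    // ... execute the request with a client FPM does not instrument ...
    metric.setRequestPayloadSize(0L)
    metric.setResponsePayloadSize(128_000L)
    metric.setHttpResponseCode(200)
    metric.stop()
}
```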
One step further: Custom traces
As previously mentioned, in addition to the traces and metrics that FPM offers automatically, we can monitor specific parts of the code by adding custom traces. This can be useful, for example, when part of the code is more complex than desired and we believe it may even be affecting the rendering times of a screen. Let's see an example with the Splash screen we mentioned before.
As we know, to avoid the effect known as jank (or hitches), that is, the stutter users experience when our app cannot keep up with the refresh rate needed to offer a smooth user experience, the screens of your app should render each frame in 16 ms or less. This ensures a refresh rate of 60 fps (60 Hz). When more than 50% of the frames of a screen take longer than that to render, we speak of "slow rendering".
However, when the rendering time of a frame exceeds 700 ms, we are talking about an even more serious problem: "frozen frames". This is not as critical as an ANR, but it does offer a poor user experience, as the screen spends close to a second without responding to events. It can be caused by several things, one possibility being long or complex processing on the UI thread, the thread dedicated to rendering the visual components. If we have doubts about whether processing performed on the main thread is affecting the performance of our app, we can easily measure it.
This could be the code we would add to get some data about that part of the flow.
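A simplified sketch of what it looks like is shown below: the ViewModel shape and the suspend helpers are hypothetical placeholders, while the trace name and attribute are the ones described next.

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.firebase.perf.FirebasePerformance
import kotlinx.coroutines.async
import kotlinx.coroutines.launch

class SplashViewModel : ViewModel() {

    fun configureUser() {
        viewModelScope.launch {
            // Custom trace wrapping the whole user configuration step.
            val trace = FirebasePerformance.getInstance().newTrace("splash_config_trace")
            trace.start()

            // The three setup jobs run in parallel.
            val config = async { fetchUserConfig() }
            val flags = async { fetchFeatureFlags() }
            val session = async { refreshSession() }

            val initialized = config.await() && flags.await() && session.await()

            // Tag the trace so the metric can be split by outcome in the FPM console.
            trace.putAttribute("initialized", initialized.toString())

            if (initialized) {
                // navigate to the home screen
            } else {
                // fall back to a retry / error state
            }

            trace.stop()
        }
    }

    // Hypothetical suspend functions standing in for the real setup requests.
    private suspend fun fetchUserConfig(): Boolean = true
    private suspend fun fetchFeatureFlags(): Boolean = true
    private suspend fun refreshSession(): Boolean = true
}
```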
In this case we see that we have a coroutine, so the thread will be suspended instead of blocked. Inside this coroutine, three jobs are launched asynchronously in parallel, and depending on the values returned by the different deferreds, one block of code or another is executed. To measure how expensive this whole process is, we have created a custom trace called "splash_config_trace", to which we add an attribute "initialized" with the value "true" or "false" depending on the code block that is executed.
As we can see, this job takes an average of 1.3 s, so we could try to run the coroutine on a dispatcher other than the default one, off the main thread, and see if this makes the screen perform more efficiently.
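For instance, a possible variant of the previous sketch (using the same hypothetical helpers) would keep only the trace bookkeeping on the calling thread and move the heavy work to a background dispatcher:

```kotlin
import com.google.firebase.perf.FirebasePerformance
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.withContext

// Variant of the previous sketch: the heavy work moves to a background
// dispatcher so the main thread stays free to render the Splash screen.
suspend fun configureUserOffMainThread(): Boolean {
    val trace = FirebasePerformance.getInstance().newTrace("splash_config_trace")
    trace.start()

    val initialized = withContext(Dispatchers.Default) {
        // withContext exposes a CoroutineScope, so the jobs still run in parallel.
        val config = async { fetchUserConfig() }
        val flags = async { fetchFeatureFlags() }
        val session = async { refreshSession() }
        config.await() && flags.await() && session.await()
    }

    trace.putAttribute("initialized", initialized.toString())
    trace.stop()
    return initialized
}

// Hypothetical suspend functions standing in for the real setup requests.
private suspend fun fetchUserConfig(): Boolean = true
private suspend fun fetchFeatureFlags(): Boolean = true
private suspend fun refreshSession(): Boolean = true
```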
Possible disadvantages
Although there are a great many advantages to using this kind of performance monitoring to improve control over the different critical flows within your app, it can also come with certain disadvantages that must be assessed before adopting a third-party tool or developing our own in-house solution. The main ones are the increase in your project's compilation time and the increase in the size of your app. In our case, the build time on our CI/CD platform increased by just under 1 minute, while the size of the .apk increased by 0.3 MB, so it was not a big problem for us. If these issues are bigger in your case, you can check this great article by Andrii Chubko, which offers some possible solutions.
Likewise, totally free solutions like FPM usually offer poor or even no user support. However, the technical documentation they provide is usually quite complete, and they tend to have a lot of support from the community.
Conclusion and closure
App performance monitoring solutions, whether third-party or in-house, are indispensable nowadays in a world so competitive that every detail, both in terms of UX and performance, becomes essential for user retention. This type of tool helps prevent errors and gives us information to detect points in our processes that do not work as expected, but above all it allows us to iterate on our product with the aim of achieving continuous evolution based on data and metrics.
Throughout this article, we have given a brief introduction to these monitoring tools, what they consist of, why they are vitally important, what options we have on the market and what drawbacks they may have in certain cases. We have also seen a real case study of how, by using one of these tools, we can detect possible points for improvement in one of our flows.
I would like to give special thanks to my colleagues Daniel Cuesta, Justin Howlett, Sergio Vasco, Carlos Salguero, Guillermo Cuesta, Jose Miguel Álvarez, Alexandru Turda, Jose Moreno, Miguel González and Marco Sparagna and, in general, to all the Fever engineering team for their confidence and inspiration on a daily basis.
I hope you found this article useful, and don’t hesitate to try some APM solutions in your projects. In addition, if you have found it interesting and would like to work with us and find out more about our engineering department, follow this link.
And of course, keep on coding! ;)