Improving Android app performance with Baseline Profiles

Azamat Cherchesov
Kaspersky
Jul 3, 2024 · 7 min read

Performance is a core element of user experience, which is why developers make every effort to speed up their applications. For security software, initialization time is vital: a couple of seconds’ delay can be critical for the user. That’s why at Kaspersky we closely monitor the performance of our applications, analyze the impact of new features on key metrics, and regularly improve them.

In the majority of cases, better performance is achieved through source code optimization: first, the developer uses metrics, utilities and tools to locate bottlenecks, then explores the code to find and fix issues. Sometimes Google provides additional optimization options. One such feature, the Baseline Profile, was announced quite recently. Enhancements like this are great for developers and applications alike, so we eagerly got to grips with the new offering and applied it to our project.

In this post, I want to share our experience and results. First, let’s briefly recap the types of compilation in Android, and understand the principle on which this optimization is based. Next, we’ll go through the step-by-step guide for integration into our project, and look at the results. At the end, I’ll talk about our further steps and plans.

How it works

To understand the principle behind Baseline Profiles, let’s recall the types of compilation in Android. Building an application produces an .apk file that contains .dex files. Inside these is bytecode that the runtime’s interpreter understands. Android Runtime translates this bytecode into machine code, and the conversion can happen in several ways.

In the Dalvik era, devices shipped with little random access memory (RAM), so optimization was aimed at reducing memory usage. This meant just-in-time (JIT) compilation, that is, compilation at runtime: instead of compiling the whole application, only certain sections of code were compiled. And since all compilation took place at runtime, this had a negative impact on performance.

Dalvik's successor, ART, took a different approach: ahead-of-time (AOT) compilation. All the code is precompiled at application installation. As a result, we get a performance gain, but also a larger memory footprint and long application installation and system update times.

Partial compilation represents a compromise. By default, bytecode is interpreted and JIT-compiled at runtime, but frequently used sections of code are AOT-compiled. AOT compilation is done by the dex2oat utility, which saves the output as .oat binaries. The result is a hybrid compilation scheme.

(Diagram of the hybrid JIT/AOT compilation flow. Original: https://source.android.com/devices/tech/dalvik/jit-compiler)

But this scheme has problems too. AOT compilation occurs after some time (after running the application several times and going through a number of user scenarios). Therefore, initial impressions may be spoiled by low performance. And it’s often this first experience that shapes the user’s attitude toward the application. That’s why ART now offers profiles. Two types are available.

The first is a cloud-based profile. During application use, Google collects and analyzes runtime scenarios, finds frequently used areas and sends this information to the server. This data is averaged across multiple users to create a single profile that is delivered to new users for AOT compilation at first installation.

The second is a Baseline Profile. This lets you tell the compiler which sections of code you want precompiled on user devices.
These profiles form the initial compilation cache, which speeds up startup time and improves performance from the very first launch.
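For reference, a Baseline Profile is a plain-text list of rules (baseline-prof.txt). Each method rule is prefixed with flags — H (hot), S (used at startup), P (used after startup) — followed by a JVM-style descriptor; a bare class rule asks for the class to be preloaded. A fragment might look like this (the class names are hypothetical):

```
HSPLcom/example/app/MainActivity;->onCreate(Landroid/os/Bundle;)V
HSPLcom/example/app/StartupInitializer;->init(Landroid/content/Context;)V
Lcom/example/app/FeatureRegistry;
```

The first two rules mark methods as hot and used both at and after startup; the last one marks a class for preloading.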

Integration into the product

Step-by-step instructions for generating a Baseline Profile are available in the official Android documentation. In brief, the process is as follows: generate a profile, extract it, add it to the project, rebuild the application, and measure the results.

To generate and extract a profile, you first need to get root access and put a userdebug build on the device. Google says that profile generation and measurement can be done on an AOSP emulator, but when it comes to improving performance, I always prefer a real device. That said, if you don't have a suitable physical device, take Google's advice and create an emulator without the Google APIs. If problems arise, the profile can be generated on the emulator and measured on a real device.

The guide explains how to generate a profile for the most basic scenario—"tap the Home button, start the application and wait for it to open." What we have here, in fact, is an instrumented test that can be customized to reproduce various user scenarios. I recommend using the guide's baseline scenario for the first measurements, and implementing your own tests for further improvements.
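That basic scenario can be sketched as an instrumented test built on the AndroidX Macrobenchmark library. This is only a sketch under stated assumptions: it runs solely as an instrumented test on a rooted device or AOSP emulator, and the package name is illustrative.

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator {
    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun generate() = rule.collect(
        packageName = "com.example.myapp" // replace with your applicationId
    ) {
        // The guide's basic scenario: go home, cold-start the app, wait for it to open.
        pressHome()
        startActivityAndWait()
        // Extend this block with your own user journeys for broader coverage.
    }
}
```

The lambda body is where custom scenarios go: each interaction you script here adds the code paths it exercises to the generated profile.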

To generate a profile, you need to add the ProfileInstaller library as a dependency. At the time of writing, it is in beta, and the profile generation instructions are incomplete.
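Assuming a standard Gradle setup, wiring in the library looks roughly like this (Kotlin DSL; the version shown is illustrative — check the current release):

```kotlin
// app/build.gradle.kts
dependencies {
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")
}
```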

For example, there is no information about how this optimization interacts with obfuscators, in particular ProGuard. This was one of the first pitfalls we encountered. Having generated our first profile, we added it to the project and rebuilt the code, but saw no gain in performance, which puzzled us. According to the guide, the baseline-prof.txt file should contain the methods and classes recommended for AOT precompilation. On closer inspection of the profile contents, we noticed that most of our classes were obfuscated. Profile generation should be performed after adding the -dontobfuscate flag to the ProGuard rules. I found a related request on Google's Issue Tracker, and hope this information gets added to the main guide soon.

My tip would be to do the first measurements without obfuscating the application at all. The fewer additional steps there are in the build, the easier it is to achieve the expected result at intermediate stages. The downside is extra work and a longer timeline for the task as a whole, so weigh the trade-off yourself.
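A minimal sketch of that fix, assuming a dedicated ProGuard rules file applied only to the build variant used for profile generation (the file name is illustrative):

```
# proguard-benchmark-rules.pro: extra rules for the profile-generation build only.
# Keep original class and method names so profile entries match the shipped code.
-dontobfuscate
```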

After generating a profile, I recommend reviewing and analyzing it; you may want to adjust some of its entries. Google advises against overloading the profile: according to the official recommendation, a Baseline Profile should not exceed 1.5 MB in compressed form.

The final measurements can be made either by automated testing or manually. If testing manually, I recommend checking the dexopt status after each installation:

adb shell dumpsys package dexopt | grep -A 1 $PACKAGE_NAME

The output will help you understand what type of compilation was used, and catch any errors in your actions before measuring starts. There are four compilation statuses; see the ART documentation for details. For an installation with a Baseline Profile, the status should be speed-profile, which indicates that the profile was successfully applied.
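If you automate this check, the status field is easy to pull out of the dumpsys output. A small sketch in Kotlin (the sample output line is a simplified, assumed format of what `dumpsys package dexopt` prints):

```kotlin
// Extracts the value of the [status=...] field from `dumpsys package dexopt` output.
// Returns null if no status field is present.
fun parseDexoptStatus(dumpsysOutput: String): String? {
    val regex = Regex("""status=([\w-]+)""")
    return regex.find(dumpsysOutput)?.groupValues?.get(1)
}

fun main() {
    // Simplified sample of what the dumpsys output for one ABI looks like.
    val sample = """
        [com.example.app]
          arm64: [status=speed-profile] [reason=install-dm]
    """.trimIndent()
    println(parseDexoptStatus(sample)) // prints "speed-profile"
}
```

In a CI script, anything other than speed-profile after installing with a Baseline Profile is a signal to stop and investigate before measuring.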

Once the application is installed, the compilation status can be changed by recompiling the package with the desired type specified in the arguments. Remember that you need root to run these commands. As an experiment, I recommend trying the different compilation modes:

Profile-based:

adb shell cmd package compile -m speed-profile -f my-package

This command starts compilation of the specified package with the profile applied.

Full:

adb shell cmd package compile -m speed -f my-package

This command starts compilation of all methods for the specified package.

Reset:

adb shell cmd package compile --reset my-package

This command resets compilation for the specified package.

Our results

Before moving on to our results, let’s explain what, why and how we measured, what additional code protection mechanisms we use, and why this is important.

Besides the recommended Time to initial display (TTID) metric, we also looked at the time taken to fully initialize the application. By full initialization we mean the time from calling the Application#onCreate() method to the end of the chain of initialization of all features and the starting logic for checking license status, authorization, and so on. This initialization does not happen on the main thread, but its duration determines when the application’s features become functional and current statuses are displayed.
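To make the metric concrete, here is a simplified, plain-Kotlin sketch of how such a "time to full initialization" measurement can be recorded: a timer starts at the beginning of onCreate-like setup and stops when the background initialization chain finishes. The class and the stand-in initializers are hypothetical, not our actual implementation.

```kotlin
import java.util.concurrent.Executors

// Records the elapsed time from construction (start of app setup)
// to the end of the background initialization chain.
class InitTimer {
    private val start = System.nanoTime()

    @Volatile
    var fullInitMillis: Long = -1
        private set

    fun onInitChainFinished() {
        fullInitMillis = (System.nanoTime() - start) / 1_000_000
    }
}

fun main() {
    val timer = InitTimer() // created at the start of initialization
    val executor = Executors.newSingleThreadExecutor()
    executor.submit {
        // Stand-ins for real feature initializers (license check, auth, etc.)
        Thread.sleep(50)
        timer.onInitChainFinished()
    }.get()
    executor.shutdown()
    println("Full init took ${timer.fullInitMillis} ms")
}
```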

Above we talked about code obfuscation. Our project uses an additional code protection mechanism that obfuscates and encrypts important classes and files whose logic we decided needed extra protection. As a result, some classes are re-obfuscated, and some are removed from the .dex files and loaded at runtime. This means that the Baseline Profile will not be applied in full. The table is therefore split along two axes: with/without Baseline Profile and with/without additional protection. It shows measurements for our metrics: Time to full background initialization and Time to initial display. The second-to-last row shows the average values in bold, and the last row the performance gain compared with the corresponding metrics in the right-hand columns.

Results for build without additional code protection:

Speedup gain of 12% and 33% for App Init and App startup, respectively.

Results for build with additional code protection:

Speedup gain of 7% and 17% for App Init and App startup, respectively.

What next?

Performance improvement is often a long and low-yield process. But Baseline Profiles gave us a decent boost for a small investment.

After these pretty good local results, we plan to roll out the changes and look at the metrics on user devices. Those results may be worse than the local ones for various reasons, one of them being the Cloud Profile, so we will analyze how Cloud and Baseline Profiles intersect. We also plan to include profile generation in our CI/CD pipeline to get an up-to-date profile for each new release build. This is important for projects under active development: profiles that are not updated periodically become less effective, because user scenarios and class/method names both change over time. If you’re interested and want to explore the topic deeper, experiment with different scenarios and screens, find weak spots and speed up our application even more, come and join our team :)

Useful links:

Implementing ART Just-In-Time (JIT) Compiler
Baseline Profiles guide
Writing a Macrobenchmark
