Profile guided optimization for native Android applications
Posted by Pirama Arumuga Nainar, Software Engineer
Profile-guided optimization (PGO) is a well-known compiler optimization technique. In PGO, runtime profiles from a program's executions are used by the compiler to make optimal choices about inlining and code layout. This leads to improved performance and reduced code size. Developers can now leverage Google's toolkit to easily deploy PGO tools and improve their native Android apps.
On selected Android system components, enabling PGO improved performance by 6–8%. PGO also provided code-size improvements in one component while slightly increasing the code size of the other two components.
PGO can be deployed to your application or library with the following steps:
- Identify a representative workload.
- Collect profiles.
- Use the profiles in a Release build.
Step 1: Identify a Representative Workload
First, identify a representative benchmark or workload for your application. This is a critical step as the profiles collected from the workload identify the hot and cold regions in the code. When using the profiles, the compiler will perform aggressive optimizations and inlining in the hot regions. The compiler may also choose to reduce the code size of cold regions while trading off performance.
Identifying a good workload is also beneficial for tracking performance in general.
Step 2: Collect Profiles
The profiles are collected by running the workload from step 1 on an instrumented build of the application. To generate an instrumented build, add -fprofile-generate to the compiler and linker flags. This flag should be controlled by a separate build variable, since it is not needed during a default build.
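One way to gate the flag behind a build variable, sketched here for a CMake-based NDK build (the PGO_INSTRUMENT option name is an assumption, not from the post):

```cmake
# Pass -DPGO_INSTRUMENT=ON only when producing the profiling build.
option(PGO_INSTRUMENT "Build with PGO instrumentation" OFF)
if(PGO_INSTRUMENT)
  # The flag must reach both the compiler and the linker.
  add_compile_options(-fprofile-generate)
  add_link_options(-fprofile-generate)
endif()
```

A default build, with the option left OFF, is unaffected.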
Profiles are collected when the instrumented binary runs, and are written to a file at exit. However, functions registered with atexit are not called in an Android app; the app just gets killed. The application or workload therefore has to trigger a profile write explicitly by calling the __llvm_profile_write_file function provided by the compiler runtime.
Writing the profile file is simpler if the workload is a standalone binary: just set the LLVM_PROFILE_FILE environment variable to the desired output path before running the binary.
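A sketch of such a run on a device (the benchmark binary name and output path are placeholders):

```shell
# %p expands to the process ID, so concurrent runs do not clobber each other.
LLVM_PROFILE_FILE=/data/local/tmp/mybench-%p.profraw ./mybench
```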
The profile files are in the .profraw format. Use the llvm-profdata utility in the NDK to convert them from .profraw to the .profdata format, which can then be passed to the compiler. Use llvm-profdata and clang from the same NDK release, to avoid a version mismatch of the profile file formats.
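The conversion step can be sketched as the following command (the file names are placeholders):

```shell
# Merge one or more raw profiles into a single indexed profile.
llvm-profdata merge -output=myapp.profdata myapp-*.profraw
```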
Step 3: Use the Profiles to Build the Application
Use the profile from the previous step during a release build of your application by passing -fprofile-use=<>.profdata to the compiler and linker. The profiles can be used even as the code evolves, since the Clang compiler can tolerate slight mismatches between the source and the profiles.
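A hedged sketch of wiring the profile into a CMake-based release build (the PGO_PROFILE_FILE variable name is an assumption):

```cmake
# Point this cache variable at the .profdata produced by llvm-profdata.
set(PGO_PROFILE_FILE "" CACHE FILEPATH "Merged .profdata for -fprofile-use")
if(PGO_PROFILE_FILE)
  # As with instrumentation, the flag goes to both compiler and linker.
  add_compile_options(-fprofile-use=${PGO_PROFILE_FILE})
  add_link_options(-fprofile-use=${PGO_PROFILE_FILE})
endif()
```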
dex2oat is Android's on-device AOT compiler. To get a representative workload for dex2oat, we randomly selected 25 of the top 100 most-installed apps in the Play Store. We also randomly generated dex2oat's compilation options.
To generate PGO profiles, we built a PGO-instrumented dex2oat binary and used it to compile the workload. We then generated a release build of dex2oat that uses these PGO profiles and evaluated performance gains on the remaining 75 of the 100 most-installed apps.
We leveraged the test infrastructure available to the Android team to automate the collection of these PGO profiles so they can be easily kept up-to-date.
PGO is a very useful performance optimization technique. After an initial setup of workloads and integration in the build process, it delivers impressive performance improvements with minimal upkeep.
Here are a few other topics that can help improve performance of Android apps: