Profile guided optimization for native Android applications
Posted by Pirama Arumuga Nainar, Software Engineer
Profile-guided optimization (PGO) is a well-known compiler optimization technique. In PGO, runtime profiles from a program’s executions are used by the compiler to make better choices about inlining and code layout, leading to improved performance and reduced code size. Developers can now leverage Google’s toolkit to easily deploy PGO tools and improve their native Android apps.
On three selected Android system components, enabling PGO improved performance by 6–8%. PGO also reduced the code size of one component while slightly increasing the code size of the other two.
PGO can be deployed to your application or library with the following steps:
1. Identify a representative workload.
2. Collect profiles.
3. Use the profiles in a release build.
Step 1: Identify a Representative Workload
First, identify a representative benchmark or workload for your application. This is a critical step, as the profiles collected from the workload identify the hot and cold regions of the code. When using the profiles, the compiler performs aggressive optimizations and inlining in the hot regions. It may also choose to shrink cold regions, trading a little performance for code size.
Identifying a good workload is also beneficial in general: it doubles as a benchmark for tracking performance over time.
Step 2: Collect Profiles
The profiles are collected by running the workload from step 1 on an instrumented build of the application. To generate an instrumented build, add `-fprofile-generate` to the compiler and linker flags. This flag should be gated behind a separate build variable, since it is not needed during a default build.
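For example, in a CMake-based NDK build, the instrumentation could be gated behind a custom option; the variable name here is illustrative:

```cmake
option(ANDROID_PGO_INSTRUMENT "Build with PGO instrumentation" OFF)

if(ANDROID_PGO_INSTRUMENT)
  # -fprofile-generate must reach both the compiler and the linker,
  # so the profile runtime gets linked into the binary.
  add_compile_options(-fprofile-generate)
  add_link_options(-fprofile-generate)
endif()
```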
Profiles are collected when the instrumented binary runs, and are written to a file at exit. However, functions registered with `atexit` are never called in an Android app; the app process is simply killed. The application or workload therefore has to trigger a profile write explicitly by calling the `__llvm_profile_write_file` function.
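A minimal sketch of such a trigger is below. The weak declaration is an assumption made so the file also builds in non-instrumented configurations, and where you hook the call (a JNI callback, the end of the benchmark, etc.) is up to your app:

```cpp
#include <cstdio>

// Provided by the profile runtime that clang links in when building
// with -fprofile-generate. Declared weak so this file also builds and
// runs in non-instrumented configurations.
extern "C" int __llvm_profile_write_file(void) __attribute__((weak));

// Call this at a natural "end of workload" point, since atexit
// handlers never run before an Android app process is killed.
bool flush_pgo_profile() {
    if (__llvm_profile_write_file == nullptr) {
        std::fprintf(stderr, "not an instrumented build; nothing to write\n");
        return false;  // profile runtime not linked in
    }
    return __llvm_profile_write_file() == 0;  // 0 indicates success
}
```

In an instrumented build the profile runtime provides the symbol, so the call writes the profile to the configured path.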
Writing the profile file is simpler if the workload is a standalone binary: just set the `LLVM_PROFILE_FILE` environment variable to the desired output path before running the binary.
The profile files are in the `.profraw` format. Use the `llvm-profdata` utility in the NDK to convert from `.profraw` to `.profdata`, which can then be passed to the compiler. Use `llvm-profdata` and `clang` from the same NDK release to avoid a version mismatch in the profile file format.
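Putting the collection steps together for a standalone workload, the commands might look like the following sketch (the binary name and paths are illustrative, and `llvm-profdata` is the one shipped in the NDK toolchain):

```shell
# Run the instrumented workload; the raw profile is written to the
# path in LLVM_PROFILE_FILE when the binary exits normally.
LLVM_PROFILE_FILE=workload.profraw ./instrumented_workload

# Merge one or more raw profiles into the indexed format the
# compiler consumes. Use llvm-profdata from the same NDK as clang.
llvm-profdata merge -output=workload.profdata workload.profraw
```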
Step 3: Use the Profiles in a Release Build
Use the profile from the previous step during a release build of your application by passing `-fprofile-use=<>.profdata` to the compiler and linker. The profiles can be used even as the code evolves; the Clang compiler tolerates slight mismatches between the source and the profiles.
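In a CMake-based build, the release configuration might consume the merged profile like this (the gating variable and profile path are illustrative):

```cmake
option(ANDROID_PGO_USE "Build with PGO profiles" OFF)

if(ANDROID_PGO_USE)
  # Pass the merged profile to both the compiler and the linker.
  add_compile_options(-fprofile-use=${CMAKE_SOURCE_DIR}/pgo/workload.profdata)
  add_link_options(-fprofile-use=${CMAKE_SOURCE_DIR}/pgo/workload.profdata)
endif()
```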
Case Study
`dex2oat` is Android’s on-device AOT compiler. To get a representative workload for `dex2oat`, we randomly selected 25 of the top 100 most-installed apps in the Play Store, and we randomly generated `dex2oat`’s compilation options.
To generate PGO profiles, we built a PGO-instrumented `dex2oat` binary and used it to compile the workload. We then produced a release build of `dex2oat` that uses these PGO profiles and evaluated the performance gains on the remaining 75 of the 100 most-installed apps.
We leveraged the Android team’s test infrastructure to automate the collection of these PGO profiles so they can easily be kept up to date.
Conclusion
PGO is a very useful performance optimization technique. After an initial setup of workloads and integration in the build process, it delivers impressive performance improvements with minimal upkeep.
Here are a few other topics that can help improve performance of Android apps:
- Link-time optimization (LTO): combining LTO with PGO yields better results than either alone.
- Cloud Profiles for Java apps