Improving app startup with I/O prefetching
In Android 11, we introduced IORap, a new feature which greatly improves application startup times. We have observed that apps start more than 5% faster (cold startup) on average across a variety of devices. Some hero cases show 20%+ faster startup times. Users get this additional performance without any developer app changes!
IORap prefetching for Android apps
IORap reduces app startup times by predicting which I/O will be required and performing it ahead of time. During many app startups, blocking I/O leaves the I/O request queue unsaturated for long stretches, so the device's available I/O throughput goes unused. After IORap prefetches the data and compacts the I/O, the app can access this data nearly instantly from the pagecache, significantly reducing app startup latency.
When we evaluated popular apps from the Play Store, 80%+ of them spent 10%+ of their launch time in blocking I/O, and ~50% spent 20%+. A majority of the apps we looked at could benefit from IORap.
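To put those percentages in perspective, here is a rough back-of-the-envelope sketch. It assumes, optimistically, that prefetching removes all blocking I/O time from startup, which gives an upper bound rather than an expected result. The function name and the example numbers are illustrative and not part of IORap.

```python
def best_case_startup_ms(total_ms: float, blocking_io_fraction: float) -> float:
    """Upper bound on the improved startup time.

    Assumes (optimistically) that every millisecond spent in blocking
    I/O disappears once the data is already in the pagecache.
    """
    if not 0.0 <= blocking_io_fraction <= 1.0:
        raise ValueError("blocking_io_fraction must be between 0 and 1")
    return total_ms * (1.0 - blocking_io_fraction)


# Hypothetical app: 1000 ms cold start, 20% of it blocked on I/O.
improved = best_case_startup_ms(1000.0, 0.20)
saved = 1000.0 - improved
```

Real gains will be smaller than this bound, because prefetching itself takes time and some I/O cannot be predicted.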
IORap works as an independent service on the device. It interacts with the package manager, activity manager, perfetto service, etc. via IPC. The overall architecture of IORap is shown in the following figure:
Step 1: Collecting perfetto traces
IORap uses a profiling-based strategy to determine which I/O to prefetch. The knowledge comes from the perfetto trace, which records kernel pagecache page additions and removals (from ftrace). During the first several cold runs of an app, perfetto tracing is turned on to capture the pagecache miss events. Our study shows that the overhead of perfetto tracing on startup time is negligible.
Step 2: Generating prefetch list
Based on the perfetto traces obtained in the prior step, IORap generates a prefetch list while the device is idle. The prefetch list describes the file ranges (name, offset, length) that the app accessed when it was launched. IORap analyzes the mm_filemap events from the perfetto trace and converts each result (inode, offset, length) to (name, offset, length) by resolving the inode back to its filename. Data is then stored in the prefetch list, which is a
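The conversion described in this step can be sketched roughly as follows. This is a minimal illustration, not IORap's actual implementation: the function name, the in-memory inode-to-name dictionary, and the merging of adjacent ranges are all assumptions made for the example.

```python
from collections import defaultdict


def build_prefetch_list(events, inode_to_name):
    """Turn (inode, offset, length) events into (name, offset, length) entries.

    events:        iterable of (inode, offset, length) tuples (hypothetical
                   shape for the data extracted from the trace).
    inode_to_name: dict mapping inode numbers to file paths (assumed to be
                   available; how it is built is out of scope here).
    Overlapping or adjacent ranges in the same file are merged so that
    each file is described by as few entries as possible.
    """
    ranges_by_file = defaultdict(list)
    for inode, offset, length in events:
        name = inode_to_name.get(inode)
        if name is None:
            continue  # inode could not be resolved to a file; skip it
        ranges_by_file[name].append((offset, offset + length))

    prefetch_list = []
    for name in sorted(ranges_by_file):
        ranges = sorted(ranges_by_file[name])
        start, end = ranges[0]
        for next_start, next_end in ranges[1:]:
            if next_start <= end:  # overlaps or touches the current range
                end = max(end, next_end)
            else:
                prefetch_list.append((name, start, end - start))
                start, end = next_start, next_end
        prefetch_list.append((name, start, end - start))
    return prefetch_list
```

For example, two back-to-back 4096-byte pages in the same file collapse into a single 8192-byte entry, while a page further along in the file stays a separate entry.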
Step 3: I/O prefetching
After the prefetch list is generated, IORap can prefetch the corresponding data on subsequent runs of the app. perfetto tracing is no longer needed, and neither the user nor the developer has to do anything. Prefetching is performed when the user taps the app icon, or indirectly when another app launches it via an Intent. Enjoy the speedup!
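As an illustration of what "prefetching" a list of (name, offset, length) entries can look like, here is a minimal sketch. It uses the standard `posix_fadvise` hint with `POSIX_FADV_WILLNEED` where the platform offers it, and falls back to a plain read otherwise. This is one common way to warm the pagecache, chosen for the example; the text above does not say which mechanism IORap itself uses, and the function name is hypothetical.

```python
import os


def prefetch(entries):
    """Hint the kernel to load each (path, offset, length) range.

    Returns the number of bytes for which a prefetch was requested.
    Files that can no longer be opened are skipped silently.
    """
    requested = 0
    for path, offset, length in entries:
        try:
            f = open(path, "rb")
        except OSError:
            continue  # file was removed or is unreadable; nothing to do
        with f:
            if hasattr(os, "posix_fadvise"):
                # Asynchronous hint: the kernel may start reading the
                # range into the pagecache without blocking the caller.
                os.posix_fadvise(f.fileno(), offset, length,
                                 os.POSIX_FADV_WILLNEED)
            else:
                # Fallback for platforms without posix_fadvise:
                # a blocking read also populates the pagecache.
                f.seek(offset)
                f.read(length)
        requested += length
    return requested
```

Issuing all the hints up front, rather than waiting for the app to block on each read, is what lets the I/O queue stay busy.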
Step 4: Obsoleting the prefetch list
The prefetch list doesn’t live forever. Several events can make it obsolete. When an app is updated, the prefetch list is invalidated because the app's files may have changed and the previously recorded data may be inaccurate. Also, the dexopt service can optimize the app after installation. Once the app is optimized, the file layout may differ, which also makes the prefetch list obsolete. An obsolete prefetch list is removed, and a new round starts with perfetto trace collection.
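The lifecycle above amounts to a simple check before each launch: if the app has changed since the prefetch list was recorded, discard the list and go back to tracing. The sketch below models this with two hypothetical fields, a version code and a marker that changes whenever the app is re-optimized. Both the fields and the function names are assumptions for illustration, not IORap's real bookkeeping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppState:
    version_code: int      # changes when the app is updated
    compilation_id: str    # hypothetical marker that changes after dexopt


def is_obsolete(recorded: AppState, current: AppState) -> bool:
    """A prefetch list is obsolete if the app changed since it was recorded."""
    return recorded != current


def next_action(recorded, current: AppState) -> str:
    """Decide what to do on the next launch.

    recorded is the AppState stored with the prefetch list, or None if
    no prefetch list exists yet.
    """
    if recorded is None or is_obsolete(recorded, current):
        return "trace"     # collect new perfetto traces
    return "prefetch"      # the existing prefetch list is still valid
```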
Improvements & Observation
Collating results from several experiments in our lab, we determined that IORap's benefits span the spectrum from low-end to high-end devices. In our experiments, IORap provided up to ~26% speedup. It’s especially helpful for apps that do heavy I/O during startup. For example, Spotify shows double-digit percentage improvement on both low-end devices (Go and Pixel 3A) and high-end devices (Pixel 3 or 4).
One interesting observation from the experiments is that the performance of IORap depends heavily on the amount of prefetched data, so an accurate trace duration is very important. If the trace duration is too short, less data than necessary is prefetched and the performance gain is smaller. If it is too long, more data than necessary is prefetched, which may even slow startup down in worst-case scenarios. IORap uses the timestamp at which an app calls reportFullyDrawn to estimate the trace duration. For apps that don't report this event, the display time is used instead. So invoking reportFullyDrawn at the right time can improve the performance of IORap.
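The choice of trace window described above can be sketched as follows. This is a simplified model under stated assumptions: timestamps are in milliseconds, events are hypothetical (timestamp_ms, inode, offset, length) tuples, and the function names are made up for the example.

```python
def trace_end_ms(display_time_ms, fully_drawn_ms=None):
    """Pick the end of the trace window.

    Prefer the time at which the app reported that it was fully drawn;
    fall back to the display time when the app never reported it.
    """
    return fully_drawn_ms if fully_drawn_ms is not None else display_time_ms


def events_in_window(events, launch_start_ms, end_ms):
    """Keep only the events recorded between launch start and window end."""
    return [e for e in events if launch_start_ms <= e[0] <= end_ms]


# Hypothetical launch: first frame displayed at 400 ms, but the app keeps
# loading content and only reports fully drawn at 900 ms.
events = [(100, 10, 0, 4096), (600, 10, 8192, 4096), (1500, 11, 0, 4096)]
without_report = events_in_window(events, 0, trace_end_ms(400))
with_report = events_in_window(events, 0, trace_end_ms(400, 900))
```

In this made-up launch, relying on the display time alone would miss the read at 600 ms, while the fully-drawn report captures it and still excludes the unrelated read at 1500 ms.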
We’re excited about the improvement that IORap has shown, and we plan to explore this concept further in two directions. First, prefetching more often: it would be great if prefetching could also be done during profiling. By providing a prebuilt prefetch list, we could close some of the performance gap that exists before the prefetch list is generated. Second, IORap could predict that an app is about to start and begin prefetching earlier, further speeding up startup.
You can help IORap out by invoking reportFullyDrawn when your app completes its startup. IORap mainly helps reduce the time spent blocked on I/O, so consider profiling your app startup for other possible performance issues.