Improving app startup with I/O prefetching

yanwang
Published in Android Developers
Jul 15, 2020

In Android 11, we introduced IORap, a new feature that greatly improves application startup times. We have observed that apps start more than 5% faster (cold startup) on average across a variety of devices. Some hero cases show 20%+ faster startup times. Users get this additional performance without developers having to change their apps!

IORap prefetching for Android apps

IORap reduces app startup times by predicting which I/O will be required and doing it ahead of time. During many app startups, the I/O request queue spends a lot of time unsaturated because the app issues blocking I/O. As a result, the available I/O bandwidth isn't fully used. Once IORap has prefetched the data and compacted the I/O, the app can access this data nearly instantly from the pagecache, significantly reducing app startup latency.

When we evaluated popular apps from the Play Store, 80%+ spent 10%+ of their launch time in blocking I/O, and ~50% spent 20%+. A majority of the apps we looked at could benefit from IORap.

IORap works as an independent service on the device. It interacts with the package manager, the activity manager, the perfetto service, etc. via IPC. The overall architecture of IORap is shown in the following figure:

Step 1: Collecting perfetto traces

IORap uses a profiling-based strategy to determine which I/O to prefetch. The knowledge comes from perfetto traces, which record the kernel's pagecache page additions and removals (from ftrace). During the first several cold runs of an app, perfetto tracing is turned on to capture pagecache misses. Our study shows that the overhead of perfetto tracing on startup time is negligible.
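To make the idea of collecting pagecache additions concrete, here is a minimal sketch that extracts them from a text rendering of ftrace output. It is an illustration only: real perfetto traces are binary protobufs, the sample lines are made up, the assumed line shape varies by kernel version, and the names EVENT_RE and parse_page_cache_adds are hypothetical.

```python
import re

# Assumed text shape of a pagecache "add" event. This is a simplification;
# actual traces are protobufs and the printed fields differ across kernels.
EVENT_RE = re.compile(
    r"(?P<ts>\d+\.\d+): mm_filemap_add_to_page_cache: "
    r"dev (?P<major>\d+):(?P<minor>\d+) ino (?P<ino>[0-9a-f]+) .*ofs=(?P<ofs>\d+)"
)


def parse_page_cache_adds(lines):
    """Return (timestamp, device, inode, byte_offset) for each page added."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # some other event; not a pagecache addition
        device = (int(m["major"]), int(m["minor"]))
        # The inode is assumed to be printed in hexadecimal.
        events.append((float(m["ts"]), device, int(m["ino"], 16), int(m["ofs"])))
    return events


# Made-up sample lines, for illustration only.
sample = [
    "app-1234 [001] .... 10.000100: mm_filemap_add_to_page_cache: "
    "dev 253:0 ino 1a2b page=0000000012345678 pfn=100 ofs=0",
    "app-1234 [001] .... 10.000250: sched_switch: prev_comm=app",
    "app-1234 [001] .... 10.000400: mm_filemap_add_to_page_cache: "
    "dev 253:0 ino 1a2b page=0000000012345679 pfn=101 ofs=4096",
]
events = parse_page_cache_adds(sample)
```

Each resulting tuple identifies a page by device, inode, and offset, which is the raw material the next step turns into a prefetch list.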

Step 2: Generating prefetch list

Based on the perfetto traces obtained in the prior step, IORap generates a prefetch list while the device is idle. The prefetch list records which parts of which files (name, offset, length) an app accessed during launch. IORap analyzes the mm_filemap events in the perfetto trace and converts the result from (inode, offset, length) to (name, offset, length) by resolving each inode back to its file name. The data is then stored in the prefetch list, which is a protobuf file.
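The conversion described above can be sketched roughly as follows. This is not IORap's actual implementation: it uses plain tuples instead of a protobuf, assumes a 4 KiB page size, takes the inode-to-name mapping as a ready-made dictionary, and treats merging adjacent pages into one range as one plausible form of compaction. The function name build_prefetch_list and the file paths are hypothetical.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes


def build_prefetch_list(page_events, inode_to_name):
    """Turn (inode, page_offset) events into merged (name, offset, length) entries."""
    offsets_by_file = defaultdict(set)
    for inode, offset in page_events:
        name = inode_to_name.get(inode)
        if name is None:
            continue  # inode could not be resolved to a file name; skip it
        offsets_by_file[name].add(offset)  # the set drops duplicate pages

    entries = []
    for name in sorted(offsets_by_file):
        start = length = None
        for offset in sorted(offsets_by_file[name]):
            if start is not None and offset == start + length:
                length += PAGE_SIZE  # page directly follows the current run
            else:
                if start is not None:
                    entries.append((name, start, length))
                start, length = offset, PAGE_SIZE  # begin a new run
        entries.append((name, start, length))  # flush the last run
    return entries


# Hypothetical input: inode 99 has no known file name and is ignored.
page_events = [(42, 0), (42, 4096), (42, 16384), (7, 8192), (42, 4096), (99, 0)]
inode_to_name = {
    42: "/data/app/example/base.apk",
    7: "/data/app/example/lib.so",
}
prefetch_list = build_prefetch_list(page_events, inode_to_name)
```

Here the two adjacent pages of base.apk collapse into a single 8192-byte entry, while the isolated page at offset 16384 stays a separate entry.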

Step 3: I/O prefetching

After the prefetch list is generated, IORap can prefetch the corresponding data on subsequent runs of the app. Perfetto tracing is no longer needed, and neither the user nor the developer has to do anything. Prefetching is performed when the user taps the app icon, or when another app launches it through an Intent. Enjoy the speedup!
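As a rough illustration of what prefetching from such a list could look like in user space, the sketch below asks the kernel to load each range into the pagecache with posix_fadvise and the POSIX_FADV_WILLNEED hint. Whether IORap uses this exact mechanism is not stated here, so treat it as an assumption; the function name prefetch and the temporary file are hypothetical.

```python
import os
import tempfile


def prefetch(entries):
    """Hint the kernel to load each (name, offset, length) range; return bytes requested."""
    requested = 0
    for name, offset, length in entries:
        try:
            with open(name, "rb") as f:
                if hasattr(os, "posix_fadvise"):
                    # Non-blocking hint: the kernel may start reading in the background.
                    os.posix_fadvise(f.fileno(), offset, length, os.POSIX_FADV_WILLNEED)
                else:
                    # Fallback for platforms without posix_fadvise: read the range.
                    f.seek(offset)
                    f.read(length)
        except OSError:
            continue  # file moved or deleted since the list was built
        requested += length
    return requested


# Usage with a throwaway file; the second entry points at a missing file.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"\0" * 8192)
    tmp.flush()
    warmed = prefetch([(tmp.name, 0, 4096), ("/no/such/file", 0, 4096)])
```

Entries whose files can no longer be opened are skipped rather than treated as errors, since a stale entry should not make the launch any worse.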

Step 4: Obsoleting the prefetch list

The prefetch list doesn’t live forever. Several events can make it obsolete. When an app is updated, the prefetch list is invalidated because the app may have changed and the previous data may be inaccurate. The dexopt service can also optimize the app after installation. Once the app is optimized, the layout may differ, which also makes the prefetch list obsolete. An obsolete prefetch list is removed, and a new round starts with perfetto trace collection.
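One way to picture this invalidation rule is to store, next to each prefetch list, the app state it was profiled against and to discard the list when that state no longer matches. The sketch below assumes exactly that; AppState, its compilation_id field, and PrefetchStore are hypothetical names, not IORap's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppState:
    version_code: int    # changes when the app is updated
    compilation_id: str  # hypothetical marker that changes when dexopt recompiles the app


class PrefetchStore:
    """Keeps one prefetch list per package, tied to the state it was profiled in."""

    def __init__(self):
        self._lists = {}  # package -> (AppState at profiling time, entries)

    def save(self, package, state, entries):
        self._lists[package] = (state, list(entries))

    def lookup(self, package, current_state):
        """Return the entries if still valid; otherwise drop them and return None."""
        record = self._lists.get(package)
        if record is None:
            return None
        recorded_state, entries = record
        if recorded_state != current_state:
            del self._lists[package]  # obsolete: a new profiling round is needed
            return None
        return entries


store = PrefetchStore()
profiled = AppState(version_code=12, compilation_id="speed-profile#1")
store.save("com.example.app", profiled, [("/data/app/example/base.apk", 0, 8192)])

fresh = store.lookup("com.example.app", profiled)
after_update = store.lookup("com.example.app", AppState(13, "speed-profile#1"))
after_removal = store.lookup("com.example.app", profiled)
```

After the simulated update the list is gone for good, so even a lookup with the original state returns nothing until a new list is saved.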

Improvements & Observation

Collating results from several experiments in our lab we determined that IORap benefits cover the spectrum from low end to high end devices. On average, IORap could provide up to ~26% speedup. It’s extremely helpful for apps that have heavy I/O during startup. For example, Spotify shows double digit improvement for both low-end devices (Go and Pixel 3A) and high end-devices (Pixel 3 or 4).

One interesting observation from the experiments is that IORap's performance depends heavily on the amount of prefetched data, so an accurate trace duration is very important. A trace that is too short causes less data than necessary to be prefetched, which reduces the performance gain. A trace that is too long causes more data than necessary to be prefetched, which may even slow down startup in worst-case scenarios. IORap uses the timestamp at which an app calls reportFullyDrawn() to estimate the trace duration. For apps that don't report this event, the display time is used. Calling reportFullyDrawn() at the right time can therefore improve the performance of IORap.
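The effect of the trace duration can be sketched as a simple cut-off on event timestamps: only I/O observed between launch and the end of startup is kept. The code below is an illustration under that assumption; the function name events_within_startup, the tuple layout, and the timestamps are hypothetical.

```python
def events_within_startup(events, launch_ts, fully_drawn_ts=None, displayed_ts=None):
    """Keep events between launch and the end of startup.

    The end is the fully-drawn timestamp when the app reported one,
    otherwise the display time, mirroring the fallback described above.
    """
    end_ts = fully_drawn_ts if fully_drawn_ts is not None else displayed_ts
    if end_ts is None:
        raise ValueError("need a fully-drawn or displayed timestamp")
    return [event for event in events if launch_ts <= event[0] <= end_ts]


# Hypothetical (timestamp_seconds, file_name, offset) events.
trace = [
    (10.0, "base.apk", 0),
    (10.4, "base.apk", 4096),
    (11.2, "feed.db", 0),     # startup data loaded after the first frame
    (15.0, "settings.db", 0), # unrelated I/O long after startup
]

# The app reported fully drawn at 11.5 s: the feed data is included.
with_report = events_within_startup(trace, 10.0, fully_drawn_ts=11.5)

# No report: fall back to the display time at 10.5 s, which misses feed.db.
without_report = events_within_startup(trace, 10.0, displayed_ts=10.5)
```

In this made-up trace, falling back to the display time drops the feed.db read from the prefetch list, which is the kind of lost gain a well-placed reportFullyDrawn() call avoids.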

Future Development

We’re excited about the improvement that IORap has shown, and we plan to explore this concept further in two directions. First, prefetching more often: it would be great if prefetching could also be done during profiling, and by providing a prebuilt prefetch list we could close some of the performance gap that exists before the prefetch list is generated. Second, IORap could predict that an app is about to start and begin prefetching earlier, further reducing startup time.

Conclusion

You can help IORap out by calling reportFullyDrawn() when your app completes its startup. IORap mainly helps reduce the time spent blocked on I/O, so consider profiling your app startup for other possible performance issues.
