Large Database Queries on Android

Windows of Opportunity

SQLite is a great way to persist many thousands of items of data on Android, but presenting these huge data sets in UI has historically been difficult, and can lead to performance issues. Before starting the new Paging Library, we investigated existing paging approaches in the platform, especially the potential pitfalls in SQLiteCursor.

In this blogpost, we’ll go over its problems, and why that motivates us to use small queries with the Room Persistence and Paging libraries in the Android Architecture Components.

SQLiteCursor and CursorAdapter

SQLiteCursor is the return type for an Android SQLite database query. It allows you to view large query results with a fixed initial loading cost. 
The first read initializes a CursorWindow, a buffer of rows typically 2MB in size, with content from the database. The SQLiteCursor refreshes this window each time you request a row that isn’t present. In this way, SQLiteCursor implements paging, with a fixed page size.

CursorAdapter has been around since Android API 1, and has provided a simple way to bind data from a Cursor (typically a SQLiteCursor) to items in a ListView. While it serves this function fine, it queries the database directly on the UI thread any time a new load is needed. That in and of itself isn’t acceptable for a modern and responsive app. So one might ask: can’t we just have a Cursor-based adapter that loads on a background thread? SQLiteCursor has paging built right in, after all.

Problems with paging within SQLiteCursor

Much of the problem with paging within SQLiteCursor comes from surprising behavior as it uses its window to page content in. The following is a list of challenges we discovered when experimenting with the internal paging of SQLiteCursor for the Paging Library:

SQLiteCursor doesn’t hold any database transaction open

When I started to investigate paging, I was inexperienced with SQLite and especially Cursors on Android. I had simply assumed that SQLiteCursor would, after loading a window, pause its query, to be resumed when the next window was needed. That way, accessing the 10th window would be just as efficient as the 1st. This is incorrect. Each time a new window is read, the query restarts from position 0 and skips rows which aren’t needed to fill the window, one at a time. This is because SQLiteCursor can’t resume a query.

It’s like accessing items 1000 through 1050 in a linked list — you have to skip a large number of items to access the next page to load. As you load, each subsequent window has to skip past more and more of the query, which slows it down. This is equivalent using the SQL OFFSET keyword to skip content, which isn’t the most efficient way to page content, but cannot be avoided when relying on paging within a SQLiteCursor. You can see how SQLiteCursor pages in a new window here.

SQLiteCursor.getCount() is required, and scans entire query

Before reading the very first row, SQLiteCursor calls getCount() for bounds checking. Since SQLite must scan the entire result set of a query to count it (again, like a linked list), this can add significant overhead. If you’re paging in a large query to a UI gradually, in response to the user scrolling, you may not need to know the entire size of the query, so the count adds unnecessary upfront work.

SQLiteCursor.getCount() always loads the first window of rows

As part of computing the count, while scanning the result set, SQLiteCursor proactively fills its window from position 0, on the assumption that the first items in the query will be needed.

It preloads these items so that it can know roughly how many rows fit into a window ahead of time (more on this below). This opportunistic loading is reasonable if you’re presenting data from the start of the query, but a position being restored from saved instance state may start the index much further down the list, where the initial window isn’t relevant. If you’d like to present data from the third window of content, you’re forced to load and throw away 2MB of data first. The code for this counting behavior is here.

SQLiteCursor may load data you didn’t ask for

Cursor.moveToPosition() guarantees that the requested row is in the window, but the SQLiteCursor doesn’t start filling at the requested row. Because SQLiteCursor doesn’t assume the app is reading forward, it starts filling its window when it’s about ⅓ of a window away from the target position. This means a CursorAdapter scrolling back a few rows after a window load doesn’t trigger another window load. It also means that each 2MB of data loaded after the first loads 650KB or more of data you’ve already seen. You can see the code and explanation for this behavior here.

SQLiteCursor loading position can be unpredictable

When SQLiteCursor attempts to load a target position, it tries to start ⅓ of a window early. This means it must guess how many rows fit in a window. To do this, it uses the number of rows filled in its first window load. Unfortunately, this means that if your rows are of varying sizes (e.g. if you’re embedding user comment Strings of arbitrary length), its guess can be wrong. The SQLiteCursor can undershoot the target position, filling a window with content — then discarding it all, and start filling again. If, for example, you’re scanning through a long query and get to a row that requires a window reload, it’s possible that the load will only capture a trivial number of new rows. The clear-window and restart code is here.

Cursors require closure

Cursor must be closed using the close() method, so wherever they are stored, there must be code that cleans them up when they’re no longer needed. CursorAdapter notably doesn’t help with this, passing that responsibility to the app developer. To store and reuse a Cursor requires the developer to write code to handle events like the activity stopping.

SQLiteCursor does not know data has changed

SQLiteCursor doesn’t track if the database has changed after the first window read (and first count). This means that if some items are added or removed, the SQLiteCursor’s cached size is incorrect — a problem both for bounds checking and if you want your loaded data to look consistent. This can cause an exception when moving to a row that no longer exists, or inconsistent data in some circumstances. For example, if you’ve loaded row N already and a new item is inserted at position 0, and you then try to read row N+1, you’ll end up loading the old row N a second time.

Avoiding the Problems

The above problems tell us that SQLCursor doesn’t scale to queries with many thousands of results. Fortunately, these problems all have a single simple workaround: small queries. Queries that fit within a single CursorWindow avoid all of the problems, which is why we favor them so strongly in Paging and Room. It’s common to configure a page size of just ten to twenty items, and to query only that many items at once.

There’s much that goes into picking a page size though — larger queries up to a window generally improve performance, and smaller improve latency and memory. Ten items may make sense for tall list items where the DB isn’t a bottleneck, while 300 may be better if your list items are small tiles, or your queries are expensive.

If you are relying on the internal paging of SQLiteCursor to lazily load a much larger set of results, we recommend that you switch to another approach. Either use the new Paging Library and its integration with the Room Persistence Library, or use a custom implementation where you handle the paging yourself and ensure that your query results are small enough to fit in a single CursorWindow.

To page a large SQL query result with small queries in Room using the new Paging Library, you can change:

@Dao
interface UserDao {
// regular list query — falls over with too much data
@Query(“SELECT * FROM user ORDER BY mAge DESC”)
LiveData<List<User>> loadUsersByAgeDesc();
}

into:

@Dao
interface UserDao {
// paged query — handles arbitrarily large queries
@Query(“SELECT * FROM user ORDER BY mAge DESC”)
LivePagedListProvider<Integer, User> loadUsersByAgeDesc();
}

which you then provide with paging parameters to get a LiveData<PagedList> that handles arbitrarily large result sets:

LiveData<PagedList<User>> users = userDao.usersByLastName()
.create(/*initial load position*/ 0, /*page size*/ 20);

In the above code, we get LiveData versions of the paged query results which will also update any subscribed Observers when the database changes. To learn more about paging from SQLite using the architecture components, see the Paging introduction, and the Paging sample on Github.

Android Platform

We’re currently looking at improving some of the behaviors of SQLiteCursor in a future Android release. For example, perhaps both the CursorWindow size, as well as the ⅓ of a window heuristic should both be configurable, since modern usage patterns look very different than when SQLiteCursor and CursorAdapter were designed. When we do make changes, we’ll be sure to update the Room Persistence Library to use these new options to keep queries as efficient as possible.