Squeezing Performance from SQLite: Insertions

Sometimes we need to manage a lot of data in our apps. What’s the fastest way to get all of that data into your Android app’s SQLite database?

In this post I will investigate the options available to you when it comes to inserting a ton of data into your SQLite database, evaluate the performance of each, and discover some best practices along the way.


Methodology

Each experiment involved comparing two or more ways of inserting 1,000, 10,000, and 100,000 randomly generated records into two different types of tables:

  • an extremely simple table (simple) consisting of a single integer column, and
  • a more real-world table (tracks) describing a collection of music tracks, where each record contains an id, title, duration, lyrics, etc. The id column is this table’s PRIMARY KEY.

I performed every test on my 16GB Nexus 5X, running API level 25 (Nougat).

The results were calculated by tracking the time that had elapsed while all of the inserts for the current size iteration were running. The time taken to connect to the database before and to wipe the table after each iteration was excluded from the results.

You can download the source code from GitHub and build/run the test application yourself, if you feel the urge!


Exploration

db.insert() with and without an explicit transaction

The official training material for Android’s SQLite bindings provides an example of populating a table using the insert() method provided by the SQLiteDatabase object. Unfortunately, it doesn’t give any advice on how to do a lot of inserts at once.

The naive approach for the simple table looks something like this:

ContentValues values = new ContentValues(1);
for (int i = 0; i < numIterations; i++) {
    values.put("val", random.nextInt());
    db.insert("inserts_1", null, values);
}

I thought that there could be some performance to be gained by running the calls to db.insert() within a transaction, so I made that my first experiment:

ContentValues values = new ContentValues(1);
db.beginTransaction();
try {
    for (int i = 0; i < numIterations; i++) {
        values.put("val", random.nextInt());
        db.insert("inserts_1", null, values);
    }
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}

Comparing implicit and explicit transactions when inserting data.

Looking at the chart, it’s pretty clear that wrapping a bunch of calls to insert() within a transaction vastly improves performance compared to doing the inserts without a wrapping transaction.

But why is it so much faster?

It turns out that unless you explicitly execute your queries between calls to beginTransaction() and endTransaction(), SQLite itself will wrap every query with an implicit transaction. From the docs:

No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect. Automatically started transactions are committed when the last query finishes.

It’s important to note that SQLite only writes the inserts to disk once the transaction has been committed. Therefore, if you can minimize the number of transactions (regardless of whether they’re explicitly or implicitly started), you will minimize disk access and maximize performance. Converting the data in the chart to records inserted per second:

  • Without an explicit transaction: ~75 track records per second.
  • With an explicit transaction: ~950 track records per second.

That’s more than a 10x improvement, with only 3 lines of code!

Now that we know using transactions is such a huge advantage, we’ll use them from here on out as we play with other ways to insert data.

db.execSQL()

Another method exposed by SQLiteDatabase that will let us insert data into our tables is db.execSQL(String, Object[]). It’s provided as a bare-bones way to execute non-SELECT statements. Here’s our experiment code:

Object[] values = new Object[1];
db.beginTransaction();
try {
    for (int i = 0; i < numIterations; i++) {
        values[0] = random.nextInt();
        db.execSQL("INSERT INTO inserts_1 (val) VALUES (?)", values);
    }
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}

Comparing insert() and execSQL()

In this experiment, by using db.execSQL() we were able to improve our records per second metric slightly:

  • db.insert(): ~850 track records per second.
  • db.execSQL(): ~925 track records per second.

It makes perfect sense if you think about it; db.insert() is essentially syntactic sugar which abstracts away the creation of the SQL statement for you. That abstraction layer, while not super expensive, isn’t free.
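To make that concrete, here’s a rough standalone sketch of the kind of SQL string assembly insert() performs on every call before the statement ever reaches SQLite. (This is a hypothetical buildInsert() helper for illustration, not the actual Android source.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InsertSql {
    // Sketch of the string-building work insert() does for you:
    // turn a map of column -> value into a parameterized INSERT statement.
    static String buildInsert(String table, Map<String, Object> values) {
        StringBuilder cols = new StringBuilder();
        StringBuilder params = new StringBuilder();
        for (String col : values.keySet()) {
            if (cols.length() > 0) {
                cols.append(", ");
                params.append(", ");
            }
            cols.append(col);
            params.append("?");
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("val", 42);
        System.out.println(buildInsert("inserts_1", values));
        // INSERT INTO inserts_1 (val) VALUES (?)
    }
}
```

On top of this string building, insert() also has to iterate the ContentValues to bind each argument — small costs individually, but they add up over 100,000 calls.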

Batched Inserts with db.execSQL()

We got some improved performance working with raw statements and db.execSQL(), so I thought: now that we’re constructing our statements ourselves, what if we insert more than one record at a time?

I had originally discounted the idea, thinking that because SQLite is an in-process database engine, we wouldn’t necessarily save anything by batching inserts (unlike with database servers, where you incur network latency with each statement).

My first attempt at putting together a batched insert:

db.beginTransaction();
try {
    Object[] values = new Object[numIterations];
    StringBuilder valuesBuilder = new StringBuilder();
    for (int i = 0; i < numIterations; i++) {
        if (i != 0) {
            valuesBuilder.append(", ");
        }
        values[i] = random.nextInt();
        valuesBuilder.append("(?)");
    }

    db.execSQL(
        "INSERT INTO inserts_1 (val) VALUES " + valuesBuilder.toString(),
        values
    );
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}

Oops.

Too many SQL variables?!
It turns out that the SQLite source code places a hard limit on the number of variables allowed within a prepared statement. From sqlite3.c:

/*
** The maximum value of a ?nnn wildcard that the parser will accept.
*/
#ifndef SQLITE_MAX_VARIABLE_NUMBER
# define SQLITE_MAX_VARIABLE_NUMBER 999
#endif

If you don’t mind being a little bit devious, it’s not too hard to work around this limit with some finagling:

db.beginTransaction();
try {
    doInsertions(db, numIterations);
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}

// ... elsewhere in the class ...

void doInsertions(SQLiteDatabase db, int numInsertions) {
    if (numInsertions > 999) {
        doInsertions(db, numInsertions - 999);
        numInsertions = 999;
    }

    Object[] values = new Object[numInsertions];
    StringBuilder valuesBuilder = new StringBuilder();

    for (int i = 0; i < numInsertions; i++) {
        if (i != 0) {
            valuesBuilder.append(", ");
        }
        values[i] = random.nextInt();
        valuesBuilder.append("(?)");
    }

    db.execSQL(
        "INSERT INTO inserts_1 (val) VALUES "
            + valuesBuilder.toString(),
        values
    );
}

Using recursion makes it super easy, but if that’s not your style or it won’t work for your situation, a loop is cool too. I guess.
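Whichever control flow you pick, the core of the workaround is just splitting the total insert count into chunks that each fit under the variable limit. Here’s a standalone sketch of that chunking math — a hypothetical batchSizes() helper with the database calls left out — mirroring the remainder-first order of the recursive version above:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSizes {
    // Split numInsertions into per-statement batch sizes, remainder first,
    // so each batch stays within SQLite's variable limit (999 works here
    // because the simple table binds one variable per record).
    static List<Integer> batchSizes(int numInsertions, int maxVars) {
        List<Integer> sizes = new ArrayList<>();
        int remainder = numInsertions % maxVars;
        if (remainder > 0) {
            sizes.add(remainder);
        }
        for (int done = remainder; done < numInsertions; done += maxVars) {
            sizes.add(maxVars);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(2500, 999)); // [502, 999, 999]
    }
}
```

Each entry in the returned list would then become one db.execSQL() call.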

One-by-one inserts vs. Batched insert statements.

Wow! It turns out even though we don’t save on any network latency with the batched insert, we do get a fair bit of additional performance out of not having to do so many individual statements.

One thing to note, however: the more columns your table has, the less of a benefit you get out of batching your inserts. This is because the number of records you can insert per statement is equal to 999 / # of columns. The tracks table has 9 columns, which means each batched insert call to db.execSQL() can only insert 111 records. Meanwhile, the simple case only has one column — allowing for 999 records to be inserted per batch.
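That capacity calculation is just integer division; a tiny standalone sketch (recordsPerBatch() is a hypothetical helper, not part of the test app):

```java
public class BatchCapacity {
    static final int MAX_VARIABLE_NUMBER = 999; // SQLite's default limit

    // Max records a single batched INSERT can hold: integer division,
    // since you can't bind a partial record.
    static int recordsPerBatch(int numColumns) {
        return MAX_VARIABLE_NUMBER / numColumns;
    }

    public static void main(String[] args) {
        System.out.println(recordsPerBatch(9)); // tracks table: 111
        System.out.println(recordsPerBatch(1)); // simple table: 999
    }
}
```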

Running the throughput numbers again gives us:

  • One-by-one inserts: ~1400 track records per second.
  • Batched inserts: ~1800 track records per second.

Using SQLiteStatement Directly

After I showed the first draft of this blog post to a friend of mine from #AndroidChat, he kindly pointed out one more experiment I should run:

What if, instead of only using the methods available from SQLiteDatabase, I tried using the underlying SQLiteStatement class directly?

The thinking was along the same lines as the logic to go from insert() to execSQL(): cut out the middle man wherever you can. A SQLiteStatement is used under the covers when you call either of those two methods, so it would make sense that using the statement object directly could speed things up.

Also, if we can re-use an SQLiteStatement object over and over, we might see some more performance gains from not creating so many objects (and thus fewer opportunities for the garbage collector to rear its ugly head).

I wrote up two versions of test cases that use SQLiteStatement: one that executes the same single-record insert statement every time, and one that would re-use a batch insert statement.

The code for single-record/one-by-one insert with SQLiteStatement:

SQLiteStatement stmt = db.compileStatement(
    "INSERT INTO inserts_1 (val) VALUES (?)"
);

db.beginTransaction();
try {
    for (int i = 0; i < numIterations; i++) {
        stmt.bindLong(1, random.nextInt());
        stmt.executeInsert();
        stmt.clearBindings();
    }
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}

The code for the batch insert uses the recursion trick from before and stores compiled statements in a HashMap-based, size-indexed cache:

Map<Integer, SQLiteStatement> statementCache = new HashMap<>();

db.beginTransaction();
try {
    doInsertions(db, numIterations, statementCache);
    db.setTransactionSuccessful();
} finally {
    db.endTransaction();
}

// ... elsewhere in the class ...

void doInsertions(SQLiteDatabase db, int numInsertions,
        Map<Integer, SQLiteStatement> statementCache) {
    if (numInsertions > 999) {
        doInsertions(db, numInsertions - 999, statementCache);
        numInsertions = 999;
    }

    SQLiteStatement stmt;
    if (statementCache.containsKey(numInsertions)) {
        stmt = statementCache.get(numInsertions);
    } else {
        StringBuilder valuesBuilder = new StringBuilder();
        for (int i = 0; i < numInsertions; i++) {
            if (i != 0) {
                valuesBuilder.append(", ");
            }
            valuesBuilder.append("(?)");
        }
        stmt = db.compileStatement(
            "INSERT INTO inserts_1 (val) VALUES "
                + valuesBuilder.toString()
        );
        statementCache.put(numInsertions, stmt);
    }

    for (int i = 0; i < numInsertions; i++) {
        stmt.bindLong(i + 1, random.nextInt());
    }

    stmt.executeInsert();
    stmt.clearBindings();
}

Comparing batch inserts with execSQL() to using SQLiteStatement directly

From the charts, it’s pretty clear that re-using single-record insert SQLiteStatement objects alone doesn’t beat batched inserts using db.execSQL(). However, applying the paradigm of re-using the statement objects to the world of batched inserts does seem to provide a bit of improvement in performance at the cost of some added complexity in the codebase.


Conclusion

The number one most important thing to take away from this post is that explicitly wrapping your inserts in a transaction makes for a massive and unmistakable improvement in performance.

After that, you can get some reasonable speed improvements by using db.execSQL() with batched insert statements, especially if the number of columns your table has is small.

Finally, if you really want to squeeze the most out of SQLite while inserting data: consider re-using SQLiteStatement batch-insert objects directly to do some more middleman cutting-out.


P.S. I’d like to do some more research into how to squeeze the best performance out of SQLite and Android, maybe this can turn into a series. If you’ve got a particular suggestion for a topic, leave it in the comments!

P.P.S. If you were paying attention, you might have noticed some funkiness going on with my numbers in the charts as well as the records per second breakdowns. They fluctuate kind of widely between tests. For example: the time it took to insert 100,000 track records within a transaction using db.insert() in the first two experiments went from 91.552 seconds all the way up to 145.607 seconds.

While I freely admit this is somewhat concerning and I’m not yet completely sure of the cause of such a variance, I do feel confident the constant trend towards improved records-per-second scores is real.

I’d love to hear from you about your experiences running the test app, and would be fascinated to know if you’ve got insight into the issue. Maybe it was just that my tests weren’t isolated enough and my Nexus could’ve been receiving too many push notifications from GMail and Facebook in the background or something...