Squeezing Performance from SQLite: EXPLAINing the Virtual Machine

The virtual machine? Yep, but not the Java Virtual Machine. In this post I will focus on providing you with a basic understanding SQLite’s “Virtual DataBase Engine” or VDBE.

My “Squeezing Performance from SQLite” series is primarily meant for Android engineers, but this post in particular will dive into SQLite itself and the topics discussed will hold true for all developers who use it.


SQLite is a Virtual Machine

I used to be under the impression that SQLite parsed and ran statements like an interpreter, but it turns out that’s not quite the case. While looking for SQLite’s version of MySQL’s EXPLAIN, I stumbled across documentation that described the way SQLite operates:

SQLite works by translating SQL statements into bytecode and then running that bytecode in a virtual machine. — SQLite.org

The SQLite virtual machine is called the “Virtual DataBase Engine”, or the “VDBE” for short.

Incredibly simplified depiction of SQLite executing a Statement.

You might already know what a bytecode program in SQLite more commonly goes by: “prepared statement”. Also, just like most programs: prepared statements can take input (? variables).

The bytecode program is a binary listing of instructions which each consist of an opcode and parameter values. Each opcode corresponds to a particular command that the VDBE knows how to process, and when being processed can operate on data contained within a bank of registers within the virtual machine. According to the official documentation, the number of registers is finite but can be quite large and depends on how SQLite was configured at compile time.

In the remainder of this article we’ll explore how SQL statements are processed by SQLite into bytecode programs, and then I’ll show you how you can examine the bytecode your statements get compiled into.


Preparing a Statement

When you ask SQLite to prepare a statement, your painstakingly handcrafted SQL is dissected (parsed), analyzed (query planned), and boiled down (compiled) into a bytecode program SQLite’s VDBE is capable of executing.

Tokenizing & Parsing

Just like any programming language, SQL starts as a bunch of text. To get from a string of text into something SQLite can understand, that text needs to be broken down and understood. This is what we call parsing.

SQLite’s parsing approach is made abundantly clear throughout the official documentation. For example, if you’ve ever ended up at sqlite.org while looking for how you are supposed to write an INSERT statement — you’ve likely seen a diagram that explains a part of how SQLite’s parser works:

INSERT syntax diagram (from sqlite.org)

Diagrams like the one above are a way of visualizing the Backus-Naur Form (BNF) description of the SQL grammar that SQLite understands. Reading the syntax diagrams is a pretty straightforward process when you know what to look for:

  • Ovals with text in ALL CAPS are keywords.
  • Ovals with lower-case-and-hypenated text represent reusable grammar clauses (their syntax diagrams will be listed further down the page in the documentation).
  • Ovals with punctuation like parentheses, commas, etc are literal tokens that are required.
  • The arrows that connect the ovals tell you the order in which the keywords, clauses, and tokens need to appear to construct a valid statement.

Query Planning

After the statement has been parsed into its component parts, SQLite needs to decide how to approach executing the statement.

For any given SQL statement, there might be hundreds or thousands or even millions of different algorithms of performing the operation. All of these algorithms will get the correct answer, though some will run faster than others. The query planner is an AI that tries to pick the fastest and most efficient algorithm for each SQL statement. — SQLite.org

Understanding how SQLite determines the best way to execute your statements is a big enough topic to warrant its own post. However, for this article it’s just important to know that there is an optimization step between parsing and compilation.

Compilation

Finally: after SQLite has determined how to best approach running your statement, it puts together a list of low-level bytecode instructions that describe the whole operation. That list of instructions is, quite literally, a program which will be run on the VDBE. A more popular name for the program compiled by SQLite is “prepared statement”.

Each bytecode instruction consists of an opcode (name of the instruction) and up to 5 parameters (input values or references to registers). In modern SQLite versions, there are over a hundred different types of instructions. They run the gamut between simple control-flow instructions like Eq: “jump to an instruction if two registers have the same value” and more database-specific instructions like ResultRow: which provides data to the database cursor at the current position, pointing at values which have been loaded into the VDBE’s registers.

Once compiled into bytecode, a prepared statement does not need to be parsed or to go through the query planning process again. This is precisely what makes re-using prepared statements so fast when compared with opting not to re-use them.


“EXPLAIN”

Now that you know how your raw statement becomes a prepared statement, you may be wondering if it’s possible to examine the bytecode program that was compiled.

SQLite provides a mechanism you can use to inspect the bytecode it would generate for any statement. In order to see the bytecode, just prepend your statement with EXPLAIN. From the documentation:

When the EXPLAIN keyword appears … it causes the statement to behave as a query that returns the sequence of virtual machine instructions it would have used to execute the command had the EXPLAIN keyword not been present. 
SQLite.org

EXPLAIN's output is a series of rows where each row is an instruction in the prepared statement bytecode and each has 8 columns:

  • addr — The instruction address.
  • opcode — The instruction’s opcode.
  • p1 through p5 — Parameter values for the instruction.
  • comment — Comments explaining what the instruction is doing.

That last column: comment will likely be empty unless you compile SQLite yourself and set the -DSQLITE_ENABLE_EXPLAIN_COMMENTS option.

Explaining “EXPLAIN”, by Example

Let’s use EXPLAIN to take a look at the bytecode for a few statements. Because my install didn’t have the comments turned on, I’ll leave that out. Each example starts with the output you receive when you use EXPLAIN, followed by a walk-through of the flow that the VDBE takes through the bytecode program.

(Note: all bytecode shown here is from SQLite 3.16.2, and may not be the same in other versions.)

“Hello World”

sqlite> EXPLAIN SELECT "hello world";
addr opcode p1 p2 p3 p4 p5
---- ------------- ---- ---- ---- ------------- --
0 Init 0 1 0 00
1 String8 0 1 0 hello world 00
2 ResultRow 1 1 0 00
3 Halt 0 0 0 00

Sorry, I had to.

  1. Init 0 1 0 — Denotes the start of the program and moves to address 1.
  2. String8 0 1 0 'hello world' — “hello world” is stored in register 1.
  3. ResultRow 1 1 0 — Let the cursor know that the register 1 contains a row of output.
  4. Halt 0 0 0 — Exit with error code 0 (everything is okay).

CREATE TABLE

sqlite> EXPLAIN CREATE TABLE blog (
title TEXT NOT NULL,
author TEXT NOT NULL,
pub_date INTEGER,
body TEXT
);
addr opcode p1 p2 p3 p4 p5
---- ------------- ---- ---- ---- ------------- --
0 Init 0 27 0 00
1 ReadCookie 0 3 2 00
2 If 3 5 0 00
3 SetCookie 0 2 4 00
4 SetCookie 0 5 1 00
5 CreateTable 0 2 0 00
6 OpenWrite 0 1 0 5 00
7 NewRowid 0 1 0 00
8 Blob 6 3 0 00
9 Insert 0 3 1 08
10 Close 0 0 0 00
11 Close 0 0 0 00
12 Null 0 4 5 00
13 OpenWrite 1 1 0 5 00
14 SeekRowid 1 16 1 00
15 Rowid 1 5 0 00
16 IsNull 5 24 0 00
17 String8 0 6 0 table 00
18 String8 0 7 0 blog 00
19 String8 0 8 0 blog 00
20 Copy 2 9 0 00
21 String8 0 10 0 CREATE TABLE blog (
title TEXT NOT NULL,
author TEXT NOT NULL,
pub_date INTEGER,
body TEXT
) 00
22 MakeRecord 6 5 11 BBBDB 00
23 Insert 1 11 5 00
24 SetCookie 0 1 1 00
25 ParseSchema 0 0 0 tbl_name='blog' AND type!=...
26 Halt 0 0 0 00
27 Transaction 0 1 0 0 01
28 Goto 0 1 0 00
  1. Init 0 27 0 — Start up and go immediately to address 27.
  2. Transaction 0 1 0 0 01 — Start a write transaction on the current database and check that the database schema is the expected version.
  3. Goto 0 1 0 — Jump back to address 1.
  4. ReadCookie 0 3 2 — Read cookie #2 (the database format) from the database and store its value in register 3.
    What are “cookies”? Cookies are values that are present in the actual database file that SQLite is using.
  5. If 3 5 0 — Jump to address 5 if the value in register 3 is not zero. Interpretation: we’re going to skip setting some cookies on the database if there is already a database format configured (let’s assume there isn’t one yet).
  6. SetCookie 0 2 4 — Write 4 to cookie number 2 for the current database. This sets the database file format to 4.
  7. SetCookie 0 5 1 — Write 1 to cookie number 5 for the current database. Cookie number 5 denotes the database file’s text encoding format. By setting its value to 1, we are choosing UTF-8.
  8. CreateTable 0 2 0 — Create a new table and place its location in the database file at register 2.
  9. OpenWrite 0 1 0 5 — Open a read/write cursor called 0 with 5 columns on the table whose root page is 1
    Interpretation: Page 1 contains the sqlite_master table, to which we’re going to add a record defining our blog table.
  10. NewRowid 0 1 0 — Get a new rowid value and place it in register 1.
  11. Blob 6 3 0 _ — There is an empty blob of length 6-bytes that we are going to store in register 3.
  12. Insert 0 3 1 _ 08 — Using cursor 0, write the data in register 3 to the table using the value in register 1 as the key and increment the number of rows for the table.
    Interpretation: We are writing the blob that was recently loaded into register 3 to the table as a new row and are using the rowid we constructed as its key.
  13. Close 0 0 0 — Close the cursor we opened previously (cursor 0).
  14. Close 0 0 0 — No-op, since the cursor is already closed.
  15. Null 0 4 5 — Write NULL into registers 4 and 5.
  16. OpenWrite 1 1 0 5 — Open a read/write cursor called 1 with 5 columns on the table whose root page is 1.
    Interpretation: Re-open a cursor into the sqlite_master table.
  17. SeekRowid 1 16 1 — Using cursor 1, jump to address 16 if the cursor doesn’t contain a rowid of the value in register 1. Otherwise move along. (let’s move along, since we did just write something to the value in register 1)
  18. Rowid 1 5 0 — Store the value of the rowid for cursor 1 in register 5.
  19. IsNull 5 24 0 — Jump to address 24 if the value in register 5 is NULL.
    Interpretation: If, for some reason, the cursor isn’t pointing at a row in the sqlite_master table, we are going to quit out by jumping to the end of the program. (let’s assume the value was not NULL)
  20. String8 0 6 0 'table’ — Store 'table' in register 6.
  21. String8 0 7 0 'blog' — Store 'blog' in register 7.
  22. String8 0 8 0 'blog' — Store 'blog' in register 8.
  23. Copy 2 9 0 — Copy the value from register 2 to register 9.
    Interpretation: we’re copying the database file page location of the new table we created way back during the instruction at address 5 from register 2 to 9 because we are about to use it in an insert into the sqlite_master table.
  24. String8 0 10 0 'CREATE TABLE ....’ — Store the CREATE TABLE statement into register 10.
  25. MakeRecord 6 5 11 'BBBDB' — Create a table record using registers 6 through 10 (6 + (5–1)) and store a reference to that record in register 11. The BBBDB string says that the first three columns and the last column in the record should have a type-affinity of “blob”, and the fourth should be a number.
    Interpretation: We’re finally building the row in sqlite_master for our blog table.
  26. Insert 1 11 5 —Using cursor 1, write the record data pointed to by register 11 using the rowid we’ve stored in register 5 as its key.
  27. SetCookie 0 1 1 — Set the value of the schema cookie to 1. The schema cookie is cookie number 1, and it denotes the current version of the database schema.
  28. ParseSchema 0 0 0 "tbl_name='blog'..." — Parse all schema entries in sqlite_master using the p4 param value as a WHERE clause. (this spawns another call into the VDBE)
  29. Halt 0 0 0 — Terminate the program with an error code of 0 (success!).

INSERT

Lastly, let’s see what it takes to add a row to our newly created blog table.

sqlite> EXPLAIN INSERT INTO blog (title, author, pub_date, body) VALUES ('Winter is Coming', 'Ned Stark', date('now'), 'It comes in season 7.');
addr opcode p1 p2 p3 p4 p5
---- ------------- ---- ---- ---- --------------------- --
0 Init 0 12 0 00
1 OpenWrite 0 2 0 4 00
2 NewRowid 0 1 0 00
3 String8 0 2 0 Winter is Coming 00
4 String8 0 3 0 Ned Stark 00
5 Function0 1 6 4 date(-1) 01
6 String8 0 5 0 It comes in season 7. 00
7 HaltIfNull 1299 2 2 blog.title 01
8 HaltIfNull 1299 2 3 blog.author 01
9 MakeRecord 2 4 7 BBDB 00
10 Insert 0 7 1 blog 1b
11 Halt 0 0 0 00
12 Transaction 0 1 1 0 01
13 String8 0 6 0 now 00
14 Goto 0 1 0 00
  1. Init 0 12 0 — Start up and go immediately to address 12.
  2. Transaction 0 1 1 0 01 — Start a write transaction on the current database and check that the database schema is the expected version.
    Remember: at address 24 of the CREATE TABLE bytecode program, 1 was stored as the schema version; in this instruction we’re just making sure that it is still 1.
  3. String8 0 6 0 'now’ — Store 'now' in register 6.
  4. Goto 0 1 0 — Jump to address 1.
  5. OpenWrite 0 2 0 4 — Open a read/write cursor called 0 with 4 columns on the table whose root is at page 2.
    Remember: The blog table was created on database page 2, and it has 4 columns.
  6. NewRowid 0 1 0 — Get a new rowid for the table and store it in register 1.
  7. String8 0 2 0 'Winter is Coming' — Store 'Winter is Coming' in register 2.
  8. String8 0 3 0 'Ned Stark' — Store 'Ned Stark' in register 3.
  9. Function0 1 6 4 date(-1) 01 — Invoke the date function using the value at register 6 as its only argument and store the result in register 4.
  10. String8 0 5 0 'It comes in season 7.' — Store 'It comes in season 7.' in register 5.
  11. HaltIfNull 1299 2 2 'blog.title' 01 — If the value in register 2 is null, end the program with error code 1299.
    Interpretation: Thetitle column on the blog was defined as NOT NULL, so we need to verify that the value we’re about to store is, well, not null. If it is, we will error out with the SQLITE_CONSTRAINT_NOTNULL code.
  12. HaltIfNull 1299 2 3 'blog.author' 01 — If the value in register 3 is null, end the program with error code 1299.
    Interpretation: Just like the title, the author column was also defined as NOT NULL.
  13. MakeRecord 2 4 7 'BBDB' — Create a table record using registers 2 through 5 (2 + (4–1)) and store a reference to that record in register 7. The BBDB string says that the record should consist of two blobs, a number, and a blob (in that order).
  14. Insert 0 7 1 'blog' 1b — Insert the record pointed to by register 7 with the key in register 1 into the blog table. The 1b value for p5 is a bit mask denotes among other things that the VDBE should count the number of rows changed, and store the newest inserted row’s id for access later.
    Interpretation: This is where Ned Stark’s blog post is actually added to the table.
  15. Halt 0 0 0 — We’re done! Exit with error code of 0 (success).

Try it yourself!

If you’re still with me, nice work! If you’d like to learn even more about SQLite’s VDBE and bytecode, try looking through the EXPLAIN results for following on your own:

  • Create a table with a compound primary key.
  • Insert data into that table.
  • Create another table, add data to it that relates to the data in the first table, and do a SELECT query which JOINs the two tables together.
  • See what a UNION of SELECT queries does.