Faster SQLite Lookup Using Without Rowid Optimization
TL;DR: A WITHOUT ROWID table can reduce DB size and perform query faster than an ordinary rowid table.
A good strategy is to simply not worry about WITHOUT ROWID optimization until near the end of development, then go back and run tests to see if adding it helpul or not.
SQLite rowid
By default, every row in SQLite has a special column — usually called rowid.
rowid can be used to uniquely identifies a row within the table.
For example, create a new table:
CREATE TABLE TopRatedFilm(
movie TEXT,
studio TEXT,
rating REAL
);
As a result, SQLite creates rowid beginning with 1 and increasing by one with each added row. But if rows are deleted, gaps can appear.
SQLite allows us to control the rowid assigned, so that rows can be inserted at the top of the table by using negative rowid.
The rowid(s) are always unique. They may not be consecutive but always in strictly ascending order.
SQLite searching
Normal lookup
Suppose you want to know the rating of Starwars. The query would be:
SELECT rating FROM TopRatedFilm WHERE movie='Starwars';
SQLite reads every row out of the table, checks to see if the movie
column has the value “Starwars”, and if so, outputs the rating
from that row.
This algorithm is called a Full Table Scan.
In a full scan, the entire content of the table must be read and examined to find needed row(s).
If your table contained a million rows, SQLite might read MBs of content just to find a single value.
Lookup by rowid
One technique for avoiding full scans is to lookup by rowid. For example, to know the rating of Starwars, we query for the entry with rowid of 2:
SELECT rating FROM TopRatedFilm WHERE rowid=2;
Since rows are stored in order of increasing rowid, SQLite can find using a Binary Search, so the time complexity is reduced to O(logN) (rather than O(N)).
The problem is: you actually don’t want what the 2nd entry is — you want to know the rating of Starwars!
Lookup by index
An index is another table similar to the original one, but only store rowid & indexed column(s).
SQLite creates a B-Tree to hold the indexed data.
Let’s create an index by specifying the primary key:
CREATE TABLE TopRatedFilm(
movie TEXT PRIMARY KEY,
studio TEXT,
rating REAL
);
Or use the explicit CREATE INDEX statement:
CREATE INDEX movie_indexed ON TopRatedFilm(movie);
After that, TopRatedFilm
table is implemented as two separate B-Trees:
- The main table uses the rowid as the key and stores the
(movie, studio, rating)
columns as data. - The index table uses
movie
and rowid as the key and stores no data at all. Note that: themovie
column is sorted to support binary search.
This index can be used to implement a faster lookup algorithm:
SELECT rating FROM TopRatedFilm WHERE movie='Starwars';
The key used to look up is still the rowid:
- The query starts by doing a binary search on the
movie_indexed
for entries that havemovie='Starwars'
. - Having found rows in the
movie_indexed
, SQLite extracts their rowid. - Then, it does a second binary search on the original
TopRatedFilm
table using the extracted rowids to find the original rows.
With index, SQLite still has to do two binary searches. But for a large table, this’s still much faster than doing a full scan.
The WITHOUT ROWID Optimization
If the phrase WITHOUT ROWID is added to the end of a CREATE TABLE statement, then the special rowid column will be omitted.
Benefits
The WITHOUT ROWID syntax is an optimization
A WITHOUT ROWID table can use about half the amount of disk space and can operate nearly twice as fast.
Consider TopRatedFilm
, as an ordinary rowid table, two separate binary searches are required when doing a lookup.
On the other hand, a WITHOUT ROWID table uses a different data design:
CREATE TABLE TopRatedFilm(
movie TEXT PRIMARY KEY,
studio TEXT,
rating REAL
) WITHOUT ROWID;
In this table, there’s only a single B-Tree that uses the movie
column as key and the rating
column as data. (Technicality: the low-level implementation stores both movie
and rating
in the “key” area of the B-Tree).
Because there’s only a single B-Tree, lookup for value only involves a single binary search.
Entries of the movie
column are now only stored once. Querying the rating
value only involves a single binary search (into the main B-Tree), since the rating
can be retrieved directly from the record found by that first search.
Differences from ordinary rowid tables
The WITHOUT ROWID syntax provides no new capabilities. Anything that can be done using it, can also be done in exactly the same way with an ordinary rowid table.
They both generate the same answers given the same SQL statements.
There’re only some additional restrictions on WITHOUT ROWID tables:
- It must have a PRIMARY KEY
- AUTOINCREMENT does not work.
- NOT NULL is enforced on every column of the PRIMARY KEY
When To Use?
The WITHOUT ROWID optimization is helpful for tables that:
- Have non-integer or multi-column PRIMARY KEYs: it’ll work correctly for tables with a single INTEGER PRIMARY KEY, however, ordinary rowid tables will run faster.
- Do not store large strings or BLOBs: it works best when individual rows are not too large. A good rule-of-thumb is that the average size of a single row in a table should be less than about 1/20th the size of a DB page. This’s because rowid tables implemented as B*-Trees (all content stored in the leaves), whereas WITHOUT ROWID tables implemented using normal B-Trees (with content stored on both leaves & intermediate nodes). Storing content in intermediate nodes makes them take up more space and increasing the search cost.
A good strategy is to simply not worry about WITHOUT ROWID optimization until near the end of development, then go back and run tests to see if adding it helps or hurts performance, and retaining the WITHOUT ROWID only in those cases where it helps.
Happy coding~