The Natural Order of things.

Tyler Elkink
BACIC
Published in
5 min readFeb 28, 2023

I would like to explain a Stupid Database Trick called “natural order.” This is a stupid database trick because it’s horrible and causes all companies a real headache eventually and is stupid unless you’re a database. Which nobody is. “Natural order” is the DBA jargon term for “the order of data as stored on disk,” more or less, and it’s how the database will return values if it’s not told to order the values any other way. Very often a company will start a small database with limited activity, and queries written assuming data will be returned in the order it was stored will work, until the order of those records change on disk. Suddenly things break, and the result is a refactor and possibly a lot of shouting.

The “technically correct” response to natural order problems is to point out that an RDBMS never promises to store values in a sorted order, only to return values sorted in a specific order when asked to. This is the sort of helpfulness offered by Senior DBAs and other people confident that they can’t be fired. I’d like to walk you through a hypothetical situation to explain why order currently on disk is not the same as order of records created, and why planning for the natural order to change is important for data architects and developers.

Finally, a brief description of what this article isn’t. This explanation is only about the general idea of natural order. It doesn’t get into technical details like how heaps and disk allocation work. It doesn’t explain database features like cluster indexes or materialized views or other things that amount to “well okay, this isn’t always true.” Finally, “natural order” (order on disk) sounds like “natural sort” (“2” comes before “11”), but they are different things.

The Tragic Story

Let us tell the story of a programmer asked to develop a solution with a nested hierarchy of folders stored in a database. “Time to store data in the database,” thinks the innocent programmer, and does so, initially storing 100 records, editing some of them, and eventually adding another 5 records. “Now I need to retrieve my records,” the programmer thinks, and they do so. And because they are a good, smart programmer, they assume the database is good and smart and works on FIFO principles and that the query will return data in the same order it was stored.

Unfortunately, databases are less Good Smart FIFO storage engines and more Lovecraftian Horrors Of Non-Euclidean Geometry. Instead of returning the records in the order of 1–105 the database returns the records in the order of 101–105,11–100,1–10. The poor innocent programmer writes code that assumes their first-inserted “record 1: /root/” folder entry would be returned first. Using the query results with that assumption proceeds to blow up the ocean entirely by accident. But why? HOW? How can our brave and noble programmer prevent this sort of thing?

Why? HOW?

The reason a database is not FIFO is that a database is concerned with efficient storage and retrieval of data, not data being well-ordered on disk. When records 1–100 are stored, the database very likely stores them in that order. Let us imagine that each record is given perhaps 100kb bytes on disk for a total of 10,000kb.

An imaginary disk with 100 blocks of 100kb written to it (in dark green).
An imaginary disk with 100 blocks of 100kb written to it (in dark green).

At some point, entries 1–10 are edited and increase in size to 150kb on disk, and are written as a group to the database. Obviously the new extra 50kb of data from record 1 can’t continue into the first 50kb of record 2, so the database writes the new 150kb first record value at the end of record 100 where there’s room, and marks the original first record as garbage (for efficiency). If there’s an index (for efficiency), the database will update the index so it knows record 1 is at the end of the collection. Then (for the purposes of our example it does the same thing for records 2–10. The order on disk is now 11–100,1–10.

An imaginary disk with the first 100kb marked for recycling (in light green) and an extra 150kb (in blue) tacked on the end.
An imaginary disk with the first 100kb marked for recycling (in light green) and an extra 150kb (in blue) tacked on the end.

There is now a 1,000kb block of recyclable space on disk at the beginning of the DB files. To complicate matters, 5 new records are inserted, each 200kb in size. In order to use disk space efficiently the database stores them into the newly recyclable first 1,000kb on disk rather than appending to the end of the table. Now the order on disk, the natural order, is 101–105,11–100,1–10.

An imaginary disk with the first 100kb recyclable block filled with new data (in pink).
An imaginary disk with the first 100kb recyclable block filled with new data (in pink).

This is also one of the reasons an index is important; the database knows it’s unlikely the “natural order” is the same as the inserted order. A database asked to search for a value, even a sequential value, will scan the entire table for values if there’s not an index to check instead. An index is a smaller, explicitly ordered record of all the values in a column or columns, with a pointer to where the records for a given value live on disk. Queried correctly, the database could search the e.g. 10kb index for values instead of the now 11,500kb table.

How can our brave and noble programmer prevent this sort of thing?

The essential principle is to remember that the database is Lovecraftian, not FIFO, and cannot be trusted to act FIFOey even though it seems to when you start using it. If your code needs records returned in a certain order, you need to explicitly order the query results. Ideally, both your important fields and metadata fields will be indexed and you can order query results by them. Many databases provide a means of creating a column with a value that increments with each record inserted, allowing queries to sort by order of record creation- the FIFO order a programmer might assume happens naturally.

If, on the other hand, you’re dealing with code that incorrectly assumes the database is a FIFO store, or a query is written incorrectly assuming a record is unique and it accepts the “first and only” result, that code may blow up the ocean. It may do so suddenly one day after months or years of meeting expected behaviour. It may stumble into a duplicate record or a changed natural order. To patch around this problem, either as a fix for ordered results or a stopgap around duplicate values, you will have to order your results by some value in the record. If you’re lucky you have an auto-incrementing numeric field, or a metadata date and time value that stands in for order of records created. If no such field exists, talk to the DBA or the data architects. And prepare for a refactor and possibly a lot of shouting.

--

--