“Write-Behind Logs”. Because…NVM?

Mahesh Paolini-Subramanya
Feb 13, 2018

“Hard-Disks”, remember them? Yeah, old school and all that, but so much (!) of what we take for granted in database architectures derives from the limitations of hard disks. The big one, of course, is the difference between sequential and random access.

Writing data sequentially is easy — the disk head moves to the appropriate location, and just starts writing block after block. Random writes, however, incur the additional — and huge! — latency associated with moving the disk head to the appropriate location.

Extend this to your favorite database, and consider what this means for a transaction. Back in the hard-disk days, it was much, much simpler to write all of a transaction’s changes as a single run of sequential writes to a separate “log” before touching the data itself (hence “write-ahead logging”, or WAL), instead of as a whole bunch of random writes at different locations.
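
To make the idea concrete, here is a minimal sketch of write-ahead logging in Python. Everything in it is an assumption for illustration: `ToyWAL`, its JSON record format, and the in-memory `table` that stands in for the data pages are all made up, and no real database works exactly like this.

```python
import json
import os
import tempfile


class ToyWAL:
    """Toy write-ahead log: change records hit the log before the table is touched."""

    def __init__(self, path):
        self.path = path
        self.table = {}  # hypothetical stand-in for the on-disk data pages
        self.log = open(path, "a+", encoding="utf-8")

    def commit(self, txn_id, changes):
        # 1. Append every change, then a commit marker, as one sequential run.
        for key, value in changes.items():
            self.log.write(json.dumps({"txn": txn_id, "key": key, "value": value}) + "\n")
        self.log.write(json.dumps({"txn": txn_id, "commit": True}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # the log is durable before any data changes
        # 2. Only now apply the changes (the "random writes") to the table.
        self.table.update(changes)

    @staticmethod
    def recover(path):
        """Rebuild the table by replaying only the committed transactions."""
        pending, table = {}, {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec.get("commit"):
                    table.update(pending.pop(rec["txn"], {}))
                else:
                    pending.setdefault(rec["txn"], {})[rec["key"]] = rec["value"]
        return table


def demo():
    path = os.path.join(tempfile.mkdtemp(), "toy.wal")
    db = ToyWAL(path)
    db.commit(1, {"a": 1, "b": 2})
    db.commit(2, {"a": 3})
    # Simulate a crash mid-transaction: a change record with no commit marker.
    db.log.write(json.dumps({"txn": 3, "key": "c", "value": 9}) + "\n")
    db.log.flush()
    db.log.close()
    return ToyWAL.recover(path)
```

Calling `demo()` replays the log and drops the half-finished transaction 3, since it never got its commit marker. Note that the log carries the full new values, so every change gets written twice: once to the log, once to the table.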

Enter the world of nonvolatile memory (NVM), which is pretty close to RAM speeds, doesn’t have to worry about sequential vs. random writes, and, for that matter, doesn’t even have to worry about writing “blocks” of data. Heck, you can even set it up to work at cache-line granularity!
Why, in this world, are we still dealing with architectures that hearken back to the hard-disk days?

In a fascinating new paper (•), Arulraj et al. have come up with a nifty update they call Write-Behind Logging (WBL), where the DBMS logs which parts of the database have changed, instead of the actual changes themselves, in the “log”. In effect, it flushes the changes before recording them.
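
A rough way to see the difference in ordering, and in what lands in the log, is the sketch below. It follows the description in this post rather than the paper's actual record format, and the function names, the dict-as-table, and the list-as-log are all hypothetical.

```python
import json


def wal_commit(table, log, txn_id, changes):
    # Write-ahead: the log record carries the new values and is written first.
    log.append(json.dumps({"txn": txn_id, "changes": changes}))
    table.update(changes)


def wbl_commit(table, log, txn_id, changes):
    # Write-behind: the changes reach the (assumed durable) table first ...
    table.update(changes)
    # ... and the log only notes which keys were touched, not their values.
    log.append(json.dumps({"txn": txn_id, "touched": sorted(changes)}))


big = {"user:1": "x" * 1000, "user:2": "y" * 1000}
wal_table, wal_log, wbl_table, wbl_log = {}, [], {}, []
wal_commit(wal_table, wal_log, 1, big)
wbl_commit(wbl_table, wbl_log, 1, big)

assert wal_table == wbl_table              # same end state
assert len(wbl_log[0]) < len(wal_log[0])   # but a much smaller log record
```

The write-behind record stays small no matter how large the values are, which gives some intuition for why the log shrinks.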

Throughput goes up by 1.3x, the storage footprint goes down by 1.5x, and happiness abounds 😇.
Mind you, this adds startup overhead, and complicates the commit and recovery protocols. As the authors put it:
“When the DBMS restarts after a failure, it needs to locate the modifications made by transactions that were active at the time of failure so that it can undo them. But these changes can reach durable storage even before the DBMS records the associated meta-data in the log… The DBMS [deals with this] by recording meta-data about the clean and dirty modifications that have been made to the database.”
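
One way to picture that clean/dirty meta-data is as a range of commit timestamps, loosely modeled on my reading of the paper. The sketch below is a toy under stated assumptions: `ToyWBLStore`, its `reserve` parameter, and the Python lists standing in for durable NVM are all hypothetical, and a real DBMS would also handle concurrency, garbage collection of the skipped versions, and much more.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWBLStore:
    """Hypothetical multi-versioned store; the lists stand in for durable NVM."""
    reserve: int = 10                              # timestamps covered per log record
    versions: list = field(default_factory=list)   # (key, value, commit_ts)
    log: list = field(default_factory=list)        # (persisted_ts, dirty_ts)
    next_ts: int = 1

    def __post_init__(self):
        # Before any write, nothing is persisted and timestamps 1..reserve are "dirty".
        self.log.append((0, self.reserve))

    def write(self, changes):
        """Write one transaction's versions straight to durable storage (no log yet)."""
        _, dirty_ts = self.log[-1]
        assert self.next_ts <= dirty_ts, "call group_commit() before issuing more timestamps"
        ts = self.next_ts
        self.next_ts += 1
        for key, value in changes.items():
            self.versions.append((key, value, ts))
        return ts

    def group_commit(self):
        """Log *behind* the writes: everything up to persisted_ts is now clean."""
        persisted_ts = self.next_ts - 1
        self.log.append((persisted_ts, persisted_ts + self.reserve))

    def recover(self):
        """Rebuild the visible table, skipping versions inside the dirty range."""
        persisted_ts, dirty_ts = self.log[-1]
        table = {}
        for key, value, ts in self.versions:
            if persisted_ts < ts <= dirty_ts:
                continue        # written by a transaction that never committed
            table[key] = value  # versions are appended in timestamp order
        return table


store = ToyWBLStore()
store.write({"a": 1})
store.write({"b": 2})
store.group_commit()
store.write({"a": 99})          # "crash" before the next group commit
recovered = store.recover()
```

Here `recovered` contains `a = 1` and `b = 2`: the stray `a = 99` did reach "durable" storage, but its timestamp falls inside the last logged dirty range, so recovery just ignores it instead of replaying anything.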

Incidentally, the move to WBL also changes the way replication works. In a happy coincidence, when using synchronous replication, the secondary can immediately start handling transactions, since a committed transaction on the primary is, by definition, “clean”.

(•) “Write-Behind Logging” by Arulraj et al. — http://www.vldb.org/pvldb/vol10/p337-arulraj.pdf
