Writing a Database: Part 5 — A Fresh Start
In "Things You Should Never Do, Part I", Joel Spolsky cautions against the temptation to rewrite code from scratch, calling it the single worst strategic mistake any software company can make. I keep it in mind as a reminder that rewriting something is always a bigger investment than you think, and that new code always contains new bugs :)
Nowadays, however, my view is a little more nuanced. I've seen or heard about several systems at Google that are now in their second (or later) iteration, often with architectures very different from their first. The first generation of the Google File System (GFS) had a single master, which worked fine for a long time but was eventually replaced with a sharded master system. So it is possible to completely replace a system, but, as Spolsky cautions, doing so can take a long time and a lot of effort.
The reason I bring this up is that I'm wondering where to go next with my toy database. I eventually hope to layer Raft replication on top of it. While sketching out how I would do that, I discovered that I needed a way to durably persist the Raft logs. Luckily, I have a database, so that should be easy, right? It turns out DDB was missing a few features that would make it a suitable database for storing its own Raft logs:
- Some notion of separate tables. Ideally, the Raft logs would get their own storage on disk so that they don't end up commingled with user data: the two have very different lifetimes, and compacting them together seems like a waste. However, DDB is currently structured to support, effectively, only a single global table.
- A scan API would be nice. DDB currently supports only point lookups by key. When replaying Raft logs, it would be nice to efficiently scan a series of consecutive log entries rather than look up each one individually.
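To make the two missing features a bit more concrete, here is a minimal in-memory sketch in Python. It is not DDB's code or API: the class `ToyDB` and its `put`, `get`, and `scan` methods are hypothetical names, and the sorted list plus dict per table stand in for real on-disk storage. Each table gets its own key space, and `scan` walks a half-open key range in order instead of issuing one lookup per key.

```python
import bisect


class ToyDB:
    """Hypothetical in-memory stand-in for a database with named tables.

    Each table keeps its own sorted key list and value map, so (for example)
    Raft log entries never share a key space with user data.
    """

    def __init__(self):
        self._keys = {}    # table name -> sorted list of keys
        self._values = {}  # table name -> {key: value}

    def put(self, table, key, value):
        keys = self._keys.setdefault(table, [])
        values = self._values.setdefault(table, {})
        if key not in values:
            # Keep keys sorted so range scans can start with a binary search.
            bisect.insort(keys, key)
        values[key] = value

    def get(self, table, key):
        # Point lookup: one call per key, returns None if absent.
        return self._values.get(table, {}).get(key)

    def scan(self, table, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        keys = self._keys.get(table, [])
        values = self._values.get(table, {})
        i = bisect.bisect_left(keys, start)
        while i < len(keys) and keys[i] < end:
            yield keys[i], values[keys[i]]
            i += 1


db = ToyDB()
db.put("user", "alice", "hello")
for index in range(1, 6):
    db.put("raft_log", index, f"entry-{index}")

# Replay log entries 2..4 with one scan instead of three separate lookups.
print(list(db.scan("raft_log", 2, 5)))
# [(2, 'entry-2'), (3, 'entry-3'), (4, 'entry-4')]
```

With sorted on-disk storage, the same range-based interface would typically let consecutive log entries be read sequentially, which is where the efficiency gain over individual lookups would come from.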
While I was pondering how to make these changes, John Ousterhout's A Philosophy of Software Design was fresh in my mind.
What I've decided (I may regret this later and change my mind) is to start fresh on the code layout of DDB. I managed to cobble together something that mostly works; now I want to go back, look at each piece critically, try some alternative designs, and hopefully document things along the way. I'm hoping to copy significant portions of the existing code, although the tests will probably have to be rewritten since I'm going to change the API.
And yes, I know I’m doing the very thing I said was a bad idea at the start of this. But hey, this isn’t for work, so I’m going to capitalize on the fact that there’s absolutely no pressure to, you know, actually get stuff out the door on any time frame.