DB4: hengfeiyang/lsmdb
NOTE: Before starting: the codebase doesn't have level-based/size-based SST compaction, so all SST files reside in Level 0. It also doesn't have Bloom filters. Still, it is a good starter for learning about the WAL, Memtable, ImmutableMemtable, and SSTable.
These days, I am reading more database code than writing any. I have been reading code in Java, Go, and Rust. I started my journey by reading the Java code for SimpleDB (a traditional RDBMS), then tokio mini-redis written in Rust, then RheaKV in Java again (this one was similar to TiDB/TiKV, based on Raft), Raft-KV in Go, and KeyDB in Go. I did find similarities in the patterns used, but so far these were all toy projects meant to get myself clear on DB internals. There is a lot left to understand about a production-ready database.
It took me a while to juggle between languages and figure out what I should learn to be in this domain. My best bet was Rust, but honestly, it requires a lot of time investment to master, the code is usually verbose, and I was not really enjoying it. The next bet was either Go or Scala. I find Go more aligned with the type of work I would be interested in in the future, i.e. the distributed systems (Go) side of things rather than the data engineering (Scala) side.
On a side note, the TiDB and TiKV architecture has greatly influenced my pursuit of DB specialization. I really like the modern tech stack (Go and Rust) they have chosen for the DB. They have great learning materials with immense potential if translated to English. I hope they make their content globally accessible.
There is a lot to learn in Java itself. I am ashamed to say that even after spending 4 years working with Java, I still can't claim to have mastered the language. If I could undo my past, I would definitely read Netty's internal architecture, focus more on the distributed systems side of things than data engineering, and try open-source contributions in a Java DB. But enough of my pity talk; let's start with the dissection of a new DB.
Project Structure
I liked the project structure laid out for this one.
cmd/lsmdb/main.go
internal/server/server.go
We are passing the Gin engine to router.Route(). We pass it by reference so that the router can add the routes defined in that file. Then we invoke Run() on the Gin engine.
internal/router/router.go
The APIs
internal/service/get.go
internal/service/set.go
internal/service/flush.go
internal/service/bulk.go
internal/pkg/lsm
This is the crux of our database. Here we will be looking into a small LSM tree implementation. Below is the SET and GET data flow.
Also, a snip from a translated article.
internal/pkg/lsm/server.go
- Here, all the helper functions call into the functions from memory_table.go.
- Compaction is not yet implemented, though.
internal/pkg/lsm/memory_table.go
This is the main file, which has the GET, SET, LoadFromDiskTable, LoadFromWAL, etc.
- When the activeTable is full, we move its contents to the immutableTable and set the activeTable to a new SSTable, ready to receive fresh writes.
- Data Structure definition
- Usage of NewMEMTable in server.go
- Alter commands, i.e. SET/DELETE
- Here you can see that we lock the mem_table and then check whether its size is ≥ blockKeyNum; if it is, we do the switch.
- If restore is false, i.e. a normal write, we first append the command to the WAL.
- Then we lock again, call the activeTable.set(command) API, and unlock.
- The public Query function checks whether the entry has already been deleted. If so, it returns the errorNotExists error.
- Essentially, we delete items by setting a delete flag, i.e. setting the Command to CommandTypeDelete (soft delete), instead of removing the entry from the file (hard delete). When we do level compaction, we ensure that items marked for deletion are not included in the merged SST.
The actual query follows this path: query → memory SSTable → active SSTable → immutable SSTables → sparse index → SST file
Now we are going to enter the meat of this memory_table, the Flush, LoadFromDisk & LoadFromWAL.
Let's see what the file byte layout looks like for an SSTable (i.e. the sdb file).
Setting the metadata partial data
This covers the last three fields in the layout.
We will take a small detour here, read about the Sparse Index, and come back to this later.
Sparse Index
I just watched some short videos (video 1, video 2), and here is the gist.
So now coming back.
Here is the pseudocode of what is happening in the Flush()
Excerpt from an article
From here on, let's deviate for a while before coming back to LoadFromDiskTable() & LoadFromWAL().
Let’s now focus on the DTOs that will go inside the disk.
sparse_index.go
metadata.go
sparse_index.go
I’m not sure why TableName is not written into the file. Maybe we read it from the file name or so.
This is how the data folder looks like:
Let’s see the data structures in here.
wal.go
NOTE: See the Google Bigtable paper on creating a CP database. The authors said that they can still ensure high availability (the A in the CAP theorem) through redundant replication (Raft or Paxos) and by improving the quality of the hardware. And these days, most data centers run more specialized hardware than commodity hardware.
sstable.go
Here we are using a Skip List library.
What happens in Bytes() should be clear by now.
We have, in a sense, covered all the files except sstable_disk.go. We will cover it later. Now let's go back to memory_table.go and finish the rest of the functions.
memory_table.go [Cont.]
From the above code, we now have a pretty good idea of what Restore() does within the /disk files. Restore is, in essence, restoring the state of that object by reading its []byte.
Now, we are left with just sstable_disk.go
sstable_disk.go
Usage of NewDiskSSTable()
Code:
Here you can see, we are keeping a map of FileName → DiskSSTable for faster disk access via application-level caching. The DiskSSTable contains an array of the SSTables (i.e. Blocks) in that file. A quick reminder: an SSTable is nothing but a wrapper around a SkipList.
If the block is not in memory, we load it and do the Query on that SSTable.
Load is fairly simple as well.
Since Compaction is not implemented, we are not doing any garbage collection (removal of deleted key-value entries) here.
Conclusion
We have covered the complete lsm-DB. This was a great codebase to learn about Go structuring and also a beginner introduction to Memtable, WAL, LSM tree, etc. I made some minor modifications to the lsm-DB on my personal fork. Feel free to check that out as well. That’s it for today. Have a great weekend everyone!