CSR Tale #5: The story of PebblesDB
I figured it was time I pitched in with a story of my own. This is the story of how our PebblesDB key-value store came to be built and published at SOSP 2017. The story also involves my post-doc experience at VMware Research.
This story begins in 2015, when I was finishing up my PhD. I was interviewing in both industry and academia: I actually thought I was not going to get any academic offers (I didn’t work on sexy topics like cloud computing or machine learning), but wanted to give it a try before I left for industry. I wanted to join an industrial research lab like Microsoft Research.
Interestingly, at the time I was on the job market, a few folks from Microsoft Research Silicon Valley had started a new research lab in VMware. My internship mentors Mahesh Balakrishnan and Marcos Aguilera were at this new lab, so they encouraged me to apply. I interviewed at the lab and had a great experience. They gave me an offer to join as a researcher.
I went on to interview at different universities, and I was extremely lucky to land an offer at the University of Texas at Austin. I called Dahlia Malkhi (one of the founding members of the research lab) to tell her the news and inform her I couldn’t join VMware Research (or VRG as it is called). On the same phone call, Dahlia suggested I do a one-year post-doc before joining UT, and offered to host me at VRG. I had already heard of successful one-year post-docs (for example, Philip Guo), and I didn’t really want to plunge into faculty life immediately, so I agreed. I spoke with the folks at UT Austin, who were thankfully okay with deferring my join date by a year.
When I joined VRG, I had the explicit goal of forming new connections, and not just working on the same things I did during my PhD. I spoke to a lot of researchers about what they were working on, and how I could help. This was when I first talked to Ittai Abraham, a theory/data structures expert (his expertise spans a lot more areas, but this is the briefest way to describe him). Ittai had this idea about a new data structure for key-value stores, and he wanted someone with systems experience to help build it. I signed on thinking it would be a quick one-to-three-month project.
The initial days were a bit rough, with most of what Ittai was saying going right over my head. Systems people and theory people really do speak different languages, so it was a while before we were in sync. To better understand the intuition behind the project, I started building a quick Python prototype that embodied the new data structure Ittai was thinking about. Our prototype showed that the new data structure could drastically reduce write amplification, though our latencies were significantly higher than those of the C++ key-value stores we were comparing against. I presented the early form of PebblesDB at the VMware RADIO conference, an internal R&D conference in VMware. Btw, academic conferences have nothing on RADIO: RADIO’s production value is nearer to that of TED than an academic conference. You could have had a small concert on that stage, and it wouldn’t have looked out of place.
After receiving positive and useful feedback at RADIO, Ittai and I set out to modify an existing key-value store to use our new data structure. We chose LevelDB, since it was significantly simpler and easier to understand than RocksDB, and began modifying it. Specifically, we started modifying HyperLevelDB, a port of LevelDB by the HyperDex folks at Cornell (Emin Gun Sirer’s group).
We had several moments when what we had assumed clashed with what LevelDB was actually doing: for example, we thought there would be a binary search across the entire sstable, making search O(log n); it turns out sstables just have indexes, making search O(1).
This was the fun part of the project, because there is so much involved in going from a theoretical data structure to building an actual key-value store that delivers great performance. We had to use a number of well-known engineering tricks to build PebblesDB.
We were halfway through the implementation when my post-doc ended, and I joined UT. Thankfully, almost immediately, Pandian joined my research group and took over the system-building part. Pandian is an amazing systems builder, so pretty soon we had a prototype ready. We evaluated it against LevelDB and got great results. So we wrote it up and sent it to EuroSys.
We got rejected at EuroSys, mainly for two reasons: we hadn’t evaluated against RocksDB, and we hadn’t explained the design very well. It seemed to come across more as a bunch of hacks to LevelDB than a new data structure. So we got to work, evaluating against RocksDB, and evaluating the performance of applications such as HyperDex and MongoDB on top of PebblesDB. This is when Rohan Kadekodi joined the project. Rohan is another amazing systems builder, and in the space of a month, he went from not knowing anything about MongoDB to modifying it so that it would run on top of PebblesDB.
It was when we were benchmarking application performance that we got other surprises. For example, in both HyperDex and MongoDB, many put() requests would be transformed into a get() followed by a put(), to first check if the key is already there. This significantly impacted PebblesDB performance, since PebblesDB could handle a lot more put() requests than the application was throwing at it. It was interesting figuring out these application quirks though!
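To make the get()-before-put() pattern concrete, here is a minimal Python sketch. It is not the actual HyperDex or MongoDB code: `CountingStore` and `app_insert` are hypothetical names invented for illustration, and the store is just an in-memory dictionary that counts the requests it receives.

```python
class CountingStore:
    """Toy in-memory key-value store that counts the requests it receives."""

    def __init__(self):
        self._data = {}
        self.gets = 0
        self.puts = 0

    def get(self, key):
        self.gets += 1
        return self._data.get(key)

    def put(self, key, value):
        self.puts += 1
        self._data[key] = value


def app_insert(store, key, value):
    """Hypothetical application-level write: look the key up, then write it."""
    existed = store.get(key) is not None  # extra read issued by the application
    store.put(key, value)
    return existed


store = CountingStore()
for i in range(1000):
    app_insert(store, f"key{i}", b"value")

# Every logical write reached the store as one get() plus one put().
print(store.gets, store.puts)
```

In this toy setup, a workload that looks write-only from the application's side reaches the store as an even mix of reads and writes, so a store tuned for write throughput sees only part of its advantage.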
Another thing we tackled was the writing. I distributed our draft to the systems group at UT Austin. We got excellent feedback, and I re-wrote the paper to make it clear we were doing two things: data structure innovation in terms of the Fragmented Log-Structured Merge Trees (FLSM) data structure, and building PebblesDB on top of FLSM (along with the associated engineering tricks). In particular, the feedback for the introduction was super useful, and we re-wrote it several times to get the point across. We submitted the paper to SOSP.
News came in August: we got accepted with great reviews! It felt good to know all that work finally paid off. We worked with our shepherd, the amazing Frans Kaashoek, to address the reviewer comments. We also worked hard to release the code as open source on GitHub (where it has received a fair amount of attention: 98 stars as of now!). We also worked to release the changes we made to MongoDB so that it can be run on top of PebblesDB.
Working on PebblesDB got me thinking about the problem of write amplification across the storage stack, so I started working on it at UT Austin. The preliminary work in this space led to a Best Poster at APSys and an NSF CAREER grant! So overall, a successful post-doc experience :)
My lessons from the PebblesDB experience:
- Writing is super important. I strongly feel that time spent re-writing the paper pays off much more than time spent doing additional experiments (though a strong evaluation is important too).
- Working with theory folks is a lot of fun! If you find the right collaborators, working on a mix of theory and practice leads to extremely satisfying research plus a lot of impact. There are similar projects on-going at VMware Research that are super cool.
- If you will be joining academia, I highly recommend a one-year post-doc after you finish your PhD. The post-doc allowed me to catch my breath after the PhD, explore new projects, and form new connections that I never would have otherwise.
- I highly recommend doing a post-doc at VMware Research (and no, I am not paid to say this). The research group has amazing researchers with deep experience in a number of fields, and the culture is oriented towards doing large projects that may take longer but will have lasting impact.