Database I - Developing your own data storage engine aka create your own database

I began programming around seven years ago. Like every young, up-and-coming developer, I always dreamed of building a huge application, something very complex and better than anything other people had done. Even if that last part isn’t easy, everyone tries a few projects of their own: a first little web server, a programming language and so on. But there are some topics where there is no golden path. You won’t find a step-by-step guide to developing your own database, and that’s why it is so fucking interesting for any developer: it’s a mystery, an adventure. It’s complex and far from trivial, and you will know you did real work when you have finished it. Well, let’s talk about how your dream of your own database can come true. This is the story of how a friend and I created our own little database, and of the challenges and adventures we faced along the way.

The very beginning

It was a cold morning when a friend of mine texted me about something he had just developed: a simple data store. It wasn’t complex or revolutionary, but it was the birth of a bigger project. The proof of concept could only store a small, limited amount of data in a simple key-value system. You couldn’t delete data, you couldn’t really replace data, but it was the very beginning.
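To give a feeling for how small such a starting point can be, here is a minimal sketch of an insert-only key-value store. It is not my friend’s actual code: the class name `TinyStore`, the `capacity` parameter and the choice of exceptions are all assumptions made up for illustration.

```python
class TinyStore:
    """Insert-only key-value store with a fixed capacity (hypothetical sketch)."""

    def __init__(self, capacity=4):
        self._capacity = capacity  # assumed limit, stands in for "limited amount of data"
        self._data = {}

    def put(self, key, value):
        # No replace: writing to an existing key is rejected.
        if key in self._data:
            raise KeyError(f"key {key!r} already exists; replacing is not supported")
        # No delete either, so a full store stays full.
        if len(self._data) >= self._capacity:
            raise OverflowError("store is full")
        self._data[key] = value

    def get(self, key):
        return self._data[key]


store = TinyStore(capacity=2)
store.put("greeting", b"hello")
print(store.get("greeting"))  # b'hello'
```

Everything lives in memory here; persisting it to disk is exactly the part that makes a real storage engine interesting.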

The counterpoint

There is a famous set of rules out there: The cardinal rules of DBMS development.

Rule 1: Developing a good DBMS requires 5–7 years and tens of millions of dollars.
Rule 2: You aren’t an exception to Rule 1.

This is broadly right. But we aren’t aiming to develop a full DBMS, and come on: you also think you are an exception. And for educational purposes, that is OK. To quote a character from one of my favourite series:

“Discovery requires experimentation.”
- Daniel Whitehall

The project goals

We want to achieve the following functionality and features:

  • Support CRUD operations
  • Support for raw data and JSON records
  • Database communication via REST
  • Database communication via a binary protocol
  • 100% test coverage
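To make the first two goals a bit more concrete, here is a rough in-memory sketch of CRUD operations over raw bytes, with JSON records layered on top. The class `RecordStore` and its method names are hypothetical and only illustrate the idea; the REST and binary protocol goals are not covered here.

```python
import json


class RecordStore:
    """In-memory CRUD sketch; every value is kept as raw bytes (hypothetical names)."""

    def __init__(self):
        self._records = {}

    def create(self, key, raw):
        if key in self._records:
            raise KeyError(f"{key!r} already exists")
        self._records[key] = bytes(raw)

    def read(self, key):
        return self._records[key]

    def update(self, key, raw):
        if key not in self._records:
            raise KeyError(f"{key!r} does not exist")
        self._records[key] = bytes(raw)

    def delete(self, key):
        del self._records[key]

    # JSON records are just raw records with an encoding step in front.
    def create_json(self, key, obj):
        self.create(key, json.dumps(obj).encode("utf-8"))

    def read_json(self, key):
        return json.loads(self.read(key).decode("utf-8"))


store = RecordStore()
store.create("blob", b"\x00\x01\x02")
store.create_json("user:1", {"name": "Ada"})
print(store.read_json("user:1")["name"])  # Ada
```

Treating JSON as a thin layer over raw bytes is one possible design; it keeps the storage side ignorant of the record format.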

Most basic thoughts

Looking at our requirements, we can work out what we have to think about. Our first step will be low-level file access. Then we will create a layer that wraps this low-level file access so that it is suitable for reading data and indices. This functionality will be combined into a storage engine that works on indices and queries. The storage engine will be run by a database, and the databases will be managed by an authority that can handle several of them. These databases will have to be exposed via a server. Sounds funny? Let’s visualise this a bit. First we consider a single database:

As mentioned, the databases will be managed by a manager. This manager keeps track of where the databases are located, and it loads and unloads them whenever needed. As soon as we have such a manager, we can add a server layer.
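The layering described above could be sketched roughly like this. All class names (`FileAccess`, `StorageEngine`, `Database`, `DatabaseManager`), the `.db` file suffix and the in-memory index are assumptions for illustration, not the design we ended up with, and the server layer is left out entirely.

```python
import os
import tempfile


class FileAccess:
    """Lowest layer: append bytes to a file and read them back by offset."""

    def __init__(self, path):
        self._path = path
        open(path, "ab").close()  # create the file if it is missing

    def append(self, data):
        with open(self._path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # position where the new data starts
            f.write(data)
        return offset

    def read(self, offset, length):
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(length)


class StorageEngine:
    """Maps keys to (offset, length) pairs on top of the file layer."""

    def __init__(self, files):
        self._files = files
        self._index = {}  # kept in memory only, so it is lost on unload

    def put(self, key, value):
        self._index[key] = (self._files.append(value), len(value))

    def get(self, key):
        offset, length = self._index[key]
        return self._files.read(offset, length)


class Database:
    """A single database runs one storage engine."""

    def __init__(self, path):
        self.engine = StorageEngine(FileAccess(path))


class DatabaseManager:
    """Knows where databases live and loads or unloads them on demand."""

    def __init__(self, directory):
        self._directory = directory
        self._loaded = {}

    def load(self, name):
        if name not in self._loaded:
            path = os.path.join(self._directory, name + ".db")
            self._loaded[name] = Database(path)
        return self._loaded[name]

    def unload(self, name):
        self._loaded.pop(name, None)


with tempfile.TemporaryDirectory() as directory:
    manager = DatabaseManager(directory)
    db = manager.load("users")
    db.engine.put("alice", b"raw bytes")
    print(db.engine.get("alice"))  # b'raw bytes'
```

Each layer only talks to the one directly below it, which is the point of the picture: a server would sit on top of the manager and never touch files itself.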

That’s all for now! The next part will be about low-level file access: