Quelle: https://commons.wikimedia.org/wiki/File:Notizzettel.JPG Copyright: Sebastian Hartlaub

Database IV — Block Storage

Felix Klauke
Published in
4 min readSep 26, 2018

--

How is data organised at persistence? How does block storage work and what algorithms and data structures do we need?

Preparation

After reading my last article:

We have a good understanding about how we can persist and read raw byte data from any low level storage. The next step is to bring order in this arbitrary data and organise it a way that makes it easy to interpret the data we’re reading. We won’t be 100% accurate in terms of efficiency, we only want to explain the system.

Blocks

In our understanding we call a fix number of bytes a block. Imagine you have a hard disk with 1KB of space for our database. We divide this space into equal parts, lets say 8 bytes. That would make 128x 8 Byte blocks. These blocks are enumerated from 0 to 127.

Block storage of 0–8… Blocks with 8 bytes each

Now we can address unique blocks with eight bytes. You could now easily save data of 8 bytes and read it back. Thats amazingly unhelpful.

But John Doe, I want to save an arbitrary amount of bytes!!!!11!!11eleven

Yeah my son, sure you want. At the end we want to address data just by one block id. E.g. we know our data is saved at block 7, then we want to say: “Gimme the data of block seven, b*tch” — At the moment we could only get eight bytes, not less, not more.

I want less

We only know the block id. When talking to our low level data access layer we could only say: “Read eight bytes from block xy”. But what can we do to store less than 8 bytes? If we would just write less bytes when writing data, we risk to have arbitrary bytes at the end of our block, that doesn’t belong the actual data but will still be read.

In this case we have to introduce the concept of a length byte. The length byte will say us how much actual data a block has. So we cut off one of our 8 byte and use 1 byte that tells us the length of the data and we still have 7 bytes left for data. When reading a block (We only know the id 0…127) we can now read the length byte first and know how many bytes belong to our actual data.

New structure about one single block. 1 length byte, n data bytes, j fill bytes = 8 bytes

I want more

We now have even less data, only 7 bytes per block! But what the hell can we do now to store more bytes, let’s say eight? Your first idea may be to use multiple blocks, whats the actual answer. Knowing only one block id you may wonder how you can address multiple blocks.

In order to achieve more data space one can create a chain of blocks — Could also be called a linked list. Every block will have a so called next block index byte, that indicates the next block in the chain. At the end of the chain will be the -1 as the next block index.

When reading the data you just have to append the data of every block in the chain.

Block Header

Combining the concept of a length byte and a next block id we create the concept of a so called block header. This header tells us meta data about what is stored in a block and how to read it.

In the end we have 1 byte for the next block, 1 byte for the data length and 6 byte for data per block.

Block header and Block data

The End

Finished! We now have a ready to use block model the wraps the raw bytes and will help us organising the data.

In the next part we will talk about management operations on block storage and explain what can we do with our blocks.

--

--

Felix Klauke
FelixKlauke

25, oving infrastructure and backend services and networking, devops by night, privatizing the world peace, only doing the extravagant jobs