After reading my last article:
Database III — Implementing low level file access in Java and C++ for a databases file system…
Implementing fast and effective storage functions using low level file access in java and C++.
We have a good understanding about how we can persist and read raw byte data from any low level storage. The next step is to bring order in this arbitrary data and organise it a way that makes it easy to interpret the data we’re reading. We won’t be 100% accurate in terms of efficiency, we only want to explain the system.
In our understanding we call a fix number of bytes a block. Imagine you have a hard disk with 1KB of space for our database. We divide this space into equal parts, lets say 8 bytes. That would make 128x 8 Byte blocks. These blocks are enumerated from 0 to 127.
Now we can address unique blocks with eight bytes. You could now easily save data of 8 bytes and read it back. Thats amazingly unhelpful.
But John Doe, I want to save an arbitrary amount of bytes!!!!11!!11eleven
Yeah my son, sure you want. At the end we want to address data just by one block id. E.g. we know our data is saved at block 7, then we want to say: “Gimme the data of block seven, b*tch” — At the moment we could only get eight bytes, not less, not more.
I want less
We only know the block id. When talking to our low level data access layer we could only say: “Read eight bytes from block xy”. But what can we do to store less than 8 bytes? If we would just write less bytes when writing data, we risk to have arbitrary bytes at the end of our block, that doesn’t belong the actual data but will still be read.
In this case we have to introduce the concept of a length byte. The length byte will say us how much actual data a block has. So we cut off one of our 8 byte and use 1 byte that tells us the length of the data and we still have 7 bytes left for data. When reading a block (We only know the id 0…127) we can now read the length byte first and know how many bytes belong to our actual data.
I want more
We now have even less data, only 7 bytes per block! But what the hell can we do now to store more bytes, let’s say eight? Your first idea may be to use multiple blocks, whats the actual answer. Knowing only one block id you may wonder how you can address multiple blocks.
In order to achieve more data space one can create a chain of blocks — Could also be called a linked list. Every block will have a so called next block index byte, that indicates the next block in the chain. At the end of the chain will be the -1 as the next block index.
When reading the data you just have to append the data of every block in the chain.
Combining the concept of a length byte and a next block id we create the concept of a so called block header. This header tells us meta data about what is stored in a block and how to read it.
In the end we have 1 byte for the next block, 1 byte for the data length and 6 byte for data per block.
Finished! We now have a ready to use block model the wraps the raw bytes and will help us organising the data.
In the next part we will talk about management operations on block storage and explain what can we do with our blocks.