Database II — Low level file access in C/C++

Accessing data at a low level is important for all kinds of software that aim to be fast and efficient and need direct access to data and storage.


Low level file access in general

In the following we will look at low level file access on Linux/Unix, using some examples in C/C++ for demonstration. This section introduces the idea of random access and establishes the mindset behind how file access works. We will use abstractions for this later, but we need some basic knowledge of file handling first.

High level vs. low level

We first have to define what we mean by high level and low level file access. By low level file access we mean calling the kernel's system calls directly. By high level file access we mean using a wrapper around those system calls that lives in user space.
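To make the difference tangible, here is a minimal sketch that writes the same text once through a user-space wrapper (std::ofstream) and once through the system calls directly. The function names writeHighLevel and writeLowLevel are made up for this illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>

#include <fcntl.h>   // open and the O_* flags
#include <unistd.h>  // write, close

// High level: std::ofstream buffers the data in user space and issues the
// system calls for us when it flushes or closes.
bool writeHighLevel(const std::string& path, const std::string& text) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << text;
    out.close();
    return !out.fail();
}

// Low level: we ask the kernel ourselves via open, write and close.
bool writeLowLevel(const std::string& path, const std::string& text) {
    assert(!path.empty());
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    // write may store fewer bytes than requested; this sketch simply
    // treats such a short write as a failure.
    ssize_t written = write(fd, text.data(), text.size());
    bool ok = written == static_cast<ssize_t>(text.size());
    return close(fd) == 0 && ok;
}
```

Both functions leave a file with the same content on disk; the difference is only in who talks to the kernel.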

System calls

For our implementation we need some basic system calls that help us open, close, read and write a file. For reading and writing we want random access, so that we can read from and write to any position in the file. The calls we need are:

  1. open
  2. close
  3. pread
  4. pwrite

Implementation in C

For demonstration we will write a very simple file using C. The source code is quite simple:

The first step is obtaining the file descriptor via #open(filePath, flags), where the file path says where the file is located and the flags say what we want to do with it. You can think of the file descriptor as something like a pointer to the underlying file (technically it is just a small integer). All our other file-related system calls take the file descriptor as their target.
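As a minimal sketch of this step: the helper name openForWriting and the flag combination are assumptions chosen for illustration. Note that open takes a third argument, the permission bits, whenever O_CREAT is among the flags.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

#include <fcntl.h>   // open and the O_* flags
#include <unistd.h>  // close

// Opens a file for writing, creating it if it does not exist yet.
// Returns the file descriptor, or -1 on failure.
int openForWriting(const char* filePath) {
    assert(filePath != nullptr);
    // O_WRONLY: write only, O_CREAT: create if missing,
    // O_TRUNC: start from an empty file. 0644 means rw-r--r--.
    int fd = open(filePath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        std::fprintf(stderr, "open(%s) failed: %s\n", filePath, std::strerror(errno));
    }
    return fd;
}
```

A negative return value always means the call failed, so it should be checked before the descriptor is handed to any other call.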

We can now fill an array with some example data and write it using #pwrite(fileDescriptor, buffer, amount, offset), where the amount is the number of bytes from the buffer that should be written to the file, beginning at the given offset.
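A sketch of such a write could look like the following. The example values, the helper names writeAt and writeExampleFile, and the retry loop are assumptions for illustration. The second call to writeAt overwrites only the third number, which is exactly what random access buys us.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>

#include <fcntl.h>   // open and the O_* flags
#include <unistd.h>  // pwrite, close

// Writes exactly `amount` bytes from `buffer` at `offset`,
// retrying when pwrite stores only part of the data.
bool writeAt(int fd, const void* buffer, std::size_t amount, off_t offset) {
    assert(fd >= 0);
    const char* bytes = static_cast<const char*>(buffer);
    std::size_t done = 0;
    while (done < amount) {
        ssize_t n = pwrite(fd, bytes + done, amount - done,
                           offset + static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, try again
            return false;
        }
        if (n == 0) return false;          // no progress, give up
        done += static_cast<std::size_t>(n);
    }
    return true;
}

// Writes four hypothetical example values, then replaces the third one in place.
bool writeExampleFile(const char* filePath) {
    const std::int32_t values[4] = {1, 2, 3, 4};
    const std::int32_t replacement = 42;
    int fd = open(filePath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = writeAt(fd, values, sizeof(values), 0)
           && writeAt(fd, &replacement, sizeof(replacement),
                      static_cast<off_t>(2 * sizeof(std::int32_t)));
    return close(fd) == 0 && ok;
}
```

Unlike write, pwrite does not move the file position, so the order of the two calls does not depend on where a previous call left off.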

The last step is closing the file. Simple, huh? Now we can take a look at what we have created with this code.

The system output:

The resulting file looks very spectacular when opened in a hex editor (I use https://hexed.it/):

When we now calculate the other way round:
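As a sketch of that calculation, assuming the file holds 32-bit integers written on a little-endian machine such as x86-64 (the byte values below are hypothetical), four bytes from the hex view can be combined back into a number like this:

```cpp
#include <cassert>
#include <cstdint>

// Combines four bytes, in the order a hex editor shows them, into a 32-bit
// value, assuming little-endian order (least significant byte first).
std::uint32_t decodeLittleEndian(const unsigned char bytes[4]) {
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Small self-check: the bytes 2A 00 00 00 stand for 0x2A = 42.
void selfCheck() {
    const unsigned char bytes[4] = {0x2A, 0x00, 0x00, 0x00};
    assert(decodeLittleEndian(bytes) == 42u);
}
```

Each byte contributes its value times 256 to the power of its position, which is why 00 01 00 00 reads as 256.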

Surprise, surprise: the same values we put in! We have taken our first step in writing data to a file using low level file access. A human can read the file we created and the result is correct, but what about the computer? For that we need the one system call we haven't used yet: pread.

Following the same procedure as for writing, we first obtain the file descriptor. This time we open the file for reading!

Then we create a buffer of the same size and pass it to our pread call.

The last step is just printing out what we have read and then closing the file.
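Put together, the reading side could be sketched as follows. The helper names readAt and printExampleFile are assumptions for illustration, as is the choice of 32-bit integers as the stored data.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdio>

#include <fcntl.h>   // open and the O_* flags
#include <unistd.h>  // pread, close

// Reads up to `amount` bytes at `offset` into `buffer`.
// Returns the number of bytes read (less than `amount` at end of file), or -1 on error.
ssize_t readAt(int fd, void* buffer, std::size_t amount, off_t offset) {
    assert(fd >= 0);
    char* bytes = static_cast<char*>(buffer);
    std::size_t done = 0;
    while (done < amount) {
        ssize_t n = pread(fd, bytes + done, amount - done,
                          offset + static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, try again
            return -1;
        }
        if (n == 0) break;                 // end of file
        done += static_cast<std::size_t>(n);
    }
    return static_cast<ssize_t>(done);
}

// Reads `count` 32-bit integers from the start of the file into `values`
// and prints them, one per line.
bool printExampleFile(const char* filePath, std::int32_t* values, std::size_t count) {
    int fd = open(filePath, O_RDONLY);
    if (fd < 0) return false;
    const std::size_t wanted = count * sizeof(std::int32_t);
    bool ok = readAt(fd, values, wanted, 0) == static_cast<ssize_t>(wanted);
    if (ok) {
        for (std::size_t i = 0; i < count; ++i) {
            std::printf("%d\n", static_cast<int>(values[i]));
        }
    }
    return close(fd) == 0 && ok;
}
```

Because pread takes the offset as an argument, reading the second half of the file first would work just as well.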

The system output:

That is black magic! Or, as programmers call it: low level file access.


Abstraction for our database’s low level access

While this low level file access is very fast and powerful, we won't use it directly in our application. Implementing our own file access may be a task we can approach later, but for now we will use a slightly higher-level API provided by the standard library of our programming language.

Defining an interface for our file system access layer

We have learned enough to be able to define a simple interface around our file access layer. The main idea is really simple: random read and write access. Put into a C++ header file, the result could look like this:

As you can see, we construct a file system accessor from the name of the file we are accessing. Under the hood we will use ifstream/ofstream for reading and writing files. These are nothing but wrappers around the raw calls shown above. The corresponding private fields were left out for demonstration purposes.
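To give an idea of how such an accessor could be filled in, here is a minimal sketch built on ifstream/ofstream. The class name FileSystemAccessor, the method signatures and the private field are all assumptions for illustration, not the final interface of this series.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical accessor offering random read and write access to one file.
class FileSystemAccessor {
public:
    explicit FileSystemAccessor(const std::string& fileName) : fileName_(fileName) {}

    // Writes `data` starting at byte `offset`, creating the file if needed.
    bool write(std::streamoff offset, const std::vector<char>& data) {
        assert(offset >= 0);
        // in | out keeps the existing content; plain out would truncate the file.
        std::ofstream file(fileName_, std::ios::in | std::ios::out | std::ios::binary);
        if (!file) {
            // The file does not exist yet, so create it.
            file.open(fileName_, std::ios::out | std::ios::binary);
        }
        if (!file) return false;
        file.seekp(offset);
        file.write(data.data(), static_cast<std::streamsize>(data.size()));
        file.flush();
        return file.good();
    }

    // Reads up to `size` bytes starting at byte `offset`.
    // The result is shorter than `size` if the end of the file is reached.
    std::vector<char> read(std::streamoff offset, std::size_t size) {
        assert(offset >= 0);
        std::ifstream file(fileName_, std::ios::binary);
        if (!file) return {};
        std::vector<char> buffer(size);
        file.seekg(offset);
        file.read(buffer.data(), static_cast<std::streamsize>(size));
        buffer.resize(static_cast<std::size_t>(file.gcount()));
        return buffer;
    }

private:
    std::string fileName_;  // name of the file all calls operate on
};
```

Opening the stream on every call keeps the sketch short; a real implementation would more likely keep the stream open as a member.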

Conclusion

Low level file access is a mighty tool when you want fast and direct access to data. But you should always use it with caution, as operating on a low level is riskier and less forgiving of errors. It's fine to use wrappers like ifstream/ofstream in C++ or ByteChannel or FileChannel in Java. These are also what we will use when we implement our low level file access layer in the next part of our database series.