Database III: Implementing low-level file access in Java and C++ for a database’s file system storage

Felix Klauke
5 min read · Jun 30, 2018

Implementing fast and effective storage functions using low-level file access in Java and C++.

General idea

In the last part of the database series we talked about low-level file access and the idea behind random access. We will need these concepts when implementing the file system layer for the database we are developing. As mentioned before, we won’t use direct system calls but wrapper classes instead.

Java IO: RandomAccessFile

The Java RandomAccessFile is one of the best-known ways to get random access to a particular file. You can picture it as a large byte array backed by the file. While the normal File class is oriented towards metadata (paths, names, attributes) and is unsuitable for our approach, this one fits our requirements. It has a so-called file pointer that holds the current position in the underlying file, and you can seek to any position you want. Seeking past the end of the file is allowed; the file only grows once you write there. You can open a RandomAccessFile read-only ("r") or for reading and writing ("rw"); there is no write-only mode. It lives in the java.io package and supports blocking access only, which is why we will drop it in favour of a solution based on NIO and channels, which offer better support for ByteBuffers and so on.
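To make the file pointer a bit more tangible, here is a minimal sketch of a blocking round trip with RandomAccessFile. The class name `RandomAccessFileDemo` and the helper `writeThenRead` are made up for this illustration and are not part of the database code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, only meant to illustrate seek/write/read.
class RandomAccessFileDemo {

    // Writes `data` at `offset`, then seeks back and reads the same range again.
    static byte[] writeThenRead(Path path, long offset, byte[] data) throws IOException {
        // "rw" opens the file for reading and writing and creates it if it is missing.
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            file.seek(offset);       // move the file pointer (may lie beyond the current end)
            file.write(data);        // advances the pointer by data.length

            byte[] result = new byte[data.length];
            file.seek(offset);       // jump back to where we started writing
            file.readFully(result);  // blocks until the whole array is filled
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("raf-demo", ".bin");
        try {
            byte[] read = writeThenRead(path, 16, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(read, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Every call here blocks the calling thread until the operating system has finished, which is exactly the limitation mentioned above.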

Java NIO: AsynchronousByteChannel & AsynchronousFileChannel

Coming from the Java NIO package, these channels allow us to read, write and manipulate files through byte buffers. Unlike the RandomAccessFile and the synchronous FileChannel, an AsynchronousFileChannel has no current position: all calls are executed asynchronously, so every read and write names the file position it applies to. We will use these in our Java application, as asynchronous IO with locks and futures is very easy here. You can take a look at the JavaDocs for further information.
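As a rough sketch of what this looks like in practice, the following example writes and reads through an AsynchronousFileChannel using the Future-returning methods. `AsyncChannelDemo` and `writeThenRead` are names invented for this example. It waits on each future right away to keep the flow readable, which a real implementation would avoid.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

// Hypothetical demo class, only meant to illustrate position-based async calls.
class AsyncChannelDemo {

    static String writeThenRead(Path path, long position, String text) throws Exception {
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                path, StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {

            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

            // There is no file pointer: the position is passed with every call.
            Future<Integer> written = channel.write(ByteBuffer.wrap(bytes), position);
            written.get(); // wait for the write; a single call may in general be partial

            ByteBuffer target = ByteBuffer.allocate(bytes.length);
            Future<Integer> read = channel.read(target, position);
            read.get();    // wait until the read has completed

            target.flip(); // switch the buffer from being filled to being read
            return StandardCharsets.UTF_8.decode(target).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("async-demo", ".bin");
        try {
            System.out.println(writeThenRead(path, 32, "database"));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```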

C++: ifstream & ofstream

These wrappers are all about managing a filebuf, which sits almost directly on top of the system calls. I won’t say much about them, as they are very close to the low level and need no further introduction. If you still want to know more, you can take a look at the documentation of the fstream header.

Implementation

We now have a solid basic idea of which wrapper classes Java and C++ provide to make low-level file access as easy as possible. Having thought about what our random access interface could look like, we will now implement it in Java and C++. Before we actually implement the classes, we will write the tests first and follow test-driven development.

Interface and Test in C++

For C++ we will use Google Test for creating the tests. As we now know which wrapper classes for low level file access exist in C++, we can extend our previously designed header file:

Now we can formulate the functions into a simple test:

The first step is including the needed files. In our case we want to use the header of the Google Test framework and include our file system accessor. The second step is creating a fixture that helps us set up a test environment. In this case it creates the file accessor and, on tear down, deletes the file and the accessor.

The test itself isn’t 100% accurate, as the two methods should be tested separately, but it is good enough for our demonstration.

We first create an array of bytes and fill it with some example data. Then we write these bytes using our file system accessor. For the way back, a second, empty byte buffer is created and the data is read into it.

The last steps are asserting that the example data matches the data read back and releasing the open resources.

Implementation in C++

Implementing this is now very easy:

In the end this is just the manifestation of random access. We get the file name in the constructor and can begin creating our wrapper classes that will handle low-level file access.

The destructor closes the streams and deletes them:

Reading and writing is now as easy as seeking to a given position and then reading or writing a given number of bytes into or from a given buffer!

Finished!

Interface and Test in Java

While the corresponding interface in Java is very simple, the test is much more advanced and thorough than the C++ one.

We basically have the same content as in our C++ header file: a read and a write method that both take an offset defining the starting position for our random access. One difference is that in Java the ByteBuffer is allocated by the file system access implementation; another is that we read and write asynchronously.

In addition, we will implement Closeable for nicer integration, for example with try-with-resources.
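A minimal sketch of such an interface could look like the following. The name `FileSystemAccess` and the exact signatures are assumptions made for illustration, and it uses the JDK’s CompletableFuture as the future type so that the sketch has no external dependencies.

```java
import java.io.Closeable;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical interface; names and signatures are assumptions for illustration.
interface FileSystemAccess extends Closeable {

    // Reads `length` bytes starting at `offset`.
    // The implementation allocates the buffer and hands it back ready for reading.
    CompletableFuture<ByteBuffer> read(long offset, int length);

    // Writes the remaining bytes of `data` starting at `offset`.
    // Completes with the number of bytes that were written.
    CompletableFuture<Integer> write(long offset, ByteBuffer data);
}
```

Because the interface extends Closeable, any implementation can be used in a try-with-resources block and releases its channel automatically.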

The test is straightforward and begins by defining the name of a test file and the content that we will write later in the test. The setup method creates the file system accessor and writes some test data into the file.

We already define some behaviour for exception handling that will be very important in the further implementation.

On tear down the file will be deleted.

The write and read tests are also very short and easy to follow. For writing, we only test that the correct number of bytes is written to the file.

The main part of the test is the read test, which checks that the test content we wrote in the setup method is read back correctly.

Implementation in Java

The implementation is more or less trivial now that we have heard so much about file and byte channels and know how the implementation in C++ works. The main difference is that we use asynchronous channels with futures in this case.

The first thing to mention is that I use Google Guava’s Futures for transforming futures. The second is that we don’t really handle file locks in this example. That will come in a later part.

As you can see an asynchronous file channel is used. We could also be more minimalistic and use an asynchronous byte channel, but we will later use some more advanced functionalities the asynchronous file channel provides.

With some exception handling and buffer allocation we have some further boilerplate code in this file, but if you only look at reading and writing you can see the parallels with the C++ implementation and the underlying system calls, once you strip out some abstractions and the asynchronous future aspects.

Reading is just about allocating a byte buffer of a given length, reading from the given offset, flipping the byte buffer so it is ready for reading and returning it. That’s all!

Writing is trivial. You will see that.
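As a rough sketch of these two methods, the following class bridges the channel’s CompletionHandler callbacks into a CompletableFuture. This deliberately swaps Guava’s future transformation for JDK classes to stay dependency-free, and the class name `AsyncFileAccess` is hypothetical. File locks are left out, just like in the text above.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; not the series' actual implementation.
class AsyncFileAccess implements Closeable {

    private final AsynchronousFileChannel channel;

    AsyncFileAccess(Path path) throws IOException {
        this.channel = AsynchronousFileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    // Allocates a buffer, reads up to `length` bytes from `offset`, flips and returns it.
    CompletableFuture<ByteBuffer> read(long offset, int length) {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        CompletableFuture<ByteBuffer> result = new CompletableFuture<>();
        channel.read(buffer, offset, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                buffer.flip(); // prepare the buffer for the caller to read from
                result.complete(buffer);
            }

            @Override
            public void failed(Throwable cause, Void attachment) {
                result.completeExceptionally(cause);
            }
        });
        return result;
    }

    // Writes the remaining bytes of `data` at `offset`; completes with the bytes written.
    CompletableFuture<Integer> write(long offset, ByteBuffer data) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        channel.write(data, offset, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesWritten, Void attachment) {
                result.complete(bytesWritten);
            }

            @Override
            public void failed(Throwable cause, Void attachment) {
                result.completeExceptionally(cause);
            }
        });
        return result;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("access-demo", ".bin");
        try (AsyncFileAccess access = new AsyncFileAccess(path)) {
            access.write(8, ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8))).join();
            ByteBuffer read = access.read(8, 5).join();
            System.out.println(StandardCharsets.UTF_8.decode(read));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Note that a single channel call may transfer fewer bytes than requested, so a production version would probably loop until the buffer is drained or filled.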

Conclusion

The first big step is done: we have access to the file system and are able to read and write persistent files on disk. We will now follow our architecture overview:

So the next step will be thinking about how Indices and Data can be stored. We will develop a format and then evaluate how we can implement it using our file system layer with random access. Furthermore we will need some file lock handling to ensure thread safety and prevent concurrent modification of sections.
