bcache — small SSD hero in HDD world

Published in

weles.ai

Sep 22, 2021

Data science is all about the data: the more, the better! But where do you store all that data? For small datasets, a fast SSD is a great choice. But what about petabyte-scale datasets that are accessed frequently? Storing them on fast drives can put a serious dent in the project budget. Can we use slower, cheaper HDDs to store huge datasets without compromising access performance? Yes, we can! Bcache is here to help.

First, let’s look at the concepts behind bcache, then explore some scenarios where it can be used. By the end of this article, you will know why, how and when to use bcache!

Let’s dig in.

What’s bcache?

It is a block layer cache built into the Linux kernel. This means it works at the kernel level, below the filesystem and far from user space. Its main benefit is that faster disks speed up writes and reads while you keep the huge capacity of slower disks, whose cost per GB is much lower. Additionally, because bcache is transparent, it integrates easily with existing workloads that frequently access files on disk: they need no changes and will not see any difference.

The general idea is to use the SSD as a buffer for blocks of data, so that writes to the HDD can be done sequentially. Reads of cached data are served from the SSD, so they are an order of magnitude faster. On the other hand, heavy use of the SSD for large sequential operations would shorten its lifespan, so by default this type of IO goes directly to the HDD.

Key concepts

Caching modes

The caching mode determines how the cache behaves: where data is persisted and in what order. Different modes were created to handle different usage patterns. Choosing a caching mode is the first and most important configuration decision ahead of you. Let’s take a closer look at how bcache’s caching modes work:

writethrough: this is the default and the safest mode. Every write is stored on both the SSD and the HDD before it is acknowledged, so there is never a window in which data exists only on the SSD. Write performance is therefore bound by the HDD, because the application waits for the synchronous write to the HDD to finish. The benefit shows up on reads: recently written or read data is served from the SSD, so read performance is close to the SSD’s.
writeback: this mode gives the best performance at the price of potential data loss if the SSD fails. A write has two phases: the data is written to the SSD, where the write is acknowledged, and is later replicated to the HDD. Replication is an asynchronous background process, and bcache writes this data to the HDD sequentially, in block order. Data that has been saved to the SSD but not yet to the HDD is called dirty data. Bcache handles shutdowns safely, even unclean ones, while dirty data is pending. However, if the SSD breaks down before replication completes, the dirty data is lost. This can be mitigated by using a redundant SSD array instead of a single disk. Both write and read performance are close to the SSD’s.
writearound: this is a read-only cache mode. Writes go directly to the HDD, and reads are served from the SSD. The first time a block is read, it has to be fetched from the HDD and copied to the SSD; repeated reads then hit the cache. Bcache does not let you size the read and write portions of the cache separately, so this mode is the way to dedicate the whole SSD to reads. Write performance is the same as the HDD’s, while repeated reads perform close to the SSD’s.
none: caching is disabled. All reads and writes go directly to the HDD. You can use this mode as a baseline for your performance tests.
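To make the differences concrete, here is a toy model in Python of where each mode sends writes and where reads are served from. Two dictionaries stand in for the SSD and the HDD, and the class name ToyBcache is made up for this sketch. It illustrates the behaviour described above and is not how bcache is implemented in the kernel.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBcache:
    """Toy model of bcache's cache modes: dicts stand in for the two devices."""

    mode: str  # "writethrough", "writeback", "writearound" or "none"
    ssd: dict = field(default_factory=dict)  # cache device: block -> data
    hdd: dict = field(default_factory=dict)  # backing device: block -> data
    dirty: set = field(default_factory=set)  # blocks on the SSD but not yet on the HDD

    def write(self, block: int, data: bytes) -> None:
        if self.mode == "writethrough":
            # Both copies are written before the write counts as done.
            self.ssd[block] = data
            self.hdd[block] = data
        elif self.mode == "writeback":
            # Done as soon as the SSD has it; the HDD copy comes later.
            self.ssd[block] = data
            self.dirty.add(block)
        else:
            # writearound and none: straight to the HDD, dropping any stale cached copy.
            self.ssd.pop(block, None)
            self.hdd[block] = data

    def flush(self) -> None:
        """Background replication of dirty data, in block order (writeback)."""
        for block in sorted(self.dirty):
            self.hdd[block] = self.ssd[block]
        self.dirty.clear()

    def read(self, block: int) -> tuple[bytes, str]:
        """Return the data and the device that served it."""
        if self.mode != "none" and block in self.ssd:
            return self.ssd[block], "ssd"
        data = self.hdd[block]
        if self.mode != "none":
            self.ssd[block] = data  # a miss copies the block into the cache
        return data, "hdd"
```

For example, after a write in writeback mode the HDD dictionary stays empty until flush() runs. That is exactly the window in which an SSD failure would lose the dirty data.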

Sequential IO

Sequential IO operations occur often when working with large files. An HDD uses a read/write head to find data on its spinning platters. One of the most time-consuming operations is moving the head to a new position (a seek), whether to find free space or a particular file. If all operations on an HDD were sequential, the number of seeks would drop sharply and disk performance would increase significantly.
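A quick back-of-the-envelope calculation shows how large the gap can be. The drive figures below (8.5 ms average seek, 7200 rpm, 150 MiB/s sequential transfer) are assumed typical values, not measurements. Plug in the numbers from your own drive’s datasheet.

```python
# Back-of-the-envelope: random 4 KiB requests vs. sequential transfer on an HDD.
# All drive figures are assumed typical values, not measurements.

SEEK_MS = 8.5                   # assumed average seek time
RPM = 7200                      # assumed spindle speed
SEQ_MIB_PER_S = 150.0           # assumed sustained sequential rate
ROTATION_MS = 60_000 / RPM / 2  # average rotational delay: half a revolution


def random_mib_per_s(io_kib: float = 4.0) -> float:
    """Throughput when every request pays a full seek plus rotational delay."""
    io_mib = io_kib / 1024
    transfer_ms = io_mib / SEQ_MIB_PER_S * 1000
    iops = 1000 / (SEEK_MS + ROTATION_MS + transfer_ms)
    return iops * io_mib


rnd = random_mib_per_s()
print(f"random 4 KiB: {rnd:.2f} MiB/s, sequential: {SEQ_MIB_PER_S:.0f} MiB/s")
print(f"sequential is roughly {SEQ_MIB_PER_S / rnd:.0f}x faster")
```

With these assumptions, random 4 KiB requests reach roughly 0.3 MiB/s, hundreds of times slower than sequential transfer. Almost all of the time goes to moving the head rather than reading data.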

Bcache uses the SSD to buffer data so that it can write to the HDD sequentially. This improves write performance, and it matters most in writeback mode, where dirty data is flushed from the SSD to the HDD in block order. But what happens when a large sequential operation is performed? Caching it would bring little benefit, since the HDD already handles sequential IO well. Bcache keeps a rolling average of the IO size per task. When this metric reaches the cutoff threshold (the sequential_cutoff setting, 4 MB by default), the IO is treated as sequential and sent directly to the HDD, bypassing the SSD. This also extends the SSD’s lifespan. The mechanism is enabled by default and can be disabled by setting sequential_cutoff to 0.
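The bypass decision can be sketched as a toy model. It rests on these assumptions:

- The "IO size" being averaged is the length of a run of contiguous requests from one task.
- A run grows until a request starts somewhere else.
- The rolling average uses a smoothing factor of 8.

The class SequentialDetector is hypothetical and simplified. The real kernel code checks more conditions before bypassing the cache.

```python
from dataclasses import dataclass

CUTOFF = 4 * 1024 * 1024  # default sequential_cutoff: 4 MiB
WEIGHT = 8                # smoothing factor of the rolling average (assumed)


@dataclass
class TaskStats:
    next_offset: int = -1  # where a contiguous follow-up request would start
    run: int = 0           # bytes in the current contiguous run
    avg: float = 0.0       # rolling average of the run length


class SequentialDetector:
    """Toy model: decide per request whether it should bypass the SSD."""

    def __init__(self, cutoff: int = CUTOFF) -> None:
        self.cutoff = cutoff
        self.tasks: dict[int, TaskStats] = {}

    def should_bypass(self, task: int, offset: int, size: int) -> bool:
        s = self.tasks.setdefault(task, TaskStats())
        # A request that starts where the previous one ended extends the run.
        s.run = s.run + size if offset == s.next_offset else size
        s.next_offset = offset + size
        s.avg += (s.run - s.avg) / WEIGHT
        if self.cutoff == 0:  # a cutoff of 0 disables the bypass
            return False
        return max(s.run, s.avg) >= self.cutoff
```

In this model, streaming a file in 1 MiB contiguous requests crosses the default 4 MiB cutoff on the fourth request, while scattered 4 KiB reads never do.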

DIY

The basic setup of bcache on Ubuntu is pretty straightforward. Let’s assume there are two empty partitions, sda3 and sda4, ready to be used as the caching device (sda3) and the backing device (sda4). In a real deployment, the caching device should sit on an SSD and the backing device on an HDD. Use make-bcache from the bcache-tools package to initialize bcache.

(Image: creating a bcache from the sda3 and sda4 devices)
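Here is a minimal sketch of the whole setup, written as a Python script that prints the commands and only runs them when asked. It assumes the following:

- sda3 and sda4 are empty.
- bcache-tools’ udev rules register the new device automatically.
- The new device appears as /dev/bcache0.
- ext4 is the filesystem.

Running the commands for real erases both partitions.

```python
import shlex
import subprocess

# Assumed names: sda3 = cache (SSD) partition, sda4 = backing (HDD) partition.
CACHE_DEV = "/dev/sda3"
BACKING_DEV = "/dev/sda4"
BCACHE_DEV = "/dev/bcache0"  # assumed: the first bcache device on the system
MOUNT_POINT = "/media/bcache"


def setup_commands() -> list[list[str]]:
    return [
        # Formats both partitions; creating them together also attaches the
        # cache to the backing device. Erases any data on both!
        ["make-bcache", "-B", BACKING_DEV, "-C", CACHE_DEV],
        ["mkfs.ext4", BCACHE_DEV],  # filesystem on top of the bcache device
        ["mkdir", "-p", MOUNT_POINT],
        ["mount", BCACHE_DEV, MOUNT_POINT],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs root


if __name__ == "__main__":
    run(setup_commands())  # prints only; run(..., dry_run=False) executes
```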

Once you create a filesystem on the new bcache device (usually /dev/bcache0), you can mount it like any other disk in your system. If you want it to be mounted automatically after a reboot, remember to add it to /etc/fstab.
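For the fstab entry, here is a sketch that references the filesystem by UUID rather than by /dev/bcache0. The bcacheN numbering may change when several bcache devices are registered. The helper names are made up; blkid is the standard util-linux tool for reading the UUID.

```python
import subprocess


def fs_uuid(device: str = "/dev/bcache0") -> str:
    """Filesystem UUID of the device, as reported by util-linux blkid."""
    result = subprocess.run(
        ["blkid", "-s", "UUID", "-o", "value", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def fstab_entry(uuid: str, mount_point: str = "/media/bcache",
                fs_type: str = "ext4", options: str = "defaults") -> str:
    # Fields: device, mount point, type, options, dump (0 = off),
    # fsck pass (2 = check after the root filesystem).
    return f"UUID={uuid} {mount_point} {fs_type} {options} 0 2"
```

On the machine itself, print(fstab_entry(fs_uuid())), typically run as root, gives the line to append to /etc/fstab. Running `mount -a` afterwards checks the entry before you reboot.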

(Image: list of available block devices, with bcache mounted under /media/bcache)

Use bcache like any other block device.
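Once the device is in use, its settings live in sysfs, as described in the kernel’s bcache documentation. Below is a minimal sketch of helpers that read and switch the cache mode, turn off the sequential cutoff and check for dirty data. The paths assume the device is named bcache0, and writing requires root. The function names are our own. The sysfs root is a parameter, so you can try the helpers against a scratch directory.

```python
from pathlib import Path

MODES = ("writethrough", "writeback", "writearound", "none")


def bcache_dir(dev: str = "bcache0", sysfs: Path = Path("/sys")) -> Path:
    """Directory holding the bcache attributes of a backing device."""
    return sysfs / "block" / dev / "bcache"


def current_cache_mode(d: Path) -> str:
    # The file lists every mode and brackets the active one,
    # e.g. "[writethrough] writeback writearound none".
    for token in (d / "cache_mode").read_text().split():
        if token.startswith("["):
            return token.strip("[]")
    raise ValueError("no active cache mode found")


def set_cache_mode(d: Path, mode: str) -> None:
    if mode not in MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    (d / "cache_mode").write_text(mode)  # needs root on a real system


def disable_sequential_cutoff(d: Path) -> None:
    (d / "sequential_cutoff").write_text("0")  # 0 = never bypass sequential IO


def status(d: Path) -> dict[str, str]:
    # state is "no cache", "clean", "dirty" or "inconsistent";
    # dirty_data is how much cached data has not reached the HDD yet.
    return {name: (d / name).read_text().strip() for name in ("state", "dirty_data")}
```

For example, set_cache_mode(bcache_dir(), "writeback") switches the device to writeback mode. If status() then reports a state of "dirty", some data still lives only on the SSD.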

Costs and Risks

Adding SSD capabilities to our HDDs does not come without a price. Performance improvements require additional SSD drives, and every drive needs a slot in the machine. Buying a single fast drive to cache lots of data is not optimal. The best setup includes at least one redundant drive for replication (e.g. a RAID 1 mirror). This is especially important in writeback mode, because it minimizes the risk of data loss. On the other hand, bcache does not require an SSD for each HDD: several HDDs can be cached by the same SSD, which cuts costs. Be aware, though, that performance issues will be harder to diagnose in such a complex environment.
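A redundant cache and one SSD shared by several HDDs can be combined. The sketch below only prints the commands. It assumes two SSDs mirrored with mdadm (RAID 1) and three HDDs, and all device names are hypothetical. Each backing device gets its own /dev/bcacheN, and all of them are served by the same cache set.

```python
import shlex


def shared_cache_commands(ssds: list[str], hdds: list[str],
                          mirror: str = "/dev/md0") -> list[list[str]]:
    return [
        # Mirror the SSDs so one failing drive does not lose dirty data.
        ["mdadm", "--create", mirror, "--level=1",
         f"--raid-devices={len(ssds)}", *ssds],
        # One cache set on the mirror; every HDD becomes a backing device
        # (bcache0, bcache1, ...) that shares it.
        ["make-bcache", "-B", *hdds, "-C", mirror],
    ]


# Hypothetical device names: two SSDs, three HDDs.
for cmd in shared_cache_commands(["/dev/nvme0n1", "/dev/nvme1n1"],
                                 ["/dev/sdb", "/dev/sdc", "/dev/sdd"]):
    print(shlex.join(cmd))
```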

When not to use bcache?

There are two main scenarios in which bcache is not recommended.

Scenario 1: when an SSD alone is a viable solution. If your dataset is small, just put it on an SSD; bcache would require you to buy one anyway.

Scenario 2: when you perform lots of sequential writes and infrequent reads. Workloads such as backups or storing large log files won’t gain much from bcache, because it detects this kind of IO and writes it directly to the HDD. The small gains will not justify the additional investment.

Summary/TL;DR

Let’s put all the information together.

Bcache is a useful, battle-tested and powerful block layer cache. When storing huge datasets, it delivers SSD-like performance for a fraction of the cost.

The most common scenario for bcache is a slow disk with lots of data (TB scale), of which you actively use only a small portion (a few GBs). Slow disk operations then hurt your algorithm’s performance. This is a great candidate for bcache.

Which cache mode to use?

Writethrough: use this mode when even a small possibility of data loss is unacceptable.

Writeback: use this mode when performance matters more than avoiding possible data loss. If the SSD breaks during replication, so be it; I will recover the data from a backup.

Writearound: use this mode when read performance is your main focus and you also write a lot of large files (lots of sequential writes).

That covers the basic theory behind bcache, its benefits, usage patterns and a quick DIY. I’m sure you will find many ideas on how bcache can help in your projects.

Eager to try? Go for it!

For more hands-on MLOps guides, tutorials and code examples, follow me on Medium and contact me via social media.