Linode Block Storage and its Applications
In June 2017, Linode released a new storage service named “Linode Block Storage” for beta testing. This article helps you explore its usage, behavior, architectures, and use cases.
- What exactly is Block Storage?
- 8TiB storage capacity and more per server
- Storage Planning
- Current Limitations
- Attaching a block store — Details about mounting
- Hot-plugging block stores into and out of Linodes
- Consolidating many block stores into a single mega disk
- Filesystem for block store — Ext4, XFS or ZFS?
- Expanding and shrinking a block store
- Sharing a block store across multiple machines
- Performance — Block Store vs. Instance storage
- Going far beyond 8 TiB
- Cloning behavior — Block store and linode cloning
- Automating block storage operations using APIs
- Use case: Big Data storage
- Use case: Remote desktop with your own custom distribution
- Use case: Dataset storage for Kaggle and other data science competitions
- Use case: Backup your local disks using Clonezilla
- Use case: Android OS build farm
- Use case: Photo & video storage
- Use case: Self-hosted clouds
What exactly is Block Storage?
A “Block Store” is, to put it simply, a disk with expandable storage capacity that can be scaled per your current storage needs. In some places, such as the Linode Manager web app, block stores are also called “volumes.”
Normally, when you select any Linode plan, you get a certain fixed amount of instance storage space to play with (I’ll get back to why I call it instance storage). This fixed capacity is proportional to the plan’s pricing. For example, the Linode 1GB plan comes with 20 GB of storage while the Linode 12GB plan comes with 192 GB of storage.
You are free to slice up that fixed storage into a maximum of 8 virtual “disks,” and further slice up each disk into any number of partitions, each with its own file system. However, the total fixed storage space is always limited by the plan. Before block storage, if you wanted more storage space, your options were to either upgrade to a higher plan or purchase more Linodes.
Block Storage enables you to totally exceed these fixed capacities without upgrading plans or buying more instances. Using block storage, even a humble $5 1GB server can be configured with 8 tebibytes (1 tebibyte = 1024 GiB) or more of storage!
I’ll explain the mechanics of how to do this in the next section; but before that, you should know about the other aspect that differentiates a block store from a normal disk — its lifetime. Remember how I previously termed the fixed storage that comes with each plan as “instance storage”? The reason is this — when you destroy a Linode instance, you also lose all its disks and the data stored on them. The lifetimes of a server’s fixed disks are coterminous with that of the server itself. This has always been the case and continues to be the case.
But block stores behave differently. A block store leads an independent existence with its own lifetime. You can temporarily attach a block store to a Linode as an additional disk and use it like a regular disk, but if you destroy that Linode, the block store and its data are not destroyed. They remain available for attaching to another Linode in the same datacenter.
One potential source of confusion is trying to parse the phrase “block store.” It gives the impression that only this kind of storage technology handles data in some kind of blocks in a way that other types of storage do not. But this is not true. Even the “disks” created from fixed instance storage are considered block devices by your OS. “Block Store” is largely a marketing term that has become entrenched among cloud infrastructure providers to refer to variable capacity storage.
8TiB storage capacity and more per server
As of August 2017, Linode’s Block Storage service is still in beta with maximum storage capacity of each block store capped at 1 TiB. This limit may or may not increase after the beta, but in any case, it’s unlikely to be lowered.
Every Linode can have up to eight disks in total. Before block storage, all eight of them had to be allocated from the fixed storage capacity of its plan. Now, all eight of them can be expandable block stores. Or they can be a mix of block stores and disks created out of the fixed instance storage.
This means you can create 8 block stores, attach them all to a humble $5 1GB Linode and have up to 8 TiB per server for your storage needs.
It works out well from a pricing perspective too.
Let’s say, for example, you wanted to store all your photos and videos. Previously, if you wanted 1 TiB of storage on a single server with no demand on RAM or CPU, the cheapest possible option was the Linode 64 GB plan with 1152 GiB storage at $480/month. You’d get your 1 TiB storage, but you’d also be paying for 64 GB RAM and 16 cores you didn’t need.
Even if your storage usage was only 50GB in the first month because you couldn’t upload all your photos and videos, you’d still have to pay $480. If you had cleverly anticipated that you wouldn’t be able to upload everything and had planned to upgrade the instance every month for additional 50GB storage starting from a Linode 4GB with 48 GiB of storage, you’d still have to pay $20 in the first month and double the costs month over month.
Now, with block storage, you can start with a 50GB block store for $5 and a Linode 1GB instance for $5, and pay only $10 in the first month with an additional $5 every month. It’s a drastic reduction in server costs.
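To make the comparison concrete, here is a rough sketch of the arithmetic. It uses only the figures quoted above ($5 for a 50GB block store, $5 for a Linode 1GB, $480 for a Linode 64GB) and assumes that block storage cost scales linearly with size, which is my reading of the example rather than a published price list. The function names are purely illustrative.

```python
# Figures taken from the example above; linear scaling is an assumption.
BLOCK_STORE_DOLLARS = 5      # price of the 50 GB block store in the example
BLOCK_STORE_GB = 50
LINODE_1GB_DOLLARS = 5       # monthly price of the Linode 1GB instance
LINODE_64GB_DOLLARS = 480    # monthly price of the Linode 64GB plan


def block_storage_bill(month, growth_gb=50):
    """Bill for a given month (1-based) when storage grows by growth_gb a month."""
    stored_gb = growth_gb * month
    storage_cost = stored_gb * BLOCK_STORE_DOLLARS / BLOCK_STORE_GB
    return LINODE_1GB_DOLLARS + storage_cost


def cumulative_bill(months, growth_gb=50):
    """Total paid over the first `months` months."""
    return sum(block_storage_bill(m, growth_gb) for m in range(1, months + 1))


for month in (1, 2, 12):
    print(month, block_storage_bill(month), LINODE_64GB_DOLLARS)
print("first year:", cumulative_bill(12), "vs", LINODE_64GB_DOLLARS * 12)
```

Under these assumptions, the block storage bill only catches up with the fixed plan once the stored data is itself close to a full plan's worth of capacity.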
For plans like the Linode 64GB and Linode 80GB which already come with more than 1 TiB of instance storage, it makes sense to use their instance storage, too, for data storage and add only 6 or 7 block stores if required.
More realistically, for any plan, you should definitely make use of instance storage for the swap and boot disks because they are likely to be faster.
Software, home directories and data files can live on a block store if you plan to install custom software for a particular goal and use it only periodically, without keeping a server around during idle periods.
- For example, if you indulge in creative hobbies such as digital graphics or photography or video editing or audio mixing occasionally, you can install a creative distro like Ubuntu Studio or AVLinux and a remote desktop server on a block store.
Whenever some light video editing work comes up, you can attach it to a lightweight server, like a Linode 4GB, finish editing and then destroy the Linode 4GB instance. The block store with its installed software remains intact.
If some heavy video editing work comes up, you can attach it to a beefier server like a Linode 24GB, finish editing and then destroy the instance. Again, the block store with its installed software remains intact.
- If you are into hobbies like penetration testing or reverse engineering, you can install a distro like Kali Linux on a block store with your custom scripts and tools, create a Linode whenever you think up some interesting approach, do your experimenting, and then destroy the Linode without losing your customized distro.
I’ll cover more use cases later in this article.
Since you are paying for each GiB of block storage used, an economical workflow is:
- Allocate the instance storage you get with the plan into boot, swap and data disks.
- Wait until these are filled with data close to maximum fixed storage capacity.
- Then, add the first block store disk, with a reasonable initial capacity.
- Next, expand it whenever it’s getting close to its current capacity.
- Wait until it’s filled with data close to its maximum capacity (currently 1 TiB).
- And then, add the second block store with reasonable initial capacity.
- Repeat until you have filled all 8 disk slots available for the Linode.
Each of these disks — including the block store disks — can also be partitioned as per your needs. So there is a lot of flexibility available to accommodate your usage and use cases.
Current Limitations
- The biggest issue is that block storage is still in beta testing, which means problems are still being discovered and ironed out, and there are no durability guarantees for your data.
It’s recommended not to store anything critical in a block store while it’s in beta, without a solid, hot backup strategy to a fixed disk or an external storage endpoint.
- During the beta period, block store capacity is available for free but capped at 75 GiB. Out of beta, its capacity can reportedly range from 1 GiB to 1 TiB.
- Block store capacity cannot be reduced using Linode’s API or manager applications, only increased. This is unlikely to change. It’s probably designed this way for safety, to prevent customers from inadvertently reducing capacity and corrupting or losing their data. Of course, there are other approaches to do it yourself, which I’ll explain later on.
- It’s currently available only in the US-east region (i.e., Newark datacenter). I anticipate it will soon be available in US-west region (i.e., Fremont datacenter).
- A block store is scoped to a datacenter. It can be attached only to Linodes in the same datacenter. This is not a “current limitation” but inherent behavior, and it is likely to remain so in the future.
However, that doesn’t mean the data in block stores can’t be used from other datacenters. In the sections below, I’ll explain multiple ways to do just that.
Attaching a block store - Details about mounting
I’ve previously briefly mentioned “attaching the block store to a Linode.” But what exactly do I mean?
Briefly, every block store has a name and can be either “attached” to a server or remain unattached. When attached, it appears in the server’s list of block devices under a name that begins with /dev/disk/by-id/scsi-0Linode_Volume_…, which is a soft link to the appropriate /dev/sd* device name.
Once attached, it can be treated like any other block device. Partitioning tools like fdisk or gparted, consolidation tools like LVM, cloning tools like partclone or Clonezilla, and filesystem tools like mkfs can be run on it.
If you format a block store with its own filesystem, instead of making it part of an LVM volume, it can be mounted temporarily using mount or more permanently via /etc/fstab.
A block store remains attached even when the Linode it is attached to is shut down. The only ways to detach it are to destroy the Linode or to detach it explicitly using the Linode APIs or manager applications.
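As a small illustration of the permanent mounting route, the sketch below builds an /etc/fstab line for a block store. The device path /dev/sdc, the mount point /mnt/blockstore and the helper name are hypothetical; the nofail option is my own suggestion so that a detached block store does not stall the boot.

```python
def fstab_entry(device, mount_point, fs_type="ext4",
                options=("defaults", "nofail")):
    """Return one /etc/fstab line: device, mount point, type, options, dump, pass."""
    # dump=0 disables dump backups; pass=2 runs fsck after the root filesystem.
    return f"{device} {mount_point} {fs_type} {','.join(options)} 0 2"


# Hypothetical values: a block store in the third disk slot, mounted under /mnt.
print(fstab_entry("/dev/sdc", "/mnt/blockstore"))
```

Since /dev/sdX names depend on the slot a block store occupies, the /dev/disk/by-id link is likely the more stable choice for the first field.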
Hot-plugging block stores into and out of Linodes
The planned behavior seems to be that block stores can be smoothly plugged into and removed from a running Linode.
However, I have occasionally seen the server reboot while hot-plugging block stores. These are possibly beta bugs.
A block store is treated like any other block device, and can be divided into any number of partitions using standard tools like fdisk and parted.
Consolidating many block stores into a single mega disk
The workflow above, creating multiple block stores as and when needed, certainly works for increasing the total capacity of the server, and it’s a fine approach for some use cases.
But it also waves away some realities. For example, most common web software, such as content management systems, forum software, photo galleries and even databases, expects to store all its data in a single data directory on one filesystem. It simply doesn’t have the logic to spread data across the multiple disks that such a workflow creates. If you are running a successful photo gallery SaaS with paying customers and their valuable photo data, you most certainly won’t have the time to retrofit third-party software to handle multiple disks without any service downtime.
What you want is a setup in which storage capacity can be increased by adding multiple block stores on demand, but doing so transparently without your applications or customers facing downtime. Such an approach exists through the Logical Volume Manager, or LVM (version 2).
Briefly, LVM is a system software component that enables consolidation of multiple disks — which can be block stores or instance disks or a mixture of both — and shows them to the OS as a single disk with a single filesystem. If you have seven 1 TiB block stores, you can use LVM to show a single disk with 7 TiB capacity.
You can transparently add more block stores to the logical volume to expand its capacity without any client applications being aware of it. They just see the free space on the disk magically increase.
You can also store files that are larger than the capacity of a single block store.
If you are planning to use block stores for their expandability and plan to use them for a public facing service that cannot afford downtime, I would strongly recommend using LVM.
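To give a feel for what this involves, here is a minimal sketch that only prints the LVM command sequence rather than running it. The device paths, the volume group name vg_block and the logical volume name lv_data are hypothetical; the commands themselves (pvcreate, vgcreate, lvcreate, vgextend, lvextend) are the standard LVM2 tools.

```python
import shlex


def lvm_create_commands(devices, vg="vg_block", lv="lv_data", fs="ext4"):
    """Commands to pool block stores into one logical volume with one filesystem."""
    if not devices:
        raise ValueError("at least one block device is required")
    return [
        ["pvcreate", *devices],                        # mark devices as physical volumes
        ["vgcreate", vg, *devices],                    # pool them into a volume group
        ["lvcreate", "-l", "100%FREE", "-n", lv, vg],  # one volume spanning the pool
        [f"mkfs.{fs}", f"/dev/{vg}/{lv}"],             # single filesystem on top
    ]


def lvm_extend_commands(new_device, vg="vg_block", lv="lv_data"):
    """Commands to grow the same volume later with one more block store."""
    return [
        ["pvcreate", new_device],
        ["vgextend", vg, new_device],
        # -r asks lvextend to resize the filesystem along with the volume.
        ["lvextend", "-r", "-l", "+100%FREE", f"/dev/{vg}/{lv}"],
    ]


for cmd in lvm_create_commands(["/dev/sdc", "/dev/sdd"]) + lvm_extend_commands("/dev/sde"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything destructive such as mkfs on a real device.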
In addition to disk consolidation, LVM also offers up software RAID goodies such as replicated storage, striping and load distribution.
If you plan to use a ZFS filesystem, then it comes with its own logical volume capabilities and should be treated as an alternative approach to using LVM. ZFS is rather new and not yet proven in the Linux world, but it has forged a stellar reputation within the Solaris and BSD worlds.
Filesystem for block store — Ext4, XFS or ZFS?
Whether you prefer to install a filesystem directly on a block store or install one on a LVM volume consisting of one or more block stores, at some point, you’ll be faced with the question of which filesystem to install.
Debian, Ubuntu and Arch default to Ext4, while CentOS 7 and above default to XFS.
XFS is supposedly far better at handling large directories and large files, and is the recommended disk file system for distributed file systems like GlusterFS, Ceph and HDFS. If you are planning to use block stores for any kind of large file storage or database storage, look into XFS. Wikipedia gives an excellent explanation of XFS’s capabilities.
But XFS also reportedly consumes more RAM and CPU than Ext4, which means a block store formatted with it may need a higher-configuration Linode. Also remember that the Linode itself is a virtualized machine with virtualized storage, and aspects like the RAID and disk caching configuration of the underlying physical hardware have an impact on XFS performance.
Also be careful not to place the XFS journal device of a block store on an instance disk: if you lose the server, you may end up with severe data corruption. If performance is critical, make sure to run your own benchmarks using tools like iozone and fio under realistic loads.
ZFS is rather new and not yet proven in the Linux world, but it’s very feature-rich compared to everything else and has a stellar reputation from its performance in the Solaris and BSD worlds. ZFS is not just a file system but is also a logical volume manager. If you plan to consolidate multiple block stores as a single disk, evaluate ZFS against LVM.
I don’t have any personal experience with ZFS, but I recommend evaluating all its features if you plan to use the block store for any kind of critical functionality.
Expanding and shrinking a block store
Expanding a block store is simple and supported by the manager applications and by Linode’s APIs.
In contrast, shrinking it back is not. Neither the applications nor the APIs support shrinking a block store. It’s probably implemented that way to prevent inadvertent corruption of a block store’s file system. But if your block store sees frequent addition and deletion of large files, then it makes sense to implement some form of “shrinking” to reduce your storage costs.
To be clear, there is absolutely no way to actually “shrink” a block store’s size in place. The only practical approach is to copy all its data to a smaller block store and destroy the original block store.
You can do this in a number of ways:
- The most reliable way is to use LVM’s or ZFS’s resize and mirroring/moving capabilities. If you anticipate that your block store requires this kind of shrinking frequently, I recommend placing it under LVM’s or ZFS’s volume management from the start.
Both LVM and ZFS support adding a new smaller block store, mirroring an existing block store to it, and destroying the original block store without any downtime in filesystem availability.
- Another approach is lsyncd, which mirrors file changes in real time by combining inotify and rsync. It too should not result in any down time.
- The last approach is scheduled, simple copying of filesystems using rsync. It’s simple, but it may also require some downtime to keep the two filesystems from going out of sync.
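For the rsync approach in the list above, a pre-flight check along these lines may help. It is only a sketch: the mount points are hypothetical, the 20% headroom is an arbitrary safety margin I picked, and the rsync flags (-a archive, -H hard links, -A ACLs, -X extended attributes, --delete) are one reasonable choice rather than the only one.

```python
import shutil

GIB = 2 ** 30


def used_bytes(mount_point):
    """Bytes currently used on the filesystem mounted at mount_point."""
    return shutil.disk_usage(mount_point).used


def fits(used, target_capacity, headroom_percent=20):
    """True if `used` bytes plus the headroom fit into target_capacity bytes."""
    return used * (100 + headroom_percent) <= target_capacity * 100


def rsync_command(src, dst):
    """rsync invocation that copies the contents of src into dst."""
    # Trailing slashes make rsync copy directory contents, not the directory itself.
    return ["rsync", "-aHAX", "--delete", src.rstrip("/") + "/", dst.rstrip("/") + "/"]


# Hypothetical example: 40 GiB of data moving to a 50 GiB block store.
if fits(40 * GIB, 50 * GIB):
    print(" ".join(rsync_command("/mnt/old_store", "/mnt/new_store")))
```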
Sharing a block store across multiple machines
A block store can only be attached to one Linode at any point in time. However, this does not mean the data on it can’t be accessed from other machines. Some of the arguably “simpler” approaches to do that include the following:
- NFS (Network File System) is a network protocol that carries file system operations and data over TCP/IP between a server, on which the storage is mounted, and client machines. In the context of this article, the server would be the machine to which a block store is attached and mounted, while the clients would be all the other machines that require access to it.
The kernel modules required for NFS are already available in Linode’s kernels.
- DRBD (Distributed Replicated Block Device) is a way to synchronously replicate operations on a local block device to other participating block devices across a network. So a Linode with an attached block store can have all block operations replicated to another Linode with its own block store. It’s not exactly “sharing” but more a kind of hot backup mechanism, and its consequence is that both servers see the same data.
The kernel modules required for DRBD are already available in Linode’s kernels.
- NBD (Network Block Device) is a way to expose a block device to client machines as a block device. Any block operation on the client is sent across the network to the server to be executed by the server.
The kernel modules required for NBD are already available in Linode’s kernels.
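Taking the NFS option from the list above as an example, the sketch below generates the server-side /etc/exports line and the client-side mount command. The export directory, client addresses (from the 192.0.2.0/24 documentation range) and the hostname are all hypothetical, and the export options shown are just a common starting point.

```python
def exports_line(export_dir, clients, options=("rw", "sync", "no_subtree_check")):
    """One /etc/exports line sharing export_dir with each client address."""
    opts = ",".join(options)
    return export_dir + " " + " ".join(f"{client}({opts})" for client in clients)


def client_mount_command(server, export_dir, mount_point):
    """Command a client would run to mount the exported directory."""
    return ["mount", "-t", "nfs", f"{server}:{export_dir}", mount_point]


# Hypothetical setup: the block store is mounted at /mnt/blockstore on the server.
print(exports_line("/mnt/blockstore", ["192.0.2.11", "192.0.2.12"]))
print(" ".join(client_mount_command("storage.example.com", "/mnt/blockstore", "/mnt/shared")))
```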
If none of those are suitable, then it’s time to bring in the heavy artillery — the massively distributed storage systems such as GlusterFS or Ceph.
Performance — Block Store vs. Instance storage
Detailed IO performance tests by the centminmod forum administrator back in June indicated that block store showed lower IOPS than instance storage. This is possibly due to block storage having higher availability and durability guarantees, and possibly due to having a different underlying architecture.
Whatever the reasons, it implies that block stores are not simple drop-in replacements for instance disks. I recommend you perform I/O performance tests using your application-specific realistic loads to evaluate if block stores meet your particular performance requirements.
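As a starting point for such tests, here is a sketch that assembles a random-read fio invocation. The target file path is hypothetical and the parameters (4k blocks, queue depth 32, 60 seconds) are generic defaults of my choosing, not a profile of any particular application; replace them with values that resemble your real load.

```python
def fio_random_read_command(target_file, size="1G", runtime_seconds=60):
    """Build a fio command line for a 4k random-read test against target_file."""
    return [
        "fio",
        "--name=randread",
        f"--filename={target_file}",
        "--rw=randread",           # random reads; use randwrite or randrw for writes
        "--bs=4k",
        f"--size={size}",
        "--direct=1",              # bypass the page cache to measure the device
        "--ioengine=libaio",
        "--iodepth=32",
        f"--runtime={runtime_seconds}",
        "--time_based",
        "--group_reporting",
    ]


# Run the same test on a block store mount and on an instance disk, then compare IOPS.
print(" ".join(fio_random_read_command("/mnt/blockstore/fio-test.bin")))
```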
Going far beyond 8 TiB
Everything you have read so far about block stores is in the context of a single server. But by combining block stores with distributed storage systems such as GlusterFS, Ceph and HDFS, it’s possible to scale far beyond 8 TiB to hundreds or even thousands of TiBs economically!
I’ve written extensively about GlusterFS, Ceph and HDFS in the past in the context of big data. At the time, there was no Linode block storage, which meant that capacity planning had to rely on instance storage, and had to be a careful trade-off between costs, capacity, and durability. For high capacity and durability, a lot of high-capacity expensive instances had to be created. At the time, I had recommended that in order to not waste all the RAM and CPU capacity that came along with those instances, systems should also run data processing logic on the same instances.
But block stores solve this problem to a large extent. Now that up to 8 TiB of block storage can be attached even to lower configuration, inexpensive instances, the economics of running these distributed file systems on Linode are radically improved.
I recommend reading those articles first, and then following some of the recommendations below.
One configuration option is to provision each block store as one “brick” with XFS filesystem (Gluster recommends XFS and stores filesystem metadata in XFS attributes). If there are n Linodes in the cluster, each with 8 block stores, there will be 8n bricks.
However, Gluster docs recommend that each brick should have its own LVM thinly provisioned logical volume, mainly to support snapshots.
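The brick arithmetic is easy to sketch. The function below assumes 8 block stores per Linode and 1 TiB per block store, as described earlier, and adds a replica count to show how replication reduces usable capacity. The function name and the example cluster size are illustrative only.

```python
def gluster_capacity(linodes, replica=1, stores_per_linode=8, tib_per_store=1):
    """Return (bricks, raw TiB, usable TiB) with one brick per block store."""
    bricks = linodes * stores_per_linode
    if bricks % replica != 0:
        # A replicated volume needs a brick count that is a multiple of the replica count.
        raise ValueError("brick count must be a multiple of the replica count")
    raw_tib = bricks * tib_per_store
    return bricks, raw_tib, raw_tib / replica


# Hypothetical cluster: 6 Linodes with 3-way replication.
print(gluster_capacity(6, replica=3))
```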
Ceph does not seem to play well with LVM managed logical volumes because Ceph apparently relies on udev information and disk UUIDs. If using Ceph, keep block stores as independent block devices outside of logical volume management, create an XFS filesystem on each block store, and configure each block store as a separate Ceph OSD (Object Storage Device).
Since HDFS is designed for data locality — keeping compute nodes close to the data — use relatively high configuration Linodes for DataNodes and attach block stores to them, so that they are capable of running CPU intensive data processing logic while also acting as data storage nodes.
It’s probably better to keep each block store completely out of logical volume management or at least keep each in its own logical volume, so that multiple data directories can be specified and IOPS throughput can be increased.
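In HDFS, multiple data directories are given as a comma-separated list in the dfs.datanode.data.dir property of hdfs-site.xml. The sketch below builds that property from a list of block store mount points. The mount points and the hdfs/data subdirectory are hypothetical.

```python
from xml.sax.saxutils import escape


def datanode_data_dir_property(mount_points, subdir="hdfs/data"):
    """Return an hdfs-site.xml <property> listing one data directory per mount point."""
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    value = escape(",".join(dirs))
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Hypothetical DataNode with three block stores mounted separately.
print(datanode_data_dir_property(["/mnt/bs1", "/mnt/bs2", "/mnt/bs3"]))
```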
Cloning behavior — Block store and Linode cloning
If a Linode to which a block store is attached is cloned, I would expect it to clone the block store, too, so that any links to files on the block store are retained. It does get cloned, but there are other inconsistencies right now, such as listing only one volume in the list of volumes and showing the same volume in the disk lists of both Linodes. I believe this is a bug in the current beta version and have reported it.
Each block store can be independently cloned to another block store using either the manager applications or APIs. However, this didn’t work for me — attached or unattached. I suspect it’s just another bug in beta version.
Automating block storage operations using APIs
This section is targeted at developers. If you aren’t a developer, you can skip this section and jump to the use cases.
Creating and Attaching a block store to a Linode using v3 API
1. Use volume.create to create an attached or unattached block store, and get a volume ID.
If initially unattached, use volume.update to attach it later.
The position of this entry in DiskList determines its device path. If it’s the third entry, it becomes /dev/sdc.
Creating and Attaching a block store to a Linode using v4 API
The sequence is much simpler using v4 API. Everything can be done with a single call.
Just POST to /linode/volumes to create a block store, and attach it to a Linode’s list of block devices.
Alternately, POST to /linode/volumes/:id/attach to attach a created block store to a Linode’s list of block devices.
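Here is a minimal sketch of what the create call might look like from Python. It only builds the request and does not send it. The /linode/volumes path comes from the description above; the base URL, the bearer-token header and the JSON field names (label, size, linode_id) are assumptions on my part, so check them against the current API documentation before use.

```python
import json
import urllib.request


def build_create_volume_request(base_url, token, label, size_gb, linode_id=None):
    """Build (but do not send) a POST request that creates a block store."""
    payload = {"label": label, "size": size_gb}   # field names are assumptions
    if linode_id is not None:
        payload["linode_id"] = linode_id          # attach on creation (assumed field)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/linode/volumes",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",   # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder base URL and token; urllib.request.urlopen(req) would send it.
req = build_create_volume_request("https://api.example.invalid/v4", "TOKEN", "photos", 50, 12345)
print(req.get_method(), req.full_url)
```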
Formatting and Mounting a block store
This information is useful for configuration scripts or stackscripts that are executed after a Linode with attached block store is booted.
The block store device has alternate device paths:
- A path beginning with /dev/disk/by-id/scsi-0Linode_Volume_…; that prefix is the same for every block store.
- /dev/sdX where X is in range [a-h] depending on the position of the volume in DiskList of active configuration.
Similarly, each of the block store’s partitions has alternate device paths:
- A path with the same /dev/disk/by-id/scsi-0Linode_Volume_… prefix that also carries the partition number N, starting from 1.
- /dev/sdXN where X is in range [a-h] depending on the position of the volume in DiskList of active configuration and N is the partition number starting from 1.
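The /dev/sdX naming rule can be sketched in a few lines for use in configuration scripts. The function name is illustrative; the mapping itself follows the rule described above, where the third DiskList entry becomes /dev/sdc.

```python
import string


def device_path(position, partition=None):
    """Map a 1-based DiskList position (1..8) to /dev/sdX, optionally /dev/sdXN."""
    if not 1 <= position <= 8:
        raise ValueError("a Linode has 8 disk slots, numbered 1 to 8")
    path = "/dev/sd" + string.ascii_lowercase[position - 1]
    if partition is not None:
        if partition < 1:
            raise ValueError("partition numbers start from 1")
        path += str(partition)
    return path


print(device_path(3))       # third DiskList entry
print(device_path(3, 1))    # its first partition
```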
Block Store use cases
In the sections below, I go through some typical use cases where block stores are useful.
Use case: Big Data storage
I’ve already covered how block stores drastically improve the economics of big data storage on Linode, using distributed storage systems such as GlusterFS, Ceph and HDFS.
In addition, large SQL and NoSQL databases can also benefit by storing their data on expandable block stores.
Some big data systems — especially ones that maintain a journal — are sensitive to IOPS. It’s best to benchmark block store performance before you plan any deployment.
Use case: Remote desktop with your own custom distribution and kernel
There are many situations where you’d want to use a machine as a desktop but with server-grade hardware.
Tasks that involve large files such as photo editing, video editing, audio mixing and graphics design can be much easier with large amounts of storage, many CPU cores and high RAM. If you do any of them only once in a while, instead of buying a high-end personal computer, it may be more economical to install a suitable distro such as Ubuntu Studio or AVLinux along with remote desktop software on a block store, attach it to a high configuration Linode whenever you want to, complete your tasks, and destroy the Linode. The software on the block store remains intact.
Use case: Dataset storage for Kaggle and other data science competitions
Participants in data science competitions conducted on platforms like Kaggle face some typical problems:
- Datasets are too large and take considerable time to download over their internet connections.
- Datasets are too large to store easily in their personal computers.
- Their personal computers may be underpowered for processing tasks.
- Participants working as a team cannot share their results or chain their tasks easily.
One solution is to download datasets to one or more block stores, set up NFS for making the data and intermediate results available to multiple Linodes, and set up a data processing cluster consisting of mid- to high-end Linodes. Most data cleanup, exploratory data analysis and machine learning tasks can be completed this way by an individual or team of participants.
About the only tasks that require external infrastructure are for GPU-accelerated deep learning models, but they too can be set up to access the dataset from the same Linode block stores.
Use case: Backup your local disks using Clonezilla
Clonezilla is filesystem-aware partition backup software. Filesystem-aware means that unlike typical disk imaging software, it does not waste time or disk space copying sectors that contain no data or data of deleted files. Clonezilla images can be far smaller than those produced by typical disk imagers.
Run Clonezilla on your personal computers to create image backups of their disks. Clonezilla supports storing the disk image on an SSH server or NFS share. A Linode can be set up with one or more block stores for disk image storage, and given as the destination to Clonezilla.
Since incoming bandwidth is free for a Linode, the only cost incurred is the storage for disk images.
Use case: Android OS build farm
The Android Open Source Project (AOSP) at this point has somewhere around 25 million lines of code.
Downloading just its source code repository takes about 2–3 hours, and consumes around 100GB of storage.
Building just a single branch takes another 2–3 hours and consumes 150–250 GB of storage. It has dozens of release branches.
It’s a good case study for using high CPU/high RAM instances for smooth builds, and block storage to provide sufficient storage. Since it supports distributed builds, a cluster of Linodes can be set up for the builds and share the downloaded code repository using NFS or a distributed file system.
Another advantage of using block stores is that once a round of builds is completed, the block stores can be retained while the instances are destroyed. It’ll avoid wasting 2–3 hours downloading the code next time a round of builds is required.
Use case: Photo & video storage
Photo and video storage are actually unsolved problems.
Users typically want to store their photos with confidence that they’ll be available even after decades. Even if the storage service is to shut down, they would prefer that their photos and videos are easily transferable elsewhere.
On the other hand, every provider of such services prefers a walled garden where they can lock-in their users. If a user has accumulated hundreds of gigabytes of photos and videos with an archiving service over many years, and it shuts down, it’s not easy to extract so much data via their APIs and transfer them elsewhere. There are also risks of having their photos and videos down-sampled or modified, and losing their originals or having to store their originals with yet another provider.
Instead, a good option probably is for every user to store photos and videos as plain files in their own managed storage. Even if a storage provider shuts down, since the data stored are just files, they can be relatively easily copied to another provider.
Such files can be stored on a Linode block store. Self-hosted photo management and photo gallery web applications, like piwigo (along with piwigo-video extension for video playback) or Lychee, can be installed on a Linode instance. Even if Linode shuts down their block storage or infrastructure services, transferring a bunch of files should be relatively simpler and faster than extracting from APIs.
Use case: Self-hosted clouds
Self-hosted file sync and file sharing software, like NextCloud, ownCloud, SeaFile, and Filerun, is ideal for storing all your personal data, documents, media and other files on block stores, and making them available to all your devices without relying on any other external service.
Linode Block Store has been a much demanded feature that has finally arrived. If you have any kind of storage needs at all or have particular performance requirements, just open a support ticket and Linode staff can enable your access to block storage very quickly. I recommend trying it out and giving feedback to Linode about your experience.
I thank user “eva2000” for his very useful block storage benchmarks published on the centminmod forums.
About me: I’m a software consultant and architect specializing in big data, data science and machine learning, with 15 years of experience. I run Pathbreak Consulting, which provides consulting services in these areas for startups and other businesses. I blog here and I’m on GitHub. You can contact me via my website or LinkedIn.
Please feel free to share below any comments or insights about your experience using block storage. And if you found this blog useful, consider sharing it through social media.
While Karthik’s views and cloud situations are solely his and don’t necessarily reflect those of Linode, we are grateful for his contributions.