Storage is the Achilles Heel of Containers

This article outlines our thesis for leading a Series A investment in StorageOS to address that problem. If you’re not into containers or storage, stop now; you’ll find the article too long and too boring. (French mathematician Blaise Pascal was apparently the first ever to have this problem.) Unnecessary references to French mathematicians are also among the reasons this article is so long.

You’re still reading so you must love containers or storage. We would love your feedback, particularly flaws in our thinking that may help us or StorageOS. I’ve tried hard to not make this an advertisement, but to lay out the thoughts that led us to our conclusions.

Containers are rapidly being adopted by enterprises of all sizes, which use them to quickly build, deploy, and scale cloud-native applications. But enterprises face a number of challenges in aligning storage to their container applications.

Challenges with Container Storage

First, containers were designed to be stateless — they do not natively support requirements for databases, application state and instrumentation data.

Second, in the fast-moving container compute environment, traditional storage is too slow, complex and expensive. The cost model for storage involves capital expenditure spikes and complex refresh cycles. The enterprise needs more agile, policy-driven, and software-based storage for containers.

Last but not least, containerized and DevOps environments need automation, portability and orchestration. Traditional storage architectures are too complex to support these characteristics. They lack API functionality, don’t support integration, and cannot be moved to public cloud. So the legacy environments don’t scale with apps, and their performance is unpredictable. It’s also difficult to move data securely within and between environments. And management and performance tool sets are lacking.

To be successful, container ecosystem technologies must deliver a frictionless developer experience, clean integrations, and extensible APIs while allowing enterprises to maintain all the control and security they currently have in their data center. And just as for container technologies such as Docker, commercial models for container storage must also encourage the developer community to develop, test and prove success with the products easily and freely.

Thesis

All of this led us to the thesis that there is a need for simple-to-use, cost-effective persistent container storage that is highly portable across physical, virtual and cloud infrastructures.

StorageOS was working on software that “presents” storage to a container-based application in a way that is policy-driven, scalable, deterministic, and low-latency. It simplifies the development, provisioning and management of container-based apps, all while encrypting data at rest and in flight, for containers running anywhere: on bare metal, on virtual machines, or in the cloud. And they were committed to making both “dev” and “ops” easier with native integrations with orchestration tools like Docker Enterprise, Kubernetes (OpenShift), and Mesosphere.

Container Storage without Third-Party Products

We started by asking: what’s the best thing enterprises can do for container storage without using any third-party products, including StorageOS? To make the scenario more concrete, let’s say you are running your containers in Amazon Web Services (AWS) and thus have access to all the services offered by AWS. (The same would apply to containers in any other public cloud.)

Could you not provision Amazon Elastic Block Store (EBS) volumes and attach them to the instances on which your application containers are running? Containers could then use some naming convention based on their own ID to claim a particular place in the filesystem on the provisioned volume, so that no matter which instance a container is later moved to, it can always find its own storage.
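The naming convention described above might look something like the following. This is only a minimal sketch: the mount point `/mnt/ebs-data`, the function name `storage_dir`, and the two-level directory layout are assumptions made for illustration, not anything prescribed by AWS or Docker.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical mount point at which the EBS volume is mounted on every instance.
VOLUME_ROOT = PurePosixPath("/mnt/ebs-data")


def storage_dir(container_id: str) -> PurePosixPath:
    """Map a stable container identifier to a fixed directory on the volume.

    The same identifier always yields the same path, so a container can
    find its data again on whichever instance currently has the volume.
    """
    digest = hashlib.sha256(container_id.encode("utf-8")).hexdigest()
    # Prefix with two hex characters of the hash so that no single
    # directory ends up holding every container's data.
    return VOLUME_ROOT / digest[:2] / container_id


print(storage_dir("orders-db"))
```

Note that the identifier has to survive restarts for this to work. A service or task name is a safer choice than the ID the container runtime assigns, because that ID changes whenever the container is recreated.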

In fact Amazon advertises that, “EBS Elastic Volumes allow you to dynamically increase capacity, tune performance, and change the type of any new or existing current generation volume with no downtime or performance impact. Easily right-size your deployment and adapt to performance changes. Simply create a volume with the capacity and performance needed today knowing you have the ability to modify your volume configuration in the future, saving hours of planning cycles.”

Amazon also says that EBS-Optimized EC2 Instances can get you 10,000 Mbps between the EC2 instance and EBS. AWS aside, Docker has volume drivers for Azure File Storage and Google Compute Engine persistent disks.

That doesn’t seem so bad, right? What’s the big issue? How do third-party products such as StorageOS make it better? Why not just use EBS or the venerable Network File System (NFS)?

Why Not Just Use EBS or NFS?

Certainly, without StorageOS or any other third-party products, you would be able to use local EBS volumes, or perhaps NFS using the directory-mount or named-volume method.

EBS would be the preferred option since a block device would be needed for database use cases (performance and consistency). However, EBS volumes can only be presented to one node at a time. If a container moved between nodes, the failover operation would be a manual process: the EBS volume would have to be moved between nodes through the AWS API or console. Moving volumes between nodes involves complex operating system configuration (with varying degrees of complexity that would depend on the Linux distribution and version being used). There would also be a variety of technical, operational and performance disadvantages that are highlighted later in this article.

If NFS is used, then the shared directory could be mounted on each node and named volumes would be able to provide a consistent mount for a container as it moved between servers. However, this would require a dedicated server instance to act as an NFS server which would imply a single point of failure for that share. Furthermore, an NFS share would be inappropriate for database use cases since it would not support the performance or the consistency profile required by a database application.

AWS does support a “shared filesystem as a service” through the Elastic File System (EFS) offering, but that has numerous, well-documented issues with performance, consistency and file locking. NFS or EFS would be best suited to application containers that need to share mostly static files between containers (such as a shared application configuration or static reference content for a front-end service).

Finally, container and microservices deployments often use network overlays to create private networks between each of the defined services. Bridging the NFS network namespace to the private overlays of the applications will add another layer of complexity and operational overhead for each microservice when using NFS. StorageOS provides transparent shared volume services with native filesystem level integration to the container namespace, which avoids this issue entirely.

EBS Peculiarities and Limitations

Amazon’s EBS is a mature offering with a number of performance options at different price points. EBS, however, has several limitations.

Reliability and High Availability: Amazon states that EBS volumes are designed for an annual failure rate (AFR) of between 0.1% and 0.2%, where failure refers to a complete or partial loss of the volume, depending on the size and performance of the volume. This statement provides developers with a false sense of security. In fact, this statistic is only valid if you back up your data constantly and never have a delta of more than 20GB since your last backup. Of course, all the backups consume capacity and Amazon charges you for them! Depending on utilization, backup profiles, and number of volumes, most organizations see a number of real volume failures every year. Bottom line: AWS EBS volumes are unreliable. Additionally, Amazon states that its service availability statistics do not take into account any poor performance of the EBS volume, so an EBS volume can be processing only a trickle of I/O due to a failure condition, yet still be deemed “available.” StorageOS provides replicas across nodes and EBS volumes, and provides seamless uptime and transparent failover should an EBS volume or an EC2 node suffer a failure. StorageOS is also able to synchronously replicate across AWS availability zones to ensure data availability during serious cloud outages. Because StorageOS provides storage-level replication of database volumes across availability zones, users can additionally save the expense and operational overhead of running multiple EC2 instances for database-level replication.

Failover and Volume Mobility: EBS volumes can only be presented to the operating system on an individual EC2 instance. If a container moves (i.e., changes nodes), then moving a volume along with the container involves detaching the EBS volume from the source node and re-presenting it to the destination node. This is not a transparent process! The orchestrator has to manage operating system device IDs across the cluster. EBS detach operations can have lengthy delays if applications are not shut down cleanly: in a number of reports in Kubernetes forums, users describe delays of up to an hour. These delays are not acceptable during a standard application or container move, or during application scaling operations. StorageOS, on the other hand, presents storage to the application in an operating-system-agnostic way and can move the volume presentation instantly, such that a storage volume can transparently move with the application or database container instance. Furthermore, data in EBS volumes is locked into AWS. StorageOS is platform-agnostic and can run in different cloud providers as well as in on-premises deployments, so users have the ability to migrate and replicate their data in hybrid environments.

Performance: EBS only supports 3 IOPS/GB for standard volumes, with a limit of 3000 IOPS per volume regardless of size, and a limit of 50 IOPS/GB for performance SSD. This is limiting for many applications and results in AWS users overprovisioning their EBS storage to get access to additional performance, or alternatively using Linux volume managers to combine multiple volumes into a single larger volume. Both approaches are expensive and complex to manage. StorageOS creates a storage pool transparently across multiple EBS volumes and across multiple nodes, providing a simple way to balance the load evenly across both hot and cold EBS volumes. StorageOS also supports local in-memory or ephemeral SSD caching to accelerate applications and databases while reducing the backend EBS demand. Additionally, StorageOS can systematically reduce the amount of data written to and read from each device using compression, further improving performance.
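The overprovisioning arithmetic implied by these limits can be sketched as follows. The sketch assumes only the figures quoted above for standard volumes (3 IOPS/GB and a 3000 IOPS per-volume cap); the function names are hypothetical, and current EBS limits may well differ.

```python
import math

# Figures quoted in this article for standard EBS volumes (assumed, may be dated).
IOPS_PER_GB = 3
PER_VOLUME_IOPS_CAP = 3000


def volume_iops(size_gb: int) -> int:
    """IOPS available from one standard volume of the given size."""
    return min(size_gb * IOPS_PER_GB, PER_VOLUME_IOPS_CAP)


def min_size_gb_for(target_iops: int) -> int:
    """Smallest single volume that reaches the target IOPS."""
    if target_iops > PER_VOLUME_IOPS_CAP:
        raise ValueError("target exceeds the per-volume cap; stripe several volumes")
    return math.ceil(target_iops / IOPS_PER_GB)


def volumes_needed(target_iops: int) -> int:
    """Number of volumes to combine (e.g. with a volume manager) to reach the target."""
    return math.ceil(target_iops / PER_VOLUME_IOPS_CAP)


print(min_size_gb_for(1500))   # capacity you must buy to get 1500 IOPS
print(volumes_needed(10000))   # volumes to stripe for 10000 IOPS
```

Under these assumptions, a database holding 50GB of data but needing 1500 IOPS would have to be given a 500GB volume, ten times the capacity it actually uses.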

Encryption: EBS has built-in encryption functionality, but it requires the application or container to hand over the key to the AWS API. This may not be compliant with regulations and privacy policies in some jurisdictions. StorageOS provides an inline encryption function which is platform-agnostic and uses keys that are only known to the application or container and that are managed by the end user, not the cloud or service provider. Using this feature, an organization can ensure compliance with applicable regulations and also avoid the lock-in that comes with a non-portable, cloud-specific technology.

Management: StorageOS creates a storage pool across all the nodes in a cluster using a combination of EBS and S3 storage to provide an extremely simple way of managing any number of volumes for applications and databases. All the StorageOS volumes are thinly provisioned, which means that a volume consumes pool capacity only for the data that has actually been written. This allows users to create large volumes that can cope with future growth of the application without deploying excessive EBS storage up front. It is perfectly feasible, for example, to create a 250TB volume for a container with only 10GB of EBS capacity in the storage pool. StorageOS is building AWS integration so that it will automatically add EBS volumes to the storage pool as front-end consumption increases, providing a safe, scalable solution that reduces up-front cost. StorageOS volumes are provisioned instantly and do not require any AWS API calls to connect to the host operating system. The volumes can also be resized on demand as utilization grows while the application is still online.
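To make the idea of thin provisioning concrete, here is a toy model of it. This is a generic illustration of the technique, not a description of how StorageOS implements it; the class names `ThinPool` and `ThinVolume` and the 4KB block size are assumptions.

```python
BLOCK_SIZE = 4096  # assumed block size, in bytes


class ThinPool:
    """Physical capacity shared by all volumes, counted in blocks."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.used = 0

    def allocate(self) -> None:
        if self.used >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add backing capacity")
        self.used += 1


class ThinVolume:
    """A volume whose advertised size can far exceed the pool behind it."""

    def __init__(self, pool: ThinPool, logical_blocks: int) -> None:
        self.pool = pool
        self.logical_blocks = logical_blocks
        self.blocks = {}  # logical block number -> data, only for written blocks

    def _check(self, lbn: int) -> None:
        if not 0 <= lbn < self.logical_blocks:
            raise IndexError("block number outside the volume")

    def write(self, lbn: int, data: bytes) -> None:
        self._check(lbn)
        if lbn not in self.blocks:
            # Physical space is claimed on the first write to a block only.
            self.pool.allocate()
        self.blocks[lbn] = data

    def read(self, lbn: int) -> bytes:
        self._check(lbn)
        # Blocks that were never written read back as zeros and cost nothing.
        return self.blocks.get(lbn, b"\x00" * BLOCK_SIZE)


pool = ThinPool(physical_blocks=4)
volume = ThinVolume(pool, logical_blocks=1_000_000)
volume.write(999_999, b"x" * BLOCK_SIZE)
print(pool.used, "of", pool.physical_blocks, "physical blocks in use")
```

The trade-off is that the pool can run out before any volume looks full, which is why automatically growing the pool as consumption rises matters.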

Policy and Automation: The StorageOS control plane can use volume labels from Docker or Kubernetes to define and enforce data placement and data services. Some example use cases might involve placement in specific cloud geographies or the use of specific encryption keys or replication policies based on a simple label definition.
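A label-driven policy of this kind could be sketched as below. The label keys (`storage.example/region` and so on), the `VolumePolicy` fields, and the function names are all hypothetical; they are not the labels StorageOS, Docker, or Kubernetes actually define.

```python
from dataclasses import dataclass
from typing import Dict, List

LABEL_PREFIX = "storage.example/"  # hypothetical label namespace


@dataclass(frozen=True)
class VolumePolicy:
    region: str = "any"
    replicas: int = 0
    encrypted: bool = False


def policy_from_labels(labels: Dict[str, str]) -> VolumePolicy:
    """Translate volume labels into a policy, falling back to defaults."""
    return VolumePolicy(
        region=labels.get(LABEL_PREFIX + "region", "any"),
        replicas=int(labels.get(LABEL_PREFIX + "replicas", "0")),
        encrypted=labels.get(LABEL_PREFIX + "encryption", "false").lower() == "true",
    )


def eligible_nodes(policy: VolumePolicy, node_regions: Dict[str, str]) -> List[str]:
    """Names of the nodes on which the policy allows data to be placed, sorted."""
    return sorted(
        name
        for name, region in node_regions.items()
        if policy.region in ("any", region)
    )


labels = {
    "storage.example/region": "eu-west-1",
    "storage.example/replicas": "2",
    "storage.example/encryption": "true",
}
policy = policy_from_labels(labels)
print(policy)
print(eligible_nodes(policy, {"node-a": "eu-west-1", "node-b": "us-east-1"}))
```

The appeal of this design is that developers express intent with labels they already attach to volumes, while the control plane decides placement and enforces it.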

Docker Plug-Ins for HPE 3PAR, EMC (XtremIO, VMAX, Isilon), and NetApp?

There are Docker storage plugins to automate storage provisioning from HPE 3PAR, EMC (XtremIO, VMAX, Isilon), and NetApp. We wondered how their approach is superior to, inferior to, or simply different from the StorageOS approach.

The storage plugins for EMC, NetApp and HPE are just API endpoints and require the hardware-based array to present the volumes of storage to the operating system instances running the container environment.

These storage systems are presenting storage through a storage fabric such as FC or iSCSI to operating system instances. This means that as containers move about, there are complex operating system and storage fabric configurations that need to take place to present and un-present storage volumes between nodes.

Due to the complexity and limitations involved in un-presenting and re-presenting volumes to the operating systems, many plugins limit the ability to move volumes and can thus impose an unacceptable delay when moving containers between nodes.

Many of the plugins for hardware arrays cannot handle dynamic provisioning and adopt a pre-provisioned approach. This may be due to API limitations or the speed at which the array can provision a new volume. Effectively, the capacity on the storage array is split up into volumes of different sizes and the orchestrator or plugin attempts to perform a best match on the requested volume attributes (like size or replication) against the volumes that are available. Not only is this hugely inefficient (the array capacity must be carved up whether the volumes are actually in use or not), but it often results in containers being allocated volumes that are overprovisioned (e.g. a 100GB volume was requested but only 1TB volumes are available). StorageOS has full API integration with Docker and Kubernetes, fully supports dynamic provisioning, and is able to instantiate a volume in less than a second, so the container can be started and volumes created dynamically as needed. If the volume usage increases over time, the volume can be resized online without restarting the container.
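The best-match behaviour of a pre-provisioned approach can be illustrated with a few lines of code. This is a simplified sketch that matches on size alone; the function names are hypothetical and no particular vendor's plugin is being reproduced.

```python
from typing import List, Optional


def best_match(requested_gb: int, free_volumes_gb: List[int]) -> Optional[int]:
    """Pick the smallest pre-carved volume that satisfies the request."""
    candidates = [size for size in free_volumes_gb if size >= requested_gb]
    return min(candidates) if candidates else None


def wasted_gb(requested_gb: int, free_volumes_gb: List[int]) -> Optional[int]:
    """Capacity handed out beyond what was asked for, or None if nothing fits."""
    match = best_match(requested_gb, free_volumes_gb)
    return None if match is None else match - requested_gb


# A 100GB request against an array carved into 1TB (1000GB) volumes.
print(best_match(100, [1000, 1000, 1000]))
print(wasted_gb(100, [1000, 1000, 1000]))
```

With only 1TB volumes on offer, the 100GB request ties up ten times the capacity it needs, and a request larger than every free volume cannot be served at all.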

Importantly, the hardware-based storage systems discussed in this section can only be used within an on-premises data center and cannot be utilized in a cloud environment. I discuss software-defined storage systems (such as EMC ScaleIO) in the next section.

The block products (e.g. VMAX) have complex compatibility matrices with strict requirements on operating system version, patch levels, network drivers and storage fabric drivers such that upgrade requirements at the server end or at the array often require a carefully choreographed upgrade process across large sections of the estate. StorageOS works at the application level without any kernel dependencies and therefore has the broadest compatibility across any Linux environment in both on-premises and cloud environments.

Most of these systems do not support the full suite of data services and have complex limitations on how they can be deployed. For example, all the arrays listed here will only be able to replicate to a small number of models in a particular series or generation of the array. There are no options for cross platform replication or migration of data to or from the cloud.

Many of these systems were not architected for inline compression, dedupe or encryption and although the data service may have been retrofitted later, they often have performance limitations and/or perform services out-of-band and/or have limitations on which storage features can be used concurrently.

EMC ScaleIO?

Many of the same considerations apply to software-defined solutions such as ScaleIO. In addition, users also generally need to deploy and manage separate VMs (and hypervisor licenses) for the ScaleIO software, although separate packages are also available. Kernel drivers are needed on all clients.

As ScaleIO is deployed as VMs, it is unclear if AWS is an option for production use cases; there may be some options for development use cases in AWS.

Customers we spoke to told us that ScaleIO is unwieldy, and often costs more than the equivalent hardware in many use cases and consumes considerable resources (CPU and memory) from each node that it is installed on.

Furthermore, this year, Dell EMC announced that ScaleIO will now be sold as part of the VxRack hardware range and will no longer be sold as a software only product.

StorageOS has been specifically designed to use a minimal CPU and memory footprint and can perform efficiently in hyper-converged deployments or small cloud instances. Other software defined solutions have much higher requirements (especially when using data services like compression or encryption) and therefore consume a higher percentage of the compute resources leading to larger cloud / on-premises costs.

Docker Storage Plugins from Hedvig, Nexenta, Kaminario?

We read many articles about storage vendors writing Docker storage plugins. We had to figure out how this was different from the StorageOS approach.

Docker plugins provide the API endpoints that Docker can use to interact with a storage system. Many vendors are building Docker plugins, and these are available at Docker Hub. The StorageOS plugin is also published there.

Some Docker storage plugins are simple and elegant. But some are shims that interface to other complex frameworks or subsystems. As an example of a shim, EMC’s RexRay is a framework that supports a number of EMC-supported storage systems. It has multiple components, services and even a database, which for starters makes installation a challenge. Numerous users have given us anecdotal feedback that the experience was complex and unreliable, and some of them aborted the install halfway through after struggling to complete the process. RexRay is now a community-driven project.

In general, most plugin frameworks will add an additional layer of complexity on top of their legacy technology, with more moving parts that can go wrong, and inevitable lock-in which makes it hard to move between environments.

StorageOS provides a whole storage system and all the API integrations in a single container. StorageOS is not just a plugin; it is the storage layer too. The plugin deployment is just a convenient way to integrate into the Docker platform as well as other orchestrators. StorageOS also has both native drivers and Container Storage Interface (CSI) support to integrate seamlessly with Kubernetes.

Virtuozzo and Quobyte?

Virtuozzo and Quobyte came up in Google searches for Docker container storage (there are others). We had to know what they were doing that was different than StorageOS.

Quobyte and Virtuozzo use a distributed architecture similar in concept to Ceph and are not really designed for deterministic performance as needed by databases, message buses, and other primary data use cases.

Quobyte is a distributed filesystem (based on the less well-known open source filesystem XtreemFS) and is targeted at shared storage requirements. Virtuozzo is targeted at service providers and was previously optimized for VMs since it was part of the Russian company Parallels. It is a platform for hosting providers (web hosts, VPS, etc.) and the storage solution is focused on working within their overall hosting platform.

Both systems have a higher resource overhead and need larger deployments than StorageOS with much higher minimum node counts for an initial install and often require dedicated nodes just for the storage subsystem. Additionally, they require specialized clients (kernel and filesystem drivers), so are considerably harder to deploy in enterprise environments or in existing installs.

Gluster

Gluster is a distributed filesystem. It is fairly well known as it is open source, has been available for a long time, and is simpler to implement than Ceph. Red Hat also recommends Gluster over Ceph for container environments, and has recently branded Gluster as Container Native Storage. As it is accessed through NFS (with all the implications and limitations of NFS, as previously described) or a FUSE driver (which may have performance issues), it is targeted at applications that need shared filesystems, and is not optimised for database or primary data use cases.

What is Rook?

Rook is an unusual category of tool: a storage orchestrator. It is sponsored by Quantum and Upbound. As a storage orchestrator, Rook is not the actual storage system; instead, Rook automates the deployment and management of a storage system. So far it has focussed primarily on orchestrating Ceph, as Ceph is a very complex product to install and configure. Rook helps to deploy Ceph within a Kubernetes cluster, and once a volume is configured through the various Rook commands, the volume is accessed via the Ceph volume drivers, as Rook does not have direct Kubernetes drivers or integration itself. Rook has been voted in as a Sandbox project within the Cloud Native Computing Foundation (CNCF), which highlights the need for persistent storage for cloud native applications. Rook is also looking to expand beyond Ceph and is adding support for other storage systems, although it is uncertain if Rook will be able to gain mindshare from other projects in the longer term.

This has come up on the CNCF Storage Working Group calls, which include one of the founders of StorageOS, Alex Chircop, and leaders from all the major vendors, as well as a number of independent thought leaders. The working group has decided to better define the taxonomy and nomenclature to help reduce confusion in this space. Hence products like Rook may be categorized as a Storage Orchestrator, products like RexRay may be categorized as an API framework, and products like StorageOS may be categorized as an Orchestrated Storage Platform.

Is a Storage Orchestrator a good place to be?

It’s unlikely, because one of two things will probably happen in the long term: either (a) the orchestrator itself (e.g. Kubernetes) will continue to add functionality to further allow the deployment of infrastructure-level components, or (b) the storage systems will improve how they deploy in orchestrated environments. This means that while there may be a short-term requirement for help with automating complex solutions like Ceph, in the long run those problems will be solved elsewhere.

By comparison, StorageOS has a simple deployment model which only requires a single container to be deployed to each node, and that includes a fully managed control plane. This makes it easy to automate with a DaemonSet or a Helm chart, such that Kubernetes itself will handle the deployment of StorageOS automatically on all nodes of the cluster. As a result, StorageOS does not require a storage orchestrator, which provides a much better user experience for enterprise customers.

Portworx?

Portworx is probably most similar to StorageOS in terms of targeting the same orchestrated storage market, and had a head start in the market. However, it is missing key functionality that is needed by enterprises. StorageOS is focused on enterprise pain points, such as the following:

  • Kernel dependencies: Portworx has a dependency on its own proprietary kernel module, which has to be installed on customer Linux installs. This is a significant barrier to entry for most enterprise environments, which typically lock down Linux build images. StorageOS has no dependency on proprietary kernel modules and is designed with a modular front end that utilizes built-in Linux capabilities to provision virtual volumes to containers. As a result, StorageOS can run anywhere that 64-bit Linux can be installed.
  • CPU and memory overhead: the Portworx solution has significant CPU and memory overhead; documented minimum requirements are 4 CPU cores with 4GB of RAM. This means that a Portworx deployment will consume a higher proportion of hardware or cloud resources, which is not cost effective and amplifies the impact of other software license costs which are charged per core (e.g. VMware, Red Hat, Oracle etc.). StorageOS has been architected to minimize overhead and utilize all the acceleration features built into modern CPUs to provide good performance with 1 or 2 cores and 2GB of RAM.
  • External dependencies: unlike Portworx, StorageOS is fully self-contained and has embedded all core functionality, which reduces the need for external dependencies such as a key-value store. This simplifies deployment and operation of the storage platform.
  • Encryption: Portworx uses the Linux Device Mapper to wrap a Portworx volume and uses Linux to actually encrypt the volume. This adds storage layers and increases the overhead of implementing encrypted volumes. StorageOS has implemented inline encryption within the product using optimised code paths that reduce overhead and allow volumes to be securely encrypted in a transparent and portable manner.
  • Compression: Portworx does not support compression or any other data reduction method. StorageOS has a patent-pending method for dynamically encoding compression on a block-by-block basis and uses inline compression technology to reduce the amount of storage required on the backend, as well as to accelerate performance by reducing I/O overhead on the network for replication.
  • Replication: Portworx utilizes a message bus to distribute replication traffic on the network. StorageOS has developed a mesh protocol for replication which is optimized for disk block throughput with minimal latency whilst supporting all the advanced data services such as compression and encryption.
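For readers unfamiliar with block-level compression, the general idea of deciding per block whether compression is worthwhile can be sketched as follows. To be clear, this is not the StorageOS patent-pending method, which this article does not describe; it is a generic illustration using zlib from the Python standard library, and the one-byte tag format and 4KB block size are assumptions.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, in bytes
RAW, ZLIB = 0, 1   # hypothetical one-byte tags recording how a block was stored


def encode_block(block: bytes) -> bytes:
    """Store a block compressed only when that actually makes it smaller."""
    compressed = zlib.compress(block)
    if len(compressed) < len(block):
        return bytes([ZLIB]) + compressed
    # Incompressible data (already compressed or encrypted) is stored as-is.
    return bytes([RAW]) + block


def decode_block(encoded: bytes) -> bytes:
    """Recover the original block using the tag written by encode_block."""
    tag, payload = encoded[0], encoded[1:]
    if tag == ZLIB:
        return zlib.decompress(payload)
    if tag == RAW:
        return payload
    raise ValueError("unknown block encoding tag")


text_block = b"INSERT INTO orders VALUES (1);\n" * 128
random_block = os.urandom(BLOCK_SIZE)
for name, block in (("text", text_block), ("random", random_block)):
    stored = encode_block(block)
    assert decode_block(stored) == block
    print(name, len(block), "bytes stored as", len(stored))
```

Making the choice per block means repetitive data shrinks on disk and on the replication network, while data that does not compress costs only the one-byte tag.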

Conclusion

Without StorageOS — or something like it — you will have difficulty scaling container-based applications that have state, and storage will be the Achilles Heel of your large container deployments.