Rook: A Storage Orchestrator to Run Stateful Workloads on Kubernetes with Ceph

Cagri Ersen
Devops Türkiye☁️ 🐧 🐳 ☸️
11 min read · Jun 8, 2020

TL;DR
This article contains detailed information about Rook and explains how to use it to integrate a highly available Ceph storage cluster into an existing Kubernetes cluster. If you already know what Rook is and are just looking for a Ceph implementation tutorial, skip the next section and jump to the “Deploying a Ceph Cluster with Rook” section.

Kubernetes and Data Persistence

In the early versions of Kubernetes, we intended to use it for our stateless services, as it was originally designed for that kind of workload. Back then, data persistence wasn’t a concern, and running stateful services tightly coupled to their data on Kubernetes wasn’t a common approach. (There was no durable local storage support in late 2015 yet: https://github.com/kubernetes/kubernetes/pull/1311)

However, as Kubernetes became a more popular platform, it started to provide more reliable features that changed our mindset about scheduling stateful services on it. Currently, we are able to use many types of storage volumes, including GCE Persistent Disk, AWS Elastic Block Store, vSphere Volume, GlusterFS, NFS, and Ceph. So we’re now more comfortable running services that need robust storage backends.

In particular, if your Kubernetes environments run on a cloud provider such as Google Cloud Platform or Amazon Web Services, it is easy to satisfy any data persistence requirement, since all cloud providers offer their own reliable storage backends built in. For instance, if your k8s cluster is based on GKE, your stateful services can be backed by GCE persistent disks.

But when it comes to on-premises Kubernetes environments, there are still some challenges, as you know.

On-premises Challenges

It is certainly more difficult to implement a storage backend in on-premises Kubernetes environments than on a cloud provider. In such situations, you need to choose a production-grade software-defined storage solution that integrates seamlessly into your existing environment and, moreover, is a robust, scalable, easy-to-manage platform.

Fortunately, there are many good solutions that can be used as a storage backend in Kubernetes environments. You can find many of them by googling “software defined storage for Kubernetes”. However, integrating a software-defined storage (SDS) solution into a Kubernetes environment is not an easy task at all. Since the storage backend is one of the most important components, it has to be reliable; thus you need to implement it carefully for production use. In other words, you need to know what you’re doing; otherwise, it will be inevitable that day-2 problems challenge you.

Well, at this point, projects like Rook, which are designed to seamlessly implement these kinds of critical components, become very important for cloud-native environments. As a storage orchestrator, Rook aims to handle all storage-related tasks in Kubernetes environments for you. This is why it is hosted by the Cloud Native Computing Foundation as an incubation-level project.

So, in this article, I’ll cover Rook’s features, and this will also be a “how to” guide about building and integrating a Ceph storage cluster on top of a Kubernetes cluster.

What is Rook?

As it defines itself:

“… an open source cloud-native storage orchestrator, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling and orchestration platform to perform its duties.”

So, you can use Rook to implement software-defined storage building blocks for your Kubernetes environments in an automated manner. It runs as an operator in the Kubernetes cluster and enables the selected storage system to run on that same cluster. Along the way, Rook automatically configures a CSI (Container Storage Interface) driver to mount the storage into your pods.

While in this article I’ll explain how to implement a highly available Ceph storage system on a Kubernetes cluster to provide block storage, object storage, and shared filesystem mounts to pods, you can take a look at the other storage provider options by visiting https://github.com/rook/rook/blob/master/README.md#project-status

Ceph Cluster on top of Kubernetes

As you know, Ceph is a highly scalable distributed storage solution that is used extensively in production infrastructures, just as it describes itself:

Ceph is for you! Ceph’s foundation is the Reliable Autonomic Distributed Object Store (RADOS), which provides your applications with object, block, and file system storage in a single unified storage cluster — making Ceph flexible, highly reliable and easy for you to manage.

It can be deployed on Kubernetes to provide storage for applications; however, as I mentioned above, to achieve this you need to know all aspects of Ceph’s dynamics to build a production-ready cluster, and you have to make the Kubernetes integration reliable.

This is where Rook comes in… It handles all Ceph installation and management tasks for us, providing dynamic volume provisioning for our applications.

How does it work?

Before we go further, it’s worth mentioning how Rook integrates Ceph into Kubernetes clusters.

As you can see in the image below, it uses Kubernetes primitives to run on Kubernetes.

Ceph rook architecture

So, you basically deploy Rook on your Kubernetes cluster and it takes care of the rest: building, managing, and monitoring a Ceph cluster.

Rook uses custom resource definitions (CRDs) to create and customize storage clusters, and these CRDs are registered in Kubernetes during the deployment process. It also runs an operator within Kubernetes, the Rook operator for Ceph, which automates the configuration of storage components and monitors the cluster to ensure the storage remains available and healthy.

Rook also runs a daemonset named rook-discover that starts a discovery agent pod on every node of your Kubernetes cluster to discover any raw disk devices (no partitions or formatted filesystems) or raw partitions (no formatted filesystem) that can be used as Ceph OSD disks. If the minimum required number of disks is discovered (for redundancy purposes, at least three), the Rook operator uses these disks to build the Ceph cluster. If you add more disks to your nodes, the operator discovers them via its agents and scales the Ceph cluster automatically. Rook also configures Ceph-CSI drivers to provide block storage and shared filesystem mount support for pods.

For monitoring requirements, Rook enables the Ceph dashboard, and each Rook Ceph cluster ships with built-in metrics collectors/exporters that can be scraped by Prometheus.

Deploying a Ceph Cluster with Rook

A Ceph cluster deployment with Rook is a quite straightforward process: we only need to clone a git repo and apply a few Kubernetes manifests via kubectl. That’s all.

But first, it’s worth clarifying the requirements and deployment options.

Requirements

OSD Disks and Rook Modes
Rook needs some storage space to create the Ceph OSDs that will form the base of the cluster. To provide this space, there are two different modes, named “host-based” and “PVC-based”.

  • Host-based Cluster:
    In host-based cluster mode, Rook detects all raw (unformatted) devices or raw partitions on all (or selected) Kubernetes nodes and creates the Ceph cluster from them. In this mode, all you need to do is attach some raw disks to your nodes. (Note that a minimum of three disks on three different nodes is required to build a highly available Ceph cluster.)
  • PVC-based Cluster:
    A PVC-based cluster, on the other hand, requires a bit more, such as an existing StorageClass, because in this mode Rook dynamically claims a number of PVs from the StorageClass that you define and uses them to create the OSD disks. This mode also requires Kubernetes version >=1.13.0.

Since host-based cluster mode is the simplest way to build a Ceph cluster, we’re gonna use this mode in this article. In my setup, I have three raw disks, one attached to each of my three k8s nodes.

Deployment

At the time of writing this article, the latest release of Rook was 1.3.5, which is the version we’re gonna deploy, but please check its releases page to make sure you’re on the latest version.

The first step is cloning Rook’s 1.3 branch.
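If you want to follow along, the quickstart-style clone looks like this (the branch name follows Rook’s release-X.Y convention; adjust it to the release you’re targeting):

    # clone only the release-1.3 branch of the Rook repo
    git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
    cd rook/cluster/examples/kubernetes/ceph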

At this point, we will apply three k8s manifests, which reside in the rook/cluster/examples/kubernetes/ceph directory, to build the cluster.

So let’s apply them one by one:

common.yaml
This manifest basically creates the CustomResourceDefinitions, ServiceAccounts, and ClusterRoles required by the cluster formation process.

So apply this yaml.
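Assuming you’re still in the cluster/examples/kubernetes/ceph directory, the command is simply:

    # create the CRDs, service accounts and RBAC rules Rook needs
    kubectl apply -f common.yaml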

Rook common.yaml setup

As you can see, a bunch of resources are created.

Now we’re ready to deploy the Rook system components by applying the operator.yaml manifest.

operator.yaml
This manifest simply creates a deployment that runs a rook-ceph-operator pod, along with its configmap. This operator is the main component of Rook, responsible for automatically creating and managing Ceph clusters on top of our Kubernetes cluster.

So, we just create it:
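Again, a single kubectl apply from the same directory is enough:

    # deploy the rook-ceph-operator and its configmap
    kubectl apply -f operator.yaml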

Rook operator.yaml

After a few seconds, the components should be running like below:

Status of rook operator after its deployment
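To verify this yourself, a command along these lines should list the operator deployment and the discovery pods (the resources live in the rook-ceph namespace, as noted below):

    kubectl -n rook-ceph get deployments,daemonsets,pods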

You can see that there is a Deployment named rook-ceph-operator, which is responsible for managing Ceph clusters, as I mentioned before.

There is also a daemonset named rook-discover, which runs three pods (as my test k8s cluster has three nodes). This service regularly checks for any raw devices or partitions on the k8s nodes to use them as members of the Ceph cluster.

Note that all Rook-related resources are placed in the rook-ceph namespace.

cluster.yaml
OK, Rook is ready and we can deploy the Ceph cluster by applying the cluster.yaml manifest. This manifest simply tells Rook how we want the Ceph cluster to be created.

By default, it will use all nodes and all raw devices or partitions to create the Ceph cluster, as follows.

Rook ceph cluster definition
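For reference, the storage-related parts of the default cluster.yaml look roughly like the following (abbreviated; the exact Ceph image tag and available fields depend on the Rook release you cloned, so treat this as a sketch and use the file from your checkout):

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: ceph/ceph:v14.2.9   # example Nautilus tag; use whatever the release manifest pins
      dataDirHostPath: /var/lib/rook
      mon:
        count: 3                   # three monitors for high availability
        allowMultiplePerNode: false
      dashboard:
        enabled: true
      storage:
        useAllNodes: true          # consider every node in the cluster
        useAllDevices: true        # consume every raw device/partition found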

As I mentioned before, I have three raw disks attached to my three k8s nodes, and with this setup Rook will automatically detect and use all of those disks to create the Ceph OSDs.

Also note that Rook will use the /var/lib/rook directory on each node to cache Ceph mon and osd configuration data. If you have a previous Rook-managed Ceph cluster, you need to delete the data in this directory first.

So, we apply the manifest:
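Still from the cluster/examples/kubernetes/ceph directory:

    # ask the Rook operator to build the Ceph cluster
    kubectl apply -f cluster.yaml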

Rook ceph cluster deployment

After a while, Rook will create all the resources necessary to form a Ceph cluster. When you check the status of the deployment, you should see something like below:
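A check along these lines (the exact set of resources depends on your node count and attached disks) should show the mon, mgr, osd and CSI components:

    kubectl -n rook-ceph get deployments
    kubectl -n rook-ceph get pods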

What we see in this output is that the Ceph cluster has been successfully deployed and a bunch of new deployments are in place. For example, the rook-ceph-mgr, rook-ceph-mon and rook-ceph-osd deployments manage pods related to Ceph internals (which I won’t describe in detail, as this is not an entry-level Ceph document). Rook has also deployed the CSI plugins “cephfs” (for mounting Ceph filesystem volumes into pods) and “rbd” (for mounting raw block devices into pods).

Checking Ceph Cluster Health

Now that our cluster is deployed, we can check its status either by using its dashboard or by running ceph commands via the Ceph toolbox pod.

Accessing Ceph Dashboard

By default, Rook enables the Ceph dashboard and makes it accessible within the cluster via the “rook-ceph-mgr-dashboard“ service. So we use kubectl’s port-forward option to access the dashboard from our local computer.
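A port-forward similar to the following should work, assuming the default service name and port:

    kubectl -n rook-ceph port-forward service/rook-ceph-mgr-dashboard 8443:8443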

Now, the dashboard should be accessible over https://localhost:8443

The default username is admin, and its password can be retrieved from the rook-ceph-dashboard-password secret like below:
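The command below is the usual way to do it (it decodes the password field of that secret; the jsonpath expression mirrors the one used in the Rook docs):

    kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo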

So, it looks like everything is OK, as the Cluster Status section below shows.

Rook ceph dashboard

Since Ceph is a highly available storage system, there are three monitors running in order to avoid a SPOF (single point of failure). We also have three OSDs, which use the raw disk devices attached to each of the k8s nodes. You can scale your OSDs by adding more raw devices to your nodes. As I mentioned before, Rook automatically watches for new nodes and devices being added to your cluster and scales the OSDs accordingly.

As you may notice, there is currently only one manager daemon. Although you might think it would be nice to have standby daemons, Rook leverages Kubernetes’ self-healing features here. Since the manager service mainly provides additional monitoring and interfaces to external monitoring and management systems, the Rook maintainers decided to improve its eviction process rather than implement standby manager daemons, as discussed at https://github.com/rook/rook/issues/1796

Accessing Ceph Cluster via CLI

Another option to interact with the Rook Ceph cluster is using the Rook toolbox. It’s a container that contains common tools used for Rook debugging and Ceph testing; the toolbox is based on CentOS, so you can install any other Ceph-related tools on it via yum.

In order to create this container, we can use the “toolbox.yaml” manifest, which resides in rook/cluster/examples/kubernetes/ceph:
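Applying it follows the same one-liner pattern as before:

    # deploy the rook-ceph-tools pod
    kubectl apply -f toolbox.yaml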

At this point, we can attach to the toolbox.
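One way to do that (the app=rook-ceph-tools label may differ slightly between releases) is:

    kubectl -n rook-ceph exec -it \
      $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" \
        -o jsonpath='{.items[0].metadata.name}') bash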

Now we’re ready to use ceph commands. For example, to get the cluster status you can use the “ceph status” command:
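That is, inside the toolbox shell:

    ceph status     # overall health, mons, mgr, OSD and PG summary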

To see the OSDs’ status:
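Again from the toolbox:

    ceph osd status     # per-OSD usage and state
    ceph df             # pool and raw capacity usage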

For other ceph control commands, you can check its documentation at https://docs.ceph.com/docs/giant/rados/operations/control/

Ceph Cluster Storage Modes

OK, we have a Ceph cluster in place and now it can be utilized.

Typically, a Rook Ceph cluster provides three different storage modes for pods: “block storage” (the most commonly used one), “shared filesystem”, and “object storage”.

Block storage allows a single pod to mount storage; it is typically used for databases, NoSQL stores, etc. (In this article we’ll use this mode as an example.)

A shared filesystem, on the other hand, can be mounted with read/write permission from multiple pods. It’s suitable if your application requires access to the same set of data from several pods.

And object storage exposes an S3 API to the storage cluster, so applications can put and get data over HTTP.

Creating a StorageClass for Rook Managed Ceph

In order to provision block storage, a “StorageClass” and a “CephBlockPool” should be created first.

To do this, we’ll apply the “storageclass.yaml” manifest located at “cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml”.

But first, let’s take a look at the manifest:
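It looks roughly like this (abbreviated; the CSI secret parameters and defaults vary slightly between Rook releases, so treat this as a sketch and use the file from your checkout):

    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      name: replicapool
      namespace: rook-ceph
    spec:
      failureDomain: host      # place each replica on a different host
      replicated:
        size: 3                # keep three copies of every object
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rook-ceph-block
    provisioner: rook-ceph.rbd.csi.ceph.com   # <operator-namespace>.rbd.csi.ceph.com
    parameters:
      clusterID: rook-ceph
      pool: replicapool
      imageFormat: "2"
      imageFeatures: layering
      csi.storage.k8s.io/fstype: ext4
      # ...plus the csi.storage.k8s.io/*-secret-name parameters defined in the example file
    reclaimPolicy: Delete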

As you can see, it declares the two Kubernetes resources I mentioned.

The CephBlockPool definition will create a Ceph pool named “replicapool” that keeps three replicas of the data; since the “failureDomain” is set to “host” and “replicated.size” is set to “3” in the manifest, our example requires at least one OSD per node, with the three OSDs located on three different nodes.

And the StorageClass definition will create an SC that utilizes the “replicapool” CephBlockPool. So whenever you want to dynamically provision storage for your pods, you can request it from this storageclass to get the space from the Ceph cluster.

OK, we create the resources:
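Assuming you’re still in the ceph examples directory:

    kubectl apply -f csi/rbd/storageclass.yaml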

Now, you should see a storageclass named “rook-ceph-block”:
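A plain kubectl query is enough to confirm that:

    kubectl get storageclass rook-ceph-block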

And the “replicapool” Ceph pool should also be in place:
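This can be checked from the toolbox pod, for example:

    # run inside the rook-ceph-tools pod
    ceph osd lspools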

Note: If you didn’t deploy the toolbox, the command above obviously won’t work. In that case, you can check the pool from the Ceph dashboard.

Right now, we’re ready to dynamically provision storage spaces for pods.

Dynamic Provisioning Example

In this section, we’re gonna create a sample WordPress app using the example “mysql.yaml” and “wordpress.yaml” manifests that reside in the rook/cluster/examples/kubernetes/ directory. These manifests will dynamically provision two persistent volumes, created by Rook, for the MySQL and WordPress data.

By default, the example WordPress manifest creates a service on port 80 using the LoadBalancer service type; however, my test Kubernetes cluster has no cloud controller to interact with a cloud-based external load balancer, so I’ll change the type to ClusterIP, which makes the service available only from inside the Kubernetes cluster. That’s fine, as this example is for demonstration purposes.

So let’s create the app:
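From the rook/cluster/examples/kubernetes/ directory (after switching the WordPress service type to ClusterIP as described above):

    kubectl apply -f mysql.yaml
    kubectl apply -f wordpress.yaml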

As you can see from the output above, two PersistentVolumeClaims have been created, one for MySQL and one for WordPress. Let’s look at the PersistentVolumes and PersistentVolumeClaims:
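A quick look in the default namespace:

    kubectl get pvc
    kubectl get pv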

It looks like they are created and successfully bound.

And here is the output of the deployments, pods and services related to the WordPress app.
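Something like the following will show them (resource names come from the example manifests):

    kubectl get deployments,pods,services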

Now, we can fetch the WordPress entry page from one of our Kubernetes nodes by using the wordpress service’s ClusterIP:
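For instance, a quick check that grabs the ClusterIP (assuming the service is named wordpress, as in the example manifest) and requests the front page:

    WP_IP=$(kubectl get svc wordpress -o jsonpath='{.spec.clusterIP}')
    curl -I http://$WP_IP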

It seems our WordPress app is running and writing its data (both MySQL and the WordPress HTTP server) to the dynamically created, Rook-managed Ceph persistent volumes.

Wrapping Up

As we’ve just seen it in action, Rook is a very promising project that simplifies Ceph cluster deployment and integration.

As it encourages people to run stateful applications that require reliable storage systems within on-premises environments, I’ve always liked this kind of high-quality project, which expands Kubernetes’ use cases and boosts the ecosystem.
