Deploy Rook/Ceph on ICP

Recently, IBM API Connect v2018 identified "severe performance degradation" when using GlusterFS. Please check out the notes on IBM Knowledge Center.

It's time to explore storage options other than GlusterFS. One option is Ceph. In this story, we will explore Rook, which "runs as a cloud-native service for optimal integration with applications in need of storage, and handles the heavy-lifting behind the scenes such as provisioning and management."

Rook uses the Kubernetes Operator pattern to automate and manage storage for the cluster.

Let's see how I deployed Rook on ICP 2.1.0.3.

Step 1. Add Raw Disks

Let's first add an additional raw disk to each worker VM. As usual, I use the govc command line to manage vCenter.

govc vm.disk.create -ds=ICP_datastore -name ceph_disk_dev-work1 -size=100GB -vm dev-work1

A new raw disk, /dev/sdc, is then created on each of the three worker nodes.
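Each worker needs its own disk, so a quick loop keeps this repeatable. It assumes the worker VMs are named dev-worker1 through dev-worker3; adjust the names to match your vCenter inventory.

for n in dev-worker1 dev-worker2 dev-worker3; do
  govc vm.disk.create -ds=ICP_datastore -name ceph_disk_$n -size=100GB -vm $n
done

# Confirm the new disk shows up as an empty block device on a node
ssh dev-worker1 lsblk /dev/sdc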

Step 2. Deploy Rook Operator

Let's clone the git repo first: git clone https://github.com/rook/rook.git

The Rook operator YAML file is located at rook/cluster/examples/kubernetes/ceph/operator.yaml

Deploy it with kubectl apply -f operator.yaml

After deploying the operator, the expected rook-ceph-agent and rook-discover pods are not running.

$ kubectl -n rook-ceph-system get all
NAME                                      READY     STATUS    RESTARTS   AGE
pod/rook-ceph-operator-86776bbc44-vxsl6   1/1       Running   0          4h

NAME                              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/rook-ceph-agent    0         0         0         0            0           <none>          3h
daemonset.apps/rook-discover      0         0         0         0            0           <none>          3h

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rook-ceph-operator   1         1         1            1           4h

NAME                                            DESIRED   CURRENT   READY     AGE
replicaset.apps/rook-ceph-operator-86776bbc44   1         1         1         4h

Check the daemonset, which has 0 pods running:

kubectl describe daemonset.apps/rook-ceph-agent -n rook-ceph-system

I found the events below:

Events:
  Type     Reason        Age                 From                  Message
  ----     ------        ----                ----                  -------
  Warning  FailedCreate  19s (x101 over 4h)  daemonset-controller  Error creating: pods "rook-ceph-agent-" is forbidden: unable to validate against any pod security policy: [spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]

In ICP 2.1.0.3, Pod Security Policies are turned on by default, so we need to bind the privileged ClusterRole (which permits hostNetwork and privileged containers) to the service account rook-ceph-system:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sc-rook-ceph-system-privileged
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: privileged
subjects:
- kind: ServiceAccount
  name: rook-ceph-system
  namespace: rook-ceph-system

Save the above as rook-ceph-system_bind.yaml, then run

kubectl apply -f rook-ceph-system_bind.yaml
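The daemonset controller retries the failed pod creations on its own, so once the binding is in place you can simply watch the pods appear:

kubectl -n rook-ceph-system get pods -w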

Now the agent and discover pods are running:

$ k get -n rook-ceph-system pods
NAME                                  READY     STATUS    RESTARTS   AGE
rook-ceph-agent-6khsn                 1/1       Running   0          15s
rook-ceph-agent-qj7sn                 1/1       Running   0          15s
rook-ceph-agent-s8kdw                 1/1       Running   0          15s
rook-ceph-operator-86776bbc44-248vj   1/1       Running   0          2m
rook-discover-bgsqn                   1/1       Running   0          15s
rook-discover-dkk4k                   1/1       Running   0          15s
rook-discover-gfp9x                   1/1       Running   0          15s

Step 3. Deploy Rook Cluster

Modify the rook/cluster/examples/kubernetes/ceph/cluster.yaml file, using the node names and the new disks just added:

apiVersion: v1
kind: Namespace
metadata:
  name: rook-ceph
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: [ "get", "list", "watch", "create", "update", "delete" ]
---
# Allow the operator to create resources in this cluster's namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-cluster-mgmt
  namespace: rook-ceph
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rook-ceph-cluster-mgmt
subjects:
- kind: ServiceAccount
  name: rook-ceph-system
  namespace: rook-ceph-system
---
# Allow the pods in this namespace to work with configmaps
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rook-ceph-cluster
subjects:
- kind: ServiceAccount
  name: rook-ceph-cluster
  namespace: rook-ceph
---
apiVersion: ceph.rook.io/v1beta1
kind: Cluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  serviceAccount: rook-ceph-cluster
  mon:
    count: 3
    allowMultiplePerNode: true
  dashboard:
    enabled: true
  network:
    hostNetwork: false
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
      podAffinity:
      podAntiAffinity:
      tolerations:
      - key: storage-node
        operator: Exists
  resources:
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    location:
    config:
      databaseSizeMB: "1024"
      journalSizeMB: "1024"
    nodes:
    - name: "dev-worker1"
      devices:
      - name: "sdc"
    - name: "dev-worker2"
      devices:
      - name: "sdc"
    - name: "dev-worker3"
      devices:
      - name: "sdc"

Before deploying, let's label the three worker nodes so that Rook runs only on them:

kubectl label node dev-worker1 role=storage-node
kubectl label node dev-worker2 role=storage-node
kubectl label node dev-worker3 role=storage-node
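A quick sanity check that all three workers carry the label:

kubectl get nodes -l role=storage-node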

Deploy the cluster YAML file: kubectl apply -f cluster.yaml

Since the Rook jobs also require privileged pods, we need to bind the privileged ClusterRole to this service account as well, as shown in the following YAML file:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rook-ceph-cluster-privileged
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: privileged
subjects:
- kind: ServiceAccount
  name: rook-ceph-cluster
  namespace: rook-ceph
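Save it as a file (I call it rook-ceph-cluster_bind.yaml here; the name is arbitrary) and apply it:

kubectl apply -f rook-ceph-cluster_bind.yaml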

Check all the pods again; all the OSDs are running now.

$ k -n rook-ceph get pods
NAME                                      READY     STATUS      RESTARTS   AGE
rook-ceph-mgr-a-6fcd7c87c9-6g76j          1/1       Running     0          30s
rook-ceph-mon0-frnb4                      1/1       Running     0          1m
rook-ceph-mon1-rqz9j                      1/1       Running     0          52s
rook-ceph-mon2-nctft                      1/1       Running     0          44s
rook-ceph-osd-id-0-779449457c-hxj89       1/1       Running     0          12s
rook-ceph-osd-id-1-567bd4468f-w5dqh       1/1       Running     0          10s
rook-ceph-osd-id-2-6849fd79df-x5d6d       1/1       Running     0          9s
rook-ceph-osd-prepare-dev-worker1-b2v24   0/1       Completed   0          26s
rook-ceph-osd-prepare-dev-worker2-lj527   0/1       Completed   0          26s
rook-ceph-osd-prepare-dev-worker3-z2btm   0/1       Completed   0          25s
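To double-check health from Ceph's side, the repo we cloned also ships a toolbox manifest at rook/cluster/examples/kubernetes/ceph/toolbox.yaml. In the v0.8-era examples it creates a pod named rook-ceph-tools, so the exec target below is an assumption that may differ in other Rook versions:

kubectl apply -f rook/cluster/examples/kubernetes/ceph/toolbox.yaml
# Run a ceph status from inside the toolbox pod
kubectl -n rook-ceph exec -it rook-ceph-tools -- ceph status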

Step 4. Deploy StorageClass

Now we are ready to create the StorageClass. Create the following file:

apiVersion: ceph.rook.io/v1beta1
kind: Pool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  pool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs

Deploy it: kubectl apply -f storage_class.yaml
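Verify the StorageClass is registered. Optionally, you can mark it as the cluster default, so PVCs without an explicit storageClassName land on Ceph:

kubectl get storageclass rook-ceph-block

# Optional: make it the default StorageClass
kubectl patch storageclass rook-ceph-block \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'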

Step 5. Test PVC

As the last step, we can test PVC provisioning. Create the following PVC request file:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-rook-ceph-test
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Apply it and watch the PVC become Bound.
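Assuming the claim above was saved as pvc_test.yaml (the filename is mine):

kubectl apply -f pvc_test.yaml
kubectl get pvc pvc-rook-ceph-test

The STATUS column should turn to Bound within a few seconds.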

Tips

In case you need to redeploy Rook, remove the Kubernetes objects in the reverse order of their creation.

You will also need to remove the files and clean up the partitions created by the last deployment, on all three worker nodes:

sudo rm -rf /var/lib/rook/*
sudo wipefs -a /dev/sdc
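A minimal sketch for running the cleanup on all three nodes in one go, assuming password-less ssh and sudo:

for n in dev-worker1 dev-worker2 dev-worker3; do
  ssh $n 'sudo rm -rf /var/lib/rook/* && sudo wipefs -a /dev/sdc'
done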