Cloud Pak for Data Solutions Explained

Jingdong Sun
IBM Data Science in Practice
7 min read · Apr 6, 2021

A real service backup and restore solution


It was the third meeting I had within a week: a customer who uses Cloud Pak for Data wanted a backup and restore solution to meet their business requirements. This meeting was to discuss their backup and restore architecture and proposal, and also to answer their technical questions.

When I was the SRE architect for Cloud Pak for Data, meetings like this happened often. Many customers reached out to me to discuss Cloud Pak for Data backup and restore support and to come up with a backup and restore solution based on my suggestions.

Typically, in these kinds of meetings, I asked the customer which Cloud Pak for Data services they planned to deploy, and what their requirements for backup and restore were. Based on their deployment plan and requirements, and on my knowledge of the services, I came up with suggestions to meet their needs. That is also what I am going to walk through in this blog: I will use a customer case with the IBM Open Data for Industries service to demonstrate how customers can plan and create an optimized Cloud Pak for Data backup and restore solution.

Use case:

A customer deploys the Open Data for Industries service within a Cloud Pak for Data cluster. The customer wants to set up a daily backup that runs in the middle of every night, for in-place restore in case of data loss.

Before we start discussing the solution, let me give a quick overview of Cloud Pak for Data backup and restore support.

Cloud Pak for Data Backup and Restore support

Cloud Pak for Data provides backup and restore support with a lot of flexibility to meet customer use cases and requirements. Its v3.5.x backup and restore tool, cpdbr, supports the following commands:

  1. quiesce/pause services of an instance
  2. backup volumes or snapshot
  3. unquiesce/resume services
  4. restore volumes or snapshot

When performing a backup, steps 1, 2, and 3 can be executed in a single step or separately. When performing a restore, steps 1, 4, and 3 can be executed in a single step or separately. This gives customers flexibility in how they execute these steps.
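As a sketch of these two modes, assuming the cpdbr v2.0.0 command syntax used later in this post (the `--skip-quiesce=true` flag shown below suggests that `volume-backup create` quiesces by default; the function names here are purely illustrative):

```shell
# cpdbr executable location; override the CPDBR variable for testing
CPDBR="${CPDBR:-./cpdbr}"

# Combined mode: a single command quiesces, backs up, and unquiesces
backup_combined() {
  "$CPDBR" volume-backup create -n "$1" "$2"
}

# Separate mode: run each step explicitly, leaving room for other
# work (such as backing up more namespaces) between the steps
backup_separate() {
  "$CPDBR" quiesce -n "$1"
  "$CPDBR" volume-backup create -n "$1" "$2" --skip-quiesce=true
  "$CPDBR" unquiesce -n "$1"
}
```

The separate mode is what makes an optimized procedure possible: one quiesce window can span the backups of several namespaces.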

Then, we need to design the customized and optimized solution:


Generally, when a customer is planning for the backup and restore, their main focuses are:

  1. Backup: (1) try to reduce the disruptive time, or have no disruption to end users at all; and (2) how long does each backup take? (This is not the Recovery Point Objective (RPO) per se, but it affects the RPO.)
  2. Restore: (1) reliable restore: the restore needs to succeed and bring the environment back to the expected state; and (2) when a restore is needed, get the environment back as quickly as possible, which is the Recovery Time Objective (RTO).

With the above in mind, in order to come up with an optimized solution, we need to think about the following points:

  1. The Cloud Pak for Data backup and restore features and support (as briefly mentioned above, and also documented here).
  2. The deployed services and their behavior. For the use case in this blog, that is Open Data for Industries; its behaviors are described below.

Open Data for Industries has the following behaviors:

  1. Deployment:
  • all Open Data for Industries core services are stateless and deployed in the “osdu” namespace;
  • all Open Data for Industries utility services are in different namespaces.

  2. Raw data and metadata:

  • Raw data is saved through the S3-compatible MinIO into its storage,
  • Metadata is saved in CouchDB,
  • Indexes are managed by Elasticsearch,
  • Users and groups are managed with Keycloak (Red Hat SSO).
Open Data for Industries architecture. Image copied from Cloud Pak for Data documentation.

Based on the above behaviors, the backup and restore solution needs to:

  • back up all Open Data for Industries namespaces that run its core services and utility services;
  • quiesce the services in the “osdu” namespace, which is enough to guarantee a consistent, reliable backup or restore of the Open Data for Industries instance.

To keep the quiesce time (disruptive time) and the backup/restore time as short as possible, the following is a high-level optimized procedure:

  1. Install the cpdbr tool in all of the above namespaces.
  2. Create PVCs for cpdbr in all namespaces.
  3. Initialize cpdbr in all namespaces.
  4. Quiesce the services in the “osdu” namespace: this action quiesces the Cloud Pak for Data control plane services and the Open Data for Industries core services.
  5. Back up the PVCs in all Open Data for Industries namespaces.
  6. Unquiesce the “osdu” namespace.
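The backup procedure above can be sketched as a small shell script. This is a hypothetical sketch: the namespace names are taken from the Open Data for Industries deployment described in this post, and `nightly_backup` is an illustrative name, not part of cpdbr:

```shell
# cpdbr executable location; override the CPDBR variable for testing
CPDBR="${CPDBR:-./cpdbr}"
# All Open Data for Industries namespaces (illustrative list)
NAMESPACES="osdu osdu-couchdb osdu-amq osdu-elastic osdu-keycloak osdu-minio"

nightly_backup() {
  backup_name="$1"
  # Quiescing "osdu" also quiesces the Cloud Pak for Data control plane
  # and the Open Data for Industries core services
  "$CPDBR" quiesce -n osdu || return 1
  # Back up PVCs in every Open Data for Industries namespace
  for ns in $NAMESPACES; do
    "$CPDBR" volume-backup create -n "$ns" "$backup_name" --skip-quiesce=true || return 1
  done
  # Resume services only after all namespaces are backed up
  "$CPDBR" unquiesce -n osdu
}
```

Note that the unquiesce runs only after every namespace has been backed up, which keeps the backup consistent across namespaces while keeping the disruptive window to a single quiesce/unquiesce cycle.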

When a restore is needed, the procedure can be:

  1. Quiesce the services in the “osdu” namespace.
  2. Restore the PVCs in all Open Data for Industries namespaces.
  3. Unquiesce the “osdu” namespace.

Coming up with detailed steps


Environment preparation:

  1. Download the cpdbr executable from https://github.com/IBM/cpd-cli/tree/master/cpdbr/2.0.0

2. scp the cpdbr executable to the cluster infra node

3. Log in to the cluster infra node and check the cpdbr version:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr version

4. Run the command below to install cpdbr-aux in all of the namespaces you plan to back up. Using “osdu” as an example:

IMAGE_REGISTRY=`oc get route -n openshift-image-registry | grep image-registry | awk '{print $2}'`
echo $IMAGE_REGISTRY
NAMESPACE=`oc project -q`
echo $NAMESPACE
CPU_ARCH=`uname -m`
echo $CPU_ARCH
BUILD_NUM=<build-number>
echo $BUILD_NUM

# Pull cpdbr image from Docker Hub
podman pull docker.io/ibmcom/cpdbr:2.0.0-${BUILD_NUM}-${CPU_ARCH}
# Push image to internal registry
podman login -u kubeadmin -p $(oc whoami -t) $IMAGE_REGISTRY --tls-verify=false
podman tag docker.io/ibmcom/cpdbr:2.0.0-${BUILD_NUM}-${CPU_ARCH} $IMAGE_REGISTRY/$NAMESPACE/cpdbr:2.0.0-${BUILD_NUM}-${CPU_ARCH}
podman push $IMAGE_REGISTRY/$NAMESPACE/cpdbr:2.0.0-${BUILD_NUM}-${CPU_ARCH} --tls-verify=false

5. Run “oc get is” to verify that the image loaded:

[root@osdu-ocs-bvt-inf ~]# oc get is -n osdu

6. Create a YAML file, cpdbr-vol.yaml, and use it to create a PVC in each namespace:

oc apply -f cpdbr-vol.yaml

cpdbr-vol.yaml content:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cpdbr-pvc-osdu
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
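Since each namespace needs its own PVC, one way to avoid maintaining a copy of this file per namespace is to render it from a template. `pvc_manifest` is a hypothetical helper, and the namespace list in the comment is illustrative:

```shell
# Hypothetical helper that renders the PVC manifest above for a given
# namespace, so the same template can be applied everywhere
pvc_manifest() {
  ns="$1"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cpdbr-pvc-$ns
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
EOF
}

# Apply it to each Open Data for Industries namespace, for example:
# for ns in osdu osdu-couchdb osdu-amq osdu-elastic osdu-keycloak osdu-minio; do
#   pvc_manifest "$ns" | oc apply -n "$ns" -f -
# done
```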

7. Initialize cpdbr in all namespaces:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr init -n osdu --log-level=debug --verbose --pvc-name cpdbr-pvc-osdu --image-prefix=image-registry.openshift-image-registry.svc:5000/osdu --provider=local
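To initialize cpdbr in every namespace, the same command can be wrapped in a small helper. This sketch assumes each namespace has a PVC named `cpdbr-pvc-<namespace>` and that the cpdbr image was pushed to that namespace in the internal registry; `init_namespace` is an illustrative name:

```shell
# cpdbr executable location; override the CPDBR variable for testing
CPDBR="${CPDBR:-./cpdbr}"
# Internal OpenShift image registry service address
REGISTRY="image-registry.openshift-image-registry.svc:5000"

# Initialize cpdbr in one namespace, assuming the per-namespace
# PVC naming convention cpdbr-pvc-<namespace>
init_namespace() {
  ns="$1"
  "$CPDBR" init -n "$ns" --log-level=debug --verbose \
    --pvc-name "cpdbr-pvc-$ns" \
    --image-prefix="$REGISTRY/$ns" \
    --provider=local
}

# Usage, for example:
# for ns in osdu osdu-couchdb osdu-amq osdu-elastic osdu-keycloak osdu-minio; do
#   init_namespace "$ns"
# done
```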

Backup steps

  1. Quiesce the osdu core services in the “osdu” namespace to make sure all data and metadata are synced when doing the backup.
[root@osdu-ocs-bvt-inf ~]# ./cpdbr quiesce -n osdu

2. Back up the volumes in the “osdu” namespace:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr volume-backup create -n osdu osdu-volbackup1 --skip-quiesce=true

Note:

After backing up the “osdu” namespace, do NOT unquiesce the services as is generally recommended. You need to wait until all OSDU-related namespaces are done with their backups, then come back to the “osdu” namespace to unquiesce the core services.

3. Check the backup status to make sure it is done and succeeded:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr volume-backup status -n osdu osdu-volbackup1

4. To download the backup file (to restore into a different cluster), run:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr volume-backup download -n osdu osdu-volbackup1

The following file will be downloaded to the local folder, for example:

-rw-r--r-- 1 root root 1821214720 Oct 29 10:56 cpd-volbackups-osdu-volbackup1-data.tar

5. Repeat steps 2, 3, and 4 for all other namespaces.

6. Unquiesce the services in the “osdu” namespace:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr unquiesce -n osdu

7. Tar all the backup archive files downloaded from these namespaces.

8. Move the final tar file to a safe location, or move it to another cluster.
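The download-and-bundle steps can be sketched together as a helper script. This is hypothetical: it assumes the downloaded archive names follow the `cpd-volbackups-<backup-name>-data.tar` pattern shown in step 4, and `bundle_backups` and the namespace list are illustrative:

```shell
# cpdbr executable location; override the CPDBR variable for testing
CPDBR="${CPDBR:-./cpdbr}"
# All Open Data for Industries namespaces (illustrative list)
NAMESPACES="osdu osdu-couchdb osdu-amq osdu-elastic osdu-keycloak osdu-minio"

bundle_backups() {
  backup_name="$1"
  bundle="$2"
  for ns in $NAMESPACES; do
    # Each download produces a cpd-volbackups-<backup-name>-data.tar
    # archive in the local folder; if every namespace uses the same
    # backup name, move each archive into a per-namespace folder first
    "$CPDBR" volume-backup download -n "$ns" "$backup_name" || return 1
  done
  # Bundle the per-namespace archives for transfer to a safe location
  tar -cf "$bundle" cpd-volbackups-*.tar
}
```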

When a restore is needed:


1. Get the final backup tar file saved in step 8 of the backup steps above.

2. Untar the backup file to get the tar files for each namespace: osdu, osdu-couchdb, osdu-amq, osdu-elastic, osdu-keycloak, osdu-minio.

3. Upload the backup file to each namespace, using the “osdu” namespace as an example:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr volume-backup upload --namespace=osdu -f cpd-volbackups-osdu-volbackup1-data.tar

4. Quiesce the osdu core services in the “osdu” namespace to make sure all data and metadata are synced.

[root@osdu-ocs-bvt-inf ~]# ./cpdbr quiesce -n osdu

5. Restore the backup files in all the namespaces, using the “osdu” namespace as an example:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr volume-restore create -n osdu --from-backup osdu-volbackup1 osdu-volrestore1

6. Check the status to make sure the restore is done:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr volume-restore status -n osdu osdu-volrestore1

7. Unquiesce all services in the “osdu” namespace:

[root@osdu-ocs-bvt-inf ~]# ./cpdbr unquiesce -n osdu
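Putting the restore steps together, a hypothetical script (the function name is illustrative and the namespace list comes from the untar step above) could look like:

```shell
# cpdbr executable location; override the CPDBR variable for testing
CPDBR="${CPDBR:-./cpdbr}"
# All Open Data for Industries namespaces (illustrative list)
NAMESPACES="osdu osdu-couchdb osdu-amq osdu-elastic osdu-keycloak osdu-minio"

restore_all() {
  backup_name="$1"
  restore_name="$2"
  # One quiesce window covers the restore of every namespace
  "$CPDBR" quiesce -n osdu || return 1
  for ns in $NAMESPACES; do
    "$CPDBR" volume-restore create -n "$ns" --from-backup "$backup_name" "$restore_name" || return 1
    # Verify each restore before moving on
    "$CPDBR" volume-restore status -n "$ns" "$restore_name" || return 1
  done
  # Resume services only after every namespace is restored
  "$CPDBR" unquiesce -n osdu
}
```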

Conclusions

The steps above take advantage of the flexibility of Cloud Pak for Data backup and restore support. Executing them from a script will minimize the disruptive time for end users.

For the use case in this blog, a customer can implement a Kubernetes job to run the backup script at 12:00 am every day. The restore can also be scripted and run as needed.
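A minimal sketch of such a scheduled Kubernetes job, assuming the backup script has been built into an image; the image name, script path, service account, and `apiVersion` (which varies with the cluster version) are all assumptions, not values from this solution:

```yaml
apiVersion: batch/v1beta1        # batch/v1 on newer clusters
kind: CronJob
metadata:
  name: osdu-nightly-backup
  namespace: osdu
spec:
  schedule: "0 0 * * *"          # 12:00 am every day
  concurrencyPolicy: Forbid      # never let two backups overlap
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cpdbr-backup-sa   # hypothetical; needs rights to run cpdbr
          containers:
          - name: backup
            image: image-registry.openshift-image-registry.svc:5000/osdu/osdu-backup:1.0  # hypothetical image
            command: ["/scripts/nightly-backup.sh"]   # hypothetical script path
          restartPolicy: Never
```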

The use case above is just one example of how to plan and optimize a Cloud Pak for Data backup and restore solution. In general, customers need to plan the backup (or restore) procedure to get the best performance based on:

  1. the cluster setup,
  2. the deployed services and their behavior, and
  3. their business requirements.

Customers can refer to the following documentation links for Cloud Pak for Data backup and restore support and for service architecture and behavior, and, based on their requirements, come up with the best backup and restore solution:

  1. Cloud Pak for Data Backup and Disaster recovery
  2. Cloud Pak for Data cpdbr tool
  3. Cloud Pak for Data services
