Disaster Recovery — Notes on Velero and OKE, Part 1: Stateless Pods

Fernando Harris
Published in Oracle Developers
Nov 15, 2023 · 9 min read
Photo by Chad Kirchoff on Unsplash

In this blog entry, I am going to share a few personal notes on how to set up Velero for Disaster Recovery with OKE (Oracle Container Engine for Kubernetes), the OCI-managed Kubernetes service.

The scenario we are recreating is based on a multi-region Disaster Recovery strategy. It is a conceptual exercise, presented as a detailed step-by-step walkthrough, that builds on insights from previous blogs listed in the references. We assume you have OCI knowledge and hands-on Kubernetes experience, and that you have spent some time reading the Velero documentation. We are using velero-v1.12.0-linux-amd64, which you can install from here.
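If you prefer to fetch the CLI from a terminal, here is a minimal sketch. It assumes the GitHub release asset naming used by the vmware-tanzu/velero project, and it copies the binary to ./velero because that is how the commands in this post invoke it. The two function names are our own, not part of Velero.

```shell
# Build the download URL of a Velero release tarball for linux-amd64.
# Assumes the asset naming of the vmware-tanzu/velero GitHub releases.
velero_tarball_url() {
  local version="$1"
  echo "https://github.com/vmware-tanzu/velero/releases/download/${version}/velero-${version}-linux-amd64.tar.gz"
}

# Download and unpack a release, then place the binary at ./velero.
install_velero_cli() {
  local version="$1"
  curl -fsSL "$(velero_tarball_url "$version")" | tar -xzf - || return 1
  cp "velero-${version}-linux-amd64/velero" ./velero && chmod +x ./velero
}

# Usage (requires network access):
#   install_velero_cli v1.12.0 && ./velero version --client-only
```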

For the exercise, we are running two Kubernetes v1.27 clusters in OCI. The first, Region 1, is located in Frankfurt, and the second, Region 2, is located in London. We are not going to explain here all the networking tasks involved, especially those related to traffic management, such as the DNS changes needed to direct external traffic to Frankfurt or London. That is out of scope for now.

The objective is simple: to back up the application running in Frankfurt and restore it in London.

Velero stores its backups in object storage, so each cluster must be able to reach OCI Object Storage in its region. The diagram for the operation would be something like the following:

Velero object storage backup and restore between 2 regions
  1. The backup of the Frankfurt OKE cluster is persisted to an Object Storage bucket in Frankfurt.
  2. Once the backup has completed successfully, all its data must be explicitly replicated from the Object Storage bucket in Frankfurt to the Object Storage bucket in London.
  3. Only then can the restore be started from the Object Storage bucket in London into the London OKE cluster.

So, let’s explore in detail what’s needed to accomplish this exercise:

Step 1) Network pre-configuration

As mentioned above, OKE needs to reach Object Storage whenever a backup is created or restored, so make sure both clusters have the ingress and egress rules required for their regional OCI services. At a minimum, the worker node subnet needs an egress rule to All <<Region>> Services in Oracle Services Network on port 443, or an explicit rule for OCI <<Region>> Object Storage. Once the security list egress rules are in place, create a Service Gateway for the service you want to reach, plus a route rule that targets that Service Gateway. Find the documentation here.

Step 2) Creation of the Object Storage buckets

You will need to create an Object Storage bucket in the root compartment of your tenancy in each region. We are calling it bucket-velero.

bucket-velero created in the root compartment

Remember that when you finish this setup in Frankfurt, you will need to repeat steps 1 and 2 in London as well.

Step 3) Create a Customer Secret Key in OCI

We are going to use the OCI Object Storage Amazon S3 Compatibility API so that Velero can be installed with its AWS provider. The first thing we need is a Customer Secret Key. Check the documentation here and here to learn more about the process.

In the OCI Console, go to Identity -> Users -> User Details and select Customer Secret Keys under Resources. Click Generate Secret Key, give it a name, and save the generated secret in a safe place: it is displayed only once. The Access Key then appears in the list of Customer Secret Keys. You will need both values to create the credentials file used by Velero.

Once you have the keys, you can start experimenting with them. Here you can get a simple Amazon S3 Java client to test your connection to OCI Object Storage through the compatibility API before installing Velero. You can also use it to create the bucket, to troubleshoot issues, or simply to test your keys.

Step 4) Velero setup in the Frankfurt cluster (Region 1)

Confirm you have access to your OKE cluster in Frankfurt:
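Because this exercise juggles two clusters, it is easy to run a command against the wrong one. As a small safety net, here is a sketch of a guard that checks the current kubectl context before you do anything else (contexts can be switched with kubectl config use-context). The helper and the context names oke-frankfurt and oke-london are hypothetical; use the names defined in your own kubeconfig.

```shell
# Abort unless kubectl currently points at the expected cluster context.
# Context names such as oke-frankfurt / oke-london are hypothetical:
# list yours with: kubectl config get-contexts
require_context() {
  local expected="$1" current
  current="$(kubectl config current-context)" || return 1
  if [ "$current" != "$expected" ]; then
    echo "kubectl context is '$current', expected '$expected'" >&2
    return 1
  fi
  echo "Using kubectl context '$current'"
}

# Usage:
#   require_context oke-frankfurt && kubectl get nodes
```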

Create a velero-credentials file with the keys generated in Step 3.

velero-credentials file
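The file follows the AWS shared-credentials format read by the Velero AWS plugin, holding the Access Key and the Secret Key from Step 3. A minimal sketch to create it; the placeholder values and the ACCESS_KEY / SECRET_KEY variable names are ours and must be replaced with your own keys:

```shell
# Placeholders: export ACCESS_KEY and SECRET_KEY first, or edit the defaults.
ACCESS_KEY="${ACCESS_KEY:-<<customer-secret-key-access-key>>}"
SECRET_KEY="${SECRET_KEY:-<<customer-secret-key-secret>>}"

# AWS shared-credentials format, as expected by velero-plugin-for-aws.
cat > ./velero-credentials <<EOF
[default]
aws_access_key_id=${ACCESS_KEY}
aws_secret_access_key=${SECRET_KEY}
EOF

# The file contains a secret, so keep it readable by your user only.
chmod 600 ./velero-credentials
```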

Now let's run the command to install Velero in the Frankfurt cluster. Please replace <<objectstoragenamespace>> with your OCI Object Storage namespace (you can get it from the OCI console):

./velero install \
--provider aws \
--bucket bucket-velero \
--secret-file ./velero-credentials \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--use-volume-snapshots=false \
--backup-location-config region=eu-frankfurt-1,\
s3ForcePathStyle="true",\
s3Url=https://<<objectstoragenamespace>>.compat.objectstorage.eu-frankfurt-1.oraclecloud.com

After the installation completes, you can check the logs to confirm that everything went well:

kubectl logs deployment/velero -n velero

The figure below shows an extract from the Velero log confirming that the installation is OK and that the BackupStorageLocation is valid and available:

Velero log extract

We can then confirm that Velero's backup location was created correctly and is available with the following command:

./velero backup-location get

We should get something like the figure below:

Velero reaches Object Storage from within OKE
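If you want to script this check, for example before taking the first backup, a small polling helper does the job. This is a sketch: wait_for_value is a hypothetical helper of ours, and the usage line assumes the BackupStorageLocation created by velero install is named default, reading its status.phase through kubectl.

```shell
# Poll a command until it prints the expected value, or give up.
#   $1 expected value, $2 number of attempts, $3 seconds between attempts,
#   remaining arguments: the command to run.
wait_for_value() {
  local expected="$1" attempts="$2" pause="$3" i=1 value
  shift 3
  while [ "$i" -le "$attempts" ]; do
    value="$("$@" 2>/dev/null)"
    if [ "$value" = "$expected" ]; then
      return 0
    fi
    echo "attempt $i/$attempts: got '$value', expected '$expected'" >&2
    sleep "$pause"
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to about 5 minutes for the location to become Available.
#   wait_for_value Available 30 10 \
#     kubectl -n velero get backupstoragelocations.velero.io default \
#     -o jsonpath='{.status.phase}'
```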

Step 5) Back up a stateless application in Frankfurt (Region 1)

Now that Velero is in place, let's install the application that we want to back up. Run the following commands to clone the code and create a namespace, a deployment, and a service:

git clone https://github.com/fharris/k8cloudbooster;

kubectl create ns k8-booster;

kubectl apply -f k8cloudbooster/manifestDeployment.yml;

kubectl apply -f k8cloudbooster/manifestService.yml;

Run the following command to get the public IP of the load balancer. It might take a little while before the IP shows up in the EXTERNAL-IP column:

kubectl get services -n k8-booster;

You should see something like the picture below:

Now, copy the external IP and paste it into your browser:

And our application is running in Frankfurt.
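Since we will repeat this check in London after the restore, it may be worth scripting it. A minimal sketch, assuming the first service in the namespace is the load balancer and that the application answers HTTP 200 on port 80 (both are assumptions about the k8cloudbooster manifests, so adjust as needed); smoke_test is a hypothetical helper name:

```shell
# Hypothetical smoke test: fetch the external IP of the first service in a
# namespace and check that it answers HTTP 200 on port 80.
smoke_test() {
  local namespace="$1" ip code
  ip="$(kubectl -n "$namespace" get services \
    -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')" || return 1
  if [ -z "$ip" ]; then
    echo "no external IP yet for the first service in namespace $namespace" >&2
    return 1
  fi
  # curl prints 000 as the status code when the connection fails.
  code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "http://$ip/")"
  echo "http://$ip/ answered HTTP $code"
  [ "$code" = "200" ]
}

# Usage (the same check works later against the London cluster):
#   smoke_test k8-booster
```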

Now let's back it up with Velero:

./velero backup create k8booster-backup --include-namespaces k8-booster

We can get the details of the backup by running the following command:

./velero backup describe k8booster-backup

You should see something like the figure below:

Once the backup has completed, we can visit the OCI Console to check our Frankfurt Object Storage:

And confirm that our bucket-velero contains a folder called backups, with k8booster-backup inside it.
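Step 2 of our diagram says replication must only start after the backup has completed successfully. Here is a sketch of a gate that reads the backup's status.phase through kubectl before running the replication script; the function names are hypothetical. Alternatively, ./velero backup create accepts a --wait flag that blocks until the backup finishes.

```shell
# Print the phase of a Velero backup (New, InProgress, Completed,
# PartiallyFailed, Failed, ...).
backup_phase() {
  kubectl -n velero get backups.velero.io "$1" -o jsonpath='{.status.phase}'
}

# Succeed only when the backup finished cleanly; any other phase should
# block replication to the other region.
backup_ready_for_replication() {
  local phase
  phase="$(backup_phase "$1")" || return 1
  if [ "$phase" != "Completed" ]; then
    echo "backup $1 is in phase '$phase'; not replicating" >&2
    return 1
  fi
}

# Usage:
#   backup_ready_for_replication k8booster-backup && ./oci-copy-objects-to-region.sh
```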

Step 6) Object Storage replication

Our work is almost done in Frankfurt. We still need to copy Frankfurt's bucket-velero, with its folder structure, to the equivalent bucket in London. There are different ways to do this; in this blog, we copy the objects with the oci cli, which calls the OCI Object Storage API. We have made available a script called oci-copy-objects-to-region.sh, which you can get from here. Before running it, review the OCI permissions and policies required to copy objects from one region to another: in particular, the Object Storage service in the source region must be authorized by a policy to manage objects on your behalf. Check the documentation here to learn more. Alternatively, you can use the OCI Object Storage replication features described in the documentation here and here.

If you go with the script option, please replace the values in oci-copy-objects-to-region.sh:

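#!/usr/bin/env bash
#Requirements: oci cli installed and available in PATH
#Documentation: https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/copyingobjects.htm
# Stop on the first failed command or undefined variable
set -eu

# Source and destination buckets (same name in both regions)
export SBUCKET=bucket-velero
export DBUCKET=bucket-velero
# Object Storage namespace of your tenancy
export NAMESPACE=<<objectstoragenamespace>>
# Destination region; the source region is the one configured for oci cli
export D_REGION_ID=uk-london-1

# Save the names of all objects in the source bucket to a file, one per line
oci os object list \
--bucket-name="$SBUCKET" --namespace-name="$NAMESPACE" --all | \
grep '"name"' | awk '{ print $2 }' | sed 's/"//g' | sed 's/,//g' > object.list

# Bulk copy of every object to the destination region.
# Each copy is asynchronous and returns an opc-work-request-id.
# oci reads from /dev/null so it cannot consume the object list on stdin.
while IFS= read -r id
do oci os object copy \
--bucket-name "$SBUCKET" \
--source-object-name "$id" \
--destination-bucket "$DBUCKET" \
--namespace-name="$NAMESPACE" \
--destination-namespace="$NAMESPACE" \
--destination-region="$D_REGION_ID" < /dev/null
done < object.list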

Make the script executable and run it with the following commands:

chmod +x ./oci-copy-objects-to-region.sh
./oci-copy-objects-to-region.sh

For each object, the script submits an asynchronous copy request and prints its opc-work-request-id, similar to the figure below:

Once the copy requests have finished, bucket-velero in the London Object Storage should contain the same objects and folder structure as in Frankfurt. You could also automate this script to run periodically, for example as a Kubernetes CronJob.

Objects successfully replicated to the London Object Storage bucket-velero
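To double-check the replication without browsing the console, we can compare the object lists of both buckets. A sketch under our own assumptions: the London list is produced with the same pipeline as the script plus the global --region option of the oci cli, and missing_objects is a hypothetical helper that prints whatever exists in Frankfurt but not yet in London.

```shell
# Print object names present in the source list but missing from the
# destination list. Both arguments are files with one object name per line.
missing_objects() {
  local src_sorted dst_sorted
  src_sorted="$(mktemp)"
  dst_sorted="$(mktemp)"
  # comm needs both inputs sorted with the same collation.
  LC_ALL=C sort -u "$1" > "$src_sorted"
  LC_ALL=C sort -u "$2" > "$dst_sorted"
  LC_ALL=C comm -23 "$src_sorted" "$dst_sorted"
  rm -f "$src_sorted" "$dst_sorted"
}

# Usage, after the copy requests have finished:
#   oci os object list --bucket-name=bucket-velero --all --region uk-london-1 | \
#     grep '"name"' | awk '{ print $2 }' | sed 's/"//g' | sed 's/,//g' > object-london.list
#   missing_objects object.list object-london.list   # no output means in sync
```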

Step 7) Restoring the stateless application in London (Region 2)

The backup is done, and its data has been replicated to London. Now we can finally restore the application with Velero in the Region 2 cluster in London, but first we need to install Velero there, as we did in Frankfurt. Confirm you have access to your OKE cluster in London:

The velero-credentials file stays the same. All we need is to change the install command to point to London:

./velero install \
--provider aws \
--bucket bucket-velero \
--secret-file ./velero-credentials \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--use-volume-snapshots=false \
--backup-location-config \
region=uk-london-1,s3ForcePathStyle="true",\
s3Url=https://<<objectstoragenamespace>>.compat.objectstorage.uk-london-1.oraclecloud.com
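Since the two install commands differ only in the region and the endpoint derived from it, you could wrap them in a single function. This is a hypothetical helper of ours, not a Velero feature: VELERO_BIN defaults to ./velero, and the bucket name and plugin version are the ones used in this post.

```shell
# Hypothetical wrapper: run the same velero install in any OCI region.
# Only the region and the S3-compatible endpoint change between clusters.
install_velero_for_region() {
  local region="$1" namespace="$2"
  "${VELERO_BIN:-./velero}" install \
    --provider aws \
    --bucket bucket-velero \
    --secret-file ./velero-credentials \
    --plugins velero/velero-plugin-for-aws:v1.8.0 \
    --use-volume-snapshots=false \
    --backup-location-config "region=${region},s3ForcePathStyle=true,s3Url=https://${namespace}.compat.objectstorage.${region}.oraclecloud.com"
}

# Usage, with your Object Storage namespace:
#   install_velero_for_region eu-frankfurt-1 <<objectstoragenamespace>>
#   install_velero_for_region uk-london-1 <<objectstoragenamespace>>
```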

Check the Velero pod logs, or confirm that the connection to bucket-velero in the London Object Storage is available:

./velero backup-location get

If all is good, let's check whether there are any valid backups available with the following command:

./velero backup get

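We should see a Completed backup available to restore: the one we created a few minutes ago in Frankfurt and replicated to London. Velero discovers backups by periodically syncing the contents of the bucket into the cluster, so it may take a short while to show up after the installation: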

Now, let's finally try to restore the application in London by running:

./velero restore create --from-backup k8booster-backup

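Velero names the restore after the backup plus a timestamp suffix; the exact name is printed by the previous command, and ./velero restore get lists it as well. In our run it was k8booster-backup-20231030123741, so we checked the restore with: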

./velero restore describe k8booster-backup-20231030123741

The figure below shows the output of this command:

Once the restore has completed, you can also check its logs with ./velero restore logs followed by the restore name if you need more information. Let's test whether the restore went well: a k8-booster namespace should have been created, with its pods and service inside:

Run:

kubectl get all -n k8-booster

To see the namespace created and all the objects recreated inside:

Check that the service is running by pasting the load balancer's external IP into your browser. The restore created a new load balancer in London, so this IP differs from the Frankfurt one. To get it, run:

kubectl -n k8-booster get service

And that’s it.

This initial exercise was fairly simple because we picked a stateless application. Still, it is a good way to learn Velero's basic concepts and how to apply them in OCI. A stateful application brings different challenges, because we also need to manage its Persistent Volumes (backed by block storage or file storage services). We are going to address that complexity in the following blogs. Stay tuned!

Documentation and references:
