Data Protection

K8s backup & restore using Velero

Srinivasa Vasu
Geek Culture
8 min read · Apr 11, 2021

Velero is an open-source backup tool to save and restore the Kubernetes cluster state. The cluster state, including mounted volume data, is persisted to an external object store, from which it can be restored either into the same cluster (if lost/deleted by accident) or into a new cluster in a different cloud. Velero takes volume snapshots using the cloud providers’ native capability and also facilitates vendor-neutral backups using restic.

Context: Let’s look at leveraging Velero to move data between K8s clusters running on two different clouds. This would use the restic integration instead of the native cloud provider snapshot capability.

Pre-reqs: Running K8s clusters in two different clouds/DCs and a pre-configured, S3-compatible external object store.

Init: The source is a Tanzu Kubernetes Grid (TKG) cluster provisioned on AWS via Tanzu Mission Control (TMC), the target is a native GKE cluster, and Minio serves as the external object store.

Let’s install a Spring reactive R2DBC app with an external state dependency on a Postgres DB. The Postgres storage is mounted as a PVC on an EBS volume. The source and the K8s resource manifests to deploy to the TKG cluster are available here.
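For orientation, here is a minimal sketch of the volume claim backing Postgres, assuming the todo namespace created in the next step; the name and size match the PVC shown later in the walkthrough, but the authoritative manifests are in the repo:

cat <<EOF | kubectl -n todo apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: todo-postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF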

Source Ops

Create a new namespace (kubectl create ns todo), clone the above repo to the local filesystem, and deploy the app to this namespace using a tool like skaffold-cli/waypoint. With these tools, build and deploy are integrated and driven by a single manifest definition.

App-Deploy

skaffold run -n todo --tail

Navigate to the target k8s folder in the cloned source repo and run the above command to build and deploy the application to TKG. Be sure to update the image references in the deployment and skaffold manifests to your own registry and repo.
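Alternatively, skaffold can rewrite image references on the fly via its --default-repo flag instead of manifest edits; a sketch, with the registry path as a placeholder:

skaffold run -n todo --default-repo <your_registry>/<your_repo> --tail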

╰─ kubectl -n todo get pods
NAME READY STATUS RESTARTS AGE
todo-79855b98fb-5vxj6 1/1 Running 0 41h
todo-postgres-6dd5c84546-dj7kl 1/1 Running 0 41h
╰─ kubectl -n todo get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
todo-postgres-pvc Bound pvc-54ded26c-7451-4129-957a-fc4d949c57b5 5Gi RWO standard 41h
╰─ kubectl -n todo get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
todo ClusterIP 10.100.101.246 <none> 8080/TCP 41h
todo-postgres ClusterIP 10.103.2.225 <none> 5432/TCP 41h

Let’s send some POST/PUT requests to the todo app to make changes to the backing Postgres persistent store using httpie or any tool of your choice.
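For reference, the first httpie POST below would look like this with curl:

curl -s -X POST localhost:8080/todo \
  -H 'Content-Type: application/json' \
  -d '{"id": "", "task": "Velero data protection"}'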

Data Persistence

╰─ kubectl -n todo port-forward svc/todo 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
╰─ http :8080/todo
HTTP/1.1 200 OK
Content-Type: application/json
transfer-encoding: chunked
[
{
"id": "3fc179e3-353b-440b-8164-12f98df3a545",
"status": true,
"task": "My first todo"
}
]
╰─ http POST :8080/todo id="" task="Velero data protection"
HTTP/1.1 201 Created
Content-Length: 92
Content-Type: application/json
{
"id": "fba98524-e830-4569-912f-3a4cdecee1dc",
"status": false,
"task": "Velero data protection"
}
╰─ http POST :8080/todo id="" task="Velero data recovery" status=true
HTTP/1.1 201 Created
Content-Length: 89
Content-Type: application/json
{
"id": "3724f8e5-e2ac-44e5-ae03-5b25c04191ef",
"status": true,
"task": "Velero data recovery"
}
╰─ http PUT :8080/todo id="fba98524-e830-4569-912f-3a4cdecee1dc" status="true" task="Velero data protection"
HTTP/1.1 200 OK
Content-Length: 91
Content-Type: application/json
{
"id": "fba98524-e830-4569-912f-3a4cdecee1dc",
"status": true,
"task": "Velero data protection"
}

Jump back to check the datastore state,

╰─ http :8080/todo
HTTP/1.1 200 OK
Content-Type: application/json
transfer-encoding: chunked
[
{
"id": "fba98524-e830-4569-912f-3a4cdecee1dc",
"status": true,
"task": "Velero data protection"
},
{
"id": "3fc179e3-353b-440b-8164-12f98df3a545",
"status": true,
"task": "My first todo"
},
{
"id": "3724f8e5-e2ac-44e5-ae03-5b25c04191ef",
"status": true,
"task": "Velero data recovery"
}
]

Init Velero

Let’s install Velero using the CLI,

velero install --use-restic \
--default-volumes-to-restic \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.2.0 \
--bucket <bucket_store_name> \
--secret-file <bucket_access_creds_file_path> \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=<s3_endpoint_url>

--default-volumes-to-restic
- enables the restic volume backup by default
--secret-file
- path to a credentials file whose content looks like,
[default]
aws_access_key_id=<val>
aws_secret_access_key=<val>
--backup-location-config
- points to the Minio S3 endpoint and region
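For example, the credentials file can be prepared up front like this (key values are placeholders):

cat > ./credentials-velero <<EOF
[default]
aws_access_key_id=<access_key>
aws_secret_access_key=<secret_key>
EOF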

Check the status of the installation,

╰─ velero backup-location get
NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE
default aws oss-dp Available 2021-04-10 21:02:24 +0530 IST ReadWrite
╰─ kubectl -n velero get all
NAME READY STATUS RESTARTS AGE
pod/restic-lxl8g 1/1 Running 0 49s
pod/restic-tlwx6 1/1 Running 0 49s
pod/velero-65b78c9b54-z5nx2 1/1 Running 0 50s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/restic 2 2 2 2 2 <none> 52s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/velero 1/1 1 1 54s

Init back-up

Initiate the Velero backup for the todo namespace,

╰─ velero backup create todo-bkp --include-namespaces todo
Backup request "todo-bkp" submitted successfully.

Check the backup status,

╰─ velero backup describe todo-bkp
Name: todo-bkp
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.19.4+vmware.2
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=19
Phase: Completed
Errors: 0
Warnings: 0
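If the phase ever reports errors or warnings, the per-volume restic details and the backup logs are the first places to look:

velero backup describe todo-bkp --details
velero backup logs todo-bkp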

The Minio S3 bucket now contains the appropriate structure with the artifacts for the initiated backup.

minio object-store
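The same layout can also be inspected from the CLI with the Minio client, assuming an mc alias named minio that points at the store and the oss-dp bucket from above:

mc ls --recursive minio/oss-dp/backups/todo-bkp/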

Target Ops

Let’s switch the context to the target GKE cluster and install Velero the same way as above using the CLI. Check the cluster state,

velero install --use-restic \
--default-volumes-to-restic \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.2.0 \
--bucket <bucket_store_name> \
--secret-file <bucket_access_creds_file_path> \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=<s3_endpoint_url>
- point the Velero controller to the same external Minio store.

╰─ velero backup-location get
NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE
default aws oss-dp Available 2021-04-10 21:25:21 +0530 IST ReadWrite
╰─ k get ns
NAME STATUS AGE
default Active 5m57s
kube-node-lease Active 5m59s
kube-public Active 5m59s
kube-system Active 5m59s
velero Active 63s

Init restore

Restore the todo namespace in this cluster,

╰─ velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
todo-bkp Completed 0 0 2021-04-10 21:05:23 +0530 IST 29d default <none>
╰─ velero restore create --from-backup=todo-bkp todo-restore
Restore request "todo-restore" submitted successfully.
Run `velero restore describe todo-restore` or `velero restore logs todo-restore` for more details.

Check the restore status,

╰─ velero restore describe todo-restore
Name: todo-restore
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed

╰─ kubectl get ns
NAME STATUS AGE
default Active 10m
kube-node-lease Active 10m
kube-public Active 10m
kube-system Active 10m
todo Active 2m2s
velero Active 5m53s
╰─ kubectl -n todo get pods
NAME READY STATUS RESTARTS AGE
todo-5b76c86c69-rvtgd 1/1 Running 1 3m39s
todo-postgres-d95749c96-lx2bq 1/1 Running 0 3m39s
╰─ kubectl -n todo get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
todo-postgres-pvc Bound pvc-d53829e5-a379-424c-af03-78dd796b251a 5Gi RWO standard 4m5s

As before, port-forward the todo service (kubectl -n todo port-forward svc/todo 8080:8080) and check the datastore state,

╰─ http :8080/todo
HTTP/1.1 200 OK
Content-Type: application/json
transfer-encoding: chunked
[
{
"id": "fba98524-e830-4569-912f-3a4cdecee1dc",
"status": true,
"task": "Velero data protection"
},
{
"id": "3fc179e3-353b-440b-8164-12f98df3a545",
"status": true,
"task": "My first todo"
},
{
"id": "3724f8e5-e2ac-44e5-ae03-5b25c04191ef",
"status": true,
"task": "Velero data recovery"
}
]

The restore completed successfully in the target cluster with the same state as the source. All of this happens without any changes to the cluster files/definitions. By default, volume data moves through restic, as instructed during the installation.
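Since --default-volumes-to-restic backs up every pod volume, individual volumes can be opted out with a pod annotation; a sketch with hypothetical pod and volume names:

kubectl -n todo annotate pod/<pod_name> backup.velero.io/backup-volumes-excludes=<volume_name>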

Clean up the namespace: kubectl delete ns todo

Incremental backups

Let’s switch the context to the source TKG cluster and initiate a scheduled backup flow. Velero takes incremental backups for the schedule created. The cron schedule is defined based on the RTO/RPO requirements; in this case, the cron runs every 5 minutes to back up the specified namespace resources.

╰─ velero schedule create todo-incremental-bkp --schedule="*/5 * * * *" --include-namespaces=todo
Schedule "todo-incremental-bkp" created successfully.
╰─ velero schedule get
NAME STATUS CREATED SCHEDULE BACKUP TTL LAST BACKUP SELECTOR
todo-incremental-bkp Enabled 2021-04-10 21:44:33 +0530 IST */5 * * * * 720h0m0s 44s ago <none>
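The BACKUP TTL column shows the default retention of 720h (30 days); it can be tuned per schedule with the --ttl flag, e.g. for a 10-day retention:

velero schedule create todo-incremental-bkp --schedule="*/5 * * * *" --include-namespaces=todo --ttl 240h0m0s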

Let’s POST more updates,

╰─ http POST :8080/todo id="" task="Velero incremental backups" status=true
HTTP/1.1 201 Created
Content-Length: 95
Content-Type: application/json
{
"id": "0bde3c1e-70b8-4545-b73e-f02ef9f6e2b1",
"status": true,
"task": "Velero incremental backups"
}
╰─ http :8080/todo
HTTP/1.1 200 OK
Content-Type: application/json
transfer-encoding: chunked
[
{
"id": "fba98524-e830-4569-912f-3a4cdecee1dc",
"status": true,
"task": "Velero data protection"
},
{
"id": "3fc179e3-353b-440b-8164-12f98df3a545",
"status": true,
"task": "My first todo"
},
{
"id": "3724f8e5-e2ac-44e5-ae03-5b25c04191ef",
"status": true,
"task": "Velero data recovery"
},
{
"id": "0bde3c1e-70b8-4545-b73e-f02ef9f6e2b1",
"status": true,
"task": "Velero incremental backups"
}
]

Jumping over to the Minio console, you would notice the incremental backup updates.

scheduled incremental back-ups

Init restore again

Let’s switch the context back to the target cluster to initiate the restore process.

╰─ kubectl get ns
NAME STATUS AGE
default Active 38m
kube-node-lease Active 38m
kube-public Active 38m
kube-system Active 38m
velero Active 33m

Now restore the todo namespace data from the incremental backups,

╰─ velero restore create --from-schedule=todo-incremental-bkp todo-incremental-restore --include-namespaces=todo
Restore request "todo-incremental-restore" submitted successfully.
╰─ velero restore describe todo-incremental-restore
Name: todo-incremental-restore
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed
Started: 2021-04-10 22:02:08 +0530 IST
Completed: 2021-04-10 22:02:36 +0530 IST
Backup: todo-incremental-bkp-20210410163024
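Each scheduled run produces a backup labeled with the schedule name, and --from-schedule restores from the most recent completed one. To list the backups behind the schedule:

velero backup get --selector velero.io/schedule-name=todo-incremental-bkp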

Check the restore status,

╰─ kubectl get ns
NAME STATUS AGE
default Active 43m
kube-node-lease Active 43m
kube-public Active 43m
kube-system Active 43m
todo Active 79s
velero Active 38m
╰─ kubectl -n todo get pods
NAME READY STATUS RESTARTS AGE
todo-5b76c86c69-rvtgd 1/1 Running 1 5m41s
todo-postgres-d95749c96-lx2bq 1/1 Running 0 5m41s
╰─ kubectl -n todo get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
todo-postgres-pvc Bound pvc-32a3329f-55d5-40d7-90bf-046b2960b6d6 5Gi RWO standard 6m38s

Recheck the datastore state,

╰─ http :8080/todo
HTTP/1.1 200 OK
Content-Type: application/json
transfer-encoding: chunked
[
{
"id": "fba98524-e830-4569-912f-3a4cdecee1dc",
"status": true,
"task": "Velero data protection"
},
{
"id": "3fc179e3-353b-440b-8164-12f98df3a545",
"status": true,
"task": "My first todo"
},
{
"id": "3724f8e5-e2ac-44e5-ae03-5b25c04191ef",
"status": true,
"task": "Velero data recovery"
},
{
"id": "0bde3c1e-70b8-4545-b73e-f02ef9f6e2b1",
"status": true,
"task": "Velero incremental backups"
}
]

Velero has restored the namespace state fully with all the resources and data intact. This is a common use case where Velero is used to schedule regular backups of K8s clusters for disaster recovery.
