Backups Unleashed: Fortify Your K8s Cluster
No backups? I too like to live dangerously.
As I mentioned in a previous blog post, wellD’s infrastructure is entirely hosted on GKE (Google Kubernetes Engine). Our Kubernetes clusters are configured with multiple node pools, each tailored to the specific resource requirements of our services. Furthermore, all of the node pools are configured to autoscale, so nodes are added or removed automatically based on the load of each pool.
Ensuring that your Kubernetes cluster is backed up is essential for disaster recovery, migration, and more. However, the question remains: how can we effectively back up all of these dynamic nodes? Typically in GCP, we establish a snapshot policy for our VM instance disks, but applying the same approach to GKE nodes is not feasible. Thankfully, Google provides a backup tool for GKE that allows you to set up a backup schedule, a retention policy, the namespaces to include, and more. Once configured, the backup tool provides a list of backups, restores, and restore plans.
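As a rough sketch, assuming the Backup for GKE API is enabled on the project, a backup plan can be created from the command line along these lines (project, location, cluster, schedule, and retention values are all placeholders; double-check the flags against the current gcloud reference):
# Sketch: daily backup plan for a GKE cluster (all names are placeholders)
gcloud container backup-restore backup-plans create my-backup-plan \
  --project=my-project \
  --location=europe-west1 \
  --cluster=projects/my-project/locations/europe-west1/clusters/my-cluster \
  --all-namespaces \
  --include-secrets \
  --include-volume-data \
  --cron-schedule="0 0 * * *" \
  --backup-retain-days=3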
Recently, one of our customers asked us to put a backup system in place for all of their vanilla K8s clusters. Below is a description of how we addressed this need.
How to Back Up a Vanilla K8s Cluster (and More)
Even though there are many solutions for backing up a K8s cluster, we were specifically looking for a tool that was tightly integrated with K8s and able to back up and restore all objects in our clusters, including PersistentVolumes. Ideally, the tool would also allow us to create on-demand or scheduled backups and upload them to an object storage solution, such as AWS S3.
After a thorough search and comparison on the web, we chose Velero as our backup tool for Kubernetes.
Velero is an open-source tool that can help you safely back up and restore Kubernetes cluster resources and persistent volumes.
How does it work?
Velero uses the Kubernetes API to capture the state of cluster resources and restore them when necessary. It offers a wide range of operations, including on-demand backups, scheduled backups, and restores. Each Velero operation is a custom resource stored in etcd.
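For instance, the custom resource behind an on-demand backup looks roughly like this (the backup name and application namespace are hypothetical):
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup        # hypothetical backup name
  namespace: velero        # Velero custom resources live in Velero's own namespace
spec:
  includedNamespaces:
    - my-app               # hypothetical application namespace
  storageLocation: my-k8s-backup
  ttl: 72h0m0s             # how long the backup is retained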
Velero consists of a Kubernetes deployment, the BackupController, which runs in our cluster, and a command-line interface used for creating Velero custom resources.
The backup operation uploads a tarball of copied Kubernetes objects into cloud object storage and calls the cloud provider API to make disk snapshots of persistent volumes if specified. When you run velero backup create test-backup, the Velero client makes a call to the Kubernetes API server to create a Backup object. The BackupController notices the new Backup object, performs validation, and then begins the backup process. It collects the data to back up by querying the API server for resources, and finally makes a call to the object storage service to upload the backup file.
By default, velero backup create makes disk snapshots of any persistent volumes. Additional flags can be specified to adjust the snapshots, and they can be disabled entirely with the option --snapshot-volumes=false.
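Putting these pieces together, a hypothetical on-demand backup of a single namespace with snapshots disabled would look like this (my-app is a placeholder namespace):
# Create an on-demand backup of one namespace, skipping volume snapshots
velero backup create test-backup \
  --include-namespaces my-app \
  --snapshot-volumes=false

# Inspect the result once the BackupController has processed it
velero backup describe test-backup --details
velero backup logs test-backup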
Velero supports various storage providers for storing backups; a comprehensive list is available here.
A more detailed explanation of how Velero works is collected in the official documentation, which you can find here.
How we use it
We set up Velero with Helm and hooked it up to our fluxcd repository. We also installed the AWS plugin so that Velero knows how to store our backups on S3-compatible object storage.
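Without fluxcd, the equivalent plain Helm installation boils down to something like this (the release and namespace names match what we use below):
# Add the official Velero chart repository and install with our values
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm install velero vmware-tanzu/velero \
  --namespace velero \
  --create-namespace \
  --values values.yaml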
The following YAML code blocks are a portion of the values.yaml file for the Velero deployment. You can find the complete file at https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/values.yaml.
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.7.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
Instead of using AWS S3 as object storage, we opted for Wasabi, a cost-effective S3-compatible alternative. With Wasabi, you can get 1TB of storage for $6.99 per month.
configuration:
  backupStorageLocation:
    - name: my-k8s-backup
      provider: aws
      bucket: k8s-backup-my
      prefix: papasmurf
      credential:
        name: wasabi-credentials
        key: cloud
      config:
        region: eu-central-2
        s3ForcePathStyle: true
        s3Url: https://s3.eu-central-1.wasabisys.com
  garbageCollectionFrequency: 24h
  defaultBackupTTL: 72h
  defaultBackupStorageLocation: my-k8s-backup
  uploaderType: restic
  logFormat: text
  logLevel: info
  namespace: velero
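Note that the wasabi-credentials secret referenced above must exist in the velero namespace before the deployment starts. One way to create it, using the AWS-style credentials file the plugin expects (the key values are placeholders), is:
# AWS-style credentials file expected by velero-plugin-for-aws
cat > credentials-wasabi <<EOF
[default]
aws_access_key_id=<YOUR_WASABI_ACCESS_KEY>
aws_secret_access_key=<YOUR_WASABI_SECRET_KEY>
EOF

# Store it under the key "cloud", matching the credential section above
kubectl create secret generic wasabi-credentials \
  --namespace velero \
  --from-file=cloud=credentials-wasabi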
Since most of our services are stateless and involve relatively few volumes, we have disabled volume snapshots by setting snapshotsEnabled to false.
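In the chart values this is a single flag:
snapshotsEnabled: false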
The final step is to configure the backup schedule policy: how often the cluster should be backed up, which namespaces to include, and how long the backups should be retained.
schedules:
  daily-backup:
    disabled: false
    schedule: "0 0 * * *"
    useOwnerReferencesInBackup: false
    template:
      ttl: "72h"
      storageLocation: my-k8s-backup
      includeClusterResources: true
      includedNamespaces:
        - '*'
      excludedNamespaces:
        - default
        - kube-system
        - kube-public
        - kube-node-lease
        - calico-apiserver
        - calico-system
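Besides waiting for the cron, a schedule can also be triggered on demand, which is handy for testing the configuration (the full schedule name, as shown by the describe output in the next section, is velero-daily-backup):
# Run the scheduled backup right now instead of waiting for midnight
velero backup create --from-schedule velero-daily-backup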
Backup
Note: in the next section, I will use the Velero CLI. You can find the documentation to install it here.
Velero is configured to perform a daily backup of our cluster at midnight. To verify this configuration, you can execute the following command:
velero schedule describe
The output will be:
Name:     velero-daily-backup
...
Phase:    Enabled
Paused:   false
Schedule: 0 0 * * *
Backup Template:
  Namespaces:
    Included: *
    Excluded: <none>
  Resources:
    Included: *
    Excluded: <none>
    Cluster-scoped: auto
  Label selector: <none>
  Storage Location: my-k8s-backup
  Velero-Native Snapshot PVs: auto
  TTL: 72h0m0s
  CSISnapshotTimeout: 0s
  ItemOperationTimeout: 0s
  Hooks: <none>
Last Backup: <never>
To check the backup history, simply execute the following command:
velero backup get
And that is the output:
NAME                                 STATUS      ERRORS   WARNINGS   CREATED                          EXPIRES   STORAGE LOCATION   SELECTOR
velero-daily-backup-20230509145227   Completed   0        0          2023-05-09 16:52:27 +0200 CEST   2d        my-k8s-backup      <none>
You can obtain more information about the backup by running the following command:
velero backup describe <backup-name>
The output should be similar to this:
Name: velero-daily-backup-20230509145227
...
Phase: Completed
Namespaces:
  Included: *
  Excluded: <none>
Resources:
  Included: *
  Excluded: <none>
  Cluster-scoped: auto
Label selector: <none>
Storage Location: my-k8s-backup
Velero-Native Snapshot PVs: auto
TTL: 72h0m0s
CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 1h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2023-05-09 16:52:27 +0200 CEST
Completed: 2023-05-09 16:52:53 +0200 CEST
Expiration: 2023-05-12 16:52:27 +0200 CEST
Velero-Native Snapshots: <none included>
Restore
If a disaster occurs, you can restore your cluster to a prior state. To review the backup history, execute the following command:
velero backup get
Then you can restore the cluster by running the following command:
velero restore create --from-backup <backup-name>
The cluster will be restored to the state captured in the backup, which is retrieved from the storage location, Wasabi in our case. Lastly, you can check the status of the restore by executing the following command:
velero restore get
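Restores do not have to be all-or-nothing. For example, you can restore a single namespace from a backup and even remap it to a new name (the namespace names here are placeholders):
# Restore one namespace from a backup, remapping it to a new name
velero restore create --from-backup velero-daily-backup-20230509145227 \
  --include-namespaces my-app \
  --namespace-mappings my-app:my-app-restored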
Conclusion
In this blog post, we have explored the essential steps to back up and restore our K8s clusters. This practice is not just recommended; it is a critical safeguard against potential disasters. By following the procedures outlined here, you can ensure your cluster can be restored to its previous state.
But that’s not where the journey ends. Imagine migrating your cluster to another cloud provider, or to another region of the same provider: Velero lets you back up your cluster and effortlessly restore it in the new location.
That’s all folks! 🐰
So long and thanks for all the fish. 🐟