Migrate Kubernetes Persistent Volume Claims to AKS with Velero

Saverio Proto
Microsoft Azure
Nov 30, 2023

When you’re looking to move an existing project to Azure Kubernetes Service, you’ll face the challenge of migrating data to the newly created Kubernetes cluster. This task can be tackled in various ways, depending on the type of data involved.

  • Databases typically come equipped with built-in export and import functions for seamless data transfer between different installations.
  • Unstructured data stored in object storage is relatively easy to duplicate and relocate.

However, not all applications offer straightforward data migration patterns. As a platform engineer, you might encounter difficulties when moving a Kubernetes Persistent Volume disk containing unknown application data. In this context, the documentation page “Migrate to AKS” provides a useful reference to Velero on Azure.

One particularly interesting scenario involves migrating Kubernetes stateful applications from environments outside of Azure to an AKS cluster. In my past experience with virtual machine volumes, I’ve used Restic for backups. Restic operates as an agent within the VM, creating backups of a file system path to a Restic repository. The Restic repository is a data abstraction layer that can contain incremental snapshots and can be stored easily in an object storage backend like Azure Blob Storage.

Now, with the absence of VMs and the shift to containers with Kubernetes persistent volume claims, a crucial question arises: How can we efficiently perform the backup and restoration of this data to a different cloud provider?

Velero 101

Velero is an open-source project supported by VMware, designed for backing up and migrating Kubernetes resources and persistent volumes. After exploring the documentation, I believe it’s crucial to grasp Velero’s two distinct operational modes: Snapshot Backup and File System Backup.

  • Snapshot Backup: This operational approach uses the existing volume snapshot technology of the cloud hosting Kubernetes. It’s excellent for backup and disaster recovery but isn’t ideal for cross-cloud migration because the snapshots can’t be moved. You can’t transfer an AWS EBS snapshot to an Azure Managed Disk, for example.
  • File System Backup: In this operational mode, Velero uses either Restic or Kopia to back up the volumes attached to a pod, similar to how I employed Restic for backing up my VM disk. A privileged DaemonSet pod on each Kubernetes node is responsible for reading the volumes mounted by our application pod and backing them up with Restic or Kopia. This operational mode is the one to use for cross-cloud migration, because Restic and Kopia provide the snapshot abstraction layer that is reusable across different clouds.
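To make the File System Backup mode concrete, here is a hedged sketch of how volumes are opted into it: per pod with the `backup.velero.io/backup-volumes` annotation, or for a whole backup with the `--default-volumes-to-fs-backup` flag (Velero v1.10+). The namespace, pod, and volume names below are illustrative:

```shell
# Opt a specific pod's volumes into File System Backup.
# "data" must match a volume name in the pod spec (illustrative here).
kubectl -n myapp annotate pod/my-stateful-pod \
    backup.velero.io/backup-volumes=data

# Or make File System Backup the default for every volume in a backup.
velero backup create myapp-fsb \
    --include-namespaces myapp \
    --default-volumes-to-fs-backup
```

The per-pod annotation is useful when only some volumes hold data worth backing up; the flag is simpler when migrating everything.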

Installing Velero for Azure Migration

I’m sharing some operational notes in a GitHub gist. I conducted tests on backing up Persistent Volumes from Minikube and restoring them to Azure. Based on hands-on experience, I’ve learned that in a migration scenario to Azure from another cloud provider, the Velero installation should utilize the “provider Azure” option on both clouds, irrespective of whether the initial Kubernetes cluster runs on Minikube or a different cloud provider.

Disabling Azure snapshots is crucial, because you don’t want Azure-specific configuration that requires Managed Identities. For the authentication configuration, use the storage account access key instead. By avoiding reliance on Azure Managed Identities, we ensure that access to the Storage Container remains possible even from clusters not running on Azure.
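The access key can be fetched with the Azure CLI before running the installation below; the resource group name here is a placeholder, while `velerogist` matches the storage account used in the install command:

```shell
# Fetch the first access key of the storage account (resource group is a placeholder)
AZURE_STORAGE_ACCOUNT_ACCESS_KEY=$(az storage account keys list \
    --resource-group velero-rg \
    --account-name velerogist \
    --query "[0].value" --output tsv)
export AZURE_STORAGE_ACCOUNT_ACCESS_KEY
```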

Here’s how the installation looks:

cat << EOF  > ./credentials-velero
AZURE_STORAGE_ACCOUNT_ACCESS_KEY=${AZURE_STORAGE_ACCOUNT_ACCESS_KEY}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF

velero install \
--provider azure \
--use-node-agent \
--plugins velero/velero-plugin-for-microsoft-azure:v1.8.0 \
--bucket velerobackups \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--backup-location-config storageAccount=velerogist,storageAccountKeyEnvVar=AZURE_STORAGE_ACCOUNT_ACCESS_KEY

Follow the instructions in the gist to deploy a sample application, back it up, and restore it in a different location.
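As a rough sketch of that workflow (namespace and backup names are illustrative, and both clusters must point at the same storage container):

```shell
# On the source cluster: back up the application namespace with File System Backup
velero backup create myapp-migration \
    --include-namespaces myapp \
    --default-volumes-to-fs-backup

# Verify the backup completed before switching contexts
velero backup describe myapp-migration

# On the AKS cluster: restore from the shared backup location
velero restore create --from-backup myapp-migration
```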

Limits of the Velero File System Backup mechanism

What are the potential issues with the File System Backup (FSB) mechanism? Having delved into the limitations outlined in the File System Backup documentation and conducted hands-on testing, I would highlight the following challenges that are significant to me:

  • The application Pod must be running during the backup, because the volume is mounted and readable on the Kubernetes node only while the Pod is active. This means the backup runs while the application may be modifying the data. To tackle this challenge, backup hooks can be used to execute commands in the container before the backup starts. The documentation provides an example using fsfreeze to lock filesystem operations during the backup, preserving filesystem integrity. While this method works in the documented example, a critical consideration is how long an application can operate with a frozen file system. If the backup takes several seconds and the application crashes or becomes unresponsive, a Kubernetes health check could terminate the pod, leaving the backup in an incomplete state.
  • When restoring a Persistent Volume Claim backup taken from a different cloud, a significant challenge arises due to the immutability of the Storage Class name in the API. This implies that when transitioning from GKE to Azure, the restored PVC in Azure must align with the same Storage Class name used in GKE. If custom storage class names are employed, this is not a major issue. However, creating Azure storage class names to mirror the default names in GKE can lead to a readability problem in the platform configuration. I would consider this a minor technical debt.
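The fsfreeze hook mentioned above is wired through pod annotations. A hedged sketch, where the sidecar container name and the mount path are illustrative:

```shell
# Freeze the filesystem before the backup and thaw it afterwards.
# "fsfreeze" is an assumed container in the pod that ships the fsfreeze binary;
# /var/lib/app-data is an illustrative mount path.
kubectl -n myapp annotate pod/my-stateful-pod \
    pre.hook.backup.velero.io/container=fsfreeze \
    pre.hook.backup.velero.io/command='["/sbin/fsfreeze", "--freeze", "/var/lib/app-data"]' \
    pre.hook.backup.velero.io/timeout=120s \
    post.hook.backup.velero.io/container=fsfreeze \
    post.hook.backup.velero.io/command='["/sbin/fsfreeze", "--unfreeze", "/var/lib/app-data"]'
```

The `timeout` annotation bounds how long the freeze may last, which matters for the health-check risk described above.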

Conclusion

Velero offers a mechanism for backing up and restoring Kubernetes Persistent Volumes seamlessly across multiple clouds, providing a familiar approach for those transitioning from working with virtual machines to the container era. Because Velero has its limitations, I recommend leveraging specific application data export/import APIs rather than attempting to back up raw data on the underlying disk. For instance, exporting and reimporting a MongoDB database to a new instance using mongoexport and mongoimport is preferable to moving the raw disk containing MongoDB data. However, in scenarios where you’re migrating a legacy application that does not provide a data export mechanism, Velero proves highly valuable for both migration and disaster recovery.
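As a sketch of that application-level alternative, a collection-by-collection export and import with the MongoDB tools might look like this (connection URIs, database, and collection names are placeholders):

```shell
# Export one collection from the source deployment to a JSON file
mongoexport --uri="mongodb://old-cluster.example:27017/appdb" \
    --collection=orders --out=orders.json

# Import it into the new instance running on AKS
mongoimport --uri="mongodb://aks-mongo.example:27017/appdb" \
    --collection=orders --file=orders.json
```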



Customer Experience Engineer @ Microsoft - Opinions and observations expressed in this blog post are my own.