Backing Up Rancher and Its Clusters + ETCD: A Complete Guide

Published in HangiKredi · 6 min read · Aug 2, 2024

Introduction

Kubernetes cluster management simplifies container orchestration, but its complexity necessitates data protection. Regular backups safeguard against data loss due to hardware failures, software bugs, or human errors. This guide explores how to back up Rancher and its associated Kubernetes clusters.

Understanding Rancher Backups

Rancher backups encompass several critical data components:

Rancher Server Data: This includes configurations, settings, and the overall system state.
Kubernetes Cluster Data: All data related to workloads running within the clusters.
ETCD Data: Essential for Kubernetes, ETCD stores the cluster’s current state.

Backup Tools and Strategies

The primary tool for Rancher server data backup is the Rancher Backup Operator. It’s available in Rancher v2.5 and above. While other tools like etcdctl exist for specific needs, we'll focus on the Backup Operator here.

What is Rancher Backup Operator?

Developed by Rancher Labs, the Rancher Backup Operator handles both backups and restores of Rancher server data.

The backup-restore operator needs to be installed in the local cluster, and only backs up the Rancher app. The backup and restore operations are performed only in the local Kubernetes cluster.

How to Install Rancher Backup Operator

We’ll use Helm commands for clarity. While a graphical method exists, commands offer better control in this context:

helm repo add rancher-charts https://charts.rancher.io
helm repo update
helm install --wait --create-namespace -n cattle-resources-system rancher-backup-crd rancher-charts/rancher-backup-crd
helm install --wait -n cattle-resources-system rancher-backup rancher-charts/rancher-backup
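Once the charts are installed, it can help to confirm that the operator is actually running before configuring anything. The following is a minimal sketch, not an official procedure: the function name verify_backup_operator and the 120-second timeout are our own choices, and the pod label is the same one used later in this guide to read the operator logs.

```shell
#!/bin/sh
# Sketch: confirm the Backup CRD exists and the operator pod is Ready.
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="cattle-resources-system"

verify_backup_operator() {
  # The Backup kind lives in the resources.cattle.io API group,
  # so its CRD is named backups.resources.cattle.io.
  "$KUBECTL" get crd backups.resources.cattle.io || return 1
  # Wait until the operator pod reports Ready (120s is an arbitrary limit).
  "$KUBECTL" wait --namespace "$NAMESPACE" --for=condition=Ready pod \
    -l app.kubernetes.io/name=rancher-backup --timeout=120s
}
```

Source the file and call verify_backup_operator; a non-zero exit status means either the CRD is missing or the pod did not become Ready in time.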

How to Configure Rancher Backup Operator

The Rancher Backup Operator utilizes Custom Resource Definitions (CRDs) and two key components:

Backup Configurations: Define the backup schedule, retention policy, etc.
Storage Locations: Specify where backups are stored (S3 bucket, Persistent Volume Claim).

S3 as Storage Location Example

Here’s how to create a secret holding the S3 access keys and a backup configuration referencing the S3 location. The same approach applies to S3-compatible services such as Wasabi; only the endpoint changes:

Create an S3 Secret:

kubectl create secret generic rancher-backup-s3 \
  --from-literal=accessKey=<access key> \
  --from-literal=secretKey=<secret key> \
  -n cattle-global-data

Backup Configuration (YAML):

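This example Backup custom resource creates a recurring backup in S3 every night at midnight and keeps the 30 most recent backups. Because encryptionConfigSecretName is empty, the backups are not encrypted; set it to the name of a secret that holds an encryption configuration if you want them encrypted. The operator uses the credentialSecretNamespace value to determine where to look for the S3 credentials secret.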


apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: nightly
  namespace: cattle-resources-system
spec:
  encryptionConfigSecretName: ''
  resourceSetName: rancher-resource-set
  retentionCount: 30
  schedule: '@midnight'
  storageLocation:
    s3:
      bucketName: <bucket-name>
      credentialSecretName: rancher-backup-s3
      credentialSecretNamespace: cattle-global-data
      endpoint: s3.<region>.amazonaws.com
      folder: <s3-folder-name>
      insecureTLSSkipVerify: true  # skips TLS certificate verification; prefer false for endpoints with trusted certificates
      region: <region>
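To put the configuration to work, the manifest has to be applied to the local cluster. Here is a small sketch of that step, assuming the YAML above was saved as backup.yaml (a hypothetical file name); the function name apply_backup is ours as well.

```shell
#!/bin/sh
# Sketch: apply the Backup manifest, then list Backup resources.
KUBECTL="${KUBECTL:-kubectl}"

apply_backup() {
  manifest="${1:-backup.yaml}"   # hypothetical file holding the Backup YAML
  "$KUBECTL" apply -f "$manifest" || return 1
  # Listing across all namespaces works whether the resource is
  # cluster-scoped or namespaced.
  "$KUBECTL" get backups.resources.cattle.io --all-namespaces
}
```

Calling apply_backup with no argument uses backup.yaml; pass another path to apply a different manifest.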

View Backup Logs:

kubectl -n cattle-resources-system logs -l app.kubernetes.io/name=rancher-backup

ETCD Snapshots

ETCD, a distributed key-value store, is integral to Kubernetes as it maintains the cluster’s state data. Given its critical role, regular backups of ETCD are essential for maintaining cluster integrity and facilitating disaster recovery. ETCD snapshots provide a straightforward method for backing up the ETCD datastore.

To automate ETCD snapshots, we use the rke2 CLI, whose etcd-snapshot subcommand can take a snapshot and upload it to an S3-compatible storage service. Scheduling that command with cron gives us regular snapshots stored outside the cluster.

Here’s how to configure a cron job on a server (control plane) node to take ETCD snapshots and upload them to an AWS (Amazon Web Services) S3 bucket:

1. Create an S3 Service Account in AWS and Assign Necessary Permissions

To securely manage backups and other operations involving Amazon S3, you should create an AWS IAM (Identity and Access Management) service account (IAM user) with the appropriate permissions. Here’s how you can do it:

Creating the IAM Service Account

1. Sign in to the AWS Management Console:

Go to the AWS Management Console.
Sign in with your credentials.

2. Navigate to IAM:

In the AWS Management Console, go to the IAM (Identity and Access Management) dashboard. You can find it by searching for “IAM” in the search bar.

3. Create a New User:

In the IAM dashboard, click on Users in the sidebar.
Click on Add user.
Enter a user name (e.g., backup-s3-user).
For Access type, select Programmatic access. This will provide an access key and secret key that you can use to interact with AWS S3 programmatically.

4. Set Permissions:

On the Set permissions page, choose Attach existing policies directly.
Search for and select the AmazonS3FullAccess policy. This policy grants full access to S3, which allows you to create, read, and manage objects and buckets. Alternatively, you can create a custom policy if you want to restrict permissions more granularly.
Click Next: Tags (optional to add tags) and then Next: Review.

5. Review and Create:

Review the settings and click Create user.
Note down the Access key ID and Secret access key. These credentials will be used to configure access to your S3 bucket.

Assigning Permissions

Ensure that the IAM user or role has the following permissions (if you created a custom policy):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::rancher-backups",
                "arn:aws:s3:::rancher-backups/*"
            ]
        }
    ]
}
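If you prefer the command line to the console, the same user and custom policy can be created with the AWS CLI. This is a sketch under a few assumptions: the policy JSON above is saved to a local file, the inline policy name rancher-backup-s3 is just an illustrative choice, and the function name create_backup_user is ours.

```shell
#!/bin/sh
# Sketch: create the IAM user, attach the custom policy inline,
# and generate an access key pair for it.
AWS="${AWS:-aws}"

create_backup_user() {
  user="$1"          # e.g. backup-s3-user
  policy_file="$2"   # local file containing the policy JSON shown above
  "$AWS" iam create-user --user-name "$user" || return 1
  # Attach the policy as an inline user policy (the policy name is arbitrary).
  "$AWS" iam put-user-policy --user-name "$user" \
    --policy-name rancher-backup-s3 \
    --policy-document "file://$policy_file" || return 1
  # The secret access key is only shown at creation time, so store it safely.
  "$AWS" iam create-access-key --user-name "$user"
}
```

For example, create_backup_user backup-s3-user policy.json creates the user described in the console steps with only the permissions listed in the policy file.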

You can use the AWS Policy Generator or the AWS IAM console to create a custom policy if you need more granular control.

Don’t forget to add a bucket policy, which should look similar to this JSON:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountID>:user/backup-s3-user"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::rancher-backups",
                "arn:aws:s3:::rancher-backups/*"
            ]
        }
    ]
}
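The bucket policy can also be applied from the command line. The sketch below assumes the JSON above is saved to a local file with <AccountID> filled in; the function name apply_bucket_policy is our own.

```shell
#!/bin/sh
# Sketch: attach the bucket policy, then read it back to confirm it was stored.
AWS="${AWS:-aws}"

apply_bucket_policy() {
  bucket="$1"        # e.g. rancher-backups
  policy_file="$2"   # local file containing the bucket policy JSON
  "$AWS" s3api put-bucket-policy --bucket "$bucket" \
    --policy "file://$policy_file" || return 1
  "$AWS" s3api get-bucket-policy --bucket "$bucket"
}
```

Running apply_bucket_policy rancher-backups bucket-policy.json prints the stored policy, which makes it easy to spot a wrong account ID or bucket name.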

2. Set Up the Cron Job

Add the following entry to your crontab file:

0 4 * * * rke2 etcd-snapshot save --s3 --s3-bucket=rancher-backups --s3-access-key=<ACCESS-KEY> --s3-secret-key=<SECRET-KEY> --etcd-s3-endpoint=s3.eu-central-1.amazonaws.com --etcd-s3-region=<region> --etcd-s3-folder=<folder-name>

This cron job will run daily at 4 AM, capturing an ETCD snapshot and uploading it to the specified S3 bucket. Ensure you replace <ACCESS-KEY>, <SECRET-KEY>, <region>, and <folder-name> with your actual S3 credentials and configuration.
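As a variation on the crontab entry above, the command can live in a small wrapper script so that the crontab file itself holds no credentials. This is only a sketch: the function name snapshot_to_s3, the credentials file, and the S3_* variable names are hypothetical, while the rke2 flags are the ones used in the crontab entry.

```shell
#!/bin/sh
# Sketch: read S3 settings from a file, then run the same snapshot command.
RKE2="${RKE2:-rke2}"

snapshot_to_s3() {
  creds_file="$1"   # file that sets S3_ACCESS_KEY, S3_SECRET_KEY, S3_ENDPOINT, ...
  if [ ! -r "$creds_file" ]; then
    echo "cannot read $creds_file" >&2
    return 1
  fi
  . "$creds_file"
  if [ -z "$S3_ACCESS_KEY" ] || [ -z "$S3_SECRET_KEY" ]; then
    echo "missing S3 credentials in $creds_file" >&2
    return 1
  fi
  # The bucket defaults to the name used throughout this guide.
  "$RKE2" etcd-snapshot save --s3 \
    --s3-bucket="${S3_BUCKET:-rancher-backups}" \
    --s3-access-key="$S3_ACCESS_KEY" \
    --s3-secret-key="$S3_SECRET_KEY" \
    --etcd-s3-endpoint="$S3_ENDPOINT" \
    --etcd-s3-region="$S3_REGION" \
    --etcd-s3-folder="$S3_FOLDER"
}
```

To use it from cron, end the script with a call such as snapshot_to_s3 followed by the absolute path of your credentials file, restrict that file to the root user, and point the crontab entry at the script instead of the full command.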

3. Configuration Details

--s3: Enables S3 storage for the snapshot.
--s3-bucket: Specifies the S3 bucket where snapshots will be stored.
--s3-access-key and --s3-secret-key: Provide the necessary authentication credentials for S3 access.
--etcd-s3-endpoint: Defines the endpoint for the S3-compatible API.
--etcd-s3-region: Indicates the AWS region where the S3 bucket is located.
--etcd-s3-folder: Designates the folder within the S3 bucket to organize snapshots.

This setup ensures that ETCD snapshots are systematically backed up and stored securely, leveraging S3’s durability and scalability. Regularly review and adjust your snapshot strategy to align with your operational needs and compliance requirements.

Troubleshooting

If you encounter errors uploading backups to S3, try these solutions:
1. Compress the Backup File: Use the `--snapshot-compress` flag to reduce file size.
2. Increase Timeout Settings: Use the `--etcd-s3-timeout` flag to extend the upload timeout, adjusting the value based on your needs.
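Both flags can be combined in a single on-demand snapshot. The sketch below assumes the timeout is given as a duration such as 10m, which is an illustrative value rather than a recommendation; the function name save_compressed_snapshot is ours, and any extra arguments are passed through so you can append the S3 flags from the cron job.

```shell
#!/bin/sh
# Sketch: take a compressed snapshot with a longer S3 timeout.
RKE2="${RKE2:-rke2}"

save_compressed_snapshot() {
  s3_timeout="${1:-10m}"   # assumed duration format; tune to your upload speed
  if [ "$#" -gt 0 ]; then shift; fi
  # Remaining arguments (bucket, keys, endpoint, ...) are passed through as-is.
  "$RKE2" etcd-snapshot save --s3 --snapshot-compress \
    --etcd-s3-timeout="$s3_timeout" "$@"
}
```

For example, save_compressed_snapshot 15m --s3-bucket=rancher-backups raises the timeout to 15 minutes for that run.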

Conclusion

Implementing robust backup strategies for Rancher and its Kubernetes clusters is essential for ensuring data integrity and operational continuity. By utilizing tools like the Rancher Backup Operator and configuring automated ETCD snapshots, you can effectively safeguard critical data and system configurations.

However, creating backups is only part of the equation. Equally important is the ability to restore from these backups when necessary. Regularly testing your backup and restore procedures is crucial to confirm that your backups are reliable and that your team can execute a successful recovery in the event of a disaster.

Conducting periodic drills to restore from ETCD snapshots and Rancher backups will help you verify the integrity of your backup data and the effectiveness of your restoration processes. These drills ensure that you are prepared for unexpected failures or data loss scenarios, minimizing downtime and maintaining business continuity.

By integrating regular backup and restore testing into your operational practices, you enhance your resilience against data loss and ensure that your backup strategies are robust and effective when they are needed most.

References:

Etcd Backup and Restore | RKE2 (docs.rke2.io)
Backing up Rancher | Rancher (ranchermanager.docs.rancher.com)
Comprehensive Guide to Backing Up Rancher and its Clusters (support.tools)
Operating etcd clusters for Kubernetes (kubernetes.io)
