Automating End-to-End Data Protection with Terraform

Published in Clumio Engineering, May 23, 2022

Authors:
Lawrence Chang, Director of Engineering
Prakash Radhakrishnan, Member of Technical Staff

Clumio Partners with Terraform

A fundamental principle in IT and DevOps in recent years has been to treat infrastructure as code. Just as application code has a defined syntax and format and builds into a reproducible binary, a company’s infrastructure should be managed and provisioned in an analogous fashion. One of the best-known infrastructure-as-code tools is Terraform, which provides the syntax and structure to help companies minimize environment drift and automate end-to-end reproducibility of cloud-based environments. Clumio has therefore embraced Terraform from the start for both our production and internal cloud SaaS environments. Given our extensive use of Terraform, however, we began to ask, “What about backup?” Should it not be just as easy for our customers to maintain and reproduce their data protection environment?

Those questions propelled us to develop our own full-fledged Terraform provider and led to Clumio becoming a HashiCorp (the makers of Terraform) Technology Partner. Clumio’s provider exposes a rich set of resources as well as a configurable module that abstracts the use of Clumio backup as a service for AWS. From connecting multiple AWS accounts and regions, to setting up policies and protection rules, to adding users and creating organizational units, the Clumio provider gives customers an easy-to-define, reproducible data protection environment.

The following is a quick overview of how to get started with the Clumio provider. Because the provider calls the Clumio APIs on your behalf, you need an API key: create one from the Clumio UI or retrieve an existing one. For help with creating an API key, please refer to the Clumio documentation. The subsequent steps assume that such an API key is available to you.

Preparing Your Terraform Automation

Start by setting up the following environment variables to allow the Clumio provider to interact with the Clumio cloud on your behalf. For allowed API base URLs, please refer to the Clumio provider documentation:
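As a minimal sketch of this step, the two variables can be exported in the shell that will run Terraform. The names CLUMIO_API_TOKEN and CLUMIO_API_BASE_URL are assumed here and should be checked against the Clumio provider documentation; both values are placeholders.

```shell
# Assumed variable names; confirm them in the Clumio provider documentation.
# Both values are placeholders to replace with your own.
export CLUMIO_API_TOKEN="<clumio_api_token>"
export CLUMIO_API_BASE_URL="<clumio_api_base_url>"
```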

The AWS Terraform provider is used by the Clumio module to provision the resources required to perform data protection in the AWS account to be protected. As such, set the following additional environment variables:
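One way to supply these is through the standard AWS credential environment variables, sketched below with placeholder values. This assumes static or temporary credentials are being used; other AWS authentication methods, such as a shared credentials file or a named profile, would work as well.

```shell
# Standard AWS credential variables read by the AWS Terraform provider.
# All values are placeholders to replace with your own.
export AWS_ACCESS_KEY_ID="<aws_access_key_id>"
export AWS_SECRET_ACCESS_KEY="<aws_secret_access_key>"
# Needed only for temporary credentials (for example AWS SSO or an assumed
# role); omit this line otherwise.
export AWS_SESSION_TOKEN="<aws_session_token>"
```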

The following starter Terraform configuration declares the required Clumio and AWS providers. Download the providers with terraform init:

terraform {
  required_providers {
    clumio = {
      source  = "clumio-code/clumio"
      version = "~>0.2.4"
    }
    # The source is stated explicitly rather than relying on the default.
    aws = {
      source = "hashicorp/aws"
    }
  }
}

Connecting Data Environments

Next, add the following to the Terraform configuration to create a Clumio connection to the AWS account associated with the AWS environment variables set up in Preparing Your Terraform Automation. The region in which the Clumio module is installed is set to us-west-2.

# Instantiate the AWS provider
provider "aws" {
  region = "us-west-2"
}

# Retrieve the effective AWS account ID and region
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Register a new Clumio connection for the effective AWS account ID and region
resource "clumio_aws_connection" "connection" {
  account_native_id = data.aws_caller_identity.current.account_id
  aws_region        = data.aws_region.current.name
  description       = "My Clumio Connection"
}

# Install the Clumio Protect template onto the registered connection
module "clumio_protect" {
  providers = {
    clumio = clumio
    aws    = aws
  }
  source                = "clumio-code/aws-template/clumio"
  clumio_token          = clumio_aws_connection.connection.token
  role_external_id      = "my_external_id"
  aws_account_id        = clumio_aws_connection.connection.account_native_id
  aws_region            = clumio_aws_connection.connection.aws_region
  clumio_aws_account_id = clumio_aws_connection.connection.clumio_aws_account_id

  # Enable protection of all data sources.
  is_ebs_enabled       = true
  is_rds_enabled       = true
  is_ec2_mssql_enabled = true
  is_dynamodb_enabled  = true
  is_s3_enabled        = true
}

Confirm your work thus far by running terraform init to download the Clumio module and then terraform plan to inspect which resources will be provisioned. Note that the above Terraform configuration enables data protection for all supported AWS data sources. When ready, run terraform apply.
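The init and plan steps can be wrapped in a small shell helper, sketched here. The function name clumio_plan and the plan file name tfplan are hypothetical; the terraform subcommands and the -out flag are standard Terraform CLI.

```shell
# Hypothetical helper: initialize the working directory, then save a plan
# to a file (tfplan) so the exact changes can be reviewed before applying.
clumio_plan() {
  # Download the providers and the Clumio module; stop if this fails.
  terraform init || return 1
  # Write the proposed changes to tfplan instead of applying them.
  terraform plan -out=tfplan
}
```

After defining the function, run clumio_plan, review the output, and then run terraform apply tfplan to apply exactly the plan that was saved.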

Your AWS account and region are onboarded! You can confirm this from the AWS Environments page in the Clumio UI.

Automating Data Protection

To get started with backup, include the following in the Terraform configuration to create a Protection Group for S3, define a policy for it, and associate the two. As a result, any S3 bucket tagged with the key clumio and the value blog (clumio:blog) will be protected:

# Create a Clumio protection group that aggregates all S3 buckets
# with the tag "clumio:blog"
resource "clumio_protection_group" "protection_group" {
  name        = "My Clumio Protection Group"
  bucket_rule = "{\"aws_tag\":{\"$eq\":{\"key\":\"clumio\", \"value\":\"blog\"}}}"
  object_filter {
    storage_classes = [
      "S3 Intelligent-Tiering",
      "S3 One Zone-IA",
      "S3 Standard",
      "S3 Standard-IA",
      "S3 Reduced Redundancy",
    ]
  }
}

# Create a Clumio policy for protection groups with a 7-day RPO and
# 3-month retention
resource "clumio_policy" "policy" {
  name = "S3 Gold"
  operations {
    action_setting = "immediate"
    type           = "protection_group_backup"
    slas {
      retention_duration {
        unit  = "months"
        value = 3
      }
      rpo_frequency {
        unit  = "days"
        value = 7
      }
    }
    advanced_settings {
      protection_group_backup {
        backup_tier = "cold"
      }
    }
  }
}

# Assign the policy to the protection group
resource "clumio_policy_assignment" "assignment" {
  entity_id   = clumio_protection_group.protection_group.id
  entity_type = "protection_group"
  policy_id   = clumio_policy.policy.id
}

Again, confirm your work with terraform plan (terraform init is not required this time) to inspect which resources will be provisioned. When ready, run terraform apply.

… and that’s it! Any S3 bucket tagged clumio:blog will start to seed and will subsequently be backed up every 7 days.
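To try this out, an existing bucket needs the matching tag. The sketch below uses the AWS CLI and assumes it is installed and configured for the onboarded account. The function name tag_bucket_for_clumio is hypothetical.

```shell
# Hypothetical helper: tag a bucket so it matches the protection group's rule.
# Caution: put-bucket-tagging replaces the bucket's entire existing tag set.
tag_bucket_for_clumio() {
  bucket_name="$1"
  # Refuse to run without a bucket name.
  if [ -z "$bucket_name" ]; then
    echo "usage: tag_bucket_for_clumio <bucket-name>" >&2
    return 2
  fi
  aws s3api put-bucket-tagging \
    --bucket "$bucket_name" \
    --tagging 'TagSet=[{Key=clumio,Value=blog}]'
}
```

For example, tag_bucket_for_clumio my-example-bucket (a placeholder bucket name) would tag that bucket. If the bucket already carries other tags, include them in the TagSet so they are not lost.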

More Resources and What’s Next

With just the above steps, you can start to manage your data protection infrastructure as code. While the above walks you through a simple data protection setup, you can find more examples in our Clumio provider documentation, including how to connect and protect multiple AWS accounts and regions and how to organize and manage multiple users and organizational units. The same documentation describes each custom resource supplied by the Clumio provider.

Additional improvements and plans for the provider are continuously under discussion, and support for new features and AWS data sources will be added in subsequent releases. If you already use Terraform to manage your infrastructure, the Clumio provider is the perfect complement for your data protection needs (and if you do not use Terraform, this is a great chance to give infrastructure as code a try). We welcome feedback from the community as we look to improve the provider. Better yet, if you want to contribute to our repository, we’re happy to take pull requests. Happy provisioning!

Learn more about Clumio Protect for S3
