S3 Lifecycle Rules and Encryption Set Up using Terraform

How to Set Up S3 Lifecycle Rules and Encryption in Terraform.

Isaac Omolayo · CodeX · May 7, 2024


In this post, we will learn how to encrypt S3 buckets and set up lifecycle rules for them using Terraform, covering bucket accessibility, lifecycle transitions, and bucket versioning along the way.

Attaching lifecycle rules to AWS S3 buckets using Terraform

Now let’s attach a lifecycle rule to our S3 bucket. Importantly, it is advisable to attach the lifecycle policy at the point where we define the bucket resource. It is also worth noting that a single lifecycle rule might not be enough for your bucket. Luckily, we can have multiple lifecycle_rule blocks: we can define them externally through a Terraform list variable and pass the variable’s value into the S3 bucket resource, as sketched below.
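As a minimal sketch of that pattern (the variable name lifecycle_rules and the bucket name example-bucket are our own; note also that the inline lifecycle_rule block is deprecated in AWS provider v4 in favor of the standalone aws_s3_bucket_lifecycle_configuration resource sketched later in this post):

variable "lifecycle_rules" {
  description = "Lifecycle rules to generate on the bucket"
  type = list(object({
    id            = string
    days          = number
    storage_class = string
  }))
  default = [
    { id = "to-ia", days = 30, storage_class = "STANDARD_IA" },
    { id = "to-glacier", days = 60, storage_class = "GLACIER" },
  ]
}

resource "aws_s3_bucket" "example" {
  bucket = "example-bucket"

  # Generate one lifecycle_rule block per entry in the list variable.
  dynamic "lifecycle_rule" {
    for_each = var.lifecycle_rules
    content {
      id      = lifecycle_rule.value.id
      enabled = true

      transition {
        days          = lifecycle_rule.value.days
        storage_class = lifecycle_rule.value.storage_class
      }
    }
  }
}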

Lifecycle policies are useful when we have tons of files in our S3 bucket and want to store them cost-effectively while keeping them accessible as they age. Typically, when the files in a bucket become infrequently accessed, it is much more economical to transition those objects to an archive class such as Glacier, say after 60 days. The implication is that our bucket stays clutter-free and costs less to store data in. The Terraform code below shows how to define a lifecycle policy for our S3 bucket. But wait a minute: first we need to configure our terminal so Terraform can communicate with AWS. In our case, we grant Terraform access to AWS by exporting the credentials from the terminal:
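# Placeholder values - substitute your own credentials
export AWS_ACCESS_KEY_ID='xxxxxxxxxxxxxxxx'
export AWS_SECRET_ACCESS_KEY='xxxxxxxxxxxxxxxx'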

resource "aws_s3_bucket" "bronze_data" {
bucket = "bronze-data-bucket"

versioning {
enabled = true
}

lifecycle_rule {
id = "archive"
enabled = true
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 60
storage_class = "GLACIER"
}
}

tags = {
Name = "Data Lake Bronze Bucket"
Environment = "Dev"
}
}

resource "aws_s3_bucket_acl" "bronze_bucket_acl_permission" {
bucket = aws_s3_bucket.bronze_data.id
acl = "private"
}
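One caveat: since April 2023, new S3 buckets have ACLs disabled by default (Object Ownership is set to BucketOwnerEnforced), so applying the ACL above can fail with an AccessControlListNotSupported error. A minimal sketch of the ownership controls that re-enable ACLs (the resource name bronze_bucket_ownership is our own):

resource "aws_s3_bucket_ownership_controls" "bronze_bucket_ownership" {
  bucket = aws_s3_bucket.bronze_data.id

  rule {
    # BucketOwnerPreferred allows ACLs while keeping the bucket owner
    # as the owner of newly uploaded objects.
    object_ownership = "BucketOwnerPreferred"
  }
}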

AWS S3 bucket encryption with Terraform

A very important topic: as data engineers, we regularly get requirements to encrypt data at rest. Luckily, according to AWS, Amazon S3 now applies server-side encryption with Amazon S3 managed keys (SSE-S3) automatically as the base level of encryption for every bucket in Amazon S3. As of January 5, 2023, all new object uploads to Amazon S3 are automatically encrypted at no additional cost and with no impact on performance.

The automatic encryption status for the S3 bucket default encryption configuration applies to all buckets and to new object uploads. It is visible in AWS CloudTrail logs, S3 Inventory, S3 Storage Lens, the Amazon S3 console, and as an Amazon S3 API response header in the AWS Command Line Interface and AWS SDKs.
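To confirm this for a given bucket, we can query its encryption configuration from the terminal (the bucket name here is a placeholder):

aws s3api get-bucket-encryption --bucket bronze-data-bucket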

But aside from the SSE-S3 algorithm, if we need another server-side encryption type, how do we do that? It’s pretty easy: all we need to do is define a server_side_encryption_configuration block that specifies the encryption settings for the bucket. Here, the bucket is configured to use the “aws:kms” algorithm for server-side encryption. First, we need to generate an AWS KMS key for our encryption, define a policy for the key, and then use it for our KMS algorithm. The Terraform code below defines the KMS key and attaches the necessary policy to it.

resource "aws_kms_key" "key" {
description = "My KMS Key for SSE-KMS"
}

resource "aws_kms_key_policy" "key_policy" {
key_id = aws_kms_key.key.id
policy = jsonencode({
Id = "key"
Statement = [
{
Sid = "Enable IAM User Permissions"
Effect = "Allow"
Principal = {
AWS = "*"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "Allow use of the key"
Effect = "Allow"
Principal = {
AWS = "*"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
Resource = "*"
},
]
Version = "2012-10-17"
})
}
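In practice, one would replace the wildcard principal with something narrower. A sketch, assuming we scope key administration to the account root via the aws_caller_identity data source:

data "aws_caller_identity" "current" {}

# In the key policy above, swap the wildcard principal for the account root:
#   Principal = {
#     AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
#   }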

The KMS key and policy resources defined above are now used to encrypt our S3 bucket, as shown below. We define a server_side_encryption_configuration block containing a rule, which specifies the kms_master_key_id and the sse_algorithm: in our case, AWS KMS.

resource "aws_s3_bucket" "bronze_data" {
bucket = "bronze-data-bucket"
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.key.arn
sse_algorithm = "aws:kms"
}
}
}

versioning {
enabled = true
}

lifecycle_rule {
id = "archive"
enabled = true
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 60
storage_class = "GLACIER"
}
}

tags = {
Name = "Data Lake Bronze Bucket"
Environment = "Dev"
}
}
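Note that in AWS provider v4 and later, the inline versioning, lifecycle_rule, and server_side_encryption_configuration blocks still work but are deprecated in favor of standalone resources. As a sketch, the lifecycle rules above could instead be expressed with aws_s3_bucket_lifecycle_configuration (the resource name bronze_archive is our own):

resource "aws_s3_bucket_lifecycle_configuration" "bronze_archive" {
  bucket = aws_s3_bucket.bronze_data.id

  rule {
    id     = "archive"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket.
    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 60
      storage_class = "GLACIER"
    }
  }
}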

One common use of the S3 bucket encryption configuration is encrypting the S3 bucket that holds Terraform remote state files. Remote state is very important for data engineers since it allows different engineers to collaborate on a Terraform project simultaneously: they can keep provisioning resources while Terraform coordinates the state of the infrastructure, so that concurrent runs do not conflict with each other.

Let’s quickly look at how we can design a remote Terraform state for our development backend. In our main.tf file we were using the local Terraform state; now let’s move this to remote state on S3 and use S3 as our Terraform backend. As part of this, we will showcase encryption: our new main.tf configuration file will look like the one below.

# --------------------main.tf----------------------

terraform {
  backend "s3" {
    bucket         = "terraform-for-data-engineering-tf-state"
    key            = "tf-for-data_engineers/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locking"
    encrypt        = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }

  required_version = "1.8.2"
}
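If the project previously used local state, Terraform can copy the existing state file into the new S3 backend during initialization:

terraform init -migrate-state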

Now let’s create another file for the S3 state configuration. We are going to call it terraform-state.tf and put all the Terraform code that pertains to the Terraform state in it, including the DynamoDB configuration needed for Terraform state locking. When running Terraform in a team, it’s important to ensure that concurrent runs don’t overwrite each other’s changes. One way to achieve this is state locking, which prevents multiple runs from modifying the same resources simultaneously. The code below creates an S3 bucket and enables bucket versioning on it so that we can roll back to a previous version of the state if things go wrong with a deployment update.

# -------------------------- terraform-state.tf --------------------------------

resource "aws_s3_bucket" "terraform_state" {
bucket = "terraform-for-data-engineering-tf-state"
force_destroy = true
}

resource "aws_s3_bucket_versioning" "terraform-state-versioning" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}

resource "aws_dynamodb_table" "terraform_locks_table" {
name = "terraform-state-locking"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform-state-bucket-encryption" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}

Define S3 bucket versioning, DynamoDB table, and Server Side Encryption Configuration in Terraform

In the Terraform code above, we used the aws_s3_bucket_server_side_encryption_configuration block to define server-side encryption for our S3 bucket. The bucket id specifies which bucket in the account should be encrypted. In the rule, we used "AES256", a strong encryption algorithm that provides robust protection for data at rest in cloud storage systems.

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform-state-bucket-encryption" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}

The complete Terraform code is provided below:

## ----------------------- main.tf -----------------------------
terraform {
  backend "s3" {
    bucket         = "terraform-for-data-engineering-tf-state"
    key            = "tf-for-data_engineers/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locking"
    encrypt        = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }

  required_version = "1.8.2"
}

The complete code for our S3 bucket encryption and lifecycle rules is given below:

# ----------------------- s3.tf ------------------------------
# ------------------- KMS Server-Side Encryption--------------

resource "aws_kms_key" "key" {
description = "My KMS Key for SSE-KMS"
}

resource "aws_kms_key_policy" "key_policy" {
key_id = aws_kms_key.key.id
policy = jsonencode({
Id = "key"
Statement = [
{
Sid = "Enable IAM User Permissions"
Effect = "Allow"
Principal = {
AWS = "*"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "Allow use of the key"
Effect = "Allow"
Principal = {
AWS = "*"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
Resource = "*"
},
]
Version = "2012-10-17"
})
}


resource "aws_s3_bucket" "bronze_data" {
bucket = "bronze-data-bucket"
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.key.arn
sse_algorithm = "aws:kms"
}
}
}

versioning {
enabled = true
}

lifecycle_rule {
id = "archive"
enabled = true
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 60
storage_class = "GLACIER"
}
}

tags = {
Name = "Data Lake Bronze Bucket"
Environment = "Dev"
}
}

The Terraform state code for the DynamoDB lock table, bucket versioning, and encryption is shown below:

# ---------------------- terraform_state/terraform-state.tf -------------------

resource "aws_s3_bucket" "terraform_state" {
bucket = "terraform-for-data-engineering-tf-state"
force_destroy = true
}

resource "aws_s3_bucket_versioning" "terraform-state-versioning" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}

resource "aws_dynamodb_table" "terraform_locks_table" {
name = "terraform-state-locking"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform-state-bucket-encryption" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}

Now let’s run terraform init, then terraform plan, and then terraform apply --auto-approve:

terraform init
terraform plan
terraform apply --auto-approve

Once development and testing are done, we can tear everything down with terraform destroy --auto-approve. With that, we are good to get up and running with Terraform.

Very importantly, make sure the S3 state bucket and the DynamoDB table for remote state locking exist before initializing the S3 backend.

Conclusion

In this post, we learned how to set up lifecycle rules, remote state storage, and S3 bucket encryption with Terraform for cost-effective and reliable operations. Terraform makes life easier for software and data engineers alike, and knowing how to perform the operations discussed in this post is valuable for our success as data engineers.
