Serve Single-Page Applications from S3 through CloudFront with external WAF — a Terraform guide (Part 1)

Adrian Arba
13 min read · May 12, 2023


I’m going to present how I tackled an interesting challenge: deploying an infrastructure stack through Terraform to serve Single-Page Applications (SPAs) in AWS from an S3 bucket, through a CloudFront distribution using OAC (Origin Access Control), connected to external DNS and WAF providers.

This solution ties into a globally accessible application that calls these SPAs on application user request. Using AWS (or any cloud provider) to serve the SPAs brings benefits such as low cost and high availability (a serverless solution backed by S3 storage), security (AWS keeps security standards current), and speed (the CDN serves cached content globally), all with very little administrative overhead (just update the Terraform code for changes).

For this solution, I used generally available Terraform resources that I implemented as modules. I won’t be covering all the code I used (this article would become pretty sizeable), but I do want to point out some quirks I encountered and tackled while working on this, especially since most articles I’ve found cover static websites in S3 or use OAI instead of OAC, and don’t explain why you may be getting some weird error messages along the way.

It’s going to be a two-part article: this one focuses on the modules, and the next one focuses on the configuration applied to the modules.

In part 1, we are going to take a look at the solution architecture, using the diagram below, and at the requirements of the feature; then I’ll outline the structure of the repository and focus on the modules, with explanations along the way.

Since the team is using an external WAF provider, I won’t be touching the WAF configuration itself.

Proposed solution diagram

Reading the diagram:

  • Clients accessing the SPAs have to use the domain name URL, which goes through the external WAF provider, instead of the CloudFront URL. This means we need to decide whether a connection is allowed or dropped before it actually reaches CloudFront — we decided to use Lambda@Edge functions for this purpose;
  • Certificates would be added to ACM by a separate team, we would just reference them in the Terraform code to link them to the distribution;
  • The CloudFront distribution uses Origin Access Control to serve the private S3 bucket objects (a minimal OAC sketch follows this list);
  • Encryption in transit is enabled at all steps;
  • All objects in the bucket would benefit from the following:
    - versioning
    - object locking
    - MFA delete protection
    - legal holds automatically applied
    - encryption with a dedicated KMS key
  • All applied policies are as restrictive as possible, allowing only the designated actions to happen;
  • A CI/CD pipeline process would require a specific IAM user with very limited permissions in order to deploy new (versions of) objects to the bucket (the CI/CD pipeline part is out of the scope of this article);
  • CloudFront and S3 logging would be enabled (out of the scope of this article);
  • Configuration of the external DNS and WAF to point to CloudFront (out of the scope of this article).
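
Since OAC is central to this setup (and a common source of confusion compared to OAI), here is a minimal sketch of the Origin Access Control resource the distribution’s origins would reference — the name and description are hypothetical, and the actual wiring comes in part 2:

# sketch: OAC resource referenced by the CloudFront origin (hypothetical name)
resource "aws_cloudfront_origin_access_control" "this" {
  name                              = "spa-oac"
  description                       = "OAC for the private SPA bucket"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}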

DevOps tooling I used:

  • Terraform code to build and deploy the infrastructure — v1.3.8;
  • Terragrunt v0.45.2 wrapper to separate the environments and pass different layers of variables down to Terraform at different levels i.e. Application level, Environment level, and Project level (out of the scope of this article);
  • AWS CLI v2.8.12 + Console for operating the solution services, testing, etc.

The code is structured into 2 repositories:

  • Repository A hosts the Terragrunt environments setup and is usually the place I pass values for variables from;
  • Repository B hosts the Terraform configuration and modules (both in one place, for simplicity) — this is also the repository I’ll be focusing on in this article.

Here’s what the structure of the two repositories looks like (with Repository B expanded):

├── repository_a
│   └── (...) # terragrunt - out of scope
│
└── repository_b
    │
    ├── configurations
    │   │
    │   └── single_page_app
    │       │
    │       ├── backend.tf
    │       ├── cf.tf
    │       ├── data.tf
    │       ├── script.py
    │       ├── iam.tf
    │       ├── kms.tf
    │       ├── lambda.tf
    │       ├── locals.tf
    │       ├── outputs.tf
    │       ├── s3.tf
    │       └── variables.tf
    │
    └── modules
        │
        ├── cloudfront
        │   │
        │   ├── main.tf
        │   ├── outputs.tf
        │   └── variables.tf
        │
        ├── s3_app
        │   │
        │   ├── main.tf
        │   ├── outputs.tf
        │   └── variables.tf
        │
        └── lambda
            │
            ├── data.tf
            ├── main.tf
            ├── outputs.tf
            ├── providers.tf
            └── variables.tf

I’m going to present each file below, per folder, but I won’t be going into details for each variable — only the ones I consider more complex than a generic “string” or “list”.

Module s3_app:

# modules/s3_app/main.tf

resource "aws_s3_bucket" "this" {
  bucket = var.m_s3_bucket_name

  # Defining object lock configurations, if any
  dynamic "object_lock_configuration" {
    for_each = var.m_s3_ol_configs == null ? [] : [var.m_s3_ol_configs]
    # name the iterator explicitly so the references below resolve
    iterator = m_s3_ol_configs

    # Enabling object lock
    content {
      object_lock_enabled = m_s3_ol_configs.value.object_lock_enabled

      dynamic "rule" {
        for_each = m_s3_ol_configs.value.rule == null ? [] : [m_s3_ol_configs.value.rule]

        content {
          # Setting the default object lock retention mode and time
          dynamic "default_retention" {
            for_each = rule.value.default_retention == null ? [] : [rule.value.default_retention]

            content {
              mode  = default_retention.value.mode
              days  = default_retention.value.days
              years = default_retention.value.years
            }
          }
        }
      }
    }
  }
}

# Handling ACLs
resource "aws_s3_bucket_ownership_controls" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    object_ownership = var.m_s3_acl_config
  }
}

# Setting ACLs as private
resource "aws_s3_bucket_acl" "this" {
  depends_on = [aws_s3_bucket_ownership_controls.this]

  bucket = aws_s3_bucket.this.id
  acl    = var.m_s3_acl
}

# Enabling bucket versioning and configuring possible mfa_delete protection (disabled here)
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status     = var.m_s3_bucket_versioning
    mfa_delete = var.m_s3_mfa_delete
  }

  lifecycle {
    # mfa_delete can only be changed outside Terraform (root credentials + MFA),
    # so ignore drift on that attribute
    ignore_changes = [
      versioning_configuration[0].mfa_delete
    ]
  }
}

# Making sure the bucket is privately accessible only
resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id

  block_public_acls       = var.m_s3_block_public_acls
  block_public_policy     = var.m_s3_block_public_policy
  ignore_public_acls      = var.m_s3_ignore_public_acls
  restrict_public_buckets = var.m_s3_restrict_public_buckets
}

# Enabling encryption at rest using a specific KMS key
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = var.m_s3_algorithm
      kms_master_key_id = var.m_s3_kms_key_id
    }
  }
}

Note the “dynamic” blocks: they only get populated when the “for_each” expression evaluates to a non-empty list (i.e. the variable is not “null”); otherwise they are skipped. This means a configuration that does not want to handle object locking can still call this module. Also note the explicit “iterator” argument — without it, the references inside the block would have to use the block’s label (“object_lock_configuration”) instead of the variable-like name.

On the “object_lock_configuration” dynamic block: the inline block will be deprecated soon in favour of the newer standalone “aws_s3_bucket_object_lock_configuration” resource — https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_object_lock_configuration — but I’m leaving it as is here because I prefer to have it declared dynamically.
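If you do switch to the standalone resource, a minimal sketch would look like this (the retention values are illustrative, not the ones I used):

resource "aws_s3_bucket_object_lock_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    default_retention {
      mode = "GOVERNANCE"
      days = 30
    }
  }
}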

# modules/s3_app/variables.tf

# ignoring most of the variables as they are just defined and contain a default
# keep in mind that all variables used within the module need to be declared in a module .tf file such as this one
(...)

variable "m_s3_ol_configs" {
  type = object({
    object_lock_enabled = string
    rule = object({
      default_retention = object({
        days  = optional(number)
        years = optional(number)
        mode  = string
      })
    })
  })

  default = null
}

variable "m_s3_acl" {
  type    = string
  default = "private"
}

variable "m_s3_acl_config" {
  type    = string
  default = "BucketOwnerEnforced"
}

variable "m_s3_mfa_delete" {
  type    = string
  default = "Disabled"
}

I’ve kept an example of the more complex “m_s3_ol_configs” variable used in the “for_each” loop, plus the defaults of a few other variables.

You don’t have to go to such lengths when designing your variables as is the case for “m_s3_ol_configs”; it’s only a matter of how dynamic your module needs to be to serve specific purposes. As we will see later, there are cases where you want specific resource blocks added or not based on the variable values.
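For illustration, here is roughly how a configuration could populate “m_s3_ol_configs” when calling the module — the bucket name and retention values are hypothetical:

module "s3_app" {
  source = "../../modules/s3_app"

  m_s3_bucket_name = "my-spa-bucket" # hypothetical name

  m_s3_ol_configs = {
    object_lock_enabled = "Enabled"
    rule = {
      default_retention = {
        mode = "GOVERNANCE" # or "COMPLIANCE"
        days = 30
      }
    }
  }
}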

# modules/s3_app/outputs.tf

output "bucket_id" {
  value = element(concat(aws_s3_bucket.this.*.id, [""]), 0)
}

output "bucket_arn" {
  value = element(concat(aws_s3_bucket.this.*.arn, [""]), 0)
}

output "bucket_regional_domain_name" {
  value = element(concat(aws_s3_bucket.this.*.bucket_regional_domain_name, [""]), 0)
}

The outputs file contains a few attributes that we will use in the configurations to link CloudFront to the bucket, as well as in other resources.
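To show where these outputs end up, here is a hedged sketch of how the configuration could feed the bucket into a CloudFront origin (the module and origin names are assumptions; the real wiring is covered in part 2):

module "cloudfront" {
  source = "../../modules/cloudfront"

  m_cf_origins = [{
    origin_id                = "spa-origin" # hypothetical ID
    domain_name              = module.s3_app.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.this.id
    custom_headers           = null
  }]

  # (...) remaining variables
}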

Module cloudfront:

# modules/cloudfront/main.tf

resource "aws_cloudfront_distribution" "this" {
  comment         = var.m_cf_comment
  enabled         = var.m_cf_enable_distribution
  is_ipv6_enabled = var.m_cf_ipv6_enable
  price_class     = var.m_cf_pc

  # it generally takes 15 minutes for the distribution to be deployed; you can ask terraform to wait for completion
  wait_for_deployment = var.m_cf_wait

  # the root object is optional, usually maps to index.html; you can leave it blank if you want
  default_root_object = var.m_cf_default_root_obj

  # aliases are optional, but if you plan to use a custom SSL certificate and a friendly domain name, set them in a variable
  aliases = var.m_cf_aliases

  # if there are multiple origins, they will be added dynamically
  dynamic "origin" {
    for_each = var.m_cf_origins

    content {
      origin_id                = origin.value.origin_id
      domain_name              = origin.value.domain_name
      origin_access_control_id = origin.value.origin_access_control_id
      # we don't use the origin_path, but uncomment and set if used
      # origin_path = coalesce(origin.value.origin_path, "")

      # same for any custom headers per origin
      dynamic "custom_header" {
        for_each = origin.value.custom_headers == null ? [] : origin.value.custom_headers
        iterator = header

        content {
          name  = header.value.name
          value = header.value.value
        }
      }
    }
  }

  default_cache_behavior {
    allowed_methods  = var.m_cf_default_cache_methods.allowed_methods
    cached_methods   = var.m_cf_default_cache_methods.cache_methods
    target_origin_id = var.m_cf_target_origin_id

    forwarded_values {
      query_string = var.m_cf_query_string

      cookies {
        forward = var.m_cf_default_cache_methods.cookies_forward
      }
    }

    min_ttl                = var.m_cf_default_cache_methods.min_ttl
    default_ttl            = var.m_cf_default_cache_methods.default_ttl
    max_ttl                = var.m_cf_default_cache_methods.max_ttl
    viewer_protocol_policy = var.m_cf_default_cache_methods.viewer_policy

    # if there are any lambda associations for the default cache, they will be added dynamically
    # the variable is already a list, so iterate over it directly
    # (wrapping it in another list would break the item references below)
    dynamic "lambda_function_association" {
      for_each = var.m_cf_default_cache_lambda_associations == null ? [] : var.m_cf_default_cache_lambda_associations

      content {
        event_type   = lambda_function_association.value.event_type
        lambda_arn   = lambda_function_association.value.lambda_arn
        include_body = lambda_function_association.value.include_body
      }
    }
  }

  # if there are any ordered cache behaviours, they will be added dynamically
  dynamic "ordered_cache_behavior" {
    for_each = var.m_cf_ordered_cache_behaviors
    iterator = behavior

    content {
      target_origin_id       = behavior.value.target_origin_id
      path_pattern           = behavior.value.cache_behavior.path_pattern
      viewer_protocol_policy = behavior.value.cache_behavior.viewer_protocol_policy

      allowed_methods           = behavior.value.cache_behavior.allowed_methods
      cached_methods            = behavior.value.cache_behavior.cached_methods
      compress                  = behavior.value.cache_behavior.compress
      trusted_signers           = behavior.value.cache_behavior.trusted_signers
      smooth_streaming          = behavior.value.cache_behavior.smooth_streaming
      realtime_log_config_arn   = lookup(behavior.value.cache_behavior, "realtime_log_config_arn", null)
      field_level_encryption_id = behavior.value.cache_behavior.field_level_encryption_id

      # set these to null to enable use of cache-control headers.
      # if using cache-control headers, TTLs must be set as: default: 1 day; min: 0s; max: 365 days
      min_ttl     = behavior.value.cache_behavior.min_ttl
      default_ttl = behavior.value.cache_behavior.default_ttl
      max_ttl     = behavior.value.cache_behavior.max_ttl

      # same for any forward rules per cache behaviour
      dynamic "forwarded_values" {
        for_each = behavior.value.cache_behavior.forwarded_values[*]

        content {
          query_string            = forwarded_values.value.query_string
          query_string_cache_keys = forwarded_values.value.query_string_cache_keys
          headers                 = forwarded_values.value.headers

          cookies {
            forward           = forwarded_values.value.cookie_forward_type
            whitelisted_names = forwarded_values.value.cookie_whitelisted_names
          }
        }
      }
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = var.m_cf_geo_restriction_type
      locations        = var.m_cf_geo_locations
    }
  }

  viewer_certificate {
    # these 2 options will only be enabled if not using the default CloudFront certificate
    minimum_protocol_version = var.m_cf_minimum_tls_version
    ssl_support_method       = var.m_cf_ssl_method

    # set this to false if you use your own domain certificate for the CF distribution instead of the default one
    cloudfront_default_certificate = var.m_cf_cloudfront_default_certificate

    # a valid domain certificate needs to be uploaded to ACM and its ARN retrieved and set here
    acm_certificate_arn = var.m_cf_cloudfront_certificate_arn
  }
}

I tried to keep the module as dynamic as possible for the task at hand, meaning it can be reused in multiple setups — potentially with static websites as well, or with multiple origins and cache behaviors.

A few important notes:

  • setting a default root object (i.e. “index.html”) is optional; it means requests that don’t specify an object after the CloudFront URL get served that file. If you do set one, you will keep getting an access denied message until you actually put that object inside the bucket;
  • aliases are also optional, and you can leave the list blank if, for instance, you are only testing the scenario and plan to use the automatically generated default CloudFront certificate; populate them if you plan to use your own production-grade domain certificate, uploaded to ACM, to benefit from friendly names (an example of the certificate-related variables follows this list);
  • if you do use the default CloudFront SSL certificate instead of your own, some security features like the latest TLS policy will not be enabled, so bear this in mind — such deployments should not be considered production-worthy;
  • if you don’t have any geo-restrictions in place, set the following variables to:
    - m_cf_geo_restriction_type: “none”
    - m_cf_geo_locations: []
  • you can check all the features enabled in your CloudFront distribution with this CLI command:
aws --profile <profile_name> cloudfront get-distribution-config --id <CFDISTID>
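
Picking up on the certificate notes above, here is roughly what the certificate-related variables could look like for a production setup with a custom ACM certificate (the ARN is a placeholder — and remember CloudFront only accepts certificates from the us-east-1 ACM region):

m_cf_cloudfront_default_certificate = false
m_cf_cloudfront_certificate_arn     = "arn:aws:acm:us-east-1:111111111111:certificate/<cert-id>" # placeholder
m_cf_minimum_tls_version            = "TLSv1.2_2021"
m_cf_ssl_method                     = "sni-only"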

Moving on with the Terraform code, here are a few examples of variables:

# modules/cloudfront/variables.tf

(...)

variable "m_cf_origins" {
  type = list(object({
    origin_id   = string
    domain_name = string
    # origin_path = string
    origin_access_control_id = string # [required when using OAC] OAC ID
    custom_headers = list(object({
      name  = string
      value = string
    }))
  }))

  default = []
}

variable "m_cf_default_cache_methods" {
  type = object({
    path_pattern      = string
    allowed_methods   = list(string)
    cache_methods     = list(string)
    viewer_policy     = string
    min_ttl           = number
    default_ttl       = number
    max_ttl           = number
    forwarded_headers = string
    cookies_forward   = string
  })

  default = null
}

variable "m_cf_default_cache_lambda_associations" {
  type = list(object({
    event_type   = string # values: viewer-request, origin-request, viewer-response, origin-response
    lambda_arn   = string
    include_body = string
  }))

  default = null
}

variable "m_cf_ordered_cache_behaviors" {
  type = list(object({
    target_origin_id = string
    # the nested cache_behavior object matches the references in main.tf
    cache_behavior = object({
      # (...) long list of parameters you can find in the resource documentation
      max_ttl = number
      forwarded_values = object({
        query_string             = bool
        query_string_cache_keys  = list(string)
        headers                  = list(string)
        cookie_forward_type      = string
        cookie_whitelisted_names = list(string)
      })
    })
  }))

  default = []
}

(...)

I only included the most complex ones, but this file should contain all the variables used in the module.

# modules/cloudfront/outputs.tf

output "cf_distribution_domain_name" {
  value = aws_cloudfront_distribution.this.domain_name
}

output "cf_distribution_arn" {
  value = aws_cloudfront_distribution.this.arn
}

output "cf_distribution_id" {
  value = aws_cloudfront_distribution.this.id
}
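
One reason the distribution ARN is exported: with OAC, the bucket policy has to allow the CloudFront service principal, scoped to this specific distribution. Policies are part 2 material, but here is a hedged sketch of the shape (the module references are assumptions):

data "aws_iam_policy_document" "spa_bucket" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${module.s3_app.bucket_arn}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [module.cloudfront.cf_distribution_arn]
    }
  }
}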

Module lambda:

# modules/lambda/main.tf

resource "aws_lambda_function" "this" {

  # required to be set to true for Edge functions
  publish = var.m_lmb_at_edge ? true : var.m_lmb_publish

  # Lambda@Edge functions have tighter timeout limits, so cap the value
  timeout          = var.m_lmb_at_edge ? min(var.m_lmb_timeout, 30) : var.m_lmb_timeout
  filename         = "${var.m_lmb_name}.zip"
  function_name    = var.m_lmb_name
  role             = var.m_lmb_iam_role_arn
  handler          = "${var.m_lmb_source_file_name}.${var.m_lmb_handler}"
  runtime          = var.m_lmb_runtime
  source_code_hash = data.archive_file.lambda.output_base64sha256

  # environment variables work on regular functions, not on edge functions
  # we define it as dynamic so that if we do use edge functions, this section will get ignored
  dynamic "environment" {
    for_each = length(keys(var.m_lmb_env_vars)) == 0 ? [] : [true]
    content {
      variables = var.m_lmb_env_vars
    }
  }
}

We have a lightweight module here, with the following gotchas:

  • in order for a Lambda function to become a Lambda@Edge function, we need to set “publish = true” on the resource;
  • roles need to be created and assigned to Lambdas — we create the role in the configurations, so that any specific permissions get templated at that level, per needs, and not at module level (a sketch of the Lambda@Edge trust policy follows this list);
  • the “handler” parameter takes the value of the Lambda function file name without its extension, plus the name of the function defined within that file — in this case, “handler” (as we will see below);
  • we use “source_code_hash” to trigger updates to the function (important especially when testing your code);
  • environment variables are not supported in edge functions, so we declare a “dynamic” environment section that does not get populated for edge functions; if we use the module with regular functions, it will be there to take in any environment variables — I’ll show how I pass variables to the function in the configurations section.
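
On the roles gotcha: one Lambda@Edge-specific detail is that the execution role must be assumable by both lambda.amazonaws.com and edgelambda.amazonaws.com, otherwise CloudFront cannot replicate the function to the edge locations. A minimal sketch, with a hypothetical role name:

data "aws_iam_policy_document" "lambda_edge_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com", "edgelambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda_edge" {
  name               = "spa-lambda-edge-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.lambda_edge_assume.json
}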

# modules/lambda/variables.tf

(...)

variable "m_lmb_env_vars" {
  description = "Map of environment variables to be passed to the lambda"
  type        = map(string)
}

variable "m_lmb_publish" {
  description = "Whether to publish creation/change as new Lambda Function Version."
  type        = bool
  default     = false
}

(...)

These are the most complex variables in the Lambda module :D

# modules/lambda/outputs.tf

output "function_arn" {
  # Lambda@Edge associations require a versioned (qualified) function ARN,
  # so expose qualified_arn rather than the unqualified arn
  value = aws_lambda_function.this.qualified_arn
}

We will use the function’s qualified ARN to link the function to the CloudFront distribution later.
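As a preview of that wiring, the configuration could pass something like this to the CloudFront module (a sketch — the module names are assumptions):

m_cf_default_cache_lambda_associations = [{
  event_type   = "viewer-request"
  lambda_arn   = module.lambda.function_arn # must be the versioned (qualified) ARN
  include_body = false
}]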

The data.tf file defines the archive that packages the local script into the zip file referenced by the function resource:

data "archive_file" "lambda" {
type = var.m_lmb_archiver
output_path = "${var.m_lmb_name}.${var.m_lmb_archiver}"

source_file = "path/to/your/script.py"
}

I’ve also added a Python script at the configuration level that drops all connections going directly to the CloudFront distribution, but allows connections coming through the external WAF:

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    host_key = request["headers"]["host"][0]["key"]
    host_value = request["headers"]["host"][0]["value"]
    distribution = event["Records"][0]["cf"]["config"]["distributionDomainName"]

    ## Drop the request if it's going directly to the CloudFront URL
    if host_key == "Host" and distribution in host_value:
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "body": "Direct Access To the Cloudfront URL is Denied",
        }

    ## Else return the request unchanged
    return request

I used this AWS page to understand what a viewer request event looks like and set up the basic script above (basic, but hey, it does its job :) ).

The providers.tf file contains only the AWS provider:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}
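
One gotcha worth flagging here: Lambda@Edge functions must be created in the us-east-1 region, regardless of where the rest of the stack lives — which is why this module carries its own providers.tf. The configuration can then hand it an aliased provider; a hedged sketch (the alias name is an assumption):

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

module "lambda" {
  source = "../../modules/lambda"

  providers = {
    aws = aws.us_east_1
  }

  # (...) module variables
}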

That’s it for the three modules I used for this solution. Hopefully, you won’t have any difficulties filling in the blanks with the simpler variables.

In part 2 — the configuration — we will add parametrization on top of the modules, and we will also pass in things like policies, Lambda code, etc., to deploy the entire solution.

Leave a message or a clap if you enjoyed or found this article useful and see you in part 2!
