Centralize AWS Multi-Account VPC Flow Logs in an S3 Bucket

Aizhamal Nazhimidinova
aKumoSolutions-DevOps
8 min read · Feb 1, 2024

Table of contents:

  • Introduction
  • Set up S3 buckets and Enable VPC Flow Logs
  • Configure Cross-Account Replication
  • Implementation
  • Conclusion

Introduction

VPC Flow Logs serve as a critical element within AWS networking, offering comprehensive insights into the traffic within your Virtual Private Cloud (VPC). With the ability to capture detailed network traffic data, encompassing source and destination addresses, ports, and protocols, VPC Flow Logs provide a powerful tool for analyzing and monitoring network activity. This information proves invaluable for troubleshooting, security analysis, and compliance monitoring, contributing to an elevated level of visibility and operational efficiency within your AWS infrastructure. In this article, we will delve into the process of setting up S3 buckets, enabling VPC Flow Logs, and configuring cross-account replication to enhance the overall management of network data.
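For reference, each flow log record in the default format is a single space-separated line of fields. Below is an illustrative record (all values are made up):

version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
2 123456789012 eni-0a1b2c3d4e 10.0.1.15 10.0.2.33 49152 443 6 10 840 1706787600 1706787660 ACCEPT OK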

Set up S3 buckets and Enable VPC Flow Logs

In the scope of our AWS infrastructure, we will deploy Amazon S3 buckets in our Dev and Prod accounts. These S3 buckets will serve as repositories for VPC Flow Logs, contributing to enhanced network visibility. Additionally, a data lifecycle management strategy will be implemented, with a lifecycle rule that automatically purges objects from these buckets after a specified number of days, promoting efficient resource utilization.

Roles and Policies

Central to these initiatives is the implementation of cross-account execution roles. These roles serve as the cornerstone for secure and seamless orchestration of operations across accounts. In addition, we will leverage IAM roles within the S3 buckets to enforce fine-grained access controls, ensuring that only authorized entities can interact with the data stored within the buckets.
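The article assumes these execution roles already exist. Below is a minimal sketch of what such a role might look like in a member account, assuming Jenkins and Terraform run from the Logging account (the account ID and role name are placeholders):

# Hypothetical cross-account execution role created in the Dev (or Prod) account.
# It trusts the Logging account, where the pipeline runs, to assume it.
data "aws_iam_policy_document" "execution_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::999999999999:root"] # Logging account ID (placeholder)
    }
  }
}

resource "aws_iam_role" "cross_account_execution" {
  name               = "cross-account-execution-role"
  assume_role_policy = data.aws_iam_policy_document.execution_trust.json
}

# Attach whatever permissions the pipeline needs (S3, VPC Flow Logs, IAM) to this role.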

Establish Centralized S3 bucket

In parallel, a Centralized S3 bucket, specifically designated for the aggregation of VPC Flow Logs, will be established within the Logging account. This Centralized bucket ensures streamlined log management and facilitates comprehensive analysis.

Configuring Cross-Account Replication

To enhance data resilience and accessibility, a robust replication mechanism is being implemented between multiple AWS Accounts. This strategic approach not only improves data redundancy but also provides support for Centralized Logging and Disaster Recovery scenarios.

Retention Policy

As part of comprehensive data governance practices, retention policies will be enforced within the S3 buckets for Centralized VPC Flow Logs. Objects residing in this S3 bucket will automatically transition to Glacier storage after a specified period. This archival strategy not only aligns with best practices for long-term data storage but also optimizes costs associated with data preservation.

Implementation

To achieve cross-account replication and centralize the logs, create a Centralized destination S3 bucket in the Logging account with a policy that allows the other accounts to replicate objects into it. There are two source accounts, Dev and Prod, each with an S3 bucket that captures the VPC Flow Logs for that account. These buckets have replication enabled, with the Centralized bucket as the destination. The Jenkinsfile drives Terraform across every file in the repository structure below.

.
├── Jenkinsfile
├── README.md
├── destination_s3_bucket
│   ├── backend.tf
│   ├── main.tf
│   ├── providers.tf
│   ├── var.tfvars
│   └── variables.tf
├── dev
│   ├── backend.tf
│   ├── main.tf
│   ├── var.tfvars
│   └── variables.tf
└── prod
    ├── backend.tf
    ├── main.tf
    ├── var.tfvars
    └── variables.tf

3 directories, 15 files

Steps

destination_s3_bucket

  • backend.tf — fill out the backend with your project information, or use a local backend
terraform {
  backend "s3" {
    bucket         = ""
    key            = ""
    region         = ""
    dynamodb_table = ""
  }
}
  • main.tf — I am using aws_caller_identity data sources (through the aliased Dev and Prod providers defined in providers.tf; a sketch of that file follows the var.tfvars below) to get the account IDs that will be replicating to the destination centralized bucket, i.e. my Dev and Prod accounts. The bucket policy created here allows those accounts to replicate into my Centralized Destination bucket.
data "aws_caller_identity" "dev" {
provider = aws.dev
}

data "aws_caller_identity" "prod" {
provider = aws.prod
}

resource "aws_s3_bucket" "destination" {
bucket = var.bucket_name
}

resource "aws_s3_bucket_lifecycle_configuration" "bucket-config" {
bucket = aws_s3_bucket.destination.id
rule {
status = "Enabled"
id = "transition-to-glacier"
transition {
days = 60
storage_class = "GLACIER"
}
}
}

resource "aws_s3_bucket_versioning" "destination" {
bucket = aws_s3_bucket.destination.id
versioning_configuration {
status = "Enabled"
}
}

data "aws_iam_policy_document" "destination" {
statement {
sid = "replicate-objects-from-source-to-destination-bucket"

actions = [
"s3:ObjectOwnerOverrideToBucketOwner",
"s3:ReplicateDelete",
"s3:ReplicateObject",
]

resources = [
aws_s3_bucket.destination.arn,
"${aws_s3_bucket.destination.arn}/*",
]

principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.shs.account_id}:root",
"arn:aws:iam::${data.aws_caller_identity.dev.account_id}:root",
]
}
}
}

resource "aws_s3_bucket_policy" "destination" {
bucket = aws_s3_bucket.destination.bucket
policy = data.aws_iam_policy_document.destination.json
}
variable "bucket_name" {
description = "Name of the destination S3 bucket"
}
  • var.tfvars
bucket_name = "your-destination-centralized-bucket-name"
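  • providers.tf — this file appears in the repository tree but is not shown above. A minimal sketch, assuming the Dev and Prod accounts are reached by assuming the cross-account execution role described earlier (the account IDs, role name, and region are placeholders):
provider "aws" {
  region = "us-east-1"
}

# Aliased provider for the Dev account, used by the aws_caller_identity "dev" data source
provider "aws" {
  alias  = "dev"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/cross-account-execution-role"
  }
}

# Aliased provider for the Prod account, used by the aws_caller_identity "prod" data source
provider "aws" {
  alias  = "prod"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::222222222222:role/cross-account-execution-role"
  }
}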

Dev — environment folder

  • backend.tf — fill out the backend with your project information, or use a local backend
  • main.tf

Fill out the files with the corresponding data.

provider "aws" {
region = "us-east-1"
alias = "dev"
default_tags {
tags = {
OU = ""
Acc-Name = ""
Region = ""
Env = ""
Purpose = ""
Application = ""
Owner = ""
Project = ""
Repo-URL = ""
}
}
}


module "s3_replication" {
source = "git@bitbucket.org://path/to/my/repo"

# Specify variables for the module
account_id = var.account_id
role_name = var.role_name
policy_name = var.policy_name
source_bucket_name = var.source_bucket_name
destination_bucket_name = var.destination_bucket_name
destination_bucket_arn = var.destination_bucket_arn
vpc_id = var.vpc_id
tags = var.tags
}
  • variables.tf
variable "source_bucket_name" {
description = "Name of the source S3 bucket"
}

variable "destination_bucket_name" {
description = "Name of the destination S3 bucket"
type = string
}

variable "destination_bucket_arn" {
description = "ARN of the destination bucket"
type = string
}

variable "role_name" {
description = "Name of the IAM role for replication"
type = string
}

variable "policy_name" {
description = "Name of the IAM policy for replication"
type = string
}

variable "vpc_id" {
description = "VPC ID"
type = string
}

variable "tags" {
description = "A map of tags to assign to the AWS Flow Log resource"
type = map(string)
default = {}
}

variable "account_id" {
type = string
description = "AWS destination account ID"
}
  • var.tfvars
source_bucket_name       = ""
destination_bucket_arn = ""
destination_bucket_name = ""
role_name = ""
policy_name = ""
vpc_id = ""
tags = {
Name = ""
}
account_id = ""
  • Prod — environment folder. Copy the same code as above into the corresponding files and fill them out for the Prod account (updating the provider alias and account-specific values):
  • backend
  • main.tf
  • variables.tf
  • var.tfvars

Modules

The s3_replication module contains all the resources needed to create the source buckets, enable VPC Flow Logs, and set up the replication policies and related configuration. Below are the resources that will be created by this module.

s3_replication

  • main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.67.0"
    }
  }
}

// IAM Role for Replication
resource "aws_iam_role" "replication" {
  name = var.role_name

  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

// Trust policy: S3 assumes this role to perform replication
data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["s3.amazonaws.com"]
    }

    actions = ["sts:AssumeRole"]
  }
}

// IAM Policy for Replication
data "aws_iam_policy_document" "replication" {
  statement {
    effect = "Allow"

    actions = [
      "s3:GetReplicationConfiguration",
      "s3:ListBucket",
      "s3:GetBucketLifecycleConfiguration",
    ]

    resources = [aws_s3_bucket.source.arn]
  }

  statement {
    effect = "Allow"

    actions = [
      "s3:GetObjectVersionForReplication",
      "s3:GetObjectVersionAcl",
      "s3:GetObjectVersionTagging",
    ]

    resources = ["${aws_s3_bucket.source.arn}/*"]
  }

  statement {
    effect = "Allow"

    actions = [
      "s3:ReplicateObject",
      "s3:ReplicateDelete",
      "s3:ReplicateTags",
    ]

    resources = ["${var.destination_bucket_arn}/*"]
  }
}

resource "aws_iam_policy" "replication" {
  name   = var.policy_name
  policy = data.aws_iam_policy_document.replication.json
}

resource "aws_iam_role_policy_attachment" "replication" {
  role       = aws_iam_role.replication.name
  policy_arn = aws_iam_policy.replication.arn
}

// S3 Bucket Configuration
resource "aws_s3_bucket" "source" {
  bucket = var.source_bucket_name
}

// Versioning is required on the source bucket for replication
resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id

  versioning_configuration {
    status = "Enabled"
  }
}

// Expire objects in the source bucket after 7 days; the centralized copy is retained longer
resource "aws_s3_bucket_lifecycle_configuration" "l1" {
  bucket = aws_s3_bucket.source.id

  rule {
    status = "Enabled"
    id     = "expire_all_files"

    expiration {
      days = 7
    }
  }
}

// Deliver VPC Flow Logs for the given VPC to the source bucket
resource "aws_flow_log" "example1" {
  log_destination      = aws_s3_bucket.source.arn
  log_destination_type = "s3"
  traffic_type         = "ALL"
  vpc_id               = var.vpc_id
  tags                 = var.tags
}

// IAM role for flow logs (only needed when publishing to CloudWatch Logs; kept here for completeness)
resource "aws_iam_role" "example1" {
  name = "example1"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "vpc-flow-logs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "example1" {
  name = "example1"
  role = aws_iam_role.example1.id

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF
}

// Configure replication rule
resource "aws_s3_bucket_replication_configuration" "source-to-destination" {
  // Versioning must be enabled on the source bucket before replication is configured
  depends_on = [aws_s3_bucket_versioning.source]

  role   = aws_iam_role.replication.arn
  bucket = aws_s3_bucket.source.id

  rule {
    id     = "replicate-all"
    status = "Enabled"

    destination {
      bucket        = var.destination_bucket_arn
      storage_class = "STANDARD"

      # account attribute is used in cross-account replication
      account = var.account_id

      # also used for cross-account replication only: the destination (Logging)
      # account becomes the owner of replicated objects
      access_control_translation {
        owner = "Destination"
      }
    }
  }
}
  • variables.tf
variable "source_bucket_name" {
description = "Name of the source S3 bucket"
}

variable "destination_bucket_name" {
description = "Name of the destination S3 bucket"
type = string
}

variable "destination_bucket_arn" {
description = "ARN of the destination bucket"
type = string
}

variable "role_name" {
description = "Name of the IAM role for replication"
type = string
}

variable "policy_name" {
description = "Name of the IAM policy for replication"
type = string
}
variable "vpc_id" {
description = "The ID of the VPC for flow logs"
type = string
}

variable "tags" {
description = "A map of tags to assign to the AWS Flow Log resource"
type = map(string)
default = {}
}

variable "account_id" {
type = string
description = "AWS account ID"
}

Below is my Jenkinsfile.

pipeline {
    agent any

    parameters {
        choice(
            name: 'tfaction',
            description: 'Select command',
            choices: ['apply', 'destroy']
        )
    }

    stages {
        stage('Run Terraform for Destination S3 Bucket') {
            steps {
                dir("destination_s3_bucket") {
                    script {
                        sh "terraform init"
                        sh "terraform validate"
                        sh "terraform plan -var-file=/path/to/your/var.tfvars"
                        sh "terraform ${params.tfaction} -var-file=/path/to/your/var.tfvars -auto-approve"
                    }
                }
            }
        }

        stage('Run Terraform for Dev') {
            steps {
                dir("dev") {
                    script {
                        sh "terraform init"
                        sh "terraform validate"
                        sh "terraform plan -var-file=/path/to/your/var.tfvars"
                        sh "terraform ${params.tfaction} -var-file=/path/to/your/var.tfvars -auto-approve"
                    }
                }
            }
        }

        stage('Run Terraform for Prod') {
            steps {
                dir("prod") {
                    script {
                        sh "terraform init"
                        sh "terraform validate"
                        sh "terraform plan -var-file=/path/to/your/var.tfvars"
                        sh "terraform ${params.tfaction} -var-file=/path/to/your/var.tfvars -auto-approve"
                    }
                }
            }
        }
    }

    post {
        always {
            cleanWs()
        }
    }
}

The pipeline has three stages, one per root module (destination_s3_bucket, dev, and prod), each targeting a different account.

Each stage performs Terraform operations for the corresponding environment:

  • Initialization (terraform init)
  • Validation (terraform validate)
  • Planning (terraform plan)
  • Execution (terraform apply or terraform destroy) based on the chosen action (params.tfaction)
  • The cleanWs() step is specified in the post section, ensuring that the workspace is cleaned up after each build, regardless of the build result.
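Note that the Jenkinsfile above assumes the agent already has AWS credentials. A minimal sketch of one common pattern, assuming access keys are stored in Jenkins as secret-text credentials with the IDs shown (the credential IDs and region are placeholders), added at the pipeline level:

environment {
    // Bind Jenkins secret-text credentials to the standard AWS environment variables
    AWS_ACCESS_KEY_ID     = credentials('aws-access-key-id')
    AWS_SECRET_ACCESS_KEY = credentials('aws-secret-access-key')
    AWS_DEFAULT_REGION    = 'us-east-1'
}

With this block inside pipeline { }, all three stages run Terraform with the same identity, which can then assume the cross-account execution roles described earlier.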

After running this pipeline, all resources will be created successfully; a few CLI checks to verify the setup are shown below.
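For example (the bucket names and VPC ID below are placeholders):

# Confirm the flow log is attached to the VPC (run with Dev or Prod credentials)
aws ec2 describe-flow-logs --filter "Name=resource-id,Values=vpc-0123456789abcdef0"

# Confirm replication is configured on the source bucket
aws s3api get-bucket-replication --bucket your-dev-source-bucket-name

# After some traffic, replicated logs show up under AWSLogs/ in the centralized bucket
aws s3 ls s3://your-destination-centralized-bucket-name/AWSLogs/ --recursive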

Conclusion

We successfully configured VPC flow logs for two accounts in the same region and directed the logs to separate S3 buckets. Additionally, we accomplished the consolidation of all logs into a centralized S3 bucket across multiple accounts. This holistic S3 bucket management strategy adheres to industry standards, emphasizing strong security practices, optimized resource utilization, and a well-structured approach to data governance. With this setup, we have established a foundation to seamlessly forward logs to monitoring and logging tools, allowing for efficient issue detection and response.
