Container orchestration in AWS.

Working with ECS.

Run scalable applications in multiple environments.

Karishma
CodeX



Recently I worked on a project that used a different container orchestrator: Amazon ECS, with Terraform as the IaC tool. I would like to share my learnings here.

Containers:

Containers provide a standard way to package an application’s code, configuration, and dependencies into a single object. This consistency in setup eliminates the classic “this works on my machine” problem 😃.

Containers share an OS. They run as resource-isolated processes, ensuring quick, reliable, and consistent deployments.

Container orchestration:

A container orchestration service is responsible for coordinating the “where” and “how” of running containers.

Advantages of an orchestrator:

  • Scale a service in and out to meet user demand.
  • Replace unresponsive or failed containers.
  • Give insight into application logs and system metrics such as CPU and memory.

This article will focus on Amazon Elastic Container Service using EC2 launch type. ECS is a scalable, high-performance orchestrator that supports Docker containers. Now we’ll explore some concepts of ECS.

ECS CLUSTER

It is a logical grouping of tasks or services.

  • If you are running tasks or services that use the EC2 launch type, a cluster is also a grouping of container instances. Note: each EC2 instance runs an ECS agent that registers the instance with the cluster.
  • If you are using capacity providers, a cluster is also a logical grouping of capacity providers.
  • A cluster can contain a mix of Fargate and EC2 launch types.
resource "aws_ecs_cluster" "ecs_cluster" {
  name = "terraform-ecs-cluster-${terraform.workspace}"
}
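As an aside, the log and metric insights mentioned earlier can be surfaced through CloudWatch Container Insights, which can be enabled on the cluster with a setting block. A minimal sketch, not part of the original setup:

```hcl
resource "aws_ecs_cluster" "ecs_cluster_with_insights" {
  name = "terraform-ecs-cluster-${terraform.workspace}"

  # Enables CloudWatch Container Insights (CPU, memory, network metrics)
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}
```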

TASK DEFINITION

A task definition describes one or more containers (up to a maximum of ten) that form the application.

You can specify multiple parameters for a task definition depending on the launch type. Read more on Amazon ECS Task Definitions.

resource "aws_ecs_task_definition" "task_example" {
  family = "service"
  cpu    = 512
  memory = 512
  container_definitions = jsonencode([
    {
      name      = "hello"
      cpu       = 512
      memory    = 512
      image     = data.aws_ecr_repository.ecr_example.repository_url
      essential = true
      portMappings = [
        {
          containerPort = <port>
          hostPort      = <port>
        }
      ]
    }
  ])
  requires_compatibilities = ["EC2"]
  network_mode             = "awsvpc"
}

SERVICES

A service runs and maintains a specified number of instances of a task definition; each running instance is called a task. The service scheduler maintains the desired count of tasks depending on the scheduling strategy used.

There are two service scheduler strategies available:

REPLICA

By default, this strategy places and maintains the desired number of tasks across Availability Zones. For more information, see Replica.

DAEMON

This strategy deploys one task on each active container instance that meets all of the task placement constraints specified in the cluster. It can also stop running tasks that no longer meet the placement constraints. There is no need to specify a desired number of tasks, a task placement strategy, or Service Auto Scaling policies for this strategy. For more information, see Daemon.
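A daemon service could be declared like this. This is a sketch: the resource name is illustrative, and the network_configuration block is included because the task definition shown earlier uses awsvpc mode:

```hcl
resource "aws_ecs_service" "daemon_example" {
  name                = "ecs-daemon-${terraform.workspace}"
  cluster             = var.ecs_cluster_id
  task_definition     = aws_ecs_task_definition.task_example.arn
  launch_type         = "EC2"
  scheduling_strategy = "DAEMON"
  # No desired_count: one task runs on each active container instance.
  network_configuration {
    subnets         = [var.first_pvt_subnet_id, var.second_pvt_subnet_id]
    security_groups = [var.ecs_security_group_id]
  }
}
```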

resource "aws_ecs_service" "service_example" {
  name                  = "ecs-service-${terraform.workspace}"
  cluster               = var.ecs_cluster_id
  task_definition       = aws_ecs_task_definition.task_example.arn
  launch_type           = "EC2"
  desired_count         = 1
  wait_for_steady_state = true
  load_balancer {
    container_name   = "hello"
    container_port   = 3000
    target_group_arn = var.ecs_alb_tg_arn
  }
  network_configuration {
    subnets         = [var.first_pvt_subnet_id, var.second_pvt_subnet_id]
    security_groups = [var.ecs_security_group_id]
  }
}
// scheduling_strategy (not specified above) defaults to REPLICA.

NETWORKING

For the EC2 launch type, the allowable network modes depend on the underlying EC2 instance’s OS.
For Linux, the awsvpc, bridge, host, and none modes can be used.
For Windows, only the NAT mode is allowed.

  • awsvpc — The task is allocated its own elastic network interface (ENI) and a primary private IPv4 address. This gives the task the same networking properties as Amazon EC2 instances.
  • bridge — The task utilises Docker’s built-in virtual network which runs inside each Amazon EC2 instance hosting the task.
  • host — The task bypasses Docker’s built-in virtual network and maps container ports directly to the ENI of the Amazon EC2 instance hosting the task. As a result, you can’t run multiple instantiations of the same task on a single Amazon EC2 instance when port mappings are used.
  • none — The task has no external network connectivity.
  • NAT — Docker for Windows uses a different network mode than Docker for Linux.

You can read Amazon ECS task networking for details. For more information about Docker networking, see Networking overview.
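As an aside, in bridge mode you can ask for a dynamic host port by setting hostPort to 0, which lets several copies of the same task share one instance despite port mappings. A hypothetical container_definitions fragment:

```hcl
# Illustrative fragment of container_definitions for bridge mode:
portMappings = [
  {
    containerPort = 3000 # port the app listens on inside the container
    hostPort      = 0    # 0 = let ECS assign an ephemeral host port
  }
]
```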

The recommended awsvpc mode, in more detail:

This mode allocates an elastic network interface to each running task, providing a dynamic private IP address and an internal DNS name. awsvpc allows tasks to run with full networking features on AWS.

Advantages of awsvpc:

  • Addressable by the IP address and DNS name of the ENI.
  • Attachable as ‘IP’ targets to Application Load Balancers and Network Load Balancers.
  • Observable in VPC flow logs.
  • Integrated with CloudWatch logging and Container Insights.
  • Access controlled by security groups.
  • Enables running multiple copies of the same task definition on the same instance, without port conflicts.
  • Higher performance, because there is no need to perform port translation or contend for bandwidth on the shared docker0 bridge, as with the bridge networking mode.

CAPACITY PROVIDER

Capacity providers manage the infrastructure (EC2 instances) that a cluster’s tasks run on. An EC2 capacity provider is paired with an Auto Scaling Group (ASG) for cluster auto scaling.

Each cluster has one or more capacity providers and an optional default capacity provider strategy. The capacity provider strategy determines how the tasks are spread across the capacity providers.

With EC2 capacity providers, you can spread tasks across different ASGs, for example an On-Demand ASG and a Spot Instance ASG. Check out the official AWS documentation for details.

You can also configure a warm pool for the ASG to reduce instance launch time.
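Such a warm pool could be configured with a warm_pool block inside the aws_autoscaling_group resource. A minimal sketch, with illustrative sizing values:

```hcl
# Fragment to place inside the aws_autoscaling_group resource:
warm_pool {
  pool_state                  = "Stopped" # pre-initialised instances wait in Stopped state
  min_size                    = 1
  max_group_prepared_capacity = 2
}
```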

resource "aws_autoscaling_group" "auto_scale" {
  name                  = "asg-${terraform.workspace}"
  max_size              = var.asg_max_size
  min_size              = var.asg_min_size
  desired_capacity      = var.asg_desired_capacity
  protect_from_scale_in = true
  launch_template {
    id = aws_launch_template.launch_config.id
  }
  vpc_zone_identifier = [var.first_pvt_subnet_id, var.second_pvt_subnet_id]
}

resource "aws_ecs_capacity_provider" "capacity_provider" {
  name = "capacity-provider-${terraform.workspace}"
  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.auto_scale.arn
    managed_termination_protection = "ENABLED"
    managed_scaling {
      target_capacity = 100
    }
  }
}

resource "aws_ecs_cluster_capacity_providers" "cluster_capacity_provider" {
  cluster_name       = aws_ecs_cluster.ecs_cluster.name
  capacity_providers = [aws_ecs_capacity_provider.capacity_provider.name]
  default_capacity_provider_strategy {
    base              = 1
    weight            = 100
    capacity_provider = aws_ecs_capacity_provider.capacity_provider.name
  }
}

ECS AUTO SCALING

This enables us to increase or decrease the number of ECS tasks, i.e. application scaling.

Kinds of auto scaling:

  • Target tracking: scales based on the target value of a specific CloudWatch metric. Three metrics support such application scaling: CPU utilisation, memory utilisation, and ALB request count per target. For example, keep average CPU utilisation of the service at 40% at all times.
  • Step scaling: scales based on a specified CloudWatch alarm. For example, if CPU stays above 70% for a period, add tasks; if it stays below 30% for a period, remove a task.
  • Scheduled scaling: scales based on a specified date/time, for predictable usage patterns.

Note: for an EC2-backed cluster, scaling needs to be considered in two places — for the infrastructure (EC2 instances) and for the application (tasks).
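Target tracking for the service could be wired up with Application Auto Scaling resources. A sketch, assuming the cluster and service shown earlier; the capacity bounds and target value are illustrative:

```hcl
resource "aws_appautoscaling_target" "ecs_service_target" {
  min_capacity       = 1
  max_capacity       = 4
  resource_id        = "service/${aws_ecs_cluster.ecs_cluster.name}/${aws_ecs_service.service_example.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu_target_tracking" {
  name               = "cpu-target-tracking-${terraform.workspace}"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_service_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_service_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_service_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 40 # keep average CPU utilisation around 40%
  }
}
```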

APPLICATION LOAD BALANCER (ALB)

You can run your service behind a load balancer, which distributes traffic across the tasks associated with the service. The ALB accepts requests from users and forwards them directly to the ECS tasks running in the cluster.

resource "aws_lb" "ecs_alb" {
  name                       = "ecs-alb-${terraform.workspace}"
  security_groups            = [var.alb_security_group_id]
  drop_invalid_header_fields = true
  subnets = [
    var.first_public_subnet_id,
    var.second_public_subnet_id
  ]
}

resource "aws_lb_target_group" "ecs_alb_target_group" {
  name        = "ecs-alb-target-group-${terraform.workspace}"
  port        = <port>
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"
}

resource "aws_lb_listener" "alb_listener" {
  load_balancer_arn = aws_lb.ecs_alb.arn
  port              = <port>
  protocol          = "HTTP"
  default_action {
    target_group_arn = aws_lb_target_group.ecs_alb_target_group.arn
    type             = "forward"
  }
  tags = {
    Name = "lb-listener-${terraform.workspace}"
  }
}
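A health check is commonly added to the target group so the ALB only routes traffic to healthy tasks. A sketch, with a hypothetical health endpoint and illustrative thresholds:

```hcl
# Fragment to place inside the aws_lb_target_group resource:
health_check {
  path                = "/health" # hypothetical health endpoint of the app
  protocol            = "HTTP"
  interval            = 30 # seconds between checks
  healthy_threshold   = 3
  unhealthy_threshold = 3
  matcher             = "200" # HTTP status considered healthy
}
```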

You can refer to my sample code here.

Architecture of the code linked above:

  • A VPC with 2 public and 2 private subnets.
  • The subnets are distributed across 2 availability zones.
  • An internet gateway attached to the public subnets for internet accessibility.
  • A NAT gateway (created in one public subnet) attached to the private subnets for internet accessibility.
  • Security groups for the ALB (accepts requests only on the specified port) and the ECS service (accepts requests only from the ALB).
  • The ECS cluster uses the EC2 launch type. A launch template specifies the EC2 instance details. To fetch the latest ECS-optimised and cost-effective AMI ID, run:
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id --region <AWS region>
  • The ECS service and auto scaling group are in the 2 private subnets.
  • The ALB is in the 2 public subnets.
  • The ECS cluster can be set up in different environments. Here 2 environments are used — Prod and QA.
  • One or more ECS services can be deployed on each of these clusters. Here 1 service has been set up.
  • A single AWS region is used.
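The AMI lookup from the list above can also be done directly in Terraform with an SSM parameter data source, instead of the CLI. A sketch:

```hcl
data "aws_ssm_parameter" "ecs_ami" {
  name = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
}

# Then referenced in the launch template, e.g.:
# image_id = data.aws_ssm_parameter.ecs_ami.value
```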

Hope this article gives you a fair overview of launching applications using ECS!
