Note: Some terms and definitions may have changed. EKS may have superseded ECS.
You may be a growing business that has to spin up and scale its Docker containers/services dynamically as demand grows. All of those dynamically created containers have to be load balanced, monitored for failures and restarts, and autoscaled based on network traffic or on cluster metrics such as CPU and RAM utilization.
A container cluster is essentially a collection of Docker containers running on multiple machines/servers. In our context of ECS, each cluster is specific to one Docker image that runs as multiple replicas for high availability. In other words, it is not a composition of different containers/services created from different Docker images.
First, we need to understand some AWS jargon before we start creating a container cluster.
Load balancers: AWS provides several load balancers to choose from based on your requirements.
Elastic Load Balancer (ELB) is the classic load balancer. It balances load between EC2 instances/machines whose services run on one single port (e.g. port 80 for web services). It is mostly used for non-dockerized applications.
Application Load Balancer (ALB) is a more recent addition to the load balancer family and has more features, such as dynamic port mapping (multiple host ports can be dynamically mapped to the load balancer) and path-based routing. We have used an ALB as our load balancer.
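To make path-based routing a little more concrete, here is a minimal Python sketch that mimics how a listener could pick a target group: rules are checked in priority order, the first matching path pattern wins, and a default applies otherwise. The rule table, the target group names and the `pick_target_group` helper are all hypothetical; this is a local simulation, not a call to any AWS API.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (priority, path pattern, target group name).
RULES = [
    (10, "/api/*", "api-targets"),
    (20, "/static/*", "static-targets"),
]
DEFAULT_TARGET_GROUP = "web-targets"  # assumed default action


def pick_target_group(path, rules=RULES, default=DEFAULT_TARGET_GROUP):
    """Return the target group of the first matching rule, lowest priority number first."""
    for _priority, pattern, target_group in sorted(rules):
        if fnmatchcase(path, pattern):
            return target_group
    return default


print(pick_target_group("/api/users"))
print(pick_target_group("/index.html"))
```

Each path prefix can then be served by a different set of containers behind the same load balancer.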
VPC (Virtual Private Cloud): If your cluster has to interact with multiple other services in the cloud, they should all share the same VPC. A VPC spans the availability zones within a region.
Subnet: A subnet further isolates resources by IP address range within a single availability zone; it cannot span availability zones.
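As a rough illustration of "isolation by IP address range", the following Python sketch carves one subnet per availability zone out of a VPC address range using only the standard library. The CIDR block `10.0.0.0/16`, the `/24` subnet size and the zone names are assumed example values, not anything prescribed by this article.

```python
import ipaddress

# Assumed example values: a VPC CIDR block and two availability zones.
vpc_cidr = ipaddress.ip_network("10.0.0.0/16")
availability_zones = ["us-east-1a", "us-east-1b"]

# Carve one /24 subnet per availability zone out of the VPC range.
subnets = dict(zip(availability_zones, vpc_cidr.subnets(new_prefix=24)))

for zone, subnet in subnets.items():
    # Each subnet is a non-overlapping slice of the VPC range.
    print(zone, subnet, subnet.subnet_of(vpc_cidr))
```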
IAM (Identity and Access Management): If you have IAM roles defined for any of the other services, your cluster should share the same IAM roles.
Security group: A security group acts as a virtual firewall for EC2 instances; its rules regulate which ports may communicate in and out.
Availability zones: AWS is spread across the world in multiple regions, and each region is subdivided into multiple availability zones. Your instances can run in multiple availability zones for high availability.
Cluster: A cluster is a logical grouping of EC2 instances that ECS uses to start containers. Each cluster resource (EC2 instance) runs a container agent that facilitates spinning up Docker containers on it.
Task definition: A task definition is an important chunk of configuration for an ECS cluster. It is roughly a combination of what you would put in your Dockerfile and docker-compose.yml. The important things you can configure are your Docker image, hard and soft limits for RAM and CPU, the type of EC2 instance/cluster to use, environment variables and the host port mapping. With an ALB, set the host port to 0 so that ECS works out the port mapping between your instances and the ALB.
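For orientation, here is a minimal sketch of what such a task definition might look like, built as a plain Python dictionary and printed as JSON. The field names follow the ECS task definition format as I understand it; the family name, image, limits and environment variable are made-up example values.

```python
import json

# Hypothetical names and values; only the field names follow the ECS format.
task_definition = {
    "family": "web-app",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "example/web-app:latest",
            "cpu": 256,                # CPU units (1024 units = 1 vCPU)
            "memory": 512,             # hard limit in MiB
            "memoryReservation": 256,  # soft limit in MiB
            "essential": True,
            "portMappings": [
                # hostPort 0 asks ECS to pick a free host port dynamically.
                {"containerPort": 8080, "hostPort": 0, "protocol": "tcp"}
            ],
            "environment": [
                {"name": "APP_ENV", "value": "production"}
            ],
        }
    ],
}

print(json.dumps(task_definition, indent=2))
```

Because the host port is chosen dynamically, several copies of the same container can run on one EC2 instance without port clashes.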
Service: A service lets you run a specified number of tasks of a particular task definition on the cluster. The service scheduler monitors these tasks/containers for failures and restarts them automatically.
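The "keep N tasks running" behaviour can be pictured with a small simulation. This is only a sketch of the idea, not how the ECS scheduler is implemented: the `reconcile` function, the task ids and the status strings are hypothetical, and it only handles replacing stopped tasks, not scaling down.

```python
from itertools import count


def reconcile(desired_count, tasks):
    """Return the running tasks after one pass: drop stopped ones, start replacements.

    tasks maps a task id to its status ("RUNNING" or "STOPPED").
    """
    running = {tid: status for tid, status in tasks.items() if status == "RUNNING"}
    # Generate fresh ids that were never used before.
    new_ids = (f"task-{n}" for n in count() if f"task-{n}" not in tasks)
    while len(running) < desired_count:
        running[next(new_ids)] = "RUNNING"
    return running


tasks = {"task-0": "RUNNING", "task-1": "STOPPED", "task-2": "RUNNING"}
print(reconcile(3, tasks))
```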
Note: Prerequisites for creating an auto scaling group are a launch configuration, target groups and an ALB.
Auto scaling groups: ECS takes care of scaling the Docker containers across the available cluster resources, while an auto scaling group takes care of scaling the cluster resources themselves, i.e. the number of EC2 instances available in the cluster. You can attach a load balancer to the auto scaling group so that its health checks decide when an instance is replaced, and define scaling policies based on CloudWatch metrics. Several types of scaling policy are available.
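To give a feel for what a scaling policy does, here is a simplified Python sketch of a threshold-based rule that adds or removes one instance and stays within a minimum and maximum size. The `next_capacity` function and every threshold in it are assumptions for illustration; real policies are configured in AWS rather than written like this.

```python
def next_capacity(current, cpu_percent, min_size=1, max_size=4,
                  scale_out_above=75.0, scale_in_below=25.0):
    """Return the new instance count for one evaluation of the (assumed) policy."""
    if cpu_percent > scale_out_above:
        current += 1  # cluster is busy: add an instance
    elif cpu_percent < scale_in_below:
        current -= 1  # cluster is idle: remove an instance
    # Never go outside the group's configured bounds.
    return max(min_size, min(max_size, current))


for cpu in (90.0, 50.0, 10.0):
    print(cpu, next_capacity(2, cpu))
```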
Note: You can attach either a load balancer or target groups to an auto scaling group. A load balancer can in turn be attached to a target group.
Launch configuration: An auto scaling group can be created from a launch configuration. It defines the type of EC2 instance, the image on those instances, detailed CloudWatch monitoring, IP addresses, security groups, region, IAM roles and the key pairs used to log in to those EC2 instances.
Note: Once you have a launch configuration, you can create an auto scaling group from it.
Target groups: The load balancer uses target groups to perform health checks on targets and route traffic to them. A target group takes a port, protocol, VPC and health check configuration. After a target group is created, you need to register targets with it so that the load balancer can route traffic to those specific targets.
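The settings a target group takes can be sketched as a plain dictionary, followed by a small simulation of how consecutive health check results might flip a target between healthy and unhealthy. The keys mirror the parameter names of the target group creation API as I understand them; the values (including the VPC id and `/health` path) and the `health_state` helper are hypothetical.

```python
# Hypothetical values; keys modelled on the target group creation parameters.
target_group = {
    "Name": "web-targets",
    "Protocol": "HTTP",
    "Port": 80,
    "VpcId": "vpc-0123456789abcdef0",
    "HealthCheckProtocol": "HTTP",
    "HealthCheckPath": "/health",
    "HealthCheckIntervalSeconds": 30,
    "HealthyThresholdCount": 3,
    "UnhealthyThresholdCount": 2,
}


def health_state(results, config, state="unhealthy"):
    """Replay a list of check results (True = passed) and return the final state."""
    ok_streak = fail_streak = 0
    for ok in results:
        if ok:
            ok_streak, fail_streak = ok_streak + 1, 0
            if ok_streak >= config["HealthyThresholdCount"]:
                state = "healthy"
        else:
            ok_streak, fail_streak = 0, fail_streak + 1
            if fail_streak >= config["UnhealthyThresholdCount"]:
                state = "unhealthy"
    return state


print(health_state([True, True, True], target_group))
```

A single failed check does not immediately take a target out of rotation; it takes the configured number of consecutive failures.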
ECS (Elastic Container Service): A service provided by AWS for dynamically scaling a Docker container cluster across instances and availability zones based on metrics from CloudWatch, load balancers etc.
ECR (Elastic Container Registry): It is similar to Docker Hub, a place where Docker images can be stored. You can have multiple repositories and each can hold multiple images. All the repositories under one AWS account are logically grouped as a registry.
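The registry/repository/image relationship shows up in the image address itself. The sketch below builds an ECR image URI from its parts; the `ecr_image_uri` helper, the account id, the region and the repository name are example values, not ones taken from this article.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an image URI: one registry per account and region, then repository and tag."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


print(ecr_image_uri("123456789012", "us-east-1", "web-app", "1.0.0"))
# → 123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:1.0.0
```

This is the kind of value you would put in the image field of a task definition.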
CloudWatch: Each EC2 instance created in AWS has a daemon that runs constantly and collects statistics about the instance to estimate its health. CloudWatch also provides a dashboard view of metrics such as CPU, RAM and network in/out.
There are two ways to create a container cluster using ECS.
- Using the ECS first-run wizard, which walks through all the steps required to create a cluster, such as the load balancer, task definition, auto scaling groups and target groups.
- The manual way, where you create a target group, load balancer, launch configuration, auto scaling group, task definition, container service, container cluster, IAM roles (if needed), security groups, VPC, subnets, availability zones etc.
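With the manual approach, the order in which you create things matters because some resources need others to exist first. The following Python sketch (standard library `graphlib`, Python 3.9+) derives one possible creation order from a dependency map. The auto scaling group prerequisites come from the glossary above; the remaining dependencies are my assumptions for illustration and may differ in your setup.

```python
from graphlib import TopologicalSorter

# resource -> resources that must exist first (partly assumed, see lead-in).
depends_on = {
    "subnets": {"vpc"},
    "security group": {"vpc"},
    "target group": {"vpc"},
    "load balancer": {"subnets", "security group", "target group"},
    "launch configuration": {"security group", "IAM role"},
    "auto scaling group": {"launch configuration", "target group", "load balancer"},
    "service": {"cluster", "task definition", "load balancer", "auto scaling group"},
}

# static_order() yields every resource after all of its prerequisites.
order = list(TopologicalSorter(depends_on).static_order())

for step, resource in enumerate(order, start=1):
    print(step, resource)
```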
Both approaches have advantages and disadvantages:
Option 1 advantages: You can instantly spin up a container cluster by just following the on-screen steps, without knowing much about security policies, auto scaling groups, load balancers etc. Also, every cluster created this way creates a CloudFormation stack template. Having a template makes it easy to clean up resources/leftovers by just removing the stack from CloudFormation.
Option 1 disadvantages: It is not suited to production clusters. Every time, the task definition, service definition, cluster definition and load balancer are created with default names, default IAM roles and default VPCs. You don't have much control over the naming conventions.
Option 2 advantages: Much more control over each and every service. All the service definitions can be altered separately. You create services only once and reuse them.
Option 2 disadvantages: You need a good understanding of all of the jargon above to create a running cluster.