Amazon ECS (EC2 Container Service) provides a container management service in AWS without the need to own cluster management infrastructure. This story captures the salient features of AWS container offerings as a quick reference, including the re:Invent 2017 announcements.
We start with a quick glossary of the various components of ECS, followed by its key capabilities.
re:Invent 2017 Announcements on Containers
Amazon EKS is a managed Kubernetes service in AWS. EKS runs 3 Kubernetes masters across 3 AZs and manages their software upgrades. It integrates with Elastic Load Balancing, IAM and VPC.
EKS is in preview and is expected to be GA in early 2018.
AWS Fargate allows running containers without managing servers or clusters.
- Fargate mode: Package your containers, specify the container specs, IAM and networking, and AWS will manage the containers for you
Traditional ECS (and EKS) runs in EC2 mode. You use ECS or EKS to manage a cluster of servers and schedule tasks, and you are responsible for managing the server lifecycle.
The sections below have a comprehensive coverage of Traditional ECS. In future articles, I will cover EKS and Fargate. Stay tuned.
Key facts and Glossary
Cost: Amazon ECS itself is free of cost; we only pay for the resources we use (EC2 instances, EBS volumes, etc.)
Service Scope: ECS is a regional service that can be deployed across multiple availability zones within a VPC
Containers: ECS supports Docker containers. A Docker container contains everything an application needs to run, including all dependencies such as code, runtime, system tools and libraries.
Image: Containers are created from a read-only template called an image. Images are typically built from a Dockerfile
Registry: Images are stored in a Docker registry. You can use your own registry, Docker Hub, or the AWS-provided ECR (EC2 Container Registry)
Task Definitions: Task Definitions are blueprints that define which containers to use, their resource specifications such as memory and CPU, the ports to expose, the volumes to use, the permissions of the containers through an IAM role, and networking details.
Task: A Task is an instantiation of a Task Definition. Tasks can be run individually or as part of a Service
Cluster: An ECS Cluster is a logical grouping of container instances. ECS downloads the container images onto the EC2 instances and runs the containers as described by the Task Definitions
Container Instances: Container Instances are EC2 instances that are part of an ECS Cluster. They can belong to one or more ASGs or be individual EC2 instances. Container Instances must have Docker and the ECS Agent installed, or alternatively can use an ECS Optimized AMI from Amazon. Container instances must have an IAM role with ecsInstanceRole permissions. A container instance cannot be moved to a different cluster, and its instance type cannot be changed.
ECS Service: An ECS Service is a way to run a specific version of a Task Definition with a specified number of tasks and a deployment plan
Container Agent: The ECS Container Agent is a Docker container that runs on every ECS Container Instance. The agent syncs with the ECS service to run tasks accurately and report their status. The agent needs the Container Instance IAM role to run with the right permissions, and it also requires connectivity to the ECS API endpoints.
- Tasks can be run manually using RunTask API
- Tasks can be run on a schedule using CloudWatch Events (like cron jobs)
- Custom schedulers like blox can be plugged in
- Tasks can be run through Services, which keep a specific number of tasks running and use a deployment configuration that controls how new versions are rolled out. Check Services
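The ways of running tasks listed above can be sketched as plain request dictionaries. This is a minimal sketch, assuming the parameter names of the boto3 `run_task`, `put_rule` and `create_service` calls; the cluster, rule, service and task definition names are made up for illustration, and nothing here talks to AWS.

```python
# Request builders only: nothing in this sketch calls AWS.

def run_task_request(cluster, task_definition, count=1):
    """Keyword arguments for a one-off ECS RunTask call."""
    return {"cluster": cluster, "taskDefinition": task_definition, "count": count}


def schedule_rule_request(rule_name, schedule_expression):
    """Keyword arguments for a CloudWatch Events PutRule call (cron-like trigger)."""
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }


def create_service_request(cluster, service_name, task_definition, desired_count):
    """Keyword arguments for an ECS CreateService call with a rolling-update policy."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        # Example values: allow the service to dip to 50% and surge to 200%
        # of the desired count while a new version is deployed.
        "deploymentConfiguration": {"maximumPercent": 200, "minimumHealthyPercent": 50},
    }


# Hypothetical names, for illustration only.
one_off = run_task_request("demo-cluster", "report-job:3")
nightly = schedule_rule_request("nightly-report", "cron(0 2 * * ? *)")
service = create_service_request("demo-cluster", "web", "web:7", desired_count=4)

# With boto3 installed and credentials configured, these could be sent as:
#   boto3.client("ecs").run_task(**one_off)
#   boto3.client("events").put_rule(**nightly)
#   boto3.client("ecs").create_service(**service)
```

A schedule rule on its own does nothing until an ECS task is attached to it as a target, which is a separate call not shown here.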
Task Placement Strategies
When placing tasks, one or more strategies can be applied to achieve the desired distribution
- binpack places tasks on the instances with the least available CPU or memory, which minimizes the number of container instances in use
- random places tasks randomly
- spread places tasks evenly based on a specified field; for example, you can spread by AZ and then by instanceId
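To make the difference between binpack and spread concrete, here is a toy simulation. It is only an illustration of the idea and not how the ECS scheduler is actually implemented; the `Instance` class, the instance IDs and the memory figures are all invented for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    az: str
    free_memory: int  # MiB still available on the instance


def sample_instances():
    """A made-up three-instance cluster spanning two AZs."""
    return [
        Instance("i-a", "az1", 1024),
        Instance("i-b", "az2", 2048),
        Instance("i-c", "az1", 4096),
    ]


def place(instances, task_memory, count, strategy):
    """Place `count` tasks of `task_memory` MiB and return the chosen instance IDs."""
    tasks_per_az = Counter()
    placements = []
    for _ in range(count):
        candidates = [i for i in instances if i.free_memory >= task_memory]
        if not candidates:
            break  # no instance can fit another task
        if strategy == "binpack":
            # Fill the instance with the least free memory that still fits.
            chosen = min(candidates, key=lambda i: i.free_memory)
        elif strategy == "spread":
            # Prefer the AZ that currently runs the fewest tasks.
            chosen = min(candidates, key=lambda i: tasks_per_az[i.az])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        chosen.free_memory -= task_memory
        tasks_per_az[chosen.az] += 1
        placements.append(chosen.instance_id)
    return placements


binpacked = place(sample_instances(), task_memory=512, count=3, strategy="binpack")
spread_out = place(sample_instances(), task_memory=512, count=3, strategy="spread")
```

In this toy model binpack keeps filling the smallest instance before touching the others, while spread alternates between the two AZs.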
Task Placement Constraints
When placing tasks, constraints can be enforced as well.
- distinctInstance: Places each task on a different instance
- memberOf: Places a task only on instances that satisfy an expression, for example only on t2 instances or on instances launched from a specific AMI. Refer to the link for details
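As a rough sketch, strategies and constraints are passed alongside a RunTask or CreateService request as two lists. The field names below follow the ECS API as I understand it, and the `attribute:ecs.instance-type =~ t2.*` expression is an example of the cluster query language; the helper function itself is hypothetical.

```python
def placement_options(spread_fields=(), distinct_instance=False, member_of=None):
    """Build the placementStrategy and placementConstraints request fields."""
    strategy = [{"type": "spread", "field": field} for field in spread_fields]
    constraints = []
    if distinct_instance:
        constraints.append({"type": "distinctInstance"})
    if member_of:
        constraints.append({"type": "memberOf", "expression": member_of})
    return {"placementStrategy": strategy, "placementConstraints": constraints}


# Spread by AZ first, then by instance, and only consider t2 instances.
options = placement_options(
    spread_fields=("attribute:ecs.availability-zone", "instanceId"),
    member_of="attribute:ecs.instance-type =~ t2.*",
)

# These keys could be merged into a RunTask request, for example:
#   boto3.client("ecs").run_task(cluster="demo-cluster",
#                                taskDefinition="web:7", **options)
```

The order of the strategy list matters: the first entry is applied first, and later entries break ties.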
Task level IAM permissions
- Each Task Definition can be associated with an IAM role for fine-grained permissions
Task level Networking
- bridge mode: Lets you take advantage of dynamic port mappings
- host mode: High performance, but container and host ports have to match, so there are no dynamic ports
- awsvpc mode: Each task gets its own ENI and private IP. Security Groups can be associated with each task, providing fine-grained security. The instance type limits how many such tasks can be run because of ENI limits
- Using the sourcePath attribute of a volume, containers can share a persistent volume
- Using an empty host parameter, containers can share a scratch volume that’s not persisted across task stop/start
- You can mount a read-only volume (like docroot) across many containers
- You can mount volumes from other containers in the same Task Definition using volumesFrom
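The task-level settings above (IAM role, network mode, and the volume options) come together in a single Task Definition. Below is a minimal sketch written as a Python dictionary in the shape accepted by the RegisterTaskDefinition API; the family name, role ARN, images and paths are placeholders, and the small `undeclared_volumes` check is a hypothetical helper of my own, not part of any AWS tooling.

```python
import json

task_definition = {
    "family": "web-with-helper",  # placeholder name
    "networkMode": "bridge",
    # Placeholder ARN: the role whose permissions the containers receive.
    "taskRoleArn": "arn:aws:iam::123456789012:role/exampleTaskRole",
    "volumes": [
        # Persistent: backed by a fixed path on the container instance.
        {"name": "docroot", "host": {"sourcePath": "/ecs/docroot"}},
        # Scratch: empty host, so the data does not survive the task.
        {"name": "scratch", "host": {}},
    ],
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",
            "memory": 256,
            "essential": True,
            # hostPort 0 asks for a dynamic host port in bridge mode.
            "portMappings": [{"containerPort": 80, "hostPort": 0}],
            "mountPoints": [
                {"sourceVolume": "docroot",
                 "containerPath": "/usr/share/nginx/html",
                 "readOnly": True},
                {"sourceVolume": "scratch", "containerPath": "/scratch"},
            ],
        },
        {
            "name": "helper",
            "image": "busybox:latest",
            "memory": 64,
            "essential": False,
            # Reuse every volume mounted by the "web" container, read-only.
            "volumesFrom": [{"sourceContainer": "web", "readOnly": True}],
        },
    ],
}


def undeclared_volumes(task_def):
    """Return volume names used in mountPoints but missing from 'volumes'."""
    declared = {v["name"] for v in task_def.get("volumes", [])}
    used = {
        mp["sourceVolume"]
        for container in task_def["containerDefinitions"]
        for mp in container.get("mountPoints", [])
    }
    return sorted(used - declared)


as_json = json.dumps(task_definition, indent=2)
```

The JSON string could be saved to a file for registration, or the dictionary passed as keyword arguments to a boto3 `register_task_definition` call.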
Service Load balancing
- An ECS Service can be load balanced with an ALB, an NLB or a Classic Load Balancer (ELB). Each Service can be attached to only 1 load balancer
- Application Load Balancer (ALB): Supports Application Layer (HTTP/HTTPS), Dynamic Ports, Path-based Routing, Priority rules and SSL Termination. No TCP load balancing
- Network Load Balancer (NLB): Supports transport layer (TCP/SSL). High throughput. Supports dynamic ports
- Classic Load Balancer (ELB): Supports both HTTP/HTTPS and TCP/SSL. Doesn’t support dynamic ports.
- If a task fails the load balancer health check, it will be killed and replaced
- Service Autoscaling adjusts desired count within the boundaries of Min/Max Capacity
- Service Autoscaling uses CloudWatch alarms to trigger scaling.
- Both CloudWatch alarms based on ECS metrics and custom CloudWatch alarms can be used as triggers
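The load balancing and autoscaling settings above can be sketched as two request dictionaries. This assumes the parameter names of the ECS CreateService call and of the Application Auto Scaling RegisterScalableTarget call; the ARN and names are placeholders, and `clamp_desired_count` is only a hypothetical illustration of how Min/Max Capacity bound the desired count.

```python
def load_balanced_service_request(cluster, service_name, task_definition,
                                  target_group_arn, container_name,
                                  container_port, desired_count):
    """Keyword arguments for CreateService behind an ALB/NLB target group."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": container_name,
            "containerPort": container_port,
        }],
    }


def scalable_target_request(cluster, service_name, min_capacity, max_capacity):
    """Keyword arguments for registering the service's desired count as scalable."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service_name}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def clamp_desired_count(requested, min_capacity, max_capacity):
    """Autoscaling never moves the desired count outside [min, max]."""
    return max(min_capacity, min(requested, max_capacity))


# Placeholder target group ARN, for illustration only.
svc = load_balanced_service_request(
    "demo-cluster", "web", "web:7",
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/example/0123456789abcdef",
    container_name="web", container_port=80, desired_count=2,
)
target = scalable_target_request("demo-cluster", "web", min_capacity=2, max_capacity=10)
```

Depending on the account setup and network mode, CreateService may also need an IAM role for the load balancer integration; that is left out of this sketch.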
Scaling Container Instances
- If Container Instances are part of an ASG, they can be scaled using the ECS Console or by modifying the desired capacity of the ASG
- The ASG scaling policies can be driven by CloudWatch alarms on the ECS cluster Reservation and Utilization metrics
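As an illustration of the second point, the dictionary below is shaped like a CloudWatch PutMetricAlarm request on the cluster's memory reservation. It is a sketch under assumptions: the `AWS/ECS` namespace, `MemoryReservation` metric and `ClusterName` dimension are the names I would expect for cluster-level metrics, while the threshold, periods and the scaling policy ARN are arbitrary placeholders.

```python
def cluster_reservation_alarm(cluster, scaling_policy_arn, threshold_percent=75.0):
    """Keyword arguments for an alarm that fires when memory reservation is high."""
    return {
        "AlarmName": f"{cluster}-memory-reservation-high",
        "Namespace": "AWS/ECS",
        "MetricName": "MemoryReservation",
        "Dimensions": [{"Name": "ClusterName", "Value": cluster}],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint (example value)
        "EvaluationPeriods": 3,    # consecutive breaching datapoints (example value)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # The ASG scale-out policy to trigger; placeholder ARN.
        "AlarmActions": [scaling_policy_arn],
    }


alarm = cluster_reservation_alarm(
    "demo-cluster",
    "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:EXAMPLE",
)

# With boto3 this could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Scaling on reservation rather than utilization tends to add instances before new tasks fail to place, since reservation reflects what tasks have claimed rather than what they currently use.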
Container Registry (ECR)
Amazon ECR is a managed Docker registry service.
- ECR is an account-level registry and a regional service.
- The EC2 Container Instance needs IAM permissions to access ECR
- ECR only supports private images, and access requires authentication with AWS account credentials
- ECS Container Agent logs can be shipped to CloudWatch Logs.
- The Container Instances will need appropriate IAM permissions to do so
- Container logs can be sent to CloudWatch Logs using the awslogs log driver
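For the awslogs log driver, the configuration lives inside each container definition. Here is a minimal sketch, assuming the `awslogs-group`, `awslogs-region` and `awslogs-stream-prefix` options; the log group, region and container values are invented for illustration.

```python
def awslogs_configuration(log_group, region, stream_prefix):
    """Build the logConfiguration block for a container definition."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


# Hypothetical container definition that ships stdout/stderr to CloudWatch Logs.
container_definition = {
    "name": "web",
    "image": "nginx:latest",
    "memory": 256,
    "logConfiguration": awslogs_configuration("/ecs/web", "us-east-1", "web"),
}
```

The log group generally has to exist before the task starts, so creating it is usually part of the same deployment step.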
Draining Container Instances
- You can prevent new tasks from being scheduled on a Container Instance by changing its status to DRAINING using the ECS Console or an ECS API call. This capability can be used to roll out AMI updates
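The API call mentioned above can be sketched as follows, assuming the parameter names of the boto3 `update_container_instances_state` call. The cluster name and container instance ARN are placeholders.

```python
def drain_request(cluster, container_instance_arns):
    """Keyword arguments that set the given container instances to DRAINING."""
    return {
        "cluster": cluster,
        "containerInstances": list(container_instance_arns),
        "status": "DRAINING",
    }


def activate_request(cluster, container_instance_arns):
    """Keyword arguments that return the instances to normal scheduling."""
    return {
        "cluster": cluster,
        "containerInstances": list(container_instance_arns),
        "status": "ACTIVE",
    }


# Placeholder ARN, for illustration only.
old_instances = [
    "arn:aws:ecs:us-east-1:123456789012:container-instance/example-0001",
]
drain = drain_request("demo-cluster", old_instances)

# With boto3 this could be sent as:
#   boto3.client("ecs").update_container_instances_state(**drain)
```

For an AMI rollout, one plausible sequence is to launch instances from the new AMI, drain the old ones with a request like this, and terminate them once their tasks have moved.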
Remote Management of Container Instances
- You can use EC2 Systems Manager to remotely perform tasks like cleaning up Docker images, applying security updates, viewing logs, etc.
- Run Command will need an appropriate IAM policy on the instances
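A Run Command invocation for the image clean-up case might look like the sketch below. It assumes the boto3 SSM `send_command` parameters and the `AWS-RunShellScript` document; the instance ID is a placeholder, and the exact Docker command is just one possible choice.

```python
def run_shell_request(instance_ids, commands, comment=""):
    """Keyword arguments for an SSM SendCommand call that runs shell commands."""
    return {
        "InstanceIds": list(instance_ids),   # EC2 instance IDs, not ECS ARNs
        "DocumentName": "AWS-RunShellScript",
        "Comment": comment,
        "Parameters": {"commands": list(commands)},
    }


cleanup = run_shell_request(
    ["i-0123456789abcdef0"],                 # placeholder instance ID
    ["docker image prune -a -f"],            # remove images not used by any container
    comment="Remove unused Docker images",
)

# With boto3 this could be sent as:
#   boto3.client("ssm").send_command(**cleanup)
```

Note that Run Command addresses EC2 instance IDs, so the container instance has to be mapped back to its underlying EC2 instance first.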
Running Containers at startup time
- Many times, we need system containers that have to run exactly once on every instance, e.g. security or monitoring agents. Running them from plain startup scripts won’t give ECS visibility into the resources they use.
- You can run ECS managed tasks at startup by using a special user data section as described in this link. The ecsInstanceRole will need an IAM policy that allows the instance to start tasks (the StartTask action) to accomplish this
Private Registry Authentication
- Private Docker registries can be authenticated as described here
Image and task clean up
- Unused images and finished tasks can be cleaned up using ECS agent settings as described here
Access Container and Agent Metadata
- Container metadata can be read from the file whose path is stored in the environment variable ECS_CONTAINER_METADATA_FILE. The file can be used to look up details of the container such as its image and port mappings
- ECS agent provides API access for introspection using
- Many corporate environments sit behind firewalls and need a proxy configured
- The ECS Agent can be configured with an HTTP proxy as described in the link
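Reading the container metadata file mentioned above from inside a container can be sketched as follows. The key names (`ImageName`, `PortMappings`, `HostPort`) reflect my understanding of the file format, the sample content is made up so the code runs outside ECS, and the feature may need to be enabled in the agent configuration.

```python
import json
import os
import tempfile


def read_container_metadata(environ=None):
    """Return the parsed metadata file, or None if the variable is not set."""
    environ = os.environ if environ is None else environ
    path = environ.get("ECS_CONTAINER_METADATA_FILE")
    if not path:
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demo with a made-up metadata file so the sketch runs outside ECS.
sample = {
    "ContainerName": "web",
    "ImageName": "nginx:latest",
    "PortMappings": [{"ContainerPort": 80, "HostPort": 32768, "Protocol": "tcp"}],
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

metadata = read_container_metadata({"ECS_CONTAINER_METADATA_FILE": tmp.name})
os.remove(tmp.name)

# Which host ports did the dynamic port mapping assign?
host_ports = [mapping["HostPort"] for mapping in metadata["PortMappings"]]
```

Inside a real task the function would be called without arguments so that it reads the process environment.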
ECS Available Metrics
CW Event integration
- ECS publishes events that can trigger CloudWatch Events rules, which in turn can invoke targets such as Lambda functions to take action
- Task Definitions should group containers with a common purpose. Arbitrarily grouping containers will make scheduling difficult
- If the ECS Agent is disconnected, make sure that you deregister the Container Instance to prevent a corrupted state
- Always stay up-to-date with ECS Container Agent versions
- Validate your Docker version with ECS Container Agent version