Is EC2 Container Service the Right Choice on AWS?
ECS architecture and its features
As of today, there are a handful of container cluster management platforms available for deploying applications in production using containers: Kubernetes, OpenShift Origin, DC/OS, and Docker Swarm, to name just a few. Almost all of them can be deployed on any infrastructure, including AWS. Nevertheless, AWS also provides its own container cluster management platform called EC2 Container Service (ECS). At a glance, some may think that ECS would be the right choice, as it is likely to be tightly integrated with other AWS services. However, before making a quick decision it is worthwhile to go through the ECS architecture and see how things work internally. In this article we will go through its features and the EC2 resources required for setting up an ECS cluster, and finally evaluate whether ECS is the best fit for a container-based deployment on AWS.
EC2 Container Service Architecture
ECS uses tasks for scheduling containers on the container cluster, similar to DC/OS. A task definition specifies the container image, port mappings (container ports, protocols, host ports), networking mode (bridge or host), and memory limits. Once a task definition is created, tasks can be started using the service scheduler, using a custom scheduler, or by running tasks manually. The service scheduler is used for long-running applications, while manual task creation suits batch jobs. If business-specific scheduling logic is needed, a custom scheduler can be implemented. When scheduled, a task creates a container on one of the cluster hosts by pulling the container image from the given container registry and applying the port mappings, networking configuration, and resource limits.
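To make the structure concrete, here is a minimal sketch of a task definition payload as it would be passed to the RegisterTaskDefinition API (for example via boto3's `register_task_definition`). The family name, image, and port values are hypothetical:

```python
# Sketch of an ECS task definition payload. The family, image, and
# port values are hypothetical placeholders.
task_definition = {
    "family": "web-app",            # logical name shared by revisions of this task
    "networkMode": "bridge",        # "bridge" or "host"
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:1.13",  # image pulled from the configured registry
            "memory": 256,          # hard memory limit in MiB
            "essential": True,      # stop the whole task if this container stops
            "portMappings": [
                # hostPort 0 requests a dynamic (ephemeral) host port in bridge mode
                {"containerPort": 80, "hostPort": 0, "protocol": "tcp"},
            ],
        }
    ],
}
```

A service or manual run then references this definition by `family:revision`.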
Once a container is created, the ECS service uses the health checks defined in the load balancer and automatically recovers containers that become unhealthy. The healthy and unhealthy conditions of the containers can be fine-tuned to the application's requirements by changing the health check configuration.
In ECS, CloudWatch alarms need to be used for setting up autoscaling. Here AWS has reused its existing monitoring features for measuring resource utilization and making scale-up/scale-down decisions. ECS also appears to support scaling the EC2 instances of the cluster itself.
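Service-level scaling is wired up through Application Auto Scaling, which adjusts the service's desired task count when a CloudWatch alarm fires. The sketch below shows the shape of the two request payloads involved; the cluster and service names are hypothetical:

```python
# Sketch of the Application Auto Scaling payloads for scaling an ECS
# service on a CloudWatch alarm. Cluster/service names are hypothetical.
scalable_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/demo-cluster/web-app",    # cluster/service pair
    "ScalableDimension": "ecs:service:DesiredCount", # what gets scaled
    "MinCapacity": 2,
    "MaxCapacity": 10,
}

# A step scaling policy attached to that target; a CloudWatch CPU alarm
# would invoke it to add one task at a time.
scaling_policy = {
    "PolicyName": "scale-out-on-cpu",
    "PolicyType": "StepScaling",
    "StepScalingPolicyConfiguration": {
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 1},
        ],
        "Cooldown": 60,  # seconds to wait before another scaling activity
    },
}
```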
Currently, ECS exposes container ports using dynamic host port mappings and does not provide an overlay network. As a result, each container port is mapped to an ephemeral host port (between 49153 and 65535) on the container host if the networking mode is set to bridge. If the host networking mode is used, the container port is opened directly on the host, and consequently only one such container can run on a given container host. Load balancing for these host ports can be done by creating an application load balancer and linking it to an ECS service. The load balancer automatically updates its target registrations based on the dynamic host ports reported by the service.
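The link between an ECS service and the application load balancer is declared in the CreateService payload: ECS registers each task's dynamic host port against the given target group. A sketch, with a placeholder target group ARN and hypothetical names:

```python
# Sketch of a CreateService payload linking an ECS service to an
# application load balancer target group. The ARN is a placeholder.
service = {
    "cluster": "demo-cluster",
    "serviceName": "web-app",
    "taskDefinition": "web-app:1",  # family:revision
    "desiredCount": 2,
    "loadBalancers": [
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:placeholder:targetgroup/web",
            "containerName": "web",  # matched against the task definition
            "containerPort": 80,     # ECS registers the mapped dynamic host port
        }
    ],
}
```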
It is important to note that, due to this design, containers on different hosts cannot directly communicate with each other without first discovering their corresponding host ports. The other solution is to route traffic through the load balancer, provided the relevant protocols support load balancing. Protocols such as JMS, AMQP, MQTT, and Apache Thrift, which use client-side load balancing, might not work well behind a TCP load balancer and would need to discover the host ports dynamically.
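One way to discover those host ports is to query the DescribeTasks API, whose response lists each container's network bindings. The sketch below parses a trimmed, hypothetical response of that shape:

```python
# Sketch of discovering a container's dynamic host port from a
# DescribeTasks response (tasks -> containers -> networkBindings).
# The response below is a trimmed, hypothetical example of that shape.
describe_tasks_response = {
    "tasks": [
        {
            "containers": [
                {
                    "name": "web",
                    "networkBindings": [
                        {"bindIP": "0.0.0.0", "containerPort": 80,
                         "hostPort": 49321, "protocol": "tcp"},
                    ],
                }
            ]
        }
    ]
}

def host_port(response, container_name, container_port):
    """Return the host port mapped to a container port, or None."""
    for task in response["tasks"]:
        for container in task["containers"]:
            if container["name"] != container_name:
                continue
            for binding in container["networkBindings"]:
                if binding["containerPort"] == container_port:
                    return binding["hostPort"]
    return None

print(host_port(describe_tasks_response, "web", 80))  # 49321
```

A client would combine this with the container instance's address to reach a peer directly; that lookup is exactly the extra step an overlay network would have avoided.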
Container Image Management
ECS supports pulling container images from both public and private container registries that are accessible from AWS. When accessing private registries, Docker credentials can be provided via environment variables. AWS also provides its own registry service, EC2 Container Registry (ECR), for managing container images within the same AWS network. This is useful for production deployments, as it avoids network issues that may arise when accessing external container registries.
Security
AWS recommends setting up any deployment on AWS within a Virtual Private Cloud (VPC) to isolate its network from other deployments that may be running on the same infrastructure. The same applies to ECS. The ECS instances need a security group that restricts the ephemeral port range so that it can be accessed only by the load balancer. This prevents direct access to the container hosts from any other hosts. If SSH access is needed, a key pair can be provided at ECS cluster creation time and port 22 can be added to the security group on demand. For both security and reliability, it is better to use EC2 Container Registry (ECR) and maintain all required container images within AWS.
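That security group rule can be expressed as a single ingress permission on the instances' group, opening the dynamic port range only to the load balancer's group. A sketch, with placeholder group IDs:

```python
# Sketch of a security group ingress rule opening the dynamic host port
# range (49153-65535) on the ECS instances only to the load balancer's
# security group. Group IDs are placeholders.
ingress_rule = {
    "GroupId": "sg-ecs-instances",  # security group of the ECS container instances
    "IpPermissions": [
        {
            "IpProtocol": "tcp",
            "FromPort": 49153,
            "ToPort": 65535,
            "UserIdGroupPairs": [
                {"GroupId": "sg-load-balancer"}  # only the ALB may connect
            ],
        }
    ],
}
```

Referencing the load balancer's group ID instead of a CIDR range means the rule keeps working even as load balancer nodes change addresses.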
Depending on the deployment architecture of the solution, the load balancer's security group may need to be configured either to restrict inbound traffic to a specific network or to open it to the internet. This design ensures that only the load balancer ports are accessible from external networks.
Centralized Logging
Any container-based deployment needs a centralized logging system for monitoring and troubleshooting, as not all users will have direct access to container logs or container hosts. ECS provides a solution for this using CloudWatch Logs. At the moment CloudWatch Logs does not seem to provide advanced query features such as those offered by Apache Lucene in Elasticsearch. Nevertheless, Amazon Elasticsearch Service or a dedicated Elasticsearch container deployment could be used as an alternative.
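Shipping container logs to CloudWatch Logs is configured per container in the task definition via the awslogs log driver. A sketch of that block, with a hypothetical log group and region:

```python
# Sketch of the logConfiguration block in an ECS container definition
# that sends logs to CloudWatch Logs via the awslogs driver. The log
# group name and region are hypothetical.
log_configuration = {
    "logDriver": "awslogs",
    "options": {
        "awslogs-group": "/ecs/web-app",   # log group must be created beforehand
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "web",    # prefix for per-container log streams
    },
}
```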
EC2 Resources Needed for ECS
ECS itself incurs no additional charge; pricing is calculated for the EC2 and related AWS resources used by a deployment. A typical ECS deployment needs the following resources:
- A virtual private cloud (VPC)
- An ECS cluster definition
- EC2 instances for the ECS cluster
- A security group for the above EC2 instances
- ECS task definitions for containers to be deployed
- An ECS service for each task definition
- An application load balancer
- Target groups for the load balancer
- A security group for the load balancer
Choosing ECS on AWS over other Container Cluster Managers
At the time of writing, I was able to identify only one advantage of using ECS on AWS over other container cluster managers: with ECS, the cluster manager's control plane is provided as a managed service. If Kubernetes, OpenShift Origin, DC/OS, or Docker Swarm is used on AWS, a set of EC2 instances is needed for running the controller and its dependent components with high availability. The same advantage applies to Kubernetes on Google Cloud Platform (GCP), where the master and etcd nodes are provided as managed services. Nevertheless, in terms of container cluster management features, ECS still lacks some of the key capabilities provided by other vendors: overlay networking, service discovery via DNS, rollouts/rollbacks, secret/configuration management, and multi-tenancy.
In conclusion, it is evident that ECS provides the core container cluster management features required for deploying containers in production. Most of them have been implemented by reusing existing AWS services such as EC2 instances, elastic load balancing, CloudWatch alarms/logs, and security groups. Therefore, a collection of AWS resources is needed for setting up a complete deployment; nevertheless, a CloudFormation template can be used to automate this process. For anyone evaluating ECS, it is better to first identify the infrastructure requirements of the applications and verify their availability in ECS. If applications need direct container-to-container communication, use client-side load-balanced protocols, or expose multiple ports, ECS might not work well for them at the moment.
Recently, at AWS re:Invent 2017, AWS announced a new managed service called Elastic Container Service for Kubernetes (EKS). It is much like Google Kubernetes Engine (GKE), where the container cluster manager is provided as a service without users having to manage it themselves. EKS might be a better option for running containers on AWS than ECS.