Using Amazon ECS at iFixit

Chris Opperwall
iFixit Engineering
Mar 13, 2018 · 6 min read

At iFixit, we primarily run a monolithic PHP web application on a LAMP stack to power nearly all aspects of iFixit.com and Dozuki. We recently started a project to set up a data warehouse, and decided to try running analytical and data aggregation tasks as Docker containers orchestrated by Amazon’s Elastic Container Service (ECS). Using Docker isn’t big news anymore, but it was a big first step for us. The following post is a quick summary of what we’ve learned about Amazon ECS so far.

Clusters

A cluster is effectively a pool of resources to run tasks on. It is made up of multiple EC2 instances, across which Amazon ECS automatically schedules tasks. If you want to run a lot of tasks that don’t take much memory or CPU time, you can make a cluster out of a lot of small EC2 instances; if you have individual tasks that need a large amount of memory or CPU time, you can make a cluster out of a few large EC2 instances.
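
Creating a cluster is a one-liner with the AWS CLI. Here’s a rough sketch (the cluster name is made up, and the second step assumes an ECS-optimized AMI, which ships with the ECS agent preinstalled):

```
# Create an empty cluster to schedule tasks onto.
aws ecs create-cluster --cluster-name data-warehouse

# On each EC2 instance that should join the cluster, point the ECS agent
# at it before the agent starts:
echo "ECS_CLUSTER=data-warehouse" >> /etc/ecs/ecs.config
```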

Tasks

To run an application on Amazon ECS, you have to create a task definition. A task definition specifies things like the network context in which the application will run (the current VPC or a different one), which containers the task will run, and how much memory and CPU the task will need. Each container assigned to the task can be configured with environment variables, per-container CPU and memory limits, exposed ports, Docker volumes, and so on: basically any configuration you can give a Docker container.
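
As a sketch, registering a single-container task definition with the AWS CLI looks roughly like this (the family, image, and values below are all made up for illustration):

```
# Describe one container: image to run, resource limits, and environment.
cat > taskdef.json <<'EOF'
{
  "family": "etl-github",
  "containerDefinitions": [
    {
      "name": "etl-github",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/etl-github:1.0.0",
      "cpu": 256,
      "memory": 512,
      "essential": true,
      "environment": [
        { "name": "WAREHOUSE_HOST", "value": "warehouse.internal" }
      ]
    }
  ]
}
EOF

# Register it with ECS; re-registering the same family bumps the revision.
aws ecs register-task-definition --cli-input-json file://taskdef.json
```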

We currently use tasks primarily to run containers that periodically pull down data from third-party services and load that data into our data warehouse.
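
One way to wire up those periodic runs (for EC2-backed tasks; see the Fargate caveat later in this post) is a CloudWatch Events rule that starts the task on a schedule. A sketch, with all names and ARNs as placeholders:

```
# Fire every night at 08:00 UTC.
aws events put-rule --name nightly-github-pull \
  --schedule-expression "cron(0 8 * * ? *)"

# Point the rule at the cluster and task definition via an IAM role that
# is allowed to call ecs:RunTask.
aws events put-targets --rule nightly-github-pull --targets '[{
  "Id": "etl-github",
  "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/data-warehouse",
  "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
  "EcsParameters": {
    "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/etl-github",
    "TaskCount": 1
  }
}]'
```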

NOTE: A task should not be used to run multiple containers that do not depend on each other. When one container exits (successfully or not), the task stops all other containers running within it. It is probably best to start out with one container per task.

Services

An Amazon ECS Service is responsible for running a specified number of instances of a given Task. This number can be scaled up or down to handle different workloads. Amazon makes it easy to place these Tasks behind an Application Load Balancer (ALB), which will balance requests across all of your Tasks.

One hypothetical service could be used to run a “Dozuki” task, which would be a containerized version of our current “app” machine type for the Dozuki SaaS. The service could be configured to always run four app tasks and place them behind a load balancer. If an app task went away or failed for whatever reason, the service would create a new app task and automatically replace the failed one. Changing the number of running app tasks would also be as simple as changing the number of tasks to maintain in the service configuration.
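
A sketch of what creating that service might look like (the cluster, service, and target group names are hypothetical, and the ALB and target group are created separately):

```
# Keep four copies of the task running, registered with an ALB target group.
aws ecs create-service \
  --cluster dozuki \
  --service-name app \
  --task-definition dozuki-app \
  --desired-count 4 \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app/abc123,containerName=app,containerPort=80"

# Scaling later is just a change to the desired count.
aws ecs update-service --cluster dozuki --service app --desired-count 6
```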

One service we currently run is called Accretion. Accretion is a Sinatra app that listens for POST requests and dumps the JSON body of each request into a data warehouse backend. This is a private service that provides an easy interface for storing documents in our data warehouse. The ECS accretion Service maintains three Tasks, each running an accretion HTTP server. Those Tasks are placed behind a load balancer, so POST requests to the service are distributed across the three tasks. The accretion ECS Service performs regular health checks on each accretion Task, and will replace failed Tasks with new ones.

We’re currently looking into running Pulldasher, a GitHub dashboard, as an ECS Service instead of running it on an EC2 instance.

Docker Image Registry

In order to run a container for a task or service, Amazon needs to be able to create that container from a Docker image. That image can be hosted in Amazon Elastic Container Registry (ECR), Docker Hub, or a self-hosted registry.

You can think of an image registry as a remote git repository. Whenever you update the code that runs in the container, you build a new Docker image, tag it with a version number, and upload it to a remote place it can be pulled from. Amazon ECR is a private Docker registry, so it’s a good place to put images that are either very iFixit-specific or that have keys or other credentials baked in. Storing credentials in Docker images probably isn’t best practice, but if you do, at least the images won’t be public in ECR. Consider storing configuration and credentials in a service like etcd, or passing them to the containers as environment variables, instead.

How To Add A New Docker Image

Go to the ECS page on the AWS Console to see the Repository link in the sidebar. The repository view lists all existing Docker repositories and lets you add new ones.

To add a new one, click on the big blue “Create Repository” button.

The next page will ask you to name the repository.

Once you’ve named the repository, the next page will give you the specific AWS CLI commands to authenticate your docker client with Amazon’s Docker registry, and the correct docker commands to build, tag, and push your image to Amazon ECR.
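
The exact commands come from the console, but they look roughly like this (the account ID and repository name are placeholders; older versions of the AWS CLI use `aws ecr get-login` instead of `get-login-password`):

```
# Authenticate the local Docker client against the ECR registry.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Build, tag, and push the image.
docker build -t accretion .
docker tag accretion:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/accretion:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/accretion:latest
```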

Assuming you’ve gotten through all of those commands, you can now start Amazon ECS tasks from that docker image.

EC2 vs Fargate

There are two different methods of running ECS tasks. The first (EC2) requires you to provision and maintain the EC2 instances that ECS tasks run on. The second, called Fargate, lets Amazon handle all orchestration of where your ECS tasks run: given the resource requirements for a task, ECS runs it somewhere in Amazon’s cloud. Both have tradeoffs, so it’s nice to know what you’re getting and what you’re giving up with each method.
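
The difference mostly shows up as a single flag when starting a task. A sketch (the names and subnet are placeholders, and the Fargate variant assumes the task definition is Fargate-compatible):

```
# EC2 launch type: the task lands on one of your cluster's own instances.
aws ecs run-task --cluster data-warehouse --task-definition etl-github \
  --launch-type EC2

# Fargate launch type: Amazon picks the machine; you supply the networking.
aws ecs run-task --cluster data-warehouse --task-definition etl-github \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0abc1234],assignPublicIp=ENABLED}"
```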

EC2-backed tasks

When using an ECS cluster with EC2 instances, it is your responsibility to know how many tasks the cluster can support at a time, and to know the largest amount of resources any single container will need. If you have a cluster of 1,000 t2.nano instances (0.5GB RAM each), you may be able to run thousands of small containers that each use under 100MB of memory, but you won’t be able to schedule a task that uses 1GB of memory, because no single compute node in the cluster can provide that much at one time.
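
ECS will tell you what headroom each instance has left, which helps with this bookkeeping. A sketch with the CLI (the cluster name is a placeholder):

```
# Show remaining CPU and memory on every container instance in the cluster.
aws ecs describe-container-instances \
  --cluster data-warehouse \
  --container-instances $(aws ecs list-container-instances \
      --cluster data-warehouse --query 'containerInstanceArns[]' --output text) \
  --query 'containerInstances[].{instance: ec2InstanceId, remaining: remainingResources[?name==`CPU` || name==`MEMORY`]}'
```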

However, with great responsibility comes great power (or something like that). Because you control the EC2 instances that are running those containers, you can SSH into those machines and inspect what is happening at a lower level. If some containers are randomly killed off and there aren’t any application logs explaining why, you can inspect the stopped containers to get more information about why they were killed.

Fargate-backed tasks

With Fargate, Amazon handles all aspects of figuring out where and how to run your task. Given how many resources a task and its containers need, Amazon finds somewhere to run it among their large number of machines. This means that there is no idle cost for running EC2 instances while you are not running tasks, and you don’t have to worry about whether or not your EC2 cluster has enough resources to run all of your tasks. Fargate isn’t as “serverless” as a FaaS platform like AWS Lambda, but it does hide a lot of the complexity of how and where to run your containers.

Fargate does have a few drawbacks. The service is fairly new, so some integrations, like scheduling tasks through CloudWatch Events, are not supported yet. You can also only run Fargate tasks in the us-east-1 region.

The biggest drawback, in my opinion, is that debugging failures is a little harder. There is no EC2 instance for you to SSH into and poke around in to see what happened. We had an issue where some containers in an ECS Task were being SIGKILLed by the host. Because we could SSH into the host that ran the containers, we were able to use docker inspect and other Docker CLI tools to debug the stopped containers.
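
For reference, this is the kind of post-mortem you can do on an EC2-backed host but not on Fargate (the container ID is whatever `docker ps` shows):

```
# Find recently exited containers on the host...
docker ps -a --filter status=exited --last 10

# ...and check how one died. An exit code of 137 means SIGKILL, and
# OOMKilled says whether Docker killed it for exceeding its memory limit.
docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.Error}}' <container-id>
```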

Which Should I Pick?

For now, EC2-backed tasks have proven to be a nice middle ground between running applications on EC2 instances directly and moving toward the serverless side of things.

In the future I could see us migrating to Fargate-backed tasks once the team is a little more familiar with Docker and containers in general.
