Lessons learned in containers

Jon Vines
May 2, 2018 · 7 min read

This is the third post in the series discussing some small lessons learned whilst developing on AWS. For the context of the series and what we’re looking to achieve, please read the introductory post An introduction to some small lessons learned developing on AWS.

This post is going to focus on building and deploying containers on AWS using Docker and ECS. First, we’re going to take a look at what containers are. We’ll follow this up with some of the options currently available for deploying containers in AWS. We’re then going to take a look at some of the lessons we’ve learned whilst developing applications in containers on AWS.

What are containers

Perhaps surprisingly, containers have been around for a very long time, since 1979. However, it was only in 2013, with the introduction of Docker, that container usage became widespread. The Docker site offers a very good introduction to what a container is.

A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings.

A container is an abstraction at the app layer that packages code and dependencies together. This allows us to isolate the software packaged within the container from its surroundings, lower costs, manage resources better, keep a consistent environment and run anywhere.

Options in AWS

AWS currently offers three options for running containers: Amazon ECS, Amazon EKS and AWS Fargate. AWS also provides a private container repository known as Amazon ECR.

Amazon EKS is AWS’ offering of Kubernetes as a Service. At the time of writing, Amazon EKS is still in preview. This means you have to request access to the platform.

AWS Fargate is a serverless approach to hosting and running containers. This looks like a great option, but it is not something we have explored yet. Fargate is available in the N. Virginia region and, more recently, in Ohio, Oregon and Ireland.

Amazon ECS is AWS’ container orchestration service. This means we register EC2 instances with the ECS cluster, which then manages the lifecycle of our containers. It has deep integration with other AWS services. Given the limited availability of EKS and Fargate at the time, this is the option we took.

Another option we did consider was to run our own Kubernetes cluster. We didn’t go down this route because we didn’t want to manage the infrastructure of our container orchestration platform. This was not, and should not be, a core competency for a team that is just getting set up.

A good overview of Choosing your container environment on AWS with ECS, EKS, and Fargate is provided by Nathan Peck.

How did we start

Our first foray into using containers was with Windows containers for our applications written using the .NET Framework 4.6. This worked OK, but we were beset by some early issues. Initially, the Windows images were pretty big, with a container size of approximately 1.5 GB. We also had issues with the AWS agent registration via Terraform, which meant we struggled to get a cluster up and running and auto-scaling as expected. Once we had the ECS cluster up and running, we then saw some inconsistent performance from the containers themselves. Some would freeze on startup and would require a redeploy of the service or a restart of the cluster instance.

It’s important to note that we experienced all of these issues whilst Windows containers on AWS were not yet generally available. A lot of the aforementioned issues may well have gone away now. Experiencing these issues resulted in us migrating our code base to .NET Core 2.0. This opened up two quite exciting opportunities: the ability to host our containers on a Linux image and to build our applications using Bitbucket Pipelines.

Microsoft is clearly investing heavily in Linux, including integrating it into Windows through the Windows Subsystem for Linux. Given the appetite shown by Microsoft, we gained huge confidence in our decision to migrate our code base in this direction.

What does it take to get a container running in ECS

If you’ve chosen to go down the ECS route, like us, the first step is to build an ECS cluster for the container to run in. We managed this through Terraform, specifying the same AMI ID that is used by default when creating a new cluster via the dashboard. The instances were part of an auto-scaling group that scaled on CPU and memory usage.

An example of the Terraform script can be seen here:
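As a minimal sketch only (not the exact script we used), the shape of such a configuration can be expressed in Python that emits Terraform’s JSON syntax. The resource labels, instance type, group sizes, the 70% CPU target and the `var.*` references are all assumptions made for illustration. A memory-based policy is left out and would sit alongside the CPU one.

```python
import json


def ecs_cluster_config(cluster_name, ami_id, instance_type="t2.medium",
                       min_size=1, max_size=4, cpu_target=70.0):
    """Build a Terraform JSON document for an ECS cluster and its hosts.

    Resource labels ("main", "ecs_host", ...) and defaults are hypothetical.
    """
    # The ECS agent reads the cluster name from /etc/ecs/ecs.config on boot.
    user_data = (
        "#!/bin/bash\n"
        f"echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config\n"
    )
    return {
        "resource": {
            "aws_ecs_cluster": {"main": {"name": cluster_name}},
            "aws_launch_configuration": {
                "ecs_host": {
                    "name_prefix": f"{cluster_name}-",
                    "image_id": ami_id,
                    "instance_type": instance_type,
                    # Assumed to be declared elsewhere in the project.
                    "iam_instance_profile": "${var.ecs_instance_profile}",
                    "security_groups": ["${var.ecs_host_security_group_id}"],
                    "user_data": user_data,
                }
            },
            "aws_autoscaling_group": {
                "ecs_hosts": {
                    "name": f"{cluster_name}-hosts",
                    "min_size": min_size,
                    "max_size": max_size,
                    "launch_configuration":
                        "${aws_launch_configuration.ecs_host.name}",
                    # Assumed to be a list variable of subnet IDs.
                    "vpc_zone_identifier": "${var.private_subnet_ids}",
                }
            },
            "aws_autoscaling_policy": {
                "cpu": {
                    "name": f"{cluster_name}-cpu",
                    "autoscaling_group_name":
                        "${aws_autoscaling_group.ecs_hosts.name}",
                    "policy_type": "TargetTrackingScaling",
                    "target_tracking_configuration": {
                        "predefined_metric_specification": {
                            "predefined_metric_type": "ASGAverageCPUUtilization"
                        },
                        "target_value": cpu_target,
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    # "ami-PLACEHOLDER" is a stand-in, not a real AMI ID.
    print(json.dumps(ecs_cluster_config("example-cluster", "ami-PLACEHOLDER"),
                     indent=2))
```

Writing the output to a file such as `main.tf.json` lets Terraform read it like any other configuration file.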

Once our cluster is up and running, we can start to think about deploying our first container service. There are two steps here. First, we need to deploy a container image to a container registry. Second, we need to create a task definition that is referenced by a service within the cluster. The task definition contains all the information we need to run the container, and the service uses the task definition to run it. One other thing to be ready for is whether your container needs load balancing. If it does, you will need to specify an ALB to manage this. More on this later.

Publishing Docker images to ECR

Before we get to running our containers, we need to publish them to a container registry. This can be any public or private registry. For us, this was ECR. Initially, we had a separate registry for staging and production. This introduced additional cost and complexity that isn’t required. A good tagging strategy for your uploaded Docker images will suffice here.

As part of your build pipeline, create the Docker image as normal. You’ll then need to tag your Docker image with a version number, log in to AWS ECR and push to the registry. An example approach can be seen here:
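One way to picture those steps is as the list of shell commands a build step would run. The sketch below only builds the command strings and does not execute anything. The account ID, region, repository name and version in the usage example are placeholders, and `aws ecr get-login-password` assumes a reasonably recent AWS CLI (older versions used `aws ecr get-login` instead).

```python
def ecr_registry(account_id, region):
    """Return the hostname of an account's ECR registry in a region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, version):
    """Return the full ECR reference for a versioned image."""
    return f"{ecr_registry(account_id, region)}/{repository}:{version}"


def publish_commands(account_id, region, repository, version):
    """List the shell commands a build step would run, in order."""
    registry = ecr_registry(account_id, region)
    target = ecr_image_uri(account_id, region, repository, version)
    return [
        # Build the image locally and give it a version tag.
        f"docker build -t {repository}:{version} .",
        # Add the registry-qualified name so docker knows where to push.
        f"docker tag {repository}:{version} {target}",
        # Authenticate the docker client against ECR.
        f"aws ecr get-login-password --region {region}"
        f" | docker login --username AWS --password-stdin {registry}",
        f"docker push {target}",
    ]


if __name__ == "__main__":
    # All four values below are placeholders.
    for command in publish_commands("123456789012", "eu-west-1",
                                    "orders-api", "1.4.2"):
        print(command)
```

Using the build number or a release version as `version` keeps a single registry usable for both staging and production.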

Building a Task Definition

Once we have our image in ECR, we can start to define our task definition. Some lessons we learned as we went along related to host port assignment, logging and log groups, and targeting the correct Docker image from the registry.

To give context to the following points, an example task definition can be seen here:
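A minimal sketch of a task definition, built as a Python dictionary and serialised to the JSON that ECS expects. The family name, image reference, container port, memory limit, log group and region are hypothetical values. The three points discussed in this section appear as `hostPort`, `logConfiguration` and the versioned `image`.

```python
import json


def task_definition(family, image, container_port, log_group, region,
                    memory_mib=256):
    """Build a single-container ECS task definition as a dictionary."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                # Explicit version rather than :latest.
                "image": image,
                "memory": memory_mib,
                "essential": True,
                "portMappings": [
                    {
                        "containerPort": container_port,
                        # 0 asks ECS to pick a free host port dynamically.
                        "hostPort": 0,
                        "protocol": "tcp",
                    }
                ],
                # Send the container's stdout/stderr to CloudWatch Logs.
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": family,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Every value here is a placeholder for illustration.
    definition = task_definition(
        family="orders-api",
        image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/orders-api:1.4.2",
        container_port=5000,
        log_group="/ecs/orders-api",
        region="eu-west-1",
    )
    print(json.dumps(definition, indent=2))
```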

If you want to load balance traffic, you must set the host port to 0, which tells ECS to assign a free host port dynamically. This is important for two reasons. First, it is what allows the load balancer to route traffic to each container. Second, it means we can run more than one instance of the container on each underlying ECS host, because the instances no longer compete for a fixed port. Another important thing to remember is that if you enable auto port assignment, you must add the ephemeral port range to the security group ingress rules so that traffic flows in as expected.
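The ingress rule might look something like the following sketch, again emitted as Terraform JSON from Python. The resource label and the security group IDs are hypothetical. The range 32768 to 65535 is an assumption that is commonly used for dynamic port mapping; the range actually in use on a host can be read from `/proc/sys/net/ipv4/ip_local_port_range`.

```python
import json

# Assumed ephemeral range; verify against the hosts you actually run.
EPHEMERAL_PORTS = (32768, 65535)


def alb_to_hosts_ingress(host_sg_id, alb_sg_id, port_range=EPHEMERAL_PORTS):
    """Allow the load balancer to reach dynamically assigned host ports."""
    from_port, to_port = port_range
    return {
        "resource": {
            "aws_security_group_rule": {
                "alb_to_ecs_hosts": {
                    "type": "ingress",
                    "protocol": "tcp",
                    "from_port": from_port,
                    "to_port": to_port,
                    # The group attached to the ECS host instances.
                    "security_group_id": host_sg_id,
                    # Only traffic originating from the ALB's group is allowed.
                    "source_security_group_id": alb_sg_id,
                }
            }
        }
    }


if __name__ == "__main__":
    # Placeholder references to groups assumed to be defined elsewhere.
    rule = alb_to_hosts_ingress("${aws_security_group.ecs_hosts.id}",
                                "${aws_security_group.alb.id}")
    print(json.dumps(rule, indent=2))
```

Restricting the source to the load balancer’s security group keeps the wide port range from being open to anything else.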

When it comes to logging, it is best to utilise CloudWatch log groups to capture the logs produced by the Docker container. You will need to create the log group before you reference it in the task definition, but this can easily be handled inside Terraform. Once done, searching through the logs of containers as they are created or destroyed becomes much easier. This also prevents you having to log on to an ECS host to debug the container instances. As an aside, a very good resource for Docker logging techniques can be seen in the post To Boldly Log by Ryan Davidson.
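Creating the log group up front is a small piece of configuration. As a sketch, in the same Python-to-Terraform-JSON style, with a hypothetical resource label, group name and a 30 day retention period chosen purely for illustration:

```python
import json


def log_group_config(name, retention_days=30):
    """Build Terraform JSON for the CloudWatch log group a task writes to."""
    return {
        "resource": {
            "aws_cloudwatch_log_group": {
                "service_logs": {
                    # Must match "awslogs-group" in the task definition.
                    "name": name,
                    "retention_in_days": retention_days,
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(log_group_config("/ecs/orders-api"), indent=2))
```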

Finally, an important note to remember is how we reference the Docker image to be instantiated from the task definition. Initially, we always targeted :latest, but this meant we ran into some conflicts when deploying between different environments. Our recommendation is to be explicit in the version you are targeting from within the task definition.
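A guard like this can live in the deployment pipeline. The sketch below is a hypothetical helper, not part of any AWS tooling, that rejects image references that are untagged or tagged `latest`, and accepts references pinned by version or digest.

```python
def image_tag(image):
    """Return the tag of a Docker image reference, or None if it has none."""
    if "@" in image:
        # Pinned by digest (repo@sha256:...), so there is no tag to report.
        return None
    # Only look after the last "/" so a registry port is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def require_explicit_version(image):
    """Return the image unchanged, or raise ValueError if it is not pinned."""
    if "@" in image:
        return image
    tag = image_tag(image)
    if tag is None or tag == "latest":
        raise ValueError(f"image {image!r} must reference an explicit version")
    return image


if __name__ == "__main__":
    # Placeholder image names for illustration.
    print(require_explicit_version("registry.example.com/orders-api:1.4.2"))
    try:
        require_explicit_version("registry.example.com/orders-api:latest")
    except ValueError as error:
        print(error)
```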

Getting the container running

Once we’ve deployed our Docker image to ECR, and defined its running parameters via its task definition, we can define the service which allows our container to run. The service definition has two key features: specifying the task definition of the instance we wish to run, and determining how many instances of the container we want running. We also specify whether this should be in an auto scaling group of its own. Note, this is different to the auto scaling group of the host ECS cluster.
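As a sketch of those two key features, the function below builds the arguments that the boto3 ECS client’s `create_service` call accepts. The cluster, service and container names, the count and the target group ARN are hypothetical, and service-level auto scaling and IAM settings are left out.

```python
def service_request(cluster, name, task_definition, desired_count,
                    target_group_arn=None, container_name=None,
                    container_port=None):
    """Build keyword arguments for the ECS create_service API call."""
    request = {
        "cluster": cluster,
        "serviceName": name,
        # Which task definition (family:revision or ARN) to run...
        "taskDefinition": task_definition,
        # ...and how many copies of it to keep running.
        "desiredCount": desired_count,
    }
    if target_group_arn is not None:
        if container_name is None or container_port is None:
            raise ValueError("load balancing needs a container name and port")
        request["loadBalancers"] = [
            {
                "targetGroupArn": target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ]
    return request


if __name__ == "__main__":
    # Placeholder names; with boto3 installed this would be passed as
    # boto3.client("ecs").create_service(**request).
    request = service_request("example-cluster", "orders-api",
                              "orders-api:4", desired_count=2)
    print(request)
```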

A final challenge we encountered came with deploying new versions of our container services. This actually requires two steps. First, create a new task definition that points to the newly deployed ECR image. Second, update the service to point to the new task definition. Updating the service triggers a rolling deploy of the new container instance as defined in the task. A first step for us here was to use the script ecs-deploy. This initially worked well for us, but may be too simple an approach as our container environment grows. We’d also like to manage canary releases in finer detail prior to a full roll out.
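The two steps can be sketched in Python against the boto3 ECS client. This is an illustration of the idea rather than how ecs-deploy itself is implemented. The `ecs` argument is assumed to be a `boto3.client("ecs")` (or anything with the same methods), and the cluster, service, container and image names are up to the caller.

```python
import copy

# Fields of a described task definition that can be registered again.
# Read-only fields such as "revision" and "status" must be dropped.
REGISTERABLE_KEYS = (
    "family", "taskRoleArn", "executionRoleArn", "networkMode",
    "containerDefinitions", "volumes", "placementConstraints",
    "requiresCompatibilities", "cpu", "memory",
)


def with_new_image(task_def, container_name, image):
    """Copy a described task definition, swapping one container's image."""
    new_def = {key: copy.deepcopy(task_def[key])
               for key in REGISTERABLE_KEYS if key in task_def}
    matched = False
    for container in new_def["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = image
            matched = True
    if not matched:
        raise ValueError(f"no container named {container_name!r}")
    return new_def


def deploy(ecs, cluster, service, container_name, image):
    """Register a new task definition revision and roll the service onto it."""
    described = ecs.describe_services(cluster=cluster, services=[service])
    current_arn = described["services"][0]["taskDefinition"]
    current = ecs.describe_task_definition(
        taskDefinition=current_arn)["taskDefinition"]

    # Step 1: new task definition revision pointing at the new image.
    registered = ecs.register_task_definition(
        **with_new_image(current, container_name, image))
    new_arn = registered["taskDefinition"]["taskDefinitionArn"]

    # Step 2: point the service at it, which triggers the rolling deploy.
    ecs.update_service(cluster=cluster, service=service,
                       taskDefinition=new_arn)
    return new_arn
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before it is pointed at a real cluster.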

In conclusion

We’ve learnt quite a bit on our journey of building, deploying and running containers in AWS. I believe we’re well set up to consider the new offerings, Fargate and EKS. With Fargate, we can find very simple solutions to our basic container deployment strategies. And with EKS, we may achieve the more advanced scheduling we’re looking for. With some custom tooling in our deployment pipeline, we can also extend our current scheduling capabilities using ECS. The container ecosystem in AWS means we can tailor our approach per container instance type, which gives us great flexibility.

A lot of the challenges with containers rest in the scheduling and placement of where they actually run. At a recent AWS Builders day in Manchester, Abby Fuller gave a really good talk on Advanced Container Management and Scheduling which I’d highly recommend.

Overall, one of the great redeeming features of containers is the ability they provide to experiment and push new features. They also allow us to scale out automatically based on current demand, with very little input from the team.

Written by

Jon Vines

Software Engineer and Team Lead at AO.com. Aspiring DevOps practitioner. Thoughts are my own.