Evolving our Infrastructure with Docker & ECS

Kushal Singh
Urban Company – Engineering
5 min read · Aug 10, 2017

With a growing team and feature set, the biggest challenge we faced in terms of developer productivity was how to build and release functionality in an independent and consistent manner. To launch an application into production, a large context had to be shared between developers and ops engineers (configuration, operational characteristics, system dependencies, file paths, permissions, etc.). This created an increasing dependency on the Platform & DevOps teams, which led to delays in the development cycle.

In our earlier infrastructure, we had a separate deployment stack for each service, consisting of a load balancer, an auto-scaling group, an AMI, deployment scripts, etc. A few of the complexities we faced in this architecture were:

  • Every service required its own tools and dependencies, which had to be baked into AMIs.
  • We had to set up logging and alerting again on every new machine.
  • New deployment scripts had to be written for every service.

Provisioning infrastructure for a new service was therefore time-consuming and did not scale.

A couple of months back, we took a fresh look at our architecture and realized that we could heavily cut provisioning time and infrastructure cost by having a single, consistent architecture that manages all services. Docker was the way to go. We have now retired most of our old instances and their launch configs; services run as Docker containers.

In this blog, I will share our learnings, which should be helpful on your way to dockerizing your production systems.

Build & Deployment process for individual services

With Docker, we now have a single deployment script for all services. Most common dependencies (node, wget, td-agent) are kept as part of a single base Docker image. The service owner includes any other dependency in the respective Dockerfile, which is used in the final docker build step.
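For illustration, a service's Dockerfile can stay minimal by extending the shared base image. This is a sketch, not our exact file; the image name, paths, and the extra dependency are hypothetical:

```dockerfile
# The base image already carries the common dependencies (node, wget, td-agent).
# "urbanclap/base-image" is an illustrative name, not a published image.
FROM urbanclap/base-image:latest

# Service-specific dependency, added by the service owner (hypothetical example).
RUN apt-get update -y && apt-get install -y imagemagick

WORKDIR /app
COPY . /app

# Install only production dependencies, as described later in this post.
RUN npm install --production

CMD ["node", "server.js"]
```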

The following diagram explains the build and deployment process for a service.

Overall architecture

In the new architecture, we run our services on an ECS cluster. Some noteworthy points:

  • Instances get attached to the cluster through auto-scaling groups, with scale-up and scale-down policies set appropriately.
  • Services launched in the cluster are configured with a minimum and maximum container count. Auto-scaling is also set at the service level, with thresholds on memory or CPU utilization depending on the type of service.
  • Each service has an active task definition which specifies the CPU units, memory, container-to-instance port bindings, and the Docker image path for the service.
  • We specify a dynamic host-to-container port mapping, i.e. 0:xxxx. Docker dynamically allocates a host port while the application runs on port xxxx inside the container. With this we are able to deploy multiple containers of the same type on the same machine, as the snippet below shows.
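For illustration, the dynamic binding lives in the portMappings section of the container definition inside the task definition; 8080 stands in for the xxxx application port, and hostPort 0 asks Docker to pick a free host port at launch:

```json
"portMappings": [
  { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
]
```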

The diagram below shows the overall infrastructure under the ECS cluster.

Reducing your docker image size

Keeping your Docker image size under control has a lot of practical benefits. It makes things faster, more portable, and less prone to breakage. The build process also takes less time, which reduces deployment time as well. With that in mind, let's talk about a few steps we took to reduce our initial image size from 1GB to ~250MB.

  1. Remove legacy modules: We removed a lot of legacy modules which were no longer in use. These were initially meant for client-side rendering: Sass, Compass, Bower, Grunt, Ruby, etc.
  2. Keep package.json clean: Using the npm-check module, we identified a lot of unused node modules. By moving these modules under devDependencies and then running npm install --production during deployment, we were able to get rid of them.
  3. .dockerignore file: Folders like .git, .tmp, bin, logs, test-run, etc. were added to the .dockerignore file (a sample is sketched after this list).
  4. Club commands together: Since each instruction in a Dockerfile is committed as a separate layer in the final image, it's better to club multiple commands into a single RUN. For example:
```dockerfile
RUN apt-get update -y \
    && rm /bin/sh && ln -s /bin/bash /bin/sh \
    && apt-get install wget curl -y \
    && curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh
```
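Referring back to point 3, a .dockerignore built from the folders listed above would look like this:

```
.git
.tmp
bin
logs
test-run
```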

How much memory and CPU does my application need?

It's important to figure out the right set of resources needed by your application. At UrbanClap, we have a parallel staging ECS cluster where we first load-test an application with peak traffic and tune its CPU and memory requirements before launching it in production. Since AWS CloudWatch provides historical plots of utilized vs. reserved memory and CPU (image 1.3), one can easily tweak these values through a central configuration file, which we apply at deployment time. A sample configuration of the kind we use at UrbanClap is sketched below for convenience.
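The snippet below is a minimal illustrative sketch of such a file, not the exact one we use; the service name, keys, and values are hypothetical:

```json
{
  "search-service": {
    "cpuUnits": 512,
    "memoryMB": 1024,
    "minContainers": 2,
    "maxContainers": 8
  }
}
```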

Logging and persistence

Since Docker runs each container with its own isolated filesystem, the container's logs are not visible to the instance running it; once a container goes away, it takes all its logs with it. To solve this, we create mount points from the instance into the container before starting it. Logs then get written on the instance, giving them persistent storage. We further pipeline these logs to our EFK stack and S3 through td-agent clients and cron jobs respectively.
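A minimal sketch of such a mount (the paths and image name are illustrative):

```
# Mount a host directory into the container so logs outlive the container.
docker run -d \
  -v /var/log/my-service:/app/logs \
  my-service:latest
```

In an ECS task definition, the equivalent is a volume declared on the task plus a mountPoints entry in the container definition.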

Code revert

When things break, it's important to be able to roll back to the previous build as soon as possible.

Under the new architecture, every deployment of a service releases a new version of its task definition, with a new Docker image linked to it. With a single click in the ECS management console, we can revert to a previous version of the task definition. New Docker containers get launched and old ones are terminated automatically.
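The same revert can also be scripted with the AWS CLI; a sketch, with hypothetical cluster, service, and revision names:

```
# Point the service back at an earlier task definition revision.
aws ecs update-service \
  --cluster production-cluster \
  --service my-service \
  --task-definition my-service:41
```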

Apart from removing complexity from the architecture, Docker has also helped us in other ways:

  1. A uniform build and deployment process across services and environments (dev, stage, prod).
  2. Isolation of applications: Each container has its own resources, isolated from other containers. For example, one application can use one version of node while another application on the same instance uses a different one.
  3. No more AMI changes: Developers can install their required packages on their own and hand over a final Docker image for deployment.
  4. With a couple of docker build and run commands, anyone (QA, product, devs) with the Docker image can bring the service up and running, regardless of the OS/platform they use (see the sketch after this list).
  5. One-time logging and alerting setup at the instance level.
  6. Version control of Docker images and easy rollbacks.
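As a hedged illustration of point 4, with a hypothetical image name and port:

```
# Build the image from the service's Dockerfile, then run it locally.
docker build -t my-service .
docker run -p 8080:8080 my-service
```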

Bottomline

Docker comes with a lot more benefits than those shared in this article. It helps us achieve faster development, application isolation, and consistency among service deployments, leading to a much better DevOps experience.

It has become easier to support a growing number of developers and greater application scale with minimal dependency on the Platform/DevOps teams.

With all these architectural changes and their benefits, we feel even more bullish about separating out our core services and deploying them as Docker containers.
