On Microservices and ECS @ Wrapp

Jude Dsouza
Wrapp Tech
Jul 14, 2016 · 7 min read


This post is about how we run a microservice architecture at Wrapp. I will talk about the challenges that come with it, such as service orchestration, availability, scalability and service discovery, and describe how we solved them. I will also discuss our transition from our own home-built solutions to the AWS EC2 Container Service and how it simplified our infrastructure stack. Lastly, I will talk about the challenges that are yet to be solved and how we plan to tackle them.

At Wrapp, we follow a microservice architecture and use Docker containers to support it. Our entire platform is hosted on the Amazon cloud. To date, we have roughly 90+ services running on 50+ EC2 instances. Our original architecture was monolithic, but as it grew increasingly complex we had to break it up into smaller services and databases accordingly.

So how did we orchestrate our services? We decided to group our services into two clusters. One cluster would host services related to the business itself; we call this the misc cluster. Here we run our user-facing API and other services related to our core business, like rewards, notifications, emails and offers. The other cluster would host services related to operating our infrastructure; we call this the ops cluster. You can expect services like Riemann [1], Logstash [2], Sensu [3] and the like to run here.

To do this in practice, we specified which services should run in each cluster via a YAML configuration file that we called the runlist config. This file also contained each service's configuration, e.g. the Docker image, version, environment variables, ports and volumes. The runlists were named after their associated cluster and stored in an S3 bucket. The only part missing was the physical side: the computing resources on which these clusters of services would be hosted. To tie these together, we defined autoscaling groups with a one-to-one mapping to each cluster. So in our example, that meant having a misc-autoscaling and an ops-autoscaling group.
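Conceptually, a runlist entry looked something like the sketch below. The field names, the service and the registry URL here are illustrative, not our exact schema:

    # Illustrative runlist entry; field names and values are assumptions.
    services:
      offers:
        image: registry.example.com/wrapp/offers
        version: "1.4.2"
        environment:
          LOG_LEVEL: info
        ports:
          - "9007:9007"
        volumes:
          - /var/log/offers:/var/log/app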

To fit it all together, whenever an instance was launched in an autoscaling group, it would at boot time figure out which group it belonged to, download the corresponding runlist from S3, read the file and finally run Docker containers for the images specified in it. In addition, we added each container process to Supervisord [4] to ensure service availability. This is how we managed service orchestration. The following diagram conceptually shows the end result:

Service Orchestration @ Wrapp before ECS
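In slightly simplified form, the boot-time logic amounted to a small script along these lines. The bucket name, file layout and the config-generation helper are illustrative assumptions, not our exact implementation:

    #!/bin/bash
    # Illustrative sketch of the boot-time logic.

    # 1. Figure out which autoscaling group this instance belongs to.
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    ASG=$(aws autoscaling describe-auto-scaling-instances \
            --instance-ids "$INSTANCE_ID" \
            --query 'AutoScalingInstances[0].AutoScalingGroupName' \
            --output text)

    # 2. Download the runlist named after the cluster (misc or ops).
    CLUSTER=${ASG%-autoscaling}
    aws s3 cp "s3://example-runlists/${CLUSTER}.yml" /etc/runlist.yml

    # 3. Render a Supervisord program entry per runlist service
    #    (hypothetical helper) and let Supervisord start the containers.
    render-supervisor-conf /etc/runlist.yml > /etc/supervisor/conf.d/services.conf
    supervisorctl reread && supervisorctl update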

The way we did deployments was to query Serf [5] to figure out where the service was running, SSH into each such machine, terminate the container and launch a new one from the new image. This was automated with bash scripts. Serf is a cluster tool from HashiCorp [6] that works on a gossip protocol. All our instances were registered into Serf and belonged to one large Serf cluster. A daemon that we built ran on all instances and discovered all containers running on that host. It then published the service name, host IP and protocol for each container into Serf.

Service Deployment @ Wrapp before ECS
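Stripped to its essence, a deployment script did something like the following. The service name, image tag and the use of a Serf tag to look up hosts are illustrative assumptions:

    #!/bin/bash
    # Illustrative deployment sketch: find hosts running the service via
    # Serf, then replace the container on each of them.
    SERVICE=offers
    IMAGE=registry.example.com/wrapp/offers:1.4.3

    # Ask Serf for members that advertise the service (assumed tag layout).
    HOSTS=$(serf members -format=json |
            jq -r ".members[] | select(.tags.service == \"$SERVICE\") | .addr" |
            cut -d: -f1)

    for HOST in $HOSTS; do
      ssh "$HOST" "docker pull $IMAGE &&
                   docker stop $SERVICE && docker rm $SERVICE &&
                   docker run -d --name $SERVICE $IMAGE"
    done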

Service availability was handled by Supervisord, which was responsible for automatically restarting containers if they crashed. However, we could not scale services individually. Instead we had to scale the autoscaling group, which in turn launched new instances and thus allowed more service containers to run in the event of load.
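A Supervisord program entry for a container looked conceptually like this; the service name, port and image are illustrative:

    ; Illustrative Supervisord entry: Supervisord restarts the docker run
    ; command if it exits, which in turn restarts the container.
    [program:offers]
    command=docker run --rm --name offers -p 9007:9007 registry.example.com/wrapp/offers:1.4.2
    autostart=true
    autorestart=true
    stopsignal=TERM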

Lastly, we ran our own Docker registry, backed by S3, where all our Docker images were stored.

All this worked, but it was not ideal. We had to maintain all of it ourselves, and like any other company we would rather focus on our business. That is where the AWS EC2 Container Service (ECS) came to the rescue.

ECS solves a lot of this for us. The following diagram summarizes my thoughts on this and why we felt it was good for our use-case:

Transition towards AWS EC2 Container Service

As you can see, ECS takes care of most of this work for you. The only thing left was service discovery, which we still handle ourselves and which is a shortcoming of AWS ECS. The next part of this post explains how we currently tackle this issue, and I will present our proposal for a better and simpler way of addressing it.

Service discovery is the backbone of our microservice architecture. Currently at Wrapp, we have incorporated service discovery with the help of tools like Consul [7], Registrator [8] and HAProxy [9].

Registrator is a tool that runs on all our EC2 instances and its main responsibility is to register all running containers into Consul. It registers each service's name, port and protocol along with the instance IP address. Consul aggregates this information across all instances at the service level, so it knows on which hosts a particular container of a service is running, the protocol it communicates over and the port it listens on. All this is vital for service discovery, and therefore for services to talk to each other. HAProxy is the bridge that makes the connection between two services. We use a tool called Consul Template [10] to generate the HAProxy config from the service information in Consul and keep it in sync over time. This makes up our service discovery implementation at Wrapp. The following figures explain visually how it all works.

Service Discovery @ Wrapp via Consul, Registrator and HAProxy

As you can see in the left diagram, we run HAProxy, Registrator and a Consul agent on each of our EC2 instances. Registrator feeds information about all services running on the host into Consul via the Consul agent. The diagram on the right is a zoomed-in view of the one on the left. It shows how HAProxy's configuration file is kept up to date with all the services running on the host, with the help of Consul Template. The diagram also includes an example of the generated configuration file, which is what makes it possible to communicate with the service.
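To give a flavour of what that generated configuration looks like, here is an illustrative Consul Template fragment for a single hypothetical service; the service name and port are assumptions:

    # Illustrative Consul Template fragment: consul-template re-renders
    # this whenever the set of healthy instances in Consul changes.
    frontend offers
        bind 127.0.0.1:9007
        default_backend offers

    backend offers
        balance roundrobin{{ range service "offers" }}
        server {{ .Node }} {{ .Address }}:{{ .Port }} check{{ end }}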

I think this approach is too complicated. Consider the following proposal for service discovery via internal (private) AWS Elastic Load Balancers (ELBs) without having Consul or Registrator as part of the solution. This assumes that the EC2 instances run inside a VPC.

Before you object about the cost: we have already thought about it and done the math. With around 90 services we would need 90+ ELBs, and that would indeed be too costly. But what if we could connect all our services to a single ELB?

We know that an ELB can be configured with multiple listener ports. We also know that traffic arriving on a listener port is forwarded to the corresponding port on the instances attached to the ELB as part of the autoscaling group.

The ECS API allows a service to be defined so that it is associated with an ELB, and there is no restriction on associating many services with the same ELB.
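As a rough sketch with the AWS CLI, adding a per-service listener to an existing internal ELB and attaching an ECS service to it could look like this; the cluster, service and ELB names, ports and task definition are illustrative:

    # Illustrative sketch: add a per-service listener to a shared internal
    # ELB and create an ECS service attached to it. All names are assumptions.
    aws elb create-load-balancer-listeners \
        --load-balancer-name internal-services \
        --listeners "Protocol=HTTP,LoadBalancerPort=9007,InstanceProtocol=HTTP,InstancePort=9007"

    aws ecs create-service \
        --cluster misc \
        --service-name offers \
        --task-definition offers:3 \
        --desired-count 2 \
        --role ecsServiceRole \
        --load-balancers "loadBalancerName=internal-services,containerName=offers,containerPort=9007"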

Hence it is practically possible to direct traffic to the containers running on a host via the ELB. There are, however, two issues with this approach. First, to communicate with a service you need to know the port it listens on. Second, there are no health checks per port, because the ELB only performs health checking on a single port. This means that when a service is down, the ELB will time out trying to connect to it, adding latency and ultimately resulting in a bad user experience. It would be better if the ELB returned an HTTP 503 instead. So yes, these are two major drawbacks.

But we also have a good fix for this! Putting HAProxy in front of the ELB solves both problems. HAProxy can provide health checks at the service level in its configuration file, and this file can be generated by regularly polling the AWS ECS API, which is the source of truth for all our services, including the ports and protocols they communicate on. The following diagram will hopefully give you a clear idea of what I mean:

New proposed way of doing Service Discovery via AWS ELB
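Concretely, the generated HAProxy configuration could look roughly like the sketch below, with one backend per service pointing at the single internal ELB on that service's port and carrying its own health check. The ELB DNS name, ports and health-check path are illustrative:

    # Illustrative HAProxy config generated from the ECS API: every backend
    # points at the same internal ELB, each on its own port, with a health
    # check so a dead service fails fast instead of timing out.
    frontend offers
        bind 127.0.0.1:9007
        default_backend offers

    backend offers
        option httpchk GET /health
        server elb internal-services.eu-west-1.elb.amazonaws.com:9007 check

    frontend rewards
        bind 127.0.0.1:9008
        default_backend rewards

    backend rewards
        option httpchk GET /health
        server elb internal-services.eu-west-1.elb.amazonaws.com:9008 check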

Hence services now don’t need to know about the ports every other service listens on as HAProxy makes this transparent. Using the health checks it completes the missing functionality of the ELB. Thus using a single load balancer and with the help of HAProxy we can implement service discovery for our microservice architecture. At Wrapp, we have developed a small prototype for this and have tested it out — It seemed to work and we are now in the phase of moving it incrementally towards production.

Stay tuned for our next post, where we will share our findings from this approach.

References:

[1] Riemann http://riemann.io/

[2] Logstash https://www.elastic.co/products/logstash

[3] Sensu https://sensuapp.org/

[4] Supervisord http://supervisord.org/

[5] Serf by HashiCorp https://www.serfdom.io/

[6] HashiCorp https://www.hashicorp.com/

[7] Consul by HashiCorp https://www.consul.io/

[8] Registrator https://github.com/gliderlabs/registrator

[9] HAProxy http://www.haproxy.org/

[10] Consul Template by HashiCorp https://github.com/hashicorp/consul-template
