Vitcord at scale

Our ups and downs migrating to micro-services

Tomas Roig Martinez
Vitcord Blog
4 min read · Nov 7, 2019


Recently, we have been sharing how we are crafting software and systems at Vitcord. In previous posts we have shown our direction towards micro-services, and our architectural choices for our first micro-service, the one in charge of our Notification System.

This time we want to focus on the bones of our system through a series of posts. This first one sums up the previous posts, the legacy code we started with, and the direction we took to solve the problems we faced.

A bit of context: From legacy to our goal

When we took on this project, the whole system relied on a single instance running on Heroku, plus some workers in charge of running the jobs enqueued by the API. Worse, since the instance and the Puma server running Rails were not properly configured back then, they became a bottleneck whenever the growth team expected high-impact campaigns.

That caused us trouble at the beginning and haunted our worst nightmares for some time, until we decided to cut it out. We needed to be reliable, we needed to be a strong platform. The whole team got involved on the road to becoming a highly available social network.

Therefore, we started to identify where our bottlenecks were, and where we could place our new and shiny infrastructure.

Bottleneck detection and first microservice

So we started to look at our logs and APMs to check which routes on our monolithic API were getting stuck. We discovered (not a surprise) that our Notifications system was starting to saturate at certain points, mostly under certain types of transactional notifications.

Since it was isolated, we decided to make it our first approach to a micro-service oriented architecture while moving out of our legacy monolithic backend. We thought about using different languages for the implementation of this new service (Elixir, Golang, Kotlin…), but since the Tech Team is not a lot of people (for now), and given that the Android team already knew Kotlin (a language that runs on the JVM), we started an MVP with Kotlin + Spring + DynamoDB to achieve a reliable solution.

We also wanted to walk away from Heroku, because the cost of maintaining a single app there was becoming too high. So while some people were in charge of the micro-service, other members of the team started to think about and research a feasible way to set up the new infrastructure that would host any micro-service we might want to add.

Where do we go?

We were already using Amazon Web Services to host some development machines, and other instances to do some background processing. Most of the people in the team had some experience with it, so after investigating the options on other platforms, we decided to explore its services.

One that particularly drew our attention was AWS Elastic Container Service (ECS), able to run Docker containers as services, abstracting the number of instances behind a load balancer per service. ECS lets you take care of the instances yourself (EC2 mode), or go one step further and let ECS control the instances (Fargate mode). We develop our applications inside Docker containers, and over the years we have only seen benefits from using them, so we found a good fit here.

Unfortunately, managing this in Fargate mode can be a little more expensive, and you lose some control of the machines, so we wanted to use EC2 mode. This also forces you to set up every single element on AWS by yourself… How can we overcome this, being just 7 engineers? How can we keep being agile?

Terraform to the rescue!

Terraform is a declarative language to represent infrastructure as code. This means you can version-control your infrastructure, keep track of changes, and automate infrastructure deployment. No more clicking around to deploy and modify infrastructure, removing human error from the multiple configuration steps each service requires.
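To give an idea of what that looks like (a minimal sketch, not our real configuration, and the names are made up), an ECS cluster becomes a small declarative block that lives in version control like any other file:

# Minimal illustrative Terraform block: an ECS cluster described as code.
# "vitcord-services" is a hypothetical name.
resource "aws_ecs_cluster" "main" {
  name = "vitcord-services"
}

Running terraform plan shows what would change, and terraform apply performs those changes, so deploying or modifying infrastructure becomes a reviewable, repeatable step instead of a series of clicks.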

We managed to achieve our desired infrastructure by first experimenting with a single service on AWS ECS, the Notifications service we mentioned above. Once we were sure it was stable, we started to think about how we could extend this to isolate the internal network from the public network, protecting our systems from external and unwanted access.
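As a rough idea of the shape of such a service (an illustrative sketch with assumed names, not our exact definition; the task definition and target group are assumed to be defined elsewhere), an ECS service in EC2 mode registers its containers behind a load balancer target group:

# Illustrative sketch: an ECS service in EC2 mode behind a load balancer.
# aws_ecs_task_definition.notifications and aws_lb_target_group.notifications
# are assumed to be defined elsewhere.
resource "aws_ecs_service" "notifications" {
  name            = "notifications"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.notifications.arn
  desired_count   = 2
  launch_type     = "EC2"

  load_balancer {
    target_group_arn = aws_lb_target_group.notifications.arn
    container_name   = "notifications"
    container_port   = 8080
  }
}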

Finally, using the composability Terraform offers when building your infrastructure, we created a reusable module that contained everything we needed to run ECS services inside our private cloud, giving us a safe space to deploy our services.
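Conceptually, consuming such a module looks roughly like this (the module path and inputs below are hypothetical, just to illustrate the idea of packaging the networking, load balancing and service wiring behind a few parameters):

# Hypothetical usage of a reusable ECS-service module: the cluster, VPC
# and subnets it plugs into are passed in as inputs.
module "notifications_service" {
  source          = "./modules/ecs-service"
  name            = "notifications"
  cluster_id      = aws_ecs_cluster.main.id
  vpc_id          = aws_vpc.main.id
  private_subnets = [aws_subnet.private.id]
  container_port  = 8080
}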

Simplified Infrastructure

This is a high-level view of our current infrastructure. We are able to set up clusters of services (a.k.a. Docker services) that have a load balancer as the entry point. Our services have access to each other, and they are isolated from the public internet by an AWS VPC and private networking, while still being able to reach out to the public internet.
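The piece that typically makes this combination possible is the split between public and private subnets: services live in private subnets with no public IPs, and their outbound traffic leaves through a NAT gateway in a public subnet. A simplified sketch (CIDR ranges and names are illustrative, and the VPC, public subnet and Elastic IP are assumed to be defined elsewhere):

# Simplified sketch: a private subnet whose outbound traffic goes through
# a NAT gateway, so services can reach the internet without being reachable
# from it. Route table association omitted for brevity.
resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_nat_gateway" "outbound" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.outbound.id
  }
}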

This is the setup for the normal services, where we typically have a server. But we also managed to set up our Ruby on Rails workers by experimenting with AWS Fargate, an easier approach where AWS manages the machines, and you can drop the load balancer, saving costs on those services, since the workers open their own connections and don't require any incoming connection.
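A hedged sketch of what such a worker service can look like (again, illustrative names; the task definition, subnet and security group are assumed to be defined elsewhere): the service switches to the Fargate launch type and simply has no load_balancer block:

# Illustrative sketch: a worker service on Fargate, with no load balancer.
# AWS manages the underlying machines.
resource "aws_ecs_service" "workers" {
  name            = "rails-workers"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.workers.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = [aws_subnet.private.id]
    security_groups  = [aws_security_group.workers.id]
    assign_public_ip = false
  }
}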

That’s it: we are out of Heroku. We managed to keep most of the features we liked from Heroku while setting up a highly available, scalable, resilient infrastructure with automated deploys. Do you want to know how?

Well, that is another story… Clap if you liked the content, and stay tuned to learn how we achieved all the features that make our day-to-day easier!

