Making a Monolithic Rails App Scalable using Kubernetes

Juan Martín Pascale
Published in Herolens Blog
Nov 8, 2018 · 7 min read

Since the company's very first days, our dashboard has been a core application for managing clients, administrating information and setting up campaigns. Development continued over the years as we delivered new features, and today it is our main application, one that many people have worked on. Recently, as the company grew, we needed to improve the application's infrastructure to achieve scalability and remove the limitations that come with a monolith. We took a month to make the move and, even though we were familiar with the Kubernetes environment, the migration was quite a challenge. So we thought it would be a great opportunity to share how we did it.

Technology stack before the migration

There weren’t many changes in the app’s architecture over time. Our stack consisted of:

  • a main Rails application running on Puma, backed by a Postgres database;
  • Node bundles running inside the Rails application;
  • Sidekiq running multiple processes for report calculation;
  • a Redis server used by Sidekiq to manage the job queues;
  • a mounted file system where reports were written and consumed;
  • an NGINX instance managing SSL and redirecting requests to the Rails application and the Sidekiq dashboard.

Previous limitations and weaknesses

  • Scalability: We had no way to run multiple instances without significant extra configuration.
  • Deploying with fear: With the previous architecture and deployment system, deploying a change that failed to start would take down the previous revision and cause downtime. Kubernetes would let us deploy new features with confidence.
  • No self-healing: If any service of the dashboard stopped working, we had to restart it manually.
  • Excessive use of resources: RAM and database connections were used excessively, something Kubernetes later made visible, and one of the most common problems when leaving a monolith.
  • Several providers: Our infrastructure was spread across several providers while we were moving it to Google Cloud Platform.
  • Other configuration issues: The existing configuration was quite a mess, it was not clear how to make a change properly, and doing so carried several risks.

Discussing what to do

Before migrating the application to Kubernetes, we could see that some problems lay ahead: on the one hand, the Rails architecture and, on the other, the way we were handling some services (such as report generation) could cause issues down the road. We agreed that several features needed to be rewritten, but decided to split the application into microservices first, keeping the exact same behaviour; that would let us make improvements faster later on.

Our concern was not Kubernetes itself; the challenge was to plan a proper migration process and a suitable architecture that remained compatible with the current behaviour of the dashboard. We sketched the target architecture and wrote down, step by step, how we would migrate each service and in what order.

Taking the database out of the monolith

The first step was to migrate the database. We chose to move it to a managed database using the GCP CloudSQL service, because CloudSQL solves problems like automatic backups, replicas, high availability and metrics by default. We created the database in the same geographic zone as the rest of the application to avoid high latency, since after this change it would be running on a separate virtual machine.

Given that this part of the migration implied downtime, it took place during non-conventional working hours. In order to carry this out, we followed these steps:

  1. Turn off the dashboard and redirect all the requests to a maintenance landing through NGINX.
  2. Perform a backup of all the data and load it onto the new database hosted in CloudSQL.
  3. Change the credentials in order to be able to connect to the new database.
  4. Add the server IP as an exception in the database firewall.
  5. Start the server and take the landing down.

Comments on this process:

  • The whole process took about 20 minutes.
  • We performed a dry run of the whole process beforehand to solve any compatibility issues that could come up.
  • There was an easy way back in case anything went wrong.

Moving the app to Kubernetes

The first step was Dockerizing the entire application and running it in a local environment. This was the part where we had to come up with a clear image-building pipeline, since we not only had to run Rails migrations but also compile the Node bundles using Webpack. After evaluating different options, we decided it was fine to keep asset compilation and database migrations as steps of the image build, but we moved Webpack compilation outside: we ran it before building the image and copied the resulting files in as a separate step. Webpack was too heavy and ran better outside Docker, and this also allowed us to deploy a Rails change without touching the bundles.
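A minimal sketch of what such a Dockerfile could look like, assuming the Webpack bundles were already compiled on the host into public/packs and that the base image, paths and Puma config file name are illustrative:

```dockerfile
FROM ruby:2.5

WORKDIR /app

# Install gems first to take advantage of Docker layer caching
COPY Gemfile Gemfile.lock ./
RUN bundle install --deployment

# Copy the application code (a .dockerignore would exclude public/packs here)
COPY . .

# Copy the pre-built Webpack bundles in a separate step, so a Rails-only
# change does not force recompiling them
COPY public/packs ./public/packs

# Keep Rails asset precompilation as part of the image build
RUN bundle exec rake assets:precompile

CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]
```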

After all these issues were solved, we set up a server running the production configuration inside a container, and later we deployed it to Kubernetes and took some time to check that every feature was running correctly.

Another challenge was the connection to the database. When running services on Kubernetes, pods move among nodes, new nodes appear and old ones die, so these nodes’ IP addresses change constantly. On that account, connecting to the database through a firewall rule was no longer an option. As a workaround, we ran the CloudSQL proxy as a sidecar: each pod started running one container for the app and another for the database proxy, which bypasses the firewall. The proxy listens on a socket and forwards requests to the database, and since the two containers share the pod’s network, the app reaches the database “on localhost”.
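The sidecar pattern described above can be sketched in a Deployment spec along these lines (the project name, image names and the CloudSQL instance connection string are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dashboard
  template:
    metadata:
      labels:
        app: dashboard
    spec:
      containers:
        # The Rails app talks to the database via localhost
        - name: rails
          image: gcr.io/our-project/dashboard:latest   # illustrative
          env:
            - name: DB_HOST
              value: "127.0.0.1"
        # The CloudSQL proxy sidecar shares the pod's network namespace
        - name: cloudsql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.11
          command: ["/cloud_sql_proxy",
                    "-instances=our-project:us-east1:dashboard-db=tcp:5432"]
```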

Reporting service

To migrate the reporting system, we wanted to keep it simple, so we deployed the same image generated for the Rails app, since the two applications share the same codebase. Each deployment has its own execution command specified in its YAML file. We avoided compiling the Node bundles because this part of the application doesn’t use them; that saved deploy time and resulted in smaller images (another point in favour of keeping Webpack compilation outside the image build).
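Reusing one image with a per-deployment command boils down to overriding the container command in the YAML, roughly like this fragment of a Sidekiq deployment (image name and Sidekiq config path are illustrative):

```yaml
# Same image as the dashboard; only the command changes
containers:
  - name: sidekiq
    image: gcr.io/our-project/dashboard:latest   # illustrative
    command: ["bundle", "exec", "sidekiq"]
    args: ["-C", "config/sidekiq.yml"]
```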

Shortly afterwards, we added a Redis service and set up Sidekiq running on Kubernetes, as simply as it sounds. However, one of the biggest challenges we had to face was the use of storage shared among certain servers: on the one hand, we calculated reports using Sidekiq and, on the other, we read them from the dashboard. We evaluated moving a new report system to the top of the roadmap, ahead of the migration, but in the end decided to make the move all the same and keep the same behaviour. To accomplish this, we used an NFS persistent disk in Kubernetes, managed by an NFS server accessible from every pod. Each pod that needs access to this file system uses a Kubernetes persistent volume claim that points to that server. We had already planned to rewrite our reporting system, and having the architecture split into microservices will make that new feature easier to deliver.
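The shared-storage setup can be sketched as an NFS-backed PersistentVolume plus the claim the pods mount (server address, capacity and names are illustrative):

```yaml
# PersistentVolume backed by the in-cluster NFS server
apiVersion: v1
kind: PersistentVolume
metadata:
  name: reports-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany          # Sidekiq writes reports, the dashboard reads them
  nfs:
    server: nfs-server.default.svc.cluster.local   # illustrative
    path: "/"
---
# Claim referenced from each pod that needs the reports file system
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reports-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 50Gi
```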

Migration

The migration itself to the new dashboard was smooth and transparent for users: we simply changed the DNS records in Cloudflare and pointed them to the new application. Still, we faced some issues that caused headaches while leaving the monolith. The first was the number of database connections left open by all the threads Puma runs. This is something to be really careful about, given that the maximum number of connections allowed by CloudSQL is determined by the number of cores and the amount of RAM assigned to the instance, and that number doesn’t scale much, so just “throwing RAM” at the problem would be unacceptable. We solved this by deploying a PgBouncer server and using it as a service to access the database. With this change, we moved the CloudSQL sidecar to the PgBouncer pod.
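Putting PgBouncer in front of the database can be sketched as a Deployment (with the CloudSQL proxy now as its sidecar) plus a Service the app connects to; the images, project name and ports below are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      containers:
        # PgBouncer pools the connections coming from all Puma threads
        - name: pgbouncer
          image: edoburu/pgbouncer:latest   # illustrative community image
          ports:
            - containerPort: 5432
        # The CloudSQL proxy sidecar now lives here instead of in the app pods
        - name: cloudsql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.11
          command: ["/cloud_sql_proxy",
                    "-instances=our-project:us-east1:dashboard-db=tcp:5433"]
---
apiVersion: v1
kind: Service
metadata:
  name: pgbouncer
spec:
  selector:
    app: pgbouncer
  ports:
    - port: 5432
```

With a Service like this, the app's database host becomes the service name (here `pgbouncer`) instead of localhost.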

The second issue was Ruby’s excessive memory use. We improved this by switching to a Ruby build compiled with jemalloc. The fact that we were using Docker containers played an important role in solving this issue easily.

Final thoughts

In January we started migrating our services to Kubernetes and today we can say we have migrated the great majority of them. These are our final thoughts after this experience:

  • Scalability: We are now able to add more servers as the demand increases.
  • Spending: Having split the whole app into microservices gave us great insight into how much we spend on each part of the application and why.
  • Rapidly adding new services: After making this move, for example, we were able to deploy a canary version of the dashboard running production configuration within seconds. It really makes our lives easier when it comes to delivering new features.
  • Stability: We benefit from Kubernetes health checks, self-healing and revision management.

Did you face a similar experience? Did something different? We’d love to know! You can write us at engineering@herolens.com

Interested in becoming a Hero? Check out our open positions!
