Lessons learned from migrating the Rapido infrastructure

Rapido Labs · May 14, 2020

In the previous part, we explored how Rapido migrated its infrastructure to Kubernetes without affecting its services. In this final part, we cover the impact we observed and highlight some of the lessons we learned along the way.

Impact

While there was direct business impact from increased stability, reduced time to deployment, and improved availability, we also saw a number of intangible benefits.

Boost in productivity

The move to Kubernetes gave an immense boost to the productivity of our software engineers because the entire infrastructure was now closely aligned with our engineering workflows.

Given the ever-expanding Kubernetes ecosystem, we could rely on existing tools built specifically for cloud-native software, saving us the trouble of building these tools ourselves and reducing overall complexity.

For example, some tools let you set up and standardize the deployment and testing workflow for every developer on the team, while others cover everything from creating CI/CD pipelines to monitoring.

This helped the engineering team in multiple ways: the tools helped us solve several complex problems, let us customize processes to our requirements, improved our engineering workflows, drastically reduced release cycles, and improved software quality from development to production. It is worth mentioning that most of these tools are open source and free to use.

Future-proof Technology

One of the key reasons we set out to migrate our services to Kubernetes was to measure up to the latest technology standards in the market. With that in mind, we wanted to make the most of this exercise and build an infrastructure that would not require major changes for the next few years.

Kubernetes was the sweet spot for us given that all the major cloud vendors support it and even provide out-of-the-box managed offerings. This means that even if we had to switch our cloud provider in the future, the transition would be relatively seamless.

Moreover, given the momentum and community investment behind Kubernetes, it is safe to say that it will be around for years to come. This means we as a company can focus on building and improving our services without having to worry about the infrastructure.

Cost-effective

Our services see a massive surge during peak hours and normal consumption for the rest of the day. Earlier, we were forced to either pay for a higher resource quota around the clock or face service unavailability during the surge. Either way, that was an unreasonably expensive trade-off for us.

With Kubernetes, owing to the auto-scaling and load-balancing features, we were able to scale up the applications and the required resources during peak times.

Similarly, we could scale the infrastructure down during less busy times based on the number of requests. This ensured higher utilization of the services and saved us from paying for resources that we did not need.
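As a rough illustration of the kind of policy this relies on, here is a minimal HorizontalPodAutoscaler sketch that scales a hypothetical order-service Deployment between a small off-peak baseline and a peak-hour ceiling based on CPU utilization. The service name, replica counts, and threshold are placeholders, not our production values.

# Minimal HorizontalPodAutoscaler sketch — illustrative values only.
# Adds pods during peak-hour surges and removes them during quiet hours.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service          # hypothetical service being scaled
  minReplicas: 2                 # baseline capacity during off-peak hours
  maxReplicas: 20                # ceiling for peak-hour traffic
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU crosses 70%

Paired with node-level autoscaling on the cloud provider side, this is what lets both the pods and the underlying machines follow the demand curve instead of being provisioned for the peak at all times.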

Even the general computing needs of the platform are relatively economical given the scale of our application, making this the most cost-effective solution for us.

Attract talent

Another unexpected side effect of moving to a technologically advanced infrastructure was that it acted as a magnet for good software engineers. Given how carefully developers evaluate the technology stack and processes before coming on board, this made us more appealing to potential applicants.

Additionally, it gave a huge morale boost to the developers already working with us. They were excited about the opportunity to learn and work with cutting-edge technology. This, in turn, improved the overall satisfaction of the engineering teams.

Learnings

While we were fortunate enough to face only a few issues during and after the migration, there are some lessons we took away from this exercise.

Reduced downtime during migration

During the migration, our focus was to ensure minimal downtime for our users. We managed to achieve that using NGINX load balancers, which routed traffic into the Kubernetes environment in a phased manner. This gave us the ability to roll back if things didn’t go as expected.

Managing two production environments

Until the migration was fully complete, we were deploying to both the existing infrastructure and Kubernetes so that we always had a backup. Given that the deployment methodology and the time taken to deploy differed between the two, it was sometimes difficult to precisely control when a new deploy would be fully serving all requests. We needed to account for the possibility that different versions of the code could be active for a significant amount of time, which meant that every deployment had to be backward compatible.

Optimizing the infrastructure

We were attempting to strike the right balance between pod size (CPU and memory), the number of application processes, and the number of threads per process. We realized that the right balance inherently differed based on the type of request: long-running requests needed different settings from short ones, and CPU-bound requests behaved differently from IO-bound ones.
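As a sketch of the knobs involved — assuming a hypothetical service whose entrypoint reads its process and thread counts from environment variables — the tuning essentially comes down to choosing values like these together rather than in isolation. All names and numbers below are placeholders.

# Illustrative pod template — placeholder values showing the three knobs we tuned together:
# pod size (requests/limits), application processes, and threads per process.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ride-api                       # hypothetical service
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ride-api
  template:
    metadata:
      labels:
        app: ride-api
    spec:
      containers:
        - name: ride-api
          image: registry.example.com/ride-api:latest   # placeholder image
          resources:
            requests:
              cpu: "500m"              # pod size: CPU reserved per pod
              memory: "512Mi"          # pod size: memory reserved per pod
            limits:
              cpu: "1"
              memory: "1Gi"
          env:
            - name: APP_PROCESSES      # hypothetical variable: worker processes per pod
              value: "2"
            - name: APP_THREADS        # hypothetical variable: threads per process
              value: "8"

The point of the sketch is that these values only make sense as a set: short, CPU-bound traffic pulls the balance one way, while long-running or IO-bound traffic pulls it another, which is why no single combination worked for every service.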

Training and upskilling

Since Kubernetes was new to some of the team members, it was important to upskill them so they could manage the new infrastructure. And because the migration was already in progress, it was crucial to support them while they were still learning.

This is the final part of a three-part series that explores the transformational journey at Rapido. In the first part (link) we explored some of the challenges Rapido was seeing with its platform, and in the second part (link) we walked through the process of migrating to a highly scalable infrastructure without affecting our services.

__

We are always looking out for passionate people to join our Engineering team in Bangalore. Check out the link for open roles: https://bit.ly/2V08LNc
