Yet Another Kubernetes Migration Journey — Part 1

Published in upday devs · 4 min read · Sep 11, 2019

Most of us have read or come across a blog post or two about migrating to Kubernetes as an infrastructure strategy.

This post is similar, but attempts to talk about it from both a Business and an Engineering point of view.

This is a story in 2 parts of upday’s Kubernetes Migration Journey.

Part 1 talks about why we decided to adopt Kubernetes. It is aimed at Decision Makers.
Part 2 talks about our Preparation, Execution and Learnings from our Kubernetes Migration. It is aimed at Engineers.

Why did upday decide to adopt Kubernetes?

Everything has a history!

Not so long ago, there was an Engineering team that engineered the backend and data APIs that powered the awesome upday application.

Like any sane startup in 2015, they harnessed the cloud (AWS) to run their workloads. Over time they adopted the usual best practices: a proper DevOps culture, infrastructure automated by the developers (using Infrastructure as Code tools like Terraform), AWS services, and even a rudimentary application orchestration system, ElasticBeanstalk, to run the workloads.

Around mid-2018, their trusty orchestrator ElasticBeanstalk started to show signs of age. More and more services were coming in, a full deployment rollout took between 10 and 30 minutes, extensibility for observability was lacking, and auto-scaling was slow during thundering herd scenarios. The team realised that they had to either refine their current system or adopt an alternative.

Meanwhile, the business grew and management started to expect the teams to deliver more business value. The team realised that they were unable to do that, as a considerable part of their time was spent on maintaining their infrastructure.

At the same time, infrastructure standards had evolved and the world had started to move on to a more mature, faster, feature-rich, hyper-scale way of running workloads. Buzzwords like Docker, Kubernetes, Istio, etc. were all the rage.

Therefore the team finally decided that it was time to revisit and rebuild their operations expertise.

It is important that everyone is clear on why the Infrastructure has to be agile!

In early 2019, the complete Engineering team, now comprising an experienced DevOps person, Backend Engineers and the CTO, listed the top 3 things that needed to be addressed in upday’s infrastructure.

They came up with this:

Optimise for quicker auto-scaling (seconds vs. tens of minutes)
Availability of Managed/Native features
Optimise for costs

Optimise for quicker auto-scaling

upday is a personalised news app. Whilst most of our content is cached heavily in a CDN, some of the personalisation is done via a synchronous HTTP call and has to be handled and scaled as required.

Sometimes, when we deliver breaking news via push notifications, it can trigger a few million requests from users. The size of the spike differs depending on the target market, the time of day, the interests of the users and, of course, the content of the news.

AWS auto-scaling is utilised, but it takes too long to become operational. In the best case, it takes up to 10 minutes to have a completely ready machine. Pre-warming AutoScalingGroups (ASGs) is either not fast enough or causes extreme over-provisioning.

Optimising boot-up time with pre-baked AMIs helped and brought the time down to a few minutes, but that still doesn’t beat the startup times of Docker containers.

At this point, the team figured that orchestrators like Kubernetes are able to scale up multiple containers in a matter of seconds.
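To make the scaling decision a little more concrete, here is a minimal sketch of the replica calculation that the Kubernetes documentation describes for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The function name and the traffic numbers are hypothetical and only illustrate the arithmetic; they are not upday’s actual configuration or figures.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Replica count following the documented HPA formula.

    current_metric and target_metric are per-pod averages of the same
    metric (for example, requests per second per pod).
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Illustrative numbers only: a push notification doubles the per-pod load.
print(desired_replicas(current_replicas=4, current_metric=200, target_metric=100))
```

The formula only decides how many containers should run. Starting them within seconds still assumes the cluster has spare capacity on its existing machines; otherwise new machines have to boot first.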

Availability of Managed/Native features

The Engineering team’s primary expertise is Java (and Kotlin), so every part of the stack was engineered using a Java-based approach.

The team figured out that many parts of their current stack could have been avoided and run better with infrastructure components like Nginx, HAProxy, etc.

They also realised that the costly development effort of validating access tokens on each request could be offloaded to the edge, making validation transparent to the application.

That was only the tip of the iceberg: the developers realised that service meshes help with traffic mirroring, shadow testing, A/B testing and distributed tracing, without having to implement complicated code to support them.

Orchestrators like Kubernetes abstract the configuration of infrastructure components to simple YAML files.

While some of these features could also be used along with ElasticBeanstalk and AWS, making infrastructure components production-ready involves a steep learning and optimisation curve.

Optimise for costs

When running Infrastructure, costs come in different forms. There are direct costs charged by the Infrastructure provider. There are indirect costs when engineers spend days and weeks optimising, maintaining and running the infrastructure.

In our case, onboarding new software onto ElasticBeanstalk takes a few hours. With a modern orchestrator like Kubernetes, it comes down to a few minutes.

Again, developer time saved = more time to deliver business features = more €$₹ for the business.

Also, when running 100s of tiny, small and some large instances, the costs quickly add up. While AWS EC2 Spot Fleets could address this to an extent, lightweight containers orchestrated on larger machines in Kubernetes seemed even more attractive in terms of performance, usability, rapid deployment and hence cost.

—

Therefore, it became clear that by adopting Kubernetes, we could address all the challenges described above and potentially benefit even more with its ecosystem.

And thus started our journey to Kubernetes.

The key takeaway for us, and for anyone thinking of adopting Kubernetes, is:

“Everybody does it” should not be a reason to venture into this. There should be a real engineering need!

Continued in Part 2: our Preparation, Execution and Learnings from our Kubernetes Migration.

Written by Shyam Sundar C S