Migrating a Service to Kubernetes Using the Canary Pattern

Itzik Dabush
BigPanda Engineering
Nov 8, 2021

At BigPanda we have dozens of microservices, written in Scala and Node.js, that communicate with each other via Kafka, HTTP, and RabbitMQ. Most of our services were running on AWS EC2 instances.

AWS EC2 instances served us well, but as the company grew and data rates increased dramatically, we needed to scale rapidly. To improve the scalability and velocity of our services and our application, we decided to migrate our microservices to Kubernetes.


The options for the migration

BigPanda’s platform is a real-time machine learning-based application that aggregates multiple data sources and alerts into specific, manageable incidents for enterprise IT teams.

Once we’ve decided to migrate a service to Kubernetes and implemented all the changes needed in the service to support the Kubernetes environment, we had to choose a way to migrate the service to the new Kubernetes environment.

Since our platform operates in real time and serves as our customers' eyes on their IT resources, maintaining zero downtime was both our goal and our challenge.

The service whose migration to Kubernetes I was responsible for had multiple means of interfacing with other services in the pipeline, including HTTP, RabbitMQ, and Kafka.

As we saw it, we had 2 main options to handle this challenge:

The first option was to perform the migration by scaling up the Kubernetes instances, rerouting all the different communication to the new Kubernetes cluster, and scaling down the EC2 instances in parallel, leaving all communication with the service unhandled for a couple of seconds during the rerouting.

The problem with this option was that we could not guarantee that the service on the Kubernetes cluster would be capable of performing its duties in the production environment. We did test its functionality in the staging environment, but we still could not be sure, as nothing truly replicates production.

That led us to the second option: why not apply the Canary Deployment principles to the infrastructure as well? This way, we could guarantee that no customer would be affected by the migration.

The Canary Migration with the Charm of a Feature Toggle

Canary deployments are a technique for rolling out changes to a small group of servers as an initial examination: first you deploy the change to a small subset of servers and test it, and only then do you roll the change out to the rest of the servers. In effect, the canary deployment serves as the final test of your code.
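To make the idea concrete, here is a minimal sketch of how a canary split can be implemented. The function names and the hashing scheme are illustrative assumptions, not BigPanda's actual code: a stable hash of a request key sends a configurable percentage of traffic to the canary fleet, so the same key always lands on the same side.

```typescript
// Compute a stable, non-negative hash for a request key so that routing
// decisions are deterministic across requests with the same key.
function stableHash(key: string): number {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) | 0; // keep within 32-bit int range
  }
  return Math.abs(h);
}

// Route a fixed percentage of traffic (0-100) to the canary servers.
function routeToCanary(requestKey: string, canaryPercent: number): boolean {
  return stableHash(requestKey) % 100 < canaryPercent;
}
```

Starting with a small `canaryPercent` and raising it as confidence grows gives the gradual rollout the canary pattern describes.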

And that’s exactly what we’ve done. After we mapped all the service resources and connections in our pipeline and application, we added a simple feature toggle in our code. A feature toggle is a mechanism that turns certain functionality on and off during runtime.

Our feature toggle decided which requests would be handled in the new Kubernetes cluster and which would remain on the old EC2 instances.
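The toggle described above can be sketched as a small runtime flag. This is a hypothetical illustration (the class and method names are invented, and in practice the flag would be read from a config service rather than flipped in memory): a single switch decides, per request, whether the new Kubernetes cluster or the old EC2 fleet handles it.

```typescript
// The two environments a request can be routed to during the migration.
type Target = "kubernetes" | "ec2";

// A minimal runtime feature toggle: functionality can be switched on and
// off without redeploying the service.
class MigrationToggle {
  private k8sEnabled = false;

  enableK8s(): void {
    this.k8sEnabled = true;
  }

  disableK8s(): void {
    this.k8sEnabled = false;
  }

  // Decide where the current request should be handled.
  target(): Target {
    return this.k8sEnabled ? "kubernetes" : "ec2";
  }
}
```

Because the flag can be flipped (and flipped back) at runtime, a bad rollout can be reverted instantly without a redeploy, which is exactly what makes the toggle a safe companion to the canary migration.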

Once we had deployed the Kubernetes cluster and had the service running smoothly, we gradually routed traffic to the Kubernetes cluster via the feature toggle. After validating, including end-to-end testing, that no functionality of the service was affected by the migration, we routed all traffic to the new k8s cluster.

After a few hours of monitoring, we said our happy goodbyes to the old EC2 instances.
