Kubernetes Migration Flight Manual

How we used a checklist to migrate our application from Heroku to Google Cloud

Amanda Holl
Kantata Product Development
Sep 6, 2018


As you may have seen in our first blog post, we recently undertook the challenge of moving our integrations platform from Heroku to Google Cloud with Kubernetes. The entire process took approximately two and a half months and involved not only our Integrations Platform team but also close collaboration with two other engineering teams (approximately four engineers per team) and coordination with many other teams, like Sales and Services, that rely on our platform to help our customers succeed. So how did we, the Integrations Platform team, drive this migration while accounting for all the coordination and communication required? The short answer: a checklist.

What do I mean by checklist?

No, I am not going to give you the dictionary definition of what a checklist is; instead, I’m going to give you the definition of an Aircraft Flight Manual (AFM), which is:

A document produced by the aircraft manufacturer containing detailed information on the operation of the aircraft. The AFM details the recommended aircraft operating technique for normal, abnormal and emergency operation together with the Aircraft Performance that should be achieved when the aircraft is operated in accordance with these procedures. (Source: https://www.skybrary.aero/index.php/Aircraft_Flight_Manual_(AFM))

And now you may be asking what a flight manual has to do with migrating an application to Kubernetes. Just as a flight manual specifies the “operating technique for normal, abnormal and emergency operation,” our checklist defined detailed instructions on how to proceed at each phase of our migration process.

What did our migration checklist look like?

We broke our migration from Heroku to Kubernetes into three distinct phases. The first two phases could be completed with no disruption in service for our customers, and the last required us to take a brief period of downtime in order to complete a database migration. Our checklist involved approximately 15 tasks at each migration phase, or more in the case of the database migrations, as well as a pre-execution and a cleanup phase. Overall, the structure of our checklist was as follows:

  1. Pre-execution phase: Validating setup and preparing the Kubernetes environment
  2. Phase 1: Migrating background processing
  3. Phase 2: Migrating the web service
  4. Phase 3: Migrating databases
  5. Cleanup phase: Simplifying our Helm chart and removing old resources

So how did we arrive at these stages and determine what each phase included? The simple answer: Iteration. Fully migrating our platform to Kubernetes involved completing four full migrations, two for staging-like environments and two for production-like environments. Throughout each migration, we iterated on the checklist, adding items as we discovered new information, adding explicit command line instructions to reduce the overhead of remembering what to run, removing instructions as we simplified the process, and more.
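To make the structure concrete, here is a minimal sketch of how a phased checklist like ours could be modeled as data. It is not the tooling we used; the class names, the `communication_checkpoint` helper, the placeholder step descriptions, and the choice of `kubectl config current-context` as an example command are all assumptions made for illustration. Only the five phase names and the fact that the database phase required downtime come from the checklist described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    description: str
    command: Optional[str] = None  # explicit command to run, if the step has one
    done: bool = False


@dataclass
class Phase:
    name: str
    requires_downtime: bool = False
    steps: List[Step] = field(default_factory=list)

    def next_step(self) -> Optional[Step]:
        """Return the first unfinished step, or None if the phase is complete."""
        return next((step for step in self.steps if not step.done), None)


def communication_checkpoint(phase_name: str) -> Step:
    """Hypothetical helper: a reminder to tell other teams what is starting."""
    return Step(f"Notify affected teams that '{phase_name}' is starting")


def build_checklist() -> List[Phase]:
    """Build the five phases; the individual steps are illustrative placeholders."""
    phases = [
        Phase("Pre-execution", steps=[
            Step("Confirm kubectl points at the intended cluster",
                 command="kubectl config current-context"),
        ]),
        Phase("Phase 1: background processing",
              steps=[Step("Migrate background processing")]),
        Phase("Phase 2: web service",
              steps=[Step("Migrate the web service")]),
        Phase("Phase 3: databases", requires_downtime=True,
              steps=[Step("Migrate databases")]),
        Phase("Cleanup",
              steps=[Step("Simplify the Helm chart"), Step("Remove old resources")]),
    ]
    # Every phase opens with an explicit communication checkpoint.
    for phase in phases:
        phase.steps.insert(0, communication_checkpoint(phase.name))
    return phases


def current_phase(checklist: List[Phase]) -> Optional[Phase]:
    """Return the first phase that still has unfinished steps, or None."""
    for phase in checklist:
        if phase.next_step() is not None:
            return phase
    return None


if __name__ == "__main__":
    checklist = build_checklist()
    phase = current_phase(checklist)
    print(phase.name, "->", phase.next_step().description)
```

Keeping the command next to the step it belongs to mirrors what the written checklist did for us: whoever runs the migration does not have to remember what to type, and adding or removing a step between migrations is a one-line change.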

What were our goals in developing a checklist?

We devoted time to developing and iterating on our checklist in order to develop a migration plan that was, at its core:

  1. Simple
  2. Repeatable
  3. Predictable

By simple, I mean that the checklist helped us keep the changes required for our migration to a minimum, mitigating the risk of breaking the architecture of our platform and causing a substantial disruption in service to our customers. We also wanted our migration plan to be repeatable, so that anyone on our team could successfully run through the process for any of our environments and achieve predictable results. Predictability was especially important, as we have internal and external people who rely on our platform to be consistently available.

How did our checklist help us?

Despite our best efforts, things do not always go according to plan, and our migration was no exception. In the course of migrating our first staging environment, we learned that we had to migrate our databases earlier than expected, but we rapidly iterated on our checklist to provide simple and repeatable steps for completing a database migration for that environment and others. During our second migration, we encountered issues with the predictability of our process: the teams we work with were confused about how long it would take to complete all three phases of our migration. In response, we worked to simplify the process and added explicit communication checkpoints at each migration phase to set expectations for the various teams about the migration step being completed.

The checklist also proved to be a critical touchpoint between our Integrations Platform team and our DevOps team, with whom we worked closely to make sure that the migration was secure and stable. It was also an important checkpoint with our R&D leadership and external teams, such as the Customer Success Managers that we worked with, giving them confidence that we had developed a reliable process.

Just as airplane pilots use checklists to ensure that flights run smoothly and safely, we used a checklist to ensure that our migration from Heroku smoothly, safely and successfully landed in our new Kubernetes frontier.
