The Big Migration
In software development, few tasks I can think of are as daunting as migrating a living, breathing, heavily trafficked web application from one environment to another. While there may be more technically challenging feats in software development, none requires the kind of critical pre-planning, meticulous coordination, and flawless execution that go into making the Big Migration a successful one.
I find this kind of migration far more gut-twisting than the launch of a new application. While that involves a similar leap of faith, you start from zero. Early production issues on a new system come without much baggage — you have no long-time users yet.
Mid-flight migrations, the ones that require an intricate orchestration of chess-like moves to minimize any disruption to your now-reliant customer base, are what make me sweat. No matter how much testing you’ve done on your new environment prior to the actual deployment, the distinct possibility remains that something won’t work out the way you were sure it would before the proverbial switch was flipped. And usually, it’s some things.
We’ve undergone two such migrations in eight years at DoneDone, one in 2011 (action shot above) and one in 2013. Neither went without some minor hiccups but, thankfully, both went without major ones.
Our most recent Big Migration was in November 2013. We moved our infrastructure from dedicated physical hardware on Rackspace to virtual servers on AWS. The migration was by no means a simple data and application transfer either. We simultaneously leveraged some newer features now available to us on the Cloud, like queues and e-mail as a service. We added new VMs to handle data caching and off-cycle jobs. There were significant moving parts.
Migration night was set for 8pm on Friday, November 9th. We typically release new feature updates during the week. But, for a big migration, I prefer the weekend. The weekend gives you time to roll back should some unforeseen catastrophe beckon. It also gives you a day or two to sanity check everything before the bulk of your users come back on Monday morning. You sacrifice a weekend of leisure, but that’s what you occasionally do for a labor of love.
That Friday morning, I met with the migration team: myself, our systems administrator Ameer, and systems developer Paul. We had a step-by-step plan laid out with a surgeon’s precision. Nearly every single item on the migration list had been tested multiple times over the three months leading up to this seminal moment.
We’d all been involved in getting the different pieces of the engine ready. There was a precise order to the madness. Everyone was clear and confident on the order of events.
An 8pm migration means that much of the day is spent anxiously waiting.
I wonder what others do in anticipation of their “big moments”. Some athletes have bizarre pre-game rituals. Wade Boggs, for instance, was notorious for them.
“He ate chicken before every game (Jim Rice once called Boggs ‘chicken man’), woke up at the same time every day, took exactly 117 ground balls in pre-game practice, took batting practice at 5:17 pm, and ran sprints at 7:17 pm.” https://en.wikipedia.org/wiki/Wade_Boggs
Many surgeons are also known for playing classical music while performing surgeries. Studies have shown that music lowers blood pressure and pulse rates. In a way, a big migration feels a bit like surgery: Routine at best, but always with the risk of something unanticipated.
On the night of our migration, the music of choice in the house wasn’t of the orchestral type. Instead, it was Europe’s The Final Countdown. Programmers are weird like that. Three months of effort culminating in one final moment of proverbial switch-flipping didn’t feel right against the sonic backdrop of Bach’s Minuet in G minor. ’80s hair metal it was.
At precisely 8pm, we all sat at our desks ready to go down our checklist of tasks. “Let’s do this!” I yelled. As I began the first set of operations on our to-do list, Ameer proclaimed “Alright, DNS has been switched over!”
“Wait, what?? No, not yet!”
In my haste, I must’ve yelled out the code word for the last step as I was beginning the first one. We’d sewed up the patient before cutting him open.
Ameer quickly reverted the DNS rules.
The rest of the evening went largely as planned. In retrospect, I’m glad we had that small gaffe at the very beginning. It cut the tension and it shook away our nerves.