Making the case for cloud only
How we persuaded the Financial Times Technology department to move everything to the cloud.
Moving the FT’s infrastructure from hybrid to “cloud only” was a large, multi-team, multi-year, multi-solution, multi-bloody-difficult project. This is the story of how we got so many parts moving in the same direction, and the prize we got at the end.
The Financial Times used to have hardware and services running from data centres in New Jersey and on the outskirts of London (Watford and Park Royal). Looking after your own kit is expensive and time-consuming. The costs of hardware and software support and licensing, and the cyber security and reliability risks of ageing systems, were not things the FT Technology department wanted. However, moving all the capabilities those older systems gave the FT (such as our core content management system, and publishing and distributing the newspaper) was not a very appetising prospect.
Large technology projects are hard. Almost all the teams in FT Technology needed to be involved in (re)moving those services. Coordinating, cajoling and collaborating across so many streams of work was a huge task.
In the summer of 2017 we talked with the then CTO, John Kundert, about the next step up from our existing “cloud first” strategy. Something simple and easy to communicate was needed. For a short, happy interlude I genuinely thought I was a visionary for saying “how about we call it Cloud Only”. Twenty minutes later I had googled the phrase and felt considerably less clever. I was a long way from the first to use it, but the CTO liked it, so we were off. JK wanted a deadline, and so Cloud Only became Cloud Only 2020.
If we wanted to get out of looking after our own equipment in data centres someone would have to be willing to pick up the unglamorous task of badgering a lot of people for (potentially) years.
Sarah Wells (Technical director of Operations and Reliability in FT Technology) has spoken about large projects needing clarity, communication and empathy if you want other people to share your goals and help you achieve them. Let’s break that down a little:
- Why do it (and why now)
- What the finish line is/how we will know we are done
- Do far more than you initially think you need
- Use simple messages, slack, email, even posters and more
- Use nudge theory
- Use E.A.S.T. — Make it Easy (remove blockers), Attractive (show how it will be better for them or the organisation), Social (compare progress and get public commitments) and Timely (choose the right time to communicate)
Simple clear messages work well.
‘Cloud Only 2020’ essentially had all the high level information that people needed encapsulated in the phrase. It said what we wanted to do, and by when.
We also needed a simple metric so that everyone could easily track overall progress and, crucially, their own team’s contribution/ask.
We came up with an “instance” count. An instance was a lump of hardware (blade, router, switch, server, storage array etc), a logical chunk of storage or a virtual machine. Regardless of size, each counted as one instance. At the end of 2018 that ‘instance count’ of hardware and VMs in our data centres was around a thousand; by the start of 2020 it was still almost 600.
Our Cloud Only instance tracker for 2020, which was on an intranet site and also spammed out to the department every month.
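The metric really was that simple. As a minimal sketch (the inventory format and team names here are hypothetical illustrations, not the FT's actual tooling), everything in the data centre counts as exactly one instance, and the tracker is just a per-team tally:

```python
# Sketch of the "instance count" metric: every item (blade, switch, VM,
# chunk of storage...) counts as exactly 1 instance, regardless of size.
# Inventory entries and owner names below are invented for illustration.
from collections import Counter

inventory = [
    {"name": "blade-07",  "type": "blade",   "owner": "Infrastructure"},
    {"name": "core-sw-1", "type": "switch",  "owner": "Network"},
    {"name": "cms-vm-12", "type": "vm",      "owner": "CMS"},
    {"name": "cms-vm-13", "type": "vm",      "owner": "CMS"},
    {"name": "san-lun-4", "type": "storage", "owner": "Infrastructure"},
]

def instance_counts(items):
    """Count remaining data-centre instances per owning team."""
    return Counter(item["owner"] for item in items)

counts = instance_counts(inventory)
total = sum(counts.values())   # the headline number everyone tracked
print(total)                   # prints 5
print(counts["CMS"])           # prints 2 -- each team sees its own ask
```

Counting a blade chassis and a single VM as one instance each is deliberately crude, but that crudeness is what made the number easy to publish, chart and chase every month.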
Working with our group’s director of delivery Thyle Carroll, Infrastructure and data hosting lead David Moor and Technical officer Mark Cubley, we did the sums on how much money we could save (north of £1 million by our calculations). Then we embarked on a PR offensive to “sell” Cloud Only to the rest of the department. With a clear bottom-line impact as well as the security and reliability benefits, our message was obvious, and we kept repeating it.
We knew we could not do it without lots of help from lots of people. We set out to find owners of all those instances. No small task but vital to get conversations started.
Using presentations, formal OKR setting sessions, cross team dependency workshops and informal water cooler chats we started to get buy-in.
We created internal intranet sites, posters, published the metric as a chart (showing the number of instances each group still owned) and we begged for public commitments to reduce the number.
With a nod to the communication part of the clarity, communication and empathy mentioned earlier, David Moor then chased and chased and chased again, using gentle reminders and requests for updates sent directly to the people doing the work to help keep the project front of mind.
Moving the business-facing systems out of the data centres was a huge task that needed to be completed before any of the supporting systems (DNS, mail servers, network, backup tools etc) could go. In addition, this was a case of all or nothing: if even one system used by the FT remained, we could not decommission the old kit.
Our ‘simple’ diagram of data centre services/hardware dependencies and group owners, which hung as a huge poster on the wall of the FT’s head office.
Moving these systems took a truly global effort. I want to illustrate that with some name checks.
Some of the many contributors included:
- The Infrastructure Management team (under Carlo Beltran) worked with almost every department of the FT to move terabytes of data to new storage solutions.
- The Newspaper Distribution team (under Nikola Ivanov and Vasil Stefanov) moved subscription and billing systems to the cloud.
- The CMS team migrated our content management system to SaaS.
- The Cloud Enablement team (under Rob Godfrey) and the Network team (under Mark Cubley) moved the FT’s connectivity infrastructure to AWS transit gateways.
- Chris Hall’s Digital Print team made the worldwide production of the famous pink newspaper entirely cloud based.
- The End User Computing team (led by Chris Hayes) moved application and desktop virtualization to AWS.
Of course, getting to the data centres to unrack the hardware and drag it into the back of the recycling van had some last-minute hiccoughs, and despite all the communication there were still surprises. One thing I have learnt from such large projects (and hope to put into practice) is to double the amount of comms I think I need, and then double it again.
Ultimately, a few days before Christmas 2020 (and having had help from the FT’s CPIO Cait O’Riordan — who is impressively handy with a Phillips screwdriver) a small team of volunteers stood in a car park outside a data centre in Watford sipping champagne from plastic cups in the cold, having delivered Cloud Only 2020.
And after all this effort, what is the prize? Was it worth it?
Well, in a word — yep! We have saved an awful lot of money on hosting, support and licensing. Getting off creaking systems/hardware means we have more reliable services, and the impetus this project gave us to retire those older systems makes us a lot more secure. We never again have to go through the three-to-five-year capacity (gu)estimating and the associated capex pain for new hardware.
But, for me, there is something even more important than the money, reliability and security benefits. Cloud only means FT Technology’s outlook is leaner, quicker, more nimble. We think in a cloud native fashion, so the scope of what we consider possible is growing. And it is growing exponentially.