Reducing Cloud costs. Covid-19 experience

Miguel Angel Coll
TUI MM Engineering Center
May 4, 2020

I still remember talking with David Garcia (TUI Destination Experiences IT Director) the weekend the Spanish government declared strict social distancing measures to control the impact of the Covid-19 pandemic. The week before, our colleagues in Milan were already suffering huge restrictions, and there the business was already going to zero.

Suddenly, we understand that it is not just a temporary or localized crisis. We are facing a global shutdown of the Travel Business with no clear end date. The upcoming weeks are going to be hard.

We have a mission

March 16th, Monday afternoon. I’m on a call with the whole team, including Solution and Technology Architects, Scrum Masters, Release Managers, QA engineers, and Data engineers.

“Guys, we have a problem,” I said honestly. “I have just had a Leadership call with the Board members and their direct reports. The Covid-19 outbreak is affecting Travel and TUI DX massively. We are going to close all operations in our 50+ countries over the next weeks. That means no business at all.”

TUI Destination Experiences is a strong company. But our CFO was very clear in his message: “in order to extend our life-span we need to protect cash”. Therefore, we have to find every possible efficiency in our IT budget. Cada euro cuenta (Spanish for “every euro counts”).

Guiding principles

The call about how to reduce our AWS costs is scheduled for first thing on Tuesday morning. A small group of people agrees to start looking seriously at how to bring those costs down, following some simple guiding principles:

  • Reduce everything possible: Our motto is clear, every euro counts.
  • Daily meetings: We will have daily calls to share progress and coordinate tasks.
  • Start easy, pick up pace: We can’t start stopping things randomly or go too far in reducing capacity. It is better to start with cleaning in the first week and push harder in the second.
  • Log everything: We will create a wiki page to log every change we make. The aim is to track efficiencies but also to facilitate the reverse process when we restart our operations.
  • Everyone helps: Even though we are part of different teams and areas, we will work together toward the same goal. If someone needs help, they just raise a hand.

Also, we agree to use our cost reporting system to track the evolution and define our first goals.

Let’s do it

The first thing we prioritise is cleaning the house, because it has zero impact and zero dependencies on business functions and could potentially reduce costs.

Clean the house

If you are just like any other company, there is more money than you expect hiding in your couch. We come from some years in which Digital Transformation was the key and delivering new stuff was the priority. In those times, technical debt was not a priority. Now it’s time to pay the debts.

Some of the things we review are: unused disks, unnecessary snapshots, obsolete buckets, ghost EC2 instances, unused systems, etc. We start by going through our cost control tool reports looking for improvements, and there are a lot.
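
As an illustration of this kind of review, here is a minimal Python sketch that flags unattached EBS volumes and old snapshots. It is not the tooling we used. The helper names and the age threshold are hypothetical, and `fetch_inventory` assumes boto3 is installed and AWS credentials are configured. The two filter functions only rely on the `State` and `StartTime` fields that the EC2 API returns.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes):
    """Return EBS volumes that are not attached to any instance."""
    # EC2 reports a volume that is not attached as State == "available".
    return [v for v in volumes if v.get("State") == "available"]


def old_snapshots(snapshots, max_age_days, now=None):
    """Return snapshots started more than max_age_days ago.

    StartTime values must be timezone-aware datetimes, as boto3 returns them.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s for s in snapshots if s["StartTime"] < cutoff]


def fetch_inventory(region_name):
    """Fetch all volumes and self-owned snapshots in one region (hypothetical helper)."""
    import boto3  # imported lazily so the filters above work without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    volumes = [
        v
        for page in ec2.get_paginator("describe_volumes").paginate()
        for v in page["Volumes"]
    ]
    snapshots = [
        s
        for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"])
        for s in page["Snapshots"]
    ]
    return volumes, snapshots
```

The sketch only lists candidates. Deleting anything stays a manual, logged decision, in line with the "log everything" principle.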

Downscale

After some days cleaning the house, we start talking with the different teams to review how to downscale their systems. Most of them have auto-scaling policies, but no scaling policy was prepared for a zero-business situation. Therefore, we find a lot of possible adjustments, some of which mean changing instance types, reducing ECS clusters, adjusting the times at which environments start and stop, etc.
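
To make the "start and stop times" idea concrete, here is a small Python sketch of the decision a scheduled job could take for a non-production environment. The function name, the 08:00 to 18:00 window and the Monday to Friday default are assumptions for illustration, not our actual schedules.

```python
from datetime import datetime, time


def should_be_running(now, start=time(8, 0), stop=time(18, 0), workdays=range(0, 5)):
    """Return True if the environment should be up at `now`.

    `workdays` uses datetime.weekday() numbering (0 = Monday, 6 = Sunday).
    The window is half-open: up from `start`, down at `stop`.
    """
    return now.weekday() in workdays and start <= now.time() < stop


if __name__ == "__main__":
    # A scheduler would call this periodically and start or stop instances accordingly.
    print(should_be_running(datetime.now()))
```

Narrowing the window or the set of workdays is then a one-line change, which also makes it easy to revert once operations restart.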

Stop things

Avoiding costs by stopping systems looks pretty straightforward. The drawback is that, even with zero business, we are not allowed to stop systems on our own. That’s why we left this approach for the second week. People outside IT are by now already facing the shutdown, and conversations about finding efficiencies go smoothly. Some of the systems we shut down at this stage were back-office tools, no longer needed with destinations closed. We also stopped some systems that were in the process of being migrated to new ones. Before the outbreak, the plan was to do a soft migration to protect the conversion rate. With the business at zero, the conversion rate no longer justifies having two platforms running.

Look for the “hidden” costs

Sometimes, when we look for costs and cost reductions, we tend to forget about “hidden” costs. I put in this category the costs you can’t anticipate because they are dynamic (like traffic). In our case, we also identify some savings there by replacing some NAT gateways with VPC endpoints.
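
A rough way to size this kind of saving is sketched below in Python. It relies on two facts: NAT gateways charge per GB of data processed, and gateway VPC endpoints (available for S3 and DynamoDB) carry no additional charge. The function name and all the figures in the example are assumptions for illustration. The per-GB price depends on the region and should be taken from the current AWS price list.

```python
def nat_saving_from_gateway_endpoint(total_gb, routable_share, nat_price_per_gb):
    """Estimate the monthly NAT data-processing charge avoided.

    total_gb: GB per month currently flowing through the NAT gateway.
    routable_share: fraction (0..1) of that traffic going to S3 or DynamoDB,
        which could use a gateway endpoint instead.
    nat_price_per_gb: NAT gateway data-processing price per GB (region dependent).
    """
    if not 0.0 <= routable_share <= 1.0:
        raise ValueError("routable_share must be between 0 and 1")
    if total_gb < 0 or nat_price_per_gb < 0:
        raise ValueError("total_gb and nat_price_per_gb must be non-negative")
    return total_gb * routable_share * nat_price_per_gb


if __name__ == "__main__":
    # Illustrative numbers only: 10 TB per month, 60% of it to S3, 0.045 per GB.
    print(round(nat_saving_from_gateway_endpoint(10_000, 0.6, 0.045), 2))
```

The estimate ignores the hourly charge of the NAT gateway itself, which only disappears if the gateway can be removed entirely.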

Architecture matters

Examples like the one above start to arise. The requirements we used to make decisions over the last months (years?) no longer hold. We understand that we must review all our architecture decisions. Do we need HA for this solution in the current situation? Do we need daily backups? Do we need to store customer data over this period? Now is the time to thoughtfully apply the learnings from Martin Fowler’s article “The Elephant in the Architecture”.

“Business value is vital but inconstant”, Martin Fowler

Opportunities

After two weeks we already have reductions close to 50% on our daily costs, and new opportunities start to fade. We keep working on improvements, as every euro counts. We change our daily meeting to a weekly one and start focusing on other tasks that could potentially give us more efficiencies. Even if it seems self-contradictory, some of the opportunities we are looking at now involve increasing our AWS costs. Is now the right moment to push some of the cloud migration projects?

The good thing about operating a zero-business platform is that you can be brave in your decisions and go faster. In that sense, we now have the opportunity to do things differently. As I like to think about it: you can now work in an enterprise at a startup pace.

The real Team

For sure, everything described above is not my merit alone, nor a one-man task. As I said, one of the keys to success was working as a team with the same vision and the same principles. In the words of Dee Hock, Visa founder:

“Given the right circumstances, from no more than dreams, determination, and the liberty to try, quite ordinary people consistently do extraordinary things”, Dee Hock

In this case some of the ordinary people that need to be mentioned are:

Carlos Mainez, Ali Khan, Gabriel Fernadez, Miguel Salva, Victor Fernandez with the support of all the colleagues on our teams and the collaboration of our cloud Partners, Vector, APSL and LinkeIT.

Thanks a lot for leading the change!

Our Company
TUI Destination Experiences is the world’s leading provider of destination experiences. TUI DX offers 14 million guests a portfolio of excursions, activities, tours, transfers and guest services. Our subsidiaries include Intercruises, the leading global cruise handling company and Musement, one of the leading online platforms in the Tours & Activities market. TUI DX is part of TUI Group.

Find more about TUI DX technology on our blog: https://medium.com/tuidx

Miguel Angel Coll
TUI MM Engineering Center

Technology passionate, vocational manager, CTO @ Domminion Commercial