Traveloka Data Cost Optimization & Management During & After the COVID-19 Pandemic

Editor’s Note:

Today, we hear from Juan Kanggrawan, who gives a high-level account of overseeing an organization-wide data cost optimization initiative across all components & services of Traveloka’s data infra. The initiative took place amid the economic turmoil of the pandemic, which severely impacted the travel & tourism industry in which Traveloka and its peers operate.

Juan is a former data/analytics lead and is currently a senior data technical product manager (TPM) covering international expansion, data university, cost management, & commoditized ML domains. He is passionate about sharing his experience, knowledge, and capabilities with a wide variety of communities.

Intro & Context

As a senior data technical product manager, I drove rigorous data cost optimization initiatives between March and May of 2020. I hope this article is useful to all readers, especially those who are also in charge of their organisation’s cost optimization initiatives.

2020 has been a really challenging year for any business, especially for those in the airline, tourism, and hospitality industries like Traveloka. With all the challenges brought by the COVID-19 pandemic, it was, in hindsight, a great opportunity for us to rethink the way we run our company, especially with regard to our technology infra cost. Before this, we had never undertaken a systematic and rigorous cost optimization effort on an organisation-wide scale to collectively assess and optimise our cloud infra cost. The target for the data team was to reduce our monthly data infra cost by at least 50% from March to May 2020.

Why

As part of a technology unicorn, we in the data team often ask countless “whys”. Defining the correct problem statement with reasonable hypotheses is always a crucial first step, and answering the “whys” clarifies the direction of the plans and actions we take, including for this cost optimization. Personally, I stepped back and pondered: “Why are we doing this cost optimization?” and “Is it simply a reaction to the COVID-19 pandemic?” After several discussions and much thought, I concluded that it was because we wanted to push ourselves to be more sustainable, accountable, and transparent, not only to ourselves (the internal data team) but also to other business units and all management teams. Beyond the COVID-19 challenge, I strongly believe that these principles are critical in bringing us forward as a sustainable and respected technology unicorn in the region.

How

When the initiative started in mid-March 2020, our first task was to understand our top cost components, cost fluctuations, usage patterns, the implications of scaling down or decommissioning certain tech assets, and so on. With all these complexities (and variables), establishing a cost optimization task force (aptly nicknamed the cost killer) was a crucial first step. Our task force was a relatively small unit of 5–7 core members from various cross-functional teams, such as the PICs of analytics, data ingestion, data warehouse, DevOps, and technical product management. We met weekly to discuss our progress, challenges, and next steps. Many times, we needed to make tough decisions that would have side effects on other teams. I am truly grateful, in retrospect, that we could work together really well, set aside our own egos and biased personal perspectives, and move together toward our collective objective despite our differences.

The second aspect that I can highlight is the tremendous support from business stakeholders. In the past, the data team had tried to perform cost optimization and governance but, in hindsight, without any urgency or collective agreement about its exact priority / criticality. Hence, the previous initiative did not persist or scale to a wider level. This time was different. From the beginning, we had a clear mandate from Traveloka management and business units to collectively reduce our technology infra cost, and such a unified direction made operational / day-to-day decisions straightforward. When we had considerations and trade-offs due to new requests from business units, the cost perspective was our north star in guiding our discussions, negotiations, and ultimately, our collective decisions.

The third aspect is the (technical) execution. Given the non-technical scope and limited space of this article, let me share just the primary highlights from our experience. After we identified the top six most expensive components / services in our data infra, we carried out the following three fundamental action items over the next 1–2 weeks:

  • Data retention policy. Surprisingly, the data team had not archived our data systematically in the past. After several discussions with various data leads about how far back the archive cutoff should be, we decided to archive data that was more than 2 years old. In addition, we are currently moving our infrequently accessed data from our active storage to long-term storage.
  • Resources, processing, query review & optimization. During our “audit”, we were also surprised to find queries that were triggered up to 2,000 times per week! We also reviewed the most expensive queries to ensure we could refactor them in the most efficient and reasonable ways.
  • Kubernetes cluster downgrade & deletion. To our surprise again, we found many under-utilised resources related to our Kubernetes clusters, which became candidates for downgrade or deletion.
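
To make the three action items above a little more concrete, here is a minimal Python sketch of the kind of checks they imply. It is an illustration only, not the tooling we actually used: the function names, the input shapes, the 730-day approximation of "2 years", and the 20% utilisation ratio are all assumptions made for this example. Only the 2-year cutoff and the 2,000-runs-per-week figure come from the text above.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: "more than 2 years old" approximated as 730 days.
RETENTION_DAYS = 2 * 365
# Figure from the article, used here as a hypothetical flagging threshold.
WEEKLY_RUN_THRESHOLD = 2000


def partitions_to_archive(partition_dates, today):
    """Return the partition dates that are older than the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(d for d in partition_dates if d < cutoff)


def frequent_queries(weekly_query_log, threshold=WEEKLY_RUN_THRESHOLD):
    """Count runs per query id in one week's log; keep those at or above threshold."""
    counts = Counter(weekly_query_log)
    return {query_id: n for query_id, n in counts.items() if n >= threshold}


def underutilised_clusters(usage, max_ratio=0.2):
    """Flag clusters whose used/requested capacity ratio is below max_ratio.

    `usage` maps a cluster name to a (requested, used) pair in the same unit,
    for example CPU cores. The 0.2 default is an arbitrary illustrative value.
    """
    flagged = []
    for name, (requested, used) in usage.items():
        if requested > 0 and used / requested < max_ratio:
            flagged.append(name)
    return sorted(flagged)
```

In practice, the inputs would come from storage metadata, the warehouse query logs, and cluster monitoring respectively, and each flagged item would still need a human review before anything is archived, refactored, or scaled down.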

Result

By April 2020, we were really happy and encouraged to start seeing concrete, progressive results. With this promising trajectory, we strived to achieve a 50% data infra cost reduction by the end of May 2020, a month ahead of the original schedule. And together with various stakeholders, we did it. (Well, to be precise, it was a 47% reduction. Nevertheless, it was a substantial accomplishment for the data team.)

As I mentioned earlier, this cost optimization initiative was not only our reaction to COVID-19 but also a pivotal step to ensure that Traveloka can run its business operations sustainably in the long run. Even after our intensive cost optimization effort in March–May, we continue to check and review our monthly cost pattern rigorously & diligently. We are also refining our cost attribution model for each business unit, in order to have more transparency on our technology infrastructure.

Learning Points & Next Steps

After achieving our target, we realized that we had also fostered a good new habit. Our cost optimization task force still meets weekly to monitor, review, & maintain our recurring cost. During major sales events such as EPIC SALE, we exercised rigorous habits in anticipating and managing our cost fluctuations. Going forward, we are evaluating the trade-offs between visibility and efficiency of our cost components, as some of our infra services are shared or used collectively to ensure cost efficiency. With that approach, however, it is not straightforward to know exactly how much of each service / component is used by individual business units. We could develop dedicated tech infra, assets, or pipelines for each business unit, but at the expense of increased cost. In a follow-up article, I will share more detail on juxtaposing cost management between shared & dedicated resources from both the visibility and efficiency perspectives.

Until then, if the topics and experience you’ve just read about are of interest to you, I invite you to check out our careers page for a potential role that may challenge you to solve these kinds of problems every day at Traveloka.

--

Juan Intan Kanggrawan (juan.tan.kang@gmail.com)
Traveloka Engineering Blog

Interests: Analytics, Strategy, Innovation, Smart Cities, Urban Science, Public Policy, Nation Building, Social Impact, Ideology, Philosophy, Arts (Paintings)