Expedia Group Tech — Engineering

Head in the Clouds

How we monitored our Cloud spend to save money at EG

Nidhi
Expedia Group Technology

A scenic view of the Eiffel Tower against a cloudy sky with sunshine peeking through.
Photo by Il Vagabiondo on Unsplash

As organizations continue to move their applications to the cloud, they need to stay vigilant about cloud costs to keep them manageable. I oversee the part of the Expedia Group™ Global Tax Platform that recently experienced a tenfold increase in the volume of transactions flowing through it. We not only mitigated the increase in cloud costs that we anticipated from this higher transaction volume but also reduced our cloud spend by approximately 45%. This has allowed us to contribute to Expedia Group’s cloud cost-savings stretch goals and has given our engineering teams opportunities to build cost-savings strategies into their day-to-day practices.

Our tax platform infrastructure is hosted on Amazon Web Services (AWS), and we run ETL (Extract, Transform, Load) data workloads on Apache Airflow and Apache Spark. Anticipating and mitigating the increase in cloud costs was a key focus area for me and my teams for many months. Below, I cover the key cloud cost-optimization strategies that we employed as we iteratively thought through this challenge and implemented solutions.

1. Cloud storage: Cloud providers charge for storing objects in their storage systems. Let us take the example of Amazon S3 storage buckets. The rate charged here depends on several factors, such as your utilization (the GB of storage, the requests made against your storage bucket), duration of storage, and the storage tier. There are many storage tiers, such as S3 Intelligent-Tiering, S3 Standard-Infrequent Access, S3 Glacier Instant Retrieval, and S3 Glacier Deep Archive. Each storage tier has a different pricing structure.

Analyzing our existing lifecycle migration policies and moving data to lower-priced tiers when data-access patterns change (such as when the frequency of access decreases) has led to cost reductions. For example, we identified an opportunity to move data residing in the Infrequent Access tier ($0.0125 per GB per month) to the Glacier Instant Retrieval tier ($0.004 per GB per month), a reduction of roughly 68% in the per-GB storage rate.
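As a quick check on the tier comparison, here is a minimal sketch that works the arithmetic from the two per-GB rates quoted above. The function name is illustrative, and the calculation covers the storage rate only; realized savings may differ once retrieval and request charges are included.

```python
def percent_savings(current_rate: float, target_rate: float) -> float:
    """Return the percentage reduction when moving from current_rate to target_rate."""
    if current_rate <= 0:
        raise ValueError("current_rate must be positive")
    return (current_rate - target_rate) / current_rate * 100.0


# Per-GB storage rates quoted in the text above (USD).
INFREQUENT_ACCESS_RATE = 0.0125
GLACIER_INSTANT_RETRIEVAL_RATE = 0.004

savings = percent_savings(INFREQUENT_ACCESS_RATE, GLACIER_INSTANT_RETRIEVAL_RATE)
print(f"Storage-rate reduction: {savings:.1f}%")
```

On those two list prices the storage-rate reduction works out to about 68%.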

Analyzing our file storage trends and implementing new lifecycle versioning, archival, and purge policies have also led to cost reductions. Keeping multiple copies of data for disaster recovery is a valid requirement, but we also analyzed the cost implications of storing this data. For example, we created S3 lifecycle rules that limit the number of non-current versions retained, and we implemented purge policies that delete non-current versions after a few months.
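To make the lifecycle rules described above more concrete, here is a sketch of what such a configuration could look like, expressed as the dictionary shape that the S3 lifecycle API accepts. The rule ID, the day counts, and the number of versions to keep are illustrative assumptions, not the values we used.

```python
from typing import Any, Dict


def build_lifecycle_configuration(
    transition_after_days: int = 90,
    noncurrent_expire_after_days: int = 90,
    noncurrent_versions_to_keep: int = 3,
) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration (all default values are hypothetical)."""
    if min(transition_after_days, noncurrent_expire_after_days) < 1:
        raise ValueError("day counts must be at least 1")
    if not 1 <= noncurrent_versions_to_keep <= 100:
        raise ValueError("noncurrent_versions_to_keep must be between 1 and 100")
    return {
        "Rules": [
            {
                "ID": "tier-down-and-trim-versions",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "Status": "Enabled",
                # Move current objects to Glacier Instant Retrieval as access drops off.
                "Transitions": [
                    {"Days": transition_after_days, "StorageClass": "GLACIER_IR"}
                ],
                # Keep only the newest N non-current versions, and purge
                # older ones once they have been non-current long enough.
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_expire_after_days,
                    "NewerNoncurrentVersions": noncurrent_versions_to_keep,
                },
            }
        ]
    }


lifecycle_configuration = build_lifecycle_configuration()
```

With boto3, a dictionary like this could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. The sketch stops short of making that call so that it stays runnable without AWS credentials.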

2. Leveraging AWS Graviton processors: Amazon EMR (Elastic MapReduce) is a service that allows you to process vast amounts of data. Amazon EMR uses Amazon Elastic Compute Cloud (EC2) instances to distribute processing and scale big data environments. Many EC2 instance types are powered by Intel and AMD processors. AWS’s own Arm-based Graviton processors are priced lower, which gave us an opportunity to move away from Intel-based instances. Moving to lower-cost AWS Graviton processors across all our platform instances allowed us to realize additional annualized cloud cost savings.
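A first step in a migration like this is listing which instance types have a Graviton counterpart. The sketch below is a hypothetical helper, not our tooling: it maps a few common Intel-based families to Graviton2 families of the same class. The mapping is an assumption for illustration, and the result is only a candidate name, since available sizes differ between families and should be checked against the EC2 and EMR documentation.

```python
from typing import Optional

# Illustrative mapping from Intel-based families to Graviton2 families
# of the same class (general purpose, compute optimized, memory optimized).
GRAVITON_FAMILY_BY_X86_FAMILY = {
    "m5": "m6g",
    "c5": "c6g",
    "r5": "r6g",
}


def graviton_candidate(instance_type: str) -> Optional[str]:
    """Return a candidate Graviton instance type, or None if no mapping is known."""
    family, separator, size = instance_type.partition(".")
    if not separator or not size:
        raise ValueError(f"not an instance type of the form family.size: {instance_type!r}")
    graviton_family = GRAVITON_FAMILY_BY_X86_FAMILY.get(family)
    if graviton_family is None:
        return None
    return f"{graviton_family}.{size}"


for current in ("m5.xlarge", "r5.2xlarge", "i3.xlarge"):
    print(current, "->", graviton_candidate(current))
```

Instance types with no entry in the mapping come back as `None`, which makes them easy to flag for manual review.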

3. EMR cluster management: Several cost-optimization methods can be enabled to manage EMR clusters more effectively. The areas we invested in are autoscaling of EMR clusters, enabling Spot Instances and transient EMR clusters, reducing the run time of transient clusters, and auto-terminating clusters based on idle time.
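The cluster settings listed above can be sketched as a fragment of the parameters that boto3's EMR `run_job_flow` call accepts. This is an illustration under assumptions rather than our production configuration: it assumes EMR managed scaling as the autoscaling mechanism, and the instance types, counts, capacity limits, and idle timeout are all hypothetical. A real request also needs a cluster name, release label, and IAM roles, which are omitted here.

```python
from typing import Any, Dict


def build_emr_cost_settings(
    transient: bool,
    idle_timeout_seconds: int = 3600,
    min_units: int = 2,
    max_units: int = 10,
    max_on_demand_units: int = 2,
) -> Dict[str, Any]:
    """Build cost-related EMR cluster settings (all default values are hypothetical)."""
    if not 60 <= idle_timeout_seconds <= 604800:
        raise ValueError("idle_timeout_seconds must be between 60 and 604800")
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")

    def group(name: str, role: str, market: str, count: int) -> Dict[str, Any]:
        return {
            "Name": name,
            "InstanceRole": role,
            "Market": market,
            "InstanceType": "m6g.xlarge",  # hypothetical Graviton instance type
            "InstanceCount": count,
        }

    settings: Dict[str, Any] = {
        "Instances": {
            "InstanceGroups": [
                group("primary", "MASTER", "ON_DEMAND", 1),
                group("core", "CORE", "ON_DEMAND", 2),
                # Task nodes hold no HDFS data, so they are the usual place for Spot.
                group("task-spot", "TASK", "SPOT", 2),
            ],
            # A transient cluster shuts down once its steps finish.
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        # Managed scaling resizes the cluster between the limits below.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
                "MaximumOnDemandCapacityUnits": max_on_demand_units,
            }
        },
    }
    if not transient:
        # Long-running clusters are terminated after sitting idle this long.
        settings["AutoTerminationPolicy"] = {"IdleTimeout": idle_timeout_seconds}
    return settings


transient_settings = build_emr_cost_settings(transient=True)
long_running_settings = build_emr_cost_settings(transient=False, idle_timeout_seconds=1800)
```

The sketch applies the idle-time auto-termination policy only to long-running clusters, on the reasoning that a transient cluster already terminates when its steps complete.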

Additionally, we leverage centrally enabled Expedia dashboards, metrics, alerting, and reporting to monitor and act on any increases in our cloud costs.

These strategies have proven effective in significantly reducing our cloud costs across our big data and ETL workloads. Teams must employ strategies tailored to the customer support needs, Service Level Agreements (SLAs), and technology stack of their platform. When possible, we recommend analyzing needs up front and choosing cloud packages and services with their long-term cost implications in mind.

Monitoring our cloud spend is a constant journey. My engineering teams participated in the innovative Slashathon event organized across Expedia Group in June 2023, which brought in additional cost-optimization recommendations to further drive savings!

All of this could not have been achieved without the intellectual prowess and drive from my awesome engineering teams! #Taxmanians #Intaxicated #TaxRangers

References:

https://aws.amazon.com/s3/pricing/

https://aws.amazon.com/s3/cost-optimization/?nc=sn&loc=2&dn=4

https://aws.amazon.com/blogs/storage/reduce-storage-costs-with-fewer-noncurrent-versions-using-amazon-s3-lifecycle/

https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html

https://www.amazonaws.cn/en/elasticmapreduce/

https://aws.amazon.com/blogs/big-data/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance-for-spark-workloads-on-graviton2-based-instances/

https://aws.amazon.com/ec2/graviton/

Introducing Amazon EMR Managed Scaling - Automatically Resize Clusters to Lower Cost (aws.amazon.com)
