8 Techniques for Cloud Cost Optimization

airasia super app
AirAsia MOVE Tech Blog
6 min read · Jan 15, 2021

By Naveen S R

Our primary responsibility as software engineers is to continuously deliver value to the business. We operate in an agile manner, experiment a lot, and juggle a plethora of time-sensitive projects, so we end up spending a lot of money on infrastructure. As an organization, we need to pay attention to our operational cost and keep it under control.

We recently performed several waves of optimizations on the cloud and reduced our monthly bill by around 55%.

Here are some recommendations for anyone looking to optimize their cloud bill. Most come from our experience on the Google Cloud Platform, but we believe they can help on other cloud platforms as well.

1. Designing the cluster

We have multiple Lines of Business that scale independently of each other, so the choice of region where you host your application is very important.

  1. Have a high-level view of the landscape and the services you may need to support the product roadmap. Example: Google App Engine is not available in all regions.
  2. Understand your target customers and the primary revenue-generating group so that you do not compromise on performance. The closer the chosen region is to the end customer, the better.
  3. Colocating clusters in the same region is a good rule of thumb. It saves significantly on network egress cost across regions. Example: for a web application, co-locating the load balancer, micro frontends, microservices, database, and upstream services in the same region will bring network egress down by a lot. I mean a lot!
  4. The cost of services varies by the choice of region.
  5. Design for multi-region only for the services that need it, and based on the environment. Example: you may have to consume a SaaS that is available in only one region. To overcome last-mile network degradation, you may sometimes need to accept the egress trade-off of having two regions in play.
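Colocation is easy to verify mechanically. Here is a minimal sketch, assuming a hypothetical inventory of components mapped to regions (the inventory format and function name are ours, not a GCP API), that flags anything deployed outside the primary region:

```python
# Sketch: flag components deployed outside the primary region, since
# cross-region traffic between them incurs network egress cost.
# The inventory format here is hypothetical.
def find_colocation_breaks(services, primary_region):
    """Return names of services not hosted in the primary region."""
    return [name for name, region in services.items() if region != primary_region]

inventory = {
    "load-balancer": "asia-southeast1",
    "frontend": "asia-southeast1",
    "orders-api": "asia-southeast1",
    "analytics-db": "us-central1",  # source of cross-region egress cost
}
print(find_colocation_breaks(inventory, "asia-southeast1"))  # ['analytics-db']
```

Running a check like this in CI keeps the colocation rule from eroding as new services are added.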

2. Log Optimisations

We often launch a new component with the log level set to info. As traffic grows, the cost of producing and storing those logs grows with it, depending on the log level you have configured. So log only errors, and more importantly, log only actionable insights (not all errors are actionable insights). A few configurations are also worth understanding: on GCP there are two log retention buckets, _Default, which is billable, and _Required, which is free. Optimize the log retention period as required. By adding exclusion filters you can selectively send only the actionable logs to storage.

Example: a frequent symptom of a cascading failure is a surge of 500 errors written at high velocity, which drives up the bill. By adding an exclusion filter policy you can gate that repeating pattern of logs beyond a given rate.
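As a sketch of what such a gate can look like, the snippet below builds a Cloud Logging exclusion filter that drops most repetitive 500-error entries while keeping a sample for debugging. The `sample()` function is part of the Logging query language; the resource type and sampling fraction are illustrative, and you should verify the exact `gcloud` invocation for attaching exclusions against the current documentation:

```python
# Sketch: build an exclusion filter string that drops ~90% of 500-error
# log entries, keeping a 10% sample for debugging. The resource type and
# fraction are placeholders to adapt to your setup.
def error_surge_exclusion(status=500, keep_fraction=0.1):
    return (
        f'resource.type="gae_app" '
        f'AND httpRequest.status={status} '
        f'AND sample(insertId, {1 - keep_fraction})'
    )

print(error_surge_exclusion())
# resource.type="gae_app" AND httpRequest.status=500 AND sample(insertId, 0.9)
```

Note the inversion: the filter matches what gets *excluded*, so excluding 90% keeps a 10% sample.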

3. Resource Tuning

Young teams often churn out many experiments, and over a while you can end up with tons of unutilized resources inflating your operational cost. Make the billing details transparent to the team. A periodic review of the cost table and the services provisioned helps validate whether each resource is still needed.

Review the Compute Engine spec against usage statistics. Example: if CPU usage sits below 5% on an 8 vCPU machine, consider downgrading to 2 vCPU.
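That downgrade decision can be made systematic. A minimal sketch, with illustrative thresholds and sizes (not GCP recommendations):

```python
# Sketch: recommend the smallest machine size whose projected utilization
# stays under a target. Sizes and the 50% target are illustrative.
VCPU_SIZES = [2, 4, 8, 16]

def recommend_vcpus(current_vcpus, avg_utilization, target=0.5):
    """Smallest vCPU count that keeps projected utilization under target."""
    used_cores = current_vcpus * avg_utilization
    for size in VCPU_SIZES:
        if used_cores / size <= target:
            return size
    return current_vcpus  # nothing smaller fits; keep the current spec

print(recommend_vcpus(8, 0.05))  # 2 — an 8 vCPU box at 5% fits comfortably in 2 vCPU
```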

Defining VMs as preemptible is helpful for scheduled jobs, data pipelines, and on-demand job submission to Dataproc.

In a typical microservices setup on Google App Engine, note that shared Memcache is free of cost while dedicated Memcache is billable. Use shared Memcache in all lower environments and reserve dedicated Memcache for production.

Review GCS periodically for the relevance of the data. For time-sensitive data, set up crons to clean up the storage periodically.
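For simple age-based cleanup, a GCS lifecycle rule can replace the cron entirely. The sketch below generates a lifecycle configuration that deletes objects older than 30 days (bucket name and window are placeholders); it can be applied with `gsutil lifecycle set lifecycle.json gs://your-bucket`:

```python
import json

# Sketch: age-based delete rule for a GCS bucket. The 30-day window is a
# placeholder; tune it to how long the data stays relevant.
lifecycle = {
    "rule": [
        {"action": {"type": "Delete"}, "condition": {"age": 30}}
    ]
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```

Unlike a cron job, the lifecycle rule runs inside GCS itself, so there is no scheduler to maintain or pay for.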

4. Big Query Tuning

First define the value of your data and make a plan for historical versus short-term data. Clearly define partitions and set expiry on datasets; both go a long way toward optimizing the bill. Review the sinks provisioned and revisit the need for them regularly. Micro-batching data into BigQuery is more efficient than streaming it: streaming inserts into BQ are billed, whereas batch loads are free. Most sinks use streaming, so replacing them with cron jobs that batch-insert into BQ from a source such as Stackdriver or Pub/Sub is much cheaper, at the cost of a bit of maintenance overhead.
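The micro-batching idea itself is small enough to sketch. Here `load_rows` is a hypothetical callback standing in for a BigQuery batch load job; the batch size is illustrative:

```python
# Sketch of micro-batching: buffer incoming rows and flush them as one
# batch load instead of a billed streaming insert per row.
class MicroBatcher:
    def __init__(self, load_rows, batch_size=500):
        self.load_rows = load_rows  # e.g. a BigQuery batch load (hypothetical)
        self.batch_size = batch_size
        self.buffer = []

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.load_rows(self.buffer)
            self.buffer = []

batches = []
b = MicroBatcher(batches.append, batch_size=3)
for i in range(7):
    b.add({"event_id": i})
b.flush()  # final partial batch, e.g. from a cron tick
print([len(batch) for batch in batches])  # [3, 3, 1]
```

In practice the final `flush()` is what the cron job provides: a periodic tick that drains whatever has accumulated since the last run.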

5. Google App Engine Tuning

Google App Engine is a great platform for agile deployment and enables many startups to get going fast without worrying about the scaling aspects.

However, it is important to keep a close eye on App Engine as your traffic increases and you start to scale high.

Defining a scaling strategy and a cap is a good first step to prevent excessive operational costs during a cascading failure.

Here is a recommended scaling cap setting to support 8,000 RPS with an F2 instance type on GAE standard. Please read about each attribute in the official documentation.

automatic_scaling:
  target_cpu_utilization: 0.85
  min_idle_instances: 2
  max_idle_instances: 30
  max_instances: 100
  min_pending_latency: 200ms
  max_pending_latency: automatic
  max_concurrent_requests: 80

For GAE Flex:

resources:
  cpu: 2 # 1 in lower envs
  memory_gb: 8 # 4 in lower envs
  disk_size_gb: 20
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 100 # 10 in lower envs
  cpu_utilization:
    target_utilization: 0.85

Capping cannot always be a blanket policy. You may also need to layer your core microservices into critical path (revenue-generating) and non-critical path, and separate read versus transactional traffic, then define efficient, relevant capping policies for each.

6. GKE Tuning

The avenues for tuning GKE are endless; however, here are a few core practices that have helped us a lot:

  1. If you can predict your workload, subscribing to sustained use discounts and committed use discounts saves a lot on an annual basis.
  2. In our experience, running fewer GKE clusters, with logical namespaces per domain and well-defined node pools, has helped us scale without over-provisioning.
  3. Review your Infrastructure as Code to ensure the spec for lower environments is customized: replica sets tuned to minimum requirements and scaling policies checked.
  4. Communication within a node pool costs you nothing extra. By colocating related microservices, using node pool affinity to assign pods to nodes, and talking over gRPC, you can benefit from both high performance and an optimized bill.
  5. Configuring Horizontal Pod Autoscaling also helps optimize cost. Example: during a sale event we may have dedicated workers listening to queues; HPA can scale the pods up for the event and back down after it.
  6. Housekeeping of GCR and your Docker images is a good practice.
  7. For dev environments, consider preemptible VMs in the GKE node pool for stateless components.
  8. Deriving the spec/template for each GKE node pool is also a healthy exercise to avoid over-provisioning. Some services need high memory, some high CPU, and some both; configure each node pool based on the needs of its microservices.
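For point 5 above, a minimal HPA manifest looks like the following. The Deployment name, replica bounds, and 70% CPU target are placeholders to adapt to your workload:

```yaml
# Sketch: HPA for a hypothetical queue-worker Deployment; names and
# thresholds are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sale-event-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sale-event-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With this in place, the worker fleet grows during the sale and shrinks back to `minReplicas` afterwards without manual intervention.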

A great way to calculate the CPU request is to apply this formula:

((1000 − buffer) × nodes × vCPUs per node) / (number of services × replicas per service)

E.g. ((1000 − 200) × 2 × 2) / (3 × 3) ≈ 355m CPU request
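The same formula as code, for plugging in your own numbers (the 200m buffer per node is an illustrative reserve for system daemons, not a GKE-mandated value):

```python
# Sketch: evenly divide allocatable node CPU (in millicores) across all
# pods to derive a per-pod CPU request. The buffer reserves headroom for
# system daemons; 200m is illustrative.
def cpu_request_millicores(nodes, vcpu_per_node, services, replicas, buffer_m=200):
    allocatable_m = (1000 - buffer_m) * nodes * vcpu_per_node
    return allocatable_m // (services * replicas)

print(cpu_request_millicores(nodes=2, vcpu_per_node=2, services=3, replicas=3))  # 355
```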

7. CDN Tuning

  1. Review the data pipeline and store only relevant content on the CDN.
  2. Move the origin closer to the CDN, or use an origin shield.
  3. Tune the headers: enable ETag and set the max-age/expiry properties. Tune the byte size.
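As a sketch of point 3, here is a helper that produces cache-friendly response headers for a static asset. The max-age value and the MD5-based ETag are illustrative choices; tune max-age to how often the asset actually changes:

```python
import hashlib

# Sketch: cache-friendly headers for static assets behind a CDN.
# A long max-age lets the CDN serve from edge; the ETag lets clients
# revalidate cheaply with a 304 instead of re-downloading the body.
def cdn_headers(body: bytes, max_age=86400):
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": '"' + hashlib.md5(body).hexdigest() + '"',
    }

headers = cdn_headers(b"logo-bytes")
print(headers["Cache-Control"])  # public, max-age=86400
```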

8. On-Demand Environments

Mature DevOps practices, with recipes built up over a while, give you the ability to create and destroy environments on demand.

Consider spinning up a dev or staging environment only to test a piece of code, then tearing it down once done. The ability to shut environments down over long holidays and weekends goes a long way.

Summary

  1. Opening up billing information to the team builds insight and drives continuous optimization of operational cost.
  2. Periodically evaluate the operational cost of a feature versus its returns.
  3. Not every recommendation will suit every team. Empower your teams, and be the informed captain when placing your bets.
  4. Adding a billing review to the definition of done and the release checklist helps catch cost burns proactively.
  5. Don’t forget to rerun your load testing and chaos testing after you have optimized your billing cycles :-)
