Cloud — Cost Optimization Strategies

Puneet Saha · Published in AllThingData · Sep 28, 2023

We are now in the process of migrating our applications and systems to the cloud. Some are becoming cloud-enabled, while others are being developed as cloud-native applications. Among the major cloud computing providers, Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure are the top contenders.

There are multiple compelling reasons driving our shift to the cloud, with savings on infrastructure and development expenses being a significant factor. Luckily, there are various knobs that can be dialed to strike an optimal balance between cost and the infrastructure our use cases need. In this discussion, we will explore some rule-of-thumb strategies that can help save costs during this transition; they apply across the various cloud providers.

1. Monitor Resource Utilization: Ensure you have proper monitoring in place for your applications, covering all core infrastructure resources: CPU, storage, disk IOPS, network IOPS, and other performance metrics such as latency. Look for underutilized resources; these are your ‘low-hanging fruit’ opportunities, since you are paying for capacity that is not being used. For instance, if CPU utilization is consistently below 50%, that is a potential area to investigate. Poorly tuned Spark jobs, for example, can cost 50–60% more than well-tuned jobs that keep utilization in an acceptable range of 70–80%.
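To make the idea concrete, here is a minimal sketch of flagging right-sizing candidates from utilization samples. The instance IDs, sample values, and the 50% threshold are all illustrative, not taken from any real monitoring API.

```python
# Flag instances whose average CPU utilization stays below a threshold;
# those are the 'low-hanging fruit' candidates for right-sizing.

def underutilized(cpu_samples: dict[str, list[float]], threshold: float = 50.0) -> list[str]:
    """Return instance IDs whose mean CPU utilization is below threshold."""
    return [
        instance_id
        for instance_id, samples in cpu_samples.items()
        if samples and sum(samples) / len(samples) < threshold
    ]

metrics = {
    "i-web-01": [72.0, 80.5, 76.3],   # well-utilized, leave alone
    "i-batch-02": [12.1, 9.8, 15.4],  # consistently idle: investigate
}
print(underutilized(metrics))  # ['i-batch-02']
```

In practice you would feed this from your monitoring system's metrics export rather than hard-coded samples.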

2. Resource Tagging: Tagging your resources has proven extremely helpful for accountability; it also helps you diagnose which resources are consuming most of your budget and which are most underutilized. Once you have identified them, you can apply the strategies below or come up with your own solution.
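A simple way to enforce this is a tag-policy check over your resource inventory. The required tag keys below are an illustrative policy of my own, not a cloud-provider standard; adapt them to your organization.

```python
# Sketch: report which resources are missing required cost-allocation
# tags, so untagged spend can be tracked down and attributed.

REQUIRED_TAGS = {"team", "env", "cost-center"}  # illustrative tag policy

def missing_tags(resources: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map resource ID -> required tag keys the resource lacks."""
    return {
        rid: gaps
        for rid, tags in resources.items()
        if (gaps := REQUIRED_TAGS - tags.keys())
    }

inventory = {
    "i-web-01": {"team": "search", "env": "prod", "cost-center": "cc-7"},
    "vol-09ab": {"team": "search"},  # under-tagged: hard to attribute cost
}
print(missing_tags(inventory))  # flags 'vol-09ab'
```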

3. Selecting the Right Storage: Selecting the appropriate storage type — Block Storage, Object Storage, or File Storage — for your use case is essential to minimize costs while ensuring optimal performance, availability, and data durability. It’s important to note that transient data does not require persistent storage; only data that must outlive the compute resource does. Transient (ephemeral) storage is typically more cost-effective than persistent storage.

4. Tiered Storage Approach: When selecting storage resources, adopt a tiered approach. Some data are ‘hot’ — in constant demand and needing immediate access — while others are ‘cold,’ required infrequently and can wait. Determine which data falls into each category and choose your storage solution accordingly. Cold storage options are often more cost-effective than regular storage. Cloud providers offer various tiers of storage, with multiple parameters to fine-tune according to your use case. The trade-offs involve factors such as cost, accessibility, durability, throughput, and IOPS. For example, AWS provides options like Amazon S3 Standard, Intelligent Tiering, One Zone-Infrequent Access, and Glacier, among others. Similar choices are available for block storage, and you’ll find comparable offerings with other cloud providers.
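On AWS, the tiering described above can be expressed as an S3 lifecycle configuration. The bucket name, prefix, and day thresholds below are assumptions for illustration; the configuration shape matches what boto3 expects.

```python
# Sketch of an S3 lifecycle rule: move 'cooling' log data to cheaper
# tiers over time, then expire it. Thresholds are illustrative.
lifecycle = {
    "Rules": [
        {
            "ID": "tier-down-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # cold archive
            ],
            "Expiration": {"Days": 365},  # delete after a year
        }
    ]
}

# Applied with boto3 (not executed here; requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle
# )
print(lifecycle["Rules"][0]["ID"])
```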

5. Compute Resource Selection: When choosing compute resources, it’s wise to consider a tiered or hybrid approach. There are various alternatives, including instances, serverless compute (e.g., Lambda), containers, and serverless containers. The choice comes with a trade-off between cost and control: generally, the more control you want over the underlying infrastructure, and the longer you hold on to instances, the more expensive it becomes. However, what counts as ‘expensive’ varies with your specific use case. If your workload handles consistent heavy traffic, instances or containers may be more suitable than serverless options, which suit lightweight, spiky tasks. I would opt for instances only when there are very specific infrastructure requirements. In most cases, containers perform admirably for scenarios involving high volume, low latency, and varying rates of traffic.
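The consistent-versus-spiky distinction can be framed as a break-even calculation. The rates below are made-up placeholders, not real AWS prices; the point is the shape of the comparison, not the numbers.

```python
# Illustrative break-even: at what monthly request volume does an
# always-on instance become cheaper than per-request serverless?
# Both rates are invented for the example.

INSTANCE_MONTHLY = 60.0        # always-on small instance, $/month
SERVERLESS_PER_MILLION = 5.0   # serverless cost per million requests

def cheaper_option(requests_per_month: float) -> str:
    serverless_cost = requests_per_month / 1_000_000 * SERVERLESS_PER_MILLION
    return "serverless" if serverless_cost < INSTANCE_MONTHLY else "instance"

print(cheaper_option(2_000_000))   # light/spiky traffic favors serverless
print(cheaper_option(50_000_000))  # consistent heavy traffic favors instances
```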

6. Pricing Models for Instances: There are several pricing models for instances as well, and you can again take a hybrid approach, choosing a model per service your application hosts. Some services need to be up and running 24/7, making them good candidates for the reserved model. Other services or jobs are less critical and can tolerate a hit when enough compute is unavailable, making them good candidates for the spot model (cheaper than reserved), since the cloud provider makes no guarantee of procuring instances for you; availability depends solely on idle capacity in that AZ. Then there are times when your service takes more than the anticipated load; at that point, you might use the on-demand model for a few instances to handle the sudden spike. So a good approach is reserved instances for the expected load and on-demand for the spike beyond it. On-demand is the most expensive, so use it only for the bare minimum required. There are a few more pricing models you can explore, but the above should give you an idea of the options and how to pick one.
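The reserved-plus-on-demand blend can be sketched in a few lines. The hourly rates are illustrative, not real list prices; the structure is what matters.

```python
# Sketch of the blended model: reserve capacity for the expected base
# load, pay on-demand only for demand above it.

RESERVED_RATE = 0.05    # $/instance-hour (committed, cheapest)
ON_DEMAND_RATE = 0.10   # $/instance-hour (flexible, most expensive)

def hourly_cost(demand: int, reserved: int) -> float:
    """Cost for one hour given current demand and reserved capacity."""
    on_demand = max(0, demand - reserved)  # only the spike pays on-demand
    return reserved * RESERVED_RATE + on_demand * ON_DEMAND_RATE

# 10 reserved instances cover the expected load; a spike to 14 pays
# on-demand rates for only the 4 extra instances.
print(hourly_cost(demand=14, reserved=10))  # 0.5 + 0.4 = 0.9
```

Note that reserved capacity is billed even when demand dips below it, which is why reserving beyond the steady baseline erodes the savings.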

7. Microservices Architecture: Adopting a microservices architecture facilitates a tiered approach to selecting the right compute resources. This helps reduce costs while ensuring scalability, performance, and meeting availability SLAs. When developing a service, ensure that it does not attempt to do too many things simultaneously. Breaking down tasks in terms of synchronous vs. asynchronous and real-time vs. batch processing will assist you in effectively choosing the right compute resources, preventing wastage of resources and costs. Consequently, your applications may feature a hybrid mix of dedicated instances, containers, and everything in between.

8. Instance Sizing: Picking the right instance type is paramount, influencing both application performance and cost. Conduct thorough load testing on your application and actively monitor it in real-time to determine if it faces constraints in compute, disk IOPS, memory, or network resources. This involves identifying whether your application’s performance is hindered or limited due to the scarcity of any of these resources.

Once you identify the bottleneck, you can experiment with upgrading to instances that offer more of the constrained resource. The increased unit cost can often be offset, or more than offset, by reducing the number of instances needed to meet your requirements, lowering the overall cloud bill. Scaling up vertically (upgrading instances) while scaling in horizontally (reducing the number of instances) can significantly enhance your application’s performance while keeping costs in check.
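A quick arithmetic check shows why scaling up while scaling in can pay off. The instance counts and hourly rates below are invented for the example; substitute your own load-test results and pricing.

```python
# Illustrative scale-up / scale-in comparison: one upgraded instance
# at double the unit price can replace three constrained small ones,
# making the overall fleet cheaper.

def fleet_cost(count: int, hourly_rate: float, hours: float = 730.0) -> float:
    """Monthly cost of a fleet (730 ~ hours in a month)."""
    return count * hourly_rate * hours

small_fleet = fleet_cost(count=12, hourly_rate=0.05)  # 12 constrained instances
large_fleet = fleet_cost(count=4, hourly_rate=0.10)   # 4 upgraded instances

print(small_fleet, large_fleet)  # 438.0 292.0
```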

9. Efficient Network Management: Network management is another critical aspect; costs can escalate significantly with internet traffic from the outside world and data transfer between components in your application architecture. If possible, choose a region where data ingress and egress are the most cost-effective. However, if your application is geographically distributed globally, selecting just one region may not be feasible. In such scenarios, ensure that your requests take the shortest possible routes. If the data needs to be processed and then stored in common storage located in a different region, transmit only the final processed data, in compressed form. Compressed data consumes less network bandwidth, but the trade-off is that it requires compute resources for compression and decompression. Test what works best in terms of performance and cost to determine the optimal solution.
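A small demonstration of the compression trade-off, using Python's standard gzip module on a synthetic payload. Real savings depend heavily on how compressible your data is; structured, repetitive data like logs compresses far better than already-compressed media.

```python
# Compress a repetitive payload before a (hypothetical) cross-region
# transfer: fewer bytes on the wire, at the cost of CPU on both ends.
import gzip

payload = b"timestamp,region,bytes_out\n" * 10_000  # synthetic log data
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.1%} of original)")

# The trade-off's other side: decompression must happen at the receiver,
# and the round-trip is lossless.
assert gzip.decompress(compressed) == payload
```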

10. Utilize CDNs and Global Accelerators: Content Delivery Networks (CDNs) and Global Accelerators can help by routing traffic to the nearest data centers, thereby reducing network costs. As always, these considerations should be tailored to your specific use cases. It’s essential to measure the benefits against the costs of utilizing CDNs and other cloud services to determine what aligns best with your objectives.

11. Minimizing Internet Traffic: Accessing services or transferring data over the public internet is generally more expensive than doing the same within a private or internal network. Therefore, using a VPN, VPC peering, or Direct Connect (with your on-premises data center) is usually a good choice.

12. Cloud Management Tools: Last but not least, there are many tools from cloud providers and third-party vendors, such as cloudcustodian and resilio, to name a few, which can ease this task.

These strategies help reduce your cloud bill, not necessarily your Total Cost of Ownership (TCO). To reduce TCO, you’ll have to factor engineering and operational resources, data centers, and hardware into the cost as well. For example, would it make sense, from a business and financial perspective, to run your own Kubernetes cluster or to use a managed service like AWS EKS?

These are by no means the only strategies. If you find some innovative strategies of your own, please share them in the comments.
