Avoiding Common Pitfalls When Running a Cloud Cost Optimization Program

Keiran Holloway
5 min read · Mar 9, 2023

I have previously authored a piece on how to run a successful AWS cost-optimization program. This article is a follow-up that discusses common mistakes and pitfalls.

It covers the key missteps that I’ve observed in my time working with AWS.

Stop doing optimization and start doing cost governance

If you’re commencing a cost optimization activity within your organization then I have bad news for you: You’re already wasting money.

When operating in the cloud, the concepts of cost and architecture are effectively the same thing. Ensuring that you have good architecture practices in place that consider cost is critical.

Your cloud team should be oriented towards making good cost decisions up-front, which means making good architecture decisions. This saves you money, and it also saves you time because you will not need to go back and unwind design decisions that were cost-ineffective.

Failing to understand the recommendations

As per my last blog post, the first step is to look at the recommendations being provided and draw sensible conclusions. Cloud management platforms (such as CloudHealth) and associated tooling will provide recommendations. These recommendations are provided without context, which means they could be wrong. For example, recommendations around downsizing EC2 instances take into account only the performance data available to the tool. Choosing to act on these recommendations without context could have a material impact on service availability.

I fondly reflect on a customer that I was working with who looked at AWS cost-saving recommendations and decided to remove 300 “unused” elastic load balancers.

They were considered “unused” because they’d had a very small number of hits and were used with ECS containers. The cost management tool was misreporting their use. A well-intentioned cloud engineer went in and deleted these load balancers. Whilst this would have saved a chunk of cash, it also rendered both the user-acceptance and development environments unavailable.

Worse than this, the URI paths and target groups had been added manually (outside of IaC). This meant that it wasn’t simply a case of re-deploying. Every single one of these ELBs needed to be recreated by hand. This took days to complete, and whilst the work was being done none of the developers could do their jobs.

There were significant cost-savings which were otherwise possible. In part, the ELBs could be consolidated. However, deleting every single load balancer was not the desirable approach!
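The load balancer incident suggests putting a simple review step between a tool's recommendation and the delete button. Below is a minimal sketch of such a check, assuming you have already gathered a few facts about each load balancer from your own inventory. The `LoadBalancerInfo` fields and the `review_deletion` function are hypothetical names for illustration and do not come from any cost management tool or AWS API.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancerInfo:
    # All fields are hypothetical; populate them from your own inventory.
    name: str
    requests_last_30d: int
    registered_targets: int
    managed_by_iac: bool
    environment: str  # e.g. "dev", "uat", "prod"


def review_deletion(lb: LoadBalancerInfo) -> str:
    """Classify an 'unused' recommendation instead of acting on it blindly."""
    if lb.registered_targets > 0:
        # Something is still wired up behind it, so low traffic is not "unused".
        return "keep: targets still registered, confirm with the owner"
    if not lb.managed_by_iac:
        # Hand-built configuration cannot simply be redeployed after a mistake.
        return "hold: capture the configuration in IaC first"
    if lb.requests_last_30d > 0:
        return "hold: some traffic seen, confirm with the owner"
    return "candidate: no targets, no traffic, reproducible from IaC"


# A low-traffic UAT load balancer with manually added target groups.
uat_lb = LoadBalancerInfo("uat-api", requests_last_30d=12,
                          registered_targets=3, managed_by_iac=False,
                          environment="uat")
print(review_deletion(uat_lb))
```

The ordering of the checks is a design choice: anything with registered targets or hand-built configuration is routed to a person before it can ever be labelled a deletion candidate.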

Not having the right buy-in from around the business

Rearchitecting workloads and transitioning them off expensive deployment patterns takes time. It also requires involvement from various stakeholders, and they need to be on board.

If you don’t get this, then you’re going to be stymied from the get-go.

Previously, I worked with a customer who had a legacy shared services environment. This environment was running software which was aging and unreliable, and it needed to be propped up on a daily basis. It was overprovisioned with too many resources and all work was manual. Simple cost-effective concepts like autoscaling were missing.

Our team built a shiny new platform which was all-singing and all-dancing. All administration tasks were automated using IaC, and it ran at a fraction of the cost. The customer ended up running the two environments concurrently for many months because there was no buy-in or commitment from the application owners to deploy onto the new platform. Running two copies of the infrastructure resulted in even more wasted money!

Bringing in experts to help with the wrong aspects

When working through a cost optimization program, there are a few different roles involved: for example, application owners, developers, cloud engineers, cloud architects, security experts and data engineers.

Cloud engineers are the “doers”. They’re the people who will be putting hands on keyboards to do the work. A wrong (but common) assumption is that having more cloud engineers at once will save money faster.

As the theory of constraints goes, any optimization outside the bottleneck is an illusion.

For example, understanding the context of the recommendations requires developer knowledge. It could even need input from the application owner. If you’re looking to re-architect, you need cloud architects. Make sure you have the right skills and capability, otherwise you could end up stalled. Adding more engineers only makes sense if you’re constrained on the doing.
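The bottleneck argument can be shown with a few lines of arithmetic. This is a minimal sketch, assuming a simplified pipeline in which every recommendation passes through each stage in turn; the stage names and weekly capacities are made-up placeholders, not measurements.

```python
def weekly_throughput(stage_capacity: dict[str, int]) -> int:
    """Recommendations completed per week are capped by the slowest stage."""
    return min(stage_capacity.values())


# Hypothetical capacities: recommendations each stage can handle per week.
stages = {
    "developer_review": 5,
    "architect_design": 8,
    "engineer_implementation": 10,
}
before = weekly_throughput(stages)

# Doubling the engineers changes nothing while review is the bottleneck.
stages["engineer_implementation"] = 20
after = weekly_throughput(stages)

print(before, after)
```

In this model, adding capacity anywhere other than the slowest stage leaves the overall rate unchanged, which is the situation described above.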

Failing to think long-term, specifically around architecture practices

It is not uncommon to see people lift-and-shift workloads from on-prem or dedicated host environments into the cloud. If you want a level of resilience and availability similar to what you’re used to when running your own facility, it is probably going to cost you more to use the cloud.

When moving to the cloud, consider the cost of long-running EC2 instances or virtual machines, which can be expensive. Consider the licensing requirements as well. Compare and contrast this against cloud-native primitives (serverless or PaaS approaches); at a bare minimum, consider containerisation.
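One way to frame the comparison is a rough break-even calculation between an always-on instance and a pay-per-request service. The sketch below assumes a deliberately simple model with two inputs, an hourly instance rate and a price per million requests. Both figures in the example are placeholders rather than real prices, and licensing, storage and data transfer are ignored.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def always_on_monthly_cost(hourly_rate: float,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Cost of an instance that runs all month regardless of load."""
    return hourly_rate * hours


def per_request_monthly_cost(requests: int, cost_per_million: float) -> float:
    """Cost of a pay-per-request service for a given monthly volume."""
    return requests / 1_000_000 * cost_per_million


def break_even_requests(hourly_rate: float, cost_per_million: float) -> float:
    """Monthly request volume at which both options cost the same."""
    return always_on_monthly_cost(hourly_rate) / cost_per_million * 1_000_000


# Placeholder rates for illustration only; substitute your own pricing.
hourly_rate = 0.10
cost_per_million = 2.0
print(round(break_even_requests(hourly_rate, cost_per_million)))
```

Below the break-even volume the pay-per-request option is cheaper in this model; above it, the always-on instance is.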

Be aware of the upper limits of what you can save

You may have historically procured commercial discounts through volume purchasing agreements. For example, with AWS you can buy Savings Plans or Reserved Instances/capacity. When using these commercial constructs, you get a discount for committing to a certain level of spend. Usually the commitment term is either 1 or 3 years and you agree to pay a minimum fixed fee each month.

Once your spend is down to your agreed minimum, any further time and money spent on optimization will make no difference to your AWS bill. Be conscious of this when running cost optimization programs, as the commitment sets the upper limit on your savings.
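The ceiling on savings is simple arithmetic. Here is a minimal sketch, assuming for simplicity that all of your spend falls under a single monthly commitment; the function names and dollar figures are illustrative only.

```python
def max_monthly_saving(current_spend: float, committed_minimum: float) -> float:
    """Upper bound on what optimization can remove from the monthly bill."""
    return max(0.0, current_spend - committed_minimum)


def billed_amount(optimized_usage: float, committed_minimum: float) -> float:
    """You pay for your usage or your commitment, whichever is larger."""
    return max(optimized_usage, committed_minimum)


# Illustrative figures: spending 120k per month against a 100k commitment.
current_spend = 120_000.0
committed_minimum = 100_000.0

ceiling = max_monthly_saving(current_spend, committed_minimum)

# Optimizing usage down to 80k still results in a bill of 100k.
bill_after_deep_cuts = billed_amount(80_000.0, committed_minimum)

print(ceiling, bill_after_deep_cuts)
```

Working out this ceiling before the program starts shows how much optimization effort is worth funding until the commitment comes up for renewal.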

Before you go: if you’ve got this far, you’ve hopefully seen some value in this article. I write and publish this content free of charge. The cost for reading it, though? Please clap the article and click follow. Think of this as a gentlemen’s agreement. The cost is nothing financial, just a token of appreciation for the time taken to put this together. Thank you.

Keiran Holloway

Technical Lead and Engineering Manager with over 20 years running complex public infrastructure. Strongly passionate about continuous learning and improvement.