The cost of your cloud infrastructure may be far more than the dollar value in your monthly invoice.

Total Cost of Ownership: The Cloud Cost Analysis You Should be Doing

Many organizations get hung up on dollars and cents comparing cloud providers and services, but don’t consider the total cost of managing and owning the infrastructure.

Ben Goodman
6 min read · Apr 28, 2023

In focus: Cost Optimization

Any good engineering solution will aim to optimize performance and cost. Especially in the current environment of more expensive capital, many organizations and engineering teams have a renewed focus on cloud computing cost management. This focus showed up in Amazon’s Q1 2023 earnings report, with CEO Andy Jassy commenting that “companies [are] spending more cautiously in this macro environment” amid AWS’s Q1 revenue growth slowing to 16% year-over-year.

We are not here to say that companies should not scrutinize their cloud spending; they should. For larger enterprise organizations especially, there is a large opportunity in cost optimizing a cloud infrastructure footprint. For many small and medium-sized businesses (SMBs), however, these dollar cost optimizations are at a scale where decisions on cloud services and cloud service providers can look insignificant relative to differences in the Total Cost of Ownership (TCO) of the product.

Total Cost of Ownership

Total Cost of Ownership for a cloud provider and cloud service can be thought of with the following formula:

TCO = $ Cost + Development Cost + Management Cost + Tech Debt Cost
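As a rough illustration, the formula can be written as a small Python sketch. The class name, field names, hourly rate, and every figure below are hypothetical placeholders, not numbers from a real invoice. The point is only that engineering hours have to be converted to dollars before the four terms can be added together.

```python
from dataclasses import dataclass


@dataclass
class TcoEstimate:
    """Hypothetical TCO estimate; every field is illustrative."""
    dollar_cost: float        # provider invoice, in dollars
    development_hours: float  # engineer hours spent on setup
    management_hours: float   # engineer hours spent on upkeep
    tech_debt_hours: float    # engineer hours expected for future rework
    hourly_rate: float        # assumed cost of one engineer hour, in dollars

    def total(self) -> float:
        # TCO = $ Cost + Development Cost + Management Cost + Tech Debt Cost
        engineering_hours = (
            self.development_hours + self.management_hours + self.tech_debt_hours
        )
        return self.dollar_cost + engineering_hours * self.hourly_rate


# Made-up example: a $300 invoice plus 20 engineer hours at $50 / hr.
estimate = TcoEstimate(
    dollar_cost=300.0,
    development_hours=10.0,
    management_hours=6.0,
    tech_debt_hours=4.0,
    hourly_rate=50.0,
)
print(estimate.total())  # 1300.0
```

In this made-up case the invoice is less than a quarter of the total, which is the situation the rest of this article is concerned with.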

Let’s look at each of these components further to understand how each might affect a potential solution’s TCO:

$ Cost

These are the tangible, dollar costs that will show up in your Cloud Provider’s invoice each month when you consume a cloud service.

There are books, products, and careers made on this subject, but some possible first places to look for savings include:

  1. Use pay-by-usage, serverless compute offerings when possible. You pay only while the compute is in use, which often lowers spending when demand is intermittent. For example, pay for compute when your customers are online and using the service, then scale to zero when they are asleep and offline overnight.
  2. For long-running application needs, if your organization is willing to commit, one-year or three-year reserved instances can unlock immediate savings of 30% or more over more flexible, on-demand pricing.
  3. Find an automated tool to identify savings within your cloud spending (like dragondrop’s built-in cloud cost calculator, powered by Infracost).
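To make the first two points concrete, here is a minimal sketch comparing an always-on instance with a scale-to-zero deployment and with a reserved commitment. The hourly rate, the 16 active hours per day, and the function names are all assumptions chosen for illustration; the 30% discount is the figure quoted above. Real serverless pricing is usually metered per request or per second and often carries a higher unit price, which this sketch ignores.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # simplifying assumption


def always_on_cost(hourly_rate: float) -> float:
    """Monthly cost of an instance that never stops."""
    return hourly_rate * HOURS_PER_DAY * DAYS_PER_MONTH


def scale_to_zero_cost(hourly_rate: float, active_hours_per_day: float) -> float:
    """Monthly cost when compute is billed only while customers are active."""
    return hourly_rate * active_hours_per_day * DAYS_PER_MONTH


def reserved_cost(hourly_rate: float, discount: float = 0.30) -> float:
    """Monthly cost of an always-on instance with a reservation discount."""
    return always_on_cost(hourly_rate) * (1 - discount)


rate = 0.10  # hypothetical on-demand price in dollars per hour
print(round(always_on_cost(rate), 2))          # 72.0
print(round(scale_to_zero_cost(rate, 16), 2))  # 48.0
print(round(reserved_cost(rate), 2))           # 50.4
```

Which option wins depends on the usage pattern: with these numbers, scaling to zero for eight hours a night saves slightly more than a 30% reservation, but a workload that is busy around the clock would favor the reservation.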

Development Cost

This is the cost in engineering hours for configuring and setting up the new cloud provider or infrastructure service. Because engineer time is quite expensive, this cost component can add up quickly at a monetary level. And, if a particular service or cloud provider is particularly cumbersome to work with, there is an opportunity cost of the engineer not being able to work on other key technology components.

Ways to optimize here include:

  1. Build services that are deployable via containers to allow parity and quick testing of changes locally prior to deployment to cloud infrastructure.
  2. Build with serverless technologies. These technologies, pioneered by AWS Lambda, and including offerings like GCP’s Cloud Run and Azure Functions, allow developers to avoid management of servers, machine images, etc. and spend more time on design and deployment.
  3. Use cloud providers and services with clear documentation and sensible defaults built in. This minimizes the learning curve for deployment of a new cloud service. In our experience, GCP has the clearest and most concise documentation, followed by Azure, and then AWS. On the other hand, AWS has by far the largest user base, and therefore the most available content and online Q&A on Stack Overflow when debugging.
  4. Use a cloud provider with a single service for what you are trying to do. This is an easy way to save hours of engineering time immediately, because there is no need to research, compare, and contrast different tools. This cost is particularly associated with AWS, which tends to ship products very quickly and at high volume. For example, should we use Parameter Store or Secrets Manager? For pub/sub, should we use SQS, SNS, EventBridge, Amazon MQ, Kinesis Data Streams, or managed Apache Kafka?

Remember, if an engineer spends a day on infrastructure deployment and management because of a problem caused by the selected cloud provider or service, that becomes a very real expense very quickly. For a cloud engineer earning $50 / hr, 8 (avoidable) hours of work cost the equivalent of running an extra x1.32xlarge EC2 instance (128 vCPU, 1952 GB RAM) for 24 hours, with ~$80 left over.
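The arithmetic behind that comparison can be checked in a few lines. The on-demand rate below is an assumption (roughly the rate that the ~$80 remainder implies); actual EC2 pricing varies by region and changes over time.

```python
ENGINEER_RATE = 50.0   # dollars per hour, from the example above
AVOIDABLE_HOURS = 8
# Assumed on-demand price for x1.32xlarge in dollars per hour; check current
# pricing for your region before relying on this number.
ASSUMED_INSTANCE_RATE = 13.338
INSTANCE_HOURS = 24

engineer_cost = ENGINEER_RATE * AVOIDABLE_HOURS
instance_cost = ASSUMED_INSTANCE_RATE * INSTANCE_HOURS
left_over = engineer_cost - instance_cost

print(f"Engineer time: ${engineer_cost:.2f}")  # $400.00
print(f"Instance time: ${instance_cost:.2f}")  # $320.11
print(f"Left over:     ${left_over:.2f}")      # $79.89
```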

Management Cost

This is the cost in engineering hours for operating and managing a cloud provider or service. A service may be quick to start up, but if it costs a lot to maintain and keep active, that is a continuous cost your organization has to pay that does not show up in your provider bill.

Ways to optimize here include:

  1. Serverless options help minimize costs here again. Without needing to manage the underlying servers, maintenance toil (updates, patches, etc.) is greatly reduced.
  2. Pick services which have sensible defaults in place and are easy to manage in the first place. A perfect contrast here is AWS’s Elastic Kubernetes Service (EKS) vs. Google Kubernetes Engine (GKE). This comparison could be another article itself, but essentially GKE is so much better configured for “out of the box usage” that the engineering management cost for a Kubernetes cluster on GKE is much, much less than on EKS.
  3. Ensure service reliability. This is not a differentiator between any of the big three cloud providers, but is worth keeping in mind.
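Because management cost recurs every month, a service that is quick to set up can still end up the more expensive choice. Here is a minimal sketch with made-up hours; the two services, their hour counts, and the $50 / hr rate are all hypothetical.

```python
def engineering_cost(setup_hours: float, monthly_hours: float,
                     months: int, hourly_rate: float = 50.0) -> float:
    """One-time setup cost plus recurring management cost, in dollars."""
    return (setup_hours + monthly_hours * months) * hourly_rate


# Hypothetical service A: fast to start, but needs 10 hours of upkeep a month.
# Hypothetical service B: slower to start, but needs 2 hours of upkeep a month.
for months in (1, 6, 12):
    cost_a = engineering_cost(setup_hours=8, monthly_hours=10, months=months)
    cost_b = engineering_cost(setup_hours=24, monthly_hours=2, months=months)
    print(months, cost_a, cost_b)
```

With these numbers, service A is cheaper after one month ($900 vs. $1,300) but far more expensive after twelve ($6,400 vs. $2,400), and none of that difference appears on the provider bill.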

Tech Debt Cost

From Wikipedia:

technical debt is the implied cost of future reworking required when choosing an easy but limited solution instead of a better approach that could take more time

While any team that is moving fast and shipping rapidly will accumulate some technical debt, we find that tech debt accumulates especially quickly when using “Platform-as-a-Service” (PaaS) offerings from major cloud providers. These tools can be great for spinning up applications quickly, but you may find yourself locked within a restricted feature set. This can lead to pain and toil down the line when your application outgrows the PaaS offering. Offerings in this category might include AWS Elastic Beanstalk, AWS Amplify, and Google Firebase.

When using a PaaS offering, ask yourself the following questions prior to moving forward to help minimize the accumulation of tech debt:

  1. Is my application going to require a lot of flexibility going forward, including integration with other cloud services? If yes, you might want to avoid a PaaS offering.
  2. Are we okay with relying heavily on a solution custom built for tool XYZ? If not, you might want to avoid a PaaS offering.
  3. Does needed speed in shipping the product outweigh the above concerns? If yes, it is probably worth pursuing the PaaS offering.
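One possible way to read these three questions together is sketched below. The function name, parameter names, and return strings are hypothetical, and the ordering (speed overrides the other two concerns) is our interpretation of the third question rather than a strict rule.

```python
def paas_recommendation(needs_flexibility: bool,
                        ok_with_tool_lock_in: bool,
                        speed_outweighs_concerns: bool) -> str:
    """Combine the three questions above into a rough recommendation."""
    concerns = []
    if needs_flexibility:
        concerns.append("needs flexibility")
    if not ok_with_tool_lock_in:
        concerns.append("uncomfortable relying on one tool")

    if not concerns:
        return "PaaS looks fine: no concerns raised"
    if speed_outweighs_concerns:
        return "PaaS probably worth it despite: " + ", ".join(concerns)
    return "Consider avoiding PaaS: " + ", ".join(concerns)


print(paas_recommendation(True, False, False))
print(paas_recommendation(True, True, True))
```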

Conclusion

For SMBs, cloud dollar costs can form just a small component of the cost of maintaining your cloud footprint, so think about optimizing costs for Total Cost of Ownership and not just cloud dollar costs.

As a reminder, TCO breaks out as:

TCO = $ Cost + Development Cost + Management Cost + Tech Debt Cost

Astute readers will notice that serverless technologies check off a lot of boxes for minimizing costs across these different components — so we would recommend starting there 😉. Good luck!

dragondrop.cloud’s mission is to automate developer best practices while working with Infrastructure as Code. Our flagship product, cloud-concierge, allows developers to codify their cloud, detect drift, estimate cloud costs and security risks, and more — while delivering the results via a Pull Request. For enterprises running cloud-concierge at scale, we provide a management platform. To learn more, schedule a demo or get started today!

Ben Goodman

Senior Site Reliability Engineer @ ROKT. Working on developer tooling