This is the second in a multi-part series aimed at identifying how organisations can manage and optimise costs to get the best out of their cloud infrastructure. At Momenton we place a strong focus on leveraging cloud whilst also seeking to optimise cost management. In this blog I will be looking at the compute aspect of cloud.
About five years ago, Public Cloud services were not well understood. Large enterprises debated if migration to the cloud would meet their security requirements, paralyzed with the fear of the unknown. We have since come a long way — Digital Transformation is now synonymous with migration to the cloud. The benefits of on-demand infrastructure and elasticity have made engineers more productive and businesses happier with the promise of improved time-to-market.
Note: A key assumption being made in this write-up is that your organisation has a considerable portion of applications that are stateless and can be scaled horizontally to make the most of cloud infrastructure. We will address optimising costs for other workloads in another blog in this series.
Elasticity — The Double Edged Sword
Public Cloud is a double-edged sword, especially when it comes to on-demand infrastructure and elasticity. Although spinning up resources and scaling out horizontally takes little effort, gracefully shutting down and draining connections to the same resources needs more foresight and engineering rigour. If this is not done right, you could be racking up a hefty bill.
The pay-as-you-go model of public cloud is not as straightforward as it seems, either. Several services have a complex billing model where data retrieval is charged at a different rate than data storage within the same service; AWS X-Ray is a good example of such a service. When X-Ray is used alongside other services such as AWS API Gateway and Lambda in a single solution, forecasting the spend gets even more complex.
Momenton is a proponent of Kubernetes and has helped several organisations adopt this container orchestration system. I will try and highlight the reasons why you may want to do the same.
Focusing on compute as a starting point
At Momenton we have the privilege of working with several clients, large and small, who operate in domains ranging from finance and utilities to gaming. We see the nuances of how their teams grow and adopt different technologies on their road to cloud adoption. However, irrespective of their domain, there are always similarities in the way they utilise resources.
From our experience, we see that compute is generally the largest cloud cost. For example, the following figure shows the usage costs by service for four companies across different domains. We can see that compute is the biggest component, contributing 40% to 70% of the cost.
Understanding Cost to Usage Ratio
Public Cloud vendors do a good job of breaking down usage by line item and showing billing cost trends. What they do not do a good job of is highlighting the actual usage of a service. For example, in the EC2 line item from the images above, all we see is how much we pay, not how much of that service is actually being used.
You need to dig deeper to get a better understanding of utilisation. CloudWatch on AWS and Stackdriver on GCP do a pretty good job at this.
Leveraging the right data for informed insights
What you cannot measure you cannot optimise, and measuring the right metrics is even more important. To explore inefficiencies in compute utilisation, we can start by collecting CPU and RAM metrics. CloudWatch allows easy plotting of CPU usage metrics.
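As a sketch of how you might pull these numbers programmatically, the request below builds the parameters for CloudWatch's `GetMetricStatistics` API (callable via boto3). The instance id and look-back window are illustrative placeholders, not values from this post.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical CloudWatch GetMetricStatistics request for EC2 CPU percentiles.
# The instance id below is a placeholder.
end = datetime.now(timezone.utc)
params = {
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "StartTime": end - timedelta(days=14),   # two-week look-back
    "EndTime": end,
    "Period": 300,                           # 5-minute data points
    # GetMetricStatistics accepts either Statistics or ExtendedStatistics,
    # not both; request Statistics=["Average"] in a separate call.
    "ExtendedStatistics": ["p95", "p99"],
}

# With boto3 this would be fetched as:
# import boto3
# response = boto3.client("cloudwatch").get_metric_statistics(**params)
```

The actual API call is left commented out so the sketch stands alone; swap in your own instance ids (or aggregate by Auto Scaling group) as needed.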
Averages are misleading — When you look at data points over a very large window, averages water down sustained CPU spikes. The above graph shows the average CPU utilisation is about 7%. When you look at the P95 and P99 of the same data, you see a rather different picture.
Use percentiles instead — In the above two images you can see the CPU utilisation is around 16% and 22% for the 95th and 99th percentile measurements respectively, significantly higher than the average. These two measurements paint a more accurate picture of the CPU load profile and the compute capacity you need to configure. It's generally a good idea to configure your elasticity desired state to P95, your minimum to the average, and your maximum to P99.
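To make the average-versus-percentile difference concrete, here is a minimal illustration using synthetic numbers (not the data behind the graphs above): a mostly idle CPU with occasional sustained spikes has a low average but much higher P95 and P99.

```python
# Synthetic CPU samples: 90 idle readings at 5% and 10 spike readings at 60%.
def percentile(samples, p):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[rank]

cpu = [5] * 90 + [60] * 10

average = sum(cpu) / len(cpu)   # 10.5 — the spikes are watered down
p95 = percentile(cpu, 95)       # 60  — captures the spike load
p99 = percentile(cpu, 99)       # 60
```

Sizing capacity to the 10.5% average would leave the spikes starved; P95 and P99 reveal the load you actually need to provision for.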
The same metrics can be plotted for RAM as well — Most cloud service providers don't let you do this out of the box due to limitations in the hypervisor, but Momenton has you covered. To enable memory metric collection, check out this repo 💻 https://github.com/momenton/aws-memory-monitor
Always on Elastic Infrastructure is the future
It is common to see organisations turn off their non-production infrastructure at night or over the weekend. But with more organisations choosing to have geographically distributed teams working in different timezones, turning off infrastructure is not always feasible.
Also, it is worth asking yourself how many times you have walked into work on a Monday morning to discover that not all your infrastructure has booted as expected. Teams that move fast tend to update their infrastructure-as-code often and may encounter such issues more frequently. Broken infrastructure reduces productivity drastically, which is another reason to leverage elasticity instead of turning infrastructure off. In our experience at Momenton, more often than not the productivity loss from infrastructure issues outweighs the cost saving of turning off the infrastructure over the weekend.
An organisation that takes the approach of leaving their infrastructure always-on will enable their teams to work in multiple time-zones and flexible work hours. Kubernetes allows for clusters to scale from hundreds of nodes to just a couple without any manual intervention. Resource allocation can also be changed for all the containers in a matter of seconds with just a call to its API.
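As a sketch of what such an API-driven resource change might look like, the patch body below adjusts the CPU and memory allocation of every replica of a Deployment. The deployment name (`web`), namespace (`default`), and resource values are hypothetical placeholders.

```python
# Hypothetical patch body for a Kubernetes Deployment: changing the CPU and
# memory allocated to its containers in one API call. All names and values
# here are illustrative, not from a real cluster.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "web",  # must match the container name in the Deployment
                    "resources": {
                        "requests": {"cpu": "250m", "memory": "256Mi"},
                        "limits": {"cpu": "500m", "memory": "512Mi"},
                    },
                }]
            }
        }
    }
}

# With the official Kubernetes Python client, this would be applied as:
# from kubernetes import client, config
# config.load_kube_config()
# client.AppsV1Api().patch_namespaced_deployment("web", "default", patch)
```

Kubernetes then rolls the change out across the replicas; no server needs to be resized or rebuilt by hand.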
Understanding the power of Kubernetes Bin Packing
Once you have understood the compute requirements of your workloads, you will need a platform that optimises the use of your servers. This is where Kubernetes and its bin packing capability come in.
Bin packing works just like the game Tetris. Consider each of the pieces in the game to be an application workload. Kubernetes takes these container workloads and places them on the same server for maximum utilisation of the resources. The security, resource utilisation and isolation of these containers are also managed by Kubernetes, but that is a topic for another blog post.
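The Tetris analogy can be shown with a toy first-fit-decreasing packer. This is a simplified sketch of the idea, not the actual Kubernetes scheduler (which also weighs affinity, taints, spread and more); the CPU requests and the 1000-millicore node capacity are made-up numbers.

```python
# Toy bin packing: place workload CPU requests (in millicores) onto the
# fewest nodes whose capacity can hold them, first-fit decreasing.
def first_fit_decreasing(workloads, capacity):
    """Pack workload sizes into bins (nodes), largest first."""
    nodes = []  # each node is a list of workload sizes
    for w in sorted(workloads, reverse=True):
        for node in nodes:
            if sum(node) + w <= capacity:
                node.append(w)  # fits on an existing node
                break
        else:
            nodes.append([w])   # no existing node fits; add a new one
    return nodes

requests = [500, 300, 700, 200, 400, 100]  # six container CPU requests
nodes = first_fit_decreasing(requests, capacity=1000)
# Six containers fit on three 1000m nodes instead of one server each.
```

Running one container per server would need six machines; packing the same requests needs three, which is where the cost saving comes from.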
Kubernetes allows you to think of compute resources (CPU and RAM) as a pool instead of as individual servers. This paradigm lets you look at the entire compute requirement across the organisation and hand off the divvying up of these resources to Kubernetes.
In these uncertain times, most organisations are looking to optimise their spend wherever they can. Momenton has developed a number of tools that make it easy to visualise metrics and turn them into actionable insights. Optimising compute utilisation is just the start — please reach out to email@example.com to find out how Momenton can help your organisation with this and more.