The original article was written in 2017. All of the recommendations below are still valid.
However, since the article was first published, AWS has released many new features and products to help customers better manage their cloud spending.
A reworked and updated version of this article can be found here: https://kainos.com/how-to-unlock-cost-savings-on-aws-2
My team and I are responsible for running a pretty big (and successful) SaaS product. We are growing as a product, our customer base is growing, and… so are our infrastructure bills. That’s inevitable. I remember milestones like hitting our first $1k bill, followed by $10k, $20k, $30k… Over the last 5 years of running production and development operations on AWS, my team has gained a lot of experience and knowledge about optimising, managing, and monitoring costs on Amazon Web Services.
In this article I would like to share our experience in the form of a non-exhaustive list of actions that can help you manage costs on Amazon Web Services. I think it makes a good reference both for those starting their journey with AWS and for those who already run massive operations on it.
I split this article into 3 sections. The first covers actions we take in production environments, the second covers actions we take in non-production environments, and the last covers monitoring your spending.
Optimising costs of production operations
Here are the things my team does to proactively manage the costs of running production operations:
- buy reservations for RDS (one of the most expensive services we use), ElastiCache, and of course EC2;
- for EC2, buy convertible reservations; they offer a slightly smaller discount than standard ones, but they can be exchanged for different instance types; cloud platforms are not static, platform sizing must be done on a regular basis, and that is when convertible reservations come in handy;
- buy EC2 reservations for all static servers; let true AutoScaling Groups (e.g. a fleet of background workers) use on-demand instances (BTW: AWS recently introduced per-second billing); of course, if your ASG operates at a predictable, measurable minimum capacity 24/7, it makes perfect sense to purchase reservations matching that minimum capacity;
- buy 3-year convertible EC2 reservations with no up-front payment; this gives us a 50% reduction compared to on-demand pricing; 3-year no-up-front reservations also have the advantage of not giving your finance team a heart attack, which could happen if they saw an invoice for 50 machines for 3 years paid fully up front;
- start servers (and especially clusters) on demand; a good example in our product is the use of AWS Data Pipeline and AWS Redshift (Redshift is a pretty expensive, and pretty powerful, data warehouse as a service): we start the cluster on demand in the morning, run governance & compliance tasks, and then terminate it; the essence of pay as you go;
- we are not using Spot Instances in our production account; yes, I know there are many examples of using Spot Instances in production, but due to the long-running nature of our workloads we are not (yet) using them there; however, we use them heavily in all non-production environments (see below);
- AWS Simple Monthly Calculator is your best friend; throughout the years this online calculator has been the tool for creating all my infrastructure budgets, and I’m fond of the fact that my budgets have always been accurate; with it, it’s very easy to run multiple infrastructure simulations and compare them;
- the calculator supports reservations, so you can try out different setups and choose the best one for you;
- if you practise infrastructure as code with AWS CloudFormation, you can get an estimate of your monthly bill: upload the CloudFormation template in the AWS Web Console and click the cost link; this gives a very good indication of how new features/services/resources would affect your monthly bill; the same feature is available through the aws cloudformation estimate-template-cost CLI command;
- use Consolidated Billing (we do this at the company level); thanks to this, unused EC2 reservations are shared with other accounts, and you get volume discounts for services like S3 and EC2; finally, some services are paid for once for the whole billing family: the AWS Shield Advanced monthly subscription fee, for example, is charged once for all consolidated accounts;
- evaluate new AWS services; we use AWS Redshift for some of our governance & compliance checks, but we are thinking about rewriting them for AWS Athena, which has less maintenance and management overhead; remember that a DevOps engineer’s time is also a cost: if a new managed service is released that can do the job (perhaps even better), go for it.
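To make the reservation advice concrete, here is a rough monthly cost model. It is a sketch under stated assumptions: a hypothetical on-demand rate of $0.20 per instance-hour, the flat 50% discount quoted above for 3-year convertible no-up-front reservations, and a 730-hour month (365 × 24 / 12). Real rates vary by instance type, region, and term, so plug in your own figures.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 hours in an average month


def fleet_monthly_cost(on_demand_rate: float,
                       reserved_discount: float,
                       reserved_count: int,
                       on_demand_instance_hours: float) -> float:
    """Monthly cost of a fleet mixing reserved and on-demand capacity.

    A no-up-front reservation is billed for every hour of the month whether
    the instance runs or not; extra capacity is billed on-demand only for
    the instance-hours it actually runs.
    """
    reserved_rate = on_demand_rate * (1 - reserved_discount)
    reserved_cost = reserved_rate * reserved_count * HOURS_PER_MONTH
    burst_cost = on_demand_rate * on_demand_instance_hours
    return reserved_cost + burst_cost


RATE = 0.20      # hypothetical on-demand price, USD per instance-hour
DISCOUNT = 0.50  # ~50% saving quoted for 3-year convertible, no up-front

# 10 static servers running 24/7: everything on-demand vs. everything reserved.
static_on_demand = fleet_monthly_cost(RATE, DISCOUNT, 0, 10 * HOURS_PER_MONTH)
static_reserved = fleet_monthly_cost(RATE, DISCOUNT, 10, 0)

# A hypothetical ASG that never drops below 4 instances and bursts for 2,000
# extra instance-hours a month: reserve the minimum, pay on-demand for the rest.
asg_on_demand = fleet_monthly_cost(RATE, DISCOUNT, 0, 4 * HOURS_PER_MONTH + 2000)
asg_mixed = fleet_monthly_cost(RATE, DISCOUNT, 4, 2000)

print(f"static fleet: ${static_on_demand:,.2f} on-demand vs ${static_reserved:,.2f} reserved")
print(f"ASG:          ${asg_on_demand:,.2f} on-demand vs ${asg_mixed:,.2f} with minimum reserved")
```

The reserved portion is paid whether or not the instances run, which is why only static servers and an ASG’s guaranteed minimum are good candidates for reservations.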
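Volume discounts are tiered, so pooling usage under Consolidated Billing pushes more of it into cheaper tiers. The sketch below illustrates only that mechanism; the tier sizes and prices are hypothetical values loosely modelled on S3 storage pricing (not current AWS rates), and the three accounts storing 40,000 GB each are made up.

```python
from math import inf

# Hypothetical price tiers: (size of tier in GB, USD per GB-month).
# Illustrative only; check the current S3 pricing page for real values.
TIERS = [(50_000, 0.023), (450_000, 0.022), (inf, 0.021)]


def tiered_cost(usage_gb: float, tiers=TIERS) -> float:
    """Charge each slice of usage at the price of the tier it falls into."""
    cost, remaining = 0.0, usage_gb
    for size, price in tiers:
        in_tier = min(remaining, size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost


accounts_gb = [40_000, 40_000, 40_000]  # hypothetical per-account usage

separate = sum(tiered_cost(gb) for gb in accounts_gb)  # each account billed alone
consolidated = tiered_cost(sum(accounts_gb))           # one consolidated bill

print(f"separate: ${separate:,.2f}, consolidated: ${consolidated:,.2f}, "
      f"saving: ${separate - consolidated:,.2f}")
```

Billed separately, every account stays in the first (most expensive) tier; consolidated, the usage beyond the first tier is charged at the cheaper rate.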
Optimising costs of non-production operations
And here is a list of things we do to minimise the costs of our non-production operations:
- buy reservations for RDS, ElastiCache, and EC2;
- your staging environments most likely don’t need AWS RDS running in Multi-AZ, nor do they need a cluster of AWS ElastiCache Redis nodes; do you need a 2xlarge AWS RDS instance? could you live with just one NAT Gateway for all the AZs in your VPC? do a proper non-production platform sizing; if you are using AWS CloudFormation, use condition functions (such as Fn::If) to provision reduced-capacity resources; we use them a lot;
- in production we do a daily cross-region replication of all resources in both the US and the EU (DB snapshots, golden images, etc.); perhaps some of these tasks could run weekly in non-production environments? doing this heavily reduced our data transfer fees;
- use Spot Instances wherever possible; in our product we save thousands of dollars every month by using them;
- an ideal use case for Spot Instances is build clusters (we have 4 different build clusters and all their agents run on Spot Instances; see the TeamCity and Jenkins documentation for further information);
- AWS makes it very easy to use Spot Instances in AutoScaling Groups: just specify the spot price bid in the Launch Configuration and AWS will take care of the rest;
- don’t be greedy: set your Spot Instance bid to the on-demand price; you pay the current Spot price rather than your bid, so in the worst case you will pay the on-demand price;
- sometimes it’s even better to set the bid a little above the on-demand price; you may occasionally pay slightly more than on-demand during a price spike, but your instances become more resilient to short-term fluctuations;
- the AWS Web Console shows Spot Instance price history graphs; the same data is available through the aws ec2 describe-spot-price-history CLI command;
- shut down environments at night and during weekends: AutoScaling Groups support scheduled actions, which can start/stop machines at given times; the ultimate cloud cost reduction rule is: if you don’t need it, terminate it; or better, don’t start it at all.
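The CloudFormation condition functions mentioned above can be illustrated with a template fragment. The Python below builds one and prints it as JSON. The EnvType parameter, the Database logical ID, the engine, and the instance classes are hypothetical, and properties a real AWS::RDS::DBInstance needs (credentials, networking) are left out, so treat it as a sketch rather than a deployable template.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Hypothetical parameter selecting the environment type.
        "EnvType": {"Type": "String",
                    "AllowedValues": ["prod", "staging"],
                    "Default": "staging"},
    },
    "Conditions": {
        # True only when the stack is deployed with EnvType=prod.
        "IsProduction": {"Fn::Equals": [{"Ref": "EnvType"}, "prod"]},
    },
    "Resources": {
        "Database": {
            "Type": "AWS::RDS::DBInstance",
            "Properties": {
                "Engine": "postgres",
                "AllocatedStorage": "100",
                # Large instance in production, small one everywhere else.
                "DBInstanceClass": {"Fn::If": ["IsProduction",
                                               "db.m4.2xlarge", "db.t2.medium"]},
                # Multi-AZ only in production.
                "MultiAZ": {"Fn::If": ["IsProduction", True, False]},
            },
        },
    },
}

print(json.dumps(template, indent=2))
```

Deploying the same template with EnvType=staging yields a single-AZ db.t2.medium instance, so production and staging can share one template while differing in cost.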
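The trade-off between bidding exactly the on-demand price and bidding slightly above it can be shown with a toy simulation of the bid-based Spot model the article describes: an instance runs while the Spot price stays at or below your bid, you pay the Spot price rather than your bid, and the instance is interrupted once the price rises above the bid. The hourly prices are hypothetical, and per-second billing and the interruption notice are ignored.

```python
def run_on_spot(hourly_prices, max_price):
    """Return (hours run, total paid, interrupted?) for one Spot instance.

    Each hour the instance is charged the market price, not the bid; it is
    interrupted the first hour the market price rises above max_price.
    """
    hours, paid = 0, 0.0
    for price in hourly_prices:
        if price > max_price:
            return hours, paid, True
        hours += 1
        paid += price
    return hours, paid, False


ON_DEMAND = 0.10  # hypothetical on-demand price, USD/hour
prices = [0.03, 0.03, 0.04, 0.11, 0.03, 0.03]  # hypothetical history with a short spike

for bid in (ON_DEMAND, 0.12):
    hours, paid, interrupted = run_on_spot(prices, bid)
    print(f"bid ${bid:.2f}: ran {hours}h, paid ${paid:.2f}, interrupted={interrupted}")
```

With the bid at the on-demand price, the one-hour spike interrupts the instance; with a bid slightly above it, the instance survives and pays just over on-demand for that single hour.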
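Scheduled actions for an office-hours environment can be prepared as sketched below. The dictionaries use the parameter names of the Auto Scaling PutScheduledUpdateGroupAction API (exposed in boto3 as put_scheduled_update_group_action), but the group name, capacity, and 07:00-19:00 weekday window are hypothetical, and nothing is sent to AWS. Recurrence is a cron expression, evaluated in UTC unless a time zone is set. The helper also estimates the share of weekly instance-hours saved.

```python
HOURS_PER_WEEK = 7 * 24


def office_hours_actions(group_name: str, capacity: int,
                         start_hour: int = 7, stop_hour: int = 19) -> list:
    """Build kwargs for two scheduled actions: scale up on weekday mornings,
    scale to zero on weekday evenings (cron day-of-week 1-5 = Mon-Fri).
    Friday's evening action keeps the group at zero over the weekend."""
    def action(suffix: str, hour: int, size: int) -> dict:
        return {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": f"{group_name}-{suffix}",
            "Recurrence": f"0 {hour} * * 1-5",
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
        }
    return [action("start", start_hour, capacity), action("stop", stop_hour, 0)]


def weekly_saving(start_hour: int = 7, stop_hour: int = 19, workdays: int = 5) -> float:
    """Fraction of a 24/7 week's instance-hours avoided by the schedule."""
    running_hours = (stop_hour - start_hour) * workdays
    return 1 - running_hours / HOURS_PER_WEEK


for kwargs in office_hours_actions("staging-workers", capacity=4):
    # With boto3 these could be passed to
    # boto3.client("autoscaling").put_scheduled_update_group_action(**kwargs)
    print(kwargs)
print(f"instance-hours saved per week: {weekly_saving():.0%}")
```

Running 12 hours on 5 weekdays means 60 of the week’s 168 hours, so roughly two thirds of the instance-hours disappear from the bill.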
Monitoring your spending
Once you have all the optimisations in place, it is imperative to monitor your spending. AWS comes with a lot of tools to help you stay on top of it. Here are the things you should be doing:
- review monthly cost allocation reports, monitor stacks, create your own cost allocation tags, and monitor costs across groups of resources/services;
- if you have premium AWS Support, review the Trusted Advisor Cost Optimisation report; be aware that TA is pretty aggressive in identifying potential savings; you may want to think twice before showing it to your CFO, or you will be asked to cut your bill by a few thousand dollars next month…;
- review your bills; they list all resources, including reservations and, in the case of EC2, also Spot Instances (I really enjoy looking at the Spot Instances section of the AWS bill);
- create AWS CloudWatch Billing Alerts: compute the average of your last 3 bills and set up CloudWatch alerts for S3, EC2, RDS, ElastiCache, and Data Transfer; it is OK to get a billing alert on the 28th day of a month; it is KO to get the same alert on the 5th day of a month.
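The billing-alert rule of thumb can be expressed in a few lines. The sketch below assumes the threshold is the plain average of the last three bills and that, as with CloudWatch’s estimated charges, spend accumulates month to date; it reports the day the alert would fire. The bills, the daily spend figures, and the 90%-of-the-month cut-off are hypothetical.

```python
from statistics import mean


def alert_threshold(monthly_bills):
    """Average of the three most recent monthly bills."""
    if len(monthly_bills) < 3:
        raise ValueError("need at least three bills")
    return mean(monthly_bills[-3:])


def first_alert_day(daily_spend, threshold):
    """First day of the month on which month-to-date spend exceeds the threshold."""
    month_to_date = 0.0
    for day, spend in enumerate(daily_spend, start=1):
        month_to_date += spend
        if month_to_date > threshold:
            return day
    return None  # the alert never fired this month


def verdict(day, days_in_month=30, late_fraction=0.9):
    """An alert in the last ~10% of the month is expected; earlier means investigate."""
    if day is None:
        return "no alert"
    return "OK" if day >= late_fraction * days_in_month else "investigate now"


threshold = alert_threshold([3000, 3300, 3600])  # hypothetical EC2 bills, USD
normal = [118.0] * 30    # steady spend, in line with past months
runaway = [700.0] * 30   # e.g. a forgotten cluster left running

for name, spend in (("normal", normal), ("runaway", runaway)):
    day = first_alert_day(spend, threshold)
    print(f"{name}: alert on day {day} -> {verdict(day)}")
```

A steady month crosses the average only near its end, while a runaway resource crosses it within the first week, which is exactly the signal the alert is meant to give.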
As I wrote, the above is a non-exhaustive list. I hope my experience can help you save money. For your convenience I tried to add a reference link to every point, so you can dig further.
Also, I would say that some of the actions listed above can be successfully applied (with some modifications, of course) to other cloud providers too.