AWS Cost Optimisation

Pronomita Dey · The Tech Matter · Jun 1, 2024

Key routes/takeaways for slimming the bill

Things to keep in mind:

  • You have to always, always ask, “Is this worth it?” The answer should never be “for the time being.”
  • Make sure you understand your application and development requirements, and make tradeoffs accordingly. Optimising is a fine line away from being a foolish miser, and that fine line is called an outage.
  • Know which parts of your system can tolerate faults. Not everything needs to be accounted for: non-prod workloads have an appetite for data loss, logging solutions can tolerate downtime, and higher read latencies from buffers won’t immediately break the system.
  • Reservations on AWS are super cost-efficient. [WAIT]. First answer whether you need the system for the next 365 days; an absolute YES is the only go-ahead. AWS allows no backsies.
  • Go 80–20. Address the low-hanging fruit before you reach for architectural, code, or breaking infrastructure changes.
  • You need not solve every problem. Systems scale, and as long as spending is in line with business growth, you’re doing great, buddy!

Logging/Monitoring

  • CloudWatch log data retention: make sure you keep only what you need and expire the rest. AWS defaults log groups to never expire (see the sketch after this list).
  • OpenSearch: use lifecycle policies to move data to cheaper storage tiers (add a hot -> warm -> cold storage lifecycle).
  • Avoid large queries: by default, deny expensive log queries against a single large index for all users. You will need to add rules to your account- or org-wide roles.
  • Moving off a managed solution not only gives you more control but can also be cheaper. For example, spend some time developing expertise in self-hosted ELK instead of going with AWS OpenSearch.
  • Standards: pick logging libraries and, over time, mandate their use across all applications as a best practice.
  • Optimise at the source: account for runaway conditions and develop better-behaved microservices.
  • Alarms: if a certain log group or index is growing rapidly, report it.
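
To make the retention point concrete, here’s a minimal boto3 sketch that stamps a retention policy onto every log group still set to “never expire”. The 30-day value and the region are assumptions; pick what fits your compliance needs.

```python
# Minimal sketch: expire CloudWatch log groups that default to "never expire".
# The 30-day retention and the region are assumptions -- adjust to your needs.
import boto3

logs = boto3.client("logs", region_name="us-east-1")  # assumed region

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        # Log groups with no policy simply lack the 'retentionInDays' key.
        if "retentionInDays" not in group:
            logs.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=30,  # must be one of the values AWS accepts (1, 3, 5, 7, 14, 30, ...)
            )
            print(f"Set 30-day retention on {group['logGroupName']}")
```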

Storage

  • Do not put everything on autoscale and sleep peacefully; runaway conditions will choke your budget.
  • Know your peak load and optimise how much to provision or over-provision. A headroom of 20–30% over your peak traffic requirements is okay. Going 2x of peak is stupid.
  • Use gp3 volumes: they are cheaper than gp2 and give higher baseline throughput.
  • Clean up all dangling volumes and unused public IPs.
  • You don’t need all data since inception; know what to back up and for how long.
  • Move data from S3 Standard to Glacier using lifecycle policies (first sketch after this list).
  • Clean up DynamoDB data by employing a TTL (time-to-live) attribute (second sketch after this list).
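
For the S3 point, a minimal boto3 sketch of a lifecycle rule; the bucket name, prefix, and day counts are placeholders:

```python
# Minimal sketch: transition objects to Glacier after 90 days and expire them
# after 365. Bucket name, prefix, and day counts are assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-archive-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # hypothetical prefix
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```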
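
And for the DynamoDB point, a sketch of enabling TTL; the table and attribute names are hypothetical, and the attribute must hold an epoch-seconds timestamp:

```python
# Minimal sketch: enable TTL so DynamoDB deletes expired items for free.
# Table and attribute names are assumptions.
import time
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.update_time_to_live(
    TableName="sessions",  # hypothetical table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# When writing items, stamp them with an expiry, e.g. 7 days out:
dynamodb.put_item(
    TableName="sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 7 * 24 * 3600)},
    },
)
```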

Compute

  1. Pick the latest generation in the instance family.
  2. Use Graviton (ARM-based) instance types where your workload supports them.
  3. Switch from On-Demand to Spot for interruption-tolerant workloads (see the sketch after this list).
  4. Use instance reservations.
  5. Buy Savings Plans.
  6. Use Lambda when it’s an on-demand or scheduled run.
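
A minimal boto3 sketch of the On-Demand to Spot switch from point 3; the AMI ID and instance type are placeholders:

```python
# Minimal sketch: launch a Spot instance instead of On-Demand by adding
# InstanceMarketOptions to run_instances. AMI and instance type are assumptions.
import boto3

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI
    InstanceType="m7g.large",         # a Graviton type, per point 2 above
    MinCount=1,
    MaxCount=1,
    # With no SpotOptions, the defaults apply: a one-time request that is
    # terminated on interruption. SpotOptions lets you tune MaxPrice and
    # the interruption behaviour if your workload needs it.
    InstanceMarketOptions={"MarketType": "spot"},
)
```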

Databases

  1. Avoid replicas, snapshots, and Multi-AZ deployments for non-prod workloads.
  2. Know what to snapshot and how long to keep the snapshots.
  3. Vacuum to reclaim disk space before up-scaling (see the sketch after this list).
  4. Stay as close as possible to the engine’s LTS version for the most efficient usage.
  5. Pick instance types based on your use case: know if you are solving for latency, storage efficiency, or something else.
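
For point 3, assuming a Postgres engine on RDS, a minimal psycopg2 sketch; the endpoint, credentials, and table name are placeholders. Note that plain VACUUM frees dead-tuple space for reuse inside the table, while VACUUM FULL returns it to the OS but takes an exclusive lock, so schedule that carefully:

```python
# Minimal sketch for a Postgres engine (e.g. RDS): reclaim dead-tuple space
# before paying for a storage upscale. Connection details and the table name
# are assumptions.
import psycopg2

conn = psycopg2.connect(
    host="mydb.example.rds.amazonaws.com",  # hypothetical endpoint
    dbname="app", user="admin", password="...",
)
conn.autocommit = True  # VACUUM cannot run inside a transaction block

with conn.cursor() as cur:
    cur.execute("VACUUM (VERBOSE, ANALYZE) big_table;")  # hypothetical table
conn.close()
```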

HELP yourself by:

  1. Having monitors in place to inform you of runaway scenarios.
  2. Automating cleanup of unused resources: volumes, EC2 instances, public IPs, etc. (first sketch after this list).
  3. Tagging everything: tags answer operational questions like who owns what, what to keep, and what to clean.
  4. Regularly monitoring costs (lack of visibility creates bigger bills than bad engineering).
  5. Setting a budget with AWS so that anything crossing the high-water marks comes to notice (second sketch after this list).
  6. Educating teams about best practices and leading by example.
  7. Having efficient IaC in place and avoiding exceptions.
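
Two sketches to make points 2 and 5 concrete. First, a boto3 cleanup of unattached EBS volumes; it defaults to a dry run so nothing is deleted by accident:

```python
# Minimal sketch: find unattached ("available") EBS volumes and delete them.
# DRY_RUN defaults to True so nothing is destroyed until you flip it deliberately.
import boto3

DRY_RUN = True
ec2 = boto3.client("ec2")

paginator = ec2.get_paginator("describe_volumes")
pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
for page in pages:
    for volume in page["Volumes"]:
        print(f"Unattached volume: {volume['VolumeId']} ({volume['Size']} GiB)")
        if not DRY_RUN:
            ec2.delete_volume(VolumeId=volume["VolumeId"])
```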
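
Second, an AWS Budgets alert that emails when actual spend crosses 80% of a monthly limit; the amount, budget name, and address are placeholders:

```python
# Minimal sketch: a monthly cost budget that notifies when actual spend crosses
# 80% of the limit. Amount, name, and email are assumptions.
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "monthly-cost-watermark",  # hypothetical name
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}],
        }
    ],
)
```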
