AWS S3 billing analysis and cost optimization — Deep dive

Rudolf Ratusiński
7 min read · Jul 16, 2019


AWS S3 seems very cheap, almost negligible from a financial point of view when we look at its pricing charts: a few cents per GB of stored data, less than a cent for a million requests, a fraction of a dollar for data transfer.
I've even heard people say, "It's almost free, put everything on S3." This is obviously a mind trap, and showing prices scaled like that is a well-thought-out AWS marketing approach. However, AWS cannot be blamed, as S3 use cases are endless. In the end, the responsibility is ours, developers and architects, to use it wisely.

If you have fallen into that trap, you may have made poor architectural decisions. A successful business scales almost exponentially, and so do the consequences of those decisions. When that happens and you spot monthly bills growing from a few hundred to a few thousand dollars within a couple of months, it's time to brew a cup of coffee and open the AWS Console.

AWS Cost Explorer introduction

The default view is only a very basic indicator that the bills are getting higher than usual, until we choose the "Group by" option. The most relevant groupings for S3 are:

  • Usage Type
  • API Operation

Usage Type and API Operation are perfect for getting a basic understanding of what is happening. In the examples above, the most expensive items are GET requests (API Operation) transferring data to the internet (Usage Type). In cases where there are many buckets, this may not be enough. To narrow down which bucket(s) are costlier than others, a simple but powerful feature has to come into play: Tags!
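
If you prefer scripting over clicking through the console, the same grouping can be pulled with the Cost Explorer API. A minimal boto3 sketch (the date range and metric choice are illustrative, not from this article):

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-06-01", "End": "2019-07-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    # Limit the results to S3 only.
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Simple Storage Service"]}},
    # Same as "Group by: Usage Type" in the console; use "OPERATION" for API Operation.
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```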

Per bucket billing data — Tags

Tags in the Cost Explorer refer to Cost Allocation Tags. The tags need to be assigned to the buckets first, and then enabled, before they can be used. Assigning them is a simple process that can be done in the "Properties" tab of each bucket.

In our example, I have decided to use the tags Name, Purpose, and Team for each bucket. For "Name", I simply used the bucket name. As "Purpose", I chose from Logging, Documents, Web, and Backups. As "Team", I set either DevOps or Frontend.
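
With many buckets, setting tags one by one in the console gets tedious, so here is a sketch of doing it with boto3 (the bucket name and tag values below are made up for illustration). Note that put_bucket_tagging replaces the bucket's entire existing tag set:

```python
import boto3

s3 = boto3.client("s3")

# Caution: this call overwrites any tags already set on the bucket.
s3.put_bucket_tagging(
    Bucket="example-images-bucket",
    Tagging={
        "TagSet": [
            {"Key": "Name", "Value": "example-images-bucket"},
            {"Key": "Purpose", "Value": "Web"},
            {"Key": "Team", "Value": "Frontend"},
        ]
    },
)
```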

After setting the tags to all the buckets, it’s time to enable them in the Billing service, Cost allocation tags section:

There are two types of Cost Allocation Tags: AWS-generated and User-Defined. It's always a good idea to enable both and to tag resources as soon as they are created. Newly added bucket tags will show up automatically on the list. Locate them, select them, and click "Activate".

The tags will only become available in the Cost Explorer the next day, as its billing data is, unfortunately, delayed by many hours. From then on, it is possible to group S3 costs by any previously activated tag. For example, Purpose:
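
The API equivalent is the same get_cost_and_usage call as in the earlier sketch, with the GroupBy switched from a dimension to a tag (dates again illustrative):

```python
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-07-01", "End": "2019-07-16"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Simple Storage Service"]}},
    # Group by the "Purpose" cost allocation tag instead of a dimension.
    GroupBy=[{"Type": "TAG", "Key": "Purpose"}],
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:
        print(day["TimePeriod"]["Start"], group["Keys"][0],
              group["Metrics"]["UnblendedCost"]["Amount"])
```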

As you can see, applying tags is not retroactive. It's therefore worth grabbing a pencil and a piece of paper and planning upfront the tag names and values that suit your particular S3 use case best, taking the business roadmap into consideration.

Per prefix (directory) billing data — Metrics

For businesses starting their journey with AWS S3, it might be tempting to use one single bucket, named after the app, to keep all the files there: documents, images and logs. There are no pros to such an approach, other than saving a few minutes on creating separate buckets and making your app work with them. Cons? Well, let's list some:

  • Almost all AWS functionalities available for S3 are per bucket: cross-region replication, object-level logging, server access logging, versioning, encryption and many more. You may want to enable versioning for your documents, but not for your logs. You may log server access to your images, but not to your documents. This separation becomes impossible when everything is in a single bucket.
  • Any Access Control Lists, IAM policies or scripts performing operations on data need to be crafted with special care so as not to affect the integrity or availability of objects under other prefixes. Making a typo in an ACL or simply forgetting to set proper boundaries in the code can have catastrophic consequences.
  • It's much easier to manage buckets than to remember and control prefixes.
  • And… it requires more than just tags to find out which object category is used the most.

The solution to that is Metrics in the bucket configuration.

Assuming that the example-bucket-with-different-data bucket contains two directories (logs, files), and the files directory contains another two (images, documents), three filters have been created:

  • Logs for prefix logs
  • Images for prefix files/images
  • Documents for prefix files/documents

Creating filters is as easy as clicking the "+ Add" link next to the "Filters" checkbox and specifying them one by one for each prefix. It is also possible to create filters for object tags, but that is outside the scope of this article. It may take a few hours for Metrics to show the filtered data.
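
The same three filters can also be created via the S3 API instead of the console. A minimal boto3 sketch, assuming the bucket and prefixes from the example above:

```python
import boto3

s3 = boto3.client("s3")

# Filter ID -> prefix, matching the example layout described above.
filters = {
    "Logs": "logs/",
    "Images": "files/images/",
    "Documents": "files/documents/",
}

for filter_id, prefix in filters.items():
    s3.put_bucket_metrics_configuration(
        Bucket="example-bucket-with-different-data",
        Id=filter_id,
        MetricsConfiguration={
            "Id": filter_id,
            "Filter": {"Prefix": prefix},
        },
    )
```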

The Metrics panel provides simple metric visualization charts out of the box…

… but the real power lies in CloudWatch, which gives a whole spectrum of options to aggregate, visualize and monitor those metrics.
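
The per-filter request metrics land in CloudWatch under the AWS/S3 namespace with BucketName and FilterId dimensions, so they can be queried or alarmed on like any other metric. A rough sketch of pulling daily GET request counts for the Images filter (bucket and filter names from the example above):

```python
import boto3
from datetime import datetime, timedelta

cw = boto3.client("cloudwatch")

response = cw.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="GetRequests",      # request metrics require a metrics filter on the bucket
    Dimensions=[
        {"Name": "BucketName", "Value": "example-bucket-with-different-data"},
        {"Name": "FilterId", "Value": "Images"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=86400,                  # one data point per day
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), int(point["Sum"]))
```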

Now it's clearly visible that objects with the prefix files/images are accessed, via GET requests, much more often than any other objects. But it's not the GET requests themselves that are expensive: 200k GET requests daily would cost only around $2/month (assuming $0.0004 per 1,000 GET requests).

Let’s look at the transfer OUT of S3 to the internet:

An average of 170 GB daily gives ~5.1 TB monthly, which costs around $459/month (assuming $0.09/GB). Bingo!
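
A quick back-of-the-envelope check of those two numbers, using the prices quoted above:

```python
# Rough monthly cost estimate from the daily figures quoted in the text.
get_requests_per_day = 200_000
get_price_per_1000 = 0.0004        # USD per 1,000 GET requests
transfer_out_gb_per_day = 170
transfer_price_per_gb = 0.09       # USD per GB transferred out to the internet

monthly_get_cost = get_requests_per_day * 30 / 1000 * get_price_per_1000
monthly_transfer_cost = transfer_out_gb_per_day * 30 * transfer_price_per_gb

print(f"GET requests:      ${monthly_get_cost:.2f}/month")       # ~ $2.40
print(f"Data transfer out: ${monthly_transfer_cost:.2f}/month")  # ~ $459.00
```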

Optimizing S3 cost

There are many ways of optimizing S3 costs, depending on the use case.

Some of them are:

  • If you're using S3 inside AWS, e.g. your application running on EC2 is downloading, uploading or copying objects, then an S3 VPC Endpoint should definitely be enabled. All traffic between the VPC and S3 will use the internal AWS network instead of going out through the internet. No S3 outbound traffic costs will be incurred, and all operations will speed up as a bonus (see the sketch after this list).
  • If you're using S3 outside AWS, e.g. as the source of images for your website, then a CloudFront distribution is highly recommended. Requests made to CloudFront are way cheaper than those made directly to S3 (for objects accessed multiple times, like website images). It also gives you full control over caching objects, SSL, better access logs, and WAF support for much more advanced request filtering than S3 ACLs provide.
  • In almost all cases, you can save money by using Lifecycle Rules on a bucket, which can delete files older than n days (useful for automatically generated data, e.g. daily database dumps) or transition them to a storage class that is much cheaper to store in, such as Infrequent Access or even Glacier (also shown in the sketch below).
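
A rough boto3 sketch of the first and last options, the VPC Endpoint and the Lifecycle Rules (the VPC ID, route table ID, bucket name, prefixes and retention periods are placeholders for illustration, not values from this article):

```python
import boto3

# --- Gateway VPC endpoint for S3: keeps S3 traffic on the internal AWS network ---
ec2 = boto3.client("ec2", region_name="eu-west-1")
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # placeholder VPC ID
    ServiceName="com.amazonaws.eu-west-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder route table ID
)

# --- Lifecycle rules: expire old logs, move documents to cheaper storage classes ---
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket-with-different-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},   # delete log objects after 30 days
            },
            {
                "ID": "archive-documents",
                "Filter": {"Prefix": "files/documents/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            },
        ]
    },
)
```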

Bucket Server Access Logs / Object Level Logging

Ultimately, there are two functionalities that can provide insights into every single object in the bucket, down to granular details.

  • Server Access Logs, to log all activity on objects performed over HTTP.
  • Object-Level Logging, to log even more, including all update, delete and copy operations and metadata changes (tags, ACLs etc.).

But those topics are broad enough to deserve their own article.

Thank you for reading and click Clap below if you liked it!

Please let me know in the comments if you have any feedback or questions.
