This is How I Reduced My CloudFront Bills by 80%

If you are using S3 and CloudFront to host your content and have noticed that your bills are increasing, read this!

I applied the following advice to reduce my own CloudFront usage costs:

Ask Yourself: “What kind of content did I deploy to CloudFront? How are my users using it?”

These are some questions that you should ask. Sometimes just asking helps in finding great solutions.

Should your content be publicly accessible? If not, you should absolutely think about making it private.

Serving Private Content Through CloudFront

This is the case where, for example, you need only your mobile application or your web application to access your content. Here you have two choices:

  • signed URLs
  • or signed cookies

Another use case is serving paid content: not everybody should access your content, only the users who paid for it.

This is a simple prototype of how to create a signed URL for a resource:

import time

import boto.cloudfront

# Connect to CloudFront and fetch the first distribution (boto 2);
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are your credentials
my_connection = boto.cloudfront.CloudFrontConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
my_distributions = my_connection.get_all_distributions()
my_distribution = my_distributions[0].get_distribution()

# The CloudFront key pair used for signing
my_key_pair_id = "my_key_pair_id"
my_private_key_file = "my_private_key_file.pem"

# The signed URL expires 60,000 seconds from now
time_to_expire = int(time.time()) + 60000
my_url = "http://<my_cloudfront_id>.cloudfront.net/my_resource_file"
my_signed_url = my_distribution.create_signed_url(my_url, my_key_pair_id, time_to_expire, private_key_file=my_private_key_file)
print(my_signed_url)

How Long Do Objects Stay in a CloudFront Edge Cache?

In many cases, some files in your S3 bucket are never updated, or updated rarely. This is why you should ask yourself this question: how long do objects stay in a CloudFront edge cache?

Adding Headers to Your S3 Objects To Control Caching

Say we have a video in the header of a static website that we will never update. Why not add a Cache-Control header to that object on S3?
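A minimal sketch of doing this with the AWS CLI (the bucket and object names are placeholders); copying the object onto itself with the REPLACE directive rewrites its metadata:

# Rewrite the object's metadata with a one-year Cache-Control header
aws s3 cp s3://my_website.com/assets/header-video.mp4 \
    s3://my_website.com/assets/header-video.mp4 \
    --metadata-directive REPLACE \
    --content-type "video/mp4" \
    --cache-control "max-age=31536000" \
    --acl public-read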

In the example above, I added the Cache-Control header and specified one year in seconds (31,536,000); this caches the object for one year in CloudFront edge caches.

Imagine the number of requests you can save in a year across all your users, and do the same for all the objects that require the same settings.

You can also choose an exact expiration date by adding the Expires header.

Note that you need to invalidate your CloudFront cache after modifying an object's metadata.
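A quick sketch with the AWS CLI (the distribution ID and path are placeholders):

# Invalidate the object so the edges pick up the new metadata
aws cloudfront create-invalidation --distribution-id EDFDVBD6EXAMPLE \
    --paths "/assets/header-video.mp4"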

More Caching

Some AWS users, including me, use S3 to host static websites and CloudFront for two reasons:

  • Performance
  • An SSL certificate for serving the website over HTTPS

Well, a good idea is setting a max-age equal to 10 years (more or less; the exact value is not really important), and when you update your website, invalidating only the new and changed files.
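A hedged sketch of setting such a long max-age at upload time with the AWS CLI (the bucket name is a placeholder; ten years is roughly 315,360,000 seconds):

# Upload the whole site with a ~10-year Cache-Control header on every object
aws s3 sync . s3://my_website.com --acl public-read \
    --cache-control "max-age=315360000"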

Sync Your Content The Right Way

When you want to synchronize a local directory with a remote S3 bucket, you usually need to execute the following command (or its equivalent using the SDK):

aws s3 sync . s3://my_website.com --acl public-read

You may change your local folder by removing or deleting some files, then execute the same command to re-sync the bucket. However, your locally deleted files will never disappear from the remote bucket unless you execute:

aws s3 sync --delete . s3://my_website.com --acl public-read

The --delete option will save you money!
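Since --delete removes remote objects, a cautious sketch is to preview the changes first with --dryrun:

# Show what would be uploaded and deleted, without changing anything
aws s3 sync --delete --dryrun . s3://my_website.com --acl public-read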

S3: Choosing The Best Naming Strategy = More Performance

While this is directly related to performance and not costs, in some cases you may decide to create a CloudFront distribution just because you think it will solve your performance problem!

Well, let’s take this example from the official documentation.

This is the first suggestion of how you can store some photos:

examplebucket/2013-26-05-15-00-00/cust1234234/photo1.jpg
examplebucket/2013-26-05-15-00-00/cust3857422/photo2.jpg
examplebucket/2013-26-05-15-00-00/cust1248473/photo2.jpg
examplebucket/2013-26-05-15-00-00/cust8474937/photo2.jpg
examplebucket/2013-26-05-15-00-00/cust1248473/photo3.jpg
...
examplebucket/2013-26-05-15-00-01/cust1248473/photo4.jpg
examplebucket/2013-26-05-15-00-01/cust1248473/photo5.jpg
examplebucket/2013-26-05-15-00-01/cust1248473/photo6.jpg
examplebucket/2013-26-05-15-00-01/cust1248473/photo7.jpg
...

And this is a second suggestion:

examplebucket/232a-2013-26-05-15-00-00/cust1234234/photo1.jpg
examplebucket/7b54-2013-26-05-15-00-00/cust3857422/photo2.jpg
examplebucket/921c-2013-26-05-15-00-00/cust1248473/photo2.jpg
examplebucket/ba65-2013-26-05-15-00-00/cust8474937/photo2.jpg
examplebucket/8761-2013-26-05-15-00-00/cust1248473/photo3.jpg
examplebucket/2e4f-2013-26-05-15-00-01/cust1248473/photo4.jpg
examplebucket/9810-2013-26-05-15-00-01/cust1248473/photo5.jpg
examplebucket/7e34-2013-26-05-15-00-01/cust1248473/photo6.jpg
examplebucket/c34a-2013-26-05-15-00-01/cust1248473/photo7.jpg
...

Guess in which case AWS will need more effort to index and use your bucket?

Of course, it's the first one. As you can see, the keys are almost identical, so S3 will target a specific partition for a large number of them.

With the second example, the keys are really different, and right from the beginning of each key, so S3 doesn't need to read almost the entire key to distinguish them, as in the first case.
Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in.
Using a sequential prefix, such as time stamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. (from the official documentation)

Picking The Right Region

Choosing the right region can improve your performance and reduce your costs. Take a look at the AWS pricing table and check the price of data in/out and the price per request as a function of where your users consume your content; you may find the best combination.

Do You Need HTTPS?

In most cases I would say yes, but whenever you don't really need it, use HTTP. We all know that HTTPS is safer and more standardized, and everybody will sooner or later use SSL over HTTP, but according to the AWS pricing table, the price per 10,000 requests is cheaper for HTTP:

  • 0.0075 USD per 10,000 requests over HTTP
  • 0.0100 USD per 10,000 requests over HTTPS

Let's say that when a user calls CloudFront, your application makes 500 requests, and that you have 50,000 daily users.

You will pay this amount per year (counting 30-day months):

HTTP: (500 / 10000) * 0.0075 * 50000 * 30 * 12 = 6,750 USD
HTTPS: (500 / 10000) * 0.0100 * 50000 * 30 * 12 = 9,000 USD
9,000 - 6,750 = 2,250 USD

You save more than 2,000 dollars yearly.

Security

If some “script kiddies” get access to your credentials, they can use your resources without any billing limit, so make sure to secure your account and credentials.

Don't Use Your Root Account

One of the best practices for securing access to your resources is to stop using the root user, which is the user with full administrator access: delete its access keys, don't generate credentials with administrator access, and refine your users' rights and access to AWS resources.

Activate MFA for your AWS account.

You have many choices:

  • Virtual MFA Device
  • Hardware Key Fob MFA Device
  • Hardware Display Card MFA Device
  • SMS MFA Device (Preview)
  • Hardware Key Fob MFA Device for AWS GovCloud (US)

Rotate your AWS keys.

When you change your access keys (access key ID and secret access key) on a regular schedule, you shorten the period during which an access key is active and therefore reduce the billing impact if they are compromised.

Using the AWS CLI, list the access keys of a given user:

aws iam list-access-keys --user-name user

{
    "AccessKeyMetadata": [
        {
            "UserName": "user",
            "AccessKeyId": "BBBCCCCDDDDEEEE",
            "Status": "Active",
            "CreateDate": "2018-05-31T23:07:29Z"
        }
    ]
}

Create a new key:

aws iam create-access-key --user-name user

{
    "AccessKey": {
        "UserName": "user",
        "AccessKeyId": "FFFFGGGGHHHHIIII",
        "Status": "Active",
        "SecretAccessKey": "xxxxx/xxxxxx/xxxxx",
        "CreateDate": "2018-06-05T20:07:05.344Z"
    }
}

Now re-run the first command; you will notice that you have two keys:

aws iam list-access-keys --user-name user

{
    "AccessKeyMetadata": [
        {
            "UserName": "user",
            "Status": "Active",
            "CreateDate": "2018-05-31T23:07:29Z",
            "AccessKeyId": "BBBCCCCDDDDEEEE"
        },
        {
            "UserName": "user",
            "Status": "Active",
            "CreateDate": "2018-06-05T20:07:05.344Z",
            "AccessKeyId": "FFFFGGGGHHHHIIII"
        }
    ]
}

After updating your applications to use the new key, disable the old one:

aws iam update-access-key --access-key-id BBBCCCCDDDDEEEE --status Inactive --user-name user

Then delete it:

aws iam delete-access-key --access-key-id BBBCCCCDDDDEEEE --user-name user

Geographic Restrictions

Another option to reduce costs that is not widely used, but can be useful in some cases, is restricting the delivery of your CloudFront files to certain countries. You have the choice to whitelist or blacklist a list of countries.
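There is no single switch for this on the CLI; as a rough sketch (the distribution ID, ETag, and country codes below are placeholders), you can fetch your distribution config, edit its geo restriction section, and push it back:

# Fetch the current config; note the returned "ETag" value
aws cloudfront get-distribution-config --id EDFDVBD6EXAMPLE > dist.json
# Copy the inner "DistributionConfig" object into dist-config.json and set, e.g.:
#   "Restrictions": {"GeoRestriction": {"RestrictionType": "whitelist",
#                                       "Quantity": 2, "Items": ["US", "FR"]}}
# Push the edited config back, passing the ETag from the first call:
aws cloudfront update-distribution --id EDFDVBD6EXAMPLE \
    --distribution-config file://dist-config.json --if-match E2QWRUHAPOMQZL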

The Problem of Hotlinking and Controlling the Access to Your Files Using AWS WAF

When you host a static website using S3 and CloudFront, or when you set up a CloudFront distribution to be consumed by your WordPress blog or any other web app, your static files like images and videos are visible to the public at URLs like:

https://static.your-website.com/assets/streaming/video.mp4

(where static.your-website.com is the URL of your CloudFront CDN)

Your competitors, or any other users, may embed the same videos on their websites, and you will pay for it. This can be done as simply as adding the following code to their pages:

<video width="320" height="240" controls>
    <source src="https://static.your-website.com/assets/streaming/video.mp4" type="video/mp4">
</video>

The problem with this is not just your content being reused, but the costs it can beget: you will pay for every request, while none of them was made by you or your customers.

Using AWS WAF web ACLs, you can create conditions and rules to protect your CDN from being hotlinked.

This part needs two or three lectures of its own, but it is detailed further in Practical AWS training.

Analyzing Your Bucket Usage

A good way to analyze who is using your S3 files is activating access logs.
Start by creating a bucket for the logs:

aws s3 mb s3://logs-zae45z4a5e4zr

Go to your AWS console, choose the bucket you want to log access to, and choose the destination bucket for the logs (in our case, logs-zae45z4a5e4zr).
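If you prefer the CLI, here is a minimal sketch, assuming your site bucket is my_website.com as in the sync example above, and that the log bucket already grants write access to S3's log delivery group:

# Enable server access logging on the source bucket
aws s3api put-bucket-logging --bucket my_website.com \
    --bucket-logging-status '{"LoggingEnabled": {"TargetBucket": "logs-zae45z4a5e4zr", "TargetPrefix": "s3-access/"}}'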

Analyzing Your CloudFront Logs

Activating your CloudFront usage logs is not only good for marketing; it can also help you identify and understand how and from where your objects are being consumed.

Logging is deactivated by default, so you need to activate it.

Wait for a day, for example, then download your logs from the bucket to analyze them. You can use tools like ELK or AWS QuickSight.

This is the format of a log line:

#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields
2018-06-05 21:46:22 FRA50 33836 51.18.131.134 GET xxxxx.cloudfront.net /learning.jpeg 200 - Mozilla/5.0%2520(compatible;%2520) - - Hit skU3yiThb71WWW7HENzD9WB5Fzhn_gn-NNVpi4F6VV9uF_mpr7xOuw== website.com https 240 0.007 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/1.1 - -

These logs are useful to identify information like:

  • date
  • time
  • x-edge-location
  • ip
  • method (GET, POST, ...)
  • Host
  • Referer
  • User-Agent
  • URI
  • Cookie

And AWS headers like:

  • x-edge-result-type
  • x-edge-request-id
  • x-host-header
  • x-forwarded-for
  • x-edge-response-result-type

As well as other miscellaneous information like:

  • ssl-protocol
  • ssl-cipher
  • Protocol version
  • fle-status
  • fle-encrypted-fields

A Quick Way to Download Your S3/CF Logs

If you followed the steps above, you can download your S3 logs to your local machine using:

aws s3 cp --recursive s3://logs-zae45z4a5e4zr/ .

Then concatenate them all into a single file, respecting the creation date order:

cat $(ls -tr) > logs.txt
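From there, a quick sketch of an analysis with no extra tooling: count the most requested paths, assuming CloudFront's tab-separated log format where cs-uri-stem is the 8th field (gunzip any .gz log files first):

# Top 10 most-requested paths, skipping the #Version/#Fields comment lines
awk -F '\t' '$1 !~ /^#/ {print $8}' logs.txt | sort | uniq -c | sort -rn | head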

Using Compression

The cost of CloudFront data transfer is calculated as a function of the total amount of data served, so serving compressed files is less expensive than serving uncompressed files.

Some files should absolutely be compressed, usually files like JS scripts and CSS. Compressing these files not only reduces your costs but also enhances the user experience, since your website or web app will render faster.

Using the CloudFront dashboard, you can activate compression in your distribution. This will tell CloudFront to compress files based on the Content-Type header; if it matches one of these values, the file is compressed:

application/eot
application/x-otf
application/font
application/x-perl
application/font-sfnt
application/x-ttf
application/javascript
application/json
application/opentype
application/otf
application/pkcs7-mime
application/truetype
application/ttf
application/vnd.ms-fontobject
application/xhtml+xml
application/xml
application/xml+rss
application/x-font-opentype
application/x-font-truetype
application/x-mpegurl
application/x-javascript
application/x-opentype
application/x-httpd-cgi
application/x-font-ttf
font/eot
font/ttf
font/otf
font/opentype
image/svg+xml
text/css
text/csv
text/html
text/javascript
text/js
text/plain
text/richtext
text/tab-separated-values
text/xml
text/x-script
text/x-component
text/x-java-source

Compression is not performed when the user's request does not include the header:

Accept-Encoding: gzip

Only files with a size between 1,000 and 10,000,000 bytes are compressed.

Whether you use S3 or any other custom origin, the response CloudFront gets must include a Content-Length header so it can determine whether the size of the file is in the range that CloudFront compresses.

If the Content-Length header is missing, CloudFront won't compress the file.

If you are using S3, go to the CORS configuration and add the header to the list of allowed headers:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>Authorization</AllowedHeader>
        <AllowedHeader>Content-Length</AllowedHeader>
    </CORSRule>
</CORSConfiguration>

The response must not include a Content-Encoding header.

You can test whether your compression works using curl:

curl -H "Accept-Encoding: gzip" -I http://<your_resource>

X-Cache (Hit vs Miss), ETag and Headers

When I troubleshoot my CloudFront distributions, I don't use many tools; curl alone was enough to understand and resolve most problems.

However, you need to understand what the different headers do, what they should reply with, and what replies to expect.

For instance, the X-Cache header, which tells you whether the CDN served the result or the origin served it instead, can be helpful.

When X-Cache replies with Hit, it means that you are being served from a CloudFront edge cache; when it replies with Miss, it means that CloudFront used the origin, S3 (and not its edges), to serve you the requested file.
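For example (the distribution domain name is a placeholder):

# Request only the headers and inspect X-Cache
curl -sI https://d1234abcdef8.cloudfront.net/my_resource_file | grep -i '^x-cache'
# X-Cache: Hit from cloudfront   -> served from an edge cache
# X-Cache: Miss from cloudfront  -> fetched from the origin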

The ETag header can help in debugging and troubleshooting, since it identifies a specific version of a resource. This is the definition given by Mozilla:

The ETag HTTP response header is an identifier for a specific version of a resource. It allows caches to be more efficient, and saves bandwidth, as a web server does not need to send a full response if the content has not changed. On the other side, if the content has changed, etags are useful to help prevent simultaneous updates of a resource from overwriting each other ("mid-air collisions").

Via is another header that CloudFront uses, and it can also be helpful:

The Via general header is added by proxies, both forward and reverse proxies, and can appear in the request headers and the response headers. It is used for tracking message forwards, avoiding request loops, and identifying the protocol capabilities of senders along the request/response chain.

Note that some problems and optimizations need a good understanding of HTTP headers. AWS overrides the headers that you don't configure with default values that you can find here.

The Origin Access Identity

Using an Origin Access Identity allows you to restrict access to your Amazon S3 content. When you create a CloudFront web distribution, you create an “Origin”; this is the step in which you configure the restriction.

But before that, you need to create the identity, which you can do from the CloudFront console.
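You can also create one from the CLI; a minimal sketch (the caller reference and comment are arbitrary placeholders):

# Create a new origin access identity
aws cloudfront create-cloud-front-origin-access-identity \
    --cloud-front-origin-access-identity-config \
    'CallerReference=my-oai-ref-1,Comment=OAI for my_website.com'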

Alternatively, you can do this at the moment you create your distribution.

In both cases, make sure you activate the bucket access restriction and create or reuse an identity; you can then let AWS update your bucket policy automatically.

To require that users always access your Amazon S3 content using CloudFront URLs, you assign a special CloudFront user — an origin access identity — to your origin. You can either create a new origin access identity or reuse an existing one (Reusing an existing identity is recommended for the common use case).

Some Quick Tips

  • Delete unused files: you pay for bandwidth, sure, but you also pay for storage.
  • Use the S3 lifecycle feature when needed.
  • Clean up your incomplete multipart uploads (see the sketch after this list).
  • Compress data before sending it to S3 (CSS, HTML, JS, ...).
  • Set up a billing alert.
  • Use the CloudFront monitoring dashboards; they are not the greatest monitoring tool I have used, but they can be helpful.
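For the multipart-upload tip, a hedged sketch using the AWS CLI (the bucket name and the 7-day window are placeholders); it aborts any upload still incomplete after a week, so its invisible parts stop accruing storage charges:

# Add a lifecycle rule that aborts incomplete multipart uploads after 7 days
aws s3api put-bucket-lifecycle-configuration --bucket my_website.com \
    --lifecycle-configuration '{"Rules": [{"ID": "abort-incomplete-mpu",
        "Status": "Enabled", "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}}]}'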

Connect Deeper

If this article resonated with you, please subscribe to our newsletters:

  • DevOpsLinks: An Online Community Of Thousands Of IT Experts & DevOps Enthusiasts From All Over The World.
  • Shipped: An Independent Newsletter Focused On Serverless, Containers, FaaS & Other Interesting Stuff
  • Kaptain: A Kubernetes Community Hub, Hand Curated Newsletter, Team Chat, Training & More

You can find me on Twitter, and you can also check out my books and courses: SaltStack For DevOps, The Jumpstart Up, Painless Docker & Practical AWS.

Don’t forget to join Jobs For DevOps and submit your CV.

… and don’t forget to check my Practical AWS Course !