This is How I Reduced My CloudFront Bills by 80%
If you are using S3 and CloudFront to host your content and noticed that your bills are increasing, read this!
I applied the following pieces of advice to reduce my own CloudFront usage costs:
Ask yourself: “What kind of content did I deploy to CloudFront? How are my users using it?”
These are some questions you should ask. Sometimes, just asking helps in finding great solutions.
Should your content be publicly accessible? If not, you should absolutely think about making it private.
Serving Private Content Through CloudFront
This is the case where, for example, only your mobile application or your web application should access your content. Here you have two choices:
- signed URLs
- signed cookies
Another use case is serving paid content: not everybody should access your content, only the users who paid for it.
This is a simple prototype of how to create a signed URL for a resource, using the legacy boto library (boto 2):
import time
import boto.cloudfront

my_connection = boto.cloudfront.CloudFrontConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# get_all_distributions() returns distribution summaries; fetch the full object
my_distribution = my_connection.get_all_distributions()[0].get_distribution()
my_key_pair_id = "my_key_pair_id"
my_private_key_file = "my_private_key_file.pem"
# the signed URL will be valid for 60,000 seconds from now
time_to_expire = int(time.time()) + 60000
my_url = "http://<my_cloudfront_id>.cloudfront.net/my_resource_file"
my_signed_url = my_distribution.create_signed_url(my_url, my_key_pair_id, time_to_expire, private_key_file=my_private_key_file)
How Long Do Objects Stay in a CloudFront Edge Cache?
In many cases, some files in your S3 bucket are rarely or never updated. This is why you should ask yourself: how long do objects stay in a CloudFront edge cache?
Adding Headers to Your S3 Objects To Control Cache
Say we have a video in the header of a static website that we will never update. Why not add a Cache-Control header to that object on S3?
By setting Cache-Control to max-age=31536000 (1 year in seconds), the object will stay in CloudFront edge caches for a year.
Imagine the number of requests you can save in a year across all of your users, and do the same for all the objects that require the same settings.
You can also choose an explicit expiration date by adding the Expires header instead.
Note that you need to invalidate your CloudFront cache after modifying an object's metadata.
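As a sketch of how this could be automated with boto3 (the bucket and key names are hypothetical, and copying an object onto itself is one way to replace its metadata):

```python
def long_cache_headers(max_age=60 * 60 * 24 * 365):
    """Build the metadata arguments for a long-lived object (1 year by default)."""
    return {"CacheControl": f"max-age={max_age}, public"}

def set_long_cache(bucket, key):
    # boto3 is assumed to be installed and AWS credentials configured
    import boto3
    s3 = boto3.client("s3")
    # Copy the object onto itself, replacing its metadata with the new headers.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        MetadataDirective="REPLACE",
        **long_cache_headers(),
    )

# e.g. set_long_cache("my_website.com", "assets/header-video.mp4")
```

Remember that this counts as modifying the object, so an invalidation may still be needed for copies already cached at the edges.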
Some AWS users, including me, use S3 to host a static website with CloudFront in front of it, for reasons like:
- getting an SSL certificate and serving the website over HTTPS
Well, a good idea here is setting max-age to the equivalent of 10 years (more or less; the exact value is not really important), and when you update your website, you only need to invalidate the files and objects that changed.
Sync Your Content The Right Way
When you want to synchronize a local directory with a remote S3 bucket, you usually execute the following command (or its equivalent using the SDK):
aws s3 sync . s3://my_website.com --acl public-read
You may later change your local folder by deleting some files, then execute the same command to re-sync the bucket. However, your locally deleted files will never disappear from the remote bucket, unless you execute:
aws s3 sync --delete . s3://my_website.com --acl public-read
The --delete option will save you money!
S3: Choosing The Best Naming Strategy = More Performance
While this is directly related to performance rather than costs, in some cases you may decide to create a CloudFront distribution just because you think it will solve a performance problem!
Well, let’s take this example from the official documentation.
This is the first suggestion of how you can store some photos:
examplebucket/2013-26-05-15-00-00/cust1248473/photo3.jpg ...
And this is a second suggestion, where a short hash prefix is added in front of each key (e.g. examplebucket/921c-2013-26-05-15-00-00/cust1248473/photo3.jpg ...):
Guess in which case AWS will need more work to index and use your bucket?
Of course, it's the first one. As you can see, the keys are almost identical, so S3 will target a specific partition for a large number of your keys.
With the second suggestion, the keys are really different right from the beginning, so S3 does not need to read almost the whole key to distinguish them, as in the first case:
Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in.
Using a sequential prefix, such as time stamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. (from the official documentation)
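A minimal sketch of the second strategy: derive a short hash from each key and use it as a prefix, so key names diverge immediately (the 4-character length is an arbitrary choice):

```python
import hashlib

def hashed_key(key):
    """Prefix a key with 4 hex characters of its MD5 so keys spread across partitions."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}-{key}"

# Sequential keys now start with very different prefixes:
for name in ("photo1.jpg", "photo2.jpg", "photo3.jpg"):
    print(hashed_key("2013-26-05-15-00-00/cust1248473/" + name))
```

Because the hash is derived from the key itself, you can always recompute the full object name when you need to fetch it later.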
Picking The Right Region
Choosing the right region may improve your performance and reduce your costs. Take a look at the AWS pricing table and check the price of data in/out and the price per request, depending on where your users consume your content; you may find a better combination.
Do You Need HTTPS ?
In most cases I would say yes, but whenever you don't really need it, use HTTP. We all know that HTTPS is safer and more standardized, and everybody will sooner or later use SSL over HTTP; but according to the AWS pricing table, the price per 10,000 requests is cheaper for HTTP:
- 0.0075 USD per 10,000 requests over HTTP
- 0.0100 USD per 10,000 requests over HTTPS
Let's say that when a user calls CloudFront, your application makes 500 requests, and that you have 50,000 daily users.
You will pay this amount for a year:
HTTP: (500/10000) * 0.0075 * 50000 * 30 * 12 = 6750 USD
HTTPS: (500/10000) * 0.01 * 50000 * 30 * 12 = 9000 USD
9000 - 6750 = 2250
You save more than 2000 dollars yearly.
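The arithmetic above, as a quick script you can adapt to your own traffic numbers:

```python
def yearly_cost(price_per_10k, requests_per_user=500, daily_users=50_000, days=30 * 12):
    """Yearly request cost, approximating a year as 12 months of 30 days."""
    requests_per_day = requests_per_user * daily_users
    return (requests_per_day / 10_000) * price_per_10k * days

http_cost = yearly_cost(0.0075)   # HTTP price per 10,000 requests
https_cost = yearly_cost(0.0100)  # HTTPS price per 10,000 requests
print(round(http_cost, 2), round(https_cost, 2), round(https_cost - http_cost, 2))
# 6750.0 9000.0 2250.0
```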
When some “script kiddies” get access to your credentials, they can use your resources and run up your bill without limit, so make sure to secure your account and credentials.
Don't Use Your Root Account
One of the best practices for securing access to your resources is locking down the root user: root has full administrator access, and while you cannot delete the root user itself, you can and should delete its access keys. Don't generate credentials with administrator access; instead, refine each user's rights and access to AWS resources.
Activate the MFA for your AWS account.
You have many MFA form factors to choose from:
- Virtual MFA device
- Hardware key fob MFA device
- Hardware display card MFA device
- SMS MFA device (preview)
Rotate your AWS keys.
When you change your access keys (access key ID and secret access key) on a regular schedule, you shorten the period during which an access key is active, and therefore reduce the impact on your bill if it is compromised.
Using the AWS CLI, list the access keys of a given user:
aws iam list-access-keys --user-name user
Create a new key:
aws iam create-access-key --user-name user
Now re-run the first command; you will notice that you have two keys:
aws iam list-access-keys --user-name user
Disable the old one, and once you have verified that the new key works everywhere, delete it:
aws iam update-access-key --access-key-id BBBCCCCDDDDEEEE --status Inactive --user-name user
aws iam delete-access-key --access-key-id BBBCCCCDDDDEEEE --user-name user
Another option to reduce costs that is not widely used, but can be useful in some cases, is restricting the usage of your CloudFront files to certain countries. You have the choice to whitelist or blacklist a list of countries.
The Problem of Hotlinking and Controlling the Access to Your Files Using AWS WAF
When you host a static website using S3 and CloudFront, or when you set up a CloudFront distribution to be consumed by your WordPress blog or any other web app, your static files like images and videos are visible to the public at URLs like:
https://static.your-website.com/assets/streaming/video.mp4
(where static.your-website.com is the URL of your CloudFront CDN)
Your competitors, or any other user, may embed the same videos on their website, and you will pay for it. This can be done as simply as adding the following code to their pages:
<video width="320" height="240" controls>
  <source src="https://static.your-website.com/assets/streaming/video.mp4" type="video/mp4">
</video>
The problem here is not just your content being reused, but the costs it can generate: you will pay for every request, while none of them was made by you or your customers.
Using AWS WAF Web ACLs, you can create conditions and rules to protect your CDN from being hotlinked.
This part needs an article of its own, but it is covered in more detail in the Practical AWS training.
Analyzing your Bucket Usage
A good way to analyze who is using your S3 files is activating access logs.
Start by creating a bucket for logging:
aws s3 mb s3://logs-zae45z4a5e4zr
Go to your AWS console, choose the bucket you want to log access to, and choose the destination bucket for the logs (in our case, logs-zae45z4a5e4zr).
Analyzing Your CloudFront Logs
Activating your CloudFront usage logs is not only good for marketing; it can also help you identify and understand how, and from where, your objects are being consumed.
Logging is deactivated by default; you need to activate it.
Wait for a day, for example, then download your logs from the bucket and analyze them. You can use tools like the ELK stack or AWS QuickSight.
This is the format of a log line:
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields
2018-06-05 21:46:22 FRA50 33836 22.214.171.124 GET xxxxx.cloudfront.net /learning.jpeg 200 - Mozilla/5.0%2520(compatible;%2520) - - Hit skU3yiThb71WWW7HENzD9WB5Fzhn_gn-NNVpi4F6VV9uF_mpr7xOuw== website.com https 240 0.007 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/1.1 - -
These logs are useful to identify information like the HTTP method (GET/POST ..), the client IP, the requested object and its status code; AWS-specific fields like the edge location and the edge result type (Hit/Miss); as well as other miscellaneous information like the protocol and its version.
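A small sketch that turns each log line into a dictionary keyed by those field names (real CloudFront logs are tab-separated; splitting on any whitespace also works for the sample above since its fields contain no spaces):

```python
FIELDS = ("date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
          "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query "
          "cs(Cookie) x-edge-result-type x-edge-request-id x-host-header "
          "cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol "
          "ssl-cipher x-edge-response-result-type cs-protocol-version "
          "fle-status fle-encrypted-fields").split()

def parse_line(line):
    """Map one CloudFront access-log line to a {field: value} dict."""
    return dict(zip(FIELDS, line.split()))

entry = parse_line(
    "2018-06-05 21:46:22 FRA50 33836 22.214.171.124 GET xxxxx.cloudfront.net "
    "/learning.jpeg 200 - Mozilla/5.0%2520(compatible;%2520) - - Hit "
    "skU3yiThb71WWW7HENzD9WB5Fzhn_gn-NNVpi4F6VV9uF_mpr7xOuw== website.com "
    "https 240 0.007 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/1.1 - -"
)
print(entry["cs-method"], entry["sc-status"], entry["x-edge-result-type"])  # GET 200 Hit
```

From there, counting misses per URI or bytes per client IP is a couple of lines with collections.Counter.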
A Quick Way to Download Your S3/CF Logs
If you followed the steps above, you can download your S3 logs to your localhost using:
aws s3 cp --recursive s3://logs-zae45z4a5e4zr/ .
Concatenate them all into the same file while respecting the date of creation:
cat $(ls -tr) > logs.txt
The cost of CloudFront data transfer is calculated based on the total amount of data served, so serving compressed files is less expensive than serving uncompressed ones.
Some files should absolutely be compressed, usually files like JS scripts and CSS. Compressing these files not only reduces your costs but also enhances the user experience, since your website or web app will render faster.
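To get a feel for the savings, you can gzip a representative text asset locally and compare sizes (a local sketch only, not a CloudFront test):

```python
import gzip

# Repetitive text stands in for a typical CSS/JS file, which compresses very well.
asset = b".header { color: #333; margin: 0 auto; }\n" * 500
compressed = gzip.compress(asset)
print(len(asset), len(compressed), "ratio: %.2f" % (len(compressed) / len(asset)))
```

Since CloudFront bills on bytes transferred, that ratio translates almost directly into the data-transfer part of your bill for those files.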
Using CloudFront dashboard, you can activate compression in your distribution:
This tells CloudFront to compress files based on the Accept-Encoding header sent by the viewer: if the request includes Accept-Encoding: gzip, CloudFront can serve a compressed file; if the request does not include this header, compression is not done.
Only files with a size between 1,000 and 10,000,000 bytes are compressed.
When you use S3 or any other custom origin, the response that CloudFront gets must include a Content-Length header, so it can determine whether the size of the file is in the range that CloudFront compresses. If the Content-Length header is missing, CloudFront won't compress the file.
If you are using S3, go to the CORS configuration and add the header to the list of allowed headers.
Also, the response must not include a Content-Encoding header: CloudFront does not compress files that the origin has already compressed.
You can test whether your compression works using curl:
curl -H "Accept-Encoding: gzip" -I http://<your_resource>
X-Cache (Hit vs Miss), ETag and Headers
When I troubleshoot my CloudFront distributions, I don't use many tools; curl alone was enough to understand and resolve some problems.
However, you need to understand what the different headers do, what they should reply with, and what replies you should expect.
For instance, the X-Cache header, which tells you whether the CDN served the response or the origin did, can be helpful.
When X-Cache replies with Hit, it means you are being served from the CloudFront edge cache; when it is Miss, it means CloudFront had to go back to S3 (and not its edges) to serve you the requested file.
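For scripted checks, a tiny helper can classify the X-Cache values CloudFront sends (values like “Hit from cloudfront” and “Miss from cloudfront”; counting “RefreshHit” as a hit is my own choice here):

```python
def classify_x_cache(value):
    """Reduce a CloudFront X-Cache header value to 'hit', 'miss' or 'other'."""
    first = value.split()[0].lower()
    if first in ("hit", "refreshhit"):
        return "hit"   # served from an edge cache
    if first == "miss":
        return "miss"  # CloudFront went back to the origin
    return "other"

print(classify_x_cache("Hit from cloudfront"))   # hit
print(classify_x_cache("Miss from cloudfront"))  # miss
```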
The ETag header can also help in debugging and troubleshooting, since it identifies a specific version of a resource. This is the definition given by Mozilla:
The ETag HTTP response header is an identifier for a specific version of a resource. It allows caches to be more efficient, and saves bandwidth, as a web server does not need to send a full response if the content has not changed. On the other side, if the content has changed, etags are useful to help prevent simultaneous updates of a resource from overwriting each other ("mid-air collisions").
“Via” is another header that CloudFront uses, and it can also be helpful:
The Via general header is added by proxies, both forward and reverse proxies, and can appear in the request headers and the response headers. It is used for tracking message forwards, avoiding request loops, and identifying the protocol capabilities of senders along the request/response chain.
Note that some problems and optimizations require a good understanding of HTTP headers. AWS overrides the headers that you don't configure with default values that you can find here.
The Origin Access Identity
Using an Origin Access Identity allows you to restrict access to your Amazon S3 content. When you create a CloudFront web distribution, you should create an “Origin”. This is the step in which you should configure the restriction.
But before that, you need to create the identity as it is shown in the next screenshot:
You can also do this at the moment of the creation of your distribution. This is an example:
In both cases, make sure that you activate the bucket access restriction and create or reuse an identity; you can then let AWS update your bucket policy automatically.
To require that users always access your Amazon S3 content using CloudFront URLs, you assign a special CloudFront user — an origin access identity — to your origin. You can either create a new origin access identity or reuse an existing one (Reusing an existing identity is recommended for the common use case).
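When you let AWS update the bucket policy for you, it adds a statement like the following (the Sid, the identity ID placeholder and the bucket name here are illustrative; yours will differ):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudFrontOAIRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity <identity-id>"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my_website.com/*"
    }
  ]
}
```

With this in place, you can remove any public-read grants from the bucket, so the only way to fetch the files is through CloudFront.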
Some Quick Tips
- Delete unused files: you pay for bandwidth, sure, but you also pay for storage
- Use the S3 lifecycle feature when needed
- Clean your incomplete multipart uploads
- Compress data before sending them to S3 (CSS, HTML, JS ..)
- Set up a billing alert
- Use the CloudFront monitoring dashboards; they are not the greatest monitoring tool I have used, but they can be helpful
If you resonated with this article, please subscribe to our newsletters:
- DevOpsLinks : An Online Community Of Thousands Of IT Experts & DevOps Enthusiast From All Over The World.
- Shipped: An Independent Newsletter Focused On Serverless, Containers, FaaS & Other Interesting Stuff
- Kaptain: A Kubernetes Community Hub, Hand Curated Newsletter, Team Chat, Training & More
Don’t forget to join Jobs For DevOps and submit your CV.
… and don’t forget to check my Practical AWS Course !