Cache control with Lambda@Edge

János Krnák

Published in

uniplacesgeeks

7 min read · Nov 22, 2018

TL;DR: check out the GitHub repository https://github.com/jkrnak/serverless-lambda-at-edge

If you are familiar with how CDNs work, feel free to jump straight to the Lambda@Edge section.

About CDNs

CDNs (content delivery networks) solve the problem of serving content to your end users quickly. They achieve this by serving it from an edge node close to the user. Setting up a CDN for your assets (CSS, JS, photos, SVGs, etc.) isn’t a silver bullet on its own, but it can dramatically speed up the load time of your site and, if caching is used, it can reduce the bandwidth used from the origin.

Terminology

I mentioned two terms that we need to clarify before we move on: origin and edge node. The origin is the source location from which the CDN loads the asset when it’s not already in the network. The edge node is a web server, a proxy that delivers the asset to the user. In a CDN, edge nodes are distributed around the globe. When users on the East Coast of the US request the CSS of your page, they get served from a node near them, and when someone in Germany loads your page, they connect to a server in Europe, maybe even in Germany.

What’s the point of a CDN if it connects to the origin to fetch an asset?

A content delivery network might have its own backbone of dedicated lines, so it could fetch the content faster than it would travel over the end user’s route. The main advantage, however, comes when the content is already cached on the edge node: there is no need to go to the origin server to load the content, and it can be served immediately.

How do edge nodes cache content?

I will limit this post to AWS CloudFront, but I’m sure other CDN providers have a very similar setup. By default CloudFront caches objects for 24 hours on the edge nodes, but this behaviour can be modified in two ways:

change the default TTL for all objects
use cache-control max-age, s-maxage or expires headers on the objects from origin

Change the default TTL or use cache headers (screenshot from AWS console)

You can read more about how to set this up in the AWS documentation.

Changing the default TTL is easy, but it applies to all objects, so you lose some flexibility, and you also don’t take advantage of client-side caching. Unless you tell the client that it can cache the object until a certain time (expires header) or for a given period (max-age; s-maxage applies only to shared caches such as the CDN itself), the client will go to the CDN and ask for the object. The CDN will either reply with a 304 (Not Modified) response or send the asset. (The decision is made by comparing the ETag header, but that’s out of scope for this post.)
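To make the precedence between these headers concrete, here is a minimal sketch of how a cache might work out how long it can reuse a response. The function name freshnessSeconds and its options are made up for this illustration, and it is deliberately simplified: it treats no-cache like no-store and ignores CloudFront’s own minimum and maximum TTL settings.

```javascript
// Returns how long (in seconds) a cache may reuse a response without
// revalidating, or null when the headers give no explicit lifetime.
// `headers` is a plain object with lowercase header names (an assumption
// made for this sketch).
function freshnessSeconds(headers, { shared = false, now = Date.now() } = {}) {
  // Parse "public, max-age=60" into { public: true, 'max-age': '60' }.
  const directives = {};
  for (const part of (headers['cache-control'] || '').split(',')) {
    const [name, value] = part.trim().toLowerCase().split('=');
    if (name) directives[name] = value === undefined ? true : value;
  }

  // Simplification: both directives are treated as "do not reuse".
  if ('no-store' in directives || 'no-cache' in directives) return 0;
  // s-maxage is only honoured by shared caches such as a CDN edge node.
  if (shared && 's-maxage' in directives) return Number(directives['s-maxage']);
  if ('max-age' in directives) return Number(directives['max-age']);
  // Expires is the fallback when no max-age directive is present.
  if (headers.expires) {
    const expiresAt = Date.parse(headers.expires);
    if (!Number.isNaN(expiresAt)) {
      return Math.max(0, Math.round((expiresAt - now) / 1000));
    }
  }
  return null;
}

// A browser and a CDN can get different lifetimes from the same response.
const example = { 'cache-control': 'public, max-age=60, s-maxage=600' };
console.log(freshnessSeconds(example));                   // browser
console.log(freshnessSeconds(example, { shared: true })); // edge node
```

With a header like the one in the example, the browser would use max-age while the edge node would use s-maxage, which is why s-maxage alone does not help with client-side caching.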

You are probably thinking what I’m thinking: a round trip just to find out that the logo of your page didn’t change doesn’t make much sense. Especially if you have a lot of assets and the browser makes 10, 50 or more of these trips, all of that time is wasted on telling us what we could have already known: that none of these assets have changed. Luckily the solution is easy: add a cache-control max-age header or an expires header, which tells the browser that it can cache these files. The next time the user loads your page, the browser will see that it has already cached some of these files and won’t ask for them again. (I should mention immutable assets here, but this is again out of scope for this post. I might write about it in the future; until then, please google it.)

How can I control the cache headers?

This is all nice: you can set these headers at your origin, either in your web server or in your web application. As a result, your objects will be cached on the CDN.

What if your origin is S3? How do you set the headers then? You can add metadata to your objects, for example a Cache-Control max-age value, and S3 will respond with those headers to the CDN. But once you have millions of objects, setting these headers is not easy. S3 doesn’t allow modifying an object’s metadata in place; you need to copy the object onto itself with the new headers, and that’s a lot of operations for millions of files. Another scenario could be that you want different headers for clients from different continents, different browsers, etc. How can you do that?
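For a sense of what “copy the object onto itself” involves, here is a rough sketch of the parameters for a single S3 CopyObject call. The helper name selfCopyParams and the bucket and key values are hypothetical; the parameter names are the ones used by the AWS SDK for JavaScript (v2), and the actual SDK call is left as a comment so the snippet runs without AWS credentials.

```javascript
// Builds the parameters for an S3 CopyObject call that copies an object onto
// itself while replacing its metadata with a new Cache-Control value.
// (selfCopyParams is a hypothetical helper, not part of the AWS SDK.)
function selfCopyParams(bucket, key, cacheControl, contentType) {
  // CopySource is "<bucket>/<key>" and must be URL-encoded; encode each path
  // segment separately so the slashes in the key are preserved.
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return {
    Bucket: bucket,
    Key: key,
    CopySource: `${bucket}/${encodedKey}`,
    // REPLACE tells S3 to take the metadata from this request instead of
    // copying it from the source object.
    MetadataDirective: 'REPLACE',
    CacheControl: cacheControl,
    // With REPLACE the existing content type is not carried over, so it is
    // passed again here.
    ContentType: contentType,
  };
}

const params = selfCopyParams(
  'my-bucket', 'img/my logo.png', 'public, max-age=1209600', 'image/png'
);
console.log(params);

// With the AWS SDK for JavaScript (v2) this could be sent roughly like so:
//   const AWS = require('aws-sdk');
//   await new AWS.S3().copyObject(params).promise();
```

One such request is needed per object, which is where the cost comes from when a bucket holds millions of files.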

Enter Lambda@Edge

AWS gives you a very nice serverless solution called Lambda. Serverless means that you don’t have to bother with provisioning EC2 instances, Docker containers, etc. You just write your function in a language that Lambda supports, and Lambda runs, versions and scales your code.

Lambda@Edge is a version of Lambda that runs on CloudFront’s edge nodes. It has a different pricing model from Lambda (no free tier). At the time of writing, the limitations are that you can only use Node.js and that you need to deploy your function in N. Virginia (us-east-1); from there the function is distributed to CloudFront.

In your function you can modify the request and response objects: the headers and even the content. For example, you can add the cache-control headers. (I know, very imaginative.)

There are four different events that can trigger your function to launch.

viewer-request when CloudFront receives a request from a viewer
origin-request before CloudFront forwards a request to the origin
origin-response when CloudFront receives a response from the origin
viewer-response before CloudFront returns the response to the viewer
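As a small illustration of how these four triggers show up in code, here is a sketch that reads the trigger type and the request and response out of the event that CloudFront passes to the function. The helper name describeEdgeEvent is made up; the field names follow the Lambda@Edge event structure, where only the two response events carry a response object.

```javascript
// Summarises a Lambda@Edge event: which trigger fired, which URI was
// requested and, for the response triggers, the response status.
// (describeEdgeEvent is a hypothetical helper for illustration.)
function describeEdgeEvent(event) {
  const cf = event.Records[0].cf;
  return {
    eventType: cf.config.eventType, // e.g. 'origin-response'
    uri: cf.request.uri,
    // Only origin-response and viewer-response include a response.
    status: cf.response ? cf.response.status : null,
  };
}

// Hand-written sample event, trimmed down to the fields used above.
const sampleEvent = {
  Records: [{
    cf: {
      config: { eventType: 'origin-response' },
      request: { uri: '/deploy.png', method: 'GET', headers: {} },
      response: { status: '200', headers: {} },
    },
  }],
};
console.log(describeEdgeEvent(sampleEvent));
```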

This is probably enough preface (it actually got a bit too long). Now let’s see a use case and some code!

Example

Use case: add a cache-control header to objects that don’t already have one. We do this on origin response: that way the header is cached together with the object on the edge node, and the function runs only when CloudFront fetches from the origin rather than on every viewer response.

For the following example I will use the serverless framework. It’s a tool that helps you deploy and version serverless functions for AWS, Google Cloud, Azure and a few other providers.

Our function. It’s pretty simple, isn’t it?
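A handler for this use case could look roughly like the following. This is a minimal sketch rather than the exact code from the repository: the handler name addCacheControl and the 14-day default (chosen to match the max-age=1209600 seen in the curl output later in this post) are assumptions.

```javascript
// Assumed default: 14 days, matching the max-age=1209600 shown in the curl
// output later in the post.
const DEFAULT_CACHE_CONTROL = 'public, max-age=1209600';

// Origin-response handler: adds a Cache-Control header only when the origin
// did not send one. (The name addCacheControl is an assumption.)
const addCacheControl = (event, context, callback) => {
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  // CloudFront keys headers by their lowercase name; each entry is a list of
  // { key, value } pairs.
  if (!headers['cache-control']) {
    headers['cache-control'] = [
      { key: 'Cache-Control', value: DEFAULT_CACHE_CONTROL },
    ];
  }

  // Hand the (possibly modified) response back to CloudFront.
  callback(null, response);
};

module.exports = { addCacheControl };
```

Because the header is only added when it is missing, objects that already carry their own cache-control value in S3 keep it unchanged.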

To learn more about the event structure please have a look at the documentation.

Set up the serverless project

To set up the project, you’ll need to install serverless first. Let’s assume that we have it installed. Now create the project:

serverless create --template aws-nodejs --path lambda-at-edge --name "LambdaAtEdge"

This will create a handler.js and a serverless.yml file for us. Let’s replace the content of the YAML configuration with the following:

Now you can run the following command with the profile name for your AWS account.

serverless deploy --aws-profile=my-profile-name

If you don’t want to copy and paste, you can clone the project from GitHub.

https://github.com/jkrnak/serverless-lambda-at-edge

Example serverless project for AWS Lambda@Edge that adds Cache-Control headers to origin objects that doesn't already…

Set up the CloudFront trigger

We are not done yet. We have our function deployed, but it’s not being triggered, so we still don’t see the cache-control headers. We now need to deploy it to Lambda@Edge. The following two screenshots show how to do that.

On the Lambda function’s screen click on the Actions dropdown and select Deploy to Lambda@Edge.

On the second screen select the CloudFront distribution, the cache behaviour and the CloudFront event type. For the event type please select origin-response.

On the setup screen select your distribution, which behaviour and the event type

Now you will need to wait until the CloudFront distribution updates. Once it has finished updating, make a request to a resource on your distribution and check whether the cache-control header is set.

# curl -I https://mycdn-distribution.com/deploy.png
HTTP/2 200
content-type: image/png
content-length: 52160
date: Wed, 21 Nov 2018 17:40:08 GMT
last-modified: Wed, 21 Nov 2018 16:24:10 GMT
etag: "229ea20dd1d9ecb417b784cb746040c5"
accept-ranges: bytes
server: AmazonS3
cache-control: public, max-age=1209600
x-cache: Miss from cloudfront

In the example above the function successfully set the cache-control header.

Troubleshooting

If for some reason the header is not set, check the CloudWatch logs for the function. Note that the logs are written to the region closest to the CDN edge node that served you, not necessarily to us-east-1. Even though the Lambda was deployed in us-east-1, it is now replicated throughout the distribution, so if you are in France, for example, the logs of the edge function will be written to the Paris region (eu-west-3), in a log group named /aws/lambda/us-east-1.<function name>.

Final thoughts

This is just a very basic idea, and for some it might even be overkill. If you don’t have too many items in your S3 bucket, or if you can control the metadata on the objects, it might be cheaper in the long run to just set the cache-control header on the S3 objects.

However, if you want finer control, for example to set different headers or serve different content based on other headers like accept-language or user-agent, Lambda@Edge gives you great flexibility.

It looks like there is a plugin for the serverless framework for Lambda@Edge. I haven’t tried it, but it looks promising: it would help you set up everything from serverless without the need to click around in the console. https://github.com/silvermine/serverless-plugin-cloudfront-lambda-edge

This post got much longer than I expected, but I hope it will help you understand how to set up a Lambda@Edge function.