Default files in S3 subdirectories using CloudFront and Lambda@Edge

Chris Pointon
Apr 17, 2018

Why did I feel the need to write this? Didn’t AWS solve this on their blog already? Well, not quite.

To briefly recap (this is well explained in the AWS blog), if you’re serving a static website using CloudFront as a CDN in front of an S3 bucket, CloudFront can handle a default file at the root of the site…

http://mysite.com/
quietly returns the actual contents of
http://mysite.com/index.html

…but not in a subdirectory

http://mysite.com/foo/ -> 404!
even if http://mysite.com/foo/index.html exists

The code snippet in the AWS blog solves this problem by using Lambda@Edge (a Lambda function that runs in the CloudFront CDN). It takes any request for a URL ending in / and appends index.html to it, so that the URL actually fetched from the underlying S3 bucket points at the default file.
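As a rough illustration of that idea (a minimal sketch, not the AWS blog's exact code), an origin-request function could look something like the following. The helper name withDefaultDocument and the choice of index.html as the default document are assumptions made for the example.

```javascript
// Hypothetical helper: append the default document to directory-style URIs.
function withDefaultDocument(uri, defaultDocument = 'index.html') {
  return uri.endsWith('/') ? uri + defaultDocument : uri;
}

// Origin-request handler: rewrite the URI before CloudFront asks S3 for it.
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  request.uri = withDefaultDocument(request.uri);
  // Returning the (modified) request tells CloudFront to carry on to the origin.
  return request;
};

exports.handler = handler;
```

With this in place, a request for /foo/ is fetched from the bucket as /foo/index.html, while /foo/bar.css passes through unchanged.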

This works just fine, except that real people on real browsers aren’t very good at remembering to add that trailing /. We’ll type https://mysite.com/foo, not https://mysite.com/foo/.

Most webservers deal with this by quietly 301-redirecting the non-slash-ending version to the slash-ending version, using modules such as mod_rewrite in Apache (I’ll ignore the holy war over whether adding slashes or removing them is “right”).

We also need to remember to include any querystring parameters in the redirect so that https://mysite.com/foo?utm_source=blog properly ends up at https://mysite.com/foo/?utm_source=blog. We don’t want to lose our client-side link tracking!

To achieve this, we need to add another Lambda@Edge function to get everything working properly. Why don’t we just update the AWS one? Well, the AWS one is called after CloudFront has checked for cache hits, but before it requests the file from S3. This is an origin-request trigger.

By default, CloudFront ignores querystring parameters when it decides whether to answer a request from its cache or from the origin. This means a request for https://mysite.com/foo/?utm_source=blog will be treated as a cache hit for https://mysite.com/foo/ and the cached content will be returned without the origin-request trigger ever being fired.

Instead, we need to catch the request before CloudFront checks for cache hits. The trigger for this is called viewer-request, and it fires for every request that the CloudFront distribution receives. You can register a Lambda containing the slash-adding redirect against this trigger.

So… here’s a Lambda@Edge script that handles the trailing slash and parameter-retaining part of the job. You can register this using the same steps as the AWS post, but change the dropdown selection in the CloudFront configuration from Origin Request to Viewer Request.
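A minimal sketch of such a viewer-request function, written from the description in this post rather than copied from the original script, might look like this. The helper name redirectTarget, the rule that a last path segment containing a dot is treated as a file, and the slash-collapsing guard against open redirects are all assumptions made for the example.

```javascript
// Hypothetical helper: work out where a request should be redirected,
// or return null if it should pass through untouched.
function redirectTarget(uri, querystring) {
  // Already ends in a slash: nothing to do.
  if (uri.endsWith('/')) return null;

  // Assumption: a dot in the last path segment means a file (e.g. site.css).
  const lastSegment = uri.substring(uri.lastIndexOf('/') + 1);
  if (lastSegment.includes('.')) return null;

  // Collapse runs of slashes (and backslashes, which some browsers treat as
  // slashes) so a path like //evil.example/foo cannot turn into a
  // protocol-relative redirect to another host.
  const safePath = uri.replace(/[\/\\]{2,}/g, '/');

  // Keep any querystring so link tracking survives the redirect.
  const qs = querystring ? '?' + querystring : '';
  return safePath + '/' + qs;
}

// Viewer-request handler: runs before CloudFront checks its cache.
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  // console.log('viewer-request:', request.uri, request.querystring);

  const target = redirectTarget(request.uri, request.querystring);
  if (target === null) return request; // carry on as normal

  // console.log('redirecting to:', target);
  return {
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: { location: [{ key: 'Location', value: target }] },
  };
};

exports.handler = handler;
```

In this sketch, /foo?utm_source=blog is redirected to /foo/?utm_source=blog, while /foo/ and /css/site.css are passed straight through.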

I left some commented-out logging in the code as it can be helpful to check what’s happening when testing the function before deploying to CloudFront. I don’t advise leaving logging in when you deploy to CloudFront, as the logs are written to CloudWatch in whichever region the code happens to run in (the one nearest the edge location that served the request), and can be really hard to track down.

Don’t forget you need to create the function in us-east-1 (N. Virginia) in order to have the option to use it for CloudFront. It should also use a role with both the lambda.amazonaws.com and edgelambda.amazonaws.com trust relationships. Oh, and you need to publish a specific version in order to deploy to CloudFront. The latter two are at least prompted by the Lambda UI, but the first one had me scratching my head for a while as I generally work in another region.
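For the role, the trust relationship is a JSON policy document naming both services. As a sketch, this Node.js snippet builds that document and prints the JSON you could paste into the role’s trust relationship in IAM; the variable name trustPolicy is just illustrative.

```javascript
// Trust policy letting both Lambda and Lambda@Edge assume the role.
const trustPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: {
        Service: ['lambda.amazonaws.com', 'edgelambda.amazonaws.com'],
      },
      Action: 'sts:AssumeRole',
    },
  ],
};

// Print as formatted JSON, ready to paste into the IAM console.
console.log(JSON.stringify(trustPolicy, null, 2));
```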

2018-10-28: Code and instructions updated following feedback from Ali Gajani

2021-12-22: Code updated to fix an open redirect vulnerability pointed out by Geoffrey Hayward

Chris Pointon

Internet entrepreneur and technologist. Co-founder of @Racefully and @Databoxer