Converting images to WebP from CDN
--
The rise of WebP: A new image format for the web
The WebP format has become increasingly popular since Google introduced it in 2010. Its biggest selling point lies in its ability to produce much smaller file sizes while maintaining similar image quality. Faster load times = higher conversion rates.
WebP is a modern image format that provides superior lossless and lossy compression for images on the web. WebP lossless images are 26% smaller in size compared to PNGs. WebP lossy images are 25–34% smaller than comparable JPEG images at equivalent SSIM quality index. — Google
Tooling and considerations
Tooling: AWS (S3, CDN, Lambda@Edge), Sharp, User Agent
There are a few considerations we have to make before getting to the code:
- Firstly, not all browsers support WebP. At the time of writing, WebP is natively supported in recent versions of Google Chrome, Firefox, Edge, Opera, Android Browser and Samsung Internet.
- We may have a store of hundreds or thousands of pictures that we want to serve as WebP to supporting browsers, converting each one when it is requested.
- We have to change what the HTTP request and response objects look like.
The plan: On-the-fly conversion
We’re going to listen for requests to the CDN and return a WebP image to all supporting browsers, provided that a WebP image already exists. Otherwise, we’re going to fetch the image in its original format, convert it to WebP, and return the newly converted image.
CDN requests and responses
On top of the considerations, we have to understand what the CDN request and response objects look like:
We’ll be triggering our Lambdas on the viewer request and origin response events. The reason for using an origin response is that we want to leverage CDN caching for responses where the image conversion has already happened. For requests, however, modifying the request uri changes the cache key, so the rewrite has to happen on every viewer request.
The CDN Request Object
Don’t be intimidated: the important thing is that this object has the request headers object and the request uri string.
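For reference, a trimmed viewer request event looks roughly like the sketch below. It follows the Lambda@Edge event structure, but every value (client IP, domain, path, user agent) is made up for illustration, and several fields are left out.

```javascript
// Trimmed Lambda@Edge viewer request event. All values are illustrative.
const viewerRequestEvent = {
  Records: [
    {
      cf: {
        config: { eventType: 'viewer-request' },
        request: {
          clientIp: '203.0.113.10',
          method: 'GET',
          querystring: '',
          uri: '/images/hero.jpg',
          // Keys are lowercased header names; each maps to an array of { key, value } pairs.
          headers: {
            host: [{ key: 'Host', value: 'd111111abcdef8.cloudfront.net' }],
            'user-agent': [
              { key: 'User-Agent', value: 'Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0 Safari/537.36' },
            ],
          },
        },
      },
    },
  ],
};

// The two fields the viewer request function cares about.
const { headers, uri } = viewerRequestEvent.Records[0].cf.request;
const userAgent = headers['user-agent'][0].value;
```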
The CDN Response Object
Again, fear not: what’s really important here is that we have access to the headers and the request uri string.
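A trimmed origin response event might look like the following sketch. The shape follows the Lambda@Edge event structure; the values are invented, and the `x-original-ext` header is a hypothetical name for the header that carries the original extension.

```javascript
// Trimmed Lambda@Edge origin response event. All values are illustrative.
const originResponseEvent = {
  Records: [
    {
      cf: {
        config: { eventType: 'origin-response' },
        // The request as it was forwarded to the origin, after the viewer request rewrite.
        request: {
          method: 'GET',
          uri: '/images/hero.webp',
          headers: {
            // Hypothetical header name carrying the original extension.
            'x-original-ext': [{ key: 'x-original-ext', value: 'jpg' }],
          },
        },
        response: {
          status: '404', // note: the status is a string, not a number
          statusDescription: 'Not Found',
          headers: {
            'content-type': [{ key: 'Content-Type', value: 'application/xml' }],
          },
        },
      },
    },
  ],
};

// The check that decides whether a conversion should be attempted.
const { request, response } = originResponseEvent.Records[0].cf;
const isMissingWebp = response.status === '404' && request.uri.endsWith('.webp');
```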
Let’s get coding!
Summary
Here are the steps we’re going to take:
- Listen for requests to the CDN, and trigger a Lambda function that hijacks any viewer request.
- Determine if the request event is for an image and if the browser requesting the resource supports WebP, based on the user-agent we receive from the request.
- If we determine that the request is for an image and that the browser supports WebP, we replace the image extension in the request uri with .webp and add the original extension to the request headers.
- Next, we trigger a separate Lambda that hijacks any CDN origin response.
- If the request uri on the response event has a .webp extension and the response status is a 404, we check our S3 bucket for the same image, but with the original extension that we placed into our request header in step 3.
- If we find an image with the original extension in S3, we run a WebP conversion using Sharp and place the result in the origin response; otherwise, we leave the 404 response unaltered.
The code: Viewer Request
The code for the Viewer Request Lambda is straightforward. It compares the browser and browser version from the request to a predefined list of supported browsers to determine WebP support, rewrites png, jpg and jpeg extensions to webp, and leaves all the heavy lifting to the Origin Response Lambda.
This keeps our function lightweight, which is ideal since Viewer Request and Viewer Response Lambdas can’t be more than 1 MB in size.
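A minimal sketch of such a viewer request function is shown below. It is not the original code: it assumes a simplified regex check on the User-Agent string instead of a user-agent parsing library, the minimum versions in `WEBP_RULES` are illustrative, and `x-original-ext` is a hypothetical header name.

```javascript
// Hypothetical header used to pass the original extension to the origin response function.
const ORIGINAL_EXT_HEADER = 'x-original-ext';

// Illustrative minimum major versions; verify against current browser support data.
// Order matters: legacy Edge also advertises a Chrome token, so it is checked first.
// Opera, Samsung Internet and Chromium-based Edge carry a Chrome token and fall under the last rule.
const WEBP_RULES = [
  { pattern: /Edge\/(\d+)/, min: 18 },
  { pattern: /Firefox\/(\d+)/, min: 65 },
  { pattern: /Chrome\/(\d+)/, min: 32 },
];

const IMAGE_EXT = /\.(png|jpe?g)$/i;

// Returns true when the first matching rule reports a new enough version.
function supportsWebp(userAgent) {
  for (const { pattern, min } of WEBP_RULES) {
    const match = pattern.exec(userAgent);
    if (match) return Number(match[1]) >= min;
  }
  return false;
}

// Rewrites the uri to .webp and records the original extension in a header.
function rewriteForWebp(request) {
  const uaHeader = request.headers['user-agent'];
  const userAgent = uaHeader && uaHeader[0] ? uaHeader[0].value : '';
  const match = IMAGE_EXT.exec(request.uri);

  if (match && supportsWebp(userAgent)) {
    request.uri = request.uri.replace(IMAGE_EXT, '.webp');
    request.headers[ORIGINAL_EXT_HEADER] = [{ key: ORIGINAL_EXT_HEADER, value: match[1] }];
  }
  return request;
}

// Lambda@Edge entry point: returning the request lets CDN processing continue.
exports.handler = async (event) => rewriteForWebp(event.Records[0].cf.request);
```

Keeping the rewrite in a plain function makes it easy to exercise locally with a hand-built request object before deploying.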
The code: Origin Response
The origin response function does all the heavy lifting. If the response status is a 404, it fetches request headers to determine the original file extension. It then replaces the webp extension in the request uri with the original file extension and queries S3 with the new uri (s3Key).
If it finds the file in S3, it then converts the image to WebP using Sharp, puts it in the S3 bucket, and places it in the response body as a base64 image. It finally sets the Content-Type header to image/webp. If it fails to find the image in the S3 bucket, it sets the Content-Type header to image/webp and leaves the response as a 404.
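The flow described above could be sketched as follows. This is an illustration under several assumptions rather than the original function: the bucket name and the `x-original-ext` header are hypothetical, the AWS SDK v2 (`aws-sdk`) is assumed to be available, and on a miss the sketch simply returns the 404 untouched. S3 and Sharp are wrapped in a small `deps` object and loaded lazily so the logic can be run locally with fakes.

```javascript
const BUCKET = 'my-image-bucket'; // hypothetical bucket name
const ORIGINAL_EXT_HEADER = 'x-original-ext'; // hypothetical; must match the viewer request function
const ALLOWED_EXT = /^(png|jpe?g)$/i;

// Real dependencies, loaded lazily (assumes the aws-sdk v2 and sharp packages are bundled).
function defaultDeps() {
  const AWS = require('aws-sdk');
  const sharp = require('sharp');
  const s3 = new AWS.S3();
  return {
    getObject: (key) => s3.getObject({ Bucket: BUCKET, Key: key }).promise().then((r) => r.Body),
    putObject: (key, body) =>
      s3.putObject({ Bucket: BUCKET, Key: key, Body: body, ContentType: 'image/webp' }).promise(),
    toWebp: (buffer) => sharp(buffer).webp().toBuffer(),
  };
}

async function handleOriginResponse(event, deps) {
  const { request, response } = event.Records[0].cf;

  // Only act on missing .webp files.
  if (response.status !== '404' || !request.uri.endsWith('.webp')) return response;

  const extHeader = request.headers[ORIGINAL_EXT_HEADER];
  const ext = extHeader && extHeader[0] ? extHeader[0].value : '';
  if (!ALLOWED_EXT.test(ext)) return response;

  // S3 keys have no leading slash.
  const webpKey = request.uri.slice(1);
  const s3Key = webpKey.replace(/\.webp$/, `.${ext}`);

  let original;
  try {
    original = await deps.getObject(s3Key);
  } catch (err) {
    return response; // no original either: leave the 404 as it is
  }

  const webp = await deps.toWebp(original);
  await deps.putObject(webpKey, webp); // store it so later requests hit S3 directly

  response.status = '200';
  response.statusDescription = 'OK';
  response.body = webp.toString('base64');
  response.bodyEncoding = 'base64';
  response.headers['content-type'] = [{ key: 'Content-Type', value: 'image/webp' }];
  return response;
}

// Lambda@Edge entry point; dependencies are created once per container.
let cachedDeps;
exports.handler = (event) => handleOriginResponse(event, cachedDeps || (cachedDeps = defaultDeps()));
```

Lambda@Edge limits the size of responses generated by a function, so very large images may need a different path than returning the converted bytes inline.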
That’s it!
Gotchas
- If you’re deploying using the Serverless Application Model (I’ve attached a conjoined template in the appendix), make sure you use 2 separate projects for your viewer request and origin response functions. AWS won’t let you deploy viewer request functions larger than 1 MB, and installing Sharp will make your zip exceed this.
- You need to allow edgelambda.amazonaws.com to assume your functions’ execution role, in addition to lambda.amazonaws.com.
- CloudFront triggers for Lambda@Edge are only available in us-east-1. Make sure your Lambdas are deployed in that specific region.
- CloudWatch logs for your Lambdas won’t necessarily be in the us-east-1 region; instead, they’ll be in the region closest to where the function ran (it’s a CDN, after all).
- If you’re on macOS, Sharp might not run if you install it locally and deploy it to AWS; it needs to be installed specifically for Linux. There are multiple ways to do this. Sharp recommends using a t2.micro instance and ssh’ing into it; I find this unnecessarily complex and difficult to maintain across teams. Instead, I use a Docker container running Linux to install all my npm packages and create a zip that I push using AWS SAM. I’ve attached it in the appendix.
Appendix
Conjoined SAM Template
Creating Functions ZIP from Docker Container
With the Makefile and Dockerfile in your root, run make all.