
Resizing images at Cameo using Lambda@Edge

Sep 2, 2020

By T.J. Tarazevits

Overview

At Cameo we strive to provide the best user experience while also leveraging modern engineering techniques that let us move fast and iterate. Every web developer has had to deal with optimizing images for the web, and supporting responsive screen sizes and mobile layouts has only amplified the challenge. We provide an in-house image resizing service for all engineers and designers to use. This lets us rapidly prototype and update our site layouts while still delivering optimized images to users, regardless of size. Originally built with AWS Lambda and S3, the service recently received an upgrade that leverages Amazon’s CloudFront CDN and Lambda@Edge to reduce our request volume by over 99.9% and cut image sizes by up to 50% by using WebP.

Problem

V1 of our image resizing service was built about two years ago on top of Amazon’s Lambda and S3 services. Talent profile images are used across the site in different sizes depending on the layout and screen. V1 used an API Gateway to trigger a lambda function. This function would look at query parameters in the image URL and do an S3 object lookup using the aws-sdk for Node.js. This system worked fine in the early days of Cameo. Each user would send a handful of requests, which would go through the lambda code and pull the resized image from S3. When someone changed the required dimensions, the first users to visit the site after the change would experience a delay while the lambda fetched the original image source, resized it, delivered it to the user, and stored it on S3 for future fetches. Having programmatic execution for every image request allows for a lot of flexibility.
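
For illustration, a V1-style handler roughly along these lines might look like the following sketch. The bucket name, key format, and response headers are hypothetical, not Cameo’s actual code.

```javascript
// Hypothetical sketch of a V1-style flow: API Gateway -> Lambda -> S3.
const AWS = require('aws-sdk');
const sharp = require('sharp');

const s3 = new AWS.S3();
const BUCKET = process.env.MEDIA_BUCKET; // illustrative bucket name via env var

exports.handler = async (event) => {
  const { key, width, height } = event.queryStringParameters;
  const resizedKey = `resized/${width}x${height}/${key}`;

  // Serve the cached resize if it already exists.
  try {
    const cached = await s3.getObject({ Bucket: BUCKET, Key: resizedKey }).promise();
    return respond(cached.Body);
  } catch (err) {
    // Missing key: fall through and generate the resize.
  }

  const original = await s3.getObject({ Bucket: BUCKET, Key: key }).promise();
  const resized = await sharp(original.Body)
    .resize(parseInt(width, 10), parseInt(height, 10))
    .toBuffer();

  // Store for future requests, then return the image to this requester.
  await s3.putObject({ Bucket: BUCKET, Key: resizedKey, Body: resized }).promise();
  return respond(resized);
};

function respond(buffer) {
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/jpeg' }, // content type simplified for the sketch
    body: buffer.toString('base64'),
    isBase64Encoded: true,
  };
}
```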

However, as Cameo has grown in scale and traffic volume, this approach began to show cracks. By default, Amazon gives each account 1,000 concurrent lambda executions, which means only 1,000 image requests could be handled at once. In the ’90s, engineers struggled against the C10K problem; we had imposed a C1K problem on ourselves. This manifested as image retrieval errors across clients during peak traffic. One naive solution is to add image retry support to the front-end client, which lets the client eventually render the image but doesn’t solve the volume problem.

A popular way to stay under the 1,000-concurrent-execution limit is to implement a dead letter queue in front of the lambdas handling requests. This keeps the number of in-flight requests from exceeding the limit and causing errors, but it introduces a new source of latency: during peak traffic, image requests stack up behind one another, and that latency grows proportionally with traffic. We needed higher throughput.

A common solution to handling high-throughput asset delivery is to use a Content Delivery Network. We already leveraged Cloudflare for static site assets and DDoS protection, but we didn’t use a CDN for these images.

Secondary requirements for this new service included the ability to deliver new image formats. Designers used JPEG and PNG images throughout the front-end. At the time, our product team hoped to leverage GIF images to add motion to our front-end experience, but this legacy format is very inefficient and could not be delivered to users at a reasonable speed and data size. WebP offered a solution, with lossy and lossless compression better than comparable JPEG and PNG formats, plus WebP animation as a potential format for motion images.

Lastly, we wanted to leverage Infrastructure as Code (IaC) and formalize how we create and deploy lambda services going forward. We have fully formed development and production environments for our server-side code, but we did not have equivalent development environments for our lambda code. Being able to test and tweak lambda changes alongside development code allows us to move fast and provide uninterrupted service to our users.

What is Lambda@Edge?

Lambda@Edge is a feature of Amazon’s CloudFront CDN that runs lambda functions in response to cache hits and misses. This lets us leverage the efficiency and geographic reach of Amazon’s CDN while still retaining the ability to execute code when we need it. Lambda@Edge offers four triggers that fire during the lifecycle of a cache access.

Viewer Request — This event is triggered when an HTTP request from a client hits a CloudFront Point of Presence. This allows us to modify the incoming request with our lambda code before CloudFront attempts to handle it.

Origin Request — If a cache miss occurs, the request is sent to the backing origin service. The origin service is the source of truth for content that should be cached. This could be an EC2 instance or container if you’re running a traditional web server, or an S3 bucket. This event can be used to transform the request into a new format the origin service can accept. For example, CloudFront can cache based on query params, but S3 does not support them when retrieving objects.

Origin Response — This event fires with the response returned by the origin service. Lambda code here can be used as a ‘second chance’ if the cache was missed and the origin service returned a non-200 status code. We use this to attempt to generate the resized image asset. It could also be used to reroute traffic to a 404 page or a backup service if the origin is non-responsive.

Viewer Response — This event fires before the response leaves the Point of Presence, whether it came from the cache or the origin. Lambda code here can modify the final response sent to the client.
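
As a rough illustration of what these triggers look like from the lambda’s side, here is a minimal handler skeleton showing the event shape CloudFront passes in; the header it adds is purely illustrative.

```javascript
// Minimal Lambda@Edge handler skeleton (not Cameo-specific code).
// Request triggers (viewer-request, origin-request) return a request object;
// response triggers (origin-response, viewer-response) return a response object.
exports.handler = async (event) => {
  const cf = event.Records[0].cf;

  if (cf.response) {
    // origin-response / viewer-response: the response is available to modify.
    const response = cf.response;
    response.headers['x-resized-by'] = [{ key: 'X-Resized-By', value: 'lambda-at-edge' }];
    return response;
  }

  // viewer-request / origin-request: modify or pass through the request.
  const request = cf.request;
  console.log('incoming uri:', request.uri, 'querystring:', request.querystring);
  return request;
};
```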

These four events give engineers a lot of power to combine lambdas and CloudFront into new services, but Lambda@Edge also brings with it several quirks, which we’ll get into when deploying the service using Serverless.

What is Serverless?

Serverless is a framework and CLI tool to enable Infrastructure as Code across cloud providers. It supports Amazon’s AWS, Google’s GCP, Microsoft Azure and others. Cloud resources like S3 buckets, CloudFront distributions, lambda functions, and more can all be pre-defined in source-controlled files and deployed and rolled back together as a service, rather than manually configured using web tools.

To define a service, start by creating a serverless.yml file. This defines names, runtime variables, and the cloud resources your service needs. We used Silvermine’s `serverless-plugin-cloudfront-lambda-edge` to extend Serverless with support for CloudFront resources and Lambda@Edge events.
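
A trimmed-down sketch of what such a file can look like with the plugin wired in follows; the runtime, handler paths, and exact plugin options here are assumptions based on the plugin’s general usage, not Cameo’s actual configuration.

```yaml
# Illustrative serverless.yml outline, not Cameo's real file.
service: image-resizer

provider:
  name: aws
  runtime: nodejs12.x
  region: us-east-1          # Lambda@Edge functions must be deployed in us-east-1
  stage: ${opt:stage, 'dev'} # -dev and -prod verticals come from this stage flag

plugins:
  - serverless-plugin-cloudfront-lambda-edge

functions:
  transformImageRequest:
    handler: handler.transformImageRequest
    lambdaAtEdge:
      distribution: 'ImageDistribution'
      eventType: 'origin-request'
  resizeImage:
    handler: handler.resizeImage
    memorySize: 1024
    lambdaAtEdge:
      distribution: 'ImageDistribution'
      eventType: 'origin-response'
```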

Under the hood, Serverless leverages AWS CloudFormation templates: bundles of cloud resources controlled through a single file. You can use CloudFormation templates on your own, and AWS even provides a drag-and-drop visual editor for linking cloud resources together. Serverless handles all of this for us automatically.

Image Resizing Infrastructure

Armed with Serverless and Lambda@Edge, our replacement image resizer was ready to take shape. Our Serverless template explicitly defines 6 resources.

First, we define an S3 bucket to store our media files. Here we can use Serverless configuration variables to create a -dev and a -prod bucket. This lets us spin up development verticals of our resizer that operate identically to production, all with a single CLI command.
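
A minimal sketch of that kind of stage-suffixed bucket resource, with a placeholder bucket name:

```yaml
# Illustrative bucket resource; the real bucket name and settings differ.
resources:
  Resources:
    MediaBucket:
      Type: AWS::S3::Bucket
      Properties:
        # ${self:provider.stage} resolves to 'dev' or 'prod', giving one bucket per environment
        BucketName: cameo-media-${self:provider.stage}
```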

Next, we create a CloudFront distribution, here called ImageDistribution. This is the combination of settings and configuration for CloudFront’s cache. Serverless links the Lambda@Edge events to this distribution, and when the distribution is deployed across the globe, AWS also copies and deploys our lambda code alongside it. When defining the distribution, we pass it a reference to our S3 bucket so Serverless uses the bucket as the origin for the CloudFront cache. In the configuration, `ForwardedValues` should have `QueryString: true` and `Headers: [Accept]` set. This enables caching resized images based on their query parameters; otherwise CloudFront ignores them and only caches the first size of each image. The `Accept` header lets the lambda function see which image formats the requesting client supports, so we can send back a WebP image on supported platforms.
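
The relevant cache-behavior settings might look roughly like this; the origin id is a placeholder and the other required distribution properties (Origins and so on) are omitted for brevity.

```yaml
# Sketch of the cache behavior on the ImageDistribution resource.
ImageDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      DefaultCacheBehavior:
        TargetOriginId: MediaBucketOrigin   # placeholder id pointing at the S3 bucket origin
        ViewerProtocolPolicy: redirect-to-https
        ForwardedValues:
          QueryString: true   # cache each resize query string as its own object
          Headers:
            - Accept          # lets the resize lambda negotiate WebP per client
```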

With our asset storage enabled, it’s time to turn to our code. AWS uses IAM roles for access control to AWS resources, and our lambda code assumes an IAM role during execution, allowing it to talk to other AWS services. We need an IAM role that allows reading images from and storing them to our S3 buckets, as well as setting the read permission on these new objects (a sketch of such a policy statement follows the list below):

  • `s3:PutObject`
  • `s3:ListBucket`
  • `s3:PutObjectAcl`
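
A hypothetical policy statement granting those actions might look like the following; the bucket ARNs are placeholders, and `s3:GetObject` would also be needed wherever the lambda reads original images.

```yaml
# Hypothetical statement for the edge lambdas' execution role policy.
- Effect: Allow
  Action:
    - s3:PutObject
    - s3:ListBucket
    - s3:PutObjectAcl
  Resource:
    - arn:aws:s3:::cameo-media-*     # bucket ARN (for ListBucket)
    - arn:aws:s3:::cameo-media-*/*   # object ARNs (for PutObject / PutObjectAcl)
```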

We have two lambda functions.

ResizeImage is the lambda function that uses Sharp to process images into new sizes and formats. We define it in Serverless by specifying an entry point in our included JS file with the exported function name `handler.resizeImage`, and we use the Lambda@Edge extension to tie it to our `ImageDistribution` and the `origin-response` event.

TransformImageRequest handles `origin-request` events and simply converts query-string resize parameters into URL-encoded parameters that S3 can work with.
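
A sketch of what that transform could look like; the encoded key format used here (`WxH-qQ/<original path>`) is illustrative, since the post doesn’t spell out the exact scheme.

```javascript
// Hypothetical origin-request transform: query params -> S3-friendly object key.
const querystring = require('querystring');

exports.transformImageRequest = async (event) => {
  const request = event.Records[0].cf.request;
  const params = querystring.parse(request.querystring);

  if (!params.width || !params.height) {
    // No resize requested: let S3 serve the original object untouched.
    return request;
  }

  // Encode the resize parameters into the object key so S3 can look it up directly.
  const { width, height, quality = '80' } = params;
  request.uri = `/${width}x${height}-q${quality}${request.uri}`;
  request.querystring = '';

  return request;
};
```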

Resizing an Image at Cameo

With our infrastructure now defined as code, let’s walk through the lifecycle of an image request. A client makes an HTTP request to a CloudFront URL. The request is handled by the nearest Point of Presence (PoP). If the cache is hit, the request is fulfilled immediately. If the cache is missed, a request is made to the backing origin service, in this case our S3 bucket. This creates an `origin-request` event, which triggers TransformImageRequest. We inspect the request and look at its query parameters. S3 does not support query parameters as object keys, so if the request has them, we rewrite the URL to encode the width, height, and quality parameters into the object key. The request is then handled by S3.

The response from S3 creates an `origin-response` event, triggering our ResizeImage lambda. We inspect the response, and if S3 was able to find the original or an already-resized image, we pass it directly back to the client. Otherwise, we begin the resize process. First, we validate that the resize parameters are within our constraints; this lambda runs with 1,024 MB of memory, so generating very large resized images would lead to runtime crashes. We then pull the original asset key out of the encoded request URL, check whether that asset exists in S3, fetch it, and pass it to Sharp along with the resize parameters. If the resize is successful, we change the response status code to 200 and base64-encode the image as the response body. If we encounter an error anywhere in the process, we send a custom JSON error instead of the generic S3 missing-key error.
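
Putting that flow together, here is a hedged sketch of such an `origin-response` handler. The bucket name, key format, and size limits are illustrative rather than Cameo’s actual implementation.

```javascript
// Sketch of the origin-response resize flow described above.
const AWS = require('aws-sdk');
const sharp = require('sharp');

const s3 = new AWS.S3({ region: 'us-east-1' });
const BUCKET = 'cameo-media-prod'; // placeholder bucket name
const MAX_DIMENSION = 2000;        // guard against resizes that exceed lambda memory

exports.resizeImage = async (event) => {
  const { request, response } = event.Records[0].cf;

  // S3 found the original or an already-resized object: pass it straight through.
  if (parseInt(response.status, 10) === 200) {
    return response;
  }

  // Expect keys shaped like "/600x800-q80/talent/profile.jpg" (illustrative format).
  const match = request.uri.match(/^\/(\d+)x(\d+)-q(\d+)(\/.+)$/);
  if (!match) {
    return errorResponse(response, 'Unrecognized image request');
  }

  const [, width, height, quality, originalKey] = match;
  if (Number(width) > MAX_DIMENSION || Number(height) > MAX_DIMENSION) {
    return errorResponse(response, 'Requested size exceeds limits');
  }

  try {
    // Fetch the original, resize it, and prefer WebP when the client accepts it.
    const original = await s3.getObject({ Bucket: BUCKET, Key: originalKey.slice(1) }).promise();
    const acceptsWebp = (request.headers.accept || [])
      .some((h) => h.value.includes('image/webp'));
    const format = acceptsWebp ? 'webp' : 'jpeg';

    const resized = await sharp(original.Body)
      .resize(Number(width), Number(height))
      .toFormat(format, { quality: Number(quality) })
      .toBuffer();

    // Store the resize for future cache misses.
    await s3.putObject({
      Bucket: BUCKET,
      Key: request.uri.slice(1),
      Body: resized,
      ContentType: `image/${format}`,
      ACL: 'public-read',
    }).promise();

    // Return the generated image directly from the edge lambda.
    response.status = '200';
    response.statusDescription = 'OK';
    response.headers['content-type'] = [{ key: 'Content-Type', value: `image/${format}` }];
    response.body = resized.toString('base64');
    response.bodyEncoding = 'base64';
    return response;
  } catch (err) {
    return errorResponse(response, err.message);
  }
};

function errorResponse(response, message) {
  // Replace the generic S3 error with a custom JSON body.
  response.status = '404';
  response.headers['content-type'] = [{ key: 'Content-Type', value: 'application/json' }];
  response.body = JSON.stringify({ error: message });
  return response;
}
```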

Come Work With Us!

We are tackling a lot of interesting problems at Cameo and are seeking highly talented individuals. We’re a fully distributed team and offer a high degree of ownership over whatever project you’re working on. If you have access to stable WiFi and are interested in our product, let’s chat. View open positions here!
