Real-time Thumbnail Generation in a (mostly) Serverless AWS Architecture

Real-time thumbnail generation is a surprisingly helpful and complex task to do at scale. We implemented it by using AWS Lambda.

Published in THRON tech blog · 8 min read · Jun 16, 2020

two stickmen cropping and resizing images

On a typical web page, images account for about 45% of the total page weight (Source: https://httparchive.org/reports/page-weight), and most of the images we see on a web page are thumbnails. Thumbnails are preview images for interactive content such as videos and, more often, resized or cropped versions of an image used in place of the original. There are essentially two reasons to use a thumbnail instead of the original image:

Size (in pixels) too large for the page: the image has to be scaled down, and focusing on just a part of the image (cropping) can provide a better user experience;
Size (in bytes) too large: heavy images slow down page loading and make the page less responsive, worsening the user experience.

For the aforementioned reasons, the ideal thumbnail depends on the type of “viewport” (whether mobile or desktop) and the connection speed.

How we use “thumbnails” and what we needed

THRON is a digital asset management (DAM) platform; the idea behind it is to centralize all content in one place.

The typical workflow of a user is to upload the original content to the platform and then share it on the various channels they own (e.g. web sites, mobile apps, Facebook, Twitter).

However, each channel needs a different thumbnail to better adapt to the available space. The traditional solution consists of pre-calculating a defined set of thumbnails and picking one depending on the channel being used. This approach has two main shortcomings: it generates a large number of images that will never be used and, regardless of which image you choose, it will never be exactly what your customer requested. The ideal approach is to be able to customize the thumbnail, each time, for every need.

Content centralization is important

Customers do not wait for your solution to provide an answer to their needs: they will either abandon the product or twist its usage to match those needs. In the case of thumbnails, if the system cannot solve the customer's needs, the customer will usually have their designers generate the thumbnails locally, using tools such as Adobe Photoshop, and upload each thumbnail to the DAM. The price to pay, in this case, is that you end up with different digital assets representing the same content, which leads to worse data quality in analytics and content classification.

Thumbnail generation requirements

The approach we chose is to generate the thumbnail in real time, when the customer requests the content, because that is the only moment when you know what the desired target features are. Consider a dynamically resizable website: the thumbnail size changes with the viewport size and may change during the viewing session too (imagine rotating the smartphone). Requirements:

Flexibility. Provide enough features to quickly edit images (e.g. resize, cut, adjust brightness, format, etc.).
Speed. Generating the thumbnail must not reduce the page loading speed compared to pre-calculated thumbnails.
Efficiency. Creating the image anew on each request would be a waste of resources, so you have to implement one or more layers of cache, but without keeping thumbnails forever once they are no longer used.
Scalability. Traffic trends vary widely throughout the day and in the presence of massive events/imports. The service must always adapt without requiring human intervention.

State of the Art

When we developed the first version of the application we took a look at what the market offered, but the solutions were scarce and did not meet our requirements. Also, a proprietary solution would allow deep integration with our services, higher speed, and better security (data never leaves our AWS VPC, thanks to Gateway VPC endpoints).

Fun fact: a few years after the release of our service, AWS published a blog post addressing the problem of dynamically generated thumbnails. The architecture discussed in that article is similar to the one we have built over the years.

Real-time generation of thumbnails

When a customer uploads content, the system first converts the files to a 4K-resolution image. This image is saved in an S3 bucket and is used by our application as the source. This step is necessary to keep the input consistent: the files that come from customers may have very high (or very low) resolutions and uncommon formats (including vector formats). Having a consistent source allowed us to optimize for speed and performance.

The thumbnail generation service requires the image to be requested with additional parameters that describe the desired size and, optionally, the editing to be applied to the image to obtain the thumbnail.
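
As a minimal sketch of what such a request could look like, the snippet below parses size and editing parameters from a URL and derives a deterministic cache key from them. The URL layout, the parameter names (`w`, `h`, `mode`, `fmt`) and the key format are assumptions made for illustration, not THRON's actual API.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class ThumbnailRequest:
    content_id: str
    width: int
    height: int
    mode: str = "crop"   # hypothetical editing mode, e.g. "crop" or "product"
    fmt: str = "jpeg"    # hypothetical output format parameter


def parse_request(url: str) -> ThumbnailRequest:
    """Extract the (hypothetical) thumbnail parameters from a request URL."""
    parts = urlsplit(url)
    # Assumption: the last path segment identifies the content.
    content_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    query = parse_qs(parts.query)

    def first(name: str, default: str) -> str:
        values = query.get(name)
        return values[0] if values else default

    width = int(first("w", "0"))
    height = int(first("h", "0"))
    if width <= 0 or height <= 0:
        raise ValueError("w and h must be positive integers")
    return ThumbnailRequest(content_id, width, height,
                            first("mode", "crop"), first("fmt", "jpeg"))


def cache_key(req: ThumbnailRequest) -> str:
    """Same parameters always map to the same key, whatever their order in the URL."""
    return f"{req.content_id}/{req.width}x{req.height}_{req.mode}.{req.fmt}"
```

Normalizing the parameters before building the key matters for caching: two URLs that differ only in parameter order should resolve to the same stored thumbnail.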

THRON thumbnail generation flow chart — High-level flow-chart that shows how thumbnail generation cache is used (src: THRON)

As described in the flowchart, each request passes through a backend, which checks the thumbnail-reserved S3 bucket to see whether the resource has already been created; if so, it simply returns the cached copy. If the desired version of the thumbnail has never been created before, the backend downloads the source image from the originals S3 bucket, creates the thumbnail, returns it to the customer, and then uploads it to the thumbnail S3 bucket (favoring speed over consistency).
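
The flow above can be sketched roughly as follows. This is an illustration under stated assumptions: `InMemoryBucket` is a hypothetical stand-in for the two S3 buckets, `render` stands for whatever image-processing routine produces the thumbnail, and the upload happens inline rather than after the response has been sent.

```python
from typing import Callable, Dict, Optional


class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket, used only for illustration."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data


def serve_thumbnail(
    thumb_key: str,
    source_key: str,
    originals: InMemoryBucket,
    thumbnails: InMemoryBucket,
    render: Callable[[bytes], bytes],
) -> bytes:
    # 1. Cache hit: the thumbnail was generated by an earlier request.
    cached = thumbnails.get(thumb_key)
    if cached is not None:
        return cached

    # 2. Cache miss: fetch the normalized source image and render the thumbnail.
    source = originals.get(source_key)
    if source is None:
        raise KeyError(f"source image not found: {source_key}")
    thumbnail = render(source)

    # 3. Store the result for later requests. A real service would return the
    #    response first and upload afterwards; here both happen sequentially.
    thumbnails.put(thumb_key, thumbnail)
    return thumbnail
```

Because the upload is not coordinated with other requests, two concurrent misses for the same key may both render the thumbnail; the result is identical, so the duplicate work is the only cost.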

We opted to separate the two buckets so that we could set different permissions and configurations. The bucket of the originals is backed up and replicated to another region (which is unnecessary for thumbnails, since we can regenerate them at will). The thumbnail bucket has a lifecycle policy that erases thumbnails that have not been used for some time (LRU). AWS does not provide this policy out of the box, but we were able to achieve it using a lifecycle rule that erases older thumbnails (based on creation date) and a mechanism that “renews” the creation date of objects on S3. This mechanism is nothing more than copying the objects onto themselves: the content does not change, but the creation date does.
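
A minimal sketch of the “renewal” trick, assuming `s3` is a boto3 S3 client (`boto3.client("s3")`). S3 rejects a copy of an object onto itself unless something about the object changes, so the sketch asks S3 to replace the metadata with the values it already has. The 30-day expiration, the rule ID and the function name are assumptions for illustration; the article does not state the real values.

```python
# Lifecycle rule in the shape accepted by put_bucket_lifecycle_configuration.
# Assumption: 30 days is an illustrative value, not the one used in production.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "expire-unused-thumbnails",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},           # apply to the whole bucket
            "Expiration": {"Days": 30},
        }
    ]
}


def renew_thumbnail(s3, bucket: str, key: str) -> None:
    """Copy an object onto itself so that its creation date is reset."""
    head = s3.head_object(Bucket=bucket, Key=key)
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        # REPLACE makes the self-copy legal; the same metadata is re-sent
        # so that the object is otherwise unchanged.
        MetadataDirective="REPLACE",
        Metadata=head.get("Metadata", {}),
        ContentType=head.get("ContentType", "binary/octet-stream"),
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-thumbnails", LifecycleConfiguration=LIFECYCLE_CONFIGURATION)
#   renew_thumbnail(s3, "my-thumbnails", "abc123/500x500_crop.jpeg")
```

Renewing on every cache hit would double the S3 request count, so a real service would likely renew only objects that are close to expiring; that policy is left out of the sketch.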

The backend has evolved over the years: initially, it ran on EC2 instances balanced via an ELB, but the time it took to scale out was not optimal. We then moved to containers, which allowed us to considerably increase the scaling speed (from minutes for a new EC2 instance to seconds for a new container).

But the most significant step was the adoption of AWS Lambda, which allowed almost unlimited scaling at a lower cost. We still use a container, but it only runs a web server that returns the thumbnail from the S3 bucket; the creation of a new thumbnail is handled entirely on AWS Lambda. This has given us more consistent performance even under high load. The ECS-hosted component is still required because Lambda functions have a maximum response payload size. If the generated image is larger than this limit, the Lambda does not respond directly with the image: it first uploads it to S3 and then responds to the ECS component, which downloads the image from S3 and, finally, returns it to the customer.
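
The inline-or-pointer decision could look roughly like the sketch below. AWS documents a 6 MB cap on synchronous invocation payloads; the 4 MiB default threshold used here is an assumption that leaves room for base64 and JSON overhead, and the response fields (`inline`, `body`, `s3_key`) as well as the `upload`/`download` callables are hypothetical.

```python
import base64
from typing import Callable, Dict

# Assumption: stay well below the documented 6 MB synchronous payload limit,
# because base64 encoding inflates the image by roughly one third.
MAX_INLINE_BYTES = 4 * 1024 * 1024


def build_lambda_response(
    image: bytes,
    key: str,
    upload: Callable[[str, bytes], None],
    max_inline_bytes: int = MAX_INLINE_BYTES,
) -> Dict[str, object]:
    """Lambda side: return the image inline if it fits, otherwise a pointer."""
    if len(image) <= max_inline_bytes:
        return {"inline": True, "body": base64.b64encode(image).decode("ascii")}
    upload(key, image)  # too large: store it on S3 first
    return {"inline": False, "s3_key": key}


def resolve_response(
    response: Dict[str, object],
    download: Callable[[str], bytes],
) -> bytes:
    """Container side: decode the inline image or fetch it from S3."""
    if response["inline"]:
        return base64.b64decode(str(response["body"]))
    return download(str(response["s3_key"]))
```

Either way the caller ends up with the same bytes, so the web server can treat both paths uniformly once `resolve_response` returns.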

We’ve also evaluated the switch to API Gateway, but the cost compared to autoscaling containers is much higher, and the problem of images exceeding the lambda payload would remain.

How dynamic thumbnail generation is helping customers

The most common use case is inserting an image into an <img> or <div> tag. It is important to note that the person who builds the page is usually not the one who prepares the images, and sometimes the size and aspect ratio of the destination area are known only in the later stages of development, or even after it. This creates the need to manually crop and resize the image. You should also consider that most of the time the source images are not designed with the website as the target platform: they might follow photographic composition rules, so in a rectangular frame the subject is rarely in the center. This becomes a problem when you need to embed a square image: you cannot simply crop the source, you need to crop it off-center to ensure that the subject stays in the center. We fully automated this by detecting the subject and keeping it at the center of the frame. Check this source image and note that the subject is on the left side of the frame; the following image is a square crop with automatic detection of the subject, generated by our system:

Please note the left and right margins. Centering a subject is not just placing the same space around it. Link
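
To illustrate the geometry only, here is a minimal sketch of an off-center crop. It assumes the subject detector has already returned the subject's center point; the detection itself is not shown, and the function name and parameters are hypothetical. As the caption notes, real centering is more subtle than equal spacing, so this is a simplification rather than the algorithm used by the service.

```python
from typing import Tuple


def subject_centered_crop(
    img_w: int, img_h: int,
    subject_cx: int, subject_cy: int,
    target_w: int, target_h: int,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest crop with the target
    aspect ratio, centered on the subject as far as the image bounds allow."""
    # Largest box with the target aspect ratio that fits inside the image.
    if img_w * target_h > img_h * target_w:
        # Image is wider than the target: keep full height, trim the width.
        crop_h = img_h
        crop_w = img_h * target_w // target_h
    else:
        # Image is taller than (or equal to) the target: keep full width.
        crop_w = img_w
        crop_h = img_w * target_h // target_w

    # Center the box on the subject, then clamp it inside the image.
    left = min(max(subject_cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(subject_cy - crop_h // 2, 0), img_h - crop_h)
    return (left, top, left + crop_w, top + crop_h)
```

For a 1600x900 source with the subject near the left edge, the square crop window slides left until it hits the image border instead of cutting the subject in half.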

Another typical use case involves product images. They usually come with a uniform background, which is meant to be used as a passe-partout: shrunk or extended to fill the frame without cropping the product at all. This is what you do when you want to place your product image on an e-commerce website. It is also something usually performed by human operators, with a huge waste of time and talent. We automated this too, by detecting the background and adapting the cropping rules to this scenario. Consider this source image; the image below is the result of the “product mode” on a square destination of 500x500 px:

Product images usually come at very high resolutions, but when rendered on mobile this causes a waste. Link
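
The layout side of such a mode can be sketched as a fit-without-cropping computation: scale the image so that it fits entirely inside the destination, then center it and fill the remaining area with the detected background color. The function below is an illustrative assumption, not the service's implementation; background detection and the actual pixel operations are left out.

```python
from typing import Dict, Tuple


def fit_with_padding(
    src_w: int, src_h: int, dst_w: int, dst_h: int
) -> Dict[str, Tuple[int, int]]:
    """Compute the scaled size and the top-left offset needed to place the
    whole source image inside the destination frame without cropping."""
    # Use the smaller ratio so that both dimensions fit in the frame.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))

    # Whatever space is left becomes background, split evenly on both sides.
    offset_x = (dst_w - new_w) // 2
    offset_y = (dst_h - new_h) // 2
    return {"size": (new_w, new_h), "offset": (offset_x, offset_y)}
```

For example, a 2000x1000 source placed in a 500x500 frame is scaled to 500x250 and shifted down by 125 px, leaving two background bands above and below the product.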

This last example also shows a further optimization: the source image is in PNG format, with lossless compression. This might be too much for some contexts, especially if you are just rendering a small thumbnail on a search result page. Transcoding the image to JPEG, in addition to cropping and resizing, makes sure that the resulting file uses as many bytes as needed but not one more than that. In the previous example, the 500x500 px PNG image would weigh 421 KB, while the JPEG weighs only 72.1 KB (about 83% less).

Next steps

This service has been operating for more than 5 years now, and its development has never stopped. We are working on improving performance, especially for very big source images. We are thinking about strategies to remove the ECS component and move to an entirely serverless architecture, and we plan to deepen the integration with CDN providers to automatically detect the best encoding format (e.g. WebP for Chrome users).
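
One common way to detect format support is the HTTP `Accept` header, which browsers that can decode WebP typically include `image/webp` in. The sketch below shows that idea only; it is an assumption about how such detection could work, not a description of the planned CDN integration, and the function name and defaults are hypothetical.

```python
from typing import Dict, Sequence


def pick_format(
    accept_header: str,
    candidates: Sequence[str] = ("image/webp",),
    fallback: str = "image/jpeg",
) -> str:
    """Return the first candidate format the client explicitly accepts.

    Wildcards such as image/* or */* are ignored on purpose: they do not
    prove that the client can actually decode a newer format.
    """
    explicit: Dict[str, float] = {}
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # malformed q-value: treat as not accepted
        if media_type:
            explicit[media_type.lower()] = quality

    for candidate in candidates:
        if explicit.get(candidate, 0.0) > 0.0:
            return candidate
    return fallback
```

If the output format depends on the request headers, any cache in front of the service has to take the format into account as well, for instance by including it in the cache key.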

If you are interested in working on these topics, check our hiring page.