Implementing an image processing service using imgproxy

Tufan Karadere · Trendyol Tech · Jun 4, 2024

Disclaimer: With so much social media content actually being advertisements paid for by the vendors of the products they feature, I feel obliged to clarify that this article is not one of them. It is nothing more than one of the many success stories within Trendyol, built on great open source software, and certainly nothing less.

At Trendyol, we use a MultiCDN architecture to deliver content to our users in a fast, secure and highly available way. This means we manage our own CDN infrastructure in addition to working with 3rd party CDNs. For more information about the general architecture of Trendyol CDN, or Girdap as we call it internally, along with several optimizations aimed at increasing overall efficiency, refer to two informative articles by Levent CENGIZ: Trendyol CDN — 1 and Trendyol CDN — 2. And if you wish to dive deeper into one of these optimizations, be sure to check Utilizing Intel® QuickAssist Technology to Enhance TLS Performance in Trendyol CDN.

This article aims to provide detailed information about how we implemented our image processing service (Vakum, as we call it internally) and why.

General Information

Providing a reliable e-commerce platform is one of Trendyol’s primary focus areas. It would be greatly challenging, if not impossible, to run an e-commerce platform without any images. Visual data is crucial for conveying information about products. Not only does it ease the browsing process for a customer, but it also plays a key role in the decision to purchase a product. It is probably safe to say that very few people would buy a product without seeing it first, either physically or through an online image. For example, here is what happens to Trendyol’s website dedicated to the Netherlands when we purposefully reduce hardware resources for image processing in a test environment:

Would you shop from an e-commerce platform with lots of images failing to load?

Product images failing to load might be a decisive factor leading to a change of heart about purchasing a product, whereas images failing to load in various parts of the platform, or simply slow-loading images, might be a decisive factor in browsing away from the platform altogether.

As a result, we need to ensure that all images on the platform load not only correctly but also quickly.

Challenge

Practically every product displayed on an e-commerce platform needs at least one image associated with it. In real-world scenarios, however, there are typically 5-10 images per product. Considering different clients, such as desktops, laptops or mobile phones, which have varying resolution capabilities, and the different parts of a page requiring different-sized images, maintaining various resolutions of the same image becomes inevitable. Here is an example product page:

5 color options, 7-10 images per color option; this product has around 40 images and a couple of videos associated with it

As a result, with an e-commerce platform such as Trendyol’s, where the number of products is not in mere millions but in hundreds of millions, the total number of images easily reaches several billions.

With billions of images,

  • Resizing the images
  • Sending the images over the network in an efficient format

become two of the numerous challenges in a CDN context.

It is impractical to resize all of the images to the needed resolutions and store them on disk alongside the originals whenever a new resolution is required. Similarly, it is unfeasible to convert all of the images into the necessary format whenever a new, more efficient image compression algorithm appears.

Solution

One of the viable solutions is processing the images online: resizing them or converting them into a different format on-demand and on-the-fly. This way, out of billions of images, only the ones actually requested get processed and stored in cache until they are no longer used. New resolutions can easily be added when needed, as can new image formats, such as AVIF or WebP, which aim to provide more efficient image compression.

For on-the-fly image processing, Trendyol previously relied on third-party image processing services. In the early phases of Trendyol CDN, the edge nodes depended on these third-party services to process images. This posed several issues from a MultiCDN architecture perspective:

  • Service costs, as it is often provided as a paid service by third parties,
  • Limitation in capabilities and performance of the service,
  • Lack of control in scaling the service,
  • Dependence of our edge nodes on third-party services in order to operate normally.

Designing an image processing service for our own use, over which we have full control, sounded like a reasonable solution to address all of these issues.

Architecture

Fortunately, there are a couple of straightforward ways to implement an image processing service. There already exist several image processing libraries, including but not limited to libvips, OpenCV, ImageMagick, GraphicsMagick and Pillow. In addition, there are numerous open source image processing servers built on these libraries, such as imgproxy, thumbor, imaginary, sharp or imageserver.

Considering the variety of libraries and of servers using them, an in-depth comparison is beyond the scope of this article. In our research, we found various articles and benchmarks showing libvips particularly standing out in terms of speed and scalability, both of which we considered key requirements for such a service. An example article comparing ImageMagick and libvips performance can be found in “Boosting image processing performance, from ImageMagick to Libvips”. The libvips wiki also contains the results of various benchmarks comparing speed and memory use across image processing systems.

Libvips comes with a variety of bindings, which also makes it an option to develop an in-house image processing product and integrate it into existing code. I personally thought the Lua bindings sounded interesting in the context of caching servers and wrote a small proof-of-concept in Lua to run with the OpenResty lua-nginx-module. It was surprisingly easy and quite fun, though naturally nowhere near production-ready (I might be able to share it in a later article).

Thus, with various open source solutions already available, we decided that implementing our service would be much faster by building on a production-ready solution. This time, imgproxy, powered by the libvips library under the hood, stood out among its alternatives in terms of speed, scalability and maturity (you can find their benchmark results in Image processing servers benchmark).

imgproxy has a cool feature set, the complete list of which can be found on the imgproxy features page. Although some features require a Pro license, the open source version still has enough functionality to meet our initial requirements:

  • Simplicity in processing an image and the ability to specify processing options in the URL (see the example request after this list).
  • Support for conversion to a variety of formats, including but not limited to JPEG, PNG, GIF, AVIF and WebP.
  • Resizing the images on the fly.
  • Sharpening the images.
  • The ability to specify image quality (when supported by the image format, such as JPEG, AVIF and WebP).
  • The ability to get source images over an HTTPS connection.
  • Authorization.
  • Source image file size and resolution limits.
  • Image source restriction.
  • Prometheus metrics.
  • Support for ETag and Last-Modified headers.
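
As an illustration of the first item, every processing option lives in the request URL. Here is a minimal sketch, assuming an unsigned endpoint (the literal `insecure` segment stands in for the URL signature when signing is disabled); the hostnames and paths are placeholders:

```bash
# Resize to fit within 610x300, sharpen slightly, set quality to 80,
# and convert to WebP. "insecure" replaces the URL signature on an
# unsigned endpoint; hostnames and paths are placeholders.
curl -o banner.webp \
  "http://imgproxy.example.com:8080/insecure/rs:fit:610:300/sh:0.5/q:80/plain/https://origin.example.com/path/to/image.jpg@webp"
```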

Infrastructure

Integrating an image processing service based on imgproxy into the existing infrastructure is not rocket science, thanks to its simplicity. All that needs to be done is deploying it on one or more servers, whether on bare metal, a VM, the cloud or a container ecosystem, depending on your choice of infrastructure, so that it can respond to HTTP requests sent by other nodes. It supports a variety of image sources, such as an HTTP(S) service, public cloud storage services, and local files.
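
As a sketch of such a deployment: imgproxy is configured entirely through environment variables. The container image is the official one; the port, source restriction and limits below are illustrative, not our production values:

```bash
# Run imgproxy in a container; all configuration is via environment
# variables. IMGPROXY_MAX_SRC_RESOLUTION is in megapixels; the limit
# and allowed source below are illustrative, not production values.
docker run -d --name imgproxy -p 8080:8080 \
  -e IMGPROXY_MAX_SRC_RESOLUTION=50 \
  -e IMGPROXY_ALLOWED_SOURCES="https://origin.example.com/" \
  -e IMGPROXY_PROMETHEUS_BIND=":8081" \
  ghcr.io/imgproxy/imgproxy:latest
```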

Therefore, integrating such a service into the existing infrastructure is simple enough: image processing servers serve the origin shield nodes, which in turn serve the edge caching nodes:

Vakum = origin shield + image processing nodes

So, when a client requests content from our CDN, one of the edge nodes is responsible for responding to the client. It checks whether the content is already in its cache; if it is, the content is returned immediately. If not, the edge node requests it from the origin shield nodes. Similarly, the origin shield nodes check their cache and, if the content is present, respond immediately. If not, they request it from either the origin or the image processing servers, depending on the content type. The image processing nodes return the content based on their configuration, which might involve a format conversion, resizing, or applying filters. The processed content is then cached by the origin shield nodes and returned to the edge nodes as a response. The two cache layers (in the origin shield nodes and in the edge nodes) help reduce the load on the origin as well as on the image processing service.

It is evident from this setup that all components need to be scalable to prevent single points of failure or bottlenecks. Using nginx or tengine for caching can provide scalability for the edge and origin shield nodes, while load balancers combined with a cluster architecture can provide scalability for the origin service. It is therefore equally crucial to choose image processing software that is not only fast but also scalable, to prevent bottlenecks in the infrastructure.

Performance

There is no doubt that it is possible to implement a horizontally scalable image processing service using imgproxy. However, for better capacity planning, it is also important to understand its vertical scaling capabilities. The libvips library is not GPU-accelerated; all image operations use the CPU. Considering the use case of processing images on demand, where they are requested individually (as opposed to batch processing), the overhead of transferring a single image to and from the GPU would limit the speed gain considerably. According to John Cupitt’s estimation on the “How libvips can use NVIDIA GPU on AWS g2 series VM/machines” question, the gain would likely be less than 20%. While this might still sound like a significant gain in a high-traffic environment such as Trendyol’s, the fact that we achieve a 98% cache hit ratio on edge nodes, combined with the imgproxy benchmark results below, led us to defer detailed research into GPU acceleration to future iterations.

We have performed a series of performance tests to determine the optimal IMGPROXY_WORKERS (IMGPROXY_CONCURRENCY in earlier versions) setting to maximize performance of a single imgproxy instance.

When testing imgproxy’s performance, the performance of the image source (i.e. how fast the source files can be downloaded from the origin to imgproxy) also plays an important role, resulting in a different IMGPROXY_WORKERS value beyond which no performance increase was observed. After I shared the results of the first test setup below with Sergei Aleksandrovich, the author of imgproxy, he was extremely helpful in providing insights into how imgproxy works:

> There’s usually no sense in processing more images simultaneously than CPU cores available. That’s why we usually recommend starting with 2 workers per CPU core: virtually, one worker processes an image and the other worker downloads an image. So, the slower the networking between imgproxy and the image storage, the more workers you may need to fully load your CPU. A need for 4 workers per CPU core usually means pretty slow networking. When images are stored locally, even 2 workers per CPU don’t give a better performance than a single worker per CPU.

With this information in mind, I repeated the tests by using the local filesystem as the image source. The results are below.

First test setup: Remote origin as the image source

The first test setup involves two nodes on AWS in the same Availability Zone. One node (girdap_node) is responsible for sending the requests, while the other node (vakum_node) processes the image (a combination of resizing and/or format conversion to WebP) after requesting it over HTTPS from the origin. Both nodes are c7i.xlarge instance types, with 4 vCPUs and 8GiB of RAM. The ab client from apache2-utils on Ubuntu 22.04 was used on girdap_node to send the requests to the imgproxy port on vakum_node with the following command:

```bash
ab -c 64 -n 5000 http://vakum_node:imgproxy_port/path/to/image.jpg
```

where image.jpg is a 1.1MB, 1220x600 PNG banner image. ab was used with a concurrency of 64, and a total number of 5000 requests.
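
For reproducibility, here is a sketch of the benchmark loop. It collapses both roles onto one host for brevity (in the actual setup, ab ran on girdap_node against imgproxy on vakum_node); the port and image path are placeholders:

```bash
# Sweep IMGPROXY_WORKERS and benchmark each value with ab.
# For brevity, imgproxy and ab run on the same host here; the port
# and image path are placeholders.
for workers in 4 8 16 32; do
  IMGPROXY_WORKERS="${workers}" imgproxy >"/tmp/imgproxy-${workers}.log" 2>&1 &
  pid=$!
  sleep 2                                   # let the server bind its port
  echo "== IMGPROXY_WORKERS=${workers} =="
  ab -c 64 -n 5000 "http://localhost:8080/path/to/image.jpg"
  kill "${pid}"
done
```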

The origin is a test server outside of AWS, with the following TTFB values measured from vakum_node:

| Run | DNS lookup (s) | TLS handshake (s) | TTFB incl. connection (s) | TTFB (s) | Total time (s) |
| :-: | :------------: | :---------------: | :-----------------------: | :------: | :------------: |
| 1 | 0.047031 | 0.091181 | 0.188512 | 0.097331 | 0.397552 |
| 2 | 0.000622 | 0.045828 | 0.132497 | 0.086669 | 0.341440 |
| 3 | 0.000745 | 0.048010 | 0.163474 | 0.115464 | 0.435110 |
| 4 | 0.000659 | 0.044964 | 0.258268 | 0.213304 | 0.690279 |
| 5 | 0.000805 | 0.045762 | 0.134070 | 0.088308 | 0.369743 |

TTFB: fastest 0.086669 s, slowest 0.213304 s, median 0.097331 s.
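
The exact timing tool is not specified here; as a sketch, curl can produce comparable numbers (its `time_appconnect` covers the TLS handshake and `time_starttransfer` corresponds to the TTFB including connection). The URL is a placeholder:

```bash
# Measure origin timings like the table above; the original tool is
# unspecified, and the URL is a placeholder.
curl -s -o /dev/null \
  -w 'DNS: %{time_namelookup}s  TLS: %{time_appconnect}s  TTFB incl. conn: %{time_starttransfer}s  Total: %{time_total}s\n' \
  "https://origin.example.com/path/to/image.jpg"
```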

These are the results:

WebP conversion only:

| IMGPROXY_WORKERS | Total time (s) | Requests/s | 95% (ms) | 99% (ms) | 100% (ms) | Max avg. time per req. < (ms) |
| :--------------: | :------------: | :--------: | :------: | :------: | :-------: | :---------------------------: |
| 4 | 369.797 | 13.52 | 5061 | 5165 | 5366 | 83.84 |
| 8 | 174.619 | 28.63 | 2461 | 2605 | 3101 | 48.45 |
| 16 | 158.634 | 31.52 | 2259 | 2369 | 2574 | 40.22 |
| 32 | 158.969 | 31.45 | 2281 | 2365 | 2716 | 42.43 |

Resizing to 610x300 and WebP conversion:

| IMGPROXY_WORKERS | Total time (s) | Requests/s | 95% (ms) | 99% (ms) | 100% (ms) | Max avg. time per req. < (ms) |
| :--------------: | :------------: | :--------: | :------: | :------: | :-------: | :---------------------------: |
| 4 | 235.601 | 21.22 | 3264 | 3389 | 3636 | 56.81 |
| 8 | 139.890 | 35.74 | 1984 | 2084 | 2639 | 41.23 |
| 16 | 115.405 | 43.33 | 1693 | 1771 | 2036 | 31.81 |
| 32 | 115.054 | 43.46 | 1668 | 1772 | 2003 | 31.29 |

First of all, these numbers are already quite impressive. It is possible to convert 64 images to WebP simultaneously in under 2.5 s (the longest time in the 100% column with IMGPROXY_WORKERS set to 16). Converting after resizing takes even less time, with the longest run at about 2 seconds. Although a resize operation is added, the reduced image size apparently shortens the WebP conversion time.

Second, these results show that performance increases with the number of IMGPROXY_WORKERS, up to 4 times the number of vCPUs. Beyond this point there is no further gain, and the numbers stay very close to each other, slightly higher or lower.

How about adding some more vCPUs? If we change the instance type of the vakum_node, where imgproxy is running, to c7i.2xlarge so that it has 8 vCPUs and 16GiB of RAM:

WebP conversion only:

| IMGPROXY_WORKERS | Total time (s) | Requests/s | 95% (ms) | 99% (ms) | 100% (ms) | Max avg. time per req. < (ms) |
| :--------------: | :------------: | :--------: | :------: | :------: | :-------: | :---------------------------: |
| 16 | 120.512 | 41.49 | 1646 | 1722 | 1863 | 29.10 |
| 32 | 76.871 | 65.04 | 1182 | 1288 | 1642 | 25.65 |

Resizing to 610x300 and WebP conversion:

| IMGPROXY_WORKERS | Total time (s) | Requests/s | 95% (ms) | 99% (ms) | 100% (ms) | Max avg. time per req. < (ms) |
| :--------------: | :------------: | :--------: | :------: | :------: | :-------: | :---------------------------: |
| 16 | 112.381 | 44.49 | 1528 | 1623 | 1961 | 30.64 |
| 32 | 55.864 | 89.50 | 909 | 1014 | 1318 | 20.59 |

Note that when IMGPROXY_WORKERS is kept at 16, there is still a performance gain with 8 vCPUs compared to 4 vCPUs for the full-size image, but almost none after resizing. This is one of the hints that the gain might come not from the image operation itself, but from downloading the images faster.

The real boost, however, comes when we set it to 32, i.e. 4 x the number of vCPUs, beyond which there is again no performance gain.

Second test setup: Local file as the image source

As mentioned above, after sharing the results with Sergei, I repeated the tests using the same image as the source, but served from the local disk of vakum_node (by defining IMGPROXY_LOCAL_FILESYSTEM_ROOT in the imgproxy configuration). Here are the results for WebP conversion of the original image without resizing:

| IMGPROXY_WORKERS | Total time (s) | Requests/s | 95% (ms) | 99% (ms) | 100% (ms) | Max avg. time per req. < (ms) |
| :--------------: | :------------: | :--------: | :------: | :------: | :-------: | :---------------------------: |
| 4 | 117.393 | 42.59 | 1544 | 1560 | 1590 | 24.84 |
| 8 | 75.002 | 66.66 | 985 | 997 | 1028 | 16.06 |
| 16 | 74.907 | 66.75 | 982 | 1112 | 1196 | 18.69 |
| 32 | 74.858 | 66.79 | 1087 | 1147 | 1271 | 19.86 |

This confirms what Sergei explained: setting IMGPROXY_WORKERS beyond the number of vCPUs does not result in a performance gain.
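
For reference, here is a minimal sketch of the local-filesystem setup used in this kind of test; the root directory and image path are placeholders:

```bash
# Serve source images from the local disk instead of a remote origin.
# The root directory and image path are placeholders.
export IMGPROXY_LOCAL_FILESYSTEM_ROOT=/var/lib/images
imgproxy &

# Local sources are addressed with the local:// scheme:
ab -c 64 -n 5000 "http://localhost:8080/insecure/plain/local:///path/to/image.jpg@webp"
```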

So, how about downloading the file from localhost? The following results were obtained by downloading the image from an nginx instance running on vakum_node:

| IMGPROXY_WORKERS | Total time (s) | Requests/s | 95% (ms) | 99% (ms) | 100% (ms) | Max avg. time per req. < (ms) |
| :--------------: | :------------: | :--------: | :------: | :------: | :-------: | :---------------------------: |
| 4 | 124.697 | 40.10 | 1641 | 1663 | 1692 | 26.44 |
| 8 | 79.616 | 62.80 | 1048 | 1062 | 1087 | 16.98 |
| 16 | 80.361 | 62.30 | 1143 | 1193 | 1372 | 21.44 |
| 32 | 79.720 | 62.72 | 1348 | 1453 | 1586 | 24.78 |

The results are the same here: no performance gain beyond the number of vCPUs.

These results make it clear that if the image source for imgproxy is the local filesystem, IMGPROXY_WORKERS should be set to the number of vCPUs (or cores). If the image source is remote, IMGPROXY_WORKERS can be set higher, depending on download performance; in our test case, 4 x the number of vCPUs was the point beyond which no further gain occurred. Of course, if the remote source is really slow, it might also be worth improving the performance of the remote source itself. If the origin responds too slowly, deploying another layer of caching between imgproxy and the origin can increase download speeds; in that case, setting IMGPROXY_WORKERS to 2 x the number of vCPUs may be enough to reach maximum performance.
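
Summed up as a sketch, with the multipliers taken from our tests above (treat them as starting points rather than universal constants):

```bash
# Rule-of-thumb IMGPROXY_WORKERS values derived from the tests above;
# pick the line matching your image source and adjust per environment.
cores=$(nproc)

export IMGPROXY_WORKERS="${cores}"          # local filesystem source
# export IMGPROXY_WORKERS=$((cores * 2))    # remote source behind a fast caching layer
# export IMGPROXY_WORKERS=$((cores * 4))    # slow remote source (as in our first setup)
```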

AVIF performance

One final piece of information as a side note: tests converting images to AVIF as an alternative to WebP were much slower (around 10 times), making AVIF unfeasible for production use for now. This might be related to the performance of the underlying conversion library and needs further investigation; it is a subject we plan to revisit in the future.
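
To get a first-order feel for this difference locally, one can time single conversions with the libvips CLI. This sketch assumes a libvips build with WebP and AVIF (libheif) support; the file names are placeholders:

```bash
# Compare single-image conversion times; requires libvips built with
# WebP and AVIF (libheif) support. File names are placeholders.
time vips copy image.png image.webp
time vips copy image.png image.avif
```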

Conclusion

Trendyol, serving the largest and most reliable e-commerce platform in Turkey, has also become one of the most popular e-commerce platforms in the world, with around 30 million customers and 250k business partners. The platform relies on billions of images that must be displayed with extreme accuracy and speed. This challenge is unique for several reasons, including but not limited to the significance of visual data on an e-commerce platform, its sheer volume, the variety of resolutions and image formats, and the typically longer processing times involved.

As a result, an effective architecture and infrastructure for processing images is a key factor to ensure the reliability of the platform and business continuity. There can be various approaches to tackle this challenge, including relying on 3rd party services or managing our own.

Trendyol transitioned from relying on 3rd party services to managing its own image processing infrastructure, which resulted in various advantages:

  • Reduced overall service costs
  • Increase in capabilities and performance of the service
  • Full CDN infrastructure operating independently
  • Full control in scaling the service, resulting in decreased 5xx and 4xx errors across the platform

The new design, utilizing multiple caching layers, is performant, scalable, redundant, and simple to maintain and observe. In achieving such a reliable infrastructure, imgproxy has played a key role as the software empowering us to reduce the complexity of all image operations with its simplicity, scalability and rich set of features. We have been benefiting from the new design for over a year now, processing around 3 billion images monthly.

As mentioned previously, visual data is one of the most important forms of data in the context of an e-commerce platform, if not the most important. It is one of the main factors keeping customers engaged with the platform, greatly aiding in browsing the site and purchasing products. Implementing a reliable image processing service over which we have full control has been one of the success stories within Trendyol, giving us the speed and freedom needed to adapt the infrastructure to the ever-changing needs of the platform.

Want to know more? Or contribute by researching it yourself?

We’re building a team of the brightest minds in our industry. Interested in joining us? Visit the pages below to learn more about our open positions.
