How we switched our images to the WebP format in production

Switching all images on a big site is no trivial task. Yet this is exactly what we did in the first half of 2018 to improve performance for our users. During this transition we switched from JPEG to WebP for all supported clients, for millions of user-uploaded images, across a variety of platforms.

In this blog post I’ll describe the steps we took to handle this conversion without any user interruption, with minimal development effort, and without operational problems.

The basics: What is WebP actually?

WebP is an image format developed by Google. It aims to significantly reduce file sizes compared to alternative formats such as JPEG or PNG. As such, it allows images to be downloaded and shown to users faster, which is especially helpful on lower-bandwidth connections such as mobile networks.

Although WebP was introduced quite a while ago, for a long time it was mostly Chrome that supported the format. Firefox only started supporting it in 2019!

Even though Chrome has a significant market share, some platforms still lack support for the new format. On iOS in particular, Safari does not support it at all, and neither Internet Explorer nor Edge has an implementation available. As a result, serving WebP as the only format would not work.


Why does one want to support the WebP format?

Still, many people and organizations claim that WebP images are superior in load times, and even in supported features.

Many web developers will be familiar with the saying that a faster page is more likely to keep a user engaged, and more likely to lead to conversions. As such, improving the load time of a heavy resource such as an image seems worthwhile.

However, migrating a website with millions of user-uploaded images is not easy. We have to consider several different frontends for our systems, all of which are in continuous use. Additionally, our users may be on very different hardware: from low-end Android devices, to high-end workstations, to users who access our systems with Internet Explorer 10 on Windows Terminal Server. Ensuring everything works for everyone suddenly becomes a non-trivial task!

Our imaging infrastructure

Due to the importance of great visuals on our platform, we have long used an imaging system co-developed with inventid. The goals were simple:

  • Be flexible; whenever we want to change image sizes or ratios on the platform, we do not want to recode all existing hard-coded images,
  • Be very flexible; determine how to crop and resize images at request time instead of on upload,
  • Be fast; encode once, and serve cached images blazingly fast from the CDN,
  • Support the _2x suffix easily (#legacy),
  • Metrics; export all metrics to our influxdb systems,
  • Crop on upload, prior to writing the original.

We ended up with iaas, our in-house developed Imaging as a Service product. This service handles hundreds of thousands of requests on a daily basis, all aimed at giving visitors the best images they can obtain. A longer explanation can be found in a previous blog post.

Rolling it out to users

As WebP is not supported across all browsers, we needed to serve WebP images only to clients supporting the new format and fall back to older image formats otherwise. Luckily (debate pending), Chrome adds an extra value to its Accept header (image/webp, to be exact) when it supports the WebP format. We could work with that value!
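To make the Accept header check concrete, here is a minimal sketch in Python. It is not the iaas implementation; the function name accepts_webp is made up for this illustration. It only treats an explicit image/webp entry as support, because wildcards such as image/* or */* are also sent by browsers that cannot decode WebP.

```python
def accepts_webp(accept_header: str) -> bool:
    """Return True if the Accept header explicitly lists image/webp.

    Hypothetical helper for illustration only. Wildcards (image/*, */*)
    are deliberately ignored, since they say nothing about WebP support.
    """
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != "image/webp":
            continue
        # A missing q-value means full preference (1.0); q=0 means "not acceptable".
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        return quality > 0
    return False
```

Note that once a response depends on the Accept header, any cache in front of the service has to take that header into account as well, for example through a Vary: Accept response header or a format-specific cache key.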

One of the great decisions we made when developing iaas was to always save the originals, and to work from those when scaling to any desired size or format. As such, we could generate another format directly from the source images on our systems. Thinking ahead about future requirements helps a great deal!

Designing with the future in mind makes future migrations significantly easier to complete successfully

However, we had no experience running WebP conversions in production. Nor did it seem like a great idea to switch over all browsers in one go: as long as the caches did not contain a single WebP image, every request would trigger a fresh render, which would likely amount to a self-inflicted DDoS attack.
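The cold-cache problem can be illustrated with a toy model. The sketch below is an assumption about how such a cache behaves in general, not a description of the actual iaas or CDN internals: it keys rendered images on (image id, width, height, format), so a cache that is fully warm for JPEG still misses on every single WebP request.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: image id, width, height, output format.
CacheKey = Tuple[str, int, int, str]


class RenderCache:
    """Toy render cache that counts how often the expensive render runs."""

    def __init__(self, render: Callable[[str, int, int, str], bytes]) -> None:
        self._render = render
        self._store: Dict[CacheKey, bytes] = {}
        self.renders = 0

    def get(self, image_id: str, width: int, height: int, fmt: str) -> bytes:
        key = (image_id, width, height, fmt)
        if key not in self._store:
            # Cache miss: this is the costly resize-and-encode step.
            self.renders += 1
            self._store[key] = self._render(image_id, width, height, fmt)
        return self._store[key]
```

With such a key, switching every client to WebP at once turns all traffic into cache misses until each variant has been rendered once, which is the load spike the gradual rollout avoids.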

As such, we did it slightly differently: we implemented a flag in iaas that allows a client to opt in to WebP with an additional query parameter. If iaas was configured to honour that query parameter (you can disable it), it would validate whether the browser also understood WebP. If so, a WebP image would either be created or served directly from cache.
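The decision logic described above can be sketched roughly as follows. The query parameter name webp, the setting honour_opt_in, and the function choose_format are all hypothetical names chosen for this example; the real iaas names may differ. The Accept check is a simplified substring match.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class WebpConfig:
    # Hypothetical server-side setting: whether the opt-in parameter is honoured.
    honour_opt_in: bool = True


def choose_format(url: str, accept_header: str, config: WebpConfig) -> str:
    """Pick the output format for one request; falls back to JPEG."""
    query = parse_qs(urlsplit(url).query)
    # Assumed opt-in parameter: ?webp=1 (the name is an illustration only).
    opted_in = query.get("webp", ["0"])[-1] == "1"
    # Simplified check: the browser must advertise image/webp itself.
    browser_supports_webp = "image/webp" in accept_header.lower()

    if config.honour_opt_in and opted_in and browser_supports_webp:
        return "webp"
    return "jpeg"
```

All three conditions have to hold: the server allows the opt-in, the client asked for it, and the browser advertises WebP support. Any other combination keeps serving the existing format, so clients that never send the parameter are unaffected.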

Using this approach, we could first test the visual quality of WebP images on a limited scale (we turned the flag on for employees only). Quite soon we discovered that we needed to tweak the quality parameter of iaas slightly, as JPEG and WebP seemed to use different quality scales.

Once we were happy with the generated images, and people could no longer visually tell whether they were looking at a JPEG or a WebP image (yup, we validated this!), it was time to start a broader roll-out.

While our own colleagues had already filled our caches a tiny bit, we were far short of the cache fill rate needed to enable the feature across the board. We therefore decided to slowly grow the user group for which the feature flag was enabled. That would ensure that the caches started to fill, and would cover a larger variety of device types as well.
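One common way to grow such a user group step by step is stable, hash-based bucketing. The sketch below is an assumption for illustration, not necessarily how our feature flag selected users; the function name in_rollout and the salt value are made up.

```python
import hashlib


def in_rollout(user_id: str, percentage: int, salt: str = "webp-rollout") -> bool:
    """Return True if this user falls inside the first `percentage` buckets.

    Hypothetical helper: each user is hashed into a stable bucket from 0 to 99,
    so raising the percentage only ever adds users and never removes them.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percentage
```

Because a user's bucket never changes, someone who received WebP images at 10% keeps receiving them at 25% and 50%, which keeps their already cached images useful while new users gradually add render load.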

After two or three weeks of heavy rendering, most of the images had been rendered at least once and were hot in our image CDN. At that point we flipped the second switch in iaas, after which opting in to WebP images was no longer required. Instead, WebP would be served to all clients supporting the format. That meant the new format was also available on our public pages!

What did this do for us?

Our WebP-compressed images appeared visually identical to our users, but at a size which was typically 25% to 30% smaller! That is a significant gain for our users. Especially for those looking for a job on a slower 3G network, that size reduction can make the difference between an application which feels snappy and one with a sluggish user experience.
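To get a feeling for what a 25% to 30% reduction means on a slow connection, here is a small back-of-the-envelope calculation. The 200 kB image size and the 1 Mbit/s bandwidth are assumed, illustrative numbers, not measurements from our platform, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealised transfer time: payload bits divided by bandwidth (no latency)."""
    return size_bytes * 8 / bandwidth_bits_per_s


def after_reduction(size_bytes: float, reduction: float) -> float:
    """Size after shrinking by `reduction`, given as a fraction (0.25 = 25%)."""
    return size_bytes * (1 - reduction)


# Assumed numbers for illustration only.
JPEG_BYTES = 200_000   # a hypothetical 200 kB JPEG
BANDWIDTH = 1_000_000  # a hypothetical 1 Mbit/s mobile connection

if __name__ == "__main__":
    baseline = transfer_seconds(JPEG_BYTES, BANDWIDTH)
    print(f"JPEG: {baseline:.2f} s")
    for reduction in (0.25, 0.30):
        webp = transfer_seconds(after_reduction(JPEG_BYTES, reduction), BANDWIDTH)
        print(f"WebP at {reduction:.0%} smaller: {webp:.2f} s")
```

Under these assumptions a single image goes from 1.6 seconds to between 1.12 and 1.2 seconds, and the saving adds up on pages that show many images.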


Meanwhile, we did not notice a higher system load due to supporting an additional format. The gradual rollout ensured we never left the safe operating margins of our imaging system, so even when new versions were being generated on-the-fly, there was no noticeable slowdown. And as any system operator knows: a system that is not on fire is generally better than one that is :)

In the end, the switch to WebP was mostly effortless thanks to a proper system setup from the get-go, proper planning, and patience. Because of the improved performance we have seen in production, we’d advise any team to take a look at their own migration path!

Software Engineer. Lead software engineer @, former CTO @ General nerd.