Picture-perfect thumbnails for all videos
We recently introduced AVIF support for images across the Vimeo platform. Today, in the same vein, I’d like to introduce three improvements to video thumbnails that we deployed over the last few weeks.
Better thumbnail generation for HDR video
High Dynamic Range (HDR) images and videos boast an increased range of luminosity, which allows for the reproduction of brighter and more realistic images on compatible displays. When these images are shown on a regular Standard Dynamic Range (SDR) screen, they must first be converted into an SDR representation through a process known as tonemapping. There is no one right way to tonemap HDR video. To obtain good, consistent results for the majority of Vimeo uploads, we implement our own method, applied at transcode time.
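Our own tonemapping method isn’t described here, but the general shape of such an operator can be sketched with the classic global Reinhard curve. This is an illustrative example only; the function name, key value, and choice of operator are assumptions, not Vimeo’s actual implementation:

```python
import numpy as np

def tonemap_sdr(hdr_linear: np.ndarray, key: float = 0.18) -> np.ndarray:
    """Map linear HDR RGB values into [0, 1) with the global Reinhard operator.

    A minimal sketch: real pipelines also handle color space conversion,
    gamut mapping, and transfer functions.
    """
    # BT.709 luma weights to estimate per-pixel luminance.
    luminance = (0.2126 * hdr_linear[..., 0]
                 + 0.7152 * hdr_linear[..., 1]
                 + 0.0722 * hdr_linear[..., 2])
    # Normalize by the log-average luminance so differently exposed
    # frames land in a comparable range before compression.
    log_avg = np.exp(np.mean(np.log(luminance + 1e-6)))
    scaled = hdr_linear * (key / log_avg)
    # Reinhard curve: compresses highlights, leaves shadows nearly linear.
    return scaled / (1.0 + scaled)
```

The curve asymptotically approaches 1.0, so arbitrarily bright HDR highlights are compressed into displayable SDR range instead of clipping.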
In the past, our video and image transcoding systems were entirely independent, which caused inconsistencies in our processing, notably in color space conversion and tonemapping, and increased the complexity of our thumbnail generation system, as we had to work around the limitations of our image pipeline.
Because our thumbnails all use standard dynamic range, we used to generate the thumbnails of HDR uploads after transcoding, from the tonemapped SDR videos. While this produced images that matched the SDR versions of the videos, it also delayed thumbnail generation, hurting the user experience and adding internal complexity.
We didn’t want to copy our tonemapping code directly and have to maintain it in two separate codebases, but due to tight coupling between different parts of our video transcoder, it took a major refactoring effort to extract the relevant code into a library that can be shared by our video and image processing systems.
Now, thumbnails generated from HDR videos are tonemapped and converted correctly, without the user having to wait for the video to be transcoded. This reduces the delay between upload and thumbnail generation and can, in some cases, improve the quality of the thumbnail. More importantly, it paves the way toward displaying thumbnails in HDR on supported displays, which has been on our roadmap for a while.
Deinterlacing video thumbnails
Interlacing is a technique that increases perceived frame rate at the expense of resolution by alternately transmitting only the even and only the odd rows of each frame. Because these two sets of rows, called fields, are not temporally aligned, interlaced video viewed as-is on a modern display can be visually unpleasant: a combing effect appears wherever there is movement between the fields of a frame.
At Vimeo, we detect interlaced video based on the metadata of uploaded files, and we deinterlace it by discarding the bottom field of each frame and interpolating the corresponding rows of the top field to create a full frame. However, deinterlacing in thumbnails was overlooked, and we were still generating the images from the original, interlaced video. We solved this issue and further improved the quality of the interpolation for thumbnails by integrating an optimized implementation of the nnedi3 neural network-based spatial deinterlacer called znedi3. The result is more visually pleasing and closer to the look of the transcoded video.
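The field-discarding approach described above can be sketched as follows. This is a minimal line-averaging interpolation for illustration only, not the nnedi3/znedi3 neural interpolation we actually use, and the function name is an assumption:

```python
import numpy as np

def deinterlace_top_field(frame: np.ndarray) -> np.ndarray:
    """Keep the top field (even rows) and rebuild each bottom-field
    (odd) row by averaging the even rows directly above and below it.

    A simplified sketch: nnedi3/znedi3 instead predict the missing
    rows with a neural network for much better edge quality.
    """
    out = frame.astype(np.float64).copy()
    h = frame.shape[0]
    for y in range(1, h, 2):  # bottom-field rows to replace
        above = out[y - 1]
        # At the last row there is no even row below; reuse the one above.
        below = out[y + 1] if y + 1 < h else out[y - 1]
        out[y] = (above + below) / 2.0
    return out.astype(frame.dtype)
```

Discarding one field halves the temporal information but removes combing entirely, which is the right trade-off for a still thumbnail.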
Vimeo 360 thumbnails for humans
When we introduced support for 360° videos as Vimeo 360 back in 2017, thumbnails were unfortunately left out and were generated directly from the source, showing a distorted panorama instead of a normal perspective view. When rendering Vimeo 360 videos, our player uses a 3D rendering library to wrap the panoramic frames around a sphere and back onto a 2D plane to match how we see the real world. This would be too cumbersome to reuse for thumbnail generation, so we had to implement these transformations independently.
Now, when a Vimeo 360 video is uploaded, we create an output image that is half the size of the input to account for the smaller field of view. For each pixel (x, y) inside it, we perform an inverse gnomonic (or rectilinear) projection to obtain spherical coordinates (λ, φ). We then remap those spherical coordinates into the equirectangular projection used by the original video. Finally, we take the input pixels surrounding the calculated coordinate and use bilinear interpolation to obtain an accurate reprojected value for each pixel in the thumbnail.
For a given point P in a thumbnail, we find the point P₁ on the surface of the sphere, then identify the coordinate in the 360° equirectangular panorama that matches it.
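As a rough sketch of the steps above, the inverse gnomonic projection and bilinear resampling might look like this with NumPy. The function name, the fixed straight-ahead view direction (λ₀ = φ₀ = 0), and the field of view are assumptions for illustration, not our production code:

```python
import numpy as np

def reproject_equirect_to_rectilinear(pano: np.ndarray,
                                      fov_deg: float = 90.0) -> np.ndarray:
    """Render a perspective view from the centre of an equirectangular
    panorama (H x W x C) via an inverse gnomonic projection.

    Output is half the input size, looking straight ahead.
    """
    in_h, in_w = pano.shape[:2]
    out_h, out_w = in_h // 2, in_w // 2
    half_fov = np.radians(fov_deg) / 2.0

    # Tangent-plane coordinates of each output pixel centre.
    xs = np.tan(half_fov) * (2.0 * (np.arange(out_w) + 0.5) / out_w - 1.0)
    ys = np.tan(half_fov) * (2.0 * (np.arange(out_h) + 0.5) / out_h - 1.0)
    x, y = np.meshgrid(xs, ys)

    # Inverse gnomonic projection: plane point (x, y) back to
    # spherical coordinates (lam, phi) on the unit sphere.
    dist = np.sqrt(1.0 + x * x + y * y)
    lam = np.arctan2(x, 1.0)   # longitude
    phi = np.arcsin(y / dist)  # latitude

    # Remap spherical coordinates into equirectangular pixel space.
    u = (lam / (2.0 * np.pi) + 0.5) * in_w - 0.5
    v = (phi / np.pi + 0.5) * in_h - 0.5

    # Bilinear interpolation of the four surrounding input pixels.
    u0 = np.clip(np.floor(u).astype(int), 0, in_w - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, in_h - 2)
    fu = np.clip(u - u0, 0.0, 1.0)[..., None]
    fv = np.clip(v - v0, 0.0, 1.0)[..., None]
    top = pano[v0, u0] * (1 - fu) + pano[v0, u0 + 1] * fu
    bot = pano[v0 + 1, u0] * (1 - fu) + pano[v0 + 1, u0 + 1] * fu
    return (top * (1 - fv) + bot * fv).astype(pano.dtype)
```

The key identity is that for a tangent plane at the sphere’s equator, a plane point (x, y) maps to longitude arctan(x) and latitude arcsin(y / √(1 + x² + y²)), which is what the two trigonometric lines compute.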
While these changes apply only to a small portion of uploads on Vimeo, newly generated thumbnails will now look good for all videos without any intervention from uploaders. Most uploaded videos don’t use custom thumbnails, so by making our automatically generated ones better looking and more representative of the videos as displayed, we expect user engagement on interlaced and Vimeo 360 videos to improve. If you have videos of those types in your library, you can regenerate thumbnails directly from the video settings to take advantage of these changes.
The architectural changes resulting from these projects will also help us implement more innovative tech and improvements, like HDR rendering of thumbnails, or normal-looking GIFs for Vimeo 360 videos, without worrying about inconsistencies in our systems or the maintenance burden of having duplicate implementations.