How WebP works (lossy mode)

When it comes to image data on the internet, JPG has been the dominant standard for a long time now. However, in 2010, Google (and a set of other open source contributors) released a new image codec, named WebP, which aims to compress images smaller than JPG while keeping the same image quality.

How good is it? I’ve seen WebP savings in the 24%-35% range, depending on the size & complexity of the image, which is pretty impressive considering how much JPG is already squeezing out of the file.

For me, the file savings alone are reason enough to swap all my app’s JPGs to WebP. When you add in the fact that WebP is supported in the majority of web browsers, and natively on Android, I’d call it a “Slam Dunk” for most applications. (But that’s my personal opinion ;)

To understand why WebP gets those savings over JPG, we have to take a look at how the codec itself works.

Where does WebP Come from?

The WebP image file format originates from the VP8 video codec (you may know it better through WebM, the container format that carries it). One of the strong features of the VP8 codec is its intra-frame compression: a frame of the video is compressed on its own, and then only the differences between subsequent frames are compressed.

This is where WebP comes from: It’s a single compressed frame of a WebM file.

Or, more accurately, the core of WebP comes from WebM. Since its release in 2010, there have been a lot of changes & updates specifically to the WebP file format that have stayed unique to it as a file type. Namely, things like alpha, lossless mode, and, in an oddly ironic twist, animation support.

Yup, you read that right: WebP is an image format… derived from a video format… which supports animation. (Check out the GIF version, vs the WebM version, vs the WebP version of a dancing banana.)

I can honestly say, I never thought I’d put a dancing banana in any of my blog posts… You win this round, internet…

But we’re going to ignore all that extra fancy stuff for now, and just focus on understanding how the lossy mode of the format works.

Lossy mode

The lossy version of WebP encoding is built to compete with JPG as a still image format. As such, you’ll notice a number of striking similarities between the two formats.

MacroBlocking

The first stage of the encoder is to split the image into “macroblocks”. A typical macroblock consists of one 16x16 block of luma pixels and two 8x8 blocks of chroma pixels. This stage is very similar to the JPEG steps of converting the color space, downsampling the chroma channels, and subdividing the image.
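Here’s a rough sketch of that stage in plain Python, just to make the shapes concrete. Everything in it is an assumption for illustration: the function names are made up, the color conversion uses the JPEG/JFIF constants (a real WebP encoder may use slightly different ones), the chroma is downsampled with a simple 2x2 average, and the image dimensions are assumed to be multiples of 16.

```python
MB = 16  # luma macroblock edge; chroma blocks are 8x8 after 2x2 subsampling


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) using JFIF-style constants."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def to_planes(pixels):
    """pixels: list of rows of (r, g, b). Returns full-res Y and half-res Cb, Cr."""
    h, w = len(pixels), len(pixels[0])
    ycc = [[rgb_to_ycbcr(*p) for p in row] for row in pixels]
    y_plane = [[px[0] for px in row] for row in ycc]

    def subsample(channel):
        # Average each 2x2 neighbourhood into a single chroma sample.
        return [[sum(ycc[2 * j + dj][2 * i + di][channel]
                     for dj in (0, 1) for di in (0, 1)) / 4
                 for i in range(w // 2)]
                for j in range(h // 2)]

    return y_plane, subsample(1), subsample(2)


def crop(plane, top, left, size):
    """Cut a size x size square out of a plane."""
    return [row[left:left + size] for row in plane[top:top + size]]


def macroblocks(pixels):
    """Yield (mb_row, mb_col, Y 16x16, Cb 8x8, Cr 8x8) for each macroblock."""
    y, cb, cr = to_planes(pixels)
    for my in range(len(pixels) // MB):
        for mx in range(len(pixels[0]) // MB):
            yield (my, mx,
                   crop(y, my * MB, mx * MB, MB),
                   crop(cb, my * MB // 2, mx * MB // 2, MB // 2),
                   crop(cr, my * MB // 2, mx * MB // 2, MB // 2))
```

Note how each macroblock carries four times as many luma samples as samples in either chroma plane. That’s the same trade JPG makes: our eyes care more about brightness than about color.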

Prediction

Each 4x4 sub-block of a macroblock then has a prediction model (aka filtering) applied to it. Filtering is very popular for PNG, which does the same thing per scan-line; WebP, however, applies filtering per block. This is done by defining two sets of pixels around a block: the row above it, A, and the column to the left of it, L.

Using A and L, the encoder fills in several test blocks of 4x4 pixels, and determines which one produces values closest to the original block. The different ways these blocks can be filled are called “Predictors”:

  • Horizontal prediction: fills each column of the block with a copy of the column to the left, L
  • Vertical prediction: fills each row of the block with a copy of the row above, A
  • DC prediction: fills the block with a single value, the average of the pixels in the row above, A, and the column to the left, L
  • TrueMotion prediction: a super advanced mode I’m not getting into right now.

It’s worth noting, btw, that 4x4 luma has 6 additional modes, but you get the picture at this point ;)

Basically, the encoder finds the best predictor for the block and outputs the difference between the prediction and the real pixels. These differences, called “residuals”, are what get fed into the next stage.
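To make the predictors a bit more concrete, here’s a minimal sketch in Python. Treat it as an illustration under assumptions, not as the encoder’s real code: the function names are mine, the “closest” test is a plain sum of absolute differences (a real encoder would also weigh how many bits each choice costs), and I’ve included the TrueMotion formula as I understand it from the VP8 spec (left pixel + above pixel - the above-left corner pixel, clamped to 0..255).

```python
def predict_horizontal(above, left, corner):
    # Row r is filled with left[r]: the left column is smeared across the block.
    return [[left[r]] * 4 for r in range(4)]


def predict_vertical(above, left, corner):
    # Every row is a copy of the row above the block.
    return [list(above) for _ in range(4)]


def predict_dc(above, left, corner):
    # One flat value: the rounded average of the 8 neighbouring pixels.
    dc = (sum(above) + sum(left) + 4) // 8
    return [[dc] * 4 for _ in range(4)]


def predict_truemotion(above, left, corner):
    # Assumed formula: left + above - corner, clamped to the 8-bit range.
    def clamp(v):
        return max(0, min(255, v))
    return [[clamp(left[r] + above[c] - corner) for c in range(4)]
            for r in range(4)]


PREDICTORS = {
    "H": predict_horizontal,
    "V": predict_vertical,
    "DC": predict_dc,
    "TM": predict_truemotion,
}


def sad(block, pred):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))


def best_prediction(block, above, left, corner):
    """Return (mode name, residual) for the predictor closest to the block."""
    candidates = ((name, fn(above, left, corner)) for name, fn in PREDICTORS.items())
    name, pred = min(candidates, key=lambda item: sad(block, item[1]))
    residual = [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
    return name, residual


# A block of vertical stripes that continues the row above it exactly.
stripes = [[10, 50, 90, 130] for _ in range(4)]
mode, residual = best_prediction(stripes, above=[10, 50, 90, 130],
                                 left=[20, 30, 40, 50], corner=0)
print(mode)      # V
print(residual)  # four rows of zeros
```

When the prediction is good, the residual is mostly zeros or small numbers, and small numbers are exactly what the later stages are good at squeezing.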

JPGify it

The final stages of WebP encoding look very similar to our old friend JPG:

  • A DCT is applied to the residual values of a block
  • The resulting matrix of DCT coefficients is then quantized
  • The quantized matrix is then re-ordered, and sent to a statistical compressor.
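The three steps above can be sketched in a few lines of Python. This is a textbook version under loud assumptions: it uses a floating-point DCT-II (the real codec uses integer approximations), a single made-up quantizer step for every coefficient, and the conventional 4x4 zigzag scan for the re-ordering. All the names are hypothetical.

```python
import math

N = 4
# Conventional zigzag scan order for a 4x4 block (indices into the row-major list).
ZIGZAG = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def dct_2d(block):
    """Textbook orthonormal 2-D DCT-II of a 4x4 block of residuals."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = sum(block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                        for x in range(N) for y in range(N))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Divide every coefficient by the step and round: this is the lossy part."""
    return [[round(c / step) for c in row] for row in coeffs]


def zigzag(matrix):
    """Flatten the block so low-frequency coefficients come first."""
    flat = [value for row in matrix for value in row]
    return [flat[i] for i in ZIGZAG]


def encode_residual(residual, step):
    """DCT -> quantize -> re-order, ready for the statistical compressor."""
    return zigzag(quantize(dct_2d(residual), step))
```

After the zigzag, the interesting values tend to bunch up at the front of the list with a long run of zeros behind them, which is a very friendly shape for the compressor that comes next.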

The two main differences are:

1) The input to the DCT stage isn’t the raw block data itself, but rather the output of the prediction phase

2) The statistical compressor used for WebP is an arithmetic coder, which fills the same role as the Huffman encoder that JPG uses.
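Why does swapping Huffman for an arithmetic coder matter? A Huffman code has to spend at least one whole bit per symbol, while an arithmetic coder can get close to the theoretical minimum (the entropy) even when one symbol is overwhelmingly likely. The sketch below is not WebP’s coder; it only compares the average Huffman code length against the entropy for a made-up, heavily skewed distribution, with hypothetical function names.

```python
import heapq
import math


def entropy_bits(probs):
    """Theoretical minimum average bits per symbol for this distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Code length in bits that a Huffman code assigns to each symbol."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths


def huffman_average_bits(probs):
    """Average bits per symbol when coding with the Huffman code above."""
    return sum(p * n for p, n in zip(probs, huffman_lengths(probs)))


# Made-up distribution: one symbol (think "zero coefficient") dominates.
skewed = [0.9, 0.05, 0.05]
print(huffman_average_bits(skewed), entropy_bits(skewed))
```

For a distribution like this, the Huffman average sits above one bit per symbol while the entropy is well below it. Quantized residuals are full of zeros, so that gap is where an arithmetic coder can pick up extra savings.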

The results

The end result is that WebP feels kinda like an advanced mode of JPG. The prediction phase seems to be the biggest win: it shrinks the values that reach the DCT, so that the rest of the pipeline can compress the image data more efficiently. You can check out the case study of WebP vs JPG yourself, or trust that a lot of smart people who care about accuracy and graphs made sure that the image below was a true representation of things.

In conclusion

The addition of a prediction stage to the JPG process allows WebP to take data compression to a place that JPG simply can’t go, which is why it’s easy to see how WebP can get wins over its JPG-based cousin.

As mentioned, there’s a separate lossless WebP mode… but that’s a topic for another post.
