Understanding Image Compression

Published in

Big O(n) Development

7 min readApr 19, 2019

Back in the 1980s, Microsoft developed a device agnostic image rendition solution for bitmaps: the BMP file format.¹ Those who used Microsoft Paint before would relish the gigantic file size produced from simple strokes and color fills.

The idea behind the BMP file format is that every pixel is assigned a color value. So if I have a 480x360 bitmap supporting 16 million colors (24 bits), The bitmap would end up to be somewhere north of 4MB in size.

Figure 1 — The structure of the bitmap image file (https://en.wikipedia.org/wiki/BMP_file_format)

This is of course not ideal if one wishes to render multiple high quality images. Hence, the question must be asked. “Is there a way to optimize the bitmap representation such that one can still preserve the visual integrity of the image but with less resources to use?”

The answer is yes. It turns out that for the most part, users are more interested in images as visual aids rather than thoroughness. For example, suppose I am given an image of the Golden Gate Bridge as shown below:

While I know that the actual bridge when seen in person comes with far greater detail, whatever I see here is good enough for its purposes. So most people are actually willing to trade off visual integrity for speed as long as the tradeoff is acceptable. For example, if the degradation in subjective visual quality is 1% yet the user gets to enjoy a space savings of 90%, it is for the most part, a welcome trade.

Hence, one of the key consideration in compressing bitmaps is to optimize for visual indistinguishability. That is, removing certain elements of an image where the naked eye is unable to identify the difference. This is called lossy compression, which constitutes most types of compression one sees on video streaming on online images. This is how H.264, HEVC, HEIF and JPEG codecs work.

Figure 3 — Image Compression optimizes for visual indistinguishability

Other types of images such as PNG are compressed lossless. The idea is that one preserves the full visual integrity of the original image but consuming lesser bytes to represent the same thing. Still, there are other file formats such as WebP that supports both lossy and lossless compression. Why do we want to compress things lossless? As above, it is based on what we want to achieve for the image. For example, icons in apps are generally in PNG (and recently, in vector graphics format).

Lossless image compression consists of bigger file size as opposed to lossy compression. Reason primarily being there are less avenues for compressing the former than the latter. For the purposes of this blog, we’ll talk about lossy compression.

JPEG Compression Algorithm

Figure 4 — An overview of JPEG Compression

The general steps taken in JPEG compression is done to this day in video compression. Over the years, the algorithm has improved but the general concepts remain the same.

STEP 1. Color Space Conversion

Image conversion begins by converting the raw RGB image format to its Chroma (r and b) and Luminance (Y) values. The idea is that our eyes are more sensitive to changes in luminosity than it is to color. Hence, we are actually able to downsample the range of colors in the image without noticeably affecting the visual quality of the image. This is done through Chroma Subsampling as explained below.

STEP 2. Chroma Subsampling

Many gamers may recall that subsampling is among the possible knobs they can set to optimize their gaming experience. The main idea is that the more subsampling one does, the faster the game performance is. This is because the game requires less range of colors to render.

Chroma subsampling is denoted by J:a:b with J being the number of pixels to subsample, a representing the top row pixels and b representing the bottom row pixels.

In the case of 4:4:4, it means that in a 4x2 pixels, the first row (a) must have 4 colors and so does the second row. In the case of 4:2:2, it means that in a 4x2 pixels, the first row should be reprented by two colors and so does the second row. In the case of 4:2:0, it means that in a 4x2 pixels, the first row should be represented by two colors and the second row copies over what’s on the first row.

As you can see. through chroma subsampling, one is able to reduce the range of colors by as much as 75%.

STEP 3. Discrete Cosine Transform

JPEG compression is done by slicing the original image into 8x8 pixels chunk. In this step, we assign coefficients for the 8x8 chunk based on the signals shown below.

Figure 6 — Discrete Cosine Transform (DCT). Left image is a 8x8 signal reference used to cast weight on the original image. The right image is the resulting chunk after going through DCT.

The idea here is that as the human eye move from the upper left of the DCT reference to the lower right, the more difficult it is to perceive. So usually, what happens is that in assigning coefficients, the upper left gets a very high value and it goes down as one moves down diagonally.

Here’s how things might look like in numeric format:

Figure 7 — Original chunk (left) New chunk after DCT is applied (right)

STEP 4. Quantization

After DCT is applied, the next step is called quantization. Here, a quantization table is applied to the resulting DCT values. The table differ between compression algorithms and some software allows the user to set the amount of quantization they want to use. Below is a standard table:

Figure 8 — A Standard Quantization Table

Notice that the digits go higher as one moves from the upper left to the lower right. This is intentional. The idea of quantization is that the resulting data from the DCT is divided with the quantization table. This is where the compressed image lose much if its data. Because the lower right numbers are high, most of its values will eventually become zero after division. Here’s how it might look like:

Figure 9 — Quantization Table (left) Resulting values (right)

STEP 5. Entropy Coding using Huffman Coding

Huffman coding is the last step in compression. Here’s how it works.

Suppose I want to represent a range of numbers using bits. Moreover, I want to represent them such that I consume the least amount of bits for the representation. How it might be done is that the highly repeated numbers are given lower bits. For example, if zero is represented a lot, I generally would assign it lower bits. A more thorough explanation of the Huffman coding can be found here.

The idea here is that you use less bits to represent a longer set of values. Huffman coding is a lossless compression algorithm which is also used in compressing text files. Doing this, it is possible to save as much as 50% of the original size.

What’s Next?

Compression is simply one part of the equation. When one has to render the image, he has to reverse the compression process before heis able to render the image on screen.

JPEGs are roughly around 90% smaller than its bitmap counterpart. To this day is it still the most popular image compression format available. Newer algorithms such as HEIF (2013) and AVIF (2018) increases the range of pixels one can use for the compression algorithm.²

Despite JPEG’s popularity, newer formats provide better compression. WebP for example in general is around 70% the size of JPEG and yet still is able to maintain the visual integrity of the image. Hence, Google (the company who developed WebP) has been nudging developers to reencode their images from JPEG to WebP. Support for WebP however, is still less than JPEG. Hence, it is necessary to support both formats as a result.

¹ “The BMP File Format.” Prepressure. Accessed April 19, 2019. https://www.prepressure.com/library/file-formats/bmp.

² Netflix published the first set of AVIF images in 2018. As of this post, the images are still accessible here. Companies such as Firefox and Microsoft will support this image soon in their software offerings.