Image compression: convolutional neural networks vs. JPEG

Published in Deelvin Machine Learning · 9 min read · Apr 19, 2022

Introduction

Image compression and decompression matter more than ever, because users expect to transfer good-quality pictures while keeping internet traffic low. In the early days of the internet, simple codecs handled image compression. Now, thanks to progress in machine learning, neural networks can solve the compression-decompression task more efficiently. An example of image compression is shown in Figure 1.

Figure 1. Example of image compression using the JPEG method: the initial image on the left, the image compressed with JPEG (60% quality) in the middle, and the absolute difference, defined as (Original - Compressed).abs(), on the right.
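The difference image in Figure 1 can be sketched as follows. This is a minimal illustration that assumes both images are already loaded as 8-bit NumPy arrays of the same shape; the function name and the tiny sample arrays are ours and are not taken from the project code.

```python
import numpy as np


def absolute_difference(original: np.ndarray, compressed: np.ndarray) -> np.ndarray:
    """Per-pixel |original - compressed| for two uint8 images of equal shape."""
    if original.shape != compressed.shape:
        raise ValueError("images must have the same shape")
    # Widen to a signed type first: uint8 subtraction wraps around (3 - 5 gives 254).
    diff = original.astype(np.int16) - compressed.astype(np.int16)
    return np.abs(diff).astype(np.uint8)


# Tiny hypothetical 2x2 grayscale "images" standing in for real pictures.
original = np.array([[10, 200], [3, 255]], dtype=np.uint8)
compressed = np.array([[12, 190], [5, 250]], dtype=np.uint8)
difference = absolute_difference(original, compressed)
print(difference)
```

The cast to a signed type matters: subtracting unsigned 8-bit arrays directly would wrap around instead of producing negative values.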

In this article, we compare different methods and models for image compression-decompression. We use several machine learning models (convolutional neural networks: the Factorized Prior Autoencoder [5], a nonlinear transform coder with factorized priors [4], and a hyperprior model with non zero-mean Gaussian conditionals [6]) and a classical computer vision method (JPEG compression performed with PIL for Python [10, 11]), and compare their performance on several metrics. We first describe the compression task and its schema, and present the dataset used for evaluation. Then we show how to use the JPEG compression method, describe the machine learning models together with the code for running them, and define the metrics. Finally, we present the results obtained on the dataset and analyze them in the Conclusions section.

Compression-decompression description

The compression-decompression task involves compressing data, transferring it with as little internet traffic as possible, and then decompressing it. The objective is to minimize the difference between the original and the decompressed image, so that image quality after compression-decompression stays as close as possible to the quality before the transfer.

The schema for a compression-decompression task is presented in Figure 2:

Figure 2. Schema for a compression-decompression method. Data is the initial image file. An encoder is a compression process, data compressed is a file after compression, a decoder is a decompression process, Data* is a decompressed file.
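The schema can be sketched in a few lines of Python. Here we assume JPEG via Pillow as the codec and a synthetic gradient as the input; the function names `encode` and `decode` are ours and only mirror the boxes of the schema.

```python
from io import BytesIO

import numpy as np
from PIL import Image


def encode(image: Image.Image, quality: int = 60) -> bytes:
    """Encoder: Data -> compressed data (a JPEG byte string)."""
    buffer = BytesIO()
    image.save(buffer, "JPEG", quality=quality)
    return buffer.getvalue()


def decode(compressed: bytes) -> Image.Image:
    """Decoder: compressed data -> Data* (the decompressed image)."""
    decoded = Image.open(BytesIO(compressed))
    decoded.load()  # force decoding now instead of lazily
    return decoded


# Synthetic stand-in for Data: a horizontal gradient, 64x48 pixels, RGB.
ramp = np.tile(np.linspace(0, 255, 64, dtype=np.uint8), (48, 1))
data = Image.fromarray(np.stack([ramp, ramp, ramp], axis=-1))

data_compressed = encode(data, quality=60)
data_star = decode(data_compressed)
print(len(data_compressed), data_star.size, data_star.mode)
```

Only `data_compressed` has to travel over the network; the receiver reconstructs `data_star` from it.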

To compare the performance of different methods, we first measure the compression coefficient and then apply the SSIM and PSNR metrics to measure the similarity between the original image and the decompressed image (all these metrics are described in the Metrics section below).

As we demonstrate in the Results section, different methods achieve different objectives: some produce high-quality images at low compression efficiency, while others reach high compression efficiency at the cost of image quality.

Dataset

We selected 10 images to compare and test different methods for the compression task. The dataset shows 5 bottles of Italian wine and 1 bottle of sauce (we chose this type of picture so that the methods can later be used for bottle detection as part of the company's ‘Bottle detection and classification’ project). Examples of images are presented in Figure 3:

Figure 3. Dataset for experiments with image compression methods (test data).

JPEG compression method

For the JPEG compression method, we employ the PIL library for Python to convert .bmp images to .png (the code is posted on GitHub) and to compress them to the JPEG (Joint Photographic Experts Group) format [10], a standard image format for lossy compressed image data. The format was introduced in the early 1990s and has since become the most widely used image compression standard in the world [11]. The basis of JPEG's lossy compression algorithm is the discrete cosine transform (DCT): this mathematical operation converts each 8×8 block of the image from the spatial (2D) domain into the frequency domain. The JPEG standard specifies the codec, which defines how an image is compressed into a stream of bytes and decompressed back into an image.
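To make the role of the discrete cosine transform more concrete, here is a minimal NumPy sketch of a 2D DCT on one 8×8 block. It is an illustration under simplifying assumptions, not the JPEG codec itself: the block is synthetic, and the single uniform quantization step is a hypothetical stand-in for the quantization tables a real encoder uses.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)   # the constant (DC) row has its own scale
    return basis


C = dct_matrix(8)


def dct2(block: np.ndarray) -> np.ndarray:
    """Spatial domain -> frequency domain for one 8x8 block."""
    return C @ block @ C.T


def idct2(coefficients: np.ndarray) -> np.ndarray:
    """Frequency domain -> spatial domain (inverse of dct2)."""
    return C.T @ coefficients @ C


# A smooth synthetic block (a gentle ramp), shifted by 128 as JPEG does.
rows, cols = np.indices((8, 8))
block = 100.0 + 5.0 * rows + 3.0 * cols - 128.0

coefficients = dct2(block)
step = 16.0  # hypothetical uniform quantization step
quantized = np.round(coefficients / step)
kept = int(np.count_nonzero(quantized))

restored = idct2(quantized * step)
max_error = float(np.max(np.abs(restored - block)))
print(f"non-zero coefficients: {kept} of 64, max pixel error: {max_error:.2f}")
```

For smooth image content, most quantized coefficients become zero, and this is where the lossy compression gain comes from.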

JPEG compression code:

from io import BytesIO
from PIL import Image

IMAGE_FILE = '1.bmp'  # image file name
im1 = Image.open(IMAGE_FILE)

# here, we create an empty in-memory bytes buffer
buffer = BytesIO()
im1.save(buffer, "JPEG", quality=60)  # buffer now holds the compressed file

Machine learning models

We tested several machine learning models (the testing code is posted on GitHub) and chose the ones best suited to our setup: they are easy to run, need few GPU resources, and can be evaluated with the selected metrics.

Model 1 — ‘Factorized Prior Autoencoder’

The model is taken from the paper “Variational image compression with a scale hyperprior”[5]. The architecture is shown in Figure 4:

Figure 4. The architecture of the proposed network ‘Variational image compression with a scale hyperprior’.

We used the TensorFlow framework [9] to compare the models, because all of them can be run within the same framework, and Google Colab to run them, because it provides a free GPU. Below, we show the Colab commands for installing the framework and running the Factorized Prior Autoencoder model.

First, install tensorflow-compression library:

!pip install tensorflow-compression

Second, clone the project to Colab:

![[ -e /tfc ]] || git clone https://github.com/tensorflow/compression /tfc
%cd /tfc/models
import tfci  # Check if tfci.py is available.

Third, run the model.

Compression in TensorFlow for Factorized Prior Autoencoder optimized for MS-SSIM (multiscale SSIM) is the following:

!python tfci.py compress bmshj2018-factorized-msssim-6 /1.png

bmshj2018-factorized-msssim — model name;
number 6 at the end of the name indicates the quality level (1: lowest, 8: highest);
/1.png — input file name (image).

We experimented with several quality levels. In the results table (Table 2), we include the model whose SSIM is approximately 0.97, namely bmshj2018-factorized-msssim-6.

This command runs compression and, next to the input image (1.png), produces a compressed file with the .tfci extension appended. This file, 1.png.tfci, is the compressed data from the schema in Figure 2.

Decompression in TensorFlow:

!python tfci.py decompress /1.png.tfci

This command produces a file with the .png extension appended to the compressed file name, for example 1.png.tfci.png. The decompression command is the same for the other models described below.
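When experimenting with several quality levels, it can be convenient to generate the command lines programmatically. The sketch below only builds the command strings in the form shown above and does not run them; the helper names are ours, and actually executing the commands still requires tfci.py from the cloned repository.

```python
import shlex


def tfci_compress_command(model_family: str, quality: int, image_path: str) -> str:
    """Build a `tfci.py compress` command line for one quality level."""
    model_name = f"{model_family}-{quality}"
    return " ".join(["python", "tfci.py", "compress", model_name, shlex.quote(image_path)])


def tfci_decompress_command(compressed_path: str) -> str:
    """Build the matching `tfci.py decompress` command line."""
    return " ".join(["python", "tfci.py", "decompress", shlex.quote(compressed_path)])


# Quality levels 1..8 for the Factorized Prior Autoencoder optimized for MS-SSIM.
commands = [
    tfci_compress_command("bmshj2018-factorized-msssim", quality, "/1.png")
    for quality in range(1, 9)
]
for command in commands:
    print(command)
print(tfci_decompress_command("/1.png.tfci"))
```

In Colab, each printed line can be run by prefixing it with `!`.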

Model 2 — Nonlinear transform coder model with factorized priors

The second model is a nonlinear transform coder model with factorized priors (entropy models) optimized for MSE, with GDN (generalized divisive normalization) activation functions, and 128 filters per layer[4]. Its architecture is shown in Figure 5. It was also run on TensorFlow framework[9].

Figure 5. Schema of model architecture for nonlinear transform coder with factorized priors (entropy models) optimized for MSE, with GDN[12].

GDN is typically applied to linear filter responses z = Hx, where x is a vector of image data, or to linear filter responses inside a composite function such as an artificial neural network (ANN). Its general form is defined as

$$y_i = \frac{z_i}{\left(\beta_i + \sum_j \gamma_{ij}\,|z_j|^{\alpha_{ij}}\right)^{\varepsilon_i}}$$

where y represents the vector of normalized responses, and vectors β, ε and matrices α, γ represent parameters of the transformation (all non-negative).
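As an illustration, the normalization can be sketched in NumPy, assuming the general form from the GDN literature, y_i = z_i / (β_i + Σ_j γ_ij |z_j|^α_ij)^ε_i. The parameter values below are arbitrary examples and not trained weights; α = 2 and ε = 0.5 are the fixed choice commonly seen in learned image compression networks, used here as an assumption.

```python
import numpy as np


def gdn(z, beta, gamma, alpha, epsilon):
    """Generalized divisive normalization of a response vector z.

    y_i = z_i / (beta_i + sum_j gamma_ij * |z_j| ** alpha_ij) ** epsilon_i
    beta, epsilon: vectors of length n; gamma, alpha: n x n matrices indexed [i, j].
    """
    z = np.asarray(z, dtype=np.float64)
    # |z_j| ** alpha_ij, broadcast into an (n, n) matrix indexed [i, j]
    powered = np.abs(z)[np.newaxis, :] ** alpha
    denominator = (beta + np.sum(gamma * powered, axis=1)) ** epsilon
    return z / denominator


n = 3
z = np.array([1.0, 2.0, -2.0])
beta = np.ones(n)
gamma = np.full((n, n), 0.1)
alpha = np.full((n, n), 2.0)   # assumed special case: squared responses
epsilon = np.full(n, 0.5)      # assumed special case: square root

y = gdn(z, beta, gamma, alpha, epsilon)
print(y)
```

Each response is divided by a measure of the joint activity of all responses, so strong activity in neighbouring filters suppresses the output.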

Compression in TensorFlow for nonlinear transform coder model with factorized priors (entropy models) optimized for MSE, with GDN (generalized divisive normalization) activation functions:

!python tfci.py compress b2018-gdn-128-4 /1.png

The number 1–4 at the end indicates the quality level (1: lowest, 4: highest). We experimented with different quality levels and chose the one that produces an SSIM of approximately 0.97 (b2018-gdn-128-4 in Table 2).

Model 3 — Hyperprior model with non zero-mean Gaussian conditionals

The third model is a hyperprior model with non zero-mean Gaussian conditionals (without autoregression), optimized for MS-SSIM (multiscale SSIM) [6]. The architecture of the model is shown in Figure 6. It was also run on the TensorFlow framework [9].

Figure 6. Model architecture for hyperprior model with non zero-mean Gaussian conditionals (without autoregression) [6].

Compression in TensorFlow for hyperprior model with non zero-mean Gaussian conditionals (without autoregression), optimized for MS-SSIM:

!python tfci.py compress mbt2018-mean-msssim-5 /1.png

The number 1–8 at the end indicates the quality level (1: lowest, 8: highest). We experimented with different quality levels and chose the one that produces an SSIM of approximately 0.97 (mbt2018-mean-msssim-5 in Table 2).

Metrics

The performance of image compression-decompression methods can be evaluated using several metrics [4]:

Compression efficiency (compression coefficient): the ratio between the compressed and the initial data (image) size,
Image quality (distortion measurement): the difference between the original image and the decompressed image,
Computational cost: the time in seconds required to compute the compression, and any additional hardware needed, such as GPUs.

Below, we summarize two metrics used for comparison, namely, compression efficiency/compression coefficient, and image quality.

Compression efficiency/compression coefficient

Formula for this metric is the following:

N_compression = size(compressed data)/ size(uncompressed data).

N_compression is the compression coefficient, equal to the size of the compressed data divided by the size of the initial data. Size(compressed data) is the file size after compression by a given method. Size(uncompressed data) is the image's height*width*channels, counted in bits. Our evaluation dataset has 10 images of equal size (width 576 px, height 768 px, 3 channels), so the size of the initial uncompressed data is 576*768*3 = 1,327,104 bits = 165,888 bytes = size(uncompressed data). Note that 576*768*3 is the number of channel samples, so this reference counts one bit per sample; with 8 bits per sample, the raw pixel data would occupy 1,327,104 bytes.
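The coefficient is a single division, but the choice of reference size matters. The sketch below takes bits per sample as a parameter, so that both the article's reference of 165,888 bytes (one bit per sample) and a raw 8-bit reference can be computed. The compressed size of 40,000 bytes is a made-up example and not a measured result.

```python
def uncompressed_size_bytes(width: int, height: int, channels: int, bits_per_sample: int) -> float:
    """Reference size of the raw image data in bytes."""
    return width * height * channels * bits_per_sample / 8


def compression_coefficient(compressed_bytes: float, reference_bytes: float) -> float:
    """N_compression = size(compressed data) / size(uncompressed data)."""
    return compressed_bytes / reference_bytes


# Reference used in the article: 576 * 768 * 3 counted as bits.
article_reference = uncompressed_size_bytes(576, 768, 3, bits_per_sample=1)
# Raw pixel data with 8 bits per sample, for comparison.
raw_8bit_reference = uncompressed_size_bytes(576, 768, 3, bits_per_sample=8)

hypothetical_compressed = 40_000  # bytes, illustrative value only
n_article = compression_coefficient(hypothetical_compressed, article_reference)
n_raw = compression_coefficient(hypothetical_compressed, raw_8bit_reference)
print(f"{n_article:.3f} relative to {article_reference:.0f} bytes")
print(f"{n_raw:.3f} relative to {raw_8bit_reference:.0f} bytes")
```

Coefficients are comparable between methods only when they share the same reference size.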

Image quality

To compare the quality of compression, we chose two metrics. We measure the quality of the compressed files using the formula:

N_quality = Quality_metric(Data, Data*),

where Quality_metric is either SSIM or PSNR. Below, we show formulas for those metrics.

SSIM

In image comparison, the mean squared error (MSE) is simple to implement, but it is not highly indicative of perceived similarity. Structural similarity (SSIM) aims to address this shortcoming by taking texture into account [7]:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

where x and y are the images to compare, μ_x and μ_y are their averages, σ_x² and σ_y² are their variances, σ_xy is the covariance of x and y, and c1 and c2 are two constants that stabilize the division when the denominator is weak.

from skimage.metrics import structural_similarity

# img1, img2: H x W x 3 arrays; channel_axis replaces the older multichannel=True
SSIM = structural_similarity(img1, img2, channel_axis=-1)
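To see what the metric computes, here is a simplified NumPy sketch that evaluates the standard SSIM expression once over the whole image. This is an assumption-laden simplification: library implementations such as the one above compute SSIM in local sliding windows and average the result, so the numbers will differ. The constants c1 = (0.01·L)² and c2 = (0.03·L)², with L the data range, are the conventional defaults.

```python
import numpy as np


def global_ssim(x, y, data_range: float = 255.0) -> float:
    """Single-window SSIM over the whole image (simplified illustration)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Synthetic test data: a random image and a noisy copy of it.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
noisy = np.clip(image + rng.normal(0, 40, image.shape), 0, 255).astype(np.uint8)

same_score = global_ssim(image, image)
noisy_score = global_ssim(image, noisy)
print(same_score, noisy_score)
```

Identical images score 1, and the score drops as the second image departs from the first in mean, variance, or structure.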

PSNR

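The peak signal-to-noise ratio (PSNR) between two images is computed as [8]:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{R^2}{\mathrm{MSE}}\right)$$

where MSE is the mean squared error between the two images and R is the maximum fluctuation in the input image data type. For example, if the input image has a double-precision floating-point data type, then R is 1. If it has an 8-bit unsigned integer data type, R is 255.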

import math

import torch.nn.functional as F
from torch import Tensor


def psnr(x: Tensor, x_hat: Tensor) -> float:
    # Assumes pixel values scaled to [0, 1], so R = 1 and PSNR = -10 * log10(MSE)
    return -10 * math.log10(F.mse_loss(x, x_hat).item())
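For images stored as 8-bit arrays, the same quantity can be sketched in NumPy with the peak value R as an explicit parameter, using PSNR = 10·log10(R²/MSE). The function below is our illustration and not the project code; the sample arrays are synthetic.

```python
import math

import numpy as np


def psnr(original, decompressed, peak: float = 255.0) -> float:
    """PSNR in dB for a given peak value R (255 for 8-bit, 1 for [0, 1] floats)."""
    original = np.asarray(original, dtype=np.float64)
    decompressed = np.asarray(decompressed, dtype=np.float64)
    mse = float(np.mean((original - decompressed) ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Synthetic example: every pixel is off by 5 grey levels, so MSE = 25.
original = np.zeros((4, 4), dtype=np.uint8)
decompressed = np.full((4, 4), 5, dtype=np.uint8)

psnr_8bit = psnr(original, decompressed, peak=255.0)
psnr_unit = psnr(original / 255.0, decompressed / 255.0, peak=1.0)
print(psnr_8bit, psnr_unit)
```

Rescaling the pixels to [0, 1] and setting R = 1 gives the same value, which is why the two conventions are interchangeable as long as R matches the data.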

Results

The JPEG compression method, run via the Python library PIL, gave the following results (see Table 1). For a fair comparison, we intentionally chose the compression parameters so that SSIM would be approximately 0.97 (that is, images were compressed with whatever compression coefficient N_compression gives an SSIM close to 0.97).
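One way to pick the JPEG parameter for a target similarity is a simple scan over quality settings. The sketch below is a hypothetical procedure and not necessarily how the reported numbers were obtained: it uses a synthetic image and a simplified whole-image SSIM as the score so that it runs with only NumPy and Pillow. In practice the score function would be replaced by a windowed SSIM such as the one from scikit-image.

```python
from io import BytesIO

import numpy as np
from PIL import Image


def jpeg_round_trip(image: Image.Image, quality: int):
    """Compress to JPEG in memory; return (compressed size in bytes, decoded image)."""
    buffer = BytesIO()
    image.save(buffer, "JPEG", quality=quality)
    size = len(buffer.getvalue())
    buffer.seek(0)
    return size, Image.open(buffer).convert("RGB")


def global_ssim(x, y, data_range: float = 255.0) -> float:
    """Simplified whole-image SSIM, used here only as a stand-in score."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    return float(numerator / ((mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)))


def lowest_quality_for_target(image, target, qualities=range(5, 100, 5), score=global_ssim):
    """Return (quality, size, score) for the first quality reaching the target, else None."""
    reference = np.asarray(image)
    for quality in qualities:
        size, decoded = jpeg_round_trip(image, quality)
        value = score(reference, np.asarray(decoded))
        if value >= target:
            return quality, size, value
    return None


# Synthetic stand-in image: a gradient with mild noise, 96x64 pixels, RGB.
rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0, 255, 96), (64, 1))
noisy = np.clip(ramp + rng.normal(0, 10, ramp.shape), 0, 255).astype(np.uint8)
image = Image.fromarray(np.stack([noisy, noisy, noisy], axis=-1))

result = lowest_quality_for_target(image, target=0.97)
print(result)
```

A linear scan is used instead of a bisection because the score is not guaranteed to increase monotonically with the quality setting.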

Table 1. Results obtained for the JPEG compression method.

Table 2 presents the results for the neural network compression-decompression models:

Table 2. Results obtained for three different neural network models: Factorized Prior Autoencoder (bmshj2018-factorized-msssim-6) [5], nonlinear transform coder with factorized priors (b2018-gdn-128-4) [4], and hyperprior model with non zero-mean Gaussian conditionals (mbt2018-mean-msssim-5) [6].

Conclusions

We compared the classical JPEG compression method with three different machine learning models for the compression-decompression task, run with the TensorFlow framework, using several metrics. The results are as follows: with roughly equal SSIM (about 0.97), the best compression was produced by the mbt2018-mean-msssim-5 model (N_compression of approximately 0.13). The next best model is bmshj2018-factorized-msssim-6 (approximately 0.23), followed by the classical JPEG compression method (around 0.288). The b2018-gdn-128-4 model comes last in compression efficiency (approximately 0.29). At the same time, PSNR is approximately the same for all neural network models (about 35), meaning that their MSE-based quality after compression-decompression is almost identical. It is also worth noting that PSNR is higher for the JPEG method.

The results indicate that the classical JPEG codec compresses less efficiently than two of the three neural networks: its N_compression is higher, which means that its compressed files are larger, while it is roughly on par with the third network (b2018-gdn-128-4). We can therefore conclude that two machine learning models (namely, the Factorized Prior Autoencoder and the hyperprior model with non zero-mean Gaussian conditionals) achieve better compression efficiency at the same decompression quality (similar SSIM), but they require more resources (GPUs).

Code for the article is available here: https://github.com/yustiks/video_compression

This project was conducted by Deelvin. Check out our Deelvin Machine Learning blog for more articles on machine learning.

References:

1. Johannes Ballé, Valero Laparra, Eero P. Simoncelli. “End-to-end Optimized Image Compression”, Computer Vision and Pattern Recognition, 2017.
2. Matthew Muckley, Jordan Juravsky, Daniel Severo, Mannat Singh, Quentin Duval, and Karen Ullrich. “NeuralCompression”, 2021. https://github.com/facebookresearch/NeuralCompression
3. Sebastiano Battiato. “Image Compression Basis”.
4. J. Ballé. “Efficient Nonlinear Transforms for Lossy Image Compression”, Picture Coding Symposium (PCS), 2018. Models: b2018-gdn-128-[1-4].
5. Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. “Variational image compression with a scale hyperprior”, arXiv:1802.01436 [eess.IV], 2018. Models: bmshj2018-factorized-msssim-[1-8].
6. D. Minnen, J. Ballé, G.D. Toderici. “Joint Autoregressive and Hierarchical Priors for Learned Image Compression”, Adv. in Neural Information Processing Systems 31 (NeurIPS 2018). Models: mbt2018-mean-msssim-[1-8].
7. Z. Wang, E.P. Simoncelli, A.C. Bovik. “Multiscale structural similarity for image quality assessment”, 2003.
8. https://www.mathworks.com/help/vision/ref/psnr.html
9. https://github.com/tensorflow/compression/
10. https://en.wikipedia.org/wiki/JPEG
11. Graham Hudson, Alain Léger, Birger Niss, István Sebestyén, Jørgen Vaaben. “JPEG-1 standard 25 years: past, present, and future reasons for a success”, Journal of Electronic Imaging, 27 (4): 1, 31 August 2018.
12. J. Ballé, V. Laparra, E. P. Simoncelli. “End-to-end Optimized Image Compression”, 2017.

Written by Iustina Ivanova