The Impact of Artifacts on the Accuracy of Network Prediction

Maiya Rozhnova
Deelvin Machine Learning
7 min read · Oct 6, 2020

Neural networks are now ubiquitous across a wide range of tasks. There are pre-trained networks capable of instantly identifying the primary object in a picture, and a network pre-trained on a large dataset can also be reused to tackle related problems. This is often referred to as transfer learning: applying the knowledge a neural network has extracted while solving one problem to other problems.

When solving real-life problems, we often feed the network data that contain minor artifacts. The network may then yield a wrong answer, even though the correct answer appears obvious. For example, transcoding a video stream can introduce blockiness artifacts, and a photograph can come out with a blur artifact.

In this article, I test how image quality affects the prediction accuracy of a neural network. What happens if the network receives an image with minor artifacts as input?

By way of illustration, I will consider three pre-trained networks with ImageNet weights: InceptionV3, MobileNet and ResNet50. The Keras framework will be used; some characteristics of these networks (Fig. 1) are given at https://keras.io/api/applications/

Figure 1: Characteristics of the networks used
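In Keras these networks can be loaded in a couple of lines. A minimal sketch (the weights are downloaded on first use; MobileNet is instantiated here as the smallest of the three):

```python
# The three networks compared in this article, available directly in Keras.
from tensorflow.keras.applications import InceptionV3, MobileNet, ResNet50

constructors = {
    'InceptionV3': InceptionV3,  # default input 299x299
    'MobileNet': MobileNet,      # default input 224x224
    'ResNet50': ResNet50,        # default input 224x224
}

# Load one of them with ImageNet weights (downloads ~17 MB on first call).
model = constructors['MobileNet'](weights='imagenet')
```

All three output a 1000-way softmax over the ImageNet classes, which is what the experiments below rely on.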

I will test whether the network can distinguish which animal is shown in the picture when I input a corrupted image. Let the original image (Fig. 2) contain an elephant, with dimensions 1280x852.

Figure 2: Original image

The image is corrupted in three ways: Gaussian blur, added white noise, and the blocking effect that occurs when the image is compressed by codecs (including video codecs). Each artifact type is applied at 10 levels, from weak to strong. Fig. 3 shows blurred images; the difference is in the size of the blur kernel. In Fig. 3 (1), a square kernel with a side of 11 pixels is used (minor changes); in Fig. 3 (2), a kernel with a side of 67 pixels; in Fig. 3 (3), 137 px. I use the Gaussian blur filter from OpenCV.

import cv2

kernel = 11  # side of the square blur kernel, must be odd
blurred_img = cv2.GaussianBlur(original_img, (kernel, kernel), 0)
Figure 3: Blurred images with 3 levels of distortion (level = 0.1, 0.5, 1.0)

In the second case, white noise is added using the skimage.util module; the strength of the noise is controlled by the standard deviation of the Gaussian distribution. An example of obtaining a noisy image with a standard deviation of 0.1 is shown below; Fig. 4 (1) presents the result. In Fig. 4 (2 and 3), the standard deviation takes the values 0.5 and 1.0, respectively.

from skimage.util import random_noise
from skimage import img_as_ubyte
st_dev = 0.1
noise = random_noise(orig_img, mode='gaussian', var=st_dev ** 2, seed=44, clip=True)
noisy_img = img_as_ubyte(noise)
Figure 4: Noisy images with 3 levels of distortion (level = 0.1, 0.5, 1.0)
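The ten noise levels can be produced the same way. A sketch, assuming evenly spaced standard deviations from 0.1 to 1.0 (consistent with the three examples in Fig. 4) and a random stand-in image:

```python
import numpy as np
from skimage.util import random_noise
from skimage import img_as_ubyte

# Assumed grid of ten standard deviations: 0.1, 0.2, ..., 1.0.
st_devs = [round(0.1 * i, 1) for i in range(1, 11)]

# Stand-in for the original photo.
orig_img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

# random_noise returns a float image in [0, 1]; img_as_ubyte maps it back
# to uint8 so it can be fed to the same preprocessing as the original.
noisy = [img_as_ubyte(random_noise(orig_img, mode='gaussian',
                                   var=sd ** 2, clip=True))
         for sd in st_devs]
```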

The third type is a frame from a video compressed with the H.264 codec at a varied Constant Rate Factor (CRF) (Fig. 5). The experiment yields 10 corrupted pictures with CRF from 28 to 51, where CRF = 51 corresponds to the worst quality and strongest compression (Fig. 5 (3)). A small level of blockiness is shown in Fig. 5 (1), with CRF = 28, and in Fig. 5 (2), where CRF = 38 was used. Such images can be obtained with the ffmpeg utility; the commands are given below.

cmd_encode = 'ffmpeg -y -framerate 1 -i concat:elephant.jpg -vcodec h264 -r 1 -g 1 -x264-params nal-hrd=cbr -x264opts no-deblock -crf 51 out.avi'
cmd_decode = 'ffmpeg -y -i out.avi -r 1/1 block.png'
Figure 5: Blocking image with 3 levels of distortion (level = 0.1, 0.5, 1.0)
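These commands can be built and run from Python. A sketch with a hypothetical helper `blockiness_cmds` that parameterizes the CRF (actually running it requires ffmpeg on the PATH):

```python
import shlex
import subprocess

def blockiness_cmds(src='elephant.jpg', crf=51, out='block.png'):
    """Build the encode/decode ffmpeg commands for a given CRF level."""
    encode = ('ffmpeg -y -framerate 1 -i concat:%s -vcodec h264 -r 1 -g 1 '
              '-x264-params nal-hrd=cbr -x264opts no-deblock -crf %d out.avi'
              % (src, crf))
    decode = 'ffmpeg -y -i out.avi -r 1/1 %s' % out
    return encode, decode

# To actually produce a blocky frame (needs ffmpeg installed):
# for cmd in blockiness_cmds(crf=38):
#     subprocess.run(shlex.split(cmd), check=True)
```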

So we have the original image and 30 corrupted ones: 3 artifact types, 10 levels each. For the original image, all the networks in question predict the “African elephant” class. Let’s check how the score of this class changes when the network receives a corrupted picture as input.
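Tracking the score of one class boils down to reading one entry of the network's softmax output. A sketch with a hypothetical helper; the class index 386 for 'African_elephant' is my assumption about Keras' ImageNet mapping and is worth double-checking against decode_predictions:

```python
import numpy as np

# Assumed index of 'African_elephant' in Keras' ImageNet class mapping.
AFRICAN_ELEPHANT = 386

def class_score(preds, class_idx=AFRICAN_ELEPHANT):
    """Score of one class from a network's (1, 1000) softmax output."""
    return float(preds[0, class_idx])

# With a real network this would be preceded by something like:
#   x = preprocess_input(cv2.resize(img, (224, 224))[None].astype('float32'))
#   preds = model.predict(x)
# Here a fake softmax vector stands in for model.predict():
preds = np.zeros((1, 1000))
preds[0, AFRICAN_ELEPHANT] = 0.8
```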

The results are shown below as graphs. Fig. 6 shows how the score of the tracked class changes with the level of image corruption. The blue dotted line is the score of the original image. The level of distortion is a relative value from 0 to 1: level 0 corresponds to the score of the original image, the scores of the 10 damaged images follow, and level 1 corresponds to the worst image; each line shows one type of distortion (blur, noise or blocking). Figures 7 and 8 present similar results for MobileNet and ResNet50, respectively.

Figure 6: Dependence of the “African elephant” score on the level of distortion for the InceptionV3 network

I will now demonstrate what the top-5 predictions of the InceptionV3 network look like for the original and some damaged pictures.

InceptionV3 top-5

original image: ('African_elephant', 0.8060587), ('tusker', 0.14671499), ('Indian_elephant', 0.003548297), ('Komodo_dragon', 0.00044263015), ('mushroom', 0.00026282403)
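Such lists are produced by Keras' decode_predictions. A minimal sketch with a synthetic softmax vector standing in for a real model.predict() call (the index 386 for 'African_elephant' is an assumption about the mapping):

```python
import numpy as np
from tensorflow.keras.applications.imagenet_utils import decode_predictions

# Fake network output: all mass on one class (index 386 is assumed to be
# 'African_elephant' in Keras' ImageNet mapping).
preds = np.zeros((1, 1000), dtype='float32')
preds[0, 386] = 0.8

# Each entry is a (wordnet_id, human_readable_label, score) tuple.
top5 = decode_predictions(preds, top=5)[0]
```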

The top-5 predictions of the InceptionV3 network for some damaged pictures
Figure 7: Dependence of the “African elephant” score on the level of distortion for the MobileNet network

Interestingly, some artifacts increase the score. In particular, these are blur and noise with the level 0.1 (weak blur and low noise) for InceptionV3 and MobileNet networks (Fig. 6, 7).

Figure 8: Dependence of the “African elephant” score on the level of distortion for the ResNet50 network

Also, for all the networks considered, there is a certain distortion level on the blockiness line at which an increase in the score is observed.

Experiment 1. Is it a coincidence that weak noise and blur increase the score? Let’s run the following experiment. Take 1500 elephant images from the COCO dataset. I corrupt each image with weak noise (deviation = 0.1), add blur (area of blur kernel / image area = 0.00015) and apply blockiness at level 0.5 (CRF = 38). For each image, each network and each of the 3 artifacts, I compute the difference in the ‘African_elephant’ scores between the damaged and the original image. The table below shows the average values of these differences.

Mean difference in the scores of the 'African_elephant' class between the damaged and the original image

Conclusion 1. It can be seen that blur and blockiness tend to decrease the score for all 3 networks (i.e., on average, Score('African_elephant')_distorted - Score('African_elephant')_original < 0). At the same time, weak noise on the InceptionV3 network gives an average score gain of about 1%. This is very strange, but interesting. Perhaps it is a sampling error.
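The averaging behind the table can be sketched in a few lines (`mean_score_diff` is a hypothetical helper name):

```python
import numpy as np

def mean_score_diff(scores_distorted, scores_original):
    """Mean of Score_distorted - Score_original over a set of images.
    A negative result means the artifact lowers the class score on average."""
    diffs = np.asarray(scores_distorted) - np.asarray(scores_original)
    return float(diffs.mean())
```

For example, scores of 0.5 and 0.7 on distorted images against 0.6 and 0.6 on the originals average out to a zero difference.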

Experiment 2. How much does picture quality need to be degraded for the network to make a mistake? The experiment was carried out on 10,000 random pictures of different classes from the COCO dataset. For each picture, noise is superimposed at varying strengths; the resulting noisy images are fed to the network, and the minimum noise level at which the network makes a top-1 mistake (i.e., another class gets the maximum score) is recorded. The results are averaged over the 10,000 images and presented in the following table.

Mean minimal noise level at which the network makes a mistake

Conclusion 2. Here the InceptionV3 network stands out: on average, a minimum noise level of 0.26 is needed for it to err. The MobileNet and ResNet50 networks, on average, make mistakes on less noisy pictures, at a noise level of 0.19.
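The search for the minimum breaking noise level can be sketched as follows, with `predict_top1` standing in for the noise-plus-inference step (the 0.05 grid of levels is an assumption):

```python
import numpy as np

def min_breaking_noise(predict_top1, img, levels=None):
    """Smallest noise level at which the network's top-1 class changes.

    `predict_top1(img, level)` is a stand-in for: add Gaussian noise with
    the given standard deviation, run the network, return the top-1 class id.
    """
    if levels is None:
        levels = np.arange(0.05, 1.05, 0.05)  # assumed grid of noise levels
    base = predict_top1(img, 0.0)             # top-1 class on the clean image
    for lvl in levels:
        if predict_top1(img, lvl) != base:
            return float(lvl)                 # first level that breaks top-1
    return None                               # the network never erred
```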

Conclusion

In real life, images fed to standard pre-trained CNNs differ in quality from those the model was trained on, so the results can be much worse than expected. For example, for the original picture (Fig. 9 (1)) the ResNet50 network predicts the “steam locomotive” class (score = 0.999), while for the picture with noise level 0.2 (Fig. 9 (2)) it is mistaken and responds “chainlink fence” (score = 0.3); the score of the “steam locomotive” class drops to 0.000213.

Figure 9: Original image and noisy image with level = 0.2

Therefore, be careful when working with pre-trained models.

We conduct various studies in order to understand how well the models are trained and incorporate the best results into our product ML Visualizer.
