Review — Nasiri VCIP’20: Prediction-Aware Quality Enhancement of VVC Using CNN (VVC Filtering)

Network Inspired by EDSR, With Intra Prediction Signal as Additional Input, Average BD-Rate Gain of 6.7%, 12.6% and 14.5% on Y, U and V Components, Respectively.

Sik-Ho Tsang
Published in Nerd For Tech, Mar 21, 2021


Quality Enhancement (QE) Framework in VVC

In this story, Prediction-Aware Quality Enhancement of VVC Using CNN (Nasiri VCIP’20), by IRT b<>com, Univ Rennes, INSA Rennes, and AVIWEST, is reviewed. In this paper:

  • A Convolutional Neural Network (CNN) is used to enhance the quality of VVC-coded frames after decoding, in order to reduce low-bitrate artifacts.
  • The intra prediction signal of each frame is also used as an input when training the network.

This is a paper in 2020 VCIP. (Sik-Ho Tsang @ Medium)


  1. Relationship Between Intra Coding and Compression Artifacts, and the Motivation
  2. Proposed Network Architecture
  3. Experimental Results

1. Relationship Between Intra Coding and Compression Artifacts, and the Motivation

A 16×16 block, k, and its two best IPMs (i = 38, 50), with similar costs but different rate-distortion trade-offs resulting in distinct compression loss patterns

VVC has 67 Intra Prediction Modes (IPMs): 65 angular IPMs, plus DC and planar.
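
To make the mode count concrete, here is a minimal sketch of the IPM index layout. The index assignment (0 for planar, 1 for DC, 2 to 66 for angular, with 18 and 50 as pure horizontal and vertical) follows the usual VVC/VTM convention and is not stated in this story; the function name is hypothetical.

```python
# VVC intra prediction mode (IPM) indices, per the usual VVC/VTM convention
# (an assumption here; the story only gives the counts).
PLANAR = 0
DC = 1
FIRST_ANGULAR, LAST_ANGULAR = 2, 66
HORIZONTAL, VERTICAL = 18, 50


def describe_ipm(mode: int) -> str:
    """Return a short label for an IPM index (hypothetical helper)."""
    if mode == PLANAR:
        return "planar"
    if mode == DC:
        return "DC"
    if not FIRST_ANGULAR <= mode <= LAST_ANGULAR:
        raise ValueError(f"not a valid IPM index: {mode}")
    if mode == HORIZONTAL:
        return "angular (pure horizontal)"
    if mode == VERTICAL:
        return "angular (pure vertical)"
    return "angular"


num_angular = LAST_ANGULAR - FIRST_ANGULAR + 1  # 65 angular modes
num_modes = num_angular + 2                     # plus DC and planar = 67

# The two modes from the figure caption above.
for mode in (38, 50):
    print(mode, describe_ipm(mode))
```

Under this indexing, the two modes in the example figure (i = 38, 50) are both angular, and mode 50 is the pure vertical direction.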

Under strict bitrate constraints, the IPM that minimizes the rate-distortion (R-D) cost of a block is not necessarily the IPM that models the block texture most accurately.

  • As can be seen from the above example, despite their similar R-D costs, these two IPMs result in very different reconstruction signals, with different types of compression loss patterns.
  • This behavior is due to the different R-D trade-offs made by the two selected modes.
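
The trade-off described above can be illustrated with the usual Lagrangian cost J = D + λ·R. This is only a sketch: the distortion, rate and λ values below are invented for illustration and are not taken from the paper or its figure, and `ModeTrial` and `rd_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeTrial:
    ipm: int           # intra prediction mode index
    distortion: float  # e.g. sum of squared errors of the reconstruction
    rate_bits: float   # bits spent on mode signalling and residual


def rd_cost(trial: ModeTrial, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return trial.distortion + lam * trial.rate_bits


LAMBDA = 4.0  # hypothetical Lagrange multiplier; grows with QP in practice

# Invented numbers: mode 38 is cheap to code but less accurate,
# mode 50 is more accurate but costs more bits.
trials = [
    ModeTrial(ipm=38, distortion=100.0, rate_bits=20.0),
    ModeTrial(ipm=50, distortion=62.0, rate_bits=30.0),
]

for t in trials:
    print(f"IPM {t.ipm}: J = {rd_cost(t, LAMBDA)}")

best_by_cost = min(trials, key=lambda t: rd_cost(t, LAMBDA))
best_by_distortion = min(trials, key=lambda t: t.distortion)
print("encoder picks IPM", best_by_cost.ipm)
print("most accurate IPM", best_by_distortion.ipm)
```

With these made-up values the two costs are close (180 versus 182), yet the encoder picks the mode with the larger distortion, which is the situation the example figure depicts.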

The task of quality enhancement (QE) for a block, frame or an entire sequence could be significantly impacted by different choices of coding modes (e.g. IPM) determined by the encoder.

This assumption is the main motivation for using the intra prediction information when training the quality enhancement networks.

2. Proposed Network Architecture

Network architecture of the proposed method using the prediction and the reconstruction signal as the input
  • The network is inspired by EDSR.
  • The first convolutional layer receives the reconstructed frame C and the prediction frame P, concatenated, as input.
  • After this convolutional layer come 32 identical residual blocks (as in ResNet), each composed of two convolutional layers with one ReLU layer in between.
  • Batch normalization is applied after the residual blocks.
  • A long skip connection between the input of the first and the last residual block is used.
  • Two more convolutional layers after the residual blocks are used.
  • Finally, the last convolutional layer has one feature map, which constructs the output frame Ô.
  • In the paper's notation, F1() and F2() are 3×3×256 convolutional layers, with and without the ReLU activation layer, respectively.
  • F3() is a 3×3×1 convolutional layer with ReLU.
  • The L2 norm of the difference between the output frame and the original frame O is used as the cost function of the training phase.
  • A separate network is trained for each component at each QP, using the above network architecture.
  • Two image datasets, DIV2K and Flickr2K, are used for training.
  • VTM-5.0 is used in the all-intra configuration, with 6 QPs between 22 and 47.
  • 64×64 patches are used, with a mini-batch size of 32.
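  • At the end of the training, a total of 36 trained models were obtained for 3 components at 6 QPs. Since 3 × 6 gives only 18, the count of 36 presumably covers both variants compared in the results, with and without the prediction signal.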
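
To get a feel for the size of this network, the layer stack described above can be tallied in a few lines. This is a rough sketch under several assumptions that the story does not confirm: the input has 2 channels (one plane of C concatenated with one plane of P), every convolution is 3×3 with a bias term, batch normalization carries a scale and a shift per feature map, and the final one-feature-map convolution comes in addition to the "two more" convolutional layers.

```python
def conv_params(in_ch: int, out_ch: int, k: int = 3, bias: bool = True) -> int:
    """Number of weights (plus optional biases) in one k x k convolution."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)


FEATURES = 256       # 3x3x256 layers, as stated in the story
NUM_RES_BLOCKS = 32  # as stated in the story
IN_CHANNELS = 2      # assumption: C and P concatenated, one plane each

layers = [("head conv + ReLU", conv_params(IN_CHANNELS, FEATURES))]

for b in range(NUM_RES_BLOCKS):
    # Each residual block: conv -> ReLU -> conv, plus a parameter-free skip.
    layers.append((f"res block {b + 1}", 2 * conv_params(FEATURES, FEATURES)))

# Batch normalization after the residual blocks: scale and shift per map.
layers.append(("batch norm", 2 * FEATURES))

# The long skip connection adds no parameters.
layers.append(("tail conv 1", conv_params(FEATURES, FEATURES)))
layers.append(("tail conv 2", conv_params(FEATURES, FEATURES)))
layers.append(("output conv (1 feature map)", conv_params(FEATURES, 1)))

total_params = sum(count for _, count in layers)
print(f"{total_params:,} parameters per model")
```

Under these assumptions each model has roughly 39 million parameters, almost all of them in the 32 residual blocks, and one such model is stored per component and per QP.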

3. Experimental Results

BD-Rate (%)
  • In the Common Test Conditions (CTC) QP range, the proposed method achieves an average BD-rate gain of 6.7%, 12.6% and 14.5% on Y, U and V components, respectively.
  • The proposed method with the prediction signal outperforms the proposed method without the prediction signal by 0.9%, 8.1% and 4.8%, on Y, U and V components, respectively.
  • Compared to the other two JVET solutions, the proposed method shows a significant gain.
  • In the high QP range, where artifacts are significantly stronger, the proposed method achieves an average BD-rate gain of 8.3%, 15.8% and 16.2% on Y, U and V components, respectively.
  • The BD-rate gain from using the prediction signal is relatively higher for the U and V components. This is because VVC has advanced chroma coding tools that exploit the redundancies. Examples of such tools are Luma Mapping with Chroma Scaling (LMCS), Joint Cb-Cr residual coding (JCCR), Cross-Component Linear Modeling (CCLM) and a specific chroma IPM called luma Derived Mode (DM).
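
For readers unfamiliar with the metric, BD-rate is the average bitrate difference between two codecs at equal quality. Below is a minimal sketch of the classic Bjøntegaard computation, assuming exactly four (bitrate, PSNR) points per curve and a cubic fit of log-bitrate over PSNR. The sample points are invented and are not results from the paper; the function names are hypothetical.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """BD-rate in percent from two lists of four (bitrate, psnr) points.

    Negative values mean the test codec needs less bitrate than the anchor.
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects four points per curve")
    psnr_a = [q for _, q in anchor]
    psnr_t = [q for _, q in test]
    log_a = [math.log(r) for r, _ in anchor]
    log_t = [math.log(r) for r, _ in test]
    # Integrate only over the PSNR interval covered by both curves.
    lo = max(min(psnr_a), min(psnr_t))
    hi = min(max(psnr_a), max(psnr_t))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    avg_a = simpson(lambda q: lagrange_eval(psnr_a, log_a, q), lo, hi) / (hi - lo)
    avg_t = simpson(lambda q: lagrange_eval(psnr_t, log_t, q), lo, hi) / (hi - lo)
    return (math.exp(avg_t - avg_a) - 1.0) * 100.0


# Invented example: the test codec reaches the same PSNR at 90% of the bitrate.
anchor = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
test = [(0.9 * rate, psnr) for rate, psnr in anchor]
print(round(bd_rate(anchor, test), 3))
```

In this toy case the result is a BD-rate of about -10%, i.e. a 10% bitrate saving. The gains quoted above are the same kind of number, reported as positive percentages.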


