Review: DCAD — Deep CNN-based Auto Decoder (Codec Post-Processing)

Achieve 5.0%, 6.4%, 5.3%, 5.5% BD-Rate Reduction for All Intra, Low Delay P, Low Delay B, and Random Access Configurations Respectively Compared to HEVC

Sik-Ho Tsang
Aug 4 · 3 min read

In this story, Deep CNN-based Auto Decoder (DCAD), by Sun Yat-sen University, is briefly reviewed. Applications limited by bandwidth and storage, e.g. surveillance, usually use a high compression ratio, which heavily degrades the accuracy of follow-up computer vision tasks, such as retrieval, detection, and recognition, that take the decoded videos as input. With DCAD, artifacts are removed and details are enhanced in HEVC-compressed videos after decoding. This is a 2017 DCC paper with more than 30 citations. (Sik-Ho Tsang @ Medium)


Outline

  1. DCAD Network Architecture
  2. Experimental Results

1. DCAD Network Architecture

DCAD Network Architecture
  • A stack of 3×3 convolutional filters is used, just like VGGNet.
  • ReLU is the only non-linearity; no max pooling is used.
  • The depth is 10. A depth of 20 was also tried, but it brought no significant additional coding gain.
  • Mean square error (MSE) is used as the loss function.
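To get a feel for what the bullets above imply, here is a minimal sketch in plain Python. It computes the receptive field of a stack of stride-1 3×3 convolutions without pooling, a rough parameter count, and the MSE loss. The channel width of 64 and the single input/output channel are assumptions for illustration only; the review does not state them.

```python
def receptive_field(depth, kernel=3):
    """Receptive field (in pixels) of `depth` stacked stride-1 convs, no pooling."""
    return 1 + depth * (kernel - 1)


def conv_params(depth, width=64, in_ch=1, out_ch=1, kernel=3):
    """Weights + biases of a plain conv stack: in_ch -> width -> ... -> out_ch.

    `width`, `in_ch` and `out_ch` are assumed values, not taken from the review.
    """
    channels = [in_ch] + [width] * (depth - 1) + [out_ch]
    return sum(kernel * kernel * c_in * c_out + c_out
               for c_in, c_out in zip(channels, channels[1:]))


def mse(pred, target):
    """Mean square error between two equal-length sequences of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


for depth in (10, 20):
    print(depth, receptive_field(depth), conv_params(depth))
```

Under these assumptions, the 10-layer network sees a 21×21 neighbourhood around each output pixel and the 20-layer one a 41×41 neighbourhood, at roughly twice the parameter count.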

2. Experimental Results

BD-rate (%) for DCAD against HEVC for different configurations
  • DCAD achieves 5.0%, 6.4%, 5.3%, 5.5% BD-Rate reduction for All Intra (AI), Low Delay P (LDP), Low Delay B (LDB), and Random Access (RA) configurations respectively compared to HEVC.
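BD-rate summarizes the average bitrate difference between two rate-distortion curves at equal quality, so a negative value means bitrate saving. Below is a minimal sketch of the usual Bjøntegaard calculation, assuming exactly four (bitrate, PSNR) points per curve with distinct PSNR values: a cubic is fitted to log-bitrate as a function of PSNR, and the gap between the two curves is averaged over the overlapping PSNR range. The RD points are hypothetical and are not taken from the paper.

```python
import math


def lagrange(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """BD-rate in percent from four (bitrate, psnr) points per curve."""
    rate_a, psnr_a = zip(*anchor)
    rate_t, psnr_t = zip(*test)
    log_a = [math.log(r) for r in rate_a]
    log_t = [math.log(r) for r in rate_t]
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(psnr_a), min(psnr_t))
    hi = min(max(psnr_a), max(psnr_t))

    def gap(p):
        return lagrange(psnr_t, log_t, p) - lagrange(psnr_a, log_a, p)

    avg_gap = simpson(gap, lo, hi) / (hi - lo)
    return (math.exp(avg_gap) - 1.0) * 100.0


# Hypothetical RD points (kbps, dB), for illustration only.
anchor = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
# Sanity check: same quality at 95% of the bitrate gives about -5.0.
test = [(0.95 * r, p) for r, p in anchor]
print(round(bd_rate(anchor, test), 2))
```

For a post-processing method the bitrate of each point stays the same and only the PSNR moves up, but the same calculation applies.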
Time (Sec) To Decode One Frame
  • On a GeForce GTX 980 Ti, a 1080p frame takes 0.0090 sec with cuDNN.
  • Without cuDNN, it takes 0.653 sec.
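As a quick back-of-the-envelope check, the two timings above can be turned into a speed-up factor and frame rates. This sketch uses only the numbers reported in the bullets.

```python
t_cudnn = 0.0090  # sec per 1080p frame with cuDNN (reported above)
t_plain = 0.653   # sec per 1080p frame without cuDNN (reported above)

speedup = t_plain / t_cudnn
fps_cudnn = 1.0 / t_cudnn
fps_plain = 1.0 / t_plain

print(f"speed-up: {speedup:.1f}x")            # about 72.6x
print(f"with cuDNN: {fps_cudnn:.1f} fps")     # about 111.1 fps
print(f"without cuDNN: {fps_plain:.2f} fps")  # about 1.53 fps
```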
Some Visualizations

Reference

[2017 DCC] [DCAD]
A Novel Deep Learning-Based Method of Improving Coding Efficiency from the Decoder-End for HEVC

My Previous Reviews

Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]

Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [SDN]

Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet]

Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]

Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]

Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD]

Written by Sik-Ho Tsang

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/
