Deeplearning.ai: CNN week 1 — Convolutional Neural Network terminology

From edge filtering to convolutional filters

Nguyễn Văn Lĩnh
datatype
4 min read · Jan 31, 2018


Applications: e.g. Neural Style Transfer.

Edge detection:

  • Vertical edge detection: convolving a 6x6 grayscale image with a 3x3 filter/kernel (Sobel, Roberts, Prewitt) gives a 4x4 output. On an image with a clear edge, visualizing the output shows where the filter responds (sketched below).
  • Vertical vs. horizontal: a horizontal edge filter is the transpose of the vertical one.
Vertical edge detection example. Source: deeplearning.ai C4W1L02
Vertical vs horizontal edge detection example. Source: deeplearning.ai C4W1L03
  • Prewitt filter → Sobel filter: the Sobel filter puts more weight on the center row, making it more robust.
  • What about learning the filter automatically? Moving from a hard-coded filter to a data-learned, data-dependent filter makes detection more robust, e.g. to rotated edges.
Can learn w from data. Source: deeplearning.ai C4W1L03
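
A minimal NumPy sketch of the idea (my own illustration, not code from the course): a 6x6 image with one sharp vertical edge, filtered by the hand-coded 3x3 vertical filter from the lecture.

```python
import numpy as np

def conv2d(image, kernel):
    """'Convolution' as used in deep learning (no kernel flip)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

# 6x6 image: bright left half, dark right half -> one vertical edge
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# Hand-coded vertical edge filter
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

print(conv2d(image, vertical))  # 4x4 output; large values mark the edge
```

The middle two columns of the output come out large (30) exactly where the filter straddles the bright-to-dark transition; the flat regions give 0.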

Padding:

  • The filtered output size depends on how the [f x f] filter “fits” the [n x n] image: the output is [(n-f+1) x (n-f+1)]. “f” is usually odd.
  • The “edge” pixels are used fewer times in the filtering step than the center pixels, so border information is lost and the output size shrinks very fast across layers.
  • Solution: pad the border with “p” pixels to expand the input image -> output is (n+2p-f+1) x (n+2p-f+1).
  • “Valid” convolution: no padding. “Same” convolution: padding so that output size == input size => p = (f-1)/2 (see the sketch below).
Blue area is added to keep the result the same size as the input image. Source: deeplearning.ai C4W1L04
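
A quick sketch (my own, using NumPy's np.pad) of how “same” padding keeps the output the same size as the input:

```python
import numpy as np

n, f = 6, 3
p = (f - 1) // 2                           # "same" padding for an odd f

image = np.ones((n, n))
padded = np.pad(image, p, mode="constant")  # zero-pad the border

print(padded.shape)                        # (8, 8)
print(n + 2 * p - f + 1)                   # output size: 6, same as input
```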

Strided convolution:

  • Sweep the filter not one pixel at a time, but in jumps of two pixels or more => smaller output image.
  • Jumping “s” pixels => output is [(⌊(n+2p-f)/s⌋+1) x (⌊(n+2p-f)/s⌋+1)], sketched below.
  • The typical “convolution” operator in math/signal-processing textbooks flips the filter before multiplying. The “convolution” of deep learning skips the flip, so strictly it is “cross-correlation”.
Example of the convolution when stride = 2. Source: deeplearning.ai C4W1L05
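
A strided version of the earlier sketch; with n = 7, f = 3, s = 2 it reproduces the lecture's 3x3 output:

```python
import numpy as np

def conv2d_strided(image, kernel, s=1):
    """Strided cross-correlation (deep learning's 'convolution')."""
    n, f = image.shape[0], kernel.shape[0]
    m = (n - f) // s + 1                # floor((n + 2p - f)/s) + 1, p = 0
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            out[i, j] = np.sum(image[i*s:i*s+f, j*s:j*s+f] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))
print(conv2d_strided(image, kernel, s=2).shape)   # (3, 3)

# A textbook convolution would flip the kernel first:
# np.sum(image[i*s:i*s+f, j*s:j*s+f] * np.flip(kernel))
```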

Convolution over volume:

  • RGB images have 3 channels (Nc = 3), i.e. 3 stacked matrices => the filter also has 3 layers, one to do the filtering for each channel.
  • The mechanism: at each position, filter the 3 channels with the filter's 3 layers respectively, then add everything together to get the final value (1 number).
Doing the convolution over the 3 channels of a 6x6 image. Source: deeplearning.ai C4W1L06
  • Multiple filters: apply different filters and stack up the results; each filter serves a different purpose (sketched below).
Doing horizontal and vertical filter “cube” convolutions and stacking the results. Source: deeplearning.ai C4W1L06
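
A sketch of convolution over volume (my own illustration): each filter is a cube matching the input depth, each filter produces one output channel, and the channels are stacked:

```python
import numpy as np

def conv_volume(image, filters):
    """image: (n, n, n_c); filters: (n_f, f, f, n_c) -> (m, m, n_f)."""
    n, _, n_c = image.shape
    n_f, f = filters.shape[0], filters.shape[1]
    m = n - f + 1
    out = np.zeros((m, m, n_f))
    for k in range(n_f):                    # one output channel per filter
        for i in range(m):
            for j in range(m):
                # filter all n_c channels, then sum to a single number
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
    return out

image = np.random.rand(6, 6, 3)             # RGB image, Nc = 3
filters = np.random.rand(2, 3, 3, 3)        # e.g. vertical + horizontal
print(conv_volume(image, filters).shape)    # (4, 4, 2): results stacked
```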

One layer of the convolutional neural network

  • Input -> filtering -> output_tmp -> add bias -> ReLU -> final output
  • The parameters are the elements of these filter cubes, NOT the full weight matrix of a fully connected neural network. Fewer parameters => less overfitting, easier to learn, more robust (see the sketch below).
Example of a CNN layer. Source: deeplearning.ai C4W1L07
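
A sketch of the per-layer step; the conv output is faked with random numbers to keep the focus on bias, ReLU, and the parameter count (which matches the lecture's 10-filters-of-3x3x3 example):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

n_f = 10                              # 10 filters of shape 3x3x3
W = np.random.randn(n_f, 3, 3, 3)     # the filter "cubes"
b = np.random.randn(n_f)              # one bias per filter

z = np.random.randn(4, 4, n_f)        # stand-in for the conv output
a = relu(z + b)                       # add bias (broadcast), then ReLU
print(a.shape)                        # (4, 4, 10)

# Parameters are fixed by the filters, not by the input image size:
print(W.size + b.size)                # 3*3*3*10 + 10 = 280
```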

Simple convolutional network

  • One convolution layer: image input (3 channels) -> filtering -> output whose height/width depend on padding, stride, and filter size, and whose depth is the number of filters.
  • Input -> many convolution layers -> flattening -> fully connected layer -> softmax -> outputs (shapes traced in the sketch below).
  • Types of layers in a convolutional network: convolution (conv) + pooling (pool) + fully connected (fc)
A Convolution Network. Source: deeplearning.ai C4W1L08
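
A small shape-tracing sketch; the numbers follow the lecture's example network (39x39x3 input) as best I recall it:

```python
def trace_shapes(n, n_c, layers):
    """layers: list of (f, s, p, n_filters) per conv layer."""
    for f, s, p, n_f in layers:
        n = (n + 2 * p - f) // s + 1   # height/width after this layer
        n_c = n_f                      # depth = number of filters
        print(f"-> {n} x {n} x {n_c}")
    print(f"flatten -> {n * n * n_c} units, then fc -> softmax")

trace_shapes(39, 3, [(3, 1, 0, 10),    # -> 37 x 37 x 10
                     (5, 2, 0, 20),    # -> 17 x 17 x 20
                     (5, 2, 0, 40)])   # -> 7 x 7 x 40, flatten = 1960
```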

Pooling

  • Max operator: break the image into regions and pick the max value of each region (sketched below).
  • Hyperparameters: stride (s) and filter size (f); they do not need to be learned.
  • The max operator preserves the key features: if a feature disappears in the output, it was not a “key” feature.
Example of Max pooling for f= 2, s= 2. Source: deeplearning.ai C4W1L09
  • The average operator can also be used in the pooling step (average pooling).
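
A pooling sketch (my own illustration) on an f = 2, s = 2 example like the lecture's; swapping np.max for np.mean gives average pooling:

```python
import numpy as np

def pool2d(image, f=2, s=2, op=np.max):
    """Pooling: no parameters to learn, only hyperparameters f and s."""
    n = image.shape[0]
    m = (n - f) // s + 1
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            out[i, j] = op(image[i*s:i*s+f, j*s:j*s+f])
    return out

image = np.array([[1, 3, 2, 1],
                  [2, 9, 1, 1],
                  [1, 3, 2, 3],
                  [5, 6, 1, 2]], dtype=float)
print(pool2d(image))                  # [[9. 2.] [6. 3.]]
```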

CNN example:

  • Andrew Ng's convention: convolution + pooling = one layer, since pooling has no weights to learn.
  • LeNet-5: Input -> layer 1 -> layer 2 -> fully connected layer 3 -> fully connected layer 4 -> softmax -> output
LeNet-5 structure. Source: deeplearning.ai C4W1L10
  • Activation sizes decrease quickly, while the number of parameters increases (still far smaller than fully connected weights); a rough count is sketched below.
Network size in each step. Source: deeplearning.ai C4W1L10
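
A rough parameter count for the LeNet-5-style network above (layer shapes as I recall them from the lecture; pooling layers add zero parameters):

```python
def conv_params(f, n_c_prev, n_f):
    return (f * f * n_c_prev + 1) * n_f   # +1 bias per filter

def fc_params(n_in, n_out):
    return n_in * n_out + n_out           # weights + biases

print(conv_params(5, 3, 8))    # conv1: 608
print(conv_params(5, 8, 16))   # conv2: 3,216
print(fc_params(400, 120))     # fc3: 48,120  <- parameters jump at the fc
print(fc_params(120, 84))      # fc4: 10,164
print(fc_params(84, 10))       # softmax: 850
```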

Why CNN?

  • Parameter sharing and sparsity of connections.
  • Parameter sharing: the same filter cube is reused across areas/parts of the input image, learning data-dependent filters. A feature detector useful in one part of the image is also useful in other parts. Fewer parameters.
  • Sparsity of connections: each output element depends on only a small number of inputs, since it is the result of convolving the filter with one part of the image. Less overfitting, and better robustness to translation (translation invariance).
  • How to train? Define a “cost” function and run gradient descent.
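
The lecture's closing comparison (numbers as I recall them) makes the savings concrete: fully connecting a 32x32x3 input to a 28x28x6 output needs millions of weights, while six 5x5x3 conv filters need a few hundred parameters:

```python
n_in = 32 * 32 * 3            # 3,072 input values
n_out = 28 * 28 * 6           # 4,704 output values

print(n_in * n_out)           # ~14.1 million weights if fully connected
print((5 * 5 * 3 + 1) * 6)    # 456 parameters for 6 conv filters of 5x5x3
```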
