- Architecture Overview
- ConvNet Layers
- ConvNet Kernel
- CIFAR-10 Example
1. Architecture Overview
Training a ConvNet means learning the weights and biases through several types of layers (convolutional, pooling, normalization, and fully-connected layers): aggregate the values, feed them into a loss function (usually softmax), propagate the error backwards to update the weights, and repeat the loop.
The whole mapping is non-linear, from the raw image pixels on one end to class scores at the other.
Below is the typical ConvNet process:
In real training, an activation function is inserted between layers (except around the fully-connected layers at the end), so the process is: Conv -> ReLU -> Pool -> Conv -> ReLU -> Pool -> FC -> FC.
A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
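The layer API above can be sketched as plain functions, each taking a 3D volume and returning a new 3D volume (a naive NumPy sketch with random weights, for shape intuition only; not an efficient or trained implementation):

```python
import numpy as np

def conv(x, filters):
    """Naive valid convolution, stride 1. filters: (n, fh, fw, depth)."""
    H, W, _ = x.shape
    n, fh, fw, _ = filters.shape
    out = np.zeros((H - fh + 1, W - fw + 1, n))
    for k in range(n):
        for i in range(H - fh + 1):
            for j in range(W - fw + 1):
                out[i, j, k] = np.sum(x[i:i + fh, j:j + fw, :] * filters[k])
    return out

def relu(x):
    # Keeps positive values, zeroes out negatives; shape unchanged.
    return np.maximum(0, x)

def pool(x):
    # 2x2 max pooling with stride 2: halves width and height.
    H, W, D = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2, D).max(axis=(1, 3))

def fc(x, w):
    # Fully-connected: flatten the volume, then matrix-multiply.
    return x.reshape(-1) @ w

x = np.random.randn(32, 32, 3)                        # CIFAR-10-sized input
x = pool(relu(conv(x, np.random.randn(8, 5, 5, 3))))  # Conv -> ReLU -> Pool
print(x.shape)   # (14, 14, 8)
scores = fc(x, np.random.randn(14 * 14 * 8, 10))      # 10 class scores
print(scores.shape)  # (10,)
```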
2. ConvNet Layers
The layers of a ConvNet have neurons arranged in 3 dimensions: width, height, and depth. E.g., the input images in CIFAR-10 form an input volume of dimensions 32*32*3, where the depth of 3 represents the RGB channels. The final output layer has dimensions 1*1*10, because by the end of the ConvNet architecture we reduce the full image to a single vector of class scores, arranged along the depth dimension.
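As a quick sanity check, these two volumes can be written down directly (a trivial NumPy sketch; the zero arrays are placeholders, not real data):

```python
import numpy as np

# A CIFAR-10 input volume: 32x32 pixels, 3 color channels (RGB).
image = np.zeros((32, 32, 3))
print(image.shape)   # (32, 32, 3)

# The final output volume: a single 1x1x10 vector of class scores.
scores = np.zeros((1, 1, 10))
print(scores.shape)  # (1, 1, 10)
```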
Convolutional Layer
This layer is the core of a ConvNet; here is a visual demo showing how it works.
The Conv Layer's parameters consist of a set of learnable filters:
- A filter is a small matrix of dimensions width * height * input depth.
- A filter's hyperparameters are its size, stride, and zero-padding.
- All neurons in the same depth slice of the output share the same filter (parameter sharing).
- Each filter learns to detect one type of feature.
Each filter slides across the input volume and produces one depth slice of the output. E.g., for a 10*10*3 input volume with 10 filters of size 2*2*3 (stride 1, no padding), the output volume is 9*9*10.
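The spatial output size follows the standard formula (W - F + 2P)/S + 1, where W is the input width, F the filter size, P the zero-padding, and S the stride. A quick check of the example above:

```python
def conv_output_size(w, f, p=0, s=1):
    # (input width - filter size + 2 * padding) / stride + 1
    return (w - f + 2 * p) // s + 1

# 10x10x3 input, ten 2x2x3 filters, stride 1, no padding:
side = conv_output_size(10, 2)
print(side, side, 10)  # 9 9 10 -> the output volume is 9*9*10
```

Note that the depth of the output (10) is simply the number of filters, not something computed from the spatial formula.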
ReLU Layer
This layer is an activation layer (see the activation chapter of this blog for details). It does not change the dimensions of the volume: it keeps positive values unchanged and zeroes out negative values.
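A one-line NumPy check of this behavior:

```python
import numpy as np

x = np.array([[-2.0, 3.0],
              [0.5, -1.0]])
y = np.maximum(0, x)  # ReLU: -2.0 and -1.0 become 0; 3.0 and 0.5 pass through
print(y.shape)  # (2, 2): the dimensions are unchanged
```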
Pooling Layer
There are two types of pooling layers: max pooling and average pooling; when we mention a pooling layer, max pooling is meant by default. This layer performs a downsampling operation along the spatial dimensions (width, height).
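A small sketch of 2x2 max pooling with stride 2, using a reshape trick (this assumes the input side lengths are even):

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
# Group the 4x4 grid into 2x2 blocks, then take the max of each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 8.]
               #  [3. 4.]]
```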
Fully-Connected Layer
This is the final layer; it combines the previous layer's activations with its weights to produce a score for each class.
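A minimal sketch of what this layer computes (the random weights are placeholders; a trained network would have learned them):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))        # activations from the last pool layer
w = rng.standard_normal((4 * 4 * 8, 10))  # one weight column per class
b = np.zeros(10)                          # one bias per class

scores = x.reshape(-1) @ w + b            # flatten, then affine transform
print(scores.shape)  # (10,): one score per class
```

This is why frameworks ask for an output size when building a fully-connected layer: it is just the number of columns in `w`, i.e. the number of classes (10 for CIFAR-10).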
- Why does TensorFlow's fully-connected layer take an output_num parameter? How do you produce a specific number of outputs?
- https://stackoverflow.com/questions/35788873/what-is-the-output-of-fully-connected-layer-in-cnn/35789367
Normalization Layer
This layer is rarely used in practice...
- A ConvNet architecture is, in the simplest case, a list of Layers that transform the image volume into an output volume (e.g. holding the class scores).
- There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular).
- Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function.
- Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don't).
- Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn't).
3. ConvNet Kernel
A kernel is also called a filter; it extracts image features. But which features we want is unknown in advance, so our goal is to feed in the dataset and train the filters, adjusting them on every epoch.
Here are some simple kernels:
Average Kernel
Gaussian Kernel
Edge Kernel
Sharpen Kernel
identity kernel + (identity kernel - Gaussian kernel) = sharpen kernel (unsharp masking)
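These kernels can be written out explicitly. A sketch in NumPy; the sharpen kernel here is derived by unsharp masking (sharpen = identity + (identity - Gaussian)), which is one common construction among several:

```python
import numpy as np

identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=float)
average = np.full((3, 3), 1 / 9)                    # box blur: equal weights
gaussian = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], dtype=float) / 16  # Gaussian blur: center-weighted
edge = np.array([[0,  1, 0],
                 [1, -4, 1],
                 [0,  1, 0]], dtype=float)          # Laplacian edge detector

# Unsharp masking: add back the detail that the blur removes.
sharpen = identity + (identity - gaussian)
print(sharpen.sum())  # 1.0: the kernel preserves overall brightness
```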
4. CIFAR-10 Example
I have already added comments to the code; click through to check the details.