Convolutional Neural Networks for the Rest of Us Part III: Architectures
This is the third and final (yaaaay :) ) essay discussing the principles of convolutional neural networks (CNNs) in deep learning solutions. The first part provided an overview of CNNs and some of the mathematical ideas behind them. The second essay focused on some of the benefits of CNN models compared to traditional neural networks. If by now you are convinced of the advantages of CNNs, then let's discuss how to use them in deep learning models.
Typically, CNNs are not used as standalone algorithms in deep learning models. Instead, they are often combined with other algorithms in order to implement complete deep learning solutions. It is also important to notice that a single deep learning model can have many convolution operations, sometimes executed in parallel. This is because a convolution operation using a single kernel (one of the tensor parameters of the convolution function; see Part I) is likely to be effective at extracting only a single feature. Ideally, we would like to architect deep neural networks so that each layer is able to extract multiple features.
From the architecture standpoint, CNN layers can be seen as a combination of three fundamental stages:
Convolution Stage ==> Detector Stage ==> Pooling Stage
The first two stages are relatively simple to explain, while the third one requires some faith in the magic of statistics :).
1) Convolution Stage
The convolution stage executes a series of convolution operations in order to extract specific features. As mentioned before, many of those convolution operations run in parallel, one per kernel, and together they produce a set of linear activations.
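To make the idea of several kernels working on the same input more concrete, here is a minimal sketch in plain Python. The function name `conv2d_valid`, the toy image, and both kernels are assumptions made up for illustration; in a trained CNN the kernel values are learned rather than written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the linear activations.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Two hypothetical hand-written kernels, each tuned to a single feature.
kernels = {
    "vertical_edge": [[-1, 1], [-1, 1]],
    "horizontal_edge": [[-1, -1], [1, 1]],
}

# Each kernel yields its own feature map. The convolutions do not depend
# on each other, so a real implementation is free to run them in parallel.
feature_maps = {name: conv2d_valid(image, k) for name, k in kernels.items()}

for name, fmap in feature_maps.items():
    print(name, fmap)
```

The vertical edge kernel responds only in the column where dark meets bright, while the horizontal edge kernel stays at zero everywhere because this image has no horizontal edges. That is the single-kernel, single-feature behavior described above.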
2) Detector Stage
In this stage, the linear activations produced by the convolution stage are run through a non-linear activation function such as the rectified linear activation (ReLU). The purpose of this stage is to produce a non-linear output that the statistical functions of the next stage can work on.
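As a rough illustration of the detector stage, the sketch below applies the rectified linear activation to a small grid of linear activations. The names `relu` and `detector_stage` and the sample values are assumptions chosen for this example.

```python
def relu(x):
    """Rectified linear activation: keep positive values, clamp the rest to zero."""
    return max(0.0, x)


def detector_stage(linear_activations):
    """Apply the non-linearity element by element to a 2D feature map."""
    return [[relu(v) for v in row] for row in linear_activations]


# Hypothetical linear activations coming out of the convolution stage.
linear_activations = [
    [-2.0, 0.5],
    [3.0, -0.1],
]

detected = detector_stage(linear_activations)
print(detected)  # [[0.0, 0.5], [3.0, 0.0]]
```

Negative responses are zeroed out and positive ones pass through unchanged, so the shape of the feature map stays the same while the output is no longer a linear function of the input.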
3) Pooling Stage
The pooling stage is the most complex component of a CNN architecture. Conceptually, a pooling stage applies one or more pooling functions to the non-linear output produced by the detector stage. So what is a pooling function?
A pooling function is a mathematical operation that replaces the output at a given location with a summary statistic of the nearby outputs, which makes the result less sensitive to noise in any single value. For instance, a popular pooling function in deep learning models replaces a specific data point with the average of the values in a rectangular neighborhood around it.
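The averaging example can be sketched in a few lines of plain Python. The function name `average_pool`, the 2x2 window, and the stride of 2 are assumptions for illustration; other window sizes and strides are just as valid.

```python
def average_pool(feature_map, size=2, stride=2):
    """Replace each size x size neighborhood with the average of its values."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Collect the rectangular neighborhood that starts at (r*stride, c*stride).
            window = [
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical non-linear output from the detector stage.
feature_map = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 4.0],
    [2.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 1.0, 1.0],
]

pooled = average_pool(feature_map)
print(pooled)  # [[4.0, 1.0], [2.0, 1.0]]
```

With a stride of 2 the windows do not overlap, so the 4x4 input shrinks to a 2x2 summary. With a stride of 1 every position would get its own neighborhood average instead.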
Remember the equivariance property of convolution operations (see Part II)? Well, pooling functions keep their output approximately the same despite small changes in the input, such as a small translation. That property is known as invariance and is one of the most important characteristics of CNNs.
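A small experiment can make invariance more tangible. Note that this sketch swaps in max pooling rather than the average pooling mentioned earlier, because it shows the effect most cleanly, and it works on a one-dimensional signal to keep things short. The name `max_pool_1d`, the window size, and the sample signal are all assumptions for illustration.

```python
def max_pool_1d(values, size=3, stride=3):
    """Replace each window of `size` values with its maximum."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]


# Hypothetical detector output with two "features" at positions 1 and 4.
signal = [0, 5, 0, 0, 2, 0, 0, 0, 0]

# The same signal translated one position to the right.
shifted = [0] + signal[:-1]

print(max_pool_1d(signal))   # [5, 2, 0]
print(max_pool_1d(shifted))  # [5, 2, 0]
```

The input moved, but the pooled output did not, because each feature stayed inside its pooling window. A larger shift would eventually push a feature into the next window and change the output, which is why the invariance only holds for small translations.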
So there you have it. When evaluating a CNN model, remember to think about it in three stages: convolution, detector and pooling. I hope you found the information useful.