Deep Neural Networks. Theory. Convolutional Networks.

Machine Learning & Data Science A-Z Guide.

Dmytro Nasyrov
Pharos Production
3 min read · Jun 19, 2017


LeNet-5

Give us a message if you’re interested in Blockchain and FinTech software development or just say Hi at Pharos Production Inc.

Or follow us on YouTube to learn more about Software Architecture, Distributed Systems, Blockchain, High-load Systems, Microservices, and Enterprise Design Patterns.

Pharos Production YouTube channel

Let’s look at Convolutional Neural Networks (ConvNets). ConvNets are well suited for image classification. What is the difference between a network with fully connected layers and a ConvNet? A fully connected network doesn’t take the spatial structure of the image into account: it treats input pixels that are far apart and pixels that are close together on exactly the same footing, so any spatial structure has to be inferred from the training data. ConvNets instead build on three basic ideas:

  • Local Receptive Fields
  • Shared Weights and Biases
  • Pooling

Local Receptive Fields.

We connect the input pixels to a layer of hidden neurons, as usual, but only in small localized regions of the input, for example a 5x5 region corresponding to 25 pixels. This region is called the local receptive field (or kernel) of the hidden neuron. Note that with a 28x28 input image and a 5x5 kernel, the hidden layer is a 24x24 grid of neurons. The offset by which we slide the kernel is called the stride; we can move the kernel over the inputs with any stride we like: 1, 2, or more. With larger strides the hidden layer shrinks, and its activations look like an increasingly coarse, blurred version of the image.
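The size arithmetic above can be sketched in a few lines of Python (the function name is just for illustration):

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Number of positions a kernel can occupy along one dimension
    of the input (a "valid" convolution, no padding)."""
    return (input_size - kernel_size) // stride + 1

# 28x28 input, 5x5 kernel, stride 1 -> a 24x24 hidden layer
print(conv_output_size(28, 5, stride=1))  # 24
# A larger stride shrinks the hidden layer: stride 2 -> 12x12
print(conv_output_size(28, 5, stride=2))  # 12
```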

Interlayer connections

Shared Weights and Biases.

Each hidden neuron has a bias and a set of weights connecting it to its kernel-sized region of the previous layer, and all hidden neurons share the same weights and bias. This means every neuron in the hidden layer detects the same feature, just at different locations of the input image. To recognize objects in an image we certainly need more than one feature, so a complete convolutional layer consists of several different feature maps, each with its own shared weights and bias.
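How one feature map is computed with shared weights can be illustrated with a minimal NumPy sketch (the helper name and random inputs are just for the example):

```python
import numpy as np

def feature_map(image, kernel, bias):
    """Slide one shared kernel (weights + bias) over the image.
    Every hidden neuron reuses the same weights, so the whole map
    detects the same feature at every location."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) + bias
    return out

image = np.random.rand(28, 28)
kernel = np.random.rand(5, 5)   # 25 shared weights
fmap = feature_map(image, kernel, bias=0.1)
print(fmap.shape)  # (24, 24)
```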

ConvLayer

Pooling.

A ConvNet also contains pooling layers, which sit immediately after convolutional layers. Pooling simplifies the convolutional output. For example, max pooling slides a 2x2 kernel over the convolutional output and keeps the maximum of the 4 values in each region; average pooling instead takes their mean. As a result, the output becomes smaller. The intuition behind pooling is that once a feature has been found, its exact location doesn’t really matter; a rough location is enough. There are other techniques as well, such as L2 pooling, which takes the square root of the sum of squares of the values in the region.
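Max pooling with a non-overlapping 2x2 kernel can be written compactly in NumPy (a small sketch; the helper name is ours):

```python
import numpy as np

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4).astype(float)
print(max_pool(fmap))
# [[ 5.  7.]
#  [13. 15.]]
```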

Pooling

The last pooling layer is connected to a couple of fully connected layers, as in the image at the top of this post. By the way, that image shows the LeNet-5 convolutional network.
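Under the usual LeNet-5 assumptions (a 32x32 input, 5x5 kernels, 2x2 pooling), the per-layer sizes work out as follows; this is just a shape walkthrough, not an implementation:

```python
def conv(size, k=5):  # "valid" convolution, stride 1
    return size - k + 1

def pool(size, k=2):  # non-overlapping pooling
    return size // k

# A LeNet-5-style stack on a 32x32 input (spatial size per feature map)
s = 32
s = conv(s)   # C1: 6 maps of 28x28
s = pool(s)   # S2: 6 maps of 14x14
s = conv(s)   # C3: 16 maps of 10x10
s = pool(s)   # S4: 16 maps of 5x5
print(s)      # 5 -> flattened and fed into fully connected layers
```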

End of Part 3

Thanks for reading!


Dmytro Nasyrov
Pharos Production

We build high-load software. Pharos Production founder and CTO.