From Perceptron to Deep Networks

Deep learning is one of the hottest fields in data science, with many case studies producing marvellous results in robotics, image recognition, and Artificial Intelligence (AI). It is a prominent topic and has been in the spotlight for quite some time now. The perceptron is the basic neural network building block. I will start with an overview and then dive into the different techniques and how they evolved.

Deep learning refers to neural networks with multiple hidden layers that can learn increasingly abstract representations of the input data. For example, deep learning has led to major advances in computer vision. We’re now able to classify images, find objects in them, and even label them with captions. To do so, deep neural networks with many hidden layers can sequentially learn more complex features from the raw input image:

  • The first hidden layers might only learn local edge patterns.
  • Then, each subsequent layer (or filter) learns more complex representations.
  • Finally, the last layer can classify the image as a cat or kangaroo.

These types of deep neural networks are called Convolutional Neural Networks (CNNs); I explain them in detail below. CNNs can drastically reduce the number of parameters that need to be tuned, and can therefore efficiently handle the high dimensionality of raw images.

One of the most powerful and easy-to-use Python libraries for developing and evaluating deep learning models is Keras. It wraps the efficient numerical computation libraries Theano and TensorFlow. The advantage of this is mainly that you can get started with neural networks in an easy and fun way.
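Getting started really is quick. A minimal sketch of defining and compiling a small network with the Keras Sequential API (layer sizes and the 8-feature input are illustrative assumptions; recent versions of Keras ship bundled with TensorFlow):

```python
# Minimal Keras sketch: a small feedforward network for binary classification.
# The layer sizes and 8-dimensional input are illustrative, not prescriptive.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(12, activation="relu", input_shape=(8,)),  # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),                  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

From here, training is a single `model.fit(X, y, epochs=...)` call, which is what makes the library an easy on-ramp.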

Ng’s breakthrough was to take these neural networks, and essentially make them huge, increase the layers and the neurons, and then run massive amounts of data through the system to train it. In Ng’s case it was images from 10 million YouTube videos. Ng put the “deep” in deep learning, which describes all the layers in these neural networks.

Today, machines trained via deep learning are, in some scenarios, better than humans at image recognition: from recognizing cats to identifying indicators for cancer in blood and tumors in MRI scans. Google’s AlphaGo learned the game and trained for its Go match — it tuned its neural network — by playing against itself over and over and over.

A few popular deep learning libraries: Keras, TensorFlow, and Theano.

Thanks to Deep Learning, AI Has a Bright Future

Deep Learning has enabled many practical applications of Machine Learning and, by extension, the overall field of AI. Deep Learning breaks down tasks in ways that make all kinds of machine assists seem possible, even likely. Driverless cars, better preventive healthcare, even better movie recommendations are all here today or on the horizon. AI is the present and the future. With Deep Learning’s help, AI may even get to that science fiction state we’ve so long imagined. You have a C-3PO, I’ll take it. You can keep your Terminator.

Let’s start with a single-neuron network: the perceptron.

To represent this mathematically, let our separator be defined by a vector of weights w and a vertical offset (or bias) b. Then, our function would combine the inputs x and weights with a weighted-sum transfer function:

z = w · x + b

The result of this transfer function would then be fed into an activation function to produce a labeling.
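The two steps — weighted sum, then activation — can be sketched in a few lines of NumPy. The function name and the AND-gate weights below are illustrative choices, not from the article:

```python
import numpy as np

# A minimal perceptron: weighted-sum transfer z = w·x + b,
# followed by a step activation that produces a 0/1 label.
def perceptron(x, w, b):
    z = np.dot(w, x) + b          # transfer function: weighted sum plus bias
    return 1 if z > 0 else 0      # step activation: the labeling

# Hand-picked weights that realize logical AND, a linearly separable function:
w, b = np.array([1.0, 1.0]), -1.5
labels = [perceptron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(labels)   # [0, 0, 0, 1]
```

With these weights the neuron fires only when both inputs are 1, i.e. it draws a single straight line separating (1, 1) from the other three points.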

The single-perceptron approach to deep learning has one major drawback: it can only learn linearly separable functions. The solution is the multilayer perceptron, i.e. a feedforward neural network. A neural network is really just a composition of perceptrons, connected in different ways and operating on different activation functions. The most common deep learning algorithm for supervised training of multilayer perceptrons is known as backpropagation. The basic procedure:

  1. A training sample is presented and propagated forward through the network.
  2. The output error is calculated, typically the mean squared error:

     E = ½ Σ (t − y)²

     where t is the target value and y is the actual network output. Other error measures are also acceptable, but the MSE is a good choice.

  3. The network error is minimized using a method called stochastic gradient descent.
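The three steps above can be sketched in plain NumPy for a tiny two-layer network learning XOR, a function a single perceptron cannot represent. The architecture (2-4-1), learning rate, and iteration count are illustrative, and for brevity this uses full-batch rather than stochastic gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)      # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)      # output layer

def forward(X):
    h = sigmoid(X @ W1 + b1)       # step 1: propagate the sample forward
    y = sigmoid(h @ W2 + b2)
    return h, y

_, y0 = forward(X)
loss0 = np.mean((t - y0) ** 2)     # step 2: mean squared error

for _ in range(5000):              # step 3: gradient descent
    h, y = forward(X)
    dy = (y - t) * y * (1 - y)     # backpropagate the error layer by layer
    dh = (dy @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ dy; b2 -= 0.5 * dy.sum(0)
    W1 -= 0.5 * X.T @ dh; b1 -= 0.5 * dh.sum(0)

_, y1 = forward(X)
loss1 = np.mean((t - y1) ** 2)     # far smaller than loss0 after training
```

The hidden layer is what buys the nonlinearity: the same loop with no hidden layer cannot get the XOR error below chance.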

However, increasing the number of hidden layers leads to two known issues:

  1. Vanishing gradients: as we add more and more hidden layers, backpropagation becomes less and less useful in passing information to the lower layers. In effect, as information is passed back, the gradients begin to vanish and become small relative to the weights of the networks.
  2. Overfitting: perhaps the central problem in machine learning. Briefly, overfitting describes the phenomenon of fitting the training data too closely, maybe with hypotheses that are too complex. In such a case, your learner ends up fitting the training data really well, but will perform much, much more poorly on real examples.
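The vanishing-gradient effect is easy to see numerically: the sigmoid’s derivative is at most 0.25, so a gradient passed back through many sigmoid layers is repeatedly scaled by small factors. A minimal illustration (the 10-layer depth is an arbitrary choice):

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)         # maximum value is 0.25, attained at z = 0

# Best case: every layer scales the backpropagated gradient by at most 0.25.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_prime(0.0)   # 0.25 per layer, the largest possible factor

print(grad)   # 0.25**10 ≈ 9.5e-07: almost no signal reaches the lowest layers
```

In practice the factors are usually well below 0.25, so the situation is even worse than this best case.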

Let’s look at some deep learning algorithms to address these issues.


Autoencoders

An autoencoder is typically a feedforward neural network that aims to learn a compressed, distributed representation (encoding) of a dataset.
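A tiny linear autoencoder sketch in NumPy shows the idea: squeeze 4-dimensional inputs through a 2-dimensional bottleneck and train the weights to minimize reconstruction error. All sizes, rates, and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                # toy dataset
W_enc = rng.normal(scale=0.1, size=(4, 2))   # encoder: 4 -> 2 (the bottleneck)
W_dec = rng.normal(scale=0.1, size=(2, 4))   # decoder: 2 -> 4

def reconstruction_error(X):
    code = X @ W_enc              # compressed representation (the encoding)
    X_hat = code @ W_dec          # reconstruction of the input
    return np.mean((X - X_hat) ** 2)

err0 = reconstruction_error(X)
for _ in range(500):              # gradient descent on the reconstruction error
    code = X @ W_enc
    X_hat = code @ W_dec
    d = 2 * (X_hat - X) / X.size
    grad_dec = code.T @ d
    grad_enc = X.T @ (d @ W_dec.T)
    W_dec -= 0.1 * grad_dec
    W_enc -= 0.1 * grad_enc
err1 = reconstruction_error(X)    # well below err0 after training
```

Because the bottleneck is narrower than the input, the network cannot simply copy the data; it is forced to learn a compressed representation.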


Restricted Boltzmann Machines

A Restricted Boltzmann Machine (RBM) is a generative stochastic neural network that learns a probability distribution over its set of inputs.
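RBMs are commonly trained with contrastive divergence. A sketch of one CD-1 step for a tiny RBM follows; the layer sizes, learning rate, and function name are illustrative assumptions, and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(6, 3))     # 6 visible units, 3 hidden units
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, lr=0.1):
    # Up pass: probability each hidden unit turns on, then a binary sample.
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Down pass: reconstruct the visible units, then re-infer the hiddens.
    p_v1 = sigmoid(h0 @ W.T)
    p_h1 = sigmoid(p_v1 @ W)
    # Update: data-driven statistics minus reconstruction statistics.
    return W + lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))

v0 = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # one binary input vector
W = cd1_step(v0, W)
```

Repeating this step over many inputs nudges the model distribution toward the data distribution without ever computing the intractable exact gradient.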

The algorithms above were effective feature detectors, but the features they learn could rarely be used directly to solve practical business problems. Hence came stacked autoencoders and stacked RBMs, i.e. Deep Belief Networks.

A CNN is a special class of feedforward network that is very well suited to image recognition. These networks apply a number of filters to the input. For example, the first convolutional layer of the network could have four 6x6 filters. The result of one filter applied across the image is called a feature map (FM), and the number of feature maps equals the number of filters. If the previous layer is also convolutional, the filters are applied across all of its FMs with different weights, so each input FM is connected to each output FM. The intuition behind sharing weights across the image is that features will be detected regardless of their location, while the multiplicity of filters allows each of them to detect a different set of features.
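The filter-to-feature-map relationship can be sketched directly in NumPy: four 6x6 filters applied (valid convolution, stride 1) to a 32x32 image yield four feature maps. The random weights are placeholders for what training would learn:

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.normal(size=(32, 32))        # a single-channel input image
filters = rng.normal(size=(4, 6, 6))     # four 6x6 filters, as in the example

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    H = image.shape[0] - kh + 1          # output shrinks by kernel size - 1
    W = image.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # Same 6x6 weights slide across every position: weight sharing.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

feature_maps = np.stack([conv2d_valid(image, f) for f in filters])
print(feature_maps.shape)   # (4, 27, 27): one 27x27 map per filter
```

Note the parameter count: four 6x6 filters is just 144 weights, versus hundreds of thousands for a fully connected layer on a 32x32 input — this is the parameter reduction mentioned earlier.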

Subsampling layers reduce the size of the input. For example, if the input consists of a 32x32 image and the layer has a subsampling region of 2x2, the output value would be a 16x16 image, which means that 4 pixels (each 2x2 square) of the input image are combined into a single output pixel. There are multiple ways to subsample, but the most popular are max pooling, average pooling, and stochastic pooling.
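The 32x32-to-16x16 example above, using max pooling, is a one-liner in NumPy via a reshape trick:

```python
import numpy as np

rng = np.random.default_rng(4)
image = rng.normal(size=(32, 32))

def max_pool_2x2(x):
    H, W = x.shape
    # Group the pixels into 2x2 squares, then keep the maximum of each square.
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

pooled = max_pool_2x2(image)
print(pooled.shape)   # (16, 16)
```

Average pooling would simply replace `.max(axis=(1, 3))` with `.mean(axis=(1, 3))`.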

The last subsampling (or convolutional) layer is usually connected to one or more fully connected layers, the last of which represents the target data.

Training is performed using modified backpropagation that takes the subsampling layers into account and updates the convolutional filter weights based on all values to which that filter is applied.
