Introduction to Deep Learning

Sumeet Agrawal
8 min read · Aug 28, 2021


In recent years, deep learning has become something of a buzzword in the tech community. We constantly hear about it in news about AI, and yet most people don't actually know what it is. In this article, I'll provide an introduction to deep learning and build an intuition for how it works.

Deep learning also powers some of the most interesting applications in the world, such as autonomous vehicles and real-time translation. There was certainly a lot of excitement around Google's deep-learning-based AlphaGo beating the best Go player in the world, but the business applications of this technology are more immediate and potentially more impactful. This post will break down where deep learning fits into the ecosystem, how it works, and why it matters so much today.

So what is deep learning?

Deep learning is a type of machine learning (a subset of it, in fact) and of artificial intelligence (AI) that mimics the way humans gain certain types of knowledge. It is an important element of data science, which also includes statistics and predictive modeling.

Some of the most impressive advances in artificial intelligence in recent years have been in the field of deep learning. Natural language translation, voice and image recognition, facial emotion recognition, and similar tasks are all areas where deep learning models have approached or even exceeded human-level performance.

The intuition behind Deep Learning

Generally, deep learning is a machine learning method (a subset of machine learning) that takes in an input X and uses it to predict an output Y.

For example, say the inputs are images of dogs and cats, and the outputs are labels for those images. In this case, X is the input image and Y is the predicted output, i.e. the label "cat" or "dog".

Image Source — becominghuman.ai/beginners-guide-cnn-image-classifier

How Do Deep Learning algorithms “learn”?

Deep learning algorithms use a structure called a neural network to find associations between a set of inputs and outputs. The basic structure is shown below:

Image Source — https://www.researchgate.net/figure/An-artificial-neural-network

A neural network is composed of an input layer, hidden layers, and an output layer, all of which are made up of "nodes". The input layer receives the input information in the form of text, numbers, audio, image pixels, and so on. In the middle of the network are the hidden layers; there may be a single hidden layer, as in the case of a perceptron, or many of them. The hidden layers perform various mathematical computations on the input data and recognize patterns in it. The output layer delivers the result obtained through the computations of the middle layers.

Image Source — https://www.simplilearn.com/tutorials/deep-learning-tutorial/

Each node in the layers has weights assigned to it. A function called the transfer function calculates the weighted sum of the inputs plus the bias.

After the transfer function has calculated the weighted sum, the activation function takes over. Based on that value, the activation function fires the appropriate result from the node. For example, if the value received is above 0.5, the activation function fires 1; otherwise it fires 0.
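To make this concrete, here is a minimal sketch of a single node in Python with NumPy (the input values, weights, and the 0.5 threshold are illustrative, not from any particular network):

```python
import numpy as np

def transfer(x, w, b):
    """Transfer function: the weighted sum of the inputs plus the bias."""
    return np.dot(w, x) + b

def step_activation(z, threshold=0.5):
    """Fires 1 if the weighted sum exceeds the threshold, otherwise 0."""
    return 1 if z > threshold else 0

x = np.array([0.2, 0.7, 0.1])  # inputs arriving at the node
w = np.array([0.4, 0.9, 0.3])  # tunable weights
b = 0.1                        # tunable bias

z = transfer(x, w, b)          # 0.2*0.4 + 0.7*0.9 + 0.1*0.3 + 0.1 = 0.84
print(step_activation(z))      # -> 1, since 0.84 > 0.5
```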

The information is passed between network layers through these functions. The major point to note here is the tunable weight and bias parameters, represented by w and b respectively. They are essential to the actual "learning" process of a deep learning algorithm.

After the neural network passes its inputs through to its outputs, it evaluates how good its prediction was (relative to the expected output) through something called a loss function, for example the Mean Squared Error (MSE):

MSE = (1/n) Σᵢ (Ŷᵢ − Yᵢ)²

Here Ŷ (Y hat) represents the prediction, while Y represents the expected output. The mean is taken when inputs and outputs are processed in batches (n represents the sample count).
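Translated directly into code, the loss is just the average of the squared differences between predictions and targets (the sample values below are made up for illustration):

```python
import numpy as np

def mse_loss(y_hat, y):
    """Mean Squared Error: the average squared difference between
    the predictions (y_hat) and the expected outputs (y)."""
    return np.mean((y_hat - y) ** 2)

y_hat = np.array([0.9, 0.2, 0.8])  # network predictions (Y hat)
y     = np.array([1.0, 0.0, 1.0])  # expected outputs (Y)
print(mse_loss(y_hat, y))          # -> 0.03
```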

The ultimate goal of the network is to minimize this loss by adjusting its weights and biases. It does so through something called "backpropagation" with gradient descent: the network backtracks through all the layers, moving in the direction opposite to the gradient of the loss, and updates the weights and biases of each node. In simpler words, every iteration of backpropagation should result in a smaller loss than the one before.
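As a rough sketch of what one such iteration looks like for a single linear neuron (using the loss ½(Ŷ − Y)² so the factor of 2 drops out; the learning rate of 0.1 is an arbitrary choice):

```python
import numpy as np

def gradient_descent_step(w, b, x, y, lr=0.1):
    """One backpropagation update for a single linear neuron."""
    y_hat = np.dot(w, x) + b                 # forward pass
    error = y_hat - y                        # dLoss/dy_hat for loss = 0.5*(y_hat - y)**2
    grad_w = error * x                       # chain rule: dLoss/dw
    grad_b = error                           # chain rule: dLoss/db
    return w - lr * grad_w, b - lr * grad_b  # step against the gradient

w, b = np.array([0.5, -0.3]), 0.0
x, y = np.array([1.0, 2.0]), 1.0
for _ in range(20):                          # each iteration shrinks the loss
    w, b = gradient_descent_step(w, b, x, y)
print(np.dot(w, x) + b)                      # prediction is now close to the target 1.0
```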

How does Deep Learning work?

The inspiration for deep learning is the way the human brain filters information; its main motive is to simulate human-like decision-making. Just as neurons in the brain pass signals between one another to perform actions, artificial neurons connect in a neural network to perform tasks such as clustering, classification, or regression. A neural network can also sort unlabeled data according to similarities among the samples. That is the whole idea behind deep learning algorithms.

Neurons are grouped into three different types of layers:

1. Input Layer

2. Hidden Layer

3. Output Layer

Image Source — https://www.janbasktraining.com/blog/deep-learning-with-keras/

1. Input Layer

It receives the input data from the observations (in the form of text, images, audio, etc.). This information is broken down into numbers and bits of binary data, because a computer can only understand data as numbers. Variables need to be either standardized or normalized so they fall within the same range at each node.
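A quick sketch of both options on a made-up feature column:

```python
import numpy as np

x = np.array([12.0, 40.0, 55.0, 3.0])  # raw values of one input variable

x_norm = (x - x.min()) / (x.max() - x.min())  # normalization: rescale into [0, 1]
x_std  = (x - x.mean()) / x.std()             # standardization: zero mean, unit variance

print(x_norm)  # all values now lie between 0 and 1
print(x_std)   # values centered around 0
```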

2. Hidden Layer

In this layer, various mathematical computations are performed on the input data. Deciding the number of hidden layers and the number of neurons in each layer is a challenging process. The hidden layers apply non-linear processing units for feature extraction and transformation, and each successive layer uses the output of the previous layer as its input.

3. Output Layer

In the output layer, we get the desired result.

Consider the example of an image of a face, whose input might arrive as a matrix of pixels. The first layer encodes edges by applying different filters and composing the pixels. The next layer might encode a nose and eyes, or either of them. A later layer might recognize the whole face, and so on.
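A sketch of such a layer stack, written with Keras (this assumes TensorFlow is installed; the image size and filter counts are illustrative, not from any particular model):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stacked layers build features hierarchically: early filters tend to
# pick up edges, deeper layers combine them into parts and whole faces.
model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),          # face image as a matrix of pixels
    layers.Conv2D(16, 3, activation="relu"),  # low-level features (edges)
    layers.Conv2D(32, 3, activation="relu"),  # mid-level features (eyes, nose)
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),    # high-level decision (face or not)
])
model.summary()
```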

The connections between neurons carry weights, and these weights determine the learning ability of the neural network. As the network learns, the weights between the neurons change; the initial weights are set randomly.

To standardize the output from a neuron, we use an "activation function". Each neuron has its own activation function, which also helps normalize the output into a range such as 0 to 1 or -1 to 1.
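Two common examples, one for each range mentioned above:

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into the (0, 1) range."""
    return 1 / (1 + np.exp(-z))

def tanh(z):
    """Squashes any real number into the (-1, 1) range."""
    return np.tanh(z)

z = np.array([-4.0, 0.0, 4.0])
print(sigmoid(z))  # ~[0.018, 0.5, 0.982]
print(tanh(z))     # ~[-0.999, 0.0, 0.999]
```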

How to initialize weights?

The techniques generally used to initialize parameters are:

1. Zero Initialization

2. Random Initialization

1. Zero Initialization

Generally, biases are initialized to 0 and weights are initialized with random numbers. But what if the weights were initialized to 0 as well?

If all weights are initialized to 0, the derivative of the loss function is the same with respect to every weight. This keeps the hidden units symmetric across all n iterations, i.e. setting the weights to 0 makes the network no better than a linear model. An important thing to note is that biases have no effect when weights are initialized to 0.
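A tiny demonstration of the symmetry problem (four hidden nodes, all starting from zero weights):

```python
import numpy as np

x = np.array([0.5, -1.2, 0.8])  # the same input reaches every hidden node

w_zero = np.zeros((4, 3))       # 4 hidden nodes, 3 weights each, all zero
z = w_zero @ x
print(z)  # [0. 0. 0. 0.] -> identical outputs, hence identical gradients,
          # so the nodes can never learn different features
```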

2. Random Initialization

In random initialization, we assign random values to the weights, which is better than simply assigning 0. But there is one issue: if the weights are initialized with very high or very low values, the network can suffer from the vanishing gradient problem.

a) If the weights are initialized with very high values, the weighted sum becomes remarkably large. With an activation function like the sigmoid, the output saturates near 1, where the slope of the gradient changes very slowly and learning takes much more time.

b) If the weights are initialized with very low values, the weighted sum becomes significantly smaller and gets mapped toward 0, and the outcome is the same as above. The sketch below illustrates the saturation effect.
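The saturation is easy to see by scaling the weights up and watching the sigmoid's slope collapse (the scale values here are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_slope(z):
    s = sigmoid(z)
    return s * (1 - s)  # derivative of the sigmoid at z

for scale in (0.01, 1.0, 100.0):  # low, moderate, and high weight scales
    z = scale * 1.5               # weighted sum for a fixed input of 1.5
    print(scale, sigmoid(z), sigmoid_slope(z))
# At scale 100 the output saturates near 1 and the slope is ~0,
# so weight updates become tiny and learning crawls.
```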

How to train a Neural Network

Training a neural network is a difficult task, as it requires a huge data set and a large amount of computational power. Iterating through the observations in the data set and comparing the outputs to the expected results produces a cost function.

There are two processes for minimizing the cost function.

1) Backpropagation

Image Source — https://www.researchgate.net/figure/

Backpropagation is a popular approach built on the gradient descent algorithm. The backpropagation algorithm lets the error flow backward through the network in order to compute the gradient. Once we have the gradient of the error as a function of each neuron's weights, we can propagate the output errors backward to infer errors for the other layers and adjust the weights in the direction that decreases the error.

The chain rule of calculus is used to compute derivatives of functions formed by composing other functions whose derivatives are known. The chain rule states that the right thing to do is simply multiply the gradients together to chain them.
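A one-line instance of the rule, checked numerically (the functions are chosen purely for illustration):

```python
# y = f(g(x)) with f(u) = u**2 and g(x) = 3*x + 1,
# so dy/dx = f'(g(x)) * g'(x) = 2*(3*x + 1) * 3

def g(x):     return 3 * x + 1
def f(u):     return u ** 2
def dg_dx(x): return 3.0
def df_du(u): return 2 * u

x = 2.0
analytic = df_du(g(x)) * dg_dx(x)             # gradients multiplied: 2*7*3 = 42
numeric  = (f(g(x + 1e-6)) - f(g(x))) / 1e-6  # finite-difference check
print(analytic, numeric)                      # both ~42
```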

2) Forward propagation

Information enters at the input layer, moves forward through the hidden layers, and finally reaches the output layer, producing the output value. We then compare that value to the expected result, calculate the error, and propagate this information back toward the input layer to update the weights. This is how the algorithm trains the neural network and refreshes its weights: because of the structured flow, all the weights can be adjusted simultaneously, and we learn which weights in the network are responsible for the error.
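Putting both processes together, here is a minimal training loop for a tiny two-layer network learning XOR. This is a sketch: the seed, layer sizes, learning rate, and epoch count are arbitrary choices, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data: the XOR problem (4 samples, 2 features, binary targets)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)  # hidden -> output

lr = 1.0
for epoch in range(5000):
    # Forward propagation: input layer -> hidden layer -> output layer
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Backpropagation: push the output error back to assign blame per weight
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # Update all weights simultaneously, stepping against the gradient
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid)
    b1 -= lr * d_hid.sum(axis=0)

print(np.round(y_hat.ravel(), 2))  # should approach [0, 1, 1, 0]
```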

Applications of Deep Learning

  1. Self Driving Cars
  2. News Aggregation and Fake News Detection
  3. Natural Language Processing
  4. Virtual Assistants
  5. Entertainment
  6. Visual Recognition
  7. Fraud Detection
  8. Healthcare
  9. Detecting Developmental Delay in Children
  10. Language Translations

Summary

In conclusion, deep learning has given the technology industry a boom in recent times. Having read this article, you now know how deep learning mimics the way our brain works and learns from experience. You also know about the different layers of artificial neural networks, as well as weights, activation functions, and the training of neural networks. I hope this article helps you understand the basics of deep learning and how it works.
