## Deep Learning

# Exploring Neural Networks

## Understanding the concepts behind neural networks.

# Introduction ☕

Our brain is one of the wonders of the world that we are still trying to understand. We have been curious about its functioning and complexity for a long time. At present we are able to understand it to some extent, but there is a long way to go! On the quest to understand more about the brain and its wonderful capabilities, we were inspired to develop something that we today call — **Artificial Intelligence**.

“Our intelligence is what makes us human, and A.I. is an extension of that quality.” — Yann LeCun, Professor, New York University

We will start from the basics and cover all the parts needed to understand neural networks in depth. This part of the series contains a brief introduction to common terms that you may have come across, like deep learning, machine learning, artificial neural networks, etc.

In the following paragraphs, I have also discussed why Machine Learning and Deep Learning are needed for tasks that traditional programming could not achieve. Common doubts and terms are also discussed, explained in the simplest manner so that a beginner can follow.

# Motivation 😇

Are computers smart?

Consider images of a handwritten digit of the number ‘1’ that a young child is trying to learn. The child learns by looking at the images (*for example,* fig (a)) after viewing them many times. Next time, even if we show the number ‘1’ in a different handwriting style (*for example,* fig (b)), the child will recognize that it’s ‘1’. We don’t have to teach the child every unique handwriting style in the world, or every case with a different position. The brain learns even though the specific pixel values of the two images differ.

The particular pixel intensities that our visual system receives from the two images are different. But still, we don’t have any difficulty in recognizing them both as the number ‘1’. Similarly, anything that we sense is not constrained to the exact set of examples that we first come across while learning it.

What if we have to teach the same task to *computers* so that they recognize the images of digits? Computers are dumb, they don’t have such a complex system (yet). The obvious way would be… to develop a program!

But then we would have to consider every possible step and all the minute details! Wouldn’t that be so difficult?

In general, programming ( — **Traditional programming**) is writing a function that takes inputs and maps them to outputs. *Traditional programming* is tedious, daunting, and limited here. A better solution is needed for such a challenging task.

Soon we came up with a method that we call — **Machine Learning**. Machine Learning models are further classified into supervised, unsupervised, and reinforcement learning. The *Machine Learning* technique is a far better approach for such challenging tasks, but it needs lots of data for accurate results. The idea of *Machine Learning* is,

Instead of telling computers the exact details of what to do, we now let the program figure out how to solve the problem by itself.

There are certain performance metrics that give an idea of how wrong the model’s predictions are. Based upon the metric’s score, the parameters are updated. The model follows a function that maps a combination of inputs and parameters to the output. This is generally called the *hypothesis* of the model.

Consider a simple model which is defined as follows,

`Output(Y) = Weights(W)*Inputs(X) + Biases(B)`

Initially, data is split into train and test sets. The first part is **training**, at the end of which we have a set of parameters (*weights and biases*) with low error and good accuracy.

The function above represents a line in 2-dimensional space. While training, our model calculates the output (Y) for a set of inputs from the train data; these are the *predictions* of the model. The difference between the predictions and the actual outputs (Y) represents the *error*. If we get a large error, it means our model performs poorly. Hence, we then choose another set of weight (W) values and train again. Whether our model is good or not is decided by measuring its performance, after training, on unseen data (test data), also called **predicting**.
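The training loop described above can be sketched in a few lines. This is only a minimal illustration, assuming mean squared error as the metric and plain gradient descent as the update rule (illustrative choices), with the hypothesis `Y = W*X + B` from above:

```python
import numpy as np

# Toy data generated from a known line: y = 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X + 1.0

W, B = 0.0, 0.0   # parameters start at arbitrary values
lr = 0.05         # learning rate (step size)

for _ in range(2000):
    pred = W * X + B                   # hypothesis: predictions for the inputs
    error = pred - Y                   # how far the predictions are from the targets
    W -= lr * (2 * error * X).mean()   # gradient of mean squared error w.r.t. W
    B -= lr * (2 * error).mean()       # gradient of mean squared error w.r.t. B

print(round(W, 2), round(B, 2))  # prints 2.0 1.0 — the true slope and intercept
```

On this noiseless toy data the parameters recover the generating line exactly; with real data they would only approximate it.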

The model described above, when its simple function is replaced with layers of neural networks (a neural network architecture), is what we call — **Deep Learning**.

# Deep Learning 🤖

What exactly is deep learning? What is so special about it?

*Deep Learning* is a subfield of *Machine Learning* that uses neural network architectures.

The ‘neural network’ is inspired by the cells present in the brain, named ‘**neurons**’, which are responsible for transmitting information from one part to another. The notion here is to find a function flexible enough that it is not limited to one specific task. Neural networks are a great fit for such tasks: their ability to approximate a very wide class of functions is established by the **Universal Approximation Theorem**.

I have written a post on it, check below for more details,

# Questionnaire 🤔

Answering What, Why, and How.

Here we discuss a few questions that generally arise among beginners. (*Complete, in-depth theory can be found in the resources mentioned at the end.*)

## — What is a neuron?

A “function” that takes inputs (x = {x1, x2, x3, …, xn}) as a vector and maps them to a scalar output (y) using weights (w = {w1, w2, w3, …, wn}) and a non-linear function (a).

Its name is inspired by the ‘neurons’ present in our brain, which are responsible for transmitting information from one part to another.
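As a toy illustration, a single neuron can be written as a small function. The sigmoid is assumed here as the non-linear function a (an illustrative choice; any activation works the same way):

```python
import math

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, passed through a non-linearity."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # w · x + b
    return 1.0 / (1.0 + math.exp(-z))             # non-linear function a(z): sigmoid

y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
# z = 0.5*1 + (-0.25)*2 + 0 = 0, and sigmoid(0) = 0.5
print(y)  # 0.5
```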

## — What is a neural network?

A neural network is layers of neurons connected to each other with sets of weights. The connections flow from the input vector to the output through the hidden layer(s).

## — What is the structure of the neural network? What does it consist of?

A neural network consists of an input layer, hidden layer(s), and an output layer. The input layer consists of input vectors connected to the hidden layer’s neurons, with a set of weights applied on each connection. Further, the hidden layer’s neurons are connected to the output layer’s neurons with another set of weights on each connection. Each layer’s output goes through a non-linear function; these outputs, passed from one layer to another, are also called ‘activations’.
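The structure just described can be sketched as a single forward pass. The layer sizes and randomly initialized weights below are arbitrary, chosen purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)         # input vector (3 features)

W1 = rng.normal(size=(4, 3))   # weights: input layer -> hidden layer (4 neurons)
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))   # weights: hidden layer -> output layer (1 neuron)
b2 = np.zeros(1)

h = sigmoid(W1 @ x + b1)       # hidden layer 'activations'
y = sigmoid(W2 @ h + b2)       # output layer activation

print(y.shape)                 # (1,) — a single output between 0 and 1
```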

## — What is neural network architecture?

The arrangement of neural network layers stacked one upon another. These contain activation functions and various mathematical operations between the layers. The number of layers is generally high.

For example, **LeNet**, a *CNN (Convolutional Neural Network)*, was one of the first architectures to spread the buzz of deep learning around the world. Its architecture can be viewed below,

The blue blocks represent mathematical operations (Convolution, Average Pooling). ‘FC’ stands for Fully Connected Layers.

## — What is a “non-linear” function?

Non-linear functions are ones whose slope varies across their domain. These are used to clip off outputs or to ‘activate’ the outputs passed to the next layer (e.g., from the input to the hidden layer).

For example, one such widely used function is the sigmoid, which is defined as,

`sigmoid(x) = 1 / (1 + exp(-x))`

where x is the input. Its plot is as follows,
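A minimal implementation of the sigmoid, showing how it ‘clips’ large positive and negative inputs toward 1 and 0:

```python
import math

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^(-x)) — squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5 — the midpoint of the curve
print(sigmoid(6))    # close to 1: large positive inputs are clipped toward 1
print(sigmoid(-6))   # close to 0: large negative inputs are clipped toward 0
```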

## — Why do we need non-linearity?

Non-linearity is needed to produce a non-linear decision boundary via non-linear combinations of the weights and inputs.

A linear function may not give a good boundary for a data set with such noise (observe below). This is clearer from the following decision boundary plot,

## — What role does data play in deep learning?

It’s all about data!

The better the quantity and quality (variety) of the data we train on, the better the accuracy of our neural network.

## — How far have we reached? How intelligent are computers?

Computers now perform impressively in various areas such as healthcare, industry, and banking. Some hot research areas are:

- Computer Vision
- Automatic Text Generation
- Automatic Machine Translation
- Natural Language Processing

One such example of a recent development is GPT-3 by OpenAI.

## — Are neural networks recent advances in technology?

No! They have been with us since the 1940s. They have gained more attention due to recent technological advancements (GPUs, TPUs, and CPUs). Also, the availability of huge volumes of data has played an important role in neural networks’ growth.

## — What tech stack do deep learning practitioners generally use?

The most common frameworks are PyTorch and TensorFlow. The tech stack generally includes PyTorch, fastai, TensorFlow, Keras, Gluon, MXNet, scikit-learn, pandas, and NumPy (not necessarily restricted to these).

# Elements of Neural Network 🔥💦🌱💨

What are the parameters of a neural network?

The elements(parameters) of neural networks can be defined as below —

- **Weights** — These can be considered the ‘knobs’ for tuning a model to give better predictions. The weights are values that decide the importance of a particular input.
- **Biases** — These are added to meet the desired decision boundary. They are like the intercept.
- **Target Function** — The relation between inputs, weights, and biases. It maps inputs to results.
- **Learning Rate** — Controls the step size at each iteration while minimizing the error function.
- **Error** — A performance metric that gives an idea of how wrong our model’s predictions are with respect to the actual labels/ground truth. It is defined so the weights can be tuned with the goal of minimizing it, e.g., Cross-Entropy.
- **Optimizer** — The update rule (function) for tuning the weights toward accurate results by considering the error metric, e.g., Stochastic Gradient Descent.
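As a toy illustration of how these elements interact, here is one parameter update for the simple linear model from earlier, assuming squared error and an SGD-style rule (illustrative choices, with each line mapped to an element above):

```python
weights, bias = 0.0, 0.0
learning_rate = 0.1                  # learning rate: step size of the update

x, target = 2.0, 5.0                 # one (input, ground-truth label) pair

prediction = weights * x + bias      # target function: maps input to result
error = prediction - target          # error: how wrong the prediction is

# Optimizer (SGD-style): nudge each parameter against the error gradient
weights -= learning_rate * 2 * error * x
bias    -= learning_rate * 2 * error

print(weights * x + bias)            # the new prediction moves toward the target
```

On this single toy example one step already lands on the target; with many examples the updates instead average out over the whole data set.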

The neural network *operations* can be subdivided into the following,

- Forward Propagation
- Activation Functions
- Backward Propagation
- Updating rule
- Training
- Predicting

These are discussed briefly with code in the next part of the series.

# Conclusion 😇

Finally, this long introduction comes to an end!

One can check the following resources. You can also reach out to me if you face any difficulties with the explanations given above.

Deep Learning can be performed by anyone! *You don’t need a Ph.D.*

# References and Resources to check 🔎

What to check next?

- http://neuralnetworksanddeeplearning.com
- https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
- http://alexminnaar.com/2015/02/14/deep-learning-basics.html
- All the images and diagrams are generated from my notebook shared here. Also, the next post to be shared contains all the code.

# About Me 😃

Hi! I am **Pratik Kumar**. You can reach me through the following,