Introduction to Neural Networks

Neesh AI · Jan 14, 2023


Welcome to NeeshAi

The history of Neural Networks

You saw the title of this blog and are probably eager to jump straight into backpropagation, gradient descent, or hands-on applications to real datasets. But before we get there, isn’t it a good idea to see how neural nets originated in the first place? As the saying goes, don’t become a mere recorder of facts, but try to penetrate the mystery of their origin.

The history of neural networks goes back as far as 1943, when McCulloch and Pitts introduced the concept of the artificial neuron based on their understanding of neurology. It was an extremely simple artificial neuron whose output was either a zero or a one. Here is the simple model they proposed.

Fig. 1- http://aishack.in/tutorials/artificial-neurons-mccullochpitts-model/

W represents a weight: a weight of +1 represents an excitatory input and a weight of -1 an inhibitory input. X1, X2, and X3 are the inputs to the model. A sum is calculated by multiplying each input by its corresponding weight.

Sum = x1w1 + x2w2 + x3w3 + …

This sum is called the weighted sum. A threshold value is then chosen: if the weighted sum is greater than the threshold, the output is 1; otherwise it is 0.
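As a minimal sketch of the idea (the inputs, weights, and threshold below are illustrative values of my own choosing, not ones from the original paper), the whole neuron fits in a few lines of Python:

```python
# Minimal sketch of a McCulloch-Pitts neuron: weighted sum followed by a
# hard threshold. All numbers below are illustrative, not from the 1943 paper.

def mcculloch_pitts_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Two excitatory inputs (weight +1) and one inhibitory input (weight -1)
print(mcculloch_pitts_neuron([1, 1, 0], [1, 1, -1], threshold=1))  # -> 1
print(mcculloch_pitts_neuron([1, 1, 1], [1, 1, -1], threshold=1))  # -> 0
```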

After that, many people attempted to produce models that imitated biological neurons, with little or no success. The first real success came in 1954, when Belmont Farley and Wesley Clark of MIT succeeded in running the first simple neural network. Farley and Clark were able to train networks containing at most 128 neurons to recognize simple patterns.

Rosenblatt (1958) took considerable interest in the field and succeeded in designing and developing the Perceptron. The Perceptron had three layers; the middle layer was known as the association layer. This system was able to connect, or associate, a given input with a random output unit.

Fig.2 — https://www.researchgate.net/figure/The-structure-of-Rosenblatts-Perceptron_fig6_271841595

This model could learn how to sort simple images into categories such as triangles and squares.
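The article doesn’t spell out how the Perceptron was trained, so take the following as a hedged sketch of the classic perceptron learning rule rather than Rosenblatt’s exact system; the toy data, learning rate, and epoch count are my own illustrative choices.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Classic perceptron rule: nudge the weights whenever a prediction is wrong."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            error = target - pred        # 0 if correct, +1 or -1 if wrong
            w += lr * error * xi         # move the decision boundary toward the mistake
            b += lr * error
    return w, b

# Toy example: learn the logical AND function (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```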

Although public interest and available funding were minimal, several researchers continued working to develop models for problems such as pattern recognition.

In 1974, Paul Werbos introduced the idea of backpropagation in his doctoral dissertation; it remains probably the most well-known and widely applied algorithm in neural networks to this day.

Fig. 3- http://www.scaruffi.com/mind/ai.html

In the late 1970s and early 1980s, comprehensive books and conferences provided a forum for people from diverse fields with specialized technical languages, funding became available throughout Europe, Japan, and the US, and all of this led to a vibrant emerging field of neural networks. In 1979, Kunihiko Fukushima came up with the Neocognitron, the groundbreaking idea behind convolutional neural networks, which proved extremely useful for visual tasks such as image identification.

In 1998, R. S. Sutton and A. G. Barto published Reinforcement Learning: An Introduction, consolidating the field of reinforcement learning, one of the most active research areas in artificial intelligence today. Reinforcement learning works on a reward-based mechanism: the model is rewarded when it performs well and penalized otherwise, so it ultimately learns its task by trial and error.

The evolution continued throughout this journey, and as computational power increased tremendously with the advent of distributed and GPU systems, neural networks could be trained faster and faster and took the shape we see today.

A Resurgence of Neural Networks

Fig.4

While all this was happening, neural networks never became a major tool in industry and remained largely confined to research. There were various issues with the backpropagation algorithm: if someone tried to extend a network beyond two or three hidden layers, training often got stuck in local optima. Because multiple hidden layers could not be used effectively, results on benchmark datasets were poor. The major issue was the initialization of weights in a deep network.

The availability of data was also very limited, and nobody bothered to collect and store data because nobody was familiar with the kind of benefits that analyzing it could bring.

Now companies and industries have realized the benefits of collecting data, and large unstructured training datasets, what we today call big data, are widely available. GPUs, which pack thousands of relatively simple processing cores onto a single chip, enable the smooth training of the 10-, 25-, even 50-layer networks of today, compared with the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s. This is the concept called “deep learning”: the “deep” refers to the depth of the network’s layers. Currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research. Large-scale networks can be built and trained in very little time, and all of this has led to ground-breaking improvements across a variety of applications, including image classification, video analytics, speech recognition, and natural language processing.

We can’t connect the dots looking forward; we can only connect them looking backwards. So when we look back at the history discussed earlier and try to connect the dots, everything starts to make sense, and we realize how those small contributions made deep learning so deep.

Large companies like Google and IBM are betting big on deep learning, given the kind of datasets they have and what they are gaining from them. More and more companies are looking to provide smarter solutions for their customers, and new AI-related companies offering these solutions are emerging rapidly. Industries that don’t want to be left behind have to adapt to this change. Fraud detection, medical diagnostics, personal assistants, defense: name the field, and deep learning has certainly created an impact there. Believe it or not, the future belongs to it.

Why the name “Deep Learning”?

Before we start with what deep learning does, let’s break down the two terms “deep” and “learning”. We will come back to the term “deep” later; first, let’s understand the meaning of “learning”. Here, “learning” stands for learning through an artificial neural network, the structure we have been talking about for so long, and you probably already have a picture in your mind of what it looks like.

So here is a simple artificial neural network.

Fig. 5- https://www.tutorialspoint.com/artificial_intelligence/images/atypical_ann.jpg

To put it simply, it is a rough mimic of the neurons in the brain: inputs are passed into the network, the network processes them, and it gives us a final output (more on this later).

Now coming to the second part of the question: what does “deep” stand for?

If you have even a little interest in technology, chances are you have come across this term quite a few times. Over the past year or two it has become a buzzword that gets tossed around a lot, and it has seized everybody’s curiosity.

We hear the term “deep” and are instantly intimidated by it, but believe me, it is not as intimidating as it seems. By now you are probably thinking, enough beating around the bush, tell me what it is. So here it goes.

A deep neural network is simply a feedforward network with many hidden layers. Is it really that simple? Yes, and now it does not sound quite so intimidating, does it?

This is more or less all there is to the definition. The only thing I have to add is that neural networks can be either feedforward or recurrent.

Fig. 6- https://www.researchgate.net/profile/Wim_De_mulder/publication/266204519/figure/fig5/AS:270318480654346@1441460356163/Recurrent-versus-feedforward-neural-network.png

Feedforward networks do not have any loops and can be organized into layers, while recurrent neural networks have loops in their graphs.
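To make that distinction concrete, here is a minimal sketch in Python (the layer sizes, random weights, and inputs are arbitrary assumptions, not anything from the article): a feedforward step depends only on the current input, while a recurrent step also feeds the previous hidden state back in, which is exactly the loop in its graph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feedforward step: the output depends only on the current input.
W_ff = rng.normal(size=(4, 3))                   # 3 inputs -> 4 hidden units
def feedforward_step(x):
    return np.tanh(W_ff @ x)

# Recurrent step: the output also depends on the previous hidden state (the loop).
W_in = rng.normal(size=(4, 3))
W_rec = rng.normal(size=(4, 4))
def recurrent_step(x, h_prev):
    return np.tanh(W_in @ x + W_rec @ h_prev)

x_t = rng.normal(size=3)
h = np.zeros(4)
print(feedforward_step(x_t))     # no memory of earlier inputs
h = recurrent_step(x_t, h)       # h carries information forward to the next time step
```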

How deep is your “deep”?

If there are “many” layers in the network, then we say that the network is deep. The question that should be flashing through your mind right now is: how many layers does a network have to have in order to qualify as deep?

Well, there is no single answer to this; in fact, there is no answer at all. It is like asking how many hours you have to study in order to pass a particular exam. Only one thing is definite here: a network with only a single hidden layer cannot be called “deep” and is conventionally called “shallow”, while a network with two or more hidden layers counts as deep. Ten years down the line, it may well be that a network with two or three hidden layers is called shallow and only a network with ten or more layers is called deep. “Deep” and “shallow” are relative terms and will change as the frame of reference changes. A small sketch of such a stacked network follows the figure below.

Fig. 7- https://www.druva.com/blog/understanding-neural-networks-through-visualization/
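As a rough illustration of what “many hidden layers” means in code (the layer widths, random weights, and ReLU activation below are my own assumptions for the sketch, not a prescription), a deep feedforward pass is simply the same multiply-and-activate step repeated once per hidden layer:

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_pass(x, layer_sizes):
    """Push x through a stack of randomly initialized fully connected layers."""
    a = x
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        a = np.maximum(0.0, W @ a + b)   # ReLU activation at each layer
    return a

x = rng.normal(size=8)
shallow = forward_pass(x, [8, 16, 2])                 # one hidden layer: "shallow"
deep    = forward_pass(x, [8, 16, 16, 16, 16, 2])     # four hidden layers: "deep"
print(shallow.shape, deep.shape)                      # both produce 2 outputs
```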

While the logic behind an ordinary artificial neural network and deep learning is fundamentally the same, this does not mean that two ordinary artificial neural networks stacked together will perform like a deep neural network when trained using the same algorithm and training data.

So what differentiates deep neural nets from ordinary networks? One of the main differences between deep neural networks and simple artificial neural networks is the way backpropagation behaves. Ordinarily, backpropagation trains the later layers more effectively than the earlier ones: as the error signal travels back through the network, it gets smaller and more diffuse.

So what we do in a deep neural network is first solve the problem of building a good first layer, then solve the problem of building a good second layer on top of it, and so on; eventually we have a deep feature space that we can feed into our actual problem.
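To get a feel for why the error signal fades, here is a purely illustrative calculation (the layer count, the 0.25 sigmoid-derivative bound, and the 0.5 weight scale are assumptions made for this sketch, not values from the article): multiplying the gradient by a factor below 1 at every layer shrinks it roughly geometrically on its way back.

```python
# Illustrative only: how a gradient can shrink as it is backpropagated through
# many sigmoid layers. 0.25 is the maximum derivative of the sigmoid; the
# 0.5 weight magnitude is an arbitrary assumption for this sketch.
layers = 10
gradient = 1.0
for layer in range(layers, 0, -1):
    gradient *= 0.25 * 0.5               # sigmoid derivative * weight magnitude
    print(f"layer {layer:2d}: gradient ~ {gradient:.1e}")
# By layer 1 the gradient is around 1e-9, far too small to train the early layers.
```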

What makes these deep neural nets so useful is their capability to discover latent structure (what we call feature learning) within vast amounts of unlabeled, unstructured data, i.e. big data.

Fig. 8- https://www.researchgate.net/figure/Data-science-techniques-scale-with-amount-of-data_fig1_333570727

The fact that deep learning excels in problem domains where the input data comes as images (pixel data), documents (text data), or files (audio and video data), rather than in conventional tabular form, also makes it more unique and more useful.

This was the first part of the series “Deep learning: Imagine the unimaginable”. I hope this write-up has cleared up why deep learning is called “deep” and, at the same time, given you some flavor of the history of neural networks. Nothing will make me happier than knowing the article has helped you in some way, so do let me know ☺

We will continue with the next part of the series, “Applications of Deep Learning in today’s world”, coming soon; till then, adios. Stay tuned.

Follow NeeshAi to learn more about Data Science, Product Management and more in Tech!

References:

  1. https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
  2. http://uhaweb.hartford.edu/compsci/neural-networks-history.html
  3. https://datascience.berkeley.edu/machine-learning-neural-systems/
  4. https://www.linkedin.com/pulse/resurgence-neural-networks-graham-jones
  5. https://www.quora.com/topic/Deep-Learning
  6. http://www.kdnuggets.com/2015/01/deep-learning-explanation-what-how-why.html
