What Is a Neural Network and How Does It “Learn”?

Justin Fernandez
Published in The Startup
5 min read · Feb 1, 2021

As humans with extremely complex brains, we make extremely complex decisions in a fraction of a second without even noticing the process behind them. Simple things like reading your own handwriting may seem mundane, but the amount of knowledge and processing your brain brings to them makes them extremely hard to replicate in a computer program. Even a decision like whether to go to the grocery store now or later may take a little more time, but it is still made relatively quickly. This process of weighing the reasons for and against going is exactly the kind of decision that neural networks try to mimic.

Let’s expand on the example of whether or not to go to the grocery store.

In order to make this decision there must be reasons for going and for not going. For example:

  1. Whether or not you have enough food to last the next 7 days
  2. Whether or not it’s a weekend, because it will be busy and you would rather go on a weekday
  3. Whether you need to go to the far grocery store or can go to the close one, because the far one has your favorite cookies but you may not feel like driving that far

Because some of these may be true and some false, we need to apply a weight to each of these factors in order to determine whether it is worth going today.

  1. The most important is whether you need food, so we will give that a weight of 5
  2. You don’t want to go on a weekend, but it’s not that important, so a weekday gets a weight of 3
  3. Finally, the cookies are great but not very important, so not needing them (and skipping the long drive) gets a weight of 2

In order to make your final decision you say you need at least 5 points to get yourself to go to the grocery store. Let’s look at the options:

  • You need food (+5), it is the weekend (+0), and you don’t need to get cookies (+2). Your score totals 7, so you go to the grocery store.
  • You don’t need food (+0), it is a weekday (+3), and you need cookies (+0). Your score is 3, so you do not go to the grocery store.
  • You don’t need food (+0), it is a weekday (+3), and you don’t need cookies (+2). Your score is 5, so you go to the grocery store.
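The scoring above fits in a few lines of Python. This is only a minimal sketch: the names `WEIGHTS`, `THRESHOLD`, `score`, and `should_go` are made up for illustration, and each input is 1 when the reason favors going (you need food, it is a weekday, you don’t need cookies) and 0 otherwise.

```python
# Hypothetical names; the weights and threshold come from the example above.
WEIGHTS = {"need_food": 5, "is_weekday": 3, "no_cookies_needed": 2}
THRESHOLD = 5  # points needed before the trip is worth it


def score(inputs):
    """Add up the weight of every reason that is true (1); false ones (0) add nothing."""
    return sum(WEIGHTS[name] * value for name, value in inputs.items())


def should_go(inputs):
    """Return 1 (go) if the score reaches the threshold, else 0 (stay home)."""
    return 1 if score(inputs) >= THRESHOLD else 0


# The three scenarios from the list above, in the same order.
scenarios = [
    {"need_food": 1, "is_weekday": 0, "no_cookies_needed": 1},
    {"need_food": 0, "is_weekday": 1, "no_cookies_needed": 0},
    {"need_food": 0, "is_weekday": 1, "no_cookies_needed": 1},
]

for s in scenarios:
    print(score(s), should_go(s))
```

The loop prints the score and the decision for each scenario, which should line up with the 7, 3, and 5 worked out by hand.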

This is a logical process that our brains complete in seconds, a couple of minutes at most, and it is the same kind of process a neural net represents. A neural net (in this case a multilayer perceptron, the most basic kind of neural net) is a large network of many different decisions being made to form a single output. At the lowest level of complexity is a single perceptron making a single decision. Its input is n binary values that represent the reasons we talked about in our grocery store example. Each of these binary values has a weight that expresses the importance of that input in the decision-making process. The perceptron sums the weighted values and outputs 1 if the sum reaches a predetermined threshold and 0 if it falls short.

A neural net is a collection of perceptrons that collectively make many dependent decisions in order to reach one final decision. Neural nets and their variations (convolutional neural nets, recurrent neural nets, feedforward neural nets) are able to read handwritten digits, detect cancer in X-rays, simulate stock price movement, predict sports game outcomes, and much more.
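To see how several perceptrons can combine into one decision that no single perceptron can make on its own, here is a small sketch. It wires three threshold perceptrons together to decide "exactly one of the two inputs is on" (XOR). The weights and thresholds are hand-picked for illustration rather than learned, and the function names are my own.

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def xor_net(a, b):
    # Hidden layer: two perceptrons each make a simpler decision.
    at_least_one = perceptron([a, b], [1, 1], 1)     # a OR b
    not_both = perceptron([a, b], [-1, -1], -1)      # NOT (a AND b)
    # Output layer: the final decision depends on both hidden decisions.
    return perceptron([at_least_one, not_both], [1, 1], 2)  # AND of the two


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The output perceptron never looks at the raw inputs. It only sees the two decisions made by the layer before it, which is the "dependent decisions" idea in miniature.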

How can a neural net learn?

The reason these programs are able to tackle a single task so well is that they practice it thousands and thousands of times through a process called training. A neural net is provided with a set of labeled data that it can train on to see what it gets right and what it gets wrong. After each pass, it makes changes to its weights and biases so that it gets more correct the next go-around. In the case of handwritten digits, the data will be images of handwritten numbers, each paired with a machine-readable label that is the number represented in the picture.
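Here is a rough sketch of that "see what it got wrong, then adjust" loop for a single perceptron. It uses the classic perceptron learning rule, which is much simpler than what real multi-layer networks use, so treat it as an illustration of the idea only. The labeled data is every combination of the three grocery store inputs, labeled using the weights 5, 3, 2 and the threshold of 5 from earlier; the names `examples`, `predict`, and `train` are my own.

```python
from itertools import product

# Labeled data: each input combination paired with the "correct" decision.
examples = [
    (x, 1 if 5 * x[0] + 3 * x[1] + 2 * x[2] >= 5 else 0)
    for x in product([0, 1], repeat=3)
]


def predict(weights, bias, x):
    """Output 1 if the weighted sum plus the bias is at least 0, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total >= 0 else 0


def train(examples, max_epochs=1000):
    weights, bias = [0, 0, 0], 0  # start with no opinion at all
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            error = label - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Nudge each weight and the bias toward the right answer.
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:  # a full pass with nothing wrong, so stop
            break
    return weights, bias


weights, bias = train(examples)
print(weights, bias)
```

The learned weights do not have to come out as 5, 3, and 2. Any set of weights and bias that gets every labeled example right counts as a success.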

Handwritten digits

In the picture we can easily see that these are all the number 2. But all a computer can see is a matrix of pixel values representing how dark each pixel is. The hope is that when the input layer is passed through the next layer (a hidden layer), it will activate different perceptrons (those whose threshold value was met), and that the network will learn which perceptrons are activated when a two is seen and how that differs from the perceptrons activated when an eight is seen.
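To make "a matrix of pixel values" concrete, here is a tiny made-up example. The 5x5 grid below is my own rough drawing of a 2, not real data, and I am assuming 0 means white and 255 means fully dark. Real digit images are larger, but the idea is the same: flatten the grid into one long input vector and pair it with its label.

```python
# A made-up 5x5 grayscale "image" of a 2: 0 is white, 255 is fully dark.
image = [
    [255, 255, 255, 255, 0],
    [0,   0,   0,   255, 0],
    [255, 255, 255, 255, 0],
    [255, 0,   0,   0,   0],
    [255, 255, 255, 255, 0],
]

# Flatten the rows into one long list and scale each pixel to the 0..1 range.
input_vector = [pixel / 255 for row in image for pixel in row]

label = 2  # the machine-readable answer paired with this image

print(len(input_vector), label)
```

Each of the 25 values in `input_vector` would feed one perceptron in the input layer.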

Math Time


The whole neural net can be seen as one large function, y = fₙₙ(x), where x is the input and y is the output. Each layer can be explained by the following equation:

f_l(z) = g_l(W_l z + b_l)

Apologies for the lack of formatting below; Medium does not support LaTeX or formula creation.

  • l = the current layer that this perceptron is a part of
  • g_l = (g subscript l) the activation function (for example, the sigmoid), which scales the value to be between 0 and 1
  • W_l = (W subscript l) a matrix of weights that is learned through gradient descent, with the gradients computed by backpropagation [this is a matrix because each row represents the connections from every perceptron in this layer to a single perceptron in the next layer]
  • b_l = (b subscript l) a bias, also learned, that makes it easier or harder to reach the threshold
  • z = the input vector
  • f_l(z) = (f subscript l of z) the output vector of the layer, which is the input to the next layer
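Putting the symbols above together, a single layer can be sketched in plain Python. I am assuming the layer equation has the usual form f_l(z) = g_l(W_l z + b_l) and that the activation g_l is the sigmoid function; the numbers in `W`, `b`, and `z` are made up purely to have something to run.

```python
import math


def sigmoid(v):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-v))


def layer(W, b, z):
    """Compute g(W z + b) for one layer, with sigmoid as the activation g.

    W has one row per perceptron in the next layer; b has one bias per row.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, z)) + bias)
        for row, bias in zip(W, b)
    ]


# Made-up numbers: 3 inputs feeding a layer of 2 perceptrons.
W = [[0.5, -1.0, 2.0],
     [1.5, 0.0, -0.5]]
b = [0.0, -1.0]
z = [1.0, 0.0, 1.0]

output = layer(W, b, z)
print(output)
```

For the second perceptron the weighted sum plus bias works out to exactly 0, so its output is 0.5, the midpoint of the sigmoid.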

To give a general summary of what has been explained so far: an input vector is seen as the first layer in a neural net. Each perceptron in the first layer is connected to every perceptron in the next layer, meaning that the formula described above is applied to the first layer and the result is the values of the next layer of perceptrons. Each perceptron in the second layer has a different value because the weights and biases applied to it are different. Each perceptron then activates based on whether its threshold was met, and the same process repeats for the next layer.
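The layer-by-layer flow in that summary can be sketched as a loop where each layer’s output becomes the next layer’s input. As before, this is a sketch under assumptions: I am using a sigmoid activation in place of a hard threshold, and the `network` weights and biases below are invented, not trained.

```python
import math


def forward(layers, x):
    """Feed x through each (W, b) pair in turn; each output is the next input."""
    values = x
    for W, b in layers:
        values = [
            1 / (1 + math.exp(-(sum(w * v for w, v in zip(row, values)) + bias)))
            for row, bias in zip(W, b)
        ]
    return values


# Made-up network: 3 inputs -> 2 hidden perceptrons -> 1 output perceptron.
network = [
    ([[0.2, -0.4, 0.1], [0.7, 0.3, -0.6]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

y = forward(network, [1.0, 0.0, 1.0])
print(y)
```

Calling `forward` is the whole network acting as one function of its input: three numbers go in and one number between 0 and 1 comes out.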

An extremely well done visual representation and description is seen in this video:

The posts following this one will describe the different variations of neural nets. I will update this post to include links to them when they are done.
