What EXACTLY is a Neural Network? (with the math) — Let’s build one from scratch!
Most articles on neural networks provide a high-level explanation with minimal math and use diagrams that look like this:
While it’s a great way to get people interested in neural networks, readers are often left with a blurry understanding and do not have enough knowledge to build one from scratch.
In this article, we will build a neural network from scratch with the math, make sense of the diagrams often seen online, and explain the reasons behind its building blocks (weights, biases, etc.).
So What is a Neural Network?
For starters, a neural network is simply a function, like f(x)=mx+b. In fact, f(x)=mx+b is a valid neural network — Let’s build it!
Looking at the diagram, the input is multiplied by a weight and then added to a bias: f(x) = w*x+b (from now on we’ll use w for weight and b for bias).
Now, if you are like me, you are probably wondering why there are weights and biases. What do they mean?
It turns out this function is loosely modeled on how a physical neuron in your brain works! The weight tells you how strongly “connected” the neuron is to its input. The bias is needed because without it an input of 0 would always produce an output of 0 (w*0 = 0), so the bias makes the function more flexible. Together, the weights and biases are called parameters.
Knowing this, we can call our function a single “neuron”. Let’s encapsulate it so we can use it as a building block for more complex neural networks.
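As a minimal sketch, the single neuron f(x) = w*x+b could be wrapped in a Python function like the one below. The function name `neuron` and the parameter values are assumptions chosen only for illustration.

```python
def neuron(x, w, b):
    """A single neuron: multiply the input by the weight, then add the bias."""
    return w * x + b


# Hypothetical parameter values, picked only to make the arithmetic easy to follow.
print(neuron(2.0, w=3.0, b=1.0))  # 3*2 + 1 = 7.0
print(neuron(0.0, w=3.0, b=1.0))  # input is 0, so only the bias is left: 1.0
```

The second call shows why the bias matters: with an input of 0, the weight contributes nothing and the output equals the bias.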
But why build a function like this?
It turns out you can pass the output of one neuron to the input of another (connect them). With enough of these neurons and connections, plus a non-linear activation function between layers (a piece this article leaves out), adjusting the parameters lets you approximate almost any continuous function, no matter how complex! (read this if you don’t believe me)
After all, the goal is to use a neural network to approximate another function through training: adjusting the parameters to an ideal state (but let’s not get ahead of ourselves).
Let this sink in for a minute. It means you can approximate a function that outputs the corresponding number given an image of a handwritten digit, predicts the weather, or even predicts whether or not someone has cancer!
Of course, one neuron on its own can only represent a linear function (a straight line), so let’s build something more complicated.
A Neuron Layer (or Hidden Layer)
What you see above is an input connected to 3 neurons, and their outputs are summed together to get the final output. Try to work out the math yourself.
Finished? Okay. The final formula you should get is:
f(x) = (w1 * x + b1) + (w2 * x + b2) + (w3 * x + b3)
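Here is a rough sketch of that formula in Python, assuming the three neurons are stored as two parallel lists of weights and biases. The names `summed_layer`, `weights`, and `biases`, and all the numbers, are made up for illustration.

```python
def neuron(x, w, b):
    """A single neuron: f(x) = w*x + b."""
    return w * x + b


def summed_layer(x, weights, biases):
    """Feed the same input to every neuron, then sum their outputs."""
    return sum(neuron(x, w, b) for w, b in zip(weights, biases))


# Hypothetical parameters for the three neurons (w1, w2, w3 and b1, b2, b3).
weights = [1.0, 2.0, 3.0]
biases = [0.5, 0.5, 1.0]

# (1*2 + 0.5) + (2*2 + 0.5) + (3*2 + 1) = 2.5 + 4.5 + 7.0
print(summed_layer(2.0, weights, biases))  # 14.0
```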
Let’s step up the game again by adding multiple inputs:
Here is where things start to get complicated… (trust me, you’re not going to get this the first time, so get ready to re-read this section 10 times)
Hmmm, let’s try to simplify the formulas! First, since there are clearly defined columns in this neural network, let’s call them layers. There’s an input layer consisting of 2 elements, a neuron layer consisting of 3 elements, and an output layer consisting of 1 element.
It turns out there’s a cool notational trick you can use if you know Linear Algebra (learn it here). You can think of the input layer as a vector [input1, input2], and the weights as a matrix where each row holds the weights connecting all of the inputs to one specific neuron in the next layer. The biases can also be expressed as a vector [b1, b2, b3].
If you multiply the weight matrix by the input vector (taking the dot product of each row with the inputs), then add the bias vector, the output vector you get is equivalent to the output values of the neurons.
The output vector is essentially the “neuron layer”: [neuron 1 output, neuron 2 output, neuron 3 output]
There’s no other way to gain an intuition for this other than working out the math yourself.
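To check your work, here is one way the expansion could be written out for 2 inputs and 3 neurons. The symbols are assumptions introduced for this sketch: x_1 and x_2 stand for input1 and input2, and w_{ij} is the weight connecting input j to neuron i.

```latex
\begin{bmatrix}
w_{11} & w_{12} \\
w_{21} & w_{22} \\
w_{31} & w_{32}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix}
w_{11} x_1 + w_{12} x_2 + b_1 \\
w_{21} x_1 + w_{22} x_2 + b_2 \\
w_{31} x_1 + w_{32} x_2 + b_3
\end{bmatrix}
```

Each row of the result is the output of one neuron: its weighted inputs plus its bias.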
Using this trick, you can now express the formula as: f(X) = WX + B
where W is the weight matrix, X is the input vector, and B is the bias vector.
This means that no matter how many neurons are in a layer, or how many inputs and outputs there are, you can get the values of the next layer by simply using the formula above.
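A minimal pure-Python sketch of f(X) = WX + B might look like this, assuming W is stored as a list of rows (one row per neuron). The function name `layer` and the example numbers are illustrative assumptions; in practice you would probably reach for a linear algebra library instead of plain lists.

```python
def layer(W, X, B):
    """Compute WX + B.

    W is a list of rows (one row of weights per neuron),
    X is the input vector, and B holds one bias per neuron.
    """
    return [
        sum(w * x for w, x in zip(row, X)) + b  # dot product of one row with X, plus bias
        for row, b in zip(W, B)
    ]


# Hypothetical values: 2 inputs feeding 3 neurons.
X = [1.0, 2.0]
W = [
    [1.0, 0.0],  # weights into neuron 1
    [0.0, 1.0],  # weights into neuron 2
    [1.0, 1.0],  # weights into neuron 3
]
B = [0.5, -1.0, 0.0]

print(layer(W, X, B))  # [1.5, 1.0, 3.0]
```

The same function works unchanged for any number of inputs and neurons, as long as each row of W has one weight per input and B has one bias per row.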
With this knowledge, we can now encapsulate our “neuron layer”, “input layer”, and “output layer”.
Make sure you understand this section completely before moving on.
Finally, A Proper Neural Network
From now on, everything should be pretty straightforward. Here’s a neural network with 3 hidden layers. To get the values of each next layer, simply apply f(X) = WX + B to the values of the layer before it.
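As a sketch of that idea, the forward pass below applies f(X) = WX + B once per layer, feeding each layer’s output into the next. The layer sizes and all parameter values are invented for illustration, and, matching the rest of this article, no activation function is applied.

```python
def layer(W, X, B):
    """Compute WX + B, with W stored as a list of rows (one per neuron)."""
    return [sum(w * x for w, x in zip(row, X)) + b for row, b in zip(W, B)]


def forward(X, layers):
    """Pass the input through every (W, B) pair in order."""
    for W, B in layers:
        X = layer(W, X, B)
    return X


# Hypothetical network: 2 inputs, 3 hidden layers, 1 output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 1.0]),  # hidden layer 1: 2 -> 3
    ([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]], [0.0, 0.5]),        # hidden layer 2: 3 -> 2
    ([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),                   # hidden layer 3: 2 -> 2
    ([[1.0, 1.0]], [0.0]),                                    # output layer:   2 -> 1
]

print(forward([1.0, 2.0], layers))  # [15.0]
```

Storing the network as a list of (W, B) pairs keeps the forward pass to a single loop, whatever the number of layers.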
- We built a single neuron using a weight and a bias: f(x) = w*x+b
- Weights and biases are used to mimic a physical neuron in the brain
- We built a neuron layer using multiple neurons
- To get the values of the next layer, use: f(X) = WX+B
- Given enough neurons and connections (and, in practice, activation functions), a neural network can approximate almost any function!
Now, the neural networks described in this article have many limitations. One element that’s left out is activation functions, which introduce non-linearity: without them, stacking layers of f(X) = WX + B still produces a linear function of the input. However, that is outside the scope of this article, so here are some pointers for what to learn next:
- Activation Functions (ReLU, Sigmoid, etc.)
- Gradient Descent (the training process of a neural network)
- Recurrent Neural Networks
- Convolutional Neural Networks
👋 Hey! I’m a 16-year-old entrepreneur, speaker, & Machine Learning/Python/C++/Web developer located in Toronto.
If you have any suggestions, ideas for projects, or just want to connect, feel free to message me on LinkedIn(http://linkedin.com/in/realmichaelye) or email firstname.lastname@example.org!