What EXACTLY is a Neural Network? (with the math) — Let’s build one from scratch!

Michael Ye
Sep 15, 2019 · 5 min read

Most articles on neural networks provide a high-level explanation with minimal math and use diagrams that look like this:

While it’s a great way to get people interested in neural networks, readers are often left with a blurry understanding and do not have enough knowledge to build one from scratch.

In this article, we will build a neural network from scratch with the math, make sense of the diagrams often seen online, and explain the reasons behind its building blocks (weights, biases, etc.).

So What is a Neural Network?

For starters, a neural network is simply a function, like f(x) = mx + b. In fact, f(x) = mx + b is a valid neural network, so let's build it!

A neural network consisting of a single neuron

Looking at the diagram, the input is multiplied by a weight and then summed with a bias: f(x) = w*x + b (from now on we'll use w for weight and b for bias).

Now, if you're like me, you're probably wondering: why are there weights and biases? What do they mean?

It turns out, this function is similar to how a physical neuron in your brain works! The weight tells you how "connected" the neuron is to its input, and the bias makes the function more flexible: without it, an input of 0 would always produce an output of 0 (w*0 = 0). Together, the weights and biases are called parameters.

Knowing this, we can call our function a single "neuron". Let's encapsulate it so we can use it as a building block for more complex neural networks.

let w be weight and b be bias
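To make the idea concrete, here's a minimal sketch of the neuron as code. The values of w and b are made up for illustration; in a real network they would be learned during training.

```python
# A single neuron: f(x) = w*x + b
# w and b here are arbitrary example values, not trained parameters.
def neuron(x, w, b):
    return w * x + b

print(neuron(2.0, 0.5, 1.0))  # 0.5*2.0 + 1.0 = 2.0
```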

But why build a function like this?

It turns out, you can pass the output of one neuron to the input of another (connect them), and if you have enough of these neurons and connections, then by adjusting the parameters you can approximate almost any function, no matter how complex! (Read this if you don't believe me.)

After all, the goal is to use a neural network to approximate another function through training: adjusting the parameters to an ideal state(but let’s not get ahead of ourselves).

Let this sink in for a minute: it means you can approximate a function that reads an image of a handwritten digit and outputs the corresponding number, predicts the weather, or even tells you whether or not someone has cancer!

Of course, one neuron can only approximate a linear function, so let's build something more complicated.

A Neuron Layer (or Hidden Layer)

The subscript represents the “row” of the neuron

What you see above is an input connected to 3 neurons, and their outputs are summed together to get the final output. Try to work out the math yourself.

Finished? Okay. The final formula you should get is:

f(x) = (w1 * x + b1) + (w2 * x + b2) + (w3 * x + b3)
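That formula can be sketched directly as code: each of the 3 neurons sees the same input x, and their outputs are summed. The weights and biases below are arbitrary example values.

```python
# One neuron: f(x) = w*x + b
def neuron(x, w, b):
    return w * x + b

# A layer of neurons sharing one input; the outputs are summed,
# matching f(x) = (w1*x + b1) + (w2*x + b2) + (w3*x + b3).
def layer(x, weights, biases):
    return sum(neuron(x, w, b) for w, b in zip(weights, biases))

ws = [0.5, -1.0, 2.0]   # example weights w1, w2, w3
bs = [1.0, 0.0, -1.0]   # example biases b1, b2, b3
print(layer(3.0, ws, bs))  # (0.5*3+1) + (-1*3+0) + (2*3-1) = 4.5
```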

Let’s step up the game again by adding multiple inputs:

The 1st subscript represents the "row" of the neuron, and the 2nd subscript represents which input it's connected to

Here is where things start to get complicated… (trust me, you're not going to get this the first time, get ready to re-read this section 10 times)

Hmmm, let’s try to simplify the formulas! First, since there are clearly defined columns in this neural network, let’s call them layers. There’s an input layer consisting of 2 elements, a neuron layer consisting of 3 elements, and an output layer consisting of 1 element.

It turns out, there's a cool notational trick you can use if you know Linear Algebra (learn it here). You can think of the input layer as a vector [input1, input2], and the weights as a matrix where each row holds all the connections between the inputs and one specific neuron in the next layer. The biases can also be expressed as a vector [b1, b2, b3].

If you multiply the weight matrix by the input vector (a matrix-vector product), then add the bias vector, the resulting vector is equivalent to the output values of the neurons.

The output vector is essentially the “neuron layer”: [neuron 1 output, neuron 2 output, neuron 3 output]

There’s no other way to gain an intuition for this other than working out the math yourself.

Using this trick, you can now express the formula as: f(X) = WX + B
where W is the weight matrix, X is the input vector, and B is the bias vector.

This means, no matter the number of neurons in a layer or the number of inputs & outputs, to get the values of the next layer, simply use the formula above.
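Here's the matrix form f(X) = WX + B worked out with NumPy for the network above: 2 inputs feeding 3 neurons. The numbers in W, X, and B are made-up examples so you can check the arithmetic by hand.

```python
import numpy as np

# W has one row per neuron and one column per input (shape 3x2).
W = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
X = np.array([2.0, 1.0])        # input vector [input1, input2]
B = np.array([0.0, 1.0, -2.0])  # bias vector [b1, b2, b3]

# f(X) = WX + B gives all three neuron outputs at once.
layer_output = W @ X + B
print(layer_output)  # [4. 1. 4.]
```

Each entry of the result is exactly one neuron's w1*input1 + w2*input2 + b, so the single formula replaces writing out every neuron by hand.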

With this knowledge, we can now encapsulate our “neuron layer”, “input layer”, and “output layer”.

Make sure you understand this section completely before moving on.

Finally, A Proper Neural Network

From now on, everything should be pretty straightforward. Here's a neural network with 3 hidden layers; to get the values of the next layer, simply apply f(X) = WX + B at each step.
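A full forward pass is just that formula applied layer after layer. The sketch below uses random weights and made-up layer sizes (2 inputs, three hidden layers of 3 neurons, 1 output) purely to show the mechanics.

```python
import numpy as np

# Layer sizes: 2 inputs -> 3 -> 3 -> 3 -> 1 output (illustrative choices).
rng = np.random.default_rng(0)
sizes = [2, 3, 3, 3, 1]

# One (W, B) pair per layer; W maps n_in values to n_out values.
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    # Apply f(X) = WX + B repeatedly, layer after layer.
    for W, B in params:
        x = W @ x + B
    return x

print(forward(np.array([1.0, -1.0]), params))
```

A real network would also apply an activation function between layers, but the chaining itself is nothing more than this loop.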

To Recap:

Now, the neural networks described in this article have many limitations. One element that's left out is the activation function, which introduces non-linearity: without one, stacked layers of f(X) = WX + B collapse into a single linear function. However, that is outside the scope of this article; here are some pointers for what to learn next:

👋 Hey! I’m a 16-year-old entrepreneur, speaker, & Machine Learning/Python/C++/Web developer located in Toronto.

If you have any suggestions, ideas for projects, or just want to connect, feel free to message me on LinkedIn(http://linkedin.com/in/realmichaelye) or email realmichaelye@gmail.com!

If you enjoyed reading this article, don’t forget to hit that Clap button below and share this with all your Medium friends!


Smart Insight Communities
