# Explaining neural networks 101

Neural networks reflect the behaviour of the human brain. They allow programs to recognise patterns and solve common problems in machine learning, and they are another option for performing either classification or regression analysis. If you have not seen my series of articles on logistic regression, have a look at those first, as this series will use the same set of data. At Rapidtrade, we use neural networks to classify data and run regression scenarios.

So to **visualise** the data we will be working with in this series, see below. We will use this data to train the network to categorise our customers according to column J, and we will use the 3 highlighted features to classify them. Feature selection is important, so have a look at this article to see why I chose those 3 features.

Just keep in mind that we will convert all the alpha string values to numerics. After all, we can’t plug strings into equations ;-)
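As a rough sketch of that conversion (the column names and label values here are assumptions for illustration; adjust them to your own dataset), mapping the alpha values to numbers can be as simple as:

```python
# Map string labels to numbers before feeding them to the network.
# The label values ("Yes"/"No" -> 1/2) are assumed examples.
married_map = {"No": 1, "Yes": 2}
graduated_map = {"No": 1, "Yes": 2}

rows = [
    {"Ever_Married": "Yes", "Graduated": "No", "Family_Size": 4},
    {"Ever_Married": "No", "Graduated": "Yes", "Family_Size": 1},
]

encoded = [
    [married_map[r["Ever_Married"]], graduated_map[r["Graduated"]], r["Family_Size"]]
    for r in rows
]
# encoded is now purely numeric, ready for the equations below.
```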

Neural networks are always made up of layers, as seen in figure 2. It all looks complicated, but let’s unpack this to make it more understandable.

A neural network has 6 important concepts, which I will explain briefly here, but cover in detail in this series of articles.

- **Weights** — *These are like the **thetas** we would use in other algorithms*

- **Layers** — *Our network will have 3 layers*

- **Forward propagation** — *Use the features/weights to get Z and A*

- **Back propagation** — *Use the results of forward propagation/weights to get S*

- Calculating the **cost/gradient** of each weight

- **Gradient descent** — *find the best weight/hypothesis*

In this series, we will be building a neural network with the following 3 layers.

# Input Layer

We will refer to the result of this layer as **A1**. Its **size** (# units) depends on the **number of features** in our dataset. Building our input layer is not difficult: you simply **copy in X**, but prepend what is called a **bias** column, which defaults to “1”.

Col 1: **Bias** column, defaults to ‘1’

Col 2: **“Ever married”**, our 1st feature, re-labeled to 1/2

Col 3: **“Graduated”**, our 2nd feature, re-labeled to 1/2

Col 4: **“Family size”**, our 3rd feature
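Building A1 really is just that: a column of ones stacked next to X. A minimal numpy sketch (the feature values are made-up placeholders):

```python
import numpy as np

# X: one row per customer, columns = Ever married, Graduated, Family size
# (values here are assumed placeholders, not the real dataset).
X = np.array([
    [2, 1, 4.0],
    [1, 2, 1.0],
    [2, 2, 3.0],
])

# A1 is simply X with a bias column of ones prepended.
A1 = np.hstack([np.ones((X.shape[0], 1)), X])
# A1 now has 4 columns: the bias column, then our 3 features.
```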

# Hidden layer

We only have **1 hidden layer**, but you could have more than one. If you had more hidden layers, you would replicate the calculations I mention below for each of them. The size (# units) is up to you; we have chosen #features * 2.
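Before the propagation steps can run, the weights between the layers need starting values. A common approach (sketched below; the epsilon value and variable names are assumptions, not the article's final code) is to initialise them to small random numbers so the units don't all compute the same thing:

```python
import numpy as np

n_features = 3
hidden_units = n_features * 2          # our choice: #features * 2 = 6

# Theta1 maps the input layer (bias + 3 features = 4 columns) to the
# 6 hidden units. Small random values break symmetry between units;
# epsilon = 0.12 is a common but arbitrary choice.
epsilon = 0.12
rng = np.random.default_rng(0)
Theta1 = rng.uniform(-epsilon, epsilon, size=(hidden_units, n_features + 1))
```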

This layer is calculated during forward and backward propagation. After running **both** these steps, we will have calculated **Z2, A2 and S2** for each unit. See below for the outputs once each of these steps has run.

**Forward propagation**

In this step, we calculate Z2 and A2. You can visualise the results below.

- **Z2** contains the results of our **hypothesis** calculation for each of the **6 units** in our hidden layer.

- **A2** also includes the bias column (col 1) and has the sigmoid function applied to each of the cells from Z2.

Hence Z2 has 6 columns and A2 has 7 columns.

Don’t worry about the equations just yet; those will come in the next article.
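If you would like a preview anyway, here is a rough numpy sketch of the Z2/A2 step (Theta1 and the sample values are assumed placeholders, just to show the shapes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed placeholders: 3 customers, bias column + 3 features.
A1 = np.array([
    [1.0, 2, 1, 4],
    [1.0, 1, 2, 1],
    [1.0, 2, 2, 3],
])
Theta1 = np.full((6, 4), 0.1)   # 6 hidden units x 4 input columns

Z2 = A1 @ Theta1.T              # hypothesis for each of the 6 hidden units
A2 = np.hstack([np.ones((Z2.shape[0], 1)), sigmoid(Z2)])  # prepend bias

# Z2 has 6 columns; A2 has 7 (the extra bias column).
```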

**Back propagation**

So, after forward propagation has run through all the layers, we perform the **back propagation** step to calculate **S2**. S2 is referred to as the **delta** of that unit’s hypothesis calculation. It is used to figure out the **gradient** for that theta, and later on, combined with the **cost** of the unit, it helps gradient descent figure out the best theta/weight.

Again, the equations will come later; for now, understand that back propagation helps us work out the cost/gradient of the hypothesis in each unit.

# Output layer

Our output layer gives us the result of our hypothesis, i.e. if these thetas were applied, what would our best guess be for classifying these customers? The **size** (# units) is derived from the number of labels for Y, or in our case in figure 1, column J. As can be seen in figure 1, there are 7 labels, so the size of the output layer is 7.
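Because the output layer has one unit per label, Y itself also needs reshaping from a single label column into one column per label (one-hot encoding). A small sketch, assuming label indices 0 to 6 for the 7 segments in column J:

```python
import numpy as np

n_labels = 7                      # 7 segments in column J

# y holds each customer's label as an index 0..6 (assumed example values).
y = np.array([0, 3, 6])

# One row per customer, one column per label: a 1 in the label's column,
# zeros everywhere else. Rows of the identity matrix do exactly this.
Y = np.eye(n_labels)[y]
```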

As with the hidden layer, this is calculated during the 2 steps of **forward and backward** propagation. After running both these steps, here is the results:

**Forward propagation**

Again, in this step we will calculate **Z3** and **A3** for the output layer, as we did for the hidden layer. Refer to figure 1 above to see that no bias column is needed here, and you can see the results of Z3 and A3 below.
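The calculation mirrors the hidden layer's, just without adding a bias column to the result. A sketch with assumed placeholder values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed placeholders: A2 from the hidden layer (3 customers, 7 columns
# including bias), Theta2 mapping those 7 columns to the 7 output labels.
A2 = np.full((3, 7), 0.5)
A2[:, 0] = 1.0                    # bias column
Theta2 = np.full((7, 7), 0.1)

Z3 = A2 @ Theta2.T                # hypothesis for each of the 7 labels
A3 = sigmoid(Z3)                  # no bias column on the output layer
```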

**Back propagation**

Now that we have Z3 and A3, let’s calculate **S3**. As it turns out, S3 is simply a basic cost calculation, subtracting A3 from Y. We will explore the equations in the upcoming articles, but we can nonetheless see the result below.
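In code it really is a one-line difference between prediction and truth. A sketch with assumed placeholder values (I use the common convention of prediction minus label; the exact sign convention is a detail for the later articles):

```python
import numpy as np

# Assumed placeholder predictions for a 3-customer batch, 7 labels each.
A3 = np.array([[0.6, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1],
               [0.1, 0.7, 0.2, 0.1, 0.3, 0.1, 0.2],
               [0.2, 0.1, 0.6, 0.2, 0.1, 0.3, 0.1]])
Y = np.eye(7)[[0, 1, 2]]          # one-hot labels

# S3: the delta of the output layer, prediction minus truth.
S3 = A3 - Y
```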

# Putting it all together

So, the above is a little awkward, as it visualises the outputs in each layer separately. Our main focus in neural networks is a function to compute the cost of our neural network. Coding this function takes the following steps:

1. **Initialise** a set of weights/thetas

2. Perform **cost optimisation** that does steps (3) to (6) until it finds the best weight/theta to use for predictions

3. Perform **forward propagation** to calculate, in the following order: Z1 > A1 > Z2 > A2 > Z3 > A3

4. Perform **backward propagation** to calculate, in the following order: S3 > S2

5. Calculate the **cost** of forward/back propagation

6. Calculate the **deltas** and then **gradients** (used by gradient descent or cost optimisation)
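The forward half of those steps, plus the cost, can be sketched as one rough skeleton (all names, the placeholder values and the use of the logistic cost are my assumptions for illustration; the real equations and the back-propagation gradients come later in the series):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(X, Y, Theta1, Theta2):
    """One pass of forward propagation plus the logistic cost.

    The back-propagation deltas (S3, S2) and gradients are left
    for the later articles in this series.
    """
    m = X.shape[0]

    # Forward propagation: A1 > Z2 > A2 > Z3 > A3
    A1 = np.hstack([np.ones((m, 1)), X])
    Z2 = A1 @ Theta1.T
    A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])
    Z3 = A2 @ Theta2.T
    A3 = sigmoid(Z3)

    # Logistic (cross-entropy) cost, averaged over the m customers.
    cost = -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m
    return cost, A3

# Tiny assumed example: 3 customers, 3 features, 6 hidden units, 7 labels.
X = np.array([[2, 1, 4.0], [1, 2, 1.0], [2, 2, 3.0]])
Y = np.eye(7)[[0, 3, 6]]
Theta1 = np.full((6, 4), 0.01)
Theta2 = np.full((7, 7), 0.01)

cost, A3 = nn_cost(X, Y, Theta1, Theta2)
```

Cost optimisation then repeatedly calls a function like this (plus its gradients) with different thetas until the cost stops improving.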

Ok, so that was a bucketload of information. Go on to part 2, where we cover forward propagation in detail.

# Sources

Great article: https://towardsdatascience.com/under-the-hood-of-neural-network-forward-propagation-the-dreaded-matrix-multiplication-a5360b33426

If you are looking for a course: https://www.coursera.org/learn/machine-learning/