# Implementing Backpropagation with Plain Python Lists: It’s Simple!

*Sources:* I learned this material through the exercises of the Deep Learning course at the Vrije Universiteit Amsterdam. Check https://dlvu.github.io/ and the course’s YouTube channel for the course material.

Although many high-level frameworks for deep learning exist these days, understanding the basics is still really important for implementing neural networks. Plenty of tutorials on the internet implement neural networks with NumPy. In this post, we will take it one step further: we will implement a neural network using only Python lists and the math library. Let’s dive in!

The neural network which we will implement is visible in figure 1. In this network, we have:

- 2 inputs *x₁* and *x₂* with one bias *b₁*.
- These inputs flow via a fully connected linear layer with weights *w* to the first hidden layer, which consists of *k₁*, *k₂* and *k₃*. For *k₁*, the mathematical expression looks like: **k₁ = w₁₁ ⋅ x₁ + w₂₁ ⋅ x₂ + b₁**
- The *k* values go through a non-linearity; we will implement a sigmoid. We name the output *h*. The expression for *h₂* looks like: **h₂ = 1/(1 + exp(-k₂))**
- The *h* values plus bias *b₂* then flow via the second fully connected linear layer with weights *v* to produce the output value *y*: **y = v₁ ⋅ h₁ + v₂ ⋅ h₂ + v₃ ⋅ h₃ + b₂**
- This output value *y* is then compared with the target output *t* via the Mean Squared Error loss to get the loss *l*: **l = (y - t)²**
- The loss is then used to update the parameters of our network via backpropagation using the chain rule, which we will now code!

Let’s use pseudocode to first implement and understand the forward pass using Python lists.

```python
# The only library we use is the math library:
import math

# The parameters (placeholder names; fill in your own numbers):
x = [x_1, x_2]
w = [[w_11, w_12, w_13],
     [w_21, w_22, w_23]]
b = [b_1, b_2]
v = [v_1, v_2, v_3]
t = target

# The sigmoid function:
def sig(k):
    return 1 / (1 + math.exp(-k))

# The first layer:
k = [0.0, 0.0, 0.0]
for j in range(3):
    for i in range(2):
        k[j] += w[i][j] * x[i]
    k[j] += b[0]

# The second layer (non-linearity):
h = [0.0, 0.0, 0.0]
for i in range(3):
    h[i] = sig(k[i])

# The output layer:
y = 0.0
for i in range(3):
    y += v[i] * h[i]
y += b[1]

# The MSE loss:
l = (y - t) ** 2
```
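To make the pseudocode above concrete, here is a runnable version with toy numbers. The parameter values below are my own assumptions for illustration, not values from the course:

```python
import math

def sig(z):
    return 1 / (1 + math.exp(-z))

# Toy numbers (assumed; the post itself leaves them symbolic):
x = [1.0, -1.0]
w = [[0.5, -0.2, 0.3], [0.1, 0.4, -0.6]]
b = [0.2, -0.1]
v = [0.7, -0.5, 0.2]
t = 0.5

# The first layer:
k = [0.0, 0.0, 0.0]
for j in range(3):
    for i in range(2):
        k[j] += w[i][j] * x[i]
    k[j] += b[0]

# The sigmoid non-linearity:
h = [sig(kj) for kj in k]

# The output layer:
y = sum(v[i] * h[i] for i in range(3)) + b[1]

# The MSE loss:
l = (y - t) ** 2
print(y, l)
```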

Now that the forward pass is complete, let’s compute the backward pass via the chain rule. All parameters get updated with respect to the loss, which means the chain rule can be simplified: we walk backward through the computation graph in figure 1 and compute the derivative of the loss for every node. For nodes earlier in the network, we just multiply in the local gradient. This lets us compute any derivative we need very efficiently.

For calculating the derivative of the loss *l* with respect to, for example, *v₂*, we need the derivative of *l* with respect to *y* multiplied by the derivative of *y* with respect to *v₂*: **dl_dv₂ = dl_dy ⋅ dy_dv₂**

Recall that the loss function was: **l = (y - t)²**

And that the linear layer to calculate *y* was: **y = v₁ ⋅ h₁ + v₂ ⋅ h₂ + v₃ ⋅ h₃ + b₂**

By calculating the derivatives, we get:

**dl_dv₂ = dl_dy ⋅ dy_dv₂ = 2 ⋅ (y - t) ⋅ h₂**

For b₂, we have: **dl_db₂ = dl_dy ⋅ dy_db₂**, which gives:

**dl_db₂ = 2 ⋅ (y - t) ⋅ 1**

We can now update these weights and biases with learning rate α:

**v₂ = v₂ - α ⋅ dl_dv₂**

**b₂ = b₂ - α ⋅ dl_db₂**
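In code, such a gradient-descent update is a single subtraction per parameter. The numbers below are hypothetical, just to show the step:

```python
alpha = 0.1  # learning rate (an arbitrary example value)

# Hypothetical parameter and gradient values:
v2, dl_dv2 = 0.5, 0.2
b2, dl_db2 = 0.1, -0.4

# Step each parameter against its gradient:
v2 = v2 - alpha * dl_dv2
b2 = b2 - alpha * dl_db2
```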

The derivative dl_dy we computed above can be reused to calculate derivatives earlier in the network, which in our case are the weights *w* and bias *b₁*. For every weight, we multiply all the local derivatives along the path from that weight to the loss *l*. For the weight *w₁₂*, the path is shown in figure 2.

All in all, we get: **dl_dw₁₂ = dl_dy ⋅ dy_dh₂ ⋅ dh₂_dk₂ ⋅ dk₂_dw₁₂**

We already calculated dl_dy. The next factor is simply: **dy_dh₂ = v₂**

For dh₂_dk₂, we need the derivative of the sigmoid function:

h₂ = 1/(1 + exp(-k₂)) = sigmoid(k₂)

dh₂_dk₂ = sigmoid(k₂) ⋅ (1 - sigmoid(k₂))

Lastly, we need the derivative dk₂_dw₁₂. Note that:

k₂ = w₁₂ ⋅ x₁ + w₂₂ ⋅ x₂ + b₁

So:

dk₂_dw₁₂ = x₁

Filling in all the factors along the path:

**dl_dw₁₂ = 2 ⋅ (y - t) ⋅ v₂ ⋅ sigmoid(k₂) ⋅ (1 - sigmoid(k₂)) ⋅ x₁**
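A good habit when deriving gradients by hand is to verify them numerically with a finite difference. Below is a small sketch that checks this formula for the gradient of the loss with respect to w₁₂ on toy values; the numbers and helper names are my own assumptions:

```python
import math
import copy

def sig(z):
    return 1 / (1 + math.exp(-z))

# Toy values (assumed for this check):
x = [1.0, -1.0]
w = [[0.5, -0.2, 0.3], [0.1, 0.4, -0.6]]
b = [0.2, -0.1]
v = [0.7, -0.5, 0.2]
t = 1.0

def forward(w):
    # Returns the pre-activations k, the output y and the loss l.
    k = [w[0][j] * x[0] + w[1][j] * x[1] + b[0] for j in range(3)]
    h = [sig(kj) for kj in k]
    y = sum(v[i] * h[i] for i in range(3)) + b[1]
    return k, y, (y - t) ** 2

k, y, l = forward(w)

# Analytic gradient from the chain rule (w₁₂ is w[0][1] in code):
analytic = 2 * (y - t) * v[1] * sig(k[1]) * (1 - sig(k[1])) * x[0]

# Numerical gradient via a centered finite difference:
eps = 1e-6
w_plus, w_minus = copy.deepcopy(w), copy.deepcopy(w)
w_plus[0][1] += eps
w_minus[0][1] -= eps
numeric = (forward(w_plus)[2] - forward(w_minus)[2]) / (2 * eps)

print(abs(analytic - numeric))  # should be very small
```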

Lastly, we calculate the derivative of the loss with respect to *b₁*, which is influenced by *k₁*, *k₂* and *k₃*! We need to add up the derivatives along all three paths:

**dl_db₁ = 2 ⋅ (y - t) ⋅ v₁ ⋅ sigmoid(k₁) ⋅ (1 - sigmoid(k₁)) ⋅ 1**

**+ 2 ⋅ (y - t) ⋅ v₂ ⋅ sigmoid(k₂) ⋅ (1 - sigmoid(k₂)) ⋅ 1**

**+ 2 ⋅ (y - t) ⋅ v₃ ⋅ sigmoid(k₃) ⋅ (1 - sigmoid(k₃)) ⋅ 1**

Now we have mathematically expressed all the derivatives. Finally, we program this!

```python
# Backward pass:
dl_dy = 2 * (y - t)

# Derivatives for the second layer:
dl_dv = [0.0, 0.0, 0.0]
dl_db = [0.0, 0.0]
for i in range(3):
    dl_dv[i] += dl_dy * h[i]
dl_db[1] += dl_dy

# Derivatives for the first layer (same 2-by-3 shape as w):
dl_dw = [[0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
for j in range(2):
    for i in range(3):
        dl_dw[j][i] += dl_dy * v[i] * sig(k[i]) * (1 - sig(k[i])) * x[j]
for i in range(3):
    dl_db[0] += dl_dy * v[i] * sig(k[i]) * (1 - sig(k[i]))
```

Finally, all you need to do is calculate the updated values with your learning rate α and you have implemented backpropagation in a neural network using only plain Python!
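Putting the forward pass, backward pass and update step together gives a complete, if tiny, training loop. The starting values below are my own toy choices; with them, the loss should shrink over the iterations:

```python
import math

def sig(z):
    return 1 / (1 + math.exp(-z))

# Toy data and initial parameters (my own assumptions, not from the post):
x, t = [1.0, -1.0], 0.5
w = [[0.5, -0.2, 0.3], [0.1, 0.4, -0.6]]
b = [0.2, -0.1]
v = [0.7, -0.5, 0.2]
alpha = 0.1  # learning rate

losses = []
for step in range(50):
    # Forward pass:
    k = [w[0][j] * x[0] + w[1][j] * x[1] + b[0] for j in range(3)]
    h = [sig(kj) for kj in k]
    y = sum(v[i] * h[i] for i in range(3)) + b[1]
    losses.append((y - t) ** 2)

    # Backward pass (the derivatives derived above):
    dl_dy = 2 * (y - t)
    dl_dv = [dl_dy * h[i] for i in range(3)]
    dl_db2 = dl_dy
    dl_dk = [dl_dy * v[i] * h[i] * (1 - h[i]) for i in range(3)]
    dl_dw = [[dl_dk[i] * x[j] for i in range(3)] for j in range(2)]
    dl_db1 = sum(dl_dk)

    # Gradient-descent updates:
    for i in range(3):
        v[i] -= alpha * dl_dv[i]
        for j in range(2):
            w[j][i] -= alpha * dl_dw[j][i]
    b[0] -= alpha * dl_db1
    b[1] -= alpha * dl_db2

print(losses[0], losses[-1])  # the loss shrinks over the steps
```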

Try for yourself whether you can implement the network in figure 3 below, which consists of a linear layer with weights *W* and biases *b*, followed by a sigmoid non-linearity, then again a linear layer with weights *V* and biases *c*, and lastly a softmax layer. Calculate the loss with respect to the target with the cross-entropy loss. With the initialization of weights and biases as in figure 3, the derivatives after one forward and backward pass should look like:

derivatives wrt *W*: [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

derivatives wrt *b*: [0.0, 0.0, 0.0]

derivatives wrt *V*: [[-0.4404, 0.4404], [-0.4404, 0.4404], [-0.4404, 0.4404]]

derivatives wrt *c:* [-0.5, 0.5]
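If you want a starting point for the softmax and cross-entropy parts of this exercise, a possible sketch looks like this. The function names and the max-subtraction stability trick are my own choices, not prescribed by the exercise:

```python
import math

def softmax(o):
    # Subtracting the max keeps exp() from overflowing for large inputs;
    # it does not change the result.
    m = max(o)
    exps = [math.exp(oi - m) for oi in o]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # Negative log-probability of the correct class.
    return -math.log(probs[target_index])
```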

I hope this post helped you to better understand backpropagation. Good luck on your journey in Deep Learning!