A concise description of how a neural network calculates its weight (w) and bias (b)
Here are the basic steps for calculating the weight and bias.
Setup: the input is x, the output is y, and the layer activations a are described below
Step 1: initialize the weight (w) and bias (b)
Step 2: forward-propagate through the layers using w and b
Step 3: calculate the cost function and its gradient
Step 4: backpropagate to update the weight (w) and bias (b): w = w + dw, b = b + db
Step 5: repeat steps 2–4 until the cost function is low enough
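The five steps above can be sketched as a minimal training loop. This is only a sketch, assuming a single linear unit with squared-error loss; the dataset (y = 3x), learning rate, and iteration count here are illustrative, not part of the walkthrough:

```python
import numpy as np

# Illustrative data: y = 3x, so the target is w = 3, b = 0
x = np.array([1., 2., 3., 4.])
y = 3. * x

w, b = 0., 0.               # Step 1: initialize weight and bias
alpha = 0.02                # learning rate (assumed value)
for _ in range(5000):       # Step 5: repeat steps 2-4
    a = w * x + b                       # Step 2: forward propagation
    J = 0.5 * np.mean((a - y) ** 2)    # Step 3: cost function
    dJ_dw = np.mean((a - y) * x)       # gradient of J w.r.t. w
    dJ_db = np.mean(a - y)             # gradient of J w.r.t. b
    w = w - alpha * dJ_dw              # Step 4: update w
    b = b - alpha * dJ_db              # Step 4: update b

print(w, b)  # w ≈ 3, b ≈ 0
```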
A detailed example of linear regression is presented below:
with input x=[1,2,…,10] and output y=[2,4,…,20]; in other words, y=2x (w=2, b=0)
We use a linear activation function (g=1) and the Mean Squared Error loss function L=(a-y)²/2. We omit the layer superscripts for simplicity, since we only have one layer.
Step 1: initialize w=1, b=0
Step 2: propagate: a = g·(w·x + b) = w·x + b
Step 3: calculate the cost function J = (1/2)·mean((a − y)²)
Calculate the gradient for the weight: dJ/dw = mean((a − y)·x)
Similarly, calculate the gradient for the bias: dJ/db = mean(a − y)
Step 4: update the weight (w) and bias (b): w = w − alpha·(dJ/dw), where alpha is the learning rate
Similarly, for b: b = b − alpha·(dJ/db)
Now repeat steps 2–4 until the cost function J is sufficiently low; w and b will then be close to 2 and 0, respectively.
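To make the first update concrete, here is the arithmetic of one pass through steps 2–4 starting from w=1, b=0 (a minimal sketch; the numbers can be checked by hand against the formulas above):

```python
import numpy as np

x = np.arange(1, 11)   # [1, 2, ..., 10]
y = 2. * x
w, b, alpha = 1., 0., 0.01

a = w * x + b                      # Step 2: a = x, so a - y = -x
J = 0.5 * np.mean((a - y) ** 2)   # Step 3: J = 0.5 * mean(x^2) = 19.25
dJ_dw = np.mean((a - y) * x)      # mean(-x * x) = -38.5
dJ_db = np.mean(a - y)            # mean(-x)     = -5.5
w = w - alpha * dJ_dw             # Step 4: 1 + 0.385 = 1.385
b = b - alpha * dJ_db             # Step 4: 0 + 0.055 = 0.055
print(w, b)  # w ≈ 1.385, b ≈ 0.055
```

These match the first updated entries of w_list and b_list printed by the full program below.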
The code to do that (Python):
import numpy as np
import matplotlib.pyplot as plt
#function to fit
x=np.arange(1,11) #input x=[1,2,...,10]
y=2.*x #output
#learning rate
alpha=0.01
#stopping criterion for the cost function J
J_CRIT=0.01
#define the cost function
def J_calc(a,y):
    J=0.5*np.mean( (a-y)**2. ) #Mean Squared Error
    return J
#Step 1: Initialize w (weight), b (bias)
w=1
b=0
g=1 #linear activation function
J=100 #start above J_CRIT so the loop runs at least once
w_list=[w]
b_list=[b]
while J>J_CRIT:
    #Step 2: Propagate
    a2=g*(w*x+b)
    #Step 3: calculate the cost function
    J=J_calc(a2,y)
    #Step 4: calculate the gradients and update
    #4.1.1 dJ_dw=dJ/dw
    dJ_dw=np.mean((a2-y)*x)
    #4.1.2 dw=-alpha*(dJ/dw)
    dw=-alpha*dJ_dw
    #4.2.1 dJ_db=dJ/db
    dJ_db=np.mean(a2-y)
    #4.2.2 db=-alpha*(dJ/db)
    db=-alpha*dJ_db
    #4.3 update w and b
    w=w+dw
    b=b+db
    w_list.append(w)
    b_list.append(b)
print('w_list: ')
print(w_list)
print('b_list: ')
print(b_list)
x=np.arange(0,11,1) #include 0 for plotting
plt.clf()
for (w,b) in zip(w_list,b_list):
    plt.plot(x,w*x+b,color='blue',alpha=0.3)
plt.plot(x,w_list[-1]*x+b_list[-1],color='blue',label='final fit')
plt.scatter(x,2*x,color='red',label='data')
plt.grid(alpha=0.3)
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(0,10)
plt.ylim(0,20)
plt.legend()
plt.show()
The output:
w_list:
[1, 1.385, 1.61875, 1.760676125, 1.846855961875, 1.89919220538125, 1.9309819765339844, 1.9502979545119492, 1.9620410610579773, 1.969186644705852]
b_list:
[0, 0.055, 0.08827499999999999, 0.108361, 0.12044020312499999, 0.127658723190625, 0.13192656466275, 0.13440329030675335, 0.1357928699055286, 0.13652268284828456]
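As a sanity check (not part of the original walkthrough), the gradient-descent result can be compared against the exact least-squares line, for example via np.polyfit:

```python
import numpy as np

x = np.arange(1, 11)
y = 2. * x

# Exact least-squares fit of a degree-1 polynomial: slope and intercept
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope ≈ 2, intercept ≈ 0
```

The iterative result above (w ≈ 1.97, b ≈ 0.137) is still approaching this exact solution; lowering J_CRIT (i.e., running more iterations) brings it closer.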
The plot (intermediate fits in light blue, the final fit in solid blue, and the data in red):
Appendix
The derivations use the following format: the down arrow marks the additional information used to arrive at the next step.
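For reference, the gradient derivation amounts to the chain rule, written here in standard notation rather than the down-arrow format, with mean(·) denoting the average over the training points:

```latex
J = \tfrac{1}{2}\,\mathrm{mean}\big((a-y)^2\big), \qquad a = g\,(wx + b), \quad g = 1
\\
\frac{\partial J}{\partial w}
  = \mathrm{mean}\Big((a-y)\,\frac{\partial a}{\partial w}\Big)
  = \mathrm{mean}\big((a-y)\,x\big)
\\
\frac{\partial J}{\partial b}
  = \mathrm{mean}\Big((a-y)\,\frac{\partial a}{\partial b}\Big)
  = \mathrm{mean}\big(a-y\big)
```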
There is a video that I made on my YouTube channel that may be helpful: https://youtu.be/nNNRM-Wf_Z0