Mathematics Fun in Neural Networks!!!

Wow, I love maths!!! I am excited to do some maths in machine learning and show how the linear algebra and differential calculus we studied in school and college are actually used.

A simple neural network model on which we can do and understand the maths is made up of one input layer containing two input neurons (i1, i2), one hidden layer containing two hidden neurons (h1, h2) and one output layer containing two output neurons (o1, o2). w1 and w2 are the weights connecting the input neuron i1 to the h1 and h2 neurons of the hidden layer, and w3 and w4 are the weights connecting the input neuron i2 to h1 and h2. b1 and b2 are bias neurons: the weight v11 connects b1 to the h1 neuron of the hidden layer, and the weight v12 connects b2 to h2. It is much like our brain, where a neuron is connected to many other neurons, and the same kind of connections run from the hidden layer to the output layer.

Image by the author
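For readers who like to see things in code: here is a minimal sketch of this 2-2-2 architecture in PyTorch (my own sketch, not code from the article; the comments map PyTorch's parameters onto the weight names used here):

    import torch.nn as nn

    # A 2-2-2 network: two inputs, two hidden neurons, two outputs.
    model = nn.Sequential(
        nn.Linear(2, 2),   # i1, i2 -> net(h1), net(h2); weights w1..w4 plus the bias weights v11, v12
        nn.ReLU(),         # ReLU applied to net(h1) and net(h2)
        nn.Linear(2, 2),   # out(h1), out(h2) -> net(o1), net(o2); weights w5..w8
        nn.Sigmoid(),      # sigmoid applied at the output layer
    )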

Now we focus on how to calculate the values of h1 and h2. As we see from the image above, h1 depends on w1 of i1, w3 of i2 and v11 of b1, while h2 depends on w2 of i1, w4 of i2 and v12 of b2. We then apply the ReLU function to net(h1) and net(h2).

Image by the author
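The image spells out the exact equations; as a sketch of what they look like given the connections described above (treating b1 and b2 as bias units), the net inputs and ReLU outputs are:

    \text{net}(h_1) = w_1 \cdot i_1 + w_3 \cdot i_2 + v_{11} \cdot b_1
    \text{net}(h_2) = w_2 \cdot i_1 + w_4 \cdot i_2 + v_{12} \cdot b_2
    \text{out}(h_1) = \mathrm{ReLU}\big(\text{net}(h_1)\big) = \max\big(0, \text{net}(h_1)\big)
    \text{out}(h_2) = \mathrm{ReLU}\big(\text{net}(h_2)\big) = \max\big(0, \text{net}(h_2)\big)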

Suppose at the output layer we have the sigmoid function, which we use in binary-classification-type problems because it gives a value in the range (0, 1). We then treat the sigmoid output as a probability to classify whether the final output is 0 or 1.
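For reference, the sigmoid function and its derivative (which comes back during backpropagation) are:

    \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\big(1 - \sigma(x)\big)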

Now let us calculate o1 and o2 of the output layer.

Image by the author
Image by the author
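Assuming w5 and w6 are the weights connecting out(h1) and out(h2) to o1, and w7 and w8 are the weights connecting them to o2 (the backpropagation step below uses exactly these weights), the equations in the images look roughly like this:

    \text{net}(o_1) = w_5 \cdot \text{out}(h_1) + w_6 \cdot \text{out}(h_2), \qquad \text{sig}(o_1) = \sigma\big(\text{net}(o_1)\big)
    \text{net}(o_2) = w_7 \cdot \text{out}(h_1) + w_8 \cdot \text{out}(h_2), \qquad \text{sig}(o_2) = \sigma\big(\text{net}(o_2)\big)

Any bias term feeding the output layer would simply be added to net(o1) and net(o2) in the same way.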

The total error is:

Image by the author
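The formula in the image is most likely the usual squared-error loss, summed over both outputs:

    E_{\text{total}} = \frac{1}{2}\big(\text{target}(o_1) - \text{sig}(o_1)\big)^2 + \frac{1}{2}\big(\text{target}(o_2) - \text{sig}(o_2)\big)^2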

We can see below how sig(o1), i.e. sigmoid(o1), affects the total error; this is where partial differentiation comes into action. As you can see above, the total error depends on target(o1), sig(o1), target(o2) and sig(o2). Finding a partial derivative means differentiating a function with respect to one variable while keeping all the other variables constant.

From here backward propagation starts, updating the weights of every neuron. Let's see how sig(o1) and sig(o2) affect the total error by using partial differentiation.

Image by the author
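With the squared-error form above, those partial derivatives work out to:

    \frac{\partial E_{\text{total}}}{\partial\, \text{sig}(o_1)} = -\big(\text{target}(o_1) - \text{sig}(o_1)\big), \qquad \frac{\partial E_{\text{total}}}{\partial\, \text{sig}(o_2)} = -\big(\text{target}(o_2) - \text{sig}(o_2)\big)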

Now let's see how the weights w5, w6, w7 and w8 affect the total error. We want to calculate the partial derivative of the total error function with respect to each weight of each neuron in a particular layer. As an example, let's take how the weight w5 affects the total error and how it updates itself during backpropagation using the learning rate eta.

Image by the author
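Sketching the same chain of partial derivatives for w5 (under the forward-pass assumptions above):

    \frac{\partial E_{\text{total}}}{\partial w_5} = \frac{\partial E_{\text{total}}}{\partial\, \text{sig}(o_1)} \cdot \frac{\partial\, \text{sig}(o_1)}{\partial\, \text{net}(o_1)} \cdot \frac{\partial\, \text{net}(o_1)}{\partial w_5} = -\big(\text{target}(o_1) - \text{sig}(o_1)\big) \cdot \text{sig}(o_1)\big(1 - \text{sig}(o_1)\big) \cdot \text{out}(h_1)

    w_5 \leftarrow w_5 - \eta \cdot \frac{\partial E_{\text{total}}}{\partial w_5}

And as a tiny numerical sanity check, here is a NumPy sketch of one forward pass and the w5 update (all numbers are made up for illustration, not taken from the images):

    import numpy as np

    # Made-up example values, purely for illustration
    i1, i2 = 0.05, 0.10                         # inputs
    b1, b2 = 1.0, 1.0                           # bias units
    w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30     # input -> hidden weights
    v11, v12 = 0.35, 0.35                       # bias -> hidden weights
    w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55     # hidden -> output weights
    t1, t2 = 0.01, 0.99                         # targets for o1 and o2
    eta = 0.5                                   # learning rate

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    relu = lambda x: np.maximum(0.0, x)

    # Forward pass
    net_h1 = w1 * i1 + w3 * i2 + v11 * b1
    net_h2 = w2 * i1 + w4 * i2 + v12 * b2
    out_h1, out_h2 = relu(net_h1), relu(net_h2)
    net_o1 = w5 * out_h1 + w6 * out_h2
    net_o2 = w7 * out_h1 + w8 * out_h2
    sig_o1, sig_o2 = sigmoid(net_o1), sigmoid(net_o2)
    E_total = 0.5 * (t1 - sig_o1) ** 2 + 0.5 * (t2 - sig_o2) ** 2

    # Backward pass for w5: the three chain-rule factors from the derivation above
    dE_dsig_o1 = -(t1 - sig_o1)
    dsig_o1_dnet_o1 = sig_o1 * (1.0 - sig_o1)
    dnet_o1_dw5 = out_h1
    dE_dw5 = dE_dsig_o1 * dsig_o1_dnet_o1 * dnet_o1_dw5

    w5_updated = w5 - eta * dE_dw5
    print(f"E_total = {E_total:.5f}, dE/dw5 = {dE_dw5:.5f}, updated w5 = {w5_updated:.5f}")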

I hope that, after looking at my maths calculations in the images above, I have made you enjoy some fun with maths. Keep learning, keep doing awesome work, and be happy always :).


Amir Khan
Secure and Private AI Math Blogging Competition

Deep Learning, Machine Learning, NLP, Pytorch, Tensorflow, Reinforcement Learning and Computer Vision Enthusiast