Math behind Artificial Neural Networks

Sai · Published in Analytics Vidhya · 6 min read · Jul 20, 2020

Artificial neural networks have proven useful in many recent applications, such as prediction, classification, recognition, and translation. The current example is an application of a simple ANN to predicting an output given some input numbers.

We will consider the example of a machine that takes inputs A, B, C and produces an output. The example includes training the artificial neural network with one set of data (the training data) and testing it with a different set of data the network hasn't seen before (the test data). Data in this case is collected from experiments with different experimental settings. For example, the input settings A=1, B=1, C=1 produce an output of 1; a number of runs has to be performed with different experimental settings to gather the data. The acquired data has to be divided into two sets: a training set, used for training the neural network, and a test set, used for testing the performance of the trained network, with the ratio of train to test data generally being 80:20.
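As a rough sketch, the 80:20 split might look like this in Python (the array contents here are made up for illustration, not real experimental data):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.integers(0, 2, size=(100, 4)).astype(float)  # columns: A, B, C, Target

rng.shuffle(data)                     # shuffle rows before splitting
split = int(0.8 * len(data))          # 80:20 train/test ratio
train, test = data[:split], data[split:]
```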

An ANN is similar to the human neural network, consisting of connected neurons that process information. The ANN architecture shown below has three layers: the input layer, through which the input information is fed; the hidden layer, which connects the input and output layers and processes the information; and the output layer, which delivers the output.

Below is the sample data set with inputs A, B, C and the output labelled "Target". Only 10 rows of data are shown here; in general, many more rows are needed to train the network.

Normalizing the inputs is not needed for the current example; however, input normalization is a must when the inputs are on different scales.
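One common option is min-max normalization to the [0, 1] range; a brief sketch with made-up values:

```python
import numpy as np

X = np.array([[1.0, 200.0, 0.5],
              [3.0, 800.0, 0.1],
              [2.0, 500.0, 0.9]])    # hypothetical raw inputs A, B, C
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```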

A simple feed-forward, back-propagation artificial neural network is shown below, with one input layer, one hidden layer, and one output layer.

For better understanding, let us feed in only one input set, from the first row in the above table: A=1, B=1, C=1 with a target output of 1. Once the architecture is chosen, one has to initialize the weights of the synapses connecting the different layers. We can initialize the weights randomly or choose small values to start with. Here, I have initialized a weight of 0.1 for all the connections, which keeps the math simple in the following discussion.
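Both initialization options can be sketched in NumPy (the four hidden nodes match the architecture used in the calculations below):

```python
import numpy as np

n_in, n_hidden, n_out = 3, 4, 1        # 3 inputs, 4 hidden nodes, 1 output

# Option 1: small random initial weights
rng = np.random.default_rng(seed=0)
w_ih = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))   # input→hidden
w_ho = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))  # hidden→output

# Option 2: the choice used here, every weight set to 0.1
w_ih = np.full((n_in, n_hidden), 0.1)
w_ho = np.full((n_hidden, n_out), 0.1)
```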

Forward Pass and Back-Propagation

The training consists of two steps:

1. Forward pass: the inputs pass through the network to the output layer, producing the output.

2. Back-propagation: the error is propagated backwards through the network, adjusting the weights.

Forward Pass:

A simple neuron is shown below; it performs two functions:

1. Summation of the input values, weighted by the respective weights, at each node.

2. Activation of the summed input using an activation function F(X). A few activation functions are used in practice, such as sigmoid, ReLU, and tanh; the current example uses the sigmoid activation function at the nodes in both layers.

Let us consider only the calculations at hidden node 1, H1, in the above ANN figure:

Value at hidden node H1 = (1 * 0.1) + (1 * 0.1) + (1 * 0.1) = 0.3

Applying the sigmoid activation function: sigmoid(0.3) = 1/(1 + exp(-0.3)) = 0.57
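A minimal sketch of the sigmoid function used here:

```python
import math

def sigmoid(x):
    # logistic function: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(0.3), 2))   # 0.57
```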

Similar calculations are done at each hidden node and the values are passed on to the output layer. The value at the node in the output layer is:

Value at Output node = (0.57 * 0.1) + (0.57 * 0.1) + (0.57 * 0.1) + (0.57 * 0.1) = 0.228

Applying the sigmoid activation function: sigmoid(0.228) = 1/(1 + exp(-0.228)) = 0.56
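The whole forward pass for this row fits in a few lines of NumPy; the tiny differences from the hand calculation come from rounding 0.5744 to 0.57 before the second layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 1.0, 1.0])        # first row: A=1, B=1, C=1
w_ih = np.full((3, 4), 0.1)          # input→hidden weights, all 0.1
w_ho = np.full((4, 1), 0.1)          # hidden→output weights, all 0.1

h = sigmoid(x @ w_ih)                # each hidden node: sigmoid(0.3) ≈ 0.57
out = sigmoid(h @ w_ho)              # output node: sigmoid(≈0.23) ≈ 0.56
print(h.round(2), out.round(2))      # [0.57 0.57 0.57 0.57] [0.56]
```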

Back-Propagation:

The error measures the difference between the value predicted by the network (the output) and the original value (the Target). Here, Error = 0.5 * (1 - 0.56)² = 0.0968. The error value here is calculated for only one data point; normally the error is calculated over all the data points at once, or in batches.
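A quick check of that arithmetic:

```python
target, out = 1.0, 0.56
error = 0.5 * (target - out) ** 2
print(error)   # 0.0968
```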

Error propagation at Output node:

The change of the error with respect to a single weight is shown below: the weight connecting node H1 in the hidden layer to the output layer.

The term on the left-hand side of the equation below is the amount by which the weight connecting the hidden and output layers has to be updated to minimize the error, so that the output value matches the target. Applying the chain rule, we get:

∂E/∂w(H1→O) = ∂E/∂out * ∂out/∂net(O) * ∂net(O)/∂w(H1→O)

Splitting and calculating each term in the above equation:

1. ∂E/∂out = -(Target - out) = -(1 - 0.56) = -0.44

2. ∂out/∂net(O) = out * (1 - out) = 0.56 * (1 - 0.56) ≈ 0.246

3. ∂net(O)/∂w(H1→O) = H1 = 0.57

Grouping terms 1, 2 and 3 gives the error information by which the weight at the output node has to be updated:

∂E/∂w(H1→O) = -(1 - 0.56) * 0.246 * 0.57 = -0.0617
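The same three terms in code, using the rounded values from above:

```python
target, out, h1 = 1.0, 0.56, 0.57

dE_dout   = -(target - out)            # term 1: -0.44
dout_dnet = out * (1.0 - out)          # term 2: ≈ 0.246
dnet_dw   = h1                         # term 3: 0.57

grad_w_h1_out = dE_dout * dout_dnet * dnet_dw
print(round(grad_w_h1_out, 4))         # -0.0618 (≈ -0.0617 with the rounding above)
```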

Similar calculations give the error information at a hidden node; here, for the weight connecting input A to hidden node H1:

∂E/∂w(A→H1) = ∂E/∂out * ∂out/∂net(O) * ∂net(O)/∂H1 * ∂H1/∂net(H1) * ∂net(H1)/∂w(A→H1)

Splitting and calculating each term in the above equation:

1. ∂E/∂out = -(1 - 0.56) = -0.44

2. ∂out/∂net(O) = 0.56 * (1 - 0.56) ≈ 0.246

3. ∂net(O)/∂H1 = w(H1→O) = 0.1

4. ∂H1/∂net(H1) = H1 * (1 - H1) = 0.57 * (1 - 0.57) ≈ 0.245

5. ∂net(H1)/∂w(A→H1) = A = 1

Grouping all the above terms gives the error information at the hidden layer:

Error term = -(1 - 0.56) * 0.246 * 0.1 * 0.245 * 1 = -0.00265
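And the hidden-node gradient in code, again with the rounded values:

```python
target, out = 1.0, 0.56
h1, a, w_h1_out = 0.57, 1.0, 0.1       # hidden activation, input A, H1→output weight

delta_out = -(target - out) * out * (1.0 - out)            # ≈ -0.108
grad_w_a_h1 = delta_out * w_h1_out * h1 * (1.0 - h1) * a
print(round(grad_w_a_h1, 5))           # ≈ -0.00266 (-0.00265 with the rounding above)
```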

The weights are updated using the error information at each node in the hidden and output layers.

The updated weights after the first iteration, for the weights connecting the input and hidden layers:

= 0.1 - (-0.00265) = 0.10265

Updated weights connecting the hidden and output layers:

= 0.1 - (-0.0617) = 0.1617
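In code, each update is simply the old weight minus its gradient (no learning rate is applied at this step):

```python
w_a_h1_new   = 0.1 - (-0.00265)   # input→hidden weight:  0.10265
w_h1_out_new = 0.1 - (-0.0617)    # hidden→output weight: 0.1617
```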

With the newly updated weights, the feed-forward pass and back-propagation iterate until the weights settle at particular values that minimize the error, making the output value match the target value. The algorithm iterates over all the rows in the data table, and finally the ANN settles on particular weights, making it ready for prediction. The final step after training is to feed the test data into the trained neural network to get prediction results.
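Putting everything together, here is a minimal end-to-end training sketch of this network (3 inputs, 4 hidden nodes, sigmoid activations, no bias terms, all weights starting at 0.1); the learning rate eta and the single-row data set are illustrative additions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical training data: rows of (A, B, C) with matching targets.
X = np.array([[1.0, 1.0, 1.0]])
T = np.array([[1.0]])

w_ih = np.full((3, 4), 0.1)            # input→hidden weights
w_ho = np.full((4, 1), 0.1)            # hidden→output weights
eta = 0.5                              # learning rate (illustrative addition)

for epoch in range(1000):
    for x, t in zip(X, T):
        # forward pass
        h = sigmoid(x @ w_ih)                        # hidden activations
        o = sigmoid(h @ w_ho)                        # network output
        # back-propagation of the error E = 0.5 * (t - o)^2
        delta_o = -(t - o) * o * (1 - o)             # output-node error term
        delta_h = (delta_o @ w_ho.T) * h * (1 - h)   # hidden-node error terms
        # gradient-descent updates: w <- w - eta * dE/dw
        w_ho -= eta * np.outer(h, delta_o)
        w_ih -= eta * np.outer(x, delta_h)

print(sigmoid(sigmoid(X[0] @ w_ih) @ w_ho))          # approaches the target 1
```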

This article provides a basic understanding of artificial neural networks; however, there are many more concepts to explore, like bias, learning rate, activation functions, momentum factor, and architectures, which make a neural network robust and efficient.

Please report any mistakes. Thank you for reading.
