Deep Learning — Backpropagation Algorithm Basics

Oscar Okello
Nov 20, 2019

Backpropagation is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights of the network. Backpropagation is shorthand for “the backward propagation of errors”, since the error is computed at the output and distributed backwards through the network's layers. It is commonly used to train deep neural networks.

Backpropagation is a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer. It is closely related to the Gauss-Newton algorithm and is part of continuing research in neural backpropagation.

Backpropagation is a special case of a more general technique called automatic differentiation. In the context of learning, backpropagation is commonly used by the gradient descent optimization algorithm to adjust the weights of neurons by calculating the gradient of the loss function.
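To make that concrete, here is a minimal sketch of a single gradient-descent update in Python. The model, the numbers and the learning rate are mine, purely for illustration: the loss is a squared error, its derivative with respect to the weight is obtained with the chain rule, and the weight is nudged against that gradient.

    # Minimal sketch: one gradient-descent update for a single weight.
    # Model: y_pred = w * x, loss = (y_pred - y_true) ** 2 (illustrative values only).
    x, y_true = 3.0, 6.0            # one training example
    w = 4.0                         # current weight
    learning_rate = 0.05

    y_pred = w * x                  # forward pass
    loss = (y_pred - y_true) ** 2   # how wrong we are

    # Backward pass: d(loss)/dw = 2 * (y_pred - y_true) * x, by the chain rule.
    grad_w = 2.0 * (y_pred - y_true) * x

    w = w - learning_rate * grad_w  # gradient-descent step: move against the gradient
    print(loss, grad_w, w)          # 36.0, 36.0, 2.2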

The backpropagation algorithm is an important mathematical tool for making better, higher-accuracy predictions in machine learning. It is used, within supervised learning, to train artificial neural networks. The whole idea of training a multi-layer perceptron is to compute the derivatives of the error function with respect to the weights, and the backpropagation algorithm is what computes them. It boils down to linear-algebra operations and the chain rule, with the goal of minimising the error function by repeatedly updating the weights.

What is the Backpropagation Algorithm?

As mentioned above, “backpropagation” is an algorithm that uses supervised learning to compute the gradient of the error (a generalisation of the delta rule) with respect to the weights.

This algorithm is used to find the weights that minimise the error function of the neural network during the training stage. The core idea of backpropagation is to find out what impact a change to each weight would have on the overall cost of the network.

The weights are adjusted to minimise the error function, and the point where the error reaches its minimum is taken as the solution to our learning problem. To understand this better, consider the example below.

Let's take the table below to demonstrate the importance of the weight. The model simply multiplies each input by a single weight W, and the desired output is twice the input:

    Input:            0    3    9    27
    Desired output:   0    6   18    54

Now, if we start playing with the weight, we will see the real game. With the weight set to 4 we get the output below. Note, in particular, the difference between the actual and the desired output:

    Input:            0    3    9    27
    Output (W = 4):   0   12   36   108
    Desired output:   0    6   18    54
    Error:            0    6   18    54

Now let's compare the two tables above. With W = 4 the output has a huge error margin (6, 18 and 54 for three of the input values), and only one value is correct; if we squared the errors, the gap would look even larger. Let's change the weight from 4 to 3: the error margin reduces to 0, 3, 9 and 27, which is better but still not optimal. One thing is clear: our approach is going in the correct direction, i.e. reducing the weight is the right decision here. Let's decrease it further, to 2. With the weight set to 2, the output matches the desired output exactly, with zero error margin.

What was done here

  • With an initial random value for the weight W, we used the forward-propagation method. This is the first step in any neural network: the forward pass produces an output that can be compared with the desired (real) output to obtain the error.
  • We got error values of 0, 6, 18 and 54, which were clearly not appealing. To reduce the error, the backward-propagation method was used, i.e. the value of W was reduced.
  • After reducing W there was still an error (0, 3, 9 and 27); although it had decreased, it was not our desired result. What was learned is that reducing the value of W is the correct direction, and any increase would never yield the desired output.
  • So we propagated backwards again and reduced the value of W from 3 to 2.

The whole idea of forward/backward propagation and of playing with the weights is to minimise the error value. After a couple of iterations, the network learns in which direction on the number scale the weight needs to move for the error to keep shrinking. There is a sort of breakpoint where any further update to the weight results in an increase of the error; that is the indication to stop and take the current value as the final weight.
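To make the walkthrough above concrete, here is a small Python sketch that reproduces it. The inputs (0, 3, 9, 27) and the desired outputs (0, 6, 18, 54, i.e. twice each input) are inferred from the error values quoted above; the loop simply tries the weights 4, 3 and 2 and reports the per-example errors.

    # Reconstruction of the worked example above (values inferred from the quoted errors).
    inputs = [0, 3, 9, 27]
    desired = [2 * x for x in inputs]      # 0, 6, 18, 54: the target is twice the input

    def errors_for(weight):
        """Forward pass (output = weight * input) and per-example absolute error."""
        outputs = [weight * x for x in inputs]
        return [abs(o - d) for o, d in zip(outputs, desired)]

    for w in (4, 3, 2):                    # "playing with the weight", as in the text
        errs = errors_for(w)
        print(f"W = {w}: errors = {errs}, total = {sum(errs)}")
        # W = 4 -> [0, 6, 18, 54]; W = 3 -> [0, 3, 9, 27]; W = 2 -> [0, 0, 0, 0]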

Backpropagation Algorithm Step by Step

In neural networks, learning is about making each neuron intelligent about its activation, i.e. when to get activated and when to remain mum. (The human brain, for what it's worth, is not designed to accommodate the backpropagation principle literally.) The basic steps of backpropagation in an artificial neural network, which let us calculate the derivatives much faster, are the following (a small end-to-end sketch in code follows the list):

  • Set inputs and desired outputs — Choose the inputs and set the desired outputs.
  • Set random weights — Initial random weights are needed so that there is something to manipulate when shaping the output values.
  • Calculate the error — The error tells us how far the model's output is from the required output, i.e. how good or bad the model currently is.
  • Minimise the error — At this step, we check whether the error rate is at (or close to) its minimum.
  • Update the parameters — If the error still has a huge gap, change/update the parameters, i.e. the weights and biases, to reduce it. This check-and-update cycle is repeated until the error is minimised.
  • Model readiness for prediction — Once the error is optimised, we can test the model's output with some test inputs.
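The sketch below strings these steps together for a tiny network with one hidden layer. The particular choices (NumPy, a sigmoid activation, a squared-error-style update and an XOR-like toy dataset) are mine for illustration only; they are not prescribed by the algorithm itself.

    import numpy as np

    # Step 1: set inputs and desired outputs (a toy XOR-like dataset, chosen for illustration).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)

    # Step 2: set random weights (and zero biases).
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
    W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for epoch in range(5000):              # steps 3-5, repeated until the error is small
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        y_hat = sigmoid(h @ W2 + b2)

        # Step 3: calculate the error (difference between model output and desired output).
        error = y_hat - Y

        # Steps 4-5: propagate the error backwards (chain rule) and update the parameters.
        delta_out = error * y_hat * (1 - y_hat)
        delta_hidden = (delta_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ delta_out
        b2 -= lr * delta_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ delta_hidden
        b1 -= lr * delta_hidden.sum(axis=0, keepdims=True)

    # Step 6: the model is now ready for prediction on test inputs.
    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))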

The human brain is a deep and complex recurrent neural network. Deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. In very simple words, and without confusing anyone, we can define the two procedures as below.

  • Feedforward propagation — a type of neural network architecture where the connections are “fed forward” only, i.e. from input to hidden to output; the values flow in one direction.
  • Backpropagation (a supervised learning algorithm) — a training algorithm with two steps: first feed the values forward, then calculate the error and propagate it back to the earlier layers.

Propagating forward lets us see how the neural network behaves, i.e. how well it currently performs. We observe the error, and then backpropagation comes in to reduce that error (updating the weights and biases) in a gradient-descent manner. In short, forward propagation is part of the backpropagation algorithm but comes before the backward pass.

The Need for Backpropagation

Backpropagation, or backward propagation, is a very handy, important and useful mathematical tool when it comes to improving the accuracy of our predictions in machine learning. As mentioned above, it is used in neural networks as the learning algorithm that computes the gradient of the error with respect to the weights, which gradient descent then uses to adjust them.

Backpropagation is a very efficient learning algorithm for multi-layer neural networks compared with perturbation, a form of reinforcement learning. In perturbation, we randomly perturb one weight at a time, measure the change in performance and keep the change only if an improvement is seen, which is quite inefficient.

In backpropagation, the error derivatives can be computed efficiently, and for all hidden units at the same time. In this regard backpropagation is far better, as you don't need to randomly change one weight and then run a whole forward pass for each change. It is a supervised machine learning algorithm in the sense that it requires a known, desired output for each input value; this is how it calculates the gradient of the loss function. The algorithm has emerged as an important machine learning tool for predictive analytics.
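A minimal way to see the contrast (my own illustration, not from the original text): a finite-difference perturbation estimate needs extra forward passes for every single weight, whereas the analytic derivative that backpropagation computes is available for all weights from one backward pass. For a single weight of a linear model:

    # Perturbation (finite differences) vs. the analytic derivative, for one weight.
    # Model and numbers are illustrative only: y_pred = w * x, loss = (y_pred - y) ** 2.
    def loss(w, x=3.0, y=6.0):
        return (w * x - y) ** 2

    w, eps = 4.0, 1e-6

    # Perturbation: nudge the weight and re-run the forward pass; repeated once per weight.
    grad_perturbation = (loss(w + eps) - loss(w - eps)) / (2 * eps)

    # Backpropagation-style analytic derivative: d(loss)/dw = 2 * (w * x - y) * x.
    grad_analytic = 2 * (w * 3.0 - 6.0) * 3.0

    print(grad_perturbation, grad_analytic)   # both about 36.0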

How the backpropagation algorithm works

Back-propagation works with a logic very similar to that of feed-forward; the difference is the direction of data flow. In the feed-forward step, you have the inputs and the output observed from them: you propagate the values forward to the neurons ahead.

In the back-propagation step, you cannot directly know the error of every neuron, only that of the ones in the output layer. Calculating the errors of the output nodes is straightforward: you take the difference between the neuron's output and the actual (target) output for that instance in the training set. The neurons in the hidden layers must derive their errors from this, so you have to pass the error values back to them. From these values, each hidden neuron can compute its own error, and update its parameters, using the weighted sum of the errors from the layer ahead.
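In code, that weighted sum of errors from the layer ahead might look like the small sketch below. The sigmoid activation (and hence its derivative a * (1 - a)) is my own assumption for illustration; any differentiable activation works the same way.

    import numpy as np

    def backpropagate_error(delta_next, W_next, activation_hidden):
        """Error signal of a hidden layer, derived from the error of the layer ahead.

        delta_next        : error signal of the next (downstream) layer
        W_next            : weights connecting this hidden layer to that next layer
        activation_hidden : this layer's sigmoid activations from the forward pass
        """
        weighted_error = delta_next @ W_next.T     # weighted sum of downstream errors
        return weighted_error * activation_hidden * (1 - activation_hidden)  # times sigmoid'

    # Tiny usage example: 3 hidden units feeding 1 output unit.
    delta_out = np.array([[0.2]])
    W_out = np.array([[0.5], [-0.3], [0.8]])
    h = np.array([[0.7, 0.1, 0.9]])
    print(backpropagate_error(delta_out, W_out, h))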

A step-by-step demo of feed-forward and back-propagation steps can be found here.

If you're a beginner with neural networks, you can begin by learning about the perceptron, then advance to the neural network proper, which is in fact a multilayer perceptron.

Backpropagation Visualization

For an interactive visualization showing a neural network as it learns, check out my Neural Network visualization.
