Glossary of Deep Learning: Bias

Jaron Collis · Published in Deeper Learning · Apr 14, 2017


Three sigmoid curves — the same input data, but with different biases [Source]

The activation of a node in a neural network is determined by the following:

output = activation_function(dot_product(weights, inputs) + bias)

This means that when calculating a node's output, the inputs are multiplied by the weights, and a bias value is added to the result. The bias allows the activation function to be shifted to the left or right, to better fit the data. Changes to the weights alter the steepness of the sigmoid curve, whilst the bias offsets it, shifting the entire curve so that it fits the data better. Note also that the bias only influences the output values; it doesn't interact with the actual input data.
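To see this concretely, here is a minimal sketch in Python (using NumPy; the function names mirror the pseudocode above, and the weight and bias values are purely illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_output(weights, inputs, bias):
    # output = activation_function(dot_product(weights, inputs) + bias)
    return sigmoid(np.dot(weights, inputs) + bias)

# A single-input node with a fixed weight of 1.0, evaluated over the
# same inputs with three different biases: the bias shifts the curve
# along the input axis without changing its steepness.
inputs = np.linspace(-6, 6, 7)
for bias in (-2.0, 0.0, 2.0):
    outputs = [node_output(np.array([1.0]), np.array([x]), bias) for x in inputs]
    print(f"bias={bias:+.1f}:", np.round(outputs, 3))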

You can think of the bias as a measure of how easy it is to get a node to fire. For a node with a large bias, the output will tend to be intrinsically high: even small positive weights and inputs will produce large positive outputs (near 1). Biases can also be negative, leading to sigmoid outputs near 0. If the bias is very small (or zero), the output will be decided by the values of the weights and inputs alone.
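As a quick worked example (the weighted sum of 0.5 is a hypothetical value chosen for illustration, not taken from the article):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

weighted_sum = 0.5  # dot_product(weights, inputs), held fixed
for bias in (5.0, 0.0, -5.0):
    print(f"bias={bias:+.1f} -> output={sigmoid(weighted_sum + bias):.3f}")

# bias=+5.0 -> output=0.996  (fires easily)
# bias=+0.0 -> output=0.622  (decided by weights and inputs alone)
# bias=-5.0 -> output=0.011  (barely fires)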

Bear in mind, though, that the bias in a neural network node is not equivalent to the threshold of a perceptron, which only outputs 1 if sufficient input is supplied. Neurons don't have binary silent/fire thresholds; instead, they have activation functions that produce a non-linear output between 0 and 1. So the role of the bias isn't to act as a threshold, but to help ensure the output best fits the incoming signal.
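The difference is easy to see side by side; a minimal sketch (the threshold, bias, and input values are arbitrary):

import math

def perceptron(weighted_sum, threshold):
    # Binary silent/fire: outputs 1 only if the input clears the threshold.
    return 1 if weighted_sum >= threshold else 0

def sigmoid_node(weighted_sum, bias):
    # Smooth, non-linear output between 0 and 1; the bias shifts the
    # curve rather than gating the output.
    return 1.0 / (1.0 + math.exp(-(weighted_sum + bias)))

for s in (-1.0, 0.0, 1.0):
    print(s, perceptron(s, threshold=0.0), round(sigmoid_node(s, bias=0.0), 3))
# -1.0 0 0.269
#  0.0 1 0.5
#  1.0 1 0.731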

Biases are tuned alongside weights by learning algorithms such as gradient descent. Where biases differ from weights is that they are independent of the output of the previous layer. Conceptually, a bias behaves as if it were the weight on an input from a neuron with a fixed activation of 1, and so it is updated by subtracting just the product of the delta value and the learning rate.
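In code, the two updates look like this (a minimal sketch; the numbers are illustrative, and delta stands for the error term backpropagated to this node):

learning_rate = 0.1
delta = 0.25               # hypothetical error term for this node
previous_activation = 0.8  # output of a neuron in the previous layer
weight, bias = 0.5, 0.0    # illustrative starting values

# A weight's update scales with the previous layer's output:
weight -= learning_rate * delta * previous_activation

# The bias acts like a weight on a constant input of 1, so its update
# is just the product of the delta value and the learning rate:
bias -= learning_rate * delta * 1.0

print(weight, bias)  # weight is now roughly 0.48, bias roughly -0.025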

Typically, biases are initialised to zero, since the asymmetry breaking is provided by the small random numbers in the weights (see Weight Initialisation).
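A typical initialisation, sketched with NumPy (the layer sizes and the 0.01 scale are illustrative choices, not prescriptions):

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_nodes = 4, 3  # illustrative layer sizes

# Small random weights break the symmetry between nodes...
weights = rng.normal(loc=0.0, scale=0.01, size=(n_inputs, n_nodes))

# ...so the biases can all safely start at zero:
biases = np.zeros(n_nodes)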

Note: the term Bias is also used to refer to systematic error in a model, the tendency to consistently make the same kinds of mistakes. This tends to arise when a model is too simple to represent the complexity of the underlying data, causing the algorithm to miss relevant relationships between salient features and the expected outputs, despite having more than enough training data, a problem called underfitting. Something I explain here.
