The Math Behind Gradient Descent

Zackary Nay
Feb 10, 2024

The following is a brief derivation of the batch gradient descent algorithm.

In Part 1–1 I discussed the Perceptron algorithm, which uses the predicted output of 1 or -1 to update the weights of the net input. There is a better way, however. In this article I will go through the derivation of the gradient descent optimization algorithm.

An activation function is a function that determines the output of a node based on its inputs and weights. In our case, let's define the activation function as follows:

$$\phi(z) = z, \qquad z = w^T x$$

Or, equivalently:

$$\phi(w^T x) = w^T x = \sum_j w_j x_j$$

This activation function is simply the identity function, but it is worth getting used to activation functions, as they will recur throughout the series. Note that w and x are vectors here and we are taking their dot product, the output of which is the same as the predicted value of the equation discussed in Part 1–1.
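As a minimal sketch of the definition above, the net input and the identity activation can be written in a few lines of plain Python. The function names `net_input` and `activation`, and the example weights and features, are made up for illustration and do not come from the series.

```python
def net_input(w, x):
    """Dot product w . x of a weight vector and a feature vector."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def activation(z):
    """Identity activation: returns the net input unchanged."""
    return z


# Hypothetical weights and features, chosen only for illustration.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]

z = net_input(w, x)   # 0.5*1.0 + (-1.0)*2.0 + 2.0*0.5
print(activation(z))  # prints -0.5
```

Because the activation is the identity, the output is a continuous number rather than a class label of 1 or -1.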

A cost function is a function that measures the performance of a machine learning model. Let's now define the cost function that we would like to minimize:

$$J(w) = \frac{1}{2} \sum_i \left( y^{(i)} - \phi(z^{(i)}) \right)^2$$

J(w) is the sum of squared errors between the true class label $y^{(i)}$ and the predicted value $\phi(z^{(i)})$ over the training examples i, with an inconsequential one-half included to make the derivation a little cleaner. Ideally we would like to change w such that it approaches the global minimum of J(w). Hence, let's find the partial derivative of J(w) with respect to the j-th weight:

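$$\frac{\partial J}{\partial w_j} = \frac{\partial}{\partial w_j} \frac{1}{2} \sum_i \left( y^{(i)} - \phi(z^{(i)}) \right)^2$$

Then, applying the chain rule, we get:

$$\frac{\partial J}{\partial w_j} = \sum_i \left( y^{(i)} - \phi(z^{(i)}) \right) \frac{\partial}{\partial w_j} \left( y^{(i)} - \sum_k w_k x_k^{(i)} \right)$$

Finally, with some simplification we get:

$$\frac{\partial J}{\partial w_j} = -\sum_i \left( y^{(i)} - \phi(z^{(i)}) \right) x_j^{(i)}$$

This has the same form as the Perceptron update rule! The one difference is that the error is computed from the continuous output $\phi(z^{(i)})$ rather than from the thresholded class label. Thus, by stepping in the direction opposite to the gradient, we can change the j-th weight by the following formula:

$$w_j := w_j + \Delta w_j, \qquad \Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_i \left( y^{(i)} - \phi(z^{(i)}) \right) x_j^{(i)}$$

where $\eta$ is the learning rate.

In conclusion, the derivation is not too difficult, and by minimizing the cost function we can see where the form of the Perceptron update comes from. The name batch gradient descent comes from computing each weight update from the gradient over the entire training set.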
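To make the derivation concrete, here is a rough sketch of batch gradient descent in plain Python, assuming the sum-of-squared-errors cost and identity activation described above. The function names, the toy data set, the learning rate `eta`, and the number of epochs are all assumptions made for illustration. The first feature of every example is fixed at 1 so that its weight acts as a bias.

```python
def predict(w, x):
    """Identity activation applied to the net input w . x."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def cost(w, X, y):
    """Sum of squared errors with the one-half factor."""
    return 0.5 * sum((y_i - predict(w, x_i)) ** 2 for x_i, y_i in zip(X, y))


def gradient(w, X, y):
    """Partial derivatives of the cost: -(sum over i of error_i * x_ij) for each j."""
    errors = [y_i - predict(w, x_i) for x_i, y_i in zip(X, y)]
    return [-sum(e * x_i[j] for e, x_i in zip(errors, X)) for j in range(len(w))]


def batch_gradient_descent(X, y, eta=0.05, epochs=100):
    """Update all weights once per epoch using the whole training set."""
    w = [0.0] * len(X[0])
    history = []  # cost before each update, to watch it decrease
    for _ in range(epochs):
        history.append(cost(w, X, y))
        grad = gradient(w, X, y)
        # Step against the gradient: w_j := w_j - eta * dJ/dw_j
        w = [w_j - eta * g_j for w_j, g_j in zip(w, grad)]
    return w, history


# Hypothetical toy data: [bias input, feature], labels are 1 or -1.
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
y = [1.0, 1.0, -1.0, -1.0]

w, history = batch_gradient_descent(X, y)
print("weights:", w)
print("first cost:", history[0], "last cost:", history[-1])
```

The learning rate matters here: if `eta` is too large for the data, the updates overshoot the minimum and the cost grows instead of shrinking, so the value used above should be treated as specific to this toy data set.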

If you would like to see the proof of convergence of the Perceptron algorithm, or why the algorithm does not converge for non-separable data, post a comment!

Zackary Nay is a full-time software developer working within the construction industry, where he implements artificial intelligence and other software to expedite repetitive tasks. In his free time, he likes to read and go backpacking.
