Backpropagation algorithm part 4

Herman Van Haagen
3 min read · Jun 21, 2023


Linear regression

In this lesson, we are going to train a real machine learning model using what we have learned so far (epoch, learning rate, update rule, squared error, gradient descent, chain rule). You can find the code related to this lesson on GitHub. We are going to train a regression line that best fits the data. Let’s take the following data as an example.

Data points that can be best modeled using linear regression

The data was generated artificially from a line with slope a = 2 and intercept b = 3, with some noise added. In formula form:

y = 2x + 3 + noise
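As a concrete illustration, data like this could be generated with NumPy as follows. This is a minimal sketch; the number of points, the x-range, and the noise level are assumptions, not necessarily the values used for the plot above.

import numpy as np

# Synthetic data: y = 2x + 3 plus Gaussian noise (assumed range and noise level)
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)                  # 50 points between 0 and 10
noise = rng.normal(0.0, 1.0, size=x.shape)  # zero-mean noise
Y = 2 * x + 3 + noise                       # true slope a = 2, intercept b = 3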

During the training, we eventually want to find values that yield approximately a=2 and b=3. The general regression line is:

y = ax + b

Since we want to optimize a and b, this is a function of two variables. Therefore,

y = f(a, b) = ax + b

During optimization, we can take the partial derivative with respect to a or with respect to b. Using fraction notation, the partial derivative of y with respect to a is:

∂y/∂a = x

Now, we need to define the error we want to minimize. The y values are predictions, and we can compare them with the actual values Y (a data point from the graph). We use the squared error E as you already know from the gradient descent example.

E = (y − Y)²

Taking the derivative of the error with respect to y, we have:

∂E/∂y = 2(y − Y)

Now we can use the chain rule and take the derivative of the error E with respect to a. This is the derivative of the error multiplied by the derivative of the regression line (chain rule):

∂E/∂a = ∂E/∂y · ∂y/∂a = 2(y − Y) · x

Similarly, for the intercept b:

∂E/∂b = ∂E/∂y · ∂y/∂b = 2(y − Y) · 1 = 2(y − Y)

These are the derivatives we ultimately need for the update rules. The factor 2 is dropped because it is absorbed into the learning rate η, which gives:

a ← a − η · (y − Y) · x
b ← b − η · (y − Y)

In Python code, it looks as follows:
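The full script is on GitHub; the sketch below shows what the core training loop could look like. The learning rate, number of epochs, and variable names are assumptions for illustration, not the exact values from the repository. It reuses the x and Y arrays from the data sketch above.

# Gradient descent for y = a*x + b (minimal sketch, assumed hyperparameters)
a, b = 10.0, 10.0        # initial values (the article starts both at 10)
learning_rate = 0.01     # assumed learning rate
epochs = 100             # assumed number of epochs

for epoch in range(epochs):
    for x_i, Y_i in zip(x, Y):       # x, Y: data from the sketch above
        y_pred = a * x_i + b         # prediction y of the regression line
        error = y_pred - Y_i         # (y - Y); the factor 2 is merged with the learning rate
        a -= learning_rate * error * x_i   # update rule for the slope
        b -= learning_rate * error         # update rule for the intercept

print(a, b)  # should end up close to a = 2 and b = 3

Each data point triggers one update of a and one of b, which is exactly the two update rules discussed below.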

We have used both the partial derivative and the chain rule. Another observation is that we have two update rules, one for each variable we want to optimize. During the training, we find the following regression line:

The red line is the best-fit regression line found using gradient descent

In the Python code on GitHub, the initial values of a and b are set to 10. You can see that during training, the value of a starts at 10 and eventually converges to 2.

You can view the entire Python code on GitHub. That concludes this lesson on linear regression. In the next lesson, we will train a perceptron, the fundamental building block of a neural network.

