Linear Regression with PyTorch

Jad Slim
Published in Learn The Part · 7 min read · Feb 16, 2019

This Medium article is an excerpt from our PyTorch for Deep Learning and Computer Vision course. The course covers a lot of ground and incorporates the latest ideas in teaching deep learning with PyTorch.

The sections include:

  • Linear Regression
  • Perceptrons
  • Deep Neural Networks
  • Image Recognition
  • Convolutional Neural Networks
  • CIFAR 10 Classification
  • Transfer Learning
  • Style Transfer

The full code can be found in the accompanying GitHub repository.

The main focus of this section is to get you familiar with common machine learning algorithms and to train a linear model that properly fits a set of data points. Along the way you will learn the fundamental concepts involved in training a model: loss functions, gradient descent optimization, learning rates, and so on.

In future articles we will use these fundamental concepts to train much more complex models through the applied theme of advanced visual imagery and sophisticated style transfer techniques. For now, however, we start with the basics: training a simple linear model to fit a set of data points.


Train on Colab

We will use Google Colab to train the model. Google provides free GPU processing power through Colab; see the Colab documentation for how to create a notebook and enable the GPU runtime.

PyTorch Installation

Installing PyTorch on Google Colab is very simple. We will be doing it in our first cell with the following code:

!pip3 install torch

Imports

This is followed by the basic necessary imports:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
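
Optionally, you can confirm the install and check whether Colab's GPU runtime is visible to PyTorch. This quick sanity check is our addition, not part of the original notebook:

print(torch.__version__)          # confirm PyTorch imported correctly
print(torch.cuda.is_available())  # True if a GPU runtime is enabled

The model in this article is tiny, so it trains quickly even on CPU.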

Load The Data

We want our tensor to have 100 rows and one column, so that there are one hundred points, each with a single value drawn from the standard normal distribution. X = torch.randn(100, 1) creates a column of 100 normally distributed values centered at 0 with unit variance. We multiply by 10 to widen the spread of the values.

Setting y = X alone would produce a perfectly straight line, since the slope equals 1. So we add noise drawn from the same distribution, torch.randn(100, 1) scaled by 3, to scatter the points around that line.

X = torch.randn(100, 1)*10
y = X + 3*torch.randn(100, 1)
plt.plot(X.numpy(), y.numpy(), 'o')
plt.ylabel('y')
plt.xlabel('x')
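
Note that torch.randn draws fresh random numbers on every run, so your scatter will look slightly different each time. If you want reproducible data, you can seed PyTorch's random number generator before running the cell above (an optional addition, not in the original article):

torch.manual_seed(1)  # any fixed seed gives the same "random" data every run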

The code above will create the following distribution:

[Figure: The Dataset — scatter plot of the generated points]

Plotting the Line Fit

We unpack the model parameters into a list of two elements: w for the weight and b for the bias. (Here model refers to the linear model defined in the Create The Model section below, so run that cell first if you are following along.) The weight is a two-dimensional tensor with 1 row and 1 column, so we must index into both the row and the column at index 0. The bias has a single dimension and can be accessed at its first index. We wrap this in a function get_params() which returns those two values.

[w, b] = model.parameters()

def get_params():
    return (w[0][0].item(), b[0].item())

We then create our line-fit function plot_fit(). We give the line an x component ranging from -30 to 30, then compute the y component from the equation y = w1*x1 + b1, where w1 is the slope and b1 is the y-intercept. We plot the line with plt.plot(x1, y1, 'r') and overlay the scattered points with plt.scatter(X, y).

def plot_fit():
    w1, b1 = get_params()
    x1 = np.array([-30, 30])
    y1 = w1 * x1 + b1
    plt.plot(x1, y1, 'r')
    plt.scatter(X.numpy(), y.numpy())
    plt.show()

We then proceed to plot the untrained fit:

plot_fit()

As we could have predicted, the untrained linear model is not optimized to fit the scattered points.

Create The Model

We will begin by creating our linear regression model class, declared as class LR(nn.Module): so that it subclasses nn.Module and inherits functionality from PyTorch's base module class. This is immediately followed by the __init__ method, whose first argument, as usual, is self, which simply represents the instance of the class.

Initializing a linear model requires an input size as well as an output size. They are included as the second and third arguments of our constructor: def __init__(self, input_size, output_size). Calling super().__init__() initializes the parent nn.Module class, which must happen before we assign any layers to the instance.

This is boilerplate code that you will write for every custom model class.

class LR(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        pred = self.linear(x)
        return pred

Inside the constructor we declare our linear layer, self.linear = nn.Linear(input_size, output_size).

This is followed by the forward method, def forward(self, x), where the first argument self is the instance of the class and x is the input being passed in. We return the model's prediction using self.linear(x).

We then instantiate our model, passing 1 as the input size and 1 as the output size.

model = LR(1, 1)
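
At this point the weight and bias hold random initial values, so a forward pass produces an arbitrary prediction. As a quick sanity check (the input value here is arbitrary, chosen just for illustration):

x = torch.tensor([[1.0]])
print(model(x))  # a random prediction; the exact number varies per run

Calling model(x) invokes the forward method for us.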

Loss Function

We will then initialize our mean squared error loss function, criterion = nn.MSELoss(). It measures the average of the squares of the errors — that is, the average squared difference between the predicted values and the actual values.

This is followed by setting up the optimizer, which uses a gradient descent algorithm — specifically stochastic gradient descent (SGD), available through torch.optim. SGD minimizes the total loss by making frequent, small updates to the model's weights.

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
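
To see concretely what MSELoss computes, here is a tiny hand-worked example with arbitrary made-up values (our addition, not part of the original article):

a = torch.tensor([[1.0], [2.0]])  # pretend predictions
b = torch.tensor([[1.5], [1.0]])  # pretend targets
print(criterion(a, b))            # tensor(0.6250)
print(((a - b) ** 2).mean())      # same value: mean of 0.25 and 1.00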

The optimizer takes two arguments. The first is model.parameters(), the weight and bias that will be updated throughout training. The second is the learning rate lr, a tunable hyperparameter which we set to 0.01. The learning rate controls the size of the steps the optimizer takes to reduce the error: the smaller the learning rate, the smaller the steps.

Training

We will train our model for a specified number of epochs. An epoch is a single pass through the entire dataset. On each pass we compute the error function and backpropagate its gradient to update the weights. We chose 100 epochs, which gives the loss enough iterations to settle.

We then specify an empty array of losses losses = [], where we append all of our loss values to plot in the future. We then initialize a for loop with the range of epochs specified.

epochs = 100
losses = []
for i in range(epochs):
    y_pred = model.forward(X)
    loss = criterion(y_pred, y)
    print("epoch:", i, "loss:", loss.item())
    losses.append(loss.item())  # store the scalar loss, not the tensor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

We first grab the predictions at each iteration with y_pred = model.forward(X): for every x value, we make a prediction using the forward method, and the results are stored in y_pred. We then compute the loss with loss = criterion(y_pred, y), the mean squared error between the predicted and actual values. The scalar loss value is then appended to losses.

Now that we have computed the loss, we must minimize it with gradient descent. loss.backward() computes the gradient of the loss with respect to each model parameter, and optimizer.step() then updates the parameters. Since PyTorch accumulates gradients by default, we must reset them to zero by calling optimizer.zero_grad() right before the loss.backward() call.
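
To make optimizer.step() less of a black box: for plain SGD, the update is just "parameter minus learning rate times gradient". A rough hand-rolled equivalent, shown only for intuition (a sketch, not the library's actual internals):

with torch.no_grad():             # parameter updates must not be tracked by autograd
    for p in model.parameters():
        p -= 0.01 * p.grad        # lr * gradient, the same update SGD applies
        p.grad.zero_()            # reset accumulated gradients, like zero_grad()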

We then run the code and let our model train.

Results

The model starts with a loss of about 34 and ends with a loss of about 10. That final value is close to the best achievable: the noise we added has a standard deviation of 3, so its variance of roughly 9 sets an irreducible floor on the mean squared error.

We then proceed to plot our loss and we get the curve below:

plt.plot(range(epochs), losses)
plt.ylabel('Loss')
plt.xlabel('epoch')

[Figure: Loss Plot — training loss decreasing over the epochs]

Let us plot our new linear model by simply calling:

plot_fit()

We see our linear model, now trained to fit the provided dataset.
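
Since the data was generated as y = x plus noise, the learned parameters should land near a slope of 1 and an intercept of 0. You can verify this with the helper defined earlier (the exact values vary with the random data):

w1, b1 = get_params()
print(w1, b1)  # typically close to 1.0 and 0.0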

Subscribe to our newsletter

If you enjoyed this article and are also interested in free or discounted courses, feel free to subscribe to our mailing list.
