The journey to deep learning expertise: Linear Regression — first steps

Musli Hyseni
3 min read · Jan 6, 2024

Prerequisites: elementary linear algebra, Python

Linear regression is one of the simplest deep learning models ever created; as such, learning it is a brilliant way to get into the world of deep learning and neural networks in particular.

In this blog series, we’ll explain everything about linear regression: the history, the maths, and the PyTorch implementation.

Hop on, soldier, we’ve got a great journey ahead!

The History

The earliest form of regression was published by Adrien-Marie Legendre in 1805. Some speculate that the method was invented by Gauss, who is also the inventor of the Gaussian distribution (also known as the “normal distribution”), which we’ll learn about and use in the following articles.

Scientists then used neuroscientific discoveries to develop the first artificial neural networks, using linear regression as a starting point.

The deep learning discoveries made in our day and age draw inspiration from much wider sources: mathematics, linguistics, psychology, statistics, computer science, and many other fields. Compare that to being based on a single feature of our biological nervous system!

The basics

As for notation, we’ll use n to denote the number of examples in our dataset.

x^(i) denotes the i-th sample, and x_j^(i) denotes its j-th coordinate (feature).
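In code, with a dataset stored as an n × d matrix (one row per sample), this indexing looks like the sketch below; the tensor values here are made up for illustration:

```python
import torch

# a toy dataset: n = 4 samples, d = 3 features each
X = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [10.0, 11.0, 12.0]])

n, d = X.shape   # n = 4 samples, d = 3 features
x_i = X[1]       # x^(1): the second sample (0-indexed)
x_ij = X[1, 2]   # x_2^(1): its third coordinate

print(n, d, x_i, x_ij)
```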

At the heart of every solution is a model that describes how features (inputs) can be transformed into an estimate of the target (desired output).

The assumption of linearity means that the expected value of
the target can be expressed as a weighted sum of the features.

Let’s say we want to predict the price of an apartment given its area and the number of years since it was built:

price = w_area · area + w_age · age + b

Here w_area and w_age are called weights, and b is called a bias (offset). The weights determine the influence of each feature (input) on the output prediction. The bias determines the value of the prediction when all features are set to zero (w_area · 0 + w_age · 0 + b = b).
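As a quick numeric sketch of that weighted sum (all the weight and bias values below are made up for illustration):

```python
# hypothetical weights and bias for the apartment example
w_area = 2000.0   # price added per square meter
w_age = -500.0    # price lost per year of age
b = 10000.0       # base price when area and age are both zero

area, age = 50.0, 10.0
price = w_area * area + w_age * age + b
print(price)  # 105000.0
```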

Given a dataset, our goal is to reach weights w and bias b such that, on average, our estimated price is as close to the real price as possible.

Usually, we won’t have just a handful of features, so we’ll accompany each weight and feature with its index (between 1 and d for a d-feature example); in this case, we’ll express our prediction y as:

y = w1 x1 + w2 x2 + … + wd xd + b

Collecting all the features in a vector x ∈ ℝ^d (of shape (d, 1)) and all the weights in a vector w ∈ ℝ^d (also of shape (d, 1)), we can then express our model compactly as the dot product of the weights w and the features x:

y = w.T * x + b

(w.T is the transpose of w)

The PyTorch implementation is as simple as this:

import torch

# nine features and their matching weights
features = torch.arange(3, 12)  # tensor([ 3,  4, ..., 11])
weights = torch.arange(4, 13)   # tensor([ 4,  5, ..., 12])

# w.T x as a dot product (add `+ b` to include the bias)
prediction = torch.dot(features, weights)
same_prediction = features @ weights  # @ is shorthand for the same dot product

print(prediction, same_prediction)
# out: tensor(564) tensor(564)
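Incidentally, this y = w.T x + b model is exactly what torch.nn.Linear implements; here is a minimal sketch in which the layer’s parameters are set by hand (with the bias at zero) so that it reproduces the dot product above:

```python
import torch
import torch.nn as nn

# a linear layer mapping d = 9 features to a single output: y = w.T x + b
layer = nn.Linear(in_features=9, out_features=1)

# overwrite the randomly initialized parameters so the result is reproducible
with torch.no_grad():
    layer.weight.copy_(torch.arange(4.0, 13.0).reshape(1, 9))
    layer.bias.zero_()

features = torch.arange(3.0, 12.0)
print(layer(features).item())  # 564.0
```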

Next: The design matrix and loss functions
