Linear regression with multiple targets

Josh Tankard
4 min read · Nov 15, 2023


In this article we will go through how linear regression can be used not only to predict a single variable, y, but a matrix of target variables, Y.


You may have noticed in the scikit-learn documentation for linear regression that, when fitting, y can be either a vector (n_samples,) or a matrix (n_samples, n_targets):

Documentation for fit method

Most commonly only one target needs to be solved for, but with little effort this can be extended to multiple targets where necessary. There may be times when it is convenient or faster to make predictions for multiple targets at once, provided the targets are somewhat related and share the same set of independent features, X.

Note: we will try to follow the standard notation that a lower case letter refers to a vector (single column) and an upper case one refers to a matrix (e.g. multiple columns).

Derivation of linear regression

The standard regression formula can be expressed as:

y = X.w + e

The objective is to find the set of weights, w, that minimises the sum of the squared residuals, e. To solve, the equation needs to be rearranged to express w in terms of y and X. But you cannot simply divide by a matrix, and a matrix needs to be square (equal rows and columns) for it to have an inverse. So we multiply by the transpose of X, Xt, to give Xt.X, which is always a square matrix (and, provided the columns of X are linearly independent, invertible). This also lets us drop the residual term: at the least-squares solution the residuals are orthogonal to the columns of X, so Xt.e = 0. Finally, we multiply by the inverse of Xt.X, written (Xt.X)^-1:

-> y = X.w

-> Xt.y = Xt.X.w

-> (Xt.X)^-1.Xt.y = (Xt.X)^-1.Xt.X.w

Now (Xt.X)^-1.Xt.X reduces to the identity matrix and we are left with:

w = (Xt.X)^-1.Xt.y

Where (Xt.X)^-1.Xt is the pseudoinverse, Z:

Z = (Xt.X)^-1.Xt

Such that:

w = Z.y

If X is a matrix of n_samples rows and n_features columns then Z, the pseudoinverse, is a matrix of n_features rows and n_samples columns. An (n_features, n_samples) matrix multiplied by an (n_samples, 1) vector gives w a dimension of (n_features, 1) — i.e. one coefficient for each feature.

Z(n_features, n_samples) . y(n_samples, 1) = w(n_features, 1)
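
As a quick illustration, the pseudoinverse and weights can be computed by hand with NumPy and checked against scikit-learn (a sketch with my own variable names; the intercept is switched off so the result matches the plain formula above):

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import numpy as np

X, y = make_regression(n_samples=100, n_features=10)

# pseudoinverse: Z = (Xt.X)^-1.Xt
Z = np.linalg.inv(X.T @ X) @ X.T

# weights: w = Z.y, one coefficient per feature
w = Z @ y

# compare against scikit-learn with the intercept disabled
lm = LinearRegression(fit_intercept=False).fit(X, y)
np.testing.assert_allclose(w, lm.coef_, rtol=1e-8, atol=1e-8)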

Adding targets

Going back to basics, two matrices, A and B, can be multiplied together if the number of columns on the first matches the number of rows on the second. This results in a matrix, C, with the number of rows of the first and the number of columns of the second:

A(p, q) . B(q, r) = C(p, r)

This is quite convenient because, when solving for the weights in linear regression, nothing prevents us from increasing the number of columns of y. So if the number of targets is increased from 1 to n_targets, this gives:

Z(n_features, n_samples) . Y(n_samples, n_targets) = W(n_features, n_targets)
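
To make the shapes concrete, here is a minimal sketch (names are again illustrative) showing that Z is computed once and simply multiplied by the wider target matrix Y:

from sklearn.datasets import make_regression
import numpy as np

X, Y = make_regression(n_samples=100, n_features=10, n_targets=5)

Z = np.linalg.inv(X.T @ X) @ X.T   # (n_features, n_samples)
W = Z @ Y                          # (n_features, n_targets)

# each column of W is exactly the single-target solution for that column of Y
np.testing.assert_allclose(W[:, 0], Z @ Y[:, 0])
print(W.shape)  # (10, 5)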

And this also aligns nicely with the scikit-learn documentation for coef_, though, as can be seen in the scikit-learn GitHub repository, the transpose is taken so that the resulting coefficients take the shape (n_targets, n_features):

Documentation for coef_ attribute

Comparison

If all is good and correct, linear regression with multiple targets should give exactly the same output as if each target was fitted independently. This can be verified:

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import numpy as np


n_targets = 5
n_features = 10

X, Y = make_regression(n_samples=100,
                       n_features=n_features,
                       n_targets=n_targets)

# fitting all targets at once
linear_model = LinearRegression()
linear_model.fit(X, Y)

# fitting each target independently
coefs = np.empty(shape=(n_features, n_targets), dtype=np.float64)
for t in range(n_targets):
    lm = LinearRegression()
    lm.fit(X, Y[:, t])
    coefs[:, t] = lm.coef_

np.testing.assert_allclose(linear_model.coef_, coefs.T, rtol=10e-9, atol=10e-9)
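
Prediction behaves the same way: continuing from the snippet above, calling predict on the multi-target model returns one column of predictions per target.

Y_pred = linear_model.predict(X)  # continuing from the comparison snippet above
print(Y_pred.shape)  # (100, 5): one column per target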

Considerations

We mentioned above how this can be convenient, especially when the targets are related and share the same feature set, such as when predicting multiple KPIs for a single product. But we may wish to avoid this approach if the targets are completely unrelated and have different sets of features.

Whilst standard linear regression has been used in this example, the same principle applies to other regression models that have a similar closed-form solution, such as Ridge regression.
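
For example, Ridge regression has the closed-form solution w = (Xt.X + alpha.I)^-1.Xt.y, and widening y into Y carries over in exactly the same way. A minimal sketch (assuming fit_intercept=False so the closed form applies directly):

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
import numpy as np

alpha = 1.0
n_features, n_targets = 10, 5
X, Y = make_regression(n_samples=100, n_features=n_features, n_targets=n_targets)

# closed-form ridge weights for all targets at once
W = np.linalg.inv(X.T @ X + alpha * np.eye(n_features)) @ X.T @ Y

ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, Y)
np.testing.assert_allclose(W, ridge.coef_.T, rtol=1e-6, atol=1e-6)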

Multivariate regression

Finally, this is commonly known as multivariate regression. Depending on what you read online, the terminology can get mixed up, but this is how I interpret the definitions:

simple linear regression: single independent variable and single dependent variable

multiple linear regression: multiple independent variables and single dependent variable

multivariate linear regression: multiple independent variables and multiple dependent variables


Josh Tankard

Based in Barcelona, working remotely. Into data analytics, engineering and experimentation. Personal page: jcatankard.github.io