Regression, Mapping, Matrix Multiplication

Qiang Chen
Machine Learning and Math
3 min read · Aug 14, 2018

Regression: The scalar value prediction problem

There are many kinds of problems in machine learning. Broadly, they can be divided into two categories: classification and regression.

  • predicting whether an image shows a cat, a dog, or some other animal is a classification problem
  • predicting a house price or the temperature is a regression problem

A regression example is presented here.

Formulation of the regression problem

As we know, a well-defined machine learning problem should include these elements:

  1. The target that needs to be predicted
  2. The information provided for predicting the target. We can represent this information with matrices, together with the meaning of each dimension

In mathematical language, we would say:

  1. a matrix of shape m×k, together with the meaning of each of the k dimensions
  2. a matrix of shape m×1, together with the unit of the price

The mapping from house information to house price

The current goal is to find a mapping that maps the k-dimensional information to the single price dimension, such that the mapped result is as close to the real price as possible. The mapping can be very complex. Let us denote the k-dimensional input as [x₁, x₂, …, x𝒌] and the target price as y.

  1. We might guess the mapping is a complex combination of [x₁, x₂, …, x𝒌], for example y = 2×x₁×x₂ + x₃×x₃ + 5×x₄ + (x₅)⁴ + … + 8.9×x𝒌
  2. The mapping can also be very simple: a linear combination of [x₁, x₂, …, x𝒌], such as y = 5.4×x₁ + 3.8×x₂ + 2.2×x₃ + … + 5.5×x𝒌

If the mapping is a linear combination, it can be expressed as matrix multiplication. In the example above, the coefficients are [5.4, 3.8, 2.2, …, 5.5], and the mapping becomes the multiplication of two matrices: one of shape 1×k holding [x₁, x₂, …, x𝒌], and another of shape k×1 holding the coefficients. The price is produced by the multiplication.
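To make this concrete, here is a minimal sketch in Python with NumPy; the feature values and coefficients below are invented purely for illustration.

```python
import numpy as np

# One house described by k = 4 features (hypothetical numbers).
x = np.array([[5.0, 3.0, 2.0, 120.0]])      # shape 1 x k
# A guessed coefficient matrix (also hypothetical).
w = np.array([[5.4], [3.8], [2.2], [5.5]])  # shape k x 1

# The linear mapping: (1 x k) @ (k x 1) -> a 1 x 1 predicted price.
y_pred = x @ w
print(y_pred)  # the single predicted price
```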

How do we measure the performance of our coefficient matrix? How close is our predicted price to the real price? How good is our coefficient matrix?

If we have m house instances in the training set, we can map the m houses’ information to their predicted prices, a matrix of shape m×1.

Y’ = X×w, where X is the house information matrix of shape m×k, w is the coefficient matrix of shape k×1, and Y’ is the predicted price matrix of shape m×1.
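Continuing the sketch, stacking the m houses as the rows of X produces all predicted prices in one multiplication; again, all numbers are made up.

```python
import numpy as np

# m = 3 houses, each with k = 4 features (invented values).
X = np.array([[5.0, 3.0, 2.0, 120.0],
              [2.0, 1.0, 1.0,  60.0],
              [4.0, 2.0, 2.0,  95.0]])      # shape m x k
w = np.array([[5.4], [3.8], [2.2], [5.5]])  # shape k x 1

Y_pred = X @ w                               # shape m x 1: one predicted price per house
print(Y_pred)
```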

Then the predicted prices can be compared with the real prices to measure the performance of our coefficient matrix.

The coefficients’ performance can be measured by a quantity called the cost: the smaller the cost, the better the performance.

Take an example with only one instance, [1, 4, 4, 4565], whose real price is 2890, and a coefficient matrix [w₀, w₁, w₂, w₃].

y’ = w₀×1 + w₁×4 + w₂×4 + w₃×4565

cost(y’, y) = |y’-y| = |y’-2890| = |w₀×1 + w₁×4 + w₂×4 + w₃×4565 - 2890|
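A tiny sketch of this single-instance cost, using the instance and the real price 2890 from the text; the coefficient values are an arbitrary guess.

```python
import numpy as np

x = np.array([1.0, 4.0, 4.0, 4565.0])   # the single instance from the text
w = np.array([10.0, 5.0, 5.0, 0.5])     # an arbitrary guess for [w0, w1, w2, w3]
y_true = 2890.0                          # the real price used in the text

y_pred = x @ w                           # w0*1 + w1*4 + w2*4 + w3*4565
cost = abs(y_pred - y_true)              # cost(y', y) = |y' - y|
print(y_pred, cost)
```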

Summary

The calculation of the cost over all m instances can also be written as a matrix multiplication: cost = 𝐼×|X×w-Y|, where 𝐼 is a matrix of ones of shape 1×m, so the multiplication sums the m absolute errors. In this equation, 𝐼, X and Y are given numbers. One naive way to find the best w, the one that makes the cost as small as possible, is to try every possibility and keep the w that gives the smallest cost.
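Here is a sketch of that vectorized cost with the same made-up data as above; the 1×m row of ones simply sums the m absolute errors into one number.

```python
import numpy as np

X = np.array([[5.0, 3.0, 2.0, 120.0],
              [2.0, 1.0, 1.0,  60.0],
              [4.0, 2.0, 2.0,  95.0]])     # m x k house information (made up)
Y = np.array([[700.0], [350.0], [560.0]])   # m x 1 real prices (made up)
w = np.array([[5.4], [3.8], [2.2], [5.5]])  # k x 1 coefficients

I = np.ones((1, X.shape[0]))                # 1 x m row of ones
cost = (I @ np.abs(X @ w - Y)).item()       # sums the m absolute errors
print(cost)
```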

Another way is to differentiate the cost with respect to w; the result tells you how to change w to make the cost a little smaller. You can choose a w randomly and optimize it step by step in this way. Often you will find the w that makes the cost smallest, but sometimes this can fail. This belongs to the topic of optimization, which I can explain later.
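A rough sketch of that idea, reusing the made-up data from above and taking the sign of each error as the derivative of the absolute value; the step size and iteration count are arbitrary, and as noted, this procedure is not guaranteed to find the best w.

```python
import numpy as np

X = np.array([[5.0, 3.0, 2.0, 120.0],
              [2.0, 1.0, 1.0,  60.0],
              [4.0, 2.0, 2.0,  95.0]])     # m x k house information (made up)
Y = np.array([[700.0], [350.0], [560.0]])   # m x 1 real prices (made up)

w = np.random.randn(4, 1)                   # start from a random w
lr = 1e-4                                   # arbitrary step size
for _ in range(5000):
    error = X @ w - Y                       # m x 1 prediction errors
    grad = X.T @ np.sign(error)             # derivative of sum(|X*w - Y|) with respect to w
    w -= lr * grad                          # nudge w to make the cost a little smaller

print(np.abs(X @ w - Y).sum())              # the remaining cost
```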
