Scalars, Vectors, and Matrices

Qiang Chen
Machine Learning and Math
3 min readAug 12, 2018

Scalars

In our life, we use the scalar to describe a lot of things, such as the length of object, the height of a person, the temperature which can be high or low, and the air pollution index. For any of these, we need a number and a unit, such as 10 centimeters, 23 degrees Celsius. Some other unobvious things can be also expressed with scalars by some definition, for example, we can represent colors by three number which is named RGB representation, certainly, we do have many other ways to describe the color. Machine learning can be utilized to deal with anything which can be represented in scalars. Image, video, game, face etc which is involved in machine learning are being represented in scalars or numbers.

Vectors

For the given machine learning tasks, the first thing is to represent the input. Taking house price prediction as an example, the first thing is to represent the house, which can include the size of the house, how many floors the house has, the geographic location of the house etc. We can simply select the three elements to describe one house. How to represent our input is a very important topic in machine learning which will affect the last prediction precision. In order to describe the three elements, we need to use scalars and unit. One of the candidates can be 100 square meters, 4 floors, (30 degrees north latitude, 108 degrees east longitude). If we ignore the unit after the number, we can represent it using a vector, [100, 4, 30, 108], this is of shape 1⨉4 vector, which is also called feature, a 4 dimension feature. and the price is 1 million yuan in RMB. This feature and price with number and unit is a complete representation.

Matrices

In machine learning problem, usually, A training dataset includes a number of training instances. For example a training dataset for house price prediction 300 instances, which can be represented as 300 vectors of shape 1⨉4, which can be also represented as a matrice of shape 300⨉4. besides the feature matrix, you will have another matrix of shape 300⨉1 which means the 300 prices for each house. If the test dataset has 100 records, you will have a 100⨉4 matrics as your test dataset. you need to predict the prices matrix of shape 100⨉1 by given the 100⨉4 matrics.

Summary

We need to select the aspects to describe our objects. After choosing the proper unit for the representation, we can have a list of number to describe the aspects. The list of number forms a feature vector. In a dataset, there are many feature vector which forms a matrix. The matrix can be used to describe the observed information as the input of our machine learning algorithm. Also, we will have another matrix to describe our target.

The matrices and unit for each dimension meaning define a machine learning problem.

There is no doubt that we do have a lot of undefined machine learning problem to be solved. The way to select key information to represent our input, the unit for each dimension is important for our last result, which is always able to be improved.

--

--