Use of Essential Mathematical Objects of Linear Algebra in Machine Learning and Their Implementation Using MXNet

Anshul Gupta
Jul 28, 2021 · 6 min read


Linear algebra is by far the most important mathematical foundation of machine learning. The mathematical objects discussed below, together with their coding implementations, form the basis for getting started with machine learning and act as a stepping stone towards building neural networks.

1. Scalars

Scalars, in simple terms, represent a single numerical value. For instance, to convert a temperature from the Celsius to the Kelvin scale, we use the formula k = c + 273. In this formula, 273 is a scalar value, while the placeholders k and c are variables representing unknown scalar temperatures on their respective scales. Mathematically, scalar variables are denoted by lowercase letters (a, b, c, etc.) belonging to the space ℝ of all real-valued scalars. It should now be clear that a ∈ ℝ denotes a real-valued scalar, and there are analogous notations when the scalar is an integer (ℤ), a natural number (ℕ), and so on. Similarly, a, b ∈ {1, 2} states that a and b are numbers whose value is either 1 or 2. In machine learning, a scalar is represented by a tensor with only one component, referred to as a 0-order tensor (or, in simpler terms, an array holding a single value), which can be implemented with the built-in Python types and the NumPy library (if a GPU is not required), or with MXNet otherwise.

The following code snippet shows the operations that can be performed on a scalar, with the corresponding outputs.
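Here is a minimal sketch of such scalar operations using MXNet's NumPy-compatible np module; the concrete values (such as 25 °C) are arbitrary example inputs.

```python
from mxnet import np, npx

npx.set_np()  # activate NumPy-like semantics in MXNet

# Celsius-to-Kelvin conversion from the formula above; 25 is an arbitrary input
c = np.array(25.0)   # a 0-D tensor, i.e. a scalar
k = c + 273
print(k)             # 298.0

# Basic arithmetic between two scalars
x, y = np.array(3.0), np.array(2.0)
print(x + y, x - y, x * y, x / y, x ** y)   # 5.0 1.0 6.0 1.5 9.0
```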

2. Vectors

Vectors can be considered as an ordered list of scalars, or 0-D tensors, where each individual scalar value is a component or element of the vector. In machine learning, each vector (also known as a feature vector) holds a particular column of our data frame, whose values are of great significance when making predictions. For example, when predicting house rent, the area of the house, the number of rooms, etc. each form a vector that acts as an essential component in determining the final price. Other similar scenarios arise as well; in fake-news classification, for instance, the text column can be treated as a separate feature vector while training the model. Mathematically, these vectors belong to vector spaces (ℝ^n denoting the n-dimensional real-valued vector space) and are represented by bold lowercase letters (a, b, c, etc.), where each component (a scalar) of the vector is written with a subscripted index. Say a ∈ ℝ^n; then a is a vector belonging to the n-dimensional real-valued vector space, written as

a = [a₁, a₂, …, aₙ]

where a₁, a₂, …, aₙ are the elements of the vector a, each of which can be accessed via its index. This also clearly justifies why a vector is known as a 1-D tensor (or, in simpler terms, an array holding multiple values across a single dimension) in machine learning.

The following code snippet shows the formation of vectors and different operations of addition, subtraction, and element-wise multiplication, with the corresponding outputs and the internal working at each step.
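A minimal sketch of these vector operations with MXNet's np module (the example vectors are arbitrary):

```python
from mxnet import np, npx

npx.set_np()

u = np.arange(4)                    # the vector [0., 1., 2., 3.], a 1-D tensor
v = np.array([2.0, 2.0, 2.0, 2.0])

print(u[3])          # access a component by index -> 3.0
print(len(u))        # number of components -> 4
print(u.shape)       # shape of the vector -> (4,)

print(u + v)         # element-wise addition       -> [2. 3. 4. 5.]
print(u - v)         # element-wise subtraction    -> [-2. -1. 0. 1.]
print(u * v)         # element-wise multiplication -> [0. 2. 4. 6.]
```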

3. Matrices

Just as vectors are composed of scalars, matrices derive themselves from a combination of vectors. They are conventionally denoted using bold uppercase letters (like A, B, C) and are coded as tensors with two axes, which is why a matrix is also known as a 2-D tensor (or, in simpler terms, an array holding multiple values across two dimensions).

In mathematical notation, we use M ∈ ℝ^(r×c) to communicate that the matrix M comprises r rows and c columns, filled with real-valued scalars. Visually, we can show any matrix M ∈ ℝ^(r×c) as a table, where each component m(i,j) belongs to the i-th row and j-th column of the matrix M, as shown below.

M = [ m(1,1) m(1,2) … m(1,c)
      m(2,1) m(2,2) … m(2,c)
      ⋮
      m(r,1) m(r,2) … m(r,c) ]

It should also be clear that if r = c, i.e. the number of rows of a matrix equals the number of columns, the matrix is called a square matrix. Next, if we swap the axes of a given matrix, i.e. the first column becomes the first row, the second column becomes the second row, and so on until the last column becomes the last row, the resulting matrix is known as the transpose of the given matrix. M's transpose is denoted M^T, and if K = M^T, then k(i,j) = m(j,i) for every valid i and j, so that M^T ∈ ℝ^(c×r).

Also, if the transpose of a matrix M equals the original matrix (M^T = M), the matrix is called a symmetric matrix.

Similarly, there are various other operations, such as the addition of two matrices (provided they have the same dimensions, rows × columns) and the matrix product (provided the number of columns of the first matrix equals the number of rows of the second), which returns a matrix whose rows equal the rows of the first matrix and whose columns equal the columns of the second.

All of these operations are implemented in the code snippet shown below with necessary interpretations and outputs.
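Here is a minimal sketch of these matrix operations using MXNet's np module (the matrices are arbitrary examples built with arange):

```python
from mxnet import np, npx

npx.set_np()

A = np.arange(20).reshape(5, 4)   # a 5x4 matrix (2-D tensor)
B = A.copy()                      # a second matrix with the same dimensions

print(A.T.shape)       # transpose swaps the axes: (5, 4) -> (4, 5)
print(A + B)           # matrix addition (same dimensions required)
print(A * B)           # element-wise (Hadamard) product, not the matrix product
print(np.dot(A, B.T))  # matrix product: (5x4)·(4x5) -> a 5x5 matrix

# A symmetric matrix equals its own transpose
S = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
print(S == S.T)        # all entries True
```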

Matrices play a primary role in deep learning, where the weights associated with feature vectors are stored. By and large, matrices help in efficient model formation and in reducing computations in deep learning. Overall, matrices are valuable data-carrying structures: they allow us to organize information that has varying levels of diversity. For instance, the rows of our matrix may correspond to different houses, while the columns may carry different attributes. Consequently, although the default orientation of a single vector is a column vector, in a matrix that represents a tabular dataset it is more convenient to treat each data example as a row vector.

4. Tensors

Just as vectors generalize scalars, and matrices generalize vectors, we can build data structures with even more axes. When our data cannot fit within two dimensions, tensors act as a savior and accommodate more dimensions in our data set. Simply put, tensors can be compared to n-d, or n-dimensional, arrays, where if:

n is 0, it forms a scalar.

n is 1, it forms a vector.

n is 2, it forms a matrix.

n ≥ 3, it is simply referred to as a tensor.

TensorFlow itself is named after the tensor and involves tensors in its every computation.

Also, a variety of operations can be performed on tensors, including those discussed in the previous three sections. Certain element-wise operations can be performed only if the two tensors have the same shape, just as with the addition of matrices. A few such operations are shown in the code below, with output-based explanations in the comments.
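A minimal sketch of these tensor operations with MXNet's np module (the shape (2, 3, 4) is an arbitrary example):

```python
from mxnet import np, npx

npx.set_np()

X = np.arange(24).reshape(2, 3, 4)   # a 3-D tensor: 2 matrices of shape 3x4
Y = np.ones((2, 3, 4))               # a tensor of ones with the same shape

print(X.shape)    # (2, 3, 4)
print(X + Y)      # element-wise addition requires identical shapes
print(X * 2)      # multiplying by a scalar leaves the shape unchanged
print(X.sum())    # sum over all 24 elements -> 276.0
```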

Summary

The above material can be summarized as follows: a vector is formed from scalars, a matrix is formed from a composition of vectors, and tensors are finally represented as n-d arrays. Overall, every mathematical object has its own significance and is used for different purposes in machine learning, but it is important to gain hands-on experience of working with these algebraic structures so as to know when to use what.
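As a small recap, the following sketch (again with MXNet's np module) prints the number of axes of each object discussed above:

```python
from mxnet import np, npx

npx.set_np()

print(np.array(3.0).ndim)                   # 0 axes -> scalar
print(np.arange(4).ndim)                    # 1 axis -> vector
print(np.arange(20).reshape(5, 4).ndim)     # 2 axes -> matrix
print(np.arange(24).reshape(2, 3, 4).ndim)  # 3 axes -> tensor
```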

Hope you had a good read here!!

I will be happy to connect with you on LinkedIn and discuss more such topics and research work in machine learning!
