Intuitive Linear Algebra Cheatsheet
This post reviews basic linear algebra for introductory machine learning.
Note: this post will evolve over time. Constructive comments are welcome.
Linear algebra is a branch of mathematics that provides a way of compactly representing and operating on sets of linear equations. For example, consider the following set of equations:
In linear algebra notation, it can be written as:
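For instance, take a hypothetical pair of equations and its matrix form (the numbers here are made up for illustration):

```latex
% A small system of linear equations and the same system as A x = b.
\begin{aligned}
4x_1 - 5x_2 &= -13 \\
-2x_1 + 3x_2 &= 9
\end{aligned}
\quad\Longleftrightarrow\quad
\begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} -13 \\ 9 \end{bmatrix}
```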
These concepts have geometric meanings. Building an intuition for them helps practitioners get a feel for a problem and come up with appropriate solutions, instead of just following a set of arithmetic steps.
Linear Algebra Basics
The story begins with vectors. We, as computer science people, look at them as lists of values. For instance, a vector can hold the values of different features of a house: its area, its distance from the city center, whether or not it is on one of the main streets, its price, and so on. A vector is drawn as an arrow from the origin to the point determined by its elements. The following figure shows the vector [3, 3] in 2D space. We consider all vectors to start from the origin [0, …, 0].
The following example shows another vector, [-1.01, 5.51, 1.67], in 3D space. Red, green, and blue show the x, y, and z axes, respectively.
Vector Addition and Product Operations
To add two vectors, they have to be of the same dimension, that is, have the same number of elements; they must live in the same space. To add them, we simply add their corresponding elements together. See the following examples:
v1 = [1, 3, 5]
v2 = [-2, 4, 1]
v1 + v2 = [-1, 7, 6]
v1 = [3, 4]
v2 = [4, 1]
v1 + v2 = [7, 5]
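The element-wise additions above can be checked with a few lines of Python (assuming NumPy is available):

```python
import numpy as np

# Both pairs have matching dimensions, so element-wise addition works.
v1 = np.array([1, 3, 5])
v2 = np.array([-2, 4, 1])
print(v1 + v2)  # [-1  7  6]

u1 = np.array([3, 4])
u2 = np.array([4, 1])
print(u1 + u2)  # [7 5]
```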
From the geometric point of view, adding two vectors means placing the tail of the second vector at the head of the first; the sum is the arrow from the origin to the new endpoint.
Subtraction works the same way, except that the second vector is negated first (all of its elements change sign).
The length or magnitude of a vector is a number showing how far the vector stretches in the space it resides in (the distance from the end of the vector to the origin). It is calculated as follows:
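In symbols, the magnitude of v = [v1, …, vn] is the square root of v1² + … + vn². A minimal sketch:

```python
import math

def magnitude(v):
    # Square each element, sum them, then take the square root.
    return math.sqrt(sum(x * x for x in v))

print(magnitude([3, 4]))  # 5.0
```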
In the dot product (scalar product), the corresponding elements are multiplied together and summed up; the final result is a single value. Geometrically, it is the product of the lengths of the two vectors and the cosine of the angle between them. Since the cosine of 90 degrees is zero, the dot product comes in handy for finding orthogonal (perpendicular) vectors. The following figure demonstrates the dot product.
To find the angle between two vectors, we can rearrange the dot product formula:
The dot product of two vectors can also be interpreted as the size of one vector in the direction of the other. This is important for understanding projection, which we will see later in this article; it is also why we divide the dot product by the length of the vector we project onto.
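A small sketch of the dot product and the angle formula (plain Python, no libraries):

```python
import math

def dot(u, v):
    # Multiply corresponding elements and sum the results.
    return sum(a * b for a, b in zip(u, v))

def angle_between(u, v):
    # cos(theta) = (u . v) / (|u| * |v|)
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.acos(cos_theta)

print(dot([1, 0], [0, 1]))  # 0, so the vectors are orthogonal
print(math.degrees(angle_between([1, 0], [0, 1])))  # 90 degrees
```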
Linear Combination
A linear combination is an expression constructed from a set of terms by multiplying each term by a constant (a coefficient) and adding the results. The following figure demonstrates the concept:
Basis Vectors
Basis vectors are unit vectors (their length is 1) such that every vector in their space can be built as a linear combination of them. In 2D space, the vectors i = [1, 0] and j = [0, 1] can build any vector. In 3D space, i = [1, 0, 0], j = [0, 1, 0], and k = [0, 0, 1] can build any vector in that space. The following figure shows the basis vectors of 3D space and how a vector can be made as a linear combination of them.
For example, in 3D space consider the vector v1 = [5, -1, 0.5], which can be built as follows:
v1 = [5, -1, 0.5] = 5 * i + -1 * j + 0.5 * k
= 5 * [1, 0, 0] + -1 * [0, 1, 0] + 0.5 * [0, 0, 1]
= [5, 0, 0] + [0, -1, 0] + [0, 0, 0.5] = [5, -1, 0.5]
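The same arithmetic, written as a quick check in Python:

```python
# Rebuild v1 = [5, -1, 0.5] from the 3D basis vectors.
i = [1, 0, 0]
j = [0, 1, 0]
k = [0, 0, 1]

v1 = [5 * a + -1 * b + 0.5 * c for a, b, c in zip(i, j, k)]
print(v1)  # [5.0, -1.0, 0.5]
```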
Linear Independence
Two vectors are called linearly independent when they are not on the same line. In other words, their linear combination equals zero only when all the coefficients are zero. This is needed when we want to build a space like our 3D space, which can be built with i, j, and k because they are linearly independent. Consider it this way: suppose we have the following vectors in 2D space:
v1 = [1, 2]
v2 = [2, 4]
Every combination of these vectors ends up on the line y = 2x; these vectors can only build a line. But the following two can build a plane, because they are linearly independent.
v1 = [1, 2]
v2 = [-2, 1]
Let’s check how:
a*v1 + b*v2 = 0
[a, 2a] + [-2b, b] = [a-2b, 2a+b] = [0, 0]
a-2b = 0 =>a = 2b (1)
2a + b = 0 => b = -2a => a = -(1/2)b (2)
From (1) and (2), a = b = 0, so the only combination that gives the zero vector is the trivial one, and v1 and v2 are linearly independent.
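A quick way to test this in code: two 2D vectors are linearly independent exactly when the determinant of the matrix formed by them is non-zero (the helper name `independent_2d` is made up here):

```python
def independent_2d(v1, v2):
    # Two 2D vectors are dependent exactly when one is a multiple of
    # the other, i.e. when v1[0]*v2[1] - v1[1]*v2[0] == 0.
    return v1[0] * v2[1] - v1[1] * v2[0] != 0

print(independent_2d([1, 2], [2, 4]))   # False: [2, 4] is 2 * [1, 2]
print(independent_2d([1, 2], [-2, 1]))  # True
```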
Linear Transformations
Linear transformations are transformations that keep the linear structure of the space. One good way to look at them: any two points that were on a line before the transformation will still be on a line after it, the origin stays fixed, and evenly spaced points stay evenly spaced. The following figure gives a better idea:
Matrices
Intuitively, matrices can be seen as transformers that change the basis vectors of a space. A matrix is a collection of column vectors, each of which can be read as the new corresponding basis vector. The following figure shows a 3x3 matrix:
Matrix-Vector Multiplication
Matrix-vector multiplication maps a vector (or vectors) to its corresponding vector in the space the matrix transforms to. Note that the dimension of the vector must match the dimensions of the matrix (the transformer). The following figure shows how the calculation is done:
Matrix-Matrix Multiplication
Each column of the first matrix defines a basis vector of the new coordinate system. Each column of the second matrix is a vector, and each column of the result matrix is that vector transformed into the new coordinate system. The following figure shows the intuition of the transformation that happens in matrix multiplication.
The most important property of matrix multiplication is that it is not commutative in general:
A*B != B*A
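A quick demonstration with NumPy (the specific matrices are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# @ is NumPy's matrix-multiplication operator.
print(A @ B)  # [[2 1], [4 3]]
print(B @ A)  # [[3 4], [1 2]]
print(np.array_equal(A @ B, B @ A))  # False
```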
Transpose of a Matrix
Transposing flips a matrix over its main diagonal, changing rows into columns and vice versa. The following example shows how it operates.
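For example, with NumPy's `.T` attribute:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

# Rows become columns: the 2x3 matrix becomes 3x2.
print(M.T)
# [[1 4]
#  [2 5]
#  [3 6]]
```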
The Determinant of a Matrix
We know that the intuition behind a matrix is a transformation, and a transformation changes shapes. The determinant of a matrix is a single value: the factor by which areas (in 2D), volumes (in 3D), or hypervolumes (in nD) are scaled when the transformation is applied. Remember that in the base coordinate system with basis vectors i = [1, 0] and j = [0, 1], the area of the unit square built on i and j is one. The following figure shows the intuition in a pretty way. A determinant of one means areas do not change; between zero and one means squishing; greater than one means stretching. A negative determinant means the transformation flips the orientation of the space.
When the determinant is zero, the transformation squashes all vectors onto a line (or even a point); look at the following example. A matrix whose determinant equals zero is called a “singular matrix.”
We can also look at it from a different angle: when the determinant of a matrix is non-zero, the column vectors of that matrix are linearly independent.
For a 2x2 matrix, the determinant is calculated as follows:
The determinant of a 3x3 matrix:
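These formulas can be sketched directly in Python (the helper names `det2` and `det3` are made up here):

```python
def det2(m):
    # [[a, b], [c, d]] -> a*d - b*c
    (a, b), (c, d) = m
    return a * d - b * c

def det3(m):
    # Cofactor expansion along the first row.
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * (e * i - f * h)
            - b * (d * i - f * g)
            + c * (d * h - e * g))

print(det2([[3, 1], [1, 2]]))                   # 5
print(det3([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))  # 24
```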
The Inverse of a Matrix
The first thing we reviewed about linear algebra was solving a system of linear equations, and that is where the inverse of a matrix is usually used. To see why we need it, consider a matrix A:
A * A^-1 = I
The result of multiplying a matrix by its inverse is the identity matrix, and multiplying any vector by the identity matrix gives back the vector itself.
Consider the following system of linear equations:
It can be stated as:
To solve the equation, the inverse of the matrix A is required. For 2x2 and 3x3 matrices, it is calculated as follows:
Anyway, these formulas are just for reference; in practice, computers calculate them for us. Still, knowing what happens under the hood is insightful.
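For example, NumPy can compute the inverse for us; the system below is a made-up illustration:

```python
import numpy as np

# Hypothetical system: 4x - 5y = -13 and -2x + 3y = 9, i.e. A x = b.
A = np.array([[4.0, -5.0],
              [-2.0, 3.0]])
b = np.array([-13.0, 9.0])

x = np.linalg.inv(A) @ b  # x = A^-1 b
print(x)                  # [3. 5.]
```

In practice, `np.linalg.solve(A, b)` is preferred over forming the inverse explicitly, since it is faster and more numerically stable.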
Projection
Projection leans on the dot product we mentioned earlier. We saw that its value can be interpreted as the size of one vector in the direction of the other. Here, we want to project (mirror) one vector onto another, so the vector being projected onto should act as a unit vector; that is why its magnitude is used for division. The following figure shows the intuition.
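A minimal sketch of projecting one vector onto another, following the intuition above (the helper name `project` is made up here):

```python
def project(u, onto):
    # proj_onto(u) = ((u . onto) / |onto|^2) * onto
    # Dividing by |onto|^2 both normalizes `onto` and turns the dot
    # product into the signed length of u's shadow along it.
    dot = sum(a * b for a, b in zip(u, onto))
    norm_sq = sum(b * b for b in onto)
    return [dot / norm_sq * b for b in onto]

print(project([2, 3], [1, 0]))  # [2.0, 0.0]: the shadow of [2, 3] on the x axis
```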
Eigenvectors and Eigenvalues
The following equation shows how the eigenvectors and eigenvalues of a matrix can be found:
The intuition is that for a matrix A, we want to find the vectors x that, when transformed by the matrix, are only scaled by some value: they stay on their own line.
As the figure shows, this is important because under the transformation the eigenvectors keep their direction. It matters in data science when it comes to feature reduction with Principal Component Analysis (PCA): by computing the eigenvectors of the data's covariance matrix, the most important directions can be selected, and directions with a negligible effect can be set aside for the sake of doing less computation. The intuition, as in the following figure, is that the feature reduction keeps most of the variance of the data, which is what we use for learning. So, instead of working in 3 dimensions, we can work in 2D.
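With NumPy, eigenvalues and eigenvectors come from `np.linalg.eig`; the diagonal matrix here is a toy example:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# eig returns the eigenvalues and unit eigenvectors (as columns).
values, vectors = np.linalg.eig(A)
print(values)  # [2. 3.]

# Each eigenvector is only scaled by the matrix: A v = lambda v.
for lam, v in zip(values, vectors.T):
    assert np.allclose(A @ v, lam * v)
```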
Homogeneous Coordinates
In homogeneous coordinates, an extra coordinate is added to ease the representation of useful transformations. This is very useful in computer graphics, and the same technique is used in data augmentation for image datasets. For example, by appending a 1 to 2D vectors, the transformation matrices can be written as follows:
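As one sketch, a 2D translation (which is not a linear map in plain 2D coordinates) becomes a single matrix multiplication in homogeneous coordinates:

```python
import numpy as np

# A 2D translation by (4, -2) written as a 3x3 matrix acting on
# homogeneous coordinates [x, y, 1].
T = np.array([[1.0, 0.0, 4.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])

p = np.array([3.0, 3.0, 1.0])  # the point [3, 3] with a 1 appended
print(T @ p)  # [7. 1. 1.] -> the translated point is [7, 1]
```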
Data augmentation means increasing the amount of data we have. Without enough data, we run into the overfitting problem, in which the model memorizes the dataset instead of learning the patterns in it.
In this post, we reviewed the intuitive linear algebra required for understanding introductory machine learning. Before starting to learn machine learning, we should have a good understanding of linear algebra.