Math for Machine Learning - Part 1

Swaathi Sundaramurugan
Published in Analytics Vidhya · 8 min read · Oct 15, 2021

Many of you may have always wanted to learn ML, perhaps even to shift careers. But the fear of venturing into the underlying math may have kept you from exploring the subject. Coming from a programming background, I was afraid to step into ML because of the math. But once I actually started to understand the concepts, I was able to explore more content without fear.

This series is a part of my #30DaysOfData learning journey, where I have committed to learning about Data Analysis and Machine Learning. Do check out my GitHub repository to learn together.

All the best on your learning journey! Let’s get started!

Scalars

If you are from a physics background, you would say that scalars are values with magnitude but no direction. To put it simply, scalars are just numbers on which basic arithmetic operations can be performed.

Example: 2, 0, -5, 1.7 and so on

Vectors

In physics, vectors are quantities that have both magnitude and direction. As programmers, we would call a vector a list of numerical values.

In math, we consider vectors to be coordinates on a graph. That is, a vector such as a = (2, 1) is considered to lie on the XY plane, where 2 is the x-coordinate and 1 is the y-coordinate. We can also consider a vector to be a column matrix (i.e., a matrix with a single column, [n x 1]).

So, when you come across a vector, picture it as an arrow in the XY plane (or a higher-dimensional space, depending on the number of elements in the vector) rooted at the origin.

An object must satisfy two properties to be considered a vector:

  • A vector can be added to another vector to produce a resultant vector
  • A vector can be multiplied by a scalar to produce another vector as the result

When a vector is multiplied by a scalar (a number), the vector either increases in length (expands) or decreases in length (shrinks). The direction of the vector is not affected when multiplied by a scalar.
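Scaling a vector, as described above, can be sketched in a couple of lines of NumPy (the vector (2, 1) here is just an illustrative choice):

```python
import numpy as np

v = np.array([2, 1])   # a vector in the XY plane
print(3 * v)           # expands the vector: [6 3]
print(0.5 * v)         # shrinks the vector: [1.  0.5]
```

Note that both results still point in the same direction as v; only the length changes.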

Now let’s learn more about matrices before diving deep into the vectors.

Matrix

A matrix is a collection of values that are arranged in rows and columns.

A lot of real-world problems can be represented as matrices and solved by applying basic operations on them. For instance, let's say we have an image and we need to process it to make it look different (maybe change the tone of the image). We can represent the image as a matrix (a grid of rows and columns containing pixel values) and then perform some operations on it to change the tone.

To put it in other words, matrices are collections of vectors. You can picture each column of the matrix as a vector.

A matrix's dimension is represented as m * n, where m is the number of rows and n is the number of columns. The total number of elements in a matrix is the product of m and n.
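In NumPy, the dimension and element count described above are exposed directly as attributes:

```python
import numpy as np

# A 2 x 3 matrix: m = 2 rows, n = 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape)   # (2, 3)
print(A.size)    # 6, i.e. m * n elements in total
```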

Let’s peek into a dataset and see how it is represented as a matrix for further analysis and calculations.

This is a small dataset (a table) containing some properties of three persons.

First, we drop the column names and any columns that are not numerical. Some non-numerical columns (here, Wears Glasses) can be converted to numerical values (more on that later). The remaining values can then be arranged into a matrix.

This matrix is then processed and analyzed to learn more about the dataset and answer several problem statements.
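A minimal sketch of this conversion, using hypothetical column names and values (the original table is not shown here): the yes/no column is encoded as 1/0 and the numeric columns are kept as-is.

```python
import numpy as np

# Hypothetical dataset: three persons with Height (cm), Weight (kg), Wears Glasses
people = [
    {"height": 170, "weight": 65, "wears_glasses": "yes"},
    {"height": 158, "weight": 52, "wears_glasses": "no"},
    {"height": 181, "weight": 77, "wears_glasses": "yes"},
]

# Encode the non-numerical column as 1/0 and drop the names/labels
X = np.array([[p["height"], p["weight"], 1 if p["wears_glasses"] == "yes" else 0]
              for p in people])
print(X.shape)   # (3, 3): one row per person, one column per property
```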

Matrix Multiplication

When two matrices are multiplied, each element of the resultant matrix is the sum of the products of the elements from a row of the first matrix and the corresponding elements from a column of the second matrix.

For two matrices to be multiplied with each other,

the number of columns of the first matrix = the number of rows of the second matrix

To check whether two matrices A and B can be multiplied, take the number of columns in A (here, 2) and the number of rows in B (here, 2). Since they are equal, matrices A and B can be multiplied.

The new resultant matrix would have a dimension of

(number of rows in A * number of columns in B)

For more details on matrix multiplication steps by hand, check this link.

As programmers, we can write a single line of code to perform this action in a program, but knowing how to solve the problem by hand is still beneficial.

In Python, we can use a library called NumPy to perform matrix multiplication easily.

matrix_c = numpy.matmul(matrix_a, matrix_b)

Do check out this link to learn more about the syntax used.
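Putting the pieces together, here is a small self-contained sketch (the matrices are illustrative): the inner dimensions must match, and the result has the outer dimensions.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3 x 2
B = np.array([[7, 8, 9],
              [10, 11, 12]])  # 2 x 3

# Columns of A (2) == rows of B (2), so the product is defined
C = np.matmul(A, B)
print(C.shape)   # (3, 3): rows of A x columns of B
print(C[0, 0])   # 27, i.e. 1*7 + 2*10
```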

Inverse of matrix

Only square matrices (matrices that have the same number of rows and columns) can have an inverse, and even then not all of them do.

A square matrix that has no inverse (its determinant is 0) is called a singular or degenerate matrix. Non-square matrices never have an inverse.

For calculating the result by hand, check this link.

In Python, we can use the NumPy library to calculate the inverse of a matrix

import numpy as np
matrix_a = np.array([[1, 2], [3, 4]])
inverse_a = np.linalg.inv(matrix_a)
print(inverse_a)
# Answer is
# [[-2.   1. ]
#  [ 1.5 -0.5]]

Transposition of matrix

Interchanging the rows and columns of a matrix gives the transpose of the matrix.
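In NumPy, the transpose is available as the `.T` attribute (the matrix below is illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3
print(A.T)                  # 3 x 2: rows and columns interchanged
# [[1 4]
#  [2 5]
#  [3 6]]
```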

Inner Products

We just saw how to multiply two matrices, now let’s see how to multiply two vectors.

The elements at the same positions of the two vectors are multiplied, and all the products are summed up.

The inner product of two vectors u and v is written between angle brackets: <u, v>.

We know that a matrix is a collection of vectors. Hence, in matrix form, we can obtain the dot product (inner product) of two vectors by multiplying the transpose of one with the other: <u, v> = uᵀv.

The inner product of two vectors always produces a scalar value (a number)
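A quick sketch of this with NumPy (the vectors are illustrative): both `np.dot` and the `@` operator return a single number, not a vector.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))   # 1*4 + 2*5 + 3*6 = 32, a scalar
print(u @ v)          # 32, the same result with the @ operator
```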

In geometrical form, the inner product of two vectors can be represented as <u, v> = |u||v| cos θ.

The angle θ is the angle between the two given vectors. This angle can be used to find the correlation between the vectors, i.e., whether the vectors are dependent on each other or independent.

The angle between the vectors has the following implications:

If angle = 90 ,

The resultant inner product = 0 and the correlation between the vectors is 0

Hence, the vectors are independent of each other.

If angle = 0,

The resultant inner product = |u||v| and the correlation between the vectors is 1

Hence, the vectors are totally dependent on each other.

If the angle is between 0 and 90,

The resultant inner product lies between 0 and |u||v|, the product of the lengths of the two vectors

Hence, the vectors are partially dependent on each other.
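The angle itself can be recovered from the geometric formula above by dividing the inner product by the vector lengths (the vectors below are illustrative; they lie on the x- and y-axes, so the angle comes out as 90 degrees):

```python
import numpy as np

u = np.array([1, 0])
v = np.array([0, 1])

# cos(theta) = <u, v> / (|u| |v|)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos_theta))
print(angle)   # close to 90: inner product is 0, vectors are independent
```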

You might be wondering what is meant by a vector being independent or dependent on another vector. To understand that, first, we must learn about the basis and span of vectors.

Basis vectors

A basis is a set of linearly independent vectors that can represent every vector in the space through scaling and addition. The most common basis vectors are the unit vectors (vectors of magnitude 1 from the origin) that lie exactly on the x- and y-axes of the XY plane, e.g., e1 = (1, 0) and e2 = (0, 1).

So, we can denote any vector as a linear combination of basis vectors.

A linear combination is an expression that combines scalar multiplications and additions of vectors.

Here is an example of a linear combination: w = 2u + 3v.

This equation adds together two scalar multiples of vectors.
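A minimal sketch of such a combination, using the standard unit vectors of the XY plane as the basis (the scalars 2 and 3 are arbitrary choices):

```python
import numpy as np

e1 = np.array([1, 0])   # unit vector along the x-axis
e2 = np.array([0, 1])   # unit vector along the y-axis

w = 2 * e1 + 3 * e2     # a linear combination of the basis vectors
print(w)                # [2 3]
```

Notice that the coefficients of the combination are exactly the coordinates of the resulting vector, which is why any vector can be written this way.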

Span of Vectors

The span of vectors is the set of all possible vectors that can be formed when the given vectors are summed and multiplied with scalars. It is basically the set of all their linear combinations.

The span of most vectors ends up being the whole plane (i.e., every point of the plane) if the given vectors are not in the same direction. Whereas, if the given vectors are in the same direction, then their span is a straight line (in the case of two-dimensional planes).

For a visual representation of the basis and span of vectors, do check out this video.

Linear Dependency

If a vector lies in the span of another vector(s), then the vector is said to be linearly dependent. In other words, if a vector can be expressed as a linear combination of other vectors, then it’s said to be dependent.

If a vector is not in the span of the other vector(s), then it is said to be linearly independent. In other words, it cannot be expressed as a linear combination of the other vector(s). When two vectors are independent of each other, there is no correlation between them.
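One way to check dependence numerically is the rank of the matrix whose columns are the given vectors; this uses NumPy's `matrix_rank`, and the vectors below are illustrative:

```python
import numpy as np

# Columns (1, 2) and (2, 4) point in the same direction -> dependent
dep = np.column_stack([[1, 2], [2, 4]])
print(np.linalg.matrix_rank(dep))   # 1: the columns are linearly dependent

# Columns (1, 0) and (0, 1) span the whole plane -> independent
ind = np.column_stack([[1, 0], [0, 1]])
print(np.linalg.matrix_rank(ind))   # 2: the columns are linearly independent
```

If the rank equals the number of columns, the vectors are independent; a smaller rank means at least one vector lies in the span of the others.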
