Basic Linear Algebra for Machine Learning

Manuel Leiria
May 14, 2020


This is a brief recap of the most basic matrix operations used in machine learning algorithms with some code examples in Python NumPy.

Let’s consider the following system of two equations with two unknown variables:

4x₁ - 5x₂ = -13
-2x₁ + 3x₂ = 9

We can write this system in its matrix form:

Ax = b

where

A = [[4, -5], [-2, 3]], x = (x₁, x₂)^T and b = (-13, 9)^T

Using NumPy arrays we can write the matrix A and the vector b:

import numpy as np

A = np.array([[4.0, -5.0], [-2.0, 3.0]])
b = np.array([-13., 9.])

# Let's check the dimension and data type of matrix A:
print(A.shape)
print(A.dtype.name)

(2, 2)

float64

So, matrix A has a dimension of 2 x 2. The first index represents the number of rows of the matrix and the second index represents the number of columns. By default, NumPy arrays hold 64-bit floating-point numbers, but we can specify the type if we want. For instance:

a = np.zeros([3, 4], dtype=int)
print(a.dtype)

int64
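
Since we already have A and b as NumPy arrays, we can also solve the original system Ax = b directly. This is a minimal sketch using np.linalg.solve, a standard NumPy routine not covered above:

# Solve Ax = b for the unknown vector x
x = np.linalg.solve(A, b)
print(x)  # [3. 5.], i.e. x1 = 3 and x2 = 5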

Matrix Operations

The product of two matrices
𝐀 ∈ ℝ^(m×n) and 𝐁 ∈ ℝ^(n×p) is the matrix 𝐂 = 𝐀𝐁 ∈ ℝ^(m×p), where:

C_ij = Σ_k A_ik B_kj, with k running from 1 to n

And here the big rule is: we can only multiply two matrices if the number of columns of the first matrix is equal to the number of rows of the second matrix. Always keep this in mind when solving those classical linear regression problems y = wx + b, as the shape check sketched below shows.
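
To make the rule concrete, here is a minimal sketch of the shapes involved in a linear regression step (the names X, w and b here are illustrative, not from the text above):

# X holds m samples with n features each: shape (m, n)
X = np.random.rand(10, 3)
# w must have n rows so the product is defined: (10, 3) x (3, 1) -> (10, 1)
w = np.random.rand(3, 1)
b = 1.5
y = np.dot(X, w) + b  # columns of X (3) match rows of w (3)
print(y.shape)  # (10, 1)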

Let’s implement this multiplication in Python:

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])
a_rows, a_cols = a.shape
b_rows, b_cols = b.shape
c = np.zeros([a_rows, b_cols])
for i in range(a_rows):
    for j in range(b_cols):
        for k in range(a_cols):
            c[i, j] += a[i, k] * b[k, j]

print(c)

[[ 27. 30. 33.]
[ 61. 68. 75.]
[ 95. 106. 117.]]

In practice we don’t use this code because those nested for loops are computationally expensive and NumPy has native support for matrix multiplication. This is how we do it:

d = np.dot(a,b)
print(d)

[[ 27 30 33]
[ 61 68 75]
[ 95 106 117]]

Just to get an idea of the performance difference:

import time

size = 500
m1 = np.random.rand(size, size)
m2 = np.random.rand(size, size)
m1_rows, m1_cols = m1.shape
m2_rows, m2_cols = m2.shape

def slow(m1, m2):
    c = np.zeros([m1_rows, m2_cols])
    for i in range(m1_rows):
        for j in range(m2_cols):
            for k in range(m1_cols):
                c[i, j] += m1[i, k] * m2[k, j]
    return c

tic = time.process_time()
resSlow = slow(m1, m2)
toc = time.process_time()
print(toc - tic)

tic = time.process_time()
resFast = np.dot(m1, m2)
toc = time.process_time()
print(toc - tic)

# Just to check we're getting the same result
print(resSlow[0, 0])
print(resFast[0, 0])

54.270800048999945
0.012746195999966403
117.45054333256053
117.45054333256049

It’s the difference between running a simulation in half an hour or half a year.

Note: in NumPy, dot performs matrix multiplication. The symbol * performs element-wise multiplication.
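
A quick illustration of the difference (a minimal sketch with made-up values):

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b))  # matrix product: [[19 22] [43 50]]
print(a * b)         # element-wise product: [[ 5 12] [21 32]]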

Some properties of matrix multiplication:

  • Associative: (𝐀𝐁)𝐂=𝐀(𝐁𝐂)
  • Distributive: 𝐀(𝐁+𝐂)=𝐀𝐁+𝐀𝐂
  • (In general) non-commutative: 𝐀𝐁 ≠ 𝐁𝐀 (see the sketch after this list)
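
A small example of non-commutativity (values are illustrative):

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(np.dot(A, B))  # [[2 1] [4 3]]
print(np.dot(B, A))  # [[3 4] [1 2]]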

Given two vectors 𝐱, 𝐲 ∈ ℝ^n, the product 𝐱^𝑇𝐲 is called the dot product:

x^T y = Σ_i x_i y_i = x₁y₁ + x₂y₂ + … + x_n y_n

Note: if we look at a vector as a matrix with only one column, then to comply with the multiplication rule stated above we must transpose (more about this below) the first vector.

# Here we declare two vectors so there's no need to transpose
a = np.array([1,2])
b = np.array([2,4])
print(np.dot(a,b))
# Here we declare the two vectors as one-column matrices, so we must transpose the first vector
# Number of columns of the first matrix must be equal to the number of rows of the second matrix
aa = np.array([[1], [2]])
bb = np.array([[2], [4]])
print(np.dot(aa.T,bb))

10
[[10]]

because if we don’t:

# Here we declare the two vectors as one-column matrices, so we must transpose the first vector
# Number of columns of the first matrix must be equal to the number of rows of the second matrix
aa = np.array([[1], [2]])
bb = np.array([[2], [4]])
print(np.dot(aa,bb))

ValueError                                Traceback (most recent call last)
<ipython-input-10-5187dbcd391f> in <module>
      8 aa = np.array([[1], [2]])
      9 bb = np.array([[2], [4]])
---> 10 print(np.dot(aa,bb))

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (2,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)

Outer product:

Given two vectors 𝐱 ∈ ℝ^m and 𝐲 ∈ ℝ^n, we define the outer product 𝐱𝐲^𝑇 ∈ ℝ^(m×n) as a matrix with entries given by:

(xy^T)_ij = x_i y_j
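
NumPy computes this with np.outer. A minimal sketch with made-up values:

x = np.array([1, 2, 3])
y = np.array([4, 5])
print(np.outer(x, y))
# [[ 4  5]
#  [ 8 10]
#  [12 15]]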

Identity Matrix:

𝐈 ∈ ℝ^(n×n) is a square matrix filled with ones where i = j and zeros elsewhere.

This type of matrix has an interesting property: for any matrix 𝐀 ∈ ℝ^(n×n), 𝐀𝐈 = 𝐀 = 𝐈𝐀.

In NumPy, this matrix is given by the eye function:

I = np.eye(2, 2)
x = np.array([[1, 2], [3, 4]])
print('Identity matrix\n', I)
print('Some square matrix\n', x)
print('Dot product\n', np.dot(I, x))

Identity matrix
[[1. 0.]
[0. 1.]]
Some square matrix
[[1 2]
[3 4]]
Dot product
[[1. 2.]
[3. 4.]]

Transpose Matrix:

Simply stated: swap the rows and columns of the matrix. More formally, given a matrix 𝐀 ∈ ℝ^(m×n), its transpose 𝐀^𝑇 ∈ ℝ^(n×m) is the n x m matrix whose entries are given by (A^T)_ij = A_ji. We have the following properties:

  • (𝐀^𝑇)^𝑇 = 𝐀
  • (𝐀𝐁)^𝑇 = 𝐁^𝑇𝐀^𝑇
  • (𝐀 + 𝐁)^𝑇 = 𝐀^𝑇 + 𝐁^𝑇

A = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])
print('Matrix A\n', A)
print('Matrix transpose of A\n', A.T)

Matrix A
[[1 2 3]
[4 5 6]
[7 8 9]]
Matrix transpose of A
[[1 4 7]
[2 5 8]
[3 6 9]]
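
We can check the second property above, (𝐀𝐁)^𝑇 = 𝐁^𝑇𝐀^𝑇, with a quick sketch (values are illustrative):

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.array_equal(np.dot(A, B).T, np.dot(B.T, A.T)))  # True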

Symmetric Matrix:

We say that a square matrix 𝐀 is symmetric if 𝐀 = 𝐀^𝑇. For example:
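
A minimal sketch (values are illustrative): a matrix equal to its own transpose, plus a common way to build a symmetric matrix from any square matrix:

A = np.array([[1, 2], [2, 1]])
print(np.array_equal(A, A.T))  # True: A is symmetric
# The sum of any square matrix and its transpose is symmetric
B = np.array([[1, 2], [3, 4]])
S = B + B.T
print(np.array_equal(S, S.T))  # True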
