## ScPyT

# Linear Algebra in Python

## A thorough Linear Algebra Bootcamp as a Machine learning Practitioner

As a data scientist or machine learning practitioner, how good is your linear algebra?

No matter you have a positive or negative answer to this question, hopefully, after reading this post and practicing a bit, you will be able to grasp most of the Linear Algebra you need to know for your day-to-day work.

Well… I know it may sound a bit exaggerating, how is that possible? The reason is, we do not need to know or actually calculate everything by hand. Off-the-shelf packages like Python Numpy or Matlab have already done a lot of hard work for us underneath the hood. But, in order to correctly utilize these tools at hand, we still need to have a comprehensive understanding of Linear Algebra itself, and this is exactly what I am going to tell you in this article.

The code will be posted on my Github repository: https://github.com/frankligy/ScPyT

# Concepts you need to know in Linear Algebra

It’s key to speak Linear Algebra language, which allows you to better communicate with Linear Algebra experts and understand more advanced concepts. **I always found out that the most difficult part for every beginner to delve into linear algebra world is its somewhat counterintuitive terminologies**. So I decided to clearly explain everything from here. The concepts that I will cover based on my own learning notes from Prof. Gilbert Strang MIT Open CourseWare and Deep learning book written by Dr. Ian Goodfellow et al, but I add my own understanding and distill them into a coherent way.

I will provide a brief explanation for all the concepts and optionally I will introduce related Numpy function so you can play with them while reading, but I would also like to refer you to Wikipedia whenever you want to know more about them.

Let’s get started!

Vector space, subspace, span, column space, row space, null space, left-null space, rank, basis, orthogonal matrix, symmetric matrix

Imagine we have a matrix A, in python, A.T means transpose of A, @ means matrix multiplication.

a = np.random.randn(2,3)

a

# array([[ 0.39, 0.54, -0.71],

[-1.84, -0.6 , 0.53]])

a.T

# array([[ 0.39, -1.84],

[ 0.54, -0.6 ],

[-0.71, 0.53]])b = np.random.randn(3,2)

b

# array([[ 1., -0.],

[ 1., -0.],

[-3., 1.]])a @ b

# array([[-1.09762048, 0.96911979],

[-1.97859822, -3.15604207]])

What is a span? The ** span** of a set of vectors is all linear combinations of these vectors. Think about vector (0,1) and (1,0), a span of these two vectors would be the whole x-y plane.

Then thinking about a x-y-z 3D space (** Vector space** R3), it is composed of

**— (1,0,0), (0,1,0), and (0,0,1), since every element in Vector space can be written as a linear combination of the elements in the basis.**

*basis*Another concept is subspace, Span of Vector (1,0,0) and (0,1,0) constitutes a ** subspace** of x-y-z 3D vector space R3. (A subset of a larger vector space)

** Column space** of A is the span of all column vectors of A,

**of A**

*row space***is the span of all row vectors of A. If A @ x=0, the span of all solutions x constitutes the**

**of A. If A.T @ x= 0, span of all solutions x constitutes the**

*null space***of A. These four spaces will keep occuring in lots of linear algebra tutorials.**

*left-null space*How many linearly independent column vectors in matrix A? This is ** rank** of A. To better understand linearly independent, for instance (1,2,3) and (10,20,30) are linear dependent, because (10,20,30) is a multiple of (1,2,3)! Then how to compute the rank of a matrix?

`x = np.random.rand(2,3)`

np.linalg.matrix_rank(x)

# 2

# So, x has two linearly independent column vectors

Two vectors’ inner products are 0 and they are all of the unit lengths, then they are ** orthonormal vectors**. If matrix A’s rows and columns are orthonormal vectors, this matrix A is an

**. In math, it means A.T @ A = A @ A.T = I.**

*orthogonal matrix*** Symmetric matrix** means A = A.T, also, symmetric matrix and orthogonal matrix only apply to real value matrix, in a complex matrix, we have the counterparts, namely

**and**

*hermitian matrix***We will cover them in later part**

*unitary matrix*,**.**

Inverse, determinant

You can compute the ** determinant** of any square matrix.

`x = np.random.rand(4,4)`

np.linalg.det(x)

# 0.08437697073565216

Only invertible (determinant != 0) square matrix will have inverse. If A @ B = I, B is A’s ** inverse matrix**.

Gaussian Elimination, row echelon form, reduced echelon form, leading coefficient, pivot, elementary row operation

** Gaussian Elimination** can be used in (a) solving a linear system Ax=b, (b) compute inverse matrix (c) solve rank (d) solve determinant (details please refer to Wikipedia), via a series of

**including (1) swap rows, (2) scale rows, (3) Add one row to another.**

*elementary row operations*There are two other concepts that we should be familiar with,

** Row echelon form**: the first non-zero element from the left in each row (

**,**

*leading coefficient***) is always to the right of the first non-zero element in the row above.**

*pivot*** Reduced row echelon form**: row echelon form whose pivots are1 and column containing pivots are zero elsewhere.

The process of ** Gaussian Elimination** aims to obtain a

**, it is a simplified form of any linear system. Also, the ultimate goal would be getting a**

*row echelon form***, which allow you to solve Ax=b just by looking at the form itself.**

*reduced row echelon form*

Inner product, outer product, hadamard product, Projection, Gram-Schmidt process

First, let’s understand the inner product, outer product, Hadamard product of two 1D arrays:

`a = np.array([4,5,6])`

b = np.array([7,8,9])

np.inner(a,b). # 122

np.outer(a,b). # array([[28, 32, 36],

[35, 40, 45],

[42, 48, 54]])

a * b. # element wise or hadamard product

# array([28, 40, 54])

How to project one vector (a) to another vector (b)?

`a = np.array([4,5,6])`

b = np.array([7,8,9])

proj_b_a = np.inner(a,b) / np.inner(b,b) * b

# array([4.40206186, 5.03092784, 5.65979381])

** The Gram-Schmidt process** is to orthonormalize a set of vectors (v1,v2,v3…vn) to (u1,u2,u3…un) in the same Rn vector space but within which each element is of unit length and are mutually orthogonal. In a sense, it transforms a set of vectors into a set of orthonormal vectors in the same vector space. It is very helpful since, in quite a lot of cases, it is required for your matrix to be orthogonal. Hence, the

**can be your go-to approach when it comes to orthonormalize anything (vector or matrix).**

*Gram-Schmidt process*

LU decomposition, QR decomposition.

** LU decomposition** aims to decompose a matrix (no need to be square) to a lower triangular matrix (L, entries above diagonal are 0) and an upper triangular matrix (U, entries above diagonal are 0). It is related to Gaussian decomposition but has better computational efficiency. In order to make LU decomposition materialize, sometimes we reorder the matrix using a P matrix.

a = np.random.randn(3,4)

p,l,u = scipy.linalg.lu(a)#p

array([[0., 1., 0.],

[1., 0., 0.],

[0., 0., 1.]])#l

array([[ 1. , 0. , 0. ],

[-0.71426919, 1. , 0. ],

[-0.85039495, 0.12237106, 1. ]])#u

array([[-2.09219711, -0.48959089, 0.81707073, 0.77602155],

[ 0. , -1.53448255, -1.72785249, 0.04775144],

[ 0. , 0. , 1.5052947 , -0.27769281]])

QR decompostion aims to decompose a matrix to an orthogonal matrix (Q) and an upper triangular matrix (R). It is used in QR algorithms to solve the linear least square problem. Also, the resultant matrix Q sometimes is exactly what we want to have after the *Gram-Schmidt process.*

a = np.random.randn(3,4)

q,r = np.linalg.qr(a)#q

array([[-0.47569189, 0.71339517, 0.51457221],

[-0.82969021, -0.55818202, 0.00685511],

[ 0.29211536, -0.42367461, 0.85741964]])

#r

array([[-1.34089268, -1.73408897, -0.07436536, 0.78464807],

[ 0. , -1.66272812, 0.63477604, -1.60036506],

[ 0. , 0. , 0.43098896, -0.31316029]])

Eigen-decomposition, diagonalization, Characteristic polynomial.

I will illustrate Eigen-decomposition using a 3*3 square matrix, it will have 3 eigenvectors and cognate 3 eigenvalues.

ei = np.random.randn(4,4)

w,v = np.linalg.eig(ei)# w. eigenvalues

array([-2.21516912+1.65582705j, -2.21516912-1.65582705j,

1.45929568+0.99548974j, 1.45929568-0.99548974j])

# v. eigenvector, 4*4

array([[-0.1070701 -0.40447805j, -0.1070701 +0.40447805j,

-0.03773179+0.56113399j, -0.03773179-0.56113399j],

[ 0.15294619-0.0172865j , 0.15294619+0.0172865j ,

0.65763758+0.j , 0.65763758-0.j ],

[ 0.12433426+0.47215025j, 0.12433426-0.47215025j,

0.3236955 +0.37453337j, 0.3236955 -0.37453337j],

[-0.75023815+0.j , -0.75023815-0.j ,

0.00770123+0.07813095j, 0.00770123-0.07813095j]])

** Diagonalization** has a lot of thing to do with Eigen-Decomposition. Let us denote the original matrix as A, then by doing Eigen-Decompositon (see figure above), A can be decomposed into three part:

**V (eigen-vectors matrix)**times

**D**times

**V.I**

**(inverse of V, think about multiply V.I on both side of the equation)**. Since D is a diagonal matrix. We call this process the diagonalization of original matrix A. Generally, if a matrix D is symmetric (as below), then D it is diagonalizable. Here A is a diagonal matrix.

A square matrix A (n*n) can have a ** characteristic polynomial**, we are trying to compute:

Moore-Penrose pseudo-inverse, hermitian, conjugate transpose, full-rank matrix

We know only an invertible square matrix has an inverse, how about a non-square matrix? we have** pseudo-inverse**.

** Moore-Penrose** defines the

**(A+,n*m) of a rectangular matrix (A, m*n) if A+ meets four particular conditions:**

*pseudo-inverse*(1) AA+A = A

(2) A+AA+ = A+

(3) (AA+)* = AA+

(4) (A+A)* = A+A

Symbol * means ** conjugate transpose**, for a complex square matrix (meaning the entry may be the complex number), the conjugate transpose is to first transpose the matrix and then conjugate (flip the sign of imaginary part but leave real part untouched) every entry of the matrix.

`a = np.random.randn(3,4)`

np.linalg.pinv(a)

#

array([[-0.09988152, 0.23637842, 0.29312192],

[-1.07695892, 0.49075645, -0.36409413],

[ 0.03471007, 0.57286174, -0.13192266],

[ 0.83630477, -0.20498149, 0.30882525]])

A ** hermitian matrix** is a complex square matrix whose conjugate transpose is equal to itself. Its real value counterpart is a

**.**

*symmetric matrix*If A is ** full-rank** (rank=min(m,n)) matrix, A+ can be expressed as :

(1) A+ = (A*A)-1A*. A has independent columns

(2) A+ = A*(AA*)-1. A has independent rows

Singular Value Decomposition (SVD), singular matrix, complex unitary matrix

Only the square matrix has eigenvalues, the rectangular matrix can have ** singular values**, a square matrix whose determinant = 0 will be a

**, or in another word, it is non-invertible.**

*singular matrix*A complex matrix will be ** unitary** if its conjugate transpose happens to be its inverse. U*U = UU* = I. The real value counterpart is the

**.**

*orthogonal matrix*So a rectangular matrix A can be decomposed into three components:

U, S, and Vh, diagonal of S matrix contain ** singular values** for matrix A. This decomposition is called

**. SVD can be really convenient to compute pseudo-inverse, matrix rank. Furthermore, SVD is also the default solution for PCA in quite a lot machine learning package (i.e. scikit-learn). Understanding SVD can be thought of as a milestone during your journey in Linear Algebra.**

*Singular Value Decomposition (SVD)*svd = np.random.rand(4,5)

u,s,vh = np.linalg.svd(svd)#u. shape (4,4)

array([[ 0.66581103, 0.61992005, -0.21849383, -0.3530655 ],

[ 0.47769768, 0.06749606, 0.56339472, 0.67069784],

[ 0.44673911, -0.66240419, 0.31234136, -0.51389467],

[ 0.35906096, -0.41516755, -0.73300048, 0.40177286]])#s. s is the np.diag(S), S will be of shape (4,5)

array([2.35262119, 0.87561858, 0.31537598, 0.043907 ])# vh of shape (5,5)

array([[ 0.29743919, 0.569923 , 0.3576598 , 0.52216343, 0.43144237],

[-0.61102456, 0.15085062, 0.50163189, -0.46496796, 0.36886763],

[ 0.49893385, 0.39660661, -0.29349238, -0.68317228, 0.2022525 ],

[ 0.52538762, -0.60314147, 0.5557502 , -0.15677528, 0.16355866],

[-0.11494248, -0.3624299 , -0.47481454, 0.14088043, 0.78111244]])

Lp Norm, Frobenius norm, einsum

What are L1 and L2 norms? Basically, it represents different metrics to define a vector’s distance from the origin. Let’s define a more general Lp norm:

It is worth noting that, usually you use L2 norm to signify the magnitute of a vector, it can be seen from the projection part where we use inner product to compute the denominator.

For a matrix, we usually encounter ** Frobenius norm** like above.

a = np.array([4,5,6])

np.linalg.norm(a,ord=3). # L3 norm for vector a

# 7.398636222991409x = np.random.rand(2,3)

np.linalg.norm(x,ord='fro'). # frobenius norm for matrix x

# 1.309085506183174

Finally, I’d like to introduce the ** Einstein Sum function** in the Numpy package. This is a somewhat magic function that provides a generic solution for nearly all the basic matrix operations you can encounter, including, inner product, outer product, matrix multiplication, trace, diagonal, Hadamard product, summation, row sum, column sum, transpose.

Basically, this function requires two arguments, first is called “subscript”, second is called “operands”, let us see a matrix multiplication example to understand that:

x = np.random.rand(2,3)

y = np.random.rand(5,3)

np.einsum('ij,kj -> ik',x,y)#

array([[0.3676856 , 0.33156855, 0.39793874, 0.70939856, 0.8353566 ],

[0.19219345, 0.19312881, 1.36739081, 1.0606612 ,1.15039307]])

As we can see, we use `i`

and `j`

represent the dimensions of the operand `x`

, `k`

and `j`

to represent dimensions of the operand `y`

. And we specify the output matrix to be in the dimension `i`

* `k`

. The `einsum`

function will automatically figure out it is a matrix multiplication problem. With that understood, let’s delve into all the interesting examples `einsum`

function can perform. **A basic rule is that the dimension that only happens in input subscripts but not in output subscript means this dimension needs to be summed**.

`# einsum`

x = np.random.rand(2,3)

np.einsum('ij -> ji',x) # transpose

np.einsum('ij ->',x) # sum

np.einsum('ij -> i',x) # column sum

np.einsum('ij -> j',x) # row sum

x = np.random.rand(2,3)

y = np.random.rand(5,3)

np.einsum('ij,kj -> ik',x,y) # matrix multiplication

a = np.array([4,5,6])

b = np.array([7,8,9])

np.einsum('i,i ->',a,b) # inner product

np.einsum('i,j ->ij',a,b) # outer product

np.einsum('i,i ->i',a,b) # hadamard product

y = np.random.rand(5,3)

np.einsum('ij -> j',y) # diagonal

np.einsum('ij ->',y) # trace

# How important is Linear Algebra?

Not everyone needs to know Linear Algebra, and to what extent you should understand Linear Algebra heavily depends on your type of work. But it is reasonable to forecast that machine learning and other statistical models will become more and more accessible and by then, simply knowing how to run a machine learning model might not be sufficient to make you stand out. So I would suggest to equip yourself with the necessary knowledge from now and make sure you always be competitive in the foreseeable future.

Thanks for reading! Next time I will cover the statistical model in Python. If you like this article, follow me on medium, thank you so much for your support. Connect me on my Twitter or LinkedIn, also please let me know if you have any questions or what kind of NumPy tutorials you would like to see In the future!

Github Repository: