Computational Linear Algebra: Norms and Special Kinds of Matrices

Computational Linear Algebra: Lecture 4 — Exploring Norms and Special Matrices

Monit Sharma
May 22, 2023 · 10 min read

Introduction:

Welcome back to our blog series on Computational Linear Algebra! In the previous lectures, we covered the foundational concepts of scalars, vectors, and matrices, matrix addition and subtraction, matrix multiplication, identity and inverse matrices, as well as linear dependence and span. In this fourth lecture, we will dive into the world of norms and special types of matrices, including diagonal matrices, symmetric matrices, and orthogonal matrices.

We start with a concept that is important for machine learning and deep learning: the norm. The norm is what is generally used to evaluate the error of a model. For instance, it is used to calculate the error between the output of a neural network and what is expected (the actual label or value). You can think of the norm of a vector as its length. It is a function that maps a vector to a non-negative value. Different functions can be used, and we will see a few examples.

Norms:

Norms play a crucial role in measuring the size, magnitude, and distance of vectors in a vector space. They provide quantitative measures that help us analyze and compare vectors. Let’s explore some commonly used norms.

Norms are functions characterized by the following properties (a quick numerical check follows the list):

  1. Norms are non-negative values. If you think of a norm as a length, you can easily see why it can't be negative.
  2. A norm is 0 if and only if the vector is the zero vector.
  3. Norms respect the triangle inequality (see below).
  4. The norm of a vector multiplied by a scalar is equal to the absolute value of this scalar multiplied by the norm of the vector: ||k·x|| = |k|·||x||.
  5. The norm is usually written with two vertical bars: ||x||.
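
As a quick sanity check (our own sketch, not from the original post), here is how properties 1, 2, and 4 can be verified with NumPy; property 3 is checked in the next subsection:

import numpy as np

x = np.array([1, 6])
k = -3

# Property 1: norms are non-negative
print(np.linalg.norm(x) >= 0)                                         # True
# Property 2: only the zero vector has norm 0
print(np.linalg.norm(np.zeros(2)) == 0)                               # True
# Property 4: ||k*x|| equals |k| * ||x||
print(np.isclose(np.linalg.norm(k * x), abs(k) * np.linalg.norm(x)))  # True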

The triangle inequality

The norm of the sum of some vectors is less than or equal to the sum of the norms of these vectors: ||u + v|| ≤ ||u|| + ||v||.

Let’s see the code results:

u = np.array([1, 6])
v = np.array([4, 2])
u+v

array([5, 8])

np.linalg.norm(u+v)

9.433981132056603


np.linalg.norm(u)+np.linalg.norm(v)

10.554898485297798

import matplotlib.pyplot as plt
import seaborn as sns

# Vectors are given as [x_start, y_start, x_component, y_component]
u = [0, 0, 1, 6]
v = [0, 0, 4, 2]
u_bis = [1, 6, v[2], v[3]]   # v drawn starting from the tip of u
w = [0, 0, 5, 8]             # u + v

plt.quiver([u[0], u_bis[0], w[0]],
           [u[1], u_bis[1], w[1]],
           [u[2], u_bis[2], w[2]],
           [u[3], u_bis[3], w[3]],
           angles='xy', scale_units='xy', scale=1, color=sns.color_palette())
plt.xlim(-2, 6)
plt.ylim(-2, 9)
plt.axvline(x=0, color='grey')
plt.axhline(y=0, color='grey')

plt.text(-1, 3.5, r'u', color=sns.color_palette()[0], size=20)
plt.text(2.5, 7.5, r'v', color=sns.color_palette()[1], size=20)
plt.text(2, 2, r'u+v', color=sns.color_palette()[2], size=20)

plt.show()
plt.close()

Geometrically, this simply means that the shortest path between two points is a straight line.

P-norms: General Rules

Here is the recipe to get the p-norm of a vector:

  1. Calculate the absolute value of each element
  2. Take the power p of these absolute values.
  3. Sum all these powered absolute values.
  4. Take the power 1/p of this result.

This is expressed more concisely with the formula: ||x||_p = (|x_1|^p + |x_2|^p + ... + |x_n|^p)^(1/p).

This will be clear with examples using these widely used p-norms.
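
As an illustration of the recipe above, here is a minimal sketch of a generic p-norm (the helper name p_norm is ours, not part of NumPy or the original post); np.linalg.norm(x, ord=p) computes the same thing:

def p_norm(x, p):
    # Follow the recipe: absolute values, power p, sum, power 1/p
    return np.sum(np.abs(x) ** p) ** (1 / p)

x = np.array([3, 4])
print(p_norm(x, 1))              # 7.0
print(p_norm(x, 2))              # 5.0
print(np.linalg.norm(x, ord=2))  # 5.0, same result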

The L1 norm

p = 1, so this norm is simply the sum of the absolute values of the elements: ||x||_1 = |x_1| + |x_2| + ... + |x_n|.
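
For example (a quick check with NumPy, not from the original post), the L1 norm of [3, 4] is 3 + 4 = 7:

np.linalg.norm([3, 4], 1)

7.0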

The Euclidean Norm (L2 norm)

The Euclidean norm is the p-norm with p = 2. Along with the squared L2 norm, it is probably the most widely used norm: ||x||_2 = sqrt(x_1² + x_2² + ... + x_n²).

Let's see an example with the vector [3, 4]: sqrt(3² + 4²) = sqrt(25) = 5.

So the L2 norm is 5.

The norm can be calculated with the linalg.norm function from NumPy. We can check the result:

np.linalg.norm([3, 4])

5.0

Here is the graphical representation of the vectors:

u = [0, 0, 3, 4]   # [x_start, y_start, x_component, y_component]

plt.quiver([u[0]],
           [u[1]],
           [u[2]],
           [u[3]],
           angles='xy', scale_units='xy', scale=1)

plt.xlim(-2, 4)
plt.ylim(-2, 5)
plt.axvline(x=0, color='grey')
plt.axhline(y=0, color='grey')

# Double-headed arrows showing the two components of u
plt.annotate('', xy=(3.2, 0), xytext=(3.2, 4),
             arrowprops=dict(edgecolor='black', arrowstyle='<->'))
plt.annotate('', xy=(0, -0.2), xytext=(3, -0.2),
             arrowprops=dict(edgecolor='black', arrowstyle='<->'))

plt.text(1, 2.5, r'u', size=18)
plt.text(3.3, 2, r'uy', size=18)
plt.text(1.5, -1, r'ux', size=18)

plt.show()
plt.close()

In this case, the vector lives in a 2-dimensional space, but the same holds in higher dimensions.

The Squared Euclidean Norm

The squared L2 norm is convenient because it removes the square root and we end up with the simple sum of the squared values of the vector: ||x||_2² = x_1² + x_2² + ... + x_n².

The squared Euclidean norm is widely used in machine learning, partly because it can be calculated with the simple vector operation x^T·x (the dot product of the vector with itself).

x = np.array([[2], [5], [3], [3]])   # column vector
euclideanNorm = x.T.dot(x)           # x^T x gives the squared L2 norm
euclideanNorm

array([[47]])

np.linalg.norm(x)**2

47.0

Derivative of the Squared L2 norm

Another advantage of the squared L2 norm is that its partial derivatives are easily computed: the derivative of ||x||_2² with respect to each element x_i is simply 2·x_i.

Derivative of the L2 norm

In the case of the L2 norm, the derivative is more complicated and takes every element of the vector into account: the derivative with respect to x_i is x_i / ||x||_2.

One problem of the squared L2 norm is that it hardly discriminates between 0 and small values, because the function increases slowly near zero.
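
As a small numerical illustration (our own sketch, not from the original post), here are both gradients for an example vector:

x = np.array([3.0, 4.0])

grad_squared_l2 = 2 * x              # gradient of ||x||_2^2
grad_l2 = x / np.linalg.norm(x)      # gradient of ||x||_2

print(grad_squared_l2)   # [6. 8.]
print(grad_l2)           # [0.6 0.8]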

The Max norm

It is the L∞ norm and corresponds to the largest absolute value among the elements of the vector: ||x||_∞ = max_i |x_i|.
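
NumPy's linalg.norm also computes this norm when passed np.inf as the order (a quick check, not from the original post):

np.linalg.norm([3, -7, 4], np.inf)

7.0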

Matrix Norms: The Frobenius Norm

The Frobenius norm is the matrix analogue of the L2 norm: ||A||_F = sqrt(sum over i, j of A_ij²). This is equivalent to taking the L2 norm of the matrix after flattening it into a vector.

The same NumPy function can be used:

A = np.array([[1, 2], [6, 4], [3, 2]])
A


np.linalg.norm(A)

8.366600265340756
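
We can also check the "flattening" claim directly (a quick sketch, not from the original post): flattening A and taking the vector L2 norm gives the same value:

np.linalg.norm(A.flatten())

8.366600265340756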

Expression of the dot product with norms

The dot product between two vectors x and y can be expressed with their norms and the angle θ between them: x^T·y = ||x||_2 · ||y||_2 · cos(θ). Let's plot two simple vectors to check this:

x = [0, 0, 0, 2]   # vector (0, 2)
y = [0, 0, 2, 2]   # vector (2, 2)

plt.xlim(-2, 4)
plt.ylim(-2, 5)
plt.axvline(x=0, color='grey', zorder=0)
plt.axhline(y=0, color='grey', zorder=0)

plt.quiver([x[0], y[0]],
           [x[1], y[1]],
           [x[2], y[2]],
           [x[3], y[3]],
           angles='xy', scale_units='xy', scale=1)

plt.text(-0.5, 1, r'x', size=18)
plt.text(1.5, 0.5, r'y', size=18)

plt.show()
plt.close()

We took this example for its simplicity. As we can see, the angle between x and y is 45°, the dot product is x^T·y = 0·2 + 2·2 = 4, and the norms are ||x||_2 = 2 and ||y||_2 = sqrt(8). Let's check the right-hand side of the formula:

# Note: np.cos takes the angle in radians
np.cos(np.deg2rad(45)) * 2 * np.sqrt(8)

4.000000000000001
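
And indeed this matches the dot product computed directly (our own quick check):

np.dot([0, 2], [2, 2])

4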

Special Matrices:

In addition to norms, special types of matrices have unique properties that make them useful in various applications. We will see other types of vectors and matrices in this section. It is not a big section, but it is important for understanding the next ones. Let's explore a few notable examples:

Special Kinds of Matrices and Vectors

Diagonal Matrices

A matrix A is diagonal if all its entries A_ij are zero except those on the main diagonal (where i = j).

A diagonal matrix is often square, but there can also be non-square diagonal matrices.

A diagonal matrix can be denoted diag(v), where v is the vector containing the diagonal values.

The numpy function diag() can be used to create square diagonal matrices.

v = np.array([2, 4, 3, 1])
np.diag(v)

array([[2, 0, 0, 0],
[0, 4, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 1]])

The multiplication between a diagonal matrix diag(v) and a vector x is thus just a weighting of each element of the vector by the corresponding diagonal value: diag(v)·x = (v_1·x_1, v_2·x_2, ..., v_n·x_n). A small check of this is sketched below.

Non-square diagonal matrices have the same property.
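
Here is a quick NumPy check of this weighting behaviour (our own sketch, not from the original post):

v = np.array([2, 4, 3, 1])
x = np.array([3, 2, 2, 7])

print(np.diag(v).dot(x))   # [6 8 6 7]
print(v * x)               # [6 8 6 7], the same element-wise product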

The inverse of a square diagonal matrix exists if all entries of the diagonal are non-zero. If that is the case, the inverse is easy to find: it is the diagonal matrix with the reciprocals 1/v_i on the diagonal. The inverse does not exist if the matrix is non-square.

Let's check with NumPy that the multiplication of the matrix with its inverse gives us the identity matrix:



A = np.array([[2, 0, 0, 0], [0, 4, 0, 0], [0, 0, 3, 0], [0, 0, 0, 1]])
A

array([[2, 0, 0, 0],
[0, 4, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 1]])

A_inv = np.array([[1/2., 0, 0, 0], [0, 1/4., 0, 0], [0, 0, 1/3., 0], [0, 0, 0, 1/1.]])
A_inv

array([[0.5 , 0. , 0. , 0. ],
[0. , 0.25 , 0. , 0. ],
[0. , 0. , 0.33333333, 0. ],
[0. , 0. , 0. , 1. ]])



A.dot(A_inv)

array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
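
Since the inverse of a diagonal matrix just takes the reciprocal of each diagonal entry, it can also be built directly (our own shortcut, not from the original post):

np.diag(1 / np.diag(A))

array([[0.5       , 0.        , 0.        , 0.        ],
       [0.        , 0.25      , 0.        , 0.        ],
       [0.        , 0.        , 0.33333333, 0.        ],
       [0.        , 0.        , 0.        , 1.        ]])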

Symmetric Matrices

The matrix A is symmetric if it is equal to its transpose: A = A^T.

This concerns only square matrices.

A = np.array([[2, 4, -1], [4, -8, 0], [-1, 0, 3]])
A.T

array([[ 2,  4, -1],
       [ 4, -8,  0],
       [-1,  0,  3]])
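
We can confirm the symmetry programmatically (a quick check that A = A^T, not from the original post):

np.array_equal(A, A.T)

True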

Unit Vectors

A unit vector is a vector of length (norm) equal to 1. It can be denoted by a letter with a hat, such as û.
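
Any non-zero vector can be turned into a unit vector by dividing it by its norm (our own sketch, not from the original post):

u = np.array([3.0, 4.0])
u_hat = u / np.linalg.norm(u)

print(u_hat)                   # [0.6 0.8]
print(np.linalg.norm(u_hat))   # 1.0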

Orthogonal Vectors

Two orthogonal vectors are separated by a 90° angle. The dot product of two orthogonal vectors is 0.

x = [0, 0, 2, 2]    # vector (2, 2)
y = [0, 0, 2, -2]   # vector (2, -2)

plt.quiver([x[0], y[0]],
           [x[1], y[1]],
           [x[2], y[2]],
           [x[3], y[3]],
           angles='xy', scale_units='xy', scale=1)

plt.xlim(-2, 4)
plt.ylim(-3, 3)
plt.axvline(x=0, color='grey')
plt.axhline(y=0, color='grey')

plt.text(1, 1.5, r'x', size=18)
plt.text(1.5, -1, r'y', size=18)

plt.show()
plt.close()
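
We can verify that their dot product is indeed 0 (a quick check, not from the original post):

np.dot([2, 2], [2, -2])

0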

In addition, when orthogonal vectors also have unit norm, they are called orthonormal.

It is impossible to have more than n mutually orthogonal (non-zero) vectors in R^n. For instance, try to draw 3 vectors in a 2-dimensional space (R²) that are mutually orthogonal…

Orthogonal Matrices

Orthogonal matrices are important because they have interesting properties. A matrix is orthogonal if its columns are mutually orthogonal and have unit norm (that is, the columns are orthonormal), and its rows are likewise orthonormal.

Under the hood of an orthogonal matrix

Consider a 2×2 matrix

A = [[a11, a12],
     [a21, a22]]

This means that the columns

(a11, a21) and (a12, a22)

are orthogonal vectors, and also that the rows

(a11, a12) and (a21, a22)

are orthogonal vectors.

Property 1: A^T·A = I

An orthogonal matrix A has the property that A^T·A = A·A^T = I (the identity matrix).

We can see that this statement is true with the following reasoning, using the 2×2 matrix from above:

A = [[a11, a12],
     [a21, a22]]

and thus

A^T = [[a11, a21],
       [a12, a22]]

Let's do the dot product:

A^T·A = [[a11² + a21², a11·a12 + a21·a22],
         [a11·a12 + a21·a22, a12² + a22²]]

The columns have unit norm, so a11² + a21² = 1 and a12² + a22² = 1. And we know that the columns are orthogonal, which means that a11·a12 + a21·a22 = 0.

We therefore end up with the identity matrix:

A^T·A = [[1, 0],
         [0, 1]]

Property 2: A^T = A^-1

Since A^T·A = I, the transpose of an orthogonal matrix is also its inverse. Let's verify this numerically with a rotation matrix (rotation matrices are orthogonal):

# Rotation matrix for an angle of 50 radians
A = np.array([[np.cos(50), -np.sin(50)], [np.sin(50), np.cos(50)]])
A

array([[ 0.96496603, 0.26237485],
[-0.26237485, 0.96496603]])

col0 = A[:, [0]]                    # first column as a column vector
col1 = A[:, [1]]                    # second column
row0 = A[0].reshape(A.shape[1], 1)  # first row as a column vector
row1 = A[1].reshape(A.shape[1], 1)  # second row

Let’s check that rows and columns are orthogonal:

col0.T.dot(col1)

array([[0.]])

row0.T.dot(row1)

array([[0.]])


Property 1 also holds: A^T·A gives the identity matrix.

A.T.dot(A)

array([[1., 0.],
[0., 1.]])

And Property 2: the transpose of A is equal to its inverse.

A.T

array([[ 0.96496603, -0.26237485],
[ 0.26237485, 0.96496603]])

np.linalg.inv(A)

array([[ 0.96496603, -0.26237485],
[ 0.26237485, 0.96496603]])

Conclusion:

In this lecture, we explored the concept of norms and their significance in measuring vector magnitude and distance. We also discussed special types of matrices, including diagonal matrices, symmetric matrices, and orthogonal matrices, highlighting their unique properties and applications. A solid understanding of these concepts and matrices is essential for computational linear algebra, enabling us to solve complex problems, perform transformations, and analyze data efficiently.

In the next lecture, we will continue our exploration by delving into eigenvalues, eigenvectors, and their practical applications. These concepts are crucial in understanding the behavior and transformations of matrices in various scientific and computational fields. Stay tuned for an exciting journey into the world of eigenvalues and eigenvectors!
