Understanding Eigenvalues and Eigenvectors: Enabling Deep Neural Network Compression

Anish Hilary
7 min read · Apr 18, 2024


Eigenvalues and Eigenvectors are fundamental concepts in linear algebra, widely utilized across diverse domains. In the context of deep neural networks (DNNs), these concepts offer valuable insights and techniques for model compression. In this discussion, we explore the essence of eigenvalues and eigenvectors and their pivotal role in the compression of DNNs.

Before diving deep into EIGEN VALUES and EIGEN VECTORS, it is useful to visually understand the properties of matrices and their relationship with vectors.

In linear algebra, when a matrix is multiplied with a given vector, the vector undergoes a transformation. This transformation can induce scaling or rotation of the vector, or both.

Some examples:

(i) Scaling a Vector

The matrix ‘A’ is a scalar matrix, which scales any given vector uniformly. In the equation below, the vector is scaled by a factor of two. This shows that matrix A performs a pure scaling transformation on vector 'v'.

Eq.1 Matrix multiplication between a Scalar matrix and a vector

Fig.1 shows how a vector is transformed to a new position (6,8) from its original position (3,4). This figure helps us to graphically confirm the scaling effect on the vector.

Fig.1 The matrix A has increased the magnitude of vector (3,4) to (6,8)
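To reproduce Eq.1 and Fig.1 numerically, here is a minimal numpy sketch, assuming the scalar matrix A = [[2, 0], [0, 2]] implied by the factor-of-two scaling:

import numpy as np

A = np.array([[2, 0],
              [0, 2]])   # scalar matrix: scales every vector by 2
v = np.array([3, 4])

print(A @ v)   # [6 8] -> the vector (3,4) is purely scaled to (6,8)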

(ii) Rotating a Vector

The matrix A is an identity matrix with one diagonal entry set to -1, which reflects any given vector across an axis. In the equation below, the vector is reflected (which can be considered a special case of rotation). This shows that matrix A performs a pure rotation transformation, with no scaling of vector 'v'.

Eq.2 Matrix multiplication between a Reflection matrix and a vector

Fig.2 shows how a vector is transformed to a new position (-3,4) from its original position (3,4). This figure helps us to graphically confirm the rotating effect on the vector.

Fig.2 The matrix A has rotated the vector (3,4) to (-3,4)
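A similar sketch for Eq.2 and Fig.2, assuming the reflection matrix A = [[-1, 0], [0, 1]] implied by the figure:

import numpy as np

A = np.array([[-1, 0],
              [ 0, 1]])  # identity matrix with -1 in the first diagonal entry
v = np.array([3, 4])

print(A @ v)   # [-3  4] -> the vector is reflected across the y-axis, with no scaling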

(iii) Rotating and Scaling a Vector

But if we perform the same operation with a random square matrix R, the vector undergoes both rotation and scaling.

Eq.3 Matrix multiplication between a Random matrix and a vector

Fig.3 shows how the same vector 'v' is scaled by a factor of 6 along the x-axis and a factor of 4 along the y-axis, along with a rotation of about 11.5 degrees in the clockwise direction.

Fig.3 The matrix R has scaled and rotated the vector (3,4) to (18,16)
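One matrix consistent with Fig.3 is R = [[2, 3], [4, 1]], which is in fact the same matrix whose eigen vectors we find below; a minimal numpy sketch:

import numpy as np

R = np.array([[2, 3],
              [4, 1]])   # a general (non-scalar, non-reflection) square matrix
v = np.array([3, 4])

print(R @ v)   # [18 16] -> the vector is both scaled and rotated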

However, for the same matrix there exist some vectors which undergo only scaling, without any rotation. These vectors are known as the eigen vectors of the given square matrix, and the factors by which they are scaled are known as the eigen values. This is mathematically expressed as A · v = λ · v,

Eq.4 Equation for identifying eigen vectors of a Matrix
  • ‘A’ represents the transformation matrix.
  • ‘v’ represents the eigen vectors.
  • ‘λ’ represents the eigen values.

Steps for finding Eigen values and Eigen vectors for a Matrix

Now, for the matrix

Matrix A = [[2, 3], [4, 1]] : the matrix for which eigen vectors are to be found

we have to find the vectors which undergo only scaling, without any rotation, when matrix multiplication is performed. For that, the following steps are carried out based on Eq.4.

Step 1:

Here, we want to find non-zero eigen vectors (v), so the product (A - λI) · v should become zero. The only way the product of a matrix with a non-zero vector can become zero is if the transformation associated with that matrix squeezes the space into a lower dimension.

(Introducing some additional matrix properties…)

Like the scaling and rotation properties of a matrix, some matrices perform a transformation that squeezes the space into a lower-dimensional space.

For example, consider the two matrices

Matrix P multiplied with unit vector forms an area ; but Matrix Q squeezes the area

The vectors of these matrices are plotted in the graphs below, and the areas covered by the vectors are shaded.

Fig.4 Left: The matrix vectors with area = 6 units ; Right: The matrix vectors with area = 0 units

Fig.4 shows that the matrix ’P’ stretches the unit area (the square spanned by the unit vectors, which has an area of 1 unit) to 6 units, while the matrix ’Q’ collapses that unit area into a lower dimension, a single line. Therefore, we can say that the Determinant of matrix P = 6 and the Determinant of matrix Q = 0.

This property of a matrix could be theoretically verified by calculating its Determinant.

Determinant calculation for Matrix P and Matrix Q
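The exact entries of P and Q appear only in the figure above, so here is a minimal numpy sketch with stand-in matrices chosen to have the same determinants (6 and 0):

import numpy as np

P = np.array([[3, 1],
              [0, 2]])   # hypothetical matrix with determinant 6
Q = np.array([[1, 2],
              [2, 4]])   # hypothetical matrix with parallel columns (determinant 0)

print(np.linalg.det(P))  # ~6.0 -> the unit area is stretched to 6 units
print(np.linalg.det(Q))  # ~0.0 -> the unit area is collapsed onto a line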

If the determinant is positive (+) : ‘The linear transformation preserves the orientation’
If the determinant is negative (-) : ‘The linear transformation flips the orientation’
If the determinant is zero (0) : ‘The transformation collapses the space into lower dimension subspace (i.e, into a line or point)’

Now, based on this, we can understand that if the matrix (A - λI) has a determinant value of zero, then it can squeeze a non-zero vector (v) down to a product value of '0'. In other words, the eigen values are the values of λ that satisfy det(A - λI) = 0. Keeping this in mind as our background knowledge, we will move forward with step 2.

Step 2:

Solving det(A - λI) = 0 for our matrix gives the eigen values λ1 = 5 and λ2 = -2. Now, we can get two matrices with the squeezing property by substituting these eigen values in (A - λI).

A1 and A2 are two transformation matrices with squeezing properties

These A1 and A2 matrices have been identified with the help of the eigen values of matrix A. Now, using these two matrices, we can determine our eigen vectors in the next step.
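Before moving on, here is a minimal numpy sketch of Steps 1 and 2 combined, using the matrix A = [[2, 3], [4, 1]] from above: the characteristic polynomial λ² - 3λ - 10 = 0 yields the eigen values, and substituting them back gives the two singular (squeezing) matrices A1 and A2:

import numpy as np

A = np.array([[2, 3],
              [4, 1]])

# Characteristic polynomial of a 2x2 matrix: λ^2 - trace(A)·λ + det(A) = 0
eig_vals = np.roots([1, -np.trace(A), np.linalg.det(A)])
print(eig_vals)            # the eigen values: 5 and -2

A1 = A - 5 * np.eye(2)     # substitute λ1 = 5
A2 = A + 2 * np.eye(2)     # substitute λ2 = -2
print(np.linalg.det(A1), np.linalg.det(A2))   # both ~0 -> squeezing (singular) matrices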

Step 3:

Using the equation (A - λI) · v = 0, the eigen vectors are determined: substituting λ1 = 5 (i.e., using A1) gives the eigen vector v1 = [1, 1], and substituting λ2 = -2 (i.e., using A2) gives the eigen vector v2 = [-3, 4].
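As a quick cross-reference, numpy's built-in eigen solver returns the same eigen values; the eigen vectors it returns are unit-length, and are simply scaled versions of [1, 1] and [-3, 4]:

import numpy as np

A = np.array([[2, 3],
              [4, 1]])

eig_vals, eig_vecs = np.linalg.eig(A)
print(eig_vals)   # the eigen values: 5 and -2
print(eig_vecs)   # columns are the (normalised) eigen vectors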

EIGEN VALUES and EIGEN VECTORS for the matrix have been FOUND!!!

Time for Cross Verification….

It's time to cross-check whether these vectors really exhibit the property they have promised us. Do they really undergo only scaling when multiplied with the matrix A?

Let's check,

Cross verification on the scaling property of these eigen vectors

It is seen that eigen vector 1 is scaled by a factor of 5 and eigen vector 2 is scaled by -2 (i.e., flipped to the opposite direction); these scaling factors correspond to the eigen values.

The matrix multiplication A·v represents a linear transformation of the eigen vector v by the matrix A, and the eigen vector is only scaled, without undergoing any rotation. The resulting vector is simply a scaled version of the original eigen vector.
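A minimal numpy sketch of this cross-verification:

import numpy as np

A = np.array([[2, 3],
              [4, 1]])
v1 = np.array([1, 1])
v2 = np.array([-3, 4])

print(A @ v1)   # [5 5]   ->  5 * v1 (scaled by eigen value 5)
print(A @ v2)   # [ 6 -8] -> -2 * v2 (scaled by eigen value -2)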

CONCLUSION-1 : If we can find the eigen vectors of a matrix, they possess a unique property wherein they undergo only scaling transformations.

But what other property of the eigen vectors makes them SPECIAL!?

“Among the infinite vectors in the vector space, the eigen vectors uniquely capture the most information about the contents of the matrix A through projection.”

Larger eigen values imply a greater magnitude of transformation, indicating that the corresponding eigen vectors capture more variance (more information) in the data.

This means that the data in the 2-D matrix A could be compressed to 1-D vectors while retaining most of the important information.

This can be better explained through the graphs.

The red vector [5,5] captures more information about the blue vectors [[2, 3], [4, 1]] than any other vector in the entire vector space

Here, the blue vectors represent the matrix data. When these data are projected onto the eigen vectors, red (larger eigen value) and yellow (smaller eigen value), we gain the advantage of embedding most of the matrix information in a lower-dimensional space.

The Python numpy library is used below to calculate the projections of the matrix A data (its rows) onto its two eigen vectors (v1, v2).


import numpy as np

# Define the matrix A (each row is treated as a data vector)
A = np.array([[2, 3],
              [4, 1]])

# Define the eigenvectors (their eigen values are 5 and -2)
v1 = np.array([1, 1])
v2 = np.array([-3, 4])

# Scalar projections of each row of A onto v1 and v2
scalar_proj_Av1 = A @ v1 / np.linalg.norm(v1)
scalar_proj_Av2 = A @ v2 / np.linalg.norm(v2)

# Projection vectors: each row is the corresponding row of A projected onto the unit eigenvector
projection_Av1 = np.outer(scalar_proj_Av1, v1 / np.linalg.norm(v1))
projection_Av2 = np.outer(scalar_proj_Av2, v2 / np.linalg.norm(v2))

print("projection_Av1:\n", projection_Av1)
print("projection_Av2:\n", projection_Av2)

---------------------------------------------------------------------------
projection_Av1:
 [[2.5 2.5]
 [2.5 2.5]]
projection_Av2:
 [[-0.72  0.96]
 [ 0.96 -1.28]]

The above python operation has been manually explained here:

The significance of this property serves as the backbone for concepts such as Principal Component Analysis (PCA), which reduces the feature dimensions in large Machine Learning models.
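As a rough sketch of how PCA builds on this property (with hypothetical toy data, not the matrix A above): the principal components are the eigen vectors of the data's covariance matrix, and the data are projected onto the components with the largest eigen values.

import numpy as np

# Hypothetical toy data: 5 samples with 2 correlated features
X = np.array([[2.0, 1.9], [0.5, 0.7], [1.5, 1.6], [3.0, 2.8], [1.0, 1.1]])

Xc = X - X.mean(axis=0)                    # centre the data
cov = np.cov(Xc, rowvar=False)             # 2x2 covariance matrix
eig_vals, eig_vecs = np.linalg.eigh(cov)   # eigh: eigen solver for symmetric matrices

top = eig_vecs[:, np.argmax(eig_vals)]     # eigen vector with the largest eigen value
X_1d = Xc @ top                            # the 2-D data compressed to 1-D scores
print(X_1d)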

Apart from that, its advantage has been leveraged by a variety of decomposition techniques, which play a vital role in compressing the sizes of Deep Convolutional Neural Networks.
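As a minimal, hypothetical sketch of that idea: a dense layer's weight matrix can be approximated by a truncated SVD (a close relative of the eigen decomposition), storing far fewer parameters than the original matrix.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))        # hypothetical dense-layer weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)

r = 16                                     # keep only the top-r singular components
W_low_rank = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

original_params = W.size                                  # 256 * 128 = 32768
compressed_params = U[:, :r].size + r + Vt[:r, :].size    # (256 + 1 + 128) * 16 = 6160
print(original_params, compressed_params)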

I hope this has given an intuitive and visual explanation to get a hold of the eigen value and eigen vector concepts. It serves as a foundation for comprehending subsequent blogs, which will feature other useful concepts and their adaptation to DNN compression techniques.

Github : https://github.com/hilary-anish/DNN_compression

References

  1. https://www.youtube.com/watch?v=PFDu9oVAE-g [3Blue1Brown]
