Foundations of Data Science: Essential Linear Algebra You Should Master

Arushi Aggarwal
3 min read · Jan 10, 2024


Linear algebra is a key foundational topic in machine learning and data science, with algorithms relying heavily on operations and concepts such as vectors, matrices, and eigenvalues.

In ML and data science, linear algebra is often explained through abstract concepts such as vector spaces and specific matrix operations, because these allow us to understand and manipulate the geometry of a dataset. Mastering a few key concepts will greatly deepen your foundation for the technical side of data science, and in this article we will go over those key topics to build that intuition.

After reading this post, you will know:

  • How vectors and vector spaces provide mathematical abstraction to represent multidimensional data
  • Why matrices and matrix operations are essential for encapsulating systems of equations and function mappings
  • How eigenvalue decomposition can be applied in machine learning

Vectors and Matrices

Vectors encapsulate magnitude and direction; understanding them grants the ability to conceptualize observations and model parameters as points occupying coordinate spaces. Concretely, a vector is defined as an array of numbers.

Matrices, on the other hand, extend scalars (quantities with no direction) and vectors into two-dimensional objects, displayed as 2D arrays of numbers with rows and columns. They represent linear maps and give compact notation for systems of linear equations, and matrix multiplication defines the function mapping and composition rules that transform vectors in useful ways.

Below are examples of scalars, vectors, and a matrix in Python:

import numpy as np

# Creating a scalar as a one-dimensional array with one value
scalar = np.array([1])

# Creating a vector as a row
rowVector = np.array([1, -2, 3])

# Creating a vector as a column
columnVector = np.array([[1],
                         [-2],
                         [3]])

# Creating a matrix as a two-dimensional array
matrix = np.array([[1, 2, 3, 4],
                   [5, 4, 5, 4],
                   [-6, 5, 2, 1]])

print(scalar)
print()
print(rowVector)
print()
print(columnVector)
print()
print(matrix)

Vector and Matrix Operations

A grasp of matrices and vectors allows us to manipulate datasets algebraically to uncover insights. Through matrix multiplication we can apply data transformations that help reveal patterns of association and dependence in our data.

We will start off by multiplying two matrices, where each entry of the product is the dot product of a row of the first matrix with a column of the second. Working through an example:
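Take matrix1 = [[11, -2], [3, 5]] and matrix2 = [[5, 20], [37, 8]]. Pairing each row of matrix1 with each column of matrix2:

row 1 · column 1: (11)(5) + (-2)(37) = -19
row 1 · column 2: (11)(20) + (-2)(8) = 204
row 2 · column 1: (3)(5) + (5)(37) = 200
row 2 · column 2: (3)(20) + (5)(8) = 100

so the product is [[-19, 204], [200, 100]].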

Similarly, we can use NumPy to quickly compute the same result:

import numpy as np

matrix1 = np.array([[11, -2],
                    [3, 5]])
matrix2 = np.array([[5, 20],
                    [37, 8]])

# Matrix multiplication: each entry is a row-column dot product
# (np.multiply would instead multiply elementwise)
finalMatrix = np.matmul(matrix1, matrix2)

print(finalMatrix)

We can also add and subtract matrices elementwise with ‘+’ and ‘-’. Note that in NumPy ‘*’ performs elementwise multiplication (like np.multiply above), while ‘@’ performs true matrix multiplication (like np.matmul).
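For example, reusing matrix1 and matrix2 from the block above:

print(matrix1 + matrix2)  # elementwise addition
print(matrix1 - matrix2)  # elementwise subtraction
print(matrix1 * matrix2)  # elementwise multiplication, same as np.multiply
print(matrix1 @ matrix2)  # matrix multiplication, same as np.matmul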

Eigenvalues

Eigenvalue decomposition is a linear algebra technique that factors a square matrix into its eigenvectors and eigenvalues. In the context of image data, it can be used for applications such as image compression and feature extraction.

First, an eigenvector of a linear transformation is a nonzero vector whose direction is left unchanged by the transformation, and the associated eigenvalue is the scalar factor by which that vector is scaled: for a square matrix A, they satisfy Av = λv. Decomposing a square matrix into its set of eigenvectors and eigenvalues is called eigenvalue decomposition.
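As a minimal sketch, NumPy's np.linalg.eig computes this decomposition; the 2x2 matrix below is an arbitrary illustration, not taken from any dataset:

import numpy as np

A = np.array([[4, 1],
              [2, 3]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print(eigenvalues)   # [5. 2.]
print(eigenvectors)

# Verify the defining property A v = lambda v for the first pair
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True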

In terms of applications, we can apply eigenvalue decomposition to image data, where each image is a matrix of pixel values. You can flatten each image and stack the results into a larger matrix where each row corresponds to one original image. Performing eigenvalue decomposition on the covariance matrix of this data identifies the principal components that explain the most variance in the images.

Additionally, the eigenvectors obtained from the decomposition can be used to reconstruct the original images with reduced dimensionality, which is useful for image compression. Projections onto these eigenvectors can also serve as features for machine learning models, improving accuracy on image classification and other image-related tasks.
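Below is a rough sketch of that workflow. The random array here is only a stand-in for 100 flattened 8x8 images, so the shapes and names are illustrative assumptions rather than real image data:

import numpy as np

# Stand-in for image data: 100 "images" of 8x8 pixels, one flattened image per row
rng = np.random.default_rng(0)
images = rng.random((100, 64))

# Center the data and compute the covariance matrix of the pixels
mean = images.mean(axis=0)
centered = images - mean
cov = np.cov(centered, rowvar=False)

# Eigenvalue decomposition (eigh is suited to symmetric matrices like cov)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort by descending eigenvalue so the directions of most variance come first
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order][:, :10]  # keep the top 10 components

# Project onto the components (compression), then map back (reconstruction)
compressed = centered @ components
reconstructed = compressed @ components.T + mean

print(compressed.shape)     # (100, 10) - reduced features
print(reconstructed.shape)  # (100, 64) - approximate reconstruction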

Summary

In this post, you discovered the vital topics of linear algebra that underpin the foundations of data science and ML. Mastering a few pivotal linear algebraic ideas greatly empowers one’s ability to understand, analyze, and manipulate complex multidimensional data to produce insights. I hope this helps, and please comment any questions down below and I will do my best to answer them.


Arushi Aggarwal

Arushi Aggarwal is currently a junior at Cornell University studying Computer and Information Science