Advanced NumPy: Linear Algebra and More…

Nishitha Kalathil
11 min read · Sep 11, 2023


Are you tired of doing math by hand? Do you get a headache just thinking about matrices and statistics? Fear not, dear reader, for NumPy is here to save the day! With NumPy, you can perform complex linear algebra operations and statistical analysis with just a few lines of code. And the best part? You don’t even need to know how to spell “eigenvector” correctly (although it might help). So grab your favorite beverage (math juice, anyone?), kick up your feet, and let’s dive into the wonderful world of advanced NumPy.

Matrix and vector operations

Eigenvalues and eigenvectors

Singular value decomposition

Linear regression

Basic Statistical Functions

Covariance and Correlation

Hypothesis testing

Random number generation

Fourier analysis

Matrix and Vector Operations

If you’re like me, the mere mention of matrices and vectors is enough to make you want to run for the hills. But fear not, dear reader, because NumPy is here to make these operations as painless as possible (although we can’t guarantee it’ll be completely pain-free).

First, let’s talk about matrices. Think of a matrix as a grid of numbers that can represent a variety of things, from a set of linear equations to an image. In NumPy, matrices are represented as arrays, which makes performing operations on them a breeze.

Let’s start with some basic matrix operations. Suppose we have two matrices, A and B, and we want to add them together. In NumPy, we can do this with the simple command:

C = A + B

See, that wasn’t so bad! NumPy takes care of all the messy details behind the scenes.

import numpy as np

# Addition and Subtraction
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
print(matrix1 + matrix2) # prints [[6, 8], [10, 12]]
print(matrix1 - matrix2) # prints [[-4, -4], [-4, -4]]

# Scalar Multiplication
matrix = np.array([[1, 2], [3, 4]])
print(2 * matrix) # prints [[2, 4], [6, 8]]

# Matrix Multiplication
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
print(np.dot(matrix1, matrix2)) # prints [[19, 22], [43, 50]]; matrix1 @ matrix2 is equivalent

# Transpose
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(np.transpose(matrix)) # prints [[1, 4], [2, 5], [3, 6]]

# Inverse of a matrix
a = np.array([[1, 2], [3, 4]])
a_inv = np.linalg.inv(a)
print(a_inv) # prints [[-2. , 1. ], [ 1.5, -0.5]]
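
A quick sanity check: a matrix multiplied by its inverse should give the identity matrix, up to floating-point rounding.

# a @ a_inv should be (numerically) the 2x2 identity matrix
print(np.allclose(a @ a_inv, np.eye(2))) # prints True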

Next, let’s talk about vectors. A vector is simply a matrix with a single row or column. They’re often used to represent things like forces or velocities in physics. In NumPy, we can represent vectors as one-dimensional arrays.

vector = np.array([1, 2, 3])
print(vector) # prints [1, 2, 3]

Let’s say we have two vectors, u and v, and we want to compute their dot product (i.e., the sum of the products of their corresponding entries). We can do this with the command:

vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
print(np.dot(vector1, vector2)) # prints 32

Beyond the dot product, there are a few other useful vector operations.

# Cross Product
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
print(np.cross(vector1, vector2)) # prints [-3, 6, -3]

# Norm: the norm of a vector is a scalar that represents the "length" of the
# vector. In NumPy, you can compute it with numpy.linalg.norm.
vector = np.array([1, 2, 3])
print(np.linalg.norm(vector)) # prints 3.7416573867739413 (the square root of 14)

# Inner product: the inner product of two vectors is a scalar, the sum of the
# products of their corresponding entries. For 1-D arrays, numpy.inner gives
# the same result as numpy.dot.
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
print(np.inner(vector1, vector2)) # prints 32

Again, NumPy does all the heavy lifting for us.

But wait, there’s more! Beyond the basics we’ve just seen, NumPy can also compute eigenvalues and eigenvectors (finding the “essence” of a matrix), which is exactly where we’re headed next.

So there you have it, folks: matrix and vector operations made easy (or at least easier) with NumPy. Now go forth and conquer those linear equations like the math wizard you were always meant to be!

Eigenvalues and eigenvectors

The matrix might not look like much, but it has some cool properties. One of these properties is that it has eigenvectors and eigenvalues. Don’t know what those are? Don’t worry, we’re about to find out.

Eigenvalues: An eigenvalue of a square matrix is a scalar that represents how the matrix behaves when it’s multiplied by a vector. Formally, for a square matrix A, a scalar λ is an eigenvalue of A if there exists a non-zero vector v (the eigenvector) such that:

Av = λv

In other words, when the matrix A is multiplied by the eigenvector v, the result is a scaled version of v by the eigenvalue λ.

Eigenvectors: An eigenvector of a matrix A, is a non-zero vector v such that when A is multiplied by v, it only stretches or shrinks v by a scalar factor λ.

Av = λv

Eigenvectors are important because they represent directions in the vector space that are only scaled by the transformation defined by the matrix.

In NumPy, you can easily compute eigenvalues and eigenvectors using the numpy.linalg.eig function.

import numpy as np

A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # the eigenvalues (5 and 2 for this matrix)
print(eigenvectors)  # each column is the eigenvector for the matching eigenvalue
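
Each column of eigenvectors pairs with the eigenvalue at the same index, so you can check the defining equation Av = λv directly:

# Verify A v = λ v for each eigenpair
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]  # the i-th eigenvector is column i
    print(np.allclose(A @ v, eigenvalues[i] * v)) # prints True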

Singular Value Decomposition

The Singular Value Decomposition (SVD) of a matrix is a factorization of that matrix into three matrices. It has some interesting algebraic properties and conveys important geometrical and theoretical insights about linear transformations.

For a matrix A, the SVD is represented as:

A = U Σ Vᵀ

Where:

  • U is an orthogonal matrix containing the left singular vectors.
  • Σ is a diagonal matrix containing the singular values.
  • Vᵀ is the transpose of an orthogonal matrix containing the right singular vectors.

Let’s see how to perform SVD with NumPy:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
U, S, Vt = np.linalg.svd(A)

U will be a matrix containing the left singular vectors, S will be a 1D array containing the singular values, and Vt will be the transpose of a matrix containing the right singular vectors.

You can also reconstruct the original matrix using the computed matrices:

Sigma = np.diag(S)  # works directly here because A is square
reconstructed_A = np.dot(U, np.dot(Sigma, Vt))
print(np.allclose(A, reconstructed_A)) # prints True
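
One caveat: np.diag(S) only produces the right shape because A is square here. For a rectangular matrix, Σ must have the same shape as A, so you place the singular values on the diagonal of a zero matrix yourself. A quick sketch:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # a 2x3 (non-square) matrix
U, S, Vt = np.linalg.svd(A)

# Build Sigma with the same shape as A, singular values on its diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

print(np.allclose(A, U @ Sigma @ Vt)) # prints True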

Linear regression

Alright, let’s dive into the whimsical world of linear regression with NumPy! Imagine we’re on a quest to predict things, armed with nothing but our trusty Python sword and the magical NumPy wand.

Chapter 1: The Quest for Linearity

Once upon a time in the land of Data, there lived a noble Knight named Sir NumPy. Sir NumPy loved to predict the future, but he needed a tool. And so, he set off on a quest to learn the ancient art of Linear Regression!

Chapter 2: Gathering Data

Sir NumPy knew that every good prediction starts with data. He collected a bunch of data pairs: (X, Y), where X was the feature (like the number of dragons in a kingdom) and Y was the target (like the number of knights needed to defeat them).

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([3, 6, 7, 10, 12])

Chapter 3: The Magical Formula

In the scrolls of old, Sir NumPy found the magical formula for a line: Y = mX + b. The challenge was to find the best m (slope) and b (intercept) to make the predictions as accurate as possible.

Chapter 4: The Quest for Best Fit

Sir NumPy enlisted the help of a wise Oracle named polyfit from the NumPy kingdom. polyfit would magically find the best m and b for him.

m, b = np.polyfit(X, Y, 1)  # 1 denotes a linear fit (straight line)

Chapter 5: The Predictive Power

With the newfound m and b, Sir NumPy could now predict the number of knights needed to face any number of dragons!

dragons_to_face = 7
knights_needed = m * dragons_to_face + b
print(f"We need approximately {knights_needed} knights to face {dragons_to_face} dragons!")

Chapter 6: The Grand Finale

Sir NumPy was victorious! He could now predict with the power of a thousand dragons! Linear Regression had bestowed upon him the gift of foresight.

And so, Sir NumPy rode back to his castle, knowing that he could face any prediction challenge that came his way.

Basic Statistical Functions

In this fun tutorial, we’ll step into the world of superheroes and use basic statistical measures to analyze their powers. We’ll calculate the mean, median, mode, variance, and standard deviation of their strength levels.

Let’s create an imaginary dataset of superhero strength levels:

import numpy as np

strength_levels = np.array([80, 90, 85, 70, 95, 100, 75, 85, 80, 90])

The mean strength gives us an idea of the average power level:

mean_strength = np.mean(strength_levels)
print(f"The average strength is {mean_strength} units.")

The median strength tells us the middle point:

median_strength = np.median(strength_levels)
print(f"The median strength is {median_strength} units.")

Who’s the most common power level among our heroes?

from scipy import stats

# keepdims=False returns plain scalars (the default behavior changed in newer SciPy)
mode_strength = stats.mode(strength_levels, keepdims=False)
print(f"The most common strength is {mode_strength.mode} units.")

How spread out are the power levels?

variance_strength = np.var(strength_levels)
print(f"The variance in strength is {variance_strength} units.")

Standard deviation helps us understand how powers deviate from the mean:

std_dev_strength = np.std(strength_levels)
print(f"The standard deviation in strength is {std_dev_strength} units.")

Through our statistical analysis, we’ve gained insights into the powers of our superheroes. This kind of analysis can be applied to various fields, from sports to business. Keep exploring and have fun unleashing the power of statistics!

Covariance and Correlation

Meet our friends Bob and Alice, who have a weird obsession with collecting rubber ducks and ice cream cones. They’ve kept meticulous records of their collections over the years. Let’s see what they’ve got!

import numpy as np

# Bob's Rubber Ducks and Alice's Ice Cream Cones
ducks = np.array([3, 5, 7, 11, 9, 6, 8])
cones = np.array([2, 4, 6, 9, 8, 7, 5])

Covariance is like a nosy neighbor, always trying to figure out if Bob’s duck collection has any influence on Alice’s ice cream cone stash.

covariance = np.cov(ducks, cones)[0, 1]
print(f"The covariance between rubber ducks and ice cream cones is {covariance}.")

Bob’s convinced that his rubber ducks are the secret to Alice’s love for ice cream!

Correlation is like a matchmaker, determining if there’s a romantic connection between the number of ducks and cones.

correlation = np.corrcoef(ducks, cones)[0, 1]
print(f"The duck-love index between rubber ducks and ice cream cones is {correlation}.")

Let’s throw more variables into the mix! Bob and Alice also track their collections of yo-yos and toy cars.

yo_yos = np.array([4, 6, 8, 12, 10, 7, 9])
toy_cars = np.array([3, 5, 7, 10, 9, 8, 6])

# Creating a Covariance Matrix
collections = np.vstack((ducks, cones, yo_yos, toy_cars))
cov_matrix = np.cov(collections)

print("The Covariance Matrix of Bob and Alice's collections:")
print(cov_matrix)
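
Covariances are hard to compare because they carry each collection's units; the companion correlation matrix rescales every entry to the range -1 to 1:

# The matching correlation matrix, with every entry scaled to [-1, 1]
corr_matrix = np.corrcoef(collections)
print("The Correlation Matrix of Bob and Alice's collections:")
print(corr_matrix)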

And there you have it! Bob, Alice, rubber ducks, ice cream cones, yo-yos, and toy cars have shown us the hilarious side of covariance and correlation. Remember, these measures help us understand how variables dance together in the world of statistics. Keep exploring and laughing your way through data analysis!

Hypothesis testing

Hypothesis testing is a fundamental concept in statistics used to make inferences about a population based on a sample of data.

In this quirky tutorial, we’ll use hypothesis testing to determine if a penguin named Pablo is truly an exceptional athlete.

You’ve been diligently recording Pablo’s diving scores over the past month. Let’s get that data!

import numpy as np

# Pablo's diving scores
pablo_scores = np.array([8.5, 9.2, 8.8, 9.5, 9.0, 9.3, 9.7, 8.9, 9.4, 9.1])

Next, state the hypotheses:

  • Null Hypothesis (H0): Pablo’s diving scores are not significantly different from the average penguin’s diving scores.
  • Alternative Hypothesis (H1): Pablo is an extraordinary diver, with significantly higher scores than the average penguin.

Since H1 claims “higher”, we run a one-sided, one-sample t-test against the average penguin’s score of 8.7:

from scipy import stats

# Perform a one-sample t-test (one-sided, because H1 says "greater")
t_stat, p_value = stats.ttest_1samp(pablo_scores, popmean=8.7, alternative='greater')

alpha = 0.05

if p_value < alpha:
    print("🐧 Pablo is gearing up for the Penguin Olympics! 🏅")
    print("🥇🥇🥇 The penguin pod is in awe of Pablo's diving prowess! 🥇🥇🥇")
else:
    print("🐧 Pablo's scores are impressive, but not statistically extraordinary. 🎩")
    print("🐧 Pablo's got style, but the pod thinks he's just like the rest! 🐧")

And there you have it! A lighthearted tutorial on hypothesis testing with a penguin twist. Remember, in real-world scenarios, hypothesis testing is used for more serious scientific investigations. But for now, let’s celebrate Pablo’s diving achievements! 🎉🐧🥇

Random number generation

Random number generation is a crucial part of many simulations and statistical analyses. numpy provides a robust set of tools for generating random numbers. Here's a tutorial on some of the basic functions:

You can use the np.random.randint() function to generate random integers within a specified range.

import numpy as np
# Generate a random integer between 1 and 100
random_int = np.random.randint(1, 101)
print(random_int)

For random floating-point numbers, use np.random.rand() to generate a random number between 0 and 1, or np.random.uniform() to generate random numbers within a specified range.

# Generate a random float between 0 and 1
random_float = np.random.rand()
print(random_float)

# Generate a random float between 0 and 10
random_float_10 = np.random.uniform(0, 10)
print(random_float_10)

For random numbers following a normal (Gaussian) distribution, use np.random.normal().

# Generate a random number from a normal distribution with mean 0 and standard deviation 1
random_normal = np.random.normal(0, 1)
print(random_normal)

numpy also supports a wide range of probability distributions like exponential, binomial, Poisson, etc. Here's an example with the exponential distribution:

# Generate a random number from an exponential distribution with scale 0.5
# (note: NumPy's parameter is the scale, i.e. 1/lambda, not the rate itself)
random_exponential = np.random.exponential(scale=0.5)
print(random_exponential)

Setting a Seed (for Reproducibility)

If you want to reproduce your results, you can set a seed using np.random.seed().

np.random.seed(42)  # Any integer can be used as the seed value
random_number = np.random.rand()
print(random_number) # This will always give the same number

You can also generate entire arrays of random numbers using similar functions.

# Generate a 1D array of 5 random integers between 1 and 100
random_int_array = np.random.randint(1, 101, 5)
print(random_int_array)

# Generate a 2D array of shape (2, 3) with random floats between 0 and 1
random_float_array = np.random.rand(2, 3)
print(random_float_array)
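
One note: all of the np.random.* calls above use NumPy’s legacy global random state. Newer NumPy code typically uses the Generator API via np.random.default_rng(), which bundles seeding and the same distributions into one object. A sketch of the equivalent calls:

import numpy as np

rng = np.random.default_rng(42)  # pass a seed here for reproducibility

print(rng.integers(1, 101))     # random integer between 1 and 100
print(rng.random())             # random float between 0 and 1
print(rng.uniform(0, 10))       # random float between 0 and 10
print(rng.normal(0, 1))         # sample from a standard normal distribution
print(rng.exponential(0.5))     # exponential sample with scale 0.5
print(rng.integers(1, 101, 5))  # 1D array of 5 random integers
print(rng.random((2, 3)))       # 2x3 array of random floats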

Fourier analysis

Fourier analysis is a powerful mathematical tool used in various fields, including signal processing, image analysis, and many others. Numpy provides functions to perform Fourier analysis easily.

Alright, let’s explore Fourier analysis in a more playful context: analyzing the frequencies in music!

Import the Required Libraries and Load a Music Sample

Let’s use a sample of Beethoven’s Symphony No. 5.

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

# Load the music sample
sample_rate, music_data = wavfile.read("beethoven_5.wav")

Analyze a Slice of the Music

We’ll focus on a small portion of the music for analysis.

# Take a short slice (about 5 seconds)
start_time = 30  # seconds
end_time = start_time + 5  # seconds

start_index = int(start_time * sample_rate)
end_index = int(end_time * sample_rate)

music_slice = music_data[start_index:end_index]

# If the file is stereo, keep just one channel for the analysis
if music_slice.ndim > 1:
    music_slice = music_slice[:, 0]

Perform Fourier Transform

# Perform the Fourier transform
fourier_transform = np.fft.fft(music_slice)
frequencies = np.fft.fftfreq(len(music_slice), 1/sample_rate)

Plot the Spectrum

# Plot the magnitude spectrum
plt.figure(figsize=(10, 5))
plt.plot(frequencies[:len(frequencies)//2], np.abs(fourier_transform[:len(frequencies)//2]))
plt.title('Frequency Spectrum of Beethoven\'s Symphony No. 5')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.xlim((0, 1000)) # Limit to frequencies below 1000 Hz for better visualization
plt.show()

In the resulting plot, you’ll see peaks at different frequencies corresponding to the musical notes in the sample. Feel free to try this with other music samples by replacing the file path in wavfile.read().

Now you’ve used Fourier analysis to explore the frequency content of a piece of music! This technique is widely used in music analysis, audio processing, and many other fields. It’s fascinating to see how the mathematical transformation can reveal the underlying musical elements.
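
If you don’t have a WAV file handy, you can run the exact same analysis on a synthetic signal with known frequencies and watch the peaks land exactly where you put them. A minimal sketch:

import numpy as np
import matplotlib.pyplot as plt

# Build a 1-second signal: a 440 Hz tone plus a quieter 880 Hz overtone
sample_rate = 8000  # samples per second
t = np.arange(0, 1, 1 / sample_rate)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

fourier_transform = np.fft.fft(signal)
frequencies = np.fft.fftfreq(len(signal), 1 / sample_rate)

half = len(frequencies) // 2
plt.plot(frequencies[:half], np.abs(fourier_transform[:half]))
plt.title('Spectrum of a synthetic two-tone signal')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.xlim((0, 1000))
plt.show()  # expect sharp peaks at 440 Hz and 880 Hz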

In this tutorial, we delved into the advanced realm of NumPy, uncovering its powerful capabilities in linear algebra and statistical analysis. Armed with these techniques, we’re equipped to tackle more sophisticated data science tasks and unlock new insights from our datasets. Remember, practice and experimentation will solidify our grasp of these concepts, making NumPy a formidable tool in our data-driven endeavors. Happy coding!
