Numpy Vectorization

Mike Liao
5 min read · Mar 30, 2018

Last month, I was implementing a K-Nearest Neighbors classifier, and when I tried to test my for-loop implementation on a large test set, it was infeasibly slow! Thus, I resolved to learn vectorization, first by doing 100 Numpy Questions and finding where I could apply it.

Q: Where can I apply Numpy Vectorization?
Dumb Answer:
If it involves Math and the function is called repeatedly, chances are you should use Numpy to vectorize it.

This post will talk about three situations where vectorization might be helpful.

Simple Arithmetic Problems between Two Arrays

Example 1: Return the sum of x[i] * y[j] for all pairs of indices i, j

The standard for-loop approach iterates over every pair of indices. The vectorized approach instead uses Broadcasting, which basically “stretches” or repeats an array so that arithmetic between mismatched shapes is compatible. Here’s a quick example:
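To make that concrete, here is a sketch of both versions (the variable names are my own):

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7])

# for-loop approach: iterate over every (i, j) pair
total_loop = 0
for i in range(len(x)):
    for j in range(len(y)):
        total_loop += x[i] * y[j]

# vectorized approach: broadcast to a (3, 4) matrix of products, then sum
total_vec = np.sum(x[np.newaxis, :] * y[:, np.newaxis])

print(total_loop, total_vec)  # both 180
```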
X = [1, 2, 3, 4] -> shape (4,)
Y = [5, 6, 7] -> shape (3,)
When broadcasting two arrays, the dimensions are compared from right to left. Corresponding dimensions must either be equal or be 1.
Thus, we need to add a new axis to make our shapes compatible:
x[np.newaxis, :] -> shape (1, 4)
y[:, np.newaxis] -> shape (3, 1)

Here’s a quick example of broadcasting
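A minimal sketch of what broadcasting does with those shapes: the (1, 4) and (3, 1) arrays are both “stretched” to (3, 4) before the multiply.

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # shape (4,)
y = np.array([5, 6, 7])      # shape (3,)

# (1, 4) * (3, 1): both operands broadcast to (3, 4)
products = x[np.newaxis, :] * y[:, np.newaxis]
print(products.shape)   # (3, 4)
print(products)
# [[ 5 10 15 20]
#  [ 6 12 18 24]
#  [ 7 14 21 28]]
```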

This naive implementation of Vectorization by broadcasting achieves a 100x speedup. However, we can do even better!

In From Python to Numpy, the author defines vectorization as either Code Vectorization or Problem Vectorization. What we did here was Code Vectorization, which for simple problems can be decently intuitive and follows some structure. In Code Vectorization, the goal is to turn for-loop solutions into NumPy operations. This usually involves:
1) Reshaping the arrays
2) Broadcasting the arrays into a larger matrix to do the calculations

Here are two more examples of Code Vectorization; notice that there is some structure:
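As a sketch of that reshape-then-broadcast structure (a hypothetical example of my own, not from the original examples): computing all pairwise differences between two arrays follows the same two steps.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])   # shape (3,)
b = np.array([2.0, 3.0])        # shape (2,)

# 1) reshape: add axes so the shapes line up as (3, 1) and (1, 2)
# 2) broadcast: the subtraction "stretches" both into a (3, 2) matrix
diffs = a[:, np.newaxis] - b[np.newaxis, :]
print(diffs)
# [[-1. -2.]
#  [ 2.  1.]
#  [ 7.  6.]]
```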

Problem Vectorization, on the other hand, is much harder. It involves fundamentally rethinking the problem! Let’s go back to the SumProducts example.

The two nested for loops for SumProducts basically calculate the double sum Σᵢ Σⱼ x[i] * y[j].

However, if you rethink the problem, you can realize that it’s equivalent to (Σᵢ x[i]) * (Σⱼ y[j]). You can implement this without two nested for loops by simply summing x and y separately, then multiplying the two sums!

Code For Sum Products
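A sketch of both versions side by side (array sizes are arbitrary):

```python
import numpy as np

x = np.random.rand(1000)
y = np.random.rand(1000)

# code-vectorized: build the full (1000, 1000) matrix of products, then sum
code_vec = np.sum(x[np.newaxis, :] * y[:, np.newaxis])

# problem-vectorized: sum each array first, then multiply.
# O(n) work and O(n) memory instead of O(n^2).
prob_vec = np.sum(x) * np.sum(y)

print(np.allclose(code_vec, prob_vec))  # True
```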

Okay? So what do the results look like?

As you can see, Code Vectorization gave an order of magnitude of improvement. However, Problem Vectorization gave another order of magnitude on top of that!

ML Models

In the CS231n course, students implement both the vectorized and non-vectorized versions of functions such as the Softmax loss, the SVM loss, and the KNN distance algorithm.

What I learned: All ML models have costly mathematical operations. Those should definitely be vectorized.

Let’s talk about K-Nearest Neighbors. The bottleneck for K-NN is at test time, where for each test example the algorithm must find the K closest training examples. My original, slow for-loop code was something like this:
1) For each test example y, get distances to every training example x
2) Sort the distances
3) Use another for loop to return the k nearest neighbors

import operator

def get_neighbors(self, y):
    distances = []
    for i in range(len(self.train)):
        x_i = self.train[i]
        dist = self.get_distance(x_i, y)
        distances.append((x_i, dist))
    distances.sort(key=operator.itemgetter(1))

    neighbors = []
    for x in range(self.k):
        neighbors.append(distances[x])
    return neighbors

Yikes. I ran this on an 80,000-example test set and it wouldn’t move.

I refactored the code to use Numpy vectorization (check out CS231n):
1) For all train set X, and test set Y, create a large distance matrix
2) Sort the distance matrix and pull out top k values for neighbors

def get_neighbors(self, Y):
    # Intuitively: dist(x, y) = sqrt((x - y)^2) = sqrt(x^2 + y^2 - 2xy)
    # square the train and test values: X^2, Y^2
    train_2 = np.sum(self.Xtrain ** 2, axis=1)
    test_2 = np.sum(Y ** 2, axis=1)
    # make the shapes compatible: both become (num_test, num_train)
    train_2_repeat = np.array([train_2] * Y.shape[0])
    test_2_repeat = np.array([test_2] * self.Xtrain.shape[0]).transpose()
    # XY
    Y_dot_Xtrain = Y.dot(self.Xtrain.T)

    # (x^2 + y^2 - 2xy)
    dists = train_2_repeat + test_2_repeat - 2 * Y_dot_Xtrain

    # for each test example, the indices of the k nearest training examples
    return np.argsort(dists, axis=1)[:, :self.k]

This turns out to be much faster than my original implementation. I tested a 5,000-instance validation set with FastKNN (vectorized), my original implementation, and the sklearn implementation. It turns out sklearn > FastKNN > original implementation, each by an order of magnitude. Code here: https://gist.github.com/mikeliao97/1a0b2bccb1b05ddff6a40f8ae3bd93bb

Advanced Examples from From Python to Numpy:

In From Python to Numpy, the later chapters show many great examples of where you can vectorize your solutions.

What I learned: There are complex situations where you wouldn’t think vectorization is possible…but it is.

Here is one such example. I won’t get into the implementation (just read the book); I’ll just explain why it’s nonintuitive.

Uniform Vectorization: The Game of Life

Basically, the Game of Life operates under a few rules:
1) Any live cell with fewer than two live neighbors dies, by underpopulation
2) Any live cell with more than three live neighbors dies, by overcrowding
3) Any live cell with two or three live neighbors lives on, unchanged, to the next generation
4) Any dead cell with exactly three live neighbors becomes a live cell

Why it was unintuitive to me: So how do you vectorize a program with so many IF statements?

The trick is to vectorize it in two parts:
1) Create a vector responsible for counting the neighbors at each position
2) Create a second vector “mask” that’s responsible for enforcing the if conditions
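A minimal sketch of those two parts, closely following the approach in From Python to Numpy (the grid Z is a 2-D array of 0s and 1s; border cells are treated as always dead):

```python
import numpy as np

def life_step(Z):
    # 1) count the live neighbors at every interior position in one shot,
    #    by summing eight shifted views of the grid
    N = (Z[0:-2, 0:-2] + Z[0:-2, 1:-1] + Z[0:-2, 2:] +
         Z[1:-1, 0:-2] +                 Z[1:-1, 2:] +
         Z[2:,   0:-2] + Z[2:,   1:-1] + Z[2:,   2:])
    # 2) boolean "masks" enforce all the if-conditions at once
    birth = (N == 3) & (Z[1:-1, 1:-1] == 0)
    survive = ((N == 2) | (N == 3)) & (Z[1:-1, 1:-1] == 1)
    Z[...] = 0
    Z[1:-1, 1:-1][birth | survive] = 1
    return Z

# a "blinker": three live cells in a row flip between horizontal and vertical
Z = np.zeros((5, 5), dtype=int)
Z[2, 1:4] = 1
life_step(Z)
print(Z[1:4, 1:4])  # vertical column of three live cells
```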

Quick Aside: In most cases Numpy vectorization makes your code faster; however, it often comes at the cost of interpretability. It can also use up a large amount of memory. Thus, a general approach, according to this Stack Overflow post, is to keep the original version of the function and remove loops one by one only if they’re a bottleneck.
