For-loops vs. Matrix Multiplication
I recently stumbled upon a popular deep learning course offered by Stanford, CS231n. After watching the first lecture by Fei Fei Li on the historical context of computer vision, I decided to take on the course remotely. There are a few iterations of the course as it has been offered since 2015. I am following the 2016 lecture which is taught by Andrej Karpathy, who is now the senior director of AI at Tesla. Head over to his YouTube channel for the CS231n playlist. As for the assignment, I am doing the latest one, offered in Spring 2020 as of the time of writing.
In the first assignment, we are asked to implement a classifier called the K-nearest neighbor. I am not going to talk about the classifier itself, but rather how the assignment was structured in order for the students to get a feel of how efficient it is to make use of matrix operation instead of the traditional for-loops.
The dataset used is CIFAR-10. There are 5000 data points in the training set and 500 data points in the test set.
The first step in constructing KNN algorithm is to find the distance. In this example, we will use the Euclidean distance, which is the square root of a difference in distance between two data points. Our goal is to find the Euclidean distance for each point in the train set between every single point in the test set. So we will end up with 5000 * 500 = 2,500,000 points. The image below visualize the idea of Euclidean distance for a single pair of images.
Intuitively, I will iterate through each data point in the train set and at each iteration, I will iterate through each data point in the test set in order to get the distance. In other words, create a double for-loop or a nested loop.
The code snippet above demonstrates a double for-loop. It is quite straightforward. Inside the innermost loop, each pixel in the train data is subtracted from the corresponding pixel in the test data before being squared. The square root of the sum of these values would be the distance for this particular test-train pair. The pair is stored in the
dists matrix according to its
j position. Perfect, we’ve now achieved our goal. However, the time complexity for the implementation above is O(n²). This does not look good and will become inefficient for a very large dataset as compute time grows quadratically.
Well, we can speed it up by using only one loop.
test_val is (1, 4072) while
train_val is (5000, 4072). The squared difference between these two variables will result in a (5000, 4072) matrix. The difference is then summed along the row, so it now becomes (5000, 1). After computing square root value for each element, it is then converted to a vector array of size (1, 5000). The vector is stored in the i-th row of
dists. This approach is more sophisticated and arguably faster than the nested for-loop. But, is this the best we can do? The answer is no.
We can actually ditch the for-loop altogether! But how?
Using the Euclidean distance actually allows us to perform matrix multiplication between the test and train data. Refer to the derived formula, where
x is a test data and
y is a train data. In terms of code, not only we got rid of the for-loop, it’s also cleaner and more concise. Firstly, we define
y2 to be equal to the sum of the square along the row for the test data and train data respectively. Then, we compute the dot product between the test data and the transposed train data to get the sum of multiplication between the two sets of data. Lastly, a set(or a matrix rather) of distances is calculated based on the derived formula.
Check out the Medium post below to learn about
What is numpy.newaxis and when to use it.
You might have encountered the np.newaxis expression here and there. Even though it might seem irrelevant at first it…
The code snippet below compares the speed between the implementations.
Note that using a single for-loop is only slightly faster than the nested for-loop, and the winner is clearly the one with no loops!
When dealing with large datasets, avoid doing inefficient computation and find the best way to minimize computation time. For this problem, knowledge in linear algebra, specifically matrix multiplication comes in handy. There are so many free resources out there to learn linear algebra, but I particularly enjoy the linear algebra series by 3blue1brown and I hope you would too!