Vectorization in Deep Learning


Normal instructions vs. vector instructions

In the context of high-level languages like Python, MATLAB, and R, the term vectorization describes the use of optimized, pre-compiled code written in a low-level language (e.g. C) to perform mathematical operations over a sequence of data, in place of an explicit iteration written in the native language (e.g. a “for-loop” written in Python). Vectorization lets us eliminate for-loops from Python code. It is especially important in deep learning, where we deal with very large datasets: it allows the code to run quickly and helps train algorithms faster.

In logistic regression, let’s look at this expression:

z = wᵀx + b

where w and x are both large vectors, w, x ∈ ℝⁿ (here n = 1,000,000).

So to compute z without vectorization, we need to write an explicit for-loop, such as:

c = 0
for i in range(1000000):
    c += w[i] * x[i]

For the vectorized version in NumPy, we simply compute the dot product of the two vectors:

c = np.dot(w, x)

If we use the time module to measure both operations, we can see that the vectorized version is significantly faster than the non-vectorized one. The difference compounds when we are dealing with very large matrices.

import time
import numpy as np

w = np.random.rand(1000000)
x = np.random.rand(1000000)

# Vectorized version: a single call to the optimized dot product
tic = time.time()
c = np.dot(w, x)
toc = time.time()
print(c)
print('Vectorized version: ' + str(1000 * (toc - tic)) + 'ms')

# Non-vectorized version: an explicit Python for-loop
c = 0
tic = time.time()
for i in range(1000000):
    c += w[i] * x[i]
toc = time.time()
print(c)
print('For loop: ' + str(1000 * (toc - tic)) + 'ms')
250069.16704749785
Vectorized version: 2.0003318786621094ms
250069.16704750186
For loop: 236.63949966430664ms

As we can see from the output above, both versions produce the same answer (250069.167, up to floating-point rounding), yet the dot product is almost 118x faster. This is because the vectorized operation can run as SIMD instructions: the hardware exploits parallelism, using the native dot product to get the result faster.

SIMD is short for Single Instruction/Multiple Data: a computing method that processes multiple data elements with a single instruction. In contrast, the conventional sequential approach, which uses one instruction to process each individual data element, is called a scalar operation. Using a simple summation as an example, the sketch below shows how each method handles the same four additions.
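Here is a minimal sketch in NumPy. The actual SIMD dispatch happens inside NumPy’s compiled C loops, so the Python code below only illustrates the idea, and the array values are made up:

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])

# Scalar: one addition per element, four separate operations
c_scalar = np.empty(4)
for i in range(4):
    c_scalar[i] = a[i] + b[i]

# SIMD-style: one vectorized operation over all four elements;
# NumPy dispatches this to packed SIMD instructions where the CPU supports them
c_simd = a + b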

In other words, this also lets us take advantage of GPUs, which are even better suited to SIMD-style parallelism than CPUs, to train our algorithms faster.

NumPy provides highly optimized functions for performing mathematical operations on arrays of numbers. Extensive iteration in Python (e.g. via for-loops) to carry out repeated mathematical computations should nearly always be replaced by vectorized functions on arrays. This informs the entire design paradigm of NumPy.
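For example, applying the logistic (sigmoid) function σ(z) = 1 / (1 + e⁻ᶻ) to every element of a vector can be vectorized with np.exp. A sketch, with illustrative variable names:

import math
import numpy as np

v = np.random.rand(1000000)

# Non-vectorized: apply the sigmoid element by element
s_loop = np.zeros(v.shape)
for i in range(len(v)):
    s_loop[i] = 1 / (1 + math.exp(-v[i]))

# Vectorized: np.exp operates on the whole array at once
s_vec = 1 / (1 + np.exp(-v))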
