Deep Learning (Part 14)-Vectorization

Coursesteach
9 min read · Nov 30, 2023

Sections

What is Vectorization
Python Implementation
CPU and GPU
Important points
Benefits of Vectorization
Common Vectorization Techniques
Examples of Vectorization

Section 1- What is Vectorization (Dr Andrew)

Vectorization is basically the art of getting rid of explicit for loops in your code. In the deep learning era, especially in practice, you often find yourself training on relatively large data sets, because that’s when deep learning algorithms tend to shine. So it’s important that your code runs very quickly; otherwise, when training on a big data set, your code might take a long time to run and you find yourself waiting a very long time to get the result. So in the deep learning era, I think the ability to perform vectorization has become a key skill.

Vectorization is a fundamental technique in deep learning that can significantly improve the performance and efficiency of your code. It involves replacing explicit loops with vectorized operations using libraries like NumPy and TensorFlow.

Figure 1:

Let’s start with an example. So, what is vectorization? In logistic regression you need to compute z = w^T x + b (Figure 1), where w is a column vector and x is also a column vector. They may be very large vectors if you have a lot of features. Both w and x are n_x-dimensional real vectors. To compute w^T x with a non-vectorized implementation, you would do something like this: set z = 0, then for i in range(n_x), z += w[i] * x[i], and finally z += b at the end. That’s a non-vectorized implementation, and you will find that it is really slow. In contrast, a vectorized implementation computes w^T x directly.
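As a minimal sketch of the two implementations just described (the function names `z_loop` and `z_vectorized` and the sample values are illustrative assumptions, not taken from the lecture):

```python
import numpy as np

def z_loop(w, x, b):
    """Non-vectorized: accumulate w[i] * x[i] one element at a time."""
    z = 0.0
    for i in range(len(w)):
        z += w[i] * x[i]
    return z + b

def z_vectorized(w, x, b):
    """Vectorized: a single np.dot call computes w^T x."""
    return np.dot(w, x) + b

# Illustrative inputs with n_x = 4 features (hypothetical values).
w = np.array([0.5, -1.0, 2.0, 0.0])
x = np.array([1.0, 2.0, 3.0, 4.0])
b = 0.25

print(z_loop(w, x, b), z_vectorized(w, x, b))
```

Both functions return the same number; the difference only shows up in running time once the vectors get large.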

Section 2- Python Implementation

In Python with NumPy, the command you use for that is z = np.dot(w, x), which computes w^T x, and you can just add b to that directly. You will find that this is much faster. Let’s illustrate this with a little demo. Here’s my Jupyter notebook, in which I’m going to write some Python code. First, let me import the NumPy library: import numpy as np. Then, for example, I can create an array a and print it. Having written this chunk of code, if I hit Shift+Enter, it executes the code: it creates the array a and prints it out. Now, let’s do the vectorization demo. I’m going to import the time library, since we use it to time how long different operations take. Next, create an array: a = np.random.rand(1000000). This creates a million-dimensional array with random values. b = np.random.rand(1000000) creates another million-dimensional array. Now, tic = time.time() records the current time, then c = np.dot(a, b), then toc = time.time(). Then print a label saying this is the vectorized version, followed by the elapsed time, 1000 * (toc - tic), so that it is expressed in milliseconds (ms). I’m going to hit Shift+Enter. That code took somewhere between 1.5 and 3.5 milliseconds. It varies a little from run to run, but on average it takes about 1.5 to 2 milliseconds. All right, let’s keep adding to this block of code and implement the non-vectorized version.

import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i, j] * x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
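For comparison, each of the four loop-based computations above has a one-line NumPy counterpart. The following is a sketch under the assumption that x1 and x2 are first converted to NumPy arrays; it uses the standard np.dot, np.outer and np.multiply functions, and the measured time will depend on your machine:

```python
import time
import numpy as np

x1 = np.array([9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0])
x2 = np.array([9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0])
W = np.random.rand(3, len(x1))  # random 3 x len(x1) array, as above

tic = time.process_time()
dot = np.dot(x1, x2)        # dot product of the two vectors
outer = np.outer(x1, x2)    # outer product, shape (len(x1), len(x2))
mul = np.multiply(x1, x2)   # element-wise product
gdot = np.dot(W, x1)        # matrix-vector product, shape (3,)
toc = time.process_time()

print("dot = " + str(dot))
print("outer shape = " + str(outer.shape))
print("elementwise multiplication = " + str(mul))
print("gdot = " + str(gdot))
print(" ----- Computation time = " + str(1000 * (toc - tic)) + "ms")
```

With vectors this short the timing difference is hard to see; the gap becomes clear only with much larger arrays.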

Let’s see: c = 0, then tic = time.time(). Now let’s implement a for loop: for i in range(1000000) (let me get the number of zeros right), c += a[i] * b[i], and then toc = time.time(). Finally, print "For loop:" followed by the time it takes, 1000 * (toc - tic), plus "ms" to show that we’re measuring in milliseconds. Let’s do one more thing: print the value of c we computed, to make sure it’s the same value in both cases. I’m going to hit Shift+Enter to run this and check. In both cases, the vectorized version and the non-vectorized version computed the same value (about 250,000). The vectorized version took 1.5 milliseconds. The explicit for loop, the non-vectorized version, took about 400, almost 500 milliseconds, so it took something like 300 times longer than the vectorized version. With this example you see that if only you remember to vectorize your code, it runs over 300 times faster. Let’s run it again. Yes: the vectorized version takes 1.5 milliseconds and the for loop 481 milliseconds, again about 300 times slower for the explicit for loop. A 300x slowdown is the difference between your code taking maybe one minute to run versus taking, say, five hours. When you are implementing deep learning algorithms, you get results back much faster if you vectorize your code.
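Putting the walkthrough together, the demo could look roughly like the sketch below. The names a, b, tic and toc follow the narration; c_vec, c_loop, vec_ms and loop_ms are names chosen here for illustration. The timings quoted above come from the lecture and will differ on your machine.

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)  # a million random values in [0, 1)
b = np.random.rand(n)

# Vectorized version
tic = time.time()
c_vec = np.dot(a, b)
toc = time.time()
vec_ms = 1000 * (toc - tic)
print("Vectorized version: " + str(vec_ms) + " ms")

# Non-vectorized version with an explicit for loop
c_loop = 0.0
tic = time.time()
for i in range(n):
    c_loop += a[i] * b[i]
toc = time.time()
loop_ms = 1000 * (toc - tic)
print("For loop: " + str(loop_ms) + " ms")

# Both versions should agree up to floating-point rounding.
print(c_vec, c_loop)
```

The two results can differ in the last few digits because the additions happen in a different order, which is why a tolerance-based comparison such as np.isclose is the safer check.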

Section 3- CPU and Graphics Processing Units (GPUs)

GPUs provide impressive computational power for complex tasks involving AI, machine learning, and 3D rendering. GPUs (Graphics Processing Units) are computer chips designed to rapidly render graphics and images by completing mathematical calculations very quickly. GPUs were initially developed to reduce the workload placed on the CPU by modern, graphics-intensive applications, rendering 2D and 3D graphics using parallel processing, a method in which multiple processors handle different parts of a single task [2].

Some of you might have heard that a lot of scalable deep learning implementations are done on a GPU, or graphics processing unit. But all the demos I did just now in the Jupyter notebook were actually on the CPU. It turns out that both GPUs and CPUs have parallelization instructions, sometimes called SIMD instructions, which stands for single instruction, multiple data. What this basically means is that if you use built-in functions such as the np. functions (for example np.dot), or other functions that don’t require you to explicitly implement a for loop, Python’s NumPy can take much better advantage of parallelism to do your computations much faster. This is true for computations on both CPUs and GPUs. It’s just that GPUs are remarkably good at these SIMD calculations, but CPUs are actually not too bad at them either, maybe just not as good as GPUs. You’ve seen how vectorization can significantly speed up your code. The rule of thumb to remember is: whenever possible, avoid using explicit for loops. Let’s go on to the next tutorial to see some more examples of vectorization and also start to vectorize logistic regression.
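To illustrate the rule of thumb with a built-in function other than np.dot, here is a small sketch. The choice of np.exp and of the sigmoid function is an assumption made for illustration (sigmoid is the activation used in logistic regression); the function names are hypothetical.

```python
import math
import numpy as np

def sigmoid_loop(z):
    """Explicit for loop: apply math.exp to one element at a time."""
    out = np.zeros(len(z))
    for i in range(len(z)):
        out[i] = 1.0 / (1.0 + math.exp(-z[i]))
    return out

def sigmoid_vectorized(z):
    """Built-in np.exp operates on the whole array, with no Python loop."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid_loop(z))
print(sigmoid_vectorized(z))
```

The vectorized version is shorter and leaves the element-by-element work to NumPy’s compiled code.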

Training machine learning models locally is limited by the computational power of your computer.
Here are a few resources (websites) for training ML models that give you more computational resources, including GPUs, for free:
1. Kaggle Notebooks offers up to 30 hours of free GPU time per week.
https://www.kaggle.com/code
2. Google Colab offers free GPU and TPU resources.
https://colab.research.google.com/
https://drlee.io/utilizing-gpu-and-tpu-for-free-on-google...
3. Amazon SageMaker Studio Lab offers free CPU and GPU. No credit card or AWS account required.
Offers 4 hours of GPU time per 24 hours.
studiolab.sagemaker.aws
4. Gradient/Paperspace offers GPU and IPU instances with a free tier to get started.
https://www.paperspace.com/artificial-intelligence
5. Microsoft Azure for Students account
Reference: https://www.analyticsvidhya.com/.../get-free-gpu-online.../

Section 4- Important points

  • Vectorization is basically the art of getting rid of explicit for loops in your code.
  • A vectorized implementation is faster than a non-vectorized implementation.
  • If you use built-in functions such as the np. functions, or other functions that don’t require you to explicitly implement a for loop, Python’s NumPy can take much better advantage of parallelism to do your computations much faster.

Section 5-Benefits of Vectorization:

  • Improved performance: Vectorization leverages the power of highly optimized C and Fortran libraries, leading to significantly faster execution compared to loop-based implementations.
  • Reduced code complexity: Eliminating loops simplifies the code and makes it easier to read and maintain.
  • Better memory management: NumPy arrays store their values in contiguous, fixed-type buffers, so vectorized operations often require less memory than loop-based approaches built on Python lists.

Section 6- Common Vectorization Techniques:

  • Using NumPy arrays: NumPy provides efficient functions for performing vectorized operations on arrays, such as element-wise addition, multiplication, dot products, and matrix multiplication.
  • Broadcasting: NumPy allows broadcasting to automatically expand arrays to compatible sizes for vectorized operations.
  • TensorFlow operations: TensorFlow provides high-level operations for various deep learning tasks, like convolutions and matrix multiplications, which are already optimized for vectorized execution.
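As a small illustration of the broadcasting technique listed above, the sketch below adds a vector to every row of a matrix, first with loops and then with broadcasting. The array names and values are hypothetical.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])  # shape (3,)

# Loop-based implementation: add b[j] to every entry in column j
out_loop = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        out_loop[i, j] = X[i, j] + b[j]

# Broadcasting: b is treated as if it were repeated for each row of X
out_bcast = X + b

print(out_bcast)
```

NumPy does not actually copy b for each row; it only behaves as if it had, which keeps the operation fast and memory-light.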

Section 7- Examples of Vectorization:

  1. Element-wise addition:
import numpy as np

# Example inputs (illustrative values)
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Loop-based implementation
sum_list = []
for i in range(len(x)):
    sum_list.append(x[i] + y[i])

# Vectorized implementation
sum_array = x + y
  2. Dot product:
# x and y are 1-D NumPy arrays of the same length
# Loop-based implementation
dot_product = 0
for i in range(len(x)):
    dot_product += x[i] * y[i]

# Vectorized implementation
dot_product = np.dot(x, y)
  3. Matrix multiplication:
# Here x is a 2-D array of shape (m, p) and y is a 2-D array of shape (p, n)
m, p = x.shape
n = y.shape[1]

# Loop-based implementation
result = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        for k in range(p):
            result[i, j] += x[i, k] * y[k, j]

# Vectorized implementation
result = np.matmul(x, y)

References

1- Neural Networks and Deep Learning

2- Building a GPU Machine vs. Using the GPU Cloud
