Vectorization in Python: An Alternative to Python Loops

shivam bhatele
The Pythoneers

Applications often need to handle large amounts of data. But using non-optimized functions can slow down the entire algorithm and result in a model that takes a long time to run.

To help you ensure that your code is computationally efficient, I’ll teach you how to employ vectorization as a technique.

The speed of an algorithm is critical to its usefulness, particularly in real-time applications. To make the code run as quickly as possible, it’s important to use standard mathematical functions that operate on large arrays of data at once, without the need for explicit loops.

A library that contains these functions is NumPy. Vectorization is a technique that utilizes these standard functions to improve the performance of an algorithm.

What is NumPy?

NumPy is an essential package for high-performance scientific computing and data analysis in the Python ecosystem. It is the foundation of many higher-level tools such as Pandas and scikit-learn.

TensorFlow tensors also interoperate closely with NumPy arrays when building models for deep learning tasks. All of these tools rely heavily on linear algebra operations on large vectors and matrices of numbers.

NumPy is faster than plain Python for this kind of work because its operations are vectorized and many of its core functions are written in C.

NumPy arrays are homogeneous arrays that are tightly packed in memory, while Python lists are arrays of pointers to objects, even when those objects are all of the same type.

This means that NumPy arrays benefit from the locality of reference. Many NumPy operations are implemented in C, which eliminates the overhead of loops, pointer indirection, and per-element dynamic type checking in Python. The performance improvement will vary depending on the specific operations being performed.
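
The difference in memory layout can be inspected directly. The following is a minimal sketch, assuming a fixed `int64` dtype chosen for illustration; the variable names are hypothetical and the exact object sizes depend on the Python build.

```python
import sys

import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)  # dtype fixed here for illustration

# A NumPy array stores raw 8-byte values back to back in one buffer.
print(numbers_array.dtype, numbers_array.itemsize, numbers_array.nbytes)

# A list stores pointers; every int is a separate Python object with its own header.
list_container_bytes = sys.getsizeof(numbers_list)
int_object_bytes = sys.getsizeof(numbers_list[-1])
print(list_container_bytes, int_object_bytes)
```

Each element of the array costs exactly `itemsize` bytes, while each list element costs a pointer plus a full Python integer object stored elsewhere in memory.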

Why are loops slow?

When searching for performance bottlenecks in code, loops are a common suspect, especially in Python. Loops in Python are slower than in languages such as C/C++, and one of the reasons is Python’s dynamically typed nature.

Python first compiles the source code into bytecode, and the interpreter then executes that bytecode one instruction at a time. When the code includes a section with a loop over a list, Python faces a problem because it is dynamically typed, which means it does not know the type of the objects in the list (whether they are integers, strings, or floats) until it goes through the list.

This information is stored within each object, and Python cannot determine it in advance. As a result, during each iteration, Python has to perform a series of checks, such as determining the type of variable, resolving its scope, and checking for invalid operations. This added overhead can slow down the performance of the loop.

In contrast, in C, arrays can only consist of one data type, which the compiler knows in advance. This enables many optimizations that are not possible in Python. This is why loops in Python are often slower than in C, and nested loops can significantly slow down the performance.
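
To make the per-element checks concrete, here is a small sketch with a hypothetical mixed-type list. The same expression means something different for each element, so the interpreter has to inspect the type on every iteration.

```python
mixed = [1, 2.5, "ab", [3]]

# `x + x` is integer addition, float addition, string concatenation,
# or list concatenation, depending on the runtime type of x.
doubled = [x + x for x in mixed]
type_names = [type(x).__name__ for x in mixed]

print(type_names)
print(doubled)
```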

Why Use Vectorization?

Speed is important because small differences in runtime can accumulate and become significant when repeated over many function calls. For example, an incremental 30 microseconds of overhead, when repeated over 1 million function calls, can result in 30 seconds of additional runtime. That’s why micro performance is worth monitoring.

In terms of computation, there are three key concepts that give NumPy its power: Vectorization, Broadcasting, and Indexing.
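
As a quick illustration of these three concepts, consider the sketch below. The arrays and variable names are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
grid = np.array([[1, 2, 3], [4, 5, 6]])

# Vectorization: element-wise addition with no Python loop.
vectorized_sum = a + b

# Broadcasting: the 1-D array b is applied to every row of the 2-D grid.
broadcast_sum = grid + b

# Indexing: a boolean mask selects only the matching elements.
large_values = b[b > 15]

print(vectorized_sum, broadcast_sum, large_values, sep="\n")
```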

Using loops is a common and natural tendency when performing repetitive operations in programming. However, when working with a large number of iterations, such as millions or billions of rows, using loops can lead to poor performance and long run times.

In these cases, implementing vectorization in Python can greatly improve efficiency and avoid the frustration of waiting for slow processes to complete.

What is Vectorization?

NumPy arrays can only have a single data type, and the data is stored in a contiguous block of memory. By utilizing this, NumPy performs operations on these arrays by delegating them to optimized pre-compiled C code, which improves performance.

Many of the functions in NumPy are just interfaces to underlying C code, where the majority of the computation takes place. This allows NumPy to move the execution of loops to C, which is more efficient than Python when it comes to looping.

However, this can only be done if the array enforces that all elements are of the same type. Otherwise, it would not be possible to convert the Python data types to native C types for execution.
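
The single-type rule is visible when an array is built from mixed inputs. This is a minimal sketch with hypothetical variable names. The `dtype=object` case is shown only as a contrast: such an array holds pointers to ordinary Python objects rather than native C values.

```python
import numpy as np

ints = np.array([1, 2, 3])               # all integers: an integer dtype
mixed_numbers = np.array([1, 2, 3.5])    # one float: everything is upcast to float64
objects = np.array([1, "a", 2.5], dtype=object)  # explicit fallback to Python objects

print(ints.dtype, mixed_numbers.dtype, objects.dtype)
print(mixed_numbers)
```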

Vectorization is a technique used to improve the performance of Python code by replacing explicit Python loops with array operations. It can significantly reduce the execution time of code.

There are various operations that can be performed on vectors. The dot product of two vectors (also known as the scalar product) results in a single output. The outer product of two vectors of the same length produces a square matrix whose dimension equals that length. Element-wise multiplication multiplies the elements at the same index and preserves the dimensions of the inputs.
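
A small sketch of these three operations on two hypothetical three-element vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot_product = np.dot(a, b)        # single scalar: 1*4 + 2*5 + 3*6
outer_product = np.outer(a, b)    # 3 x 3 matrix with entries a[i] * b[j]
elementwise = np.multiply(a, b)   # same shape as the inputs: a[i] * b[i]

print(dot_product)
print(outer_product)
print(elementwise)
```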

Vectorization is the use of array operations from NumPy to perform computations on a dataset.

When iterating over an array or any data structure in Python, there is a significant amount of overhead involved. By using vectorized operations in NumPy, the looping is delegated to highly optimized C and Fortran functions, resulting in faster and more efficient Python code.

The examples below show how the classical loop-based methods are much more time-consuming than the vectorized versions. They use the following functions:

  • outer(a, b): calculates the outer product of two vectors.
  • multiply(a, b): calculates the element-wise product of two arrays.
  • dot(a, b): calculates the dot product of two arrays.
  • zeros((n, m)): creates an array of the specified shape, filled with zeros.
  • process_time(): returns the sum of the system and user CPU time of the current process in fractional seconds, not including time spent in sleep.
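
The point about sleep can be checked with a short sketch. Note that `time.perf_counter()` is not used elsewhere in this article; it is brought in here only as a wall-clock reference for comparison.

```python
import time

cpu_start = time.process_time()
wall_start = time.perf_counter()

time.sleep(0.2)  # the process is waiting here, not computing

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

# CPU time barely moves during the sleep, while wall-clock time advances.
print("CPU seconds: ", cpu_elapsed)
print("Wall seconds:", wall_elapsed)
```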

Examples

Example 1: Dot Product

The dot product is an algebraic operation in which two vectors of equal length are multiplied together, resulting in a single scalar value. It is also known as the inner product.

When two vectors, a and b, of the same length are written as column vectors, the dot product is calculated by taking the transpose of the first vector, a’, and then performing matrix multiplication with the second vector, b. This is illustrated in the diagram below.

Dot Product
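
Before the timing comparison, the transpose-and-multiply description can be sketched with two small column vectors. The values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

a = np.array([[1], [2], [3]])   # 3 x 1 column vector
b = np.array([[4], [5], [6]])   # 3 x 1 column vector

# (1 x 3) @ (3 x 1) gives a 1 x 1 matrix holding the dot product.
product = a.T @ b
scalar = product.item()

# The same value from numpy.dot on the flattened 1-D vectors.
flat_dot = np.dot(a.ravel(), b.ravel())

print(product.shape, scalar, flat_dot)
```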

Python Code Implementation

import array
import time

import numpy

# a and b are array.array objects of 8-byte signed integers (typecode 'q')
a = array.array('q')
for i in range(50000):
    a.append(i)
b = array.array('q')
for i in range(50000, 100000):
    b.append(i)

# classic dot product of vectors: explicit Python loop
start_time = time.process_time()
classic_dot_product = 0.0
for i in range(len(a)):
    classic_dot_product += a[i] * b[i]
end_time = time.process_time()
print("classic_dot_product = " + str(classic_dot_product))
print("Computation time using loops = " + str(1000 * (end_time - start_time)) + "ms")

# vectorized dot product: the loop runs inside NumPy
vectorised_start_time = time.process_time()
vectorised_dot_product = numpy.dot(a, b)
vectorised_end_time = time.process_time()
print("\nvectorised_dot_product = " + str(vectorised_dot_product))
print("Computation time using vectorization = " + str(1000 * (vectorised_end_time - vectorised_start_time)) + "ms")

Output:

classic_dot_product = 104164166675000.0
Computation time using loops = 12.917649000000003ms
vectorised_dot_product = 104164166675000
Computation time using vectorization = 0.12057600000001112ms
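
A single measurement like the one above can be noisy. As an alternative, the sketch below repeats each measurement with the standard library's `timeit` module and keeps the best run. This is not the method used in the rest of this article, and the helper `loop_dot` and the variable names are hypothetical.

```python
import timeit

import numpy as np

a_list = list(range(50_000))
b_list = list(range(50_000, 100_000))
a_arr = np.array(a_list, dtype=np.int64)
b_arr = np.array(b_list, dtype=np.int64)


def loop_dot(x, y):
    """Dot product with an explicit Python loop."""
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


# Run each version three times and keep the fastest run.
loop_seconds = min(timeit.repeat(lambda: loop_dot(a_list, b_list), number=1, repeat=3))
vector_seconds = min(timeit.repeat(lambda: np.dot(a_arr, b_arr), number=1, repeat=3))

print("loop:       ", 1000 * loop_seconds, "ms")
print("vectorized: ", 1000 * vector_seconds, "ms")
```

The absolute numbers depend on the machine, so only the ratio between the two is worth comparing.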

Example 2: Outer Product

The outer product of two coordinate vectors is the tensor product of those vectors. When considering two vectors, a and b, with dimensions n x 1 and m x 1, respectively, the outer product will result in a rectangular matrix of size n x m.

If the two vectors have the same dimensions, the resulting matrix will be square as shown in the diagram.
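
The shape rule can be checked on two small vectors of different lengths. This is a sketch with hypothetical values. The last line shows that the same result can be obtained with broadcasting, which is one way to think about what the outer product computes.

```python
import numpy as np

a = np.array([1, 2])       # n = 2
b = np.array([3, 4, 5])    # m = 3

rectangular = np.outer(a, b)   # n x m matrix, here 2 x 3
square = np.outer(a, a)        # equal lengths give a square matrix, here 2 x 2

# Equivalent formulation: a column times a row, expanded by broadcasting.
via_broadcasting = a[:, None] * b[None, :]

print(rectangular)
print(square)
```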

Python Code Implementation

import array
import time

import numpy

# a and b are array.array objects of signed ints (typecode 'i': at least 2 bytes, typically 4)
a = array.array('i')
for i in range(300):
    a.append(i)
b = array.array('i')
for i in range(300, 600):
    b.append(i)

# classic outer product of vectors: nested Python loops
start_time = time.process_time()
classic_outer_product = numpy.zeros((300, 300))
for i in range(len(a)):
    for j in range(len(b)):
        classic_outer_product[i][j] = a[i] * b[j]
end_time = time.process_time()
print("outer_product = " + str(classic_outer_product))
print("Computation time using loops = " + str(1000 * (end_time - start_time)) + "ms")

# vectorized outer product: the loops run inside NumPy
vectorised_start_time = time.process_time()
vectorised_outer_product = numpy.outer(a, b)
vectorised_end_time = time.process_time()
print("vectorised_outer_product = " + str(vectorised_outer_product))
print("Computation time using vectorization = " + str(1000 * (vectorised_end_time - vectorised_start_time)) + "ms")

Output:

outer_product = [[ 0. 0. 0. … 0. 0. 0.]
[ 300. 301. 302. … 597. 598. 599.]
[ 600. 602. 604. … 1194. 1196. 1198.]

[ 89100. 89397. 89694. … 177309. 177606. 177903.]
[ 89400. 89698. 89996. … 177906. 178204. 178502.]
[ 89700. 89999. 90298. … 178503. 178802. 179101.]]
Computation time using loops = 40.29279ms
vectorised_outer_product = [[ 0 0 0 … 0 0 0]
[ 300 301 302 … 597 598 599]
[ 600 602 604 … 1194 1196 1198]

[ 89100 89397 89694 … 177309 177606 177903]
[ 89400 89698 89996 … 177906 178204 178502]
[ 89700 89999 90298 … 178503 178802 179101]]
Computation time using vectorization = 0.4319530000000127ms

Example 3: Element-wise Product

Element-wise multiplication of two matrices is the algebraic operation in which each element of the first matrix is multiplied by its corresponding element in the second matrix. The matrices’ dimensions must match.

Consider the following two matrices: a and b. For each index (i, j), multiply a(i, j) by b(i, j), as shown below.
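
A small sketch on two hypothetical 2 x 2 matrices makes the index rule concrete. The matrix product (the `@` operator) is included only as a contrast, because the two operations are easy to confuse.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = np.multiply(a, b)   # entry (i, j) is a[i, j] * b[i, j]
matrix_product = a @ b            # rows of a combined with columns of b

print(elementwise)
print(matrix_product)
```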

Python Code Implementation

import array
import time

import numpy

# a and b are array.array objects of signed ints (typecode 'i': at least 2 bytes, typically 4)
a = array.array('i')
for i in range(20000):
    a.append(i)
b = array.array('i')
for i in range(20000, 40000):
    b.append(i)

# classic element-wise product of vectors: explicit Python loop
vector = numpy.zeros(20000)
start_time = time.process_time()
for i in range(len(a)):
    vector[i] = a[i] * b[i]
end_time = time.process_time()
print("Element wise Product = " + str(vector))
print("Computation time using loops = " + str(1000 * (end_time - start_time)) + "ms")

# vectorized element-wise product (start is taken before the call, end after it)
vectorised_start_time = time.process_time()
vector = numpy.multiply(a, b)
vectorised_end_time = time.process_time()
print("Element wise Product = " + str(vector))
print("Computation time using vectorization = " + str(1000 * (vectorised_end_time - vectorised_start_time)) + "ms")

Output:

Element wise Product = [0.00000000e+00 2.00010000e+04 4.00040000e+04 … 7.99820009e+08
7.99880004e+08 7.99940001e+08]
Computation time using loops = 6.08118199999999ms
Element wise Product = [ 0 20001 40004 … 799820009 799880004 799940001]
Computation time using vectorization = 0.10032100000001432ms

Example 4: Sum of numbers

Here we compute the sum of the numbers from 0 to 29,999, first with a loop and then with vectorization, and compare the time taken by both approaches.

Python Code Implementation

import time

import numpy as np

# sum using a loop
start = time.time()
total = 0
for item in range(0, 30000):
    total = total + item
print('Sum:' + str(total))
end = time.time()
print('Computation time using loop is:' + str(end - start))

# sum using vectorization
vectorised_start_time = time.time()
print('Sum:' + str(np.sum(np.arange(30000))))
vectorised_end_time = time.time()
print('Computation time using vectorization is:' + str(vectorised_end_time - vectorised_start_time))

Output:

Sum:449985000
Computation time using loop is:0.0031075477600097656
Sum:449985000
Computation time using vectorization is:0.000225067138671875
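
The printed sum can be cross-checked against the closed-form formula n(n - 1)/2 for the sum of the integers from 0 to n - 1. This is a small sketch; the explicit `int64` dtype is an assumption added here so the result does not depend on the platform's default integer size.

```python
import numpy as np

n = 30000

# Loop version.
loop_total = 0
for item in range(n):
    loop_total += item

# Vectorized version and closed-form check.
vector_total = int(np.arange(n, dtype=np.int64).sum())
formula_total = n * (n - 1) // 2

print(loop_total, vector_total, formula_total)
```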

Conclusion

  • Vectorized operations in NumPy enable the use of efficient, pre-compiled functions and mathematical operations on NumPy arrays and data sequences.
  • Vectorization is a method of performing array operations without the use of explicit Python for loops.
  • Vectorized operations using NumPy are typically much quicker and more efficient than for-loops over the same data, as the timings above show.
  • Vectorization is the conversion of a scalar operation on individual data elements to an operation in which a single instruction simultaneously acts on multiple data elements.
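
As a closing sketch of turning a scalar operation into an array operation, the hypothetical function `clip_negative` below handles one value at a time, while `np.where` applies the same rule to the whole array in one call.

```python
import numpy as np


def clip_negative(x):
    """Scalar rule: keep positive values, replace the rest with 0.0."""
    return x if x > 0 else 0.0


values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Scalar operation applied element by element in a Python loop.
looped = np.array([clip_negative(x) for x in values])

# The same rule expressed as a single array operation.
vectorized = np.where(values > 0, values, 0.0)

print(looped)
print(vectorized)
```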

shivam bhatele
The Pythoneers

I am a software developer who loves to share programming knowledge and interact with new people. I am also a big lover of dogs, reading, and dancing.