Deep Learning

NumPy on GPU/TPU

Make your NumPy code run 50x faster.

Sambasivarao. K · Published in Analytics Vidhya · 4 min read · Apr 20, 2021


Image by i2tutorials

NumPy is by far the most widely used library for performing mathematical operations on arrays, and it forms the base of many machine learning and data science libraries. It offers a large number of high-level mathematical functions that operate on arrays. NumPy gained popularity largely because of its speed: NumPy array operations run roughly 50x faster than equivalent code on Python lists. NumPy arrays also support vectorization, which removes explicit Python loops.

Can we run NumPy operations even faster? The answer is yes!

TensorFlow has implemented a subset of the NumPy API and released it in version 2.4 as tf.experimental.numpy. This allows NumPy code to run much faster, and performance can be improved further by running it on GPU/TPU.
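Here is a minimal sketch of what this looks like in practice (assuming TensorFlow ≥ 2.4; the shapes and values are arbitrary):

```python
import tensorflow.experimental.numpy as tnp

# Familiar NumPy-style API, executed as TensorFlow ops under the hood.
x = tnp.ones((3, 3), dtype=tnp.float32)
y = tnp.arange(9, dtype=tnp.float32).reshape(3, 3)

print(tnp.matmul(x, y))  # returns an ND array
```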

Benchmark

Before going into a more detailed view, let’s compare the performance of NumPy and tensorflow-numpy. For workloads composed of small operations (less than about 10 microseconds), the overhead of dispatching operations in TensorFlow can dominate the runtime, and NumPy may provide better performance. For other cases, TensorFlow should generally provide better performance.

The TensorFlow team created a sigmoid benchmark for this comparison: they implemented the sigmoid operation using both NumPy and tensorflow-numpy and ran it multiple times on CPU and GPU. The results of this experiment are shown below:

Sigmoid benchmark results (Image by TensorFlow)

As you can see, NumPy performs better for small operations, and as the input size increases, tf-numpy pulls ahead. The GPU performance is far better than its CPU counterpart.
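To get a feel for this on your own machine, here is a minimal, hedged sketch of such a comparison (the actual TensorFlow benchmark script differs; on GPU you would also need to force completion of asynchronously dispatched ops, e.g. via np.asarray(result), before stopping the timer):

```python
import time
import numpy as np
import tensorflow.experimental.numpy as tnp

def np_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tnp_sigmoid(x):
    return 1.0 / (1.0 + tnp.exp(-x))

def bench(fn, x, n_runs=100):
    fn(x)  # warm-up run (kernel setup, data placement)
    start = time.perf_counter()
    for _ in range(n_runs):
        fn(x)
    return (time.perf_counter() - start) / n_runs

for size in (10, 100_000, 10_000_000):
    x_np = np.random.randn(size).astype(np.float32)
    x_tnp = tnp.asarray(x_np)
    print(size, bench(np_sigmoid, x_np), bench(tnp_sigmoid, x_tnp))
```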

TensorFlow NumPy ND array

Now that we have seen how tensorflow-numpy performs relative to NumPy, let’s dive into the API.

An ND array, an instance of tf.experimental.numpy.ndarray, represents a multidimensional dense array. It wraps an immutable tf.Tensor, which makes it interoperable with tf.Tensor. It also implements the __array__ interface, which allows these objects to be passed into contexts that expect a NumPy or array-like object (e.g. matplotlib). Interoperation does not copy data, even for data placed on accelerators or remote devices.

tf.Tensor objects can be passed to tf.experimental.numpy APIs without performing data copies.

NumPy interoperability (Image by Author)
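A minimal sketch of this interoperability (array contents are arbitrary):

```python
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

nd = tnp.ones((2, 2))

# ND array into a NumPy function: accepted via the __array__ interface.
print(np.sum(nd))

# ND array into a TensorFlow op: accepted since it wraps a tf.Tensor.
print(tf.reduce_sum(nd))

# tf.Tensor into a tf.experimental.numpy API, without a data copy.
print(tnp.sum(tf.constant([[1.0, 2.0], [3.0, 4.0]])))
```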

Operator Precedence: TensorFlow NumPy defines an __array_priority__ higher than NumPy’s. This means that for operators involving both an ND array and an np.ndarray, the former takes precedence: the np.ndarray input is converted to an ND array, and the TensorFlow NumPy implementation of the operator is invoked.

Operator precedence (Image by Author)
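For example, a hedged sketch of this precedence rule:

```python
import numpy as np
import tensorflow.experimental.numpy as tnp

a_np = np.ones((2, 2), dtype=np.float32)    # plain NumPy array
b_nd = tnp.ones((2, 2), dtype=tnp.float32)  # ND array

# ND array defines a higher __array_priority__, so this mixed expression
# is dispatched to TensorFlow NumPy: a_np is converted to an ND array,
# and the result is an ND array, not an np.ndarray.
result = a_np + b_nd
print(type(result))
```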

Types: ND arrays support a subset of NumPy data types, and type promotion follows NumPy semantics. Broadcasting and indexing also work the same way as with NumPy arrays.

Data type and promotions (Image by Author)
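A small sketch of promotion, broadcasting, and indexing (the exact promoted dtypes follow NumPy’s rules):

```python
import tensorflow.experimental.numpy as tnp

# Type promotion follows NumPy semantics.
i = tnp.ones((2,), dtype=tnp.int64)
f = tnp.ones((2,), dtype=tnp.float32)
print((i + f).dtype)  # promoted per NumPy rules

# Broadcasting works as in NumPy.
m = tnp.arange(6, dtype=tnp.float32).reshape(2, 3)
row = tnp.asarray([10.0, 20.0, 30.0])
print(m + row)  # row is broadcast across both rows of m

# Indexing and slicing work as in NumPy.
print(m[0, 1:])
```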

Device support: Since ND arrays wrap tf.Tensor, they have GPU and TPU support on par with tf.Tensor. We can control which device is used via tf.device scopes, as shown below.

Device setup (Image by Author)
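A minimal sketch of device placement (falls back to CPU when no GPU is present):

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Pick a device; use the GPU only if one is actually available.
device = '/device:GPU:0' if tf.config.list_physical_devices('GPU') else '/device:CPU:0'

with tf.device(device):
    x = tnp.ones((1000, 1000), dtype=tnp.float32)
    y = tnp.matmul(x, x)

print(y.device)  # reports where the result was placed
```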

Graph and eager modes: Eager execution runs op by op, just like ordinary Python code, so ND arrays work just like NumPy arrays. However, the same code can be executed in graph mode by putting it inside tf.function. Below is a code example.

tf.function usage (Image by Author)
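A hedged sketch of the same tnp code running eagerly and inside tf.function (the function name and shapes are illustrative):

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

def dense_layer(x, w, b):
    # Plain tnp code: in eager mode this executes op by op.
    return tnp.maximum(tnp.matmul(x, w) + b, 0.0)  # linear + ReLU

# Wrapping the same function in tf.function traces it into a graph,
# enabling whole-function optimizations.
dense_layer_graph = tf.function(dense_layer)

x = tnp.ones((4, 3), dtype=tnp.float32)
w = tnp.ones((3, 2), dtype=tnp.float32)
b = tnp.zeros((2,), dtype=tnp.float32)

print(dense_layer(x, w, b))        # eager execution
print(dense_layer_graph(x, w, b))  # graph execution
```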

Limitations

  • Not all dtypes are currently supported.
  • Mutation is not supported. ND Array wraps immutable tf.Tensor.
  • Fortran order, views, stride_tricks are not supported.
  • The NumPy C API is not supported, nor are NumPy’s Cython and SWIG integrations.
  • Only a subset of functions and modules are supported.

That’s all for now. We have explored tensorflow-numpy and its capabilities. tf-numpy’s interoperability makes it a good choice for both TensorFlow code and general NumPy code, and it lets you run complex NumPy code on GPU.

In the next article, we will build a Neural Network from scratch using tensorflow-numpy and use auto-differentiation using tf.GradientTape to train the network on GPU. Also, we will explore TensorFlow-related speed-up tricks, like compilation and auto-vectorization. Thank you!

Join my weekly newsletter to get the latest updates and recent advances in AI, along with curated stories from Medium on Machine Learning.
