Deep Learning

NumPy on GPU/TPU

Make your NumPy code run 50x faster.

Sambasivarao. K · Published in Analytics Vidhya · 4 min read · Apr 20, 2021


Image by i2tutorials

NumPy is by far the most widely used library for performing mathematical operations on arrays, and it forms the base of many machine learning and data science libraries. It offers a large number of high-level mathematical functions that operate on arrays. NumPy gained popularity largely because of its speed: NumPy array operations run roughly 50x faster than equivalent code on Python lists. NumPy arrays also support vectorization, which removes explicit Python loops.

Can we run NumPy operations even faster? The answer is yes!

TensorFlow has implemented a subset of the NumPy API and released it in version 2.4 as tf.experimental.numpy. This allows NumPy code to run much faster, and performance can be improved further by running it on GPU/TPU.
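Here is a minimal sketch of what this looks like in practice (assuming TensorFlow ≥ 2.4; the shapes and values are arbitrary):

```python
import tensorflow.experimental.numpy as tnp

# Familiar NumPy-style API, executed as TensorFlow ops under the hood.
x = tnp.ones((3, 3), dtype=tnp.float32)
y = tnp.arange(9, dtype=tnp.float32).reshape(3, 3)

print(tnp.matmul(x, y))  # returns an ND array
```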

Benchmark

Before going into a more detailed view, let’s compare the performance of NumPy and tensorflow-numpy. For workloads composed of small operations (less than about 10 microseconds), the overhead of dispatching operations in TensorFlow can dominate the runtime, and NumPy may provide better performance. For other cases, TensorFlow should generally provide better performance.

The TensorFlow team created a sigmoid benchmark for this comparison: they implemented the sigmoid operation using both NumPy and tensorflow-numpy and ran it multiple times on CPU and GPU. The results of this experiment are shown below:

Sigmoid benchmark results (Image by TensorFlow)

As you can see, NumPy performs better for small operations, and as the input size increases, tf-numpy pulls ahead. The GPU performance is far better than its CPU counterpart.
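To get a feel for this on your own machine, here is a minimal, hedged sketch of such a comparison (the actual TensorFlow benchmark script differs; on GPU you would also need to force completion of asynchronously dispatched ops, e.g. via np.asarray(result), before stopping the timer):

```python
import time
import numpy as np
import tensorflow.experimental.numpy as tnp

def np_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tnp_sigmoid(x):
    return 1.0 / (1.0 + tnp.exp(-x))

def bench(fn, x, n_runs=100):
    fn(x)  # warm-up run (kernel setup, data placement)
    start = time.perf_counter()
    for _ in range(n_runs):
        fn(x)
    return (time.perf_counter() - start) / n_runs

for size in (10, 100_000, 10_000_000):
    x_np = np.random.randn(size).astype(np.float32)
    x_tnp = tnp.asarray(x_np)
    print(size, bench(np_sigmoid, x_np), bench(tnp_sigmoid, x_tnp))
```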

TensorFlow NumPy ND array

Now that we have seen how tensorflow-numpy performs relative to NumPy, let’s dive into the API.

An ND array, an instance of tf.experimental.numpy.ndarray, represents a multidimensional dense array. It wraps an immutable tf.Tensor, which makes it interoperable with tf.Tensor. It also implements the __array__ interface, which allows these objects to be passed into contexts that expect a NumPy or array-like object (e.g. matplotlib). Interoperation does not copy data, even for data placed on accelerators or remote devices.

tf.Tensor objects can be passed to tf.experimental.numpy APIs without performing data copies.

NumPy interoperability (Image by Author)
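A minimal sketch of this interoperability (array contents are arbitrary):

```python
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

nd = tnp.ones((2, 2))

# ND array into a NumPy function: accepted via the __array__ interface.
print(np.sum(nd))

# ND array into a TensorFlow op: accepted since it wraps a tf.Tensor.
print(tf.reduce_sum(nd))

# tf.Tensor into a tf.experimental.numpy API, without a data copy.
print(tnp.sum(tf.constant([[1.0, 2.0], [3.0, 4.0]])))
```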

Operator Precedence: TensorFlow NumPy defines an __array_priority__ higher than NumPy’s. This means that for operators involving both an ND array and an np.ndarray, the former takes precedence: the np.ndarray input is converted to an ND array, and the TensorFlow NumPy implementation of the operator is invoked.

Operator precedence (Image by Author)
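For example, a hedged sketch of this precedence rule:

```python
import numpy as np
import tensorflow.experimental.numpy as tnp

a_np = np.ones((2, 2), dtype=np.float32)    # plain NumPy array
b_nd = tnp.ones((2, 2), dtype=tnp.float32)  # ND array

# ND array defines a higher __array_priority__, so this mixed expression
# is dispatched to TensorFlow NumPy: a_np is converted to an ND array,
# and the result is an ND array, not an np.ndarray.
result = a_np + b_nd
print(type(result))
```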

Types: ND arrays support a subset of NumPy data types, and type promotion follows NumPy semantics. Broadcasting and indexing also work the same way as with NumPy arrays.

Data type and promotions (Image by Author)
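A small sketch of promotion, broadcasting, and indexing (the exact promoted dtypes follow NumPy’s rules):

```python
import tensorflow.experimental.numpy as tnp

# Type promotion follows NumPy semantics.
i = tnp.ones((2,), dtype=tnp.int64)
f = tnp.ones((2,), dtype=tnp.float32)
print((i + f).dtype)  # promoted per NumPy rules

# Broadcasting works as in NumPy.
m = tnp.arange(6, dtype=tnp.float32).reshape(2, 3)
row = tnp.asarray([10.0, 20.0, 30.0])
print(m + row)  # row is broadcast across both rows of m

# Indexing and slicing work as in NumPy.
print(m[0, 1:])
```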

Device support: Since ND arrays wrap tf.Tensor, they have GPU and TPU support on par with tf.Tensor. We can control which device is used via tf.device scopes, as shown below.

Device setup (Image by Author)
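A minimal sketch of device placement (falls back to CPU when no GPU is present):

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Pick a device; use the GPU only if one is actually available.
device = '/device:GPU:0' if tf.config.list_physical_devices('GPU') else '/device:CPU:0'

with tf.device(device):
    x = tnp.ones((1000, 1000), dtype=tnp.float32)
    y = tnp.matmul(x, x)

print(y.device)  # reports where the result was placed
```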

Graph and eager modes: Eager execution runs op by op, just like ordinary Python code, so ND arrays work just like NumPy arrays. However, the same code can be executed in graph mode by putting it inside tf.function. Below is a code example.

tf.function usage (Image by Author)
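A hedged sketch of the same tnp code running eagerly and inside tf.function (the function name and shapes are illustrative):

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

def dense_layer(x, w, b):
    # Plain tnp code: in eager mode this executes op by op.
    return tnp.maximum(tnp.matmul(x, w) + b, 0.0)  # linear + ReLU

# Wrapping the same function in tf.function traces it into a graph,
# enabling whole-function optimizations.
dense_layer_graph = tf.function(dense_layer)

x = tnp.ones((4, 3), dtype=tnp.float32)
w = tnp.ones((3, 2), dtype=tnp.float32)
b = tnp.zeros((2,), dtype=tnp.float32)

print(dense_layer(x, w, b))        # eager execution
print(dense_layer_graph(x, w, b))  # graph execution
```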

Limitations

  • Not all dtypes are currently supported.
  • Mutation is not supported. ND Array wraps immutable tf.Tensor.
  • Fortran order, views, stride_tricks are not supported.
  • The NumPy C API is not supported, nor are NumPy’s Cython and SWIG integrations.
  • Only a subset of functions and modules are supported.

That’s all for now. We have explored tensorflow-numpy and its capabilities. tf-numpy’s interoperability makes it a good choice for both TensorFlow code and general NumPy code, and it lets you run complex NumPy code on GPU.

In the next article, we will build a Neural Network from scratch using tensorflow-numpy and use auto-differentiation using tf.GradientTape to train the network on GPU. Also, we will explore TensorFlow-related speed-up tricks, like compilation and auto-vectorization. Thank you!

Join my weekly newsletter to get the latest updates and recent advances in AI, along with curated stories from Medium on Machine Learning.
