Using MXNet NDArray for Fast GPU Algebra on Images

Published in Apache MXNet · Feb 15, 2021

By Philippe Saadé and Olivier Cruchant

Introduction

Apache MXNet is a modern deep learning framework featuring both a pleasant developer experience and high-performance training and inference. To get the best possible performance while exposing a friendly interface, the MXNet community re-developed from scratch a suite of optimized tensor algebra kernels, historically accessible via the Python MXNet NDArray library. The original MXNet NDArray API is very similar to NumPy, yet it features major internal differences that enable high-performance deep learning: (1) it can run on GPU, (2) it supports auto-differentiation, (3) it runs asynchronously, executing only the code that needs to run, in an optimized order. To make the powerful MXNet NDArray API fully accessible to NumPy developers, the MXNet community released the mxnet.numpy library in 2019, which implements the NumPy API (announcement on the MXNet Medium publication).

In this blog post, we illustrate the strength of the MXNet NDArray library for fast and compact algebra over images. We first highlight a couple of NDArray concepts and then assemble them into a function that we provide both as an mxnet.ndarray implementation and an mxnet.numpy implementation. As an illustration, we use a fictional use case of anomalous image detection via simple pixel difference. This can be a reasonable baseline when looking for anomalous areas over a batch of near-identical images with the same viewpoint and luminosity, for example the images below:

From left to right: (1) correct sample (2) synthetic anomalous sample with a stain (3) anomalous sample after difference with the cross-batch mean.

Notable concepts

We will use the following libraries:

import itertools as itr
import os
import mxnet as mx
from mxnet import image as mxim
from mxnet import init, gluon, nd
import matplotlib.cm as cm
from matplotlib import pyplot as plt
from matplotlib import image as mpim

MXNet features the mxnet.image library, which we can use to read images:

picture = mxim.imread('orig1.JPG')
plt.imshow(picture.asnumpy())  # .asnumpy() converts an MXNet NDArray to a NumPy array

This is a non-anomalous image. In our synthetic dataset, we have 10 such images and two images with visual anomalies.
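The loaded picture is a 3D NDArray of shape (height, width, channels) with uint8 pixels, which a quick check confirms (the exact values depend on your image; the ones shown here assume the 1000×750 px images used later in this post):

print(picture.shape, picture.dtype)  # e.g. (750, 1000, 3) <class 'numpy.uint8'>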

MXNet NDArray leverages the MXNet context concept, which controls the hardware context (CPU or GPU) on which an array lives. For example, we can send our previously loaded image to the GPU (if on a GPU-equipped instance) with the following snippet:

pic_on_gpu = picture.copyto(mx.gpu())

Note that we can also use picture.as_in_context(mx.gpu()) to copy the picture to the GPU; the advantage of the latter is that it performs the actual copy only if the image is not already on the GPU.
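As a minimal sketch (assuming a GPU is available), the difference between the two calls can be observed directly:

pic_gpu = picture.copyto(mx.gpu())           # always allocates a new array on the GPU
same_pic = pic_gpu.as_in_context(mx.gpu())   # already on the GPU: no copy is made
print(pic_gpu.context, same_pic is pic_gpu)  # gpu(0) True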

Similarly to NumPy, MXNet NDArray is a great platform for applying algebraic transformations to batches of records. Writing transformations over batches of images is all the more relevant on GPUs, which excel at batch processing. In this demo we use ndarray.concat(), ndarray.mean(), ndarray.abs(), and ndarray.max(), all very similar to their NumPy counterparts, to manipulate and analyze a batch of images.
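As a toy sketch (not from the original post), here is how these operators behave on a batch of two 2×2 single-channel "images":

a = nd.array([[[0, 0], [0, 0]]])        # shape (1, 2, 2)
b = nd.array([[[2, 2], [2, 6]]])        # shape (1, 2, 2)
batch = nd.concat(a, b, dim=0)          # shape (2, 2, 2)
avg = nd.mean(batch, axis=0)            # cross-batch mean image: [[1, 1], [1, 3]]
deltas = nd.abs(batch - avg)            # per-pixel absolute deviation from the mean
print(nd.max(deltas, axis=(1, 2)))      # largest deviation per image: [3. 3.]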

Since MXNet is primarily designed for deep learning, it features abundant neural network primitives. Its imperative Python front-end, Gluon, provides numerous model layers. In this blog, even though we use neither deep learning nor machine learning, we borrow Gluon's 2D convolution (mxnet.gluon.nn.Conv2D), with all coefficients initialized to 1, to apply a smoothing effect and isolate contiguous areas of anomalous pixels.
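For instance, here is a minimal sketch (not from the original post) of such a constant, all-ones convolution acting as a box filter that sums pixel values over each 3×3 neighborhood:

smooth = gluon.nn.Conv2D(channels=1, kernel_size=3, use_bias=False)
smooth.initialize(init.Constant(1))
x = nd.ones((1, 1, 5, 5))  # a batch of one single-channel 5x5 image
print(smooth(x))           # every output pixel equals 9, the sum of a 3x3 patch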

Wrapping it all in a compact function

The inline code block below proposes an example function to run the above-mentioned analysis. It first reads a batch of images, then subtracts the cross-batch mean image from each of them, applies a hard-coded convolution to each of them, and returns the images that have high-value pixels after the convolution. Note that this code features a couple of hard-coded constants that would be worth tuning over representative data. This sample is provided as an illustration of the NDArray capabilities and may deserve further refinement and testing before being used in the real world.

Implementation with mxnet.ndarray

def mxnd_find_anomalies(folder, gpu=False, save_viz=True, threshold=300):
    """
    Parameters
    ----------
    folder : str
        local directory with the batch of images to score
    gpu : bool, optional
        whether to use GPU or CPU
    save_viz : bool, optional
        saves the pictures of deltas and convolutions
    threshold : int, optional
        conv threshold to be anomalous. Should be tuned!
    """

    ctx = mx.gpu() if gpu else mx.cpu()  # set context

    # Read the images into a 4D NDArray (batch, height, width, channels)
    pics = os.listdir(folder)
    ims = [mxim.imread(folder + '/' + pic).expand_dims(0).as_in_context(ctx) for pic in pics]
    ims = nd.concat(*ims, dim=0)

    # Compute the average image
    avg = nd.mean(ims.astype('float32'), axis=0)

    # Remove the mean from every image, then average-pool over the color channels
    deltas = nd.mean(ims.astype('float32') - avg, axis=3)

    # Build a convolution with all weights set to 1
    conv = gluon.nn.Conv2D(1, kernel_size=5, use_bias=False)
    conv.initialize(init.Constant(1))
    conv.collect_params().reset_ctx(ctx)

    # Apply the batched convolution to the deltas, then take the absolute value.
    # We expand dims because the convolution expects NCHW input (with a channel axis).
    conv_delta = nd.abs(conv(deltas.expand_dims(1)))

    # Max-pool over each image
    top_deltas = nd.max(conv_delta, axis=(1, 2, 3))

    # Return the pics whose convolution response exceeds the threshold
    anomalies = list(itr.compress(pics, top_deltas > threshold))
    an_indexes = list(itr.compress(range(len(pics)), top_deltas > threshold))

    if save_viz:  # optionally, save images for interpretability
        for a, i in zip(anomalies, an_indexes):
            mpim.imsave('pixmap-' + a, deltas[i].asnumpy(), format='png')
            mpim.imsave('convmap-' + a, conv_delta[i][0].asnumpy(), format='png')

    return anomalies
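A hypothetical usage, assuming the batch of images sits in a local folder named 'images' (both the folder name and the threshold are illustrative, not from the original post):

anomalies = mxnd_find_anomalies('images', gpu=True, threshold=300)
print(anomalies)  # e.g. ['anomaly1.JPG', 'anomaly2.JPG']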

Implementation with mxnet.numpy

In the code block below, we implement the same logic as above, replacing the mxnet.ndarray API with the mxnet.numpy API. Since mxnet.numpy implements the NumPy API, we need to replace a couple of methods with their NumPy counterparts, notably:

  • Expanding array dimensions: array.expand_dims(0) above becomes np.expand_dims(array, 0)
  • Concatenation: nd.concat(*ims, dim=0) becomes np.concatenate(ims, axis=0)

We further need to import mxnet.numpy and notify the MXNet backend of our use of NumPy semantics, with the extra import and configuration below:

from mxnet import np, npx
npx.set_np()
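As a quick illustration of the two spellings (a toy sketch, not from the original post):

legacy = nd.ones((2, 3)).expand_dims(0)             # mxnet.ndarray style, shape (1, 2, 3)
modern = np.expand_dims(np.ones((2, 3)), 0)         # mxnet.numpy style, shape (1, 2, 3)
stacked = np.concatenate([modern, modern], axis=0)  # replaces nd.concat(..., dim=0)
print(legacy.shape, modern.shape, stacked.shape)    # (1, 2, 3) (1, 2, 3) (2, 2, 3)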

And here is the final implementation of our baseline anomaly detection function using the MXNet NumPy API:

def mxnp_find_anomalies(folder, gpu=False, save_viz=True, threshold=300):
    """
    Parameters
    ----------
    folder : str
        local directory with the batch of images to score
    gpu : bool, optional
        whether to use GPU or CPU
    save_viz : bool, optional
        saves the pictures of deltas and convolutions
    threshold : int, optional
        conv threshold to be anomalous. Should be tuned!
    """

    ctx = mx.gpu() if gpu else mx.cpu()  # set context

    # Read the images into a 4D array (batch, height, width, channels)
    pics = os.listdir(folder)
    ims = [np.expand_dims(mxim.imread(folder + '/' + pic), 0).as_in_context(ctx) for pic in pics]
    ims = np.concatenate(ims, axis=0)

    # Compute the average image
    avg = np.mean(ims.astype('float32'), axis=0)

    # Remove the mean from every image, then average-pool over the color channels
    deltas = np.mean(ims.astype('float32') - avg, axis=3)

    # Build a convolution with all weights set to 1
    conv = gluon.nn.Conv2D(1, kernel_size=5, use_bias=False)
    conv.initialize(init.Constant(1))
    conv.collect_params().reset_ctx(ctx)

    # Apply the batched convolution to the deltas, then take the absolute value.
    # We expand dims because the convolution expects NCHW input (with a channel axis).
    conv_delta = np.abs(conv(np.expand_dims(deltas, 1)))

    # Max-pool over each image
    top_deltas = np.max(conv_delta, axis=(1, 2, 3))

    # Return the pics whose convolution response exceeds the threshold
    anomalies = list(itr.compress(pics, top_deltas > threshold))
    an_indexes = list(itr.compress(range(len(pics)), top_deltas > threshold))

    if save_viz:  # optionally, save images for interpretability
        for a, i in zip(anomalies, an_indexes):
            mpim.imsave('pixmap-' + a, deltas[i].asnumpy(), format='png')
            mpim.imsave('convmap-' + a, conv_delta[i][0].asnumpy(), format='png')

    return anomalies
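Since both implementations run the same back-end kernels over the same files, they should flag the same images; a quick hypothetical sanity check (the folder name is illustrative):

assert mxnd_find_anomalies('images', save_viz=False) == mxnp_find_anomalies('images', save_viz=False)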

Here are two anomalies the function found over our synthetic data:

Both functions have comparable latencies, which makes sense since they use the same back-end kernels. Here are the execution latencies, excluding visualization creation, over a batch of 12 images of 1000×750 px with MXNet 1.7, on a GPU-equipped Amazon SageMaker ml.p3.2xlarge notebook instance (average of 10 runs; a sketch of how such a measurement could be reproduced follows the list below):

  • The mxnet.ndarray code runs in 780 ms and the mxnet.numpy code in 750 ms on the 8 vCPUs (Intel Xeon E5)
  • The mxnet.ndarray code runs in 100 ms and the mxnet.numpy code in 75 ms on the GPU (NVIDIA Tesla V100), close to 90% faster than the CPU
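Because MXNet NDArray operations execute asynchronously, a fair measurement must block until all pending work has completed. Here is a minimal sketch of how such a timing could be taken (the helper and the folder name are illustrative, not the exact code behind the numbers above):

import time

def time_run(fn, *args, n_runs=10, **kwargs):
    fn(*args, **kwargs)                # warm-up run (lazy allocations, kernel selection)
    mx.nd.waitall()                    # block until the asynchronous engine is idle
    start = time.perf_counter()
    for _ in range(n_runs):
        fn(*args, **kwargs)
        mx.nd.waitall()                # make sure all queued work is actually finished
    return (time.perf_counter() - start) / n_runs

print(time_run(mxnd_find_anomalies, 'images', gpu=True, save_viz=False))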

Conclusion

Even though MXNet is a deep learning framework, its primitives can be creatively used beyond neural network development. In this post, we showed how to perform simple, GPU-accelerated algebra over images using its NDArray library, which NumPy developers can use without an extra learning curve via the mxnet.numpy library. Do not hesitate to take a look at it, contribute, and engage with the community!
