# A New NumPy Interface for Apache MXNet (Incubating)

*Authors: **Soji Adeshina**, **Vishaal Kapoor***

*Thanks to Jun Wu, Sheng Zha, Balaji Kamakoti, and Alex Chung for their early feedback and edits.*

# Overview

NumPy is a tool familiar to data scientists and machine learning developers everywhere. This is evidenced by its usage in nearly 3 out of every 4 open source machine learning projects on GitHub. However, as hardware accelerators like GPUs have become increasingly incorporated into the machine learning toolkit, NumPy users have had to switch to new frameworks with different syntax to take advantage of the speed gains from training machine learning models on GPUs.

**The MXNet community is pleased to announce a new NumPy interface for MXNet** that lets developers keep the familiar NumPy syntax while gaining the performance of accelerated computing on GPUs and asynchronous execution on CPUs and GPUs, along with automatic differentiation of differentiable NumPy ops through `mxnet.autograd`.

## MXNet NumPy Interface Features

**Automatic Differentiation**

An important feature for training deep learning models is automatic differentiation, which computes the gradients used to improve the model. With the MXNet NumPy interface you can compute the gradient of an arbitrary function of MXNet NumPy operations with respect to any MXNet NumPy `ndarray`.

**Native Hardware Acceleration Support**

Due to the computationally intensive nature of training deep learning models, it is often necessary to perform computation on hardware accelerators like GPUs. While NumPy does not currently support performing operations on GPUs, the MXNet NumPy interface allows you to run operations on GPU and train very deep models efficiently.

**Consistency with NumPy Syntax**

The new NumPy interface from MXNet, `mxnet.numpy`, is intended to be a drop-in replacement for NumPy. As such, `mxnet.numpy` supports many of the familiar `numpy.ndarray` operations needed to develop machine learning or deep learning models, and more operations are continually being added. You can check out the supported operations on the documentation page.

# Getting Started

To get started,

(1) Install MXNet following the instructions here.

*Note: you’ll need to install the GPU version to take advantage of the GPU. If you just want to use asynchronous execution on a CPU, you can select CPU.*

(2) Follow the example below to see how a few minor changes to your code are all you need to move your computations to the GPU for lightning-fast performance.

# 1. Logistic Regression (with NumPy)

To illustrate how `mxnet.numpy` can be used as a drop-in replacement for `numpy`, let's start with an implementation of a popular machine learning technique for classification: logistic regression.

**In this section we use NumPy, and in the next we’ll do the same thing with mxnet.numpy and compare the differences.**

First we need to collect some data to train the model on. We will be training on data that is artificially generated by drawing from two bivariate normal distributions with different means. Each cluster corresponds to a different label in our training data. Our logistic regression model should learn to assign the correct labels to samples from each cluster.
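A minimal sketch of such a data-generating step is shown here. The cluster means, the identity covariance, and the variable names are assumptions made for illustration; only the sample count is chosen to match the shapes printed below. With an identity covariance, each coordinate is an independent unit-variance normal, so shifting standard normal draws by the mean is enough.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

num_per_class = 10000
mean_0 = np.array([2.0, 2.0])    # hypothetical mean of the label-0 cluster
mean_1 = np.array([-2.0, -2.0])  # hypothetical mean of the label-1 cluster

# Bivariate normal with identity covariance: standard normal draws plus a mean shift.
X_0 = np.random.normal(0.0, 1.0, size=(num_per_class, 2)) + mean_0
X_1 = np.random.normal(0.0, 1.0, size=(num_per_class, 2)) + mean_1

X = np.concatenate([X_0, X_1])
y = np.concatenate([np.zeros(num_per_class), np.ones(num_per_class)])

print(X.shape, y.shape)
```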

`(20000, 2) (20000,)`

## Logistic Regression Implementation

Now, let’s implement the logistic regression model. Recall that logistic regression tries to learn the parameters for the following model:
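In its standard form, which is assumed here to be the one intended, the model maps an input *x* to the probability of the positive class by passing a linear function of *x* through the sigmoid, with the weights *w* and the bias *b* as the parameters to learn:

$$
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}
$$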

To train the model, we first randomly initialize the model parameters. Then we use gradient descent to evolve towards parameter values that minimize the negative log likelihood of the labels given the data. We are implementing a regularized form of logistic regression to penalize excessively large weights.
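The training procedure described above could be sketched as follows. This is an illustrative implementation rather than the exact class behind the results in this post: the hyperparameter names and defaults, the use of an L2 penalty (applied to the bias as well, for simplicity), and the small demo dataset are all assumptions.

```python
import numpy as np


class LogisticRegression:
    """Logistic regression with an L2 penalty, fit by batch gradient descent."""

    def __init__(self, learning_rate=0.1, reg_strength=0.01, num_iters=10000):
        # Hypothetical defaults, chosen only so the demo below converges quickly.
        self.learning_rate = learning_rate
        self.reg_strength = reg_strength
        self.num_iters = num_iters
        self.weights = None

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    @staticmethod
    def _add_ones(X):
        # Append a dummy column of ones so the last weight acts as the bias.
        return np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)

    def _loss(self, Xb, y):
        p = self._sigmoid(np.dot(Xb, self.weights))
        eps = 1e-12  # keeps log() finite when p saturates at 0 or 1
        nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        return nll + 0.5 * self.reg_strength * np.sum(self.weights ** 2)

    def fit(self, X, y, log_every=1000):
        Xb = self._add_ones(X)
        # Random initialization of the parameters.
        self.weights = np.random.normal(0.0, 1.0, size=(Xb.shape[1],))
        for i in range(self.num_iters):
            if i % log_every == 0:
                print("Loss at iteration", i, ":", self._loss(Xb, y))
            p = self._sigmoid(np.dot(Xb, self.weights))
            # Gradient of the mean negative log likelihood plus the L2 term.
            grad = np.dot(Xb.T, p - y) / Xb.shape[0] + self.reg_strength * self.weights
            self.weights = self.weights - self.learning_rate * grad
        return self

    def predict(self, X):
        return self._sigmoid(np.dot(self._add_ones(X), self.weights))


# Small demo on two well-separated clusters (hypothetical data).
np.random.seed(0)
X_demo = np.concatenate([np.random.normal(2.0, 1.0, size=(100, 2)),
                         np.random.normal(-2.0, 1.0, size=(100, 2))])
y_demo = np.concatenate([np.zeros(100), np.ones(100)])

model = LogisticRegression(num_iters=2000).fit(X_demo, y_demo)
print("Model weights:", model.weights)
print(type(model.weights))
```

The class only touches arrays through the `np` namespace, which is what makes it possible to reuse the same code with a different array library later on.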

`Loss at iteration 0 : 2.180406757477724`

`Loss at iteration 1000 : 0.035017553467819935`

`Loss at iteration 2000 : 0.02283794361201609`

`Loss at iteration 3000 : 0.018289092286958103`

`Loss at iteration 4000 : 0.01582273104056109`

`Loss at iteration 5000 : 0.014246603445271892`

`Loss at iteration 6000 : 0.013140231953032487`

`Loss at iteration 7000 : 0.012314780437898269`

`Loss at iteration 8000 : 0.011671959906342289`

`Loss at iteration 9000 : 0.011155203054815713`

`Model weights: [-2.08750961 -1.93529254 0.12816638]`

`<class 'numpy.ndarray'>`

After running gradient descent on the logistic regression model for 10,000 iterations, we should have learned sufficiently good model weights for our classification problem; the learned weights are printed above. Notice that the data type of the weights is `numpy.ndarray`.

We can do some test predictions with the trained model by taking a small subset of the training data.
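As a rough sketch of what such a prediction step computes, the snippet below plugs the weights printed above into the sigmoid. It assumes that the last weight is the bias (that is, the dummy column of ones is appended last), and the sample points are hypothetical stand-ins for rows of the training data.

```python
import numpy as np

# Weights copied from the training output above; the last entry is assumed to be the bias.
weights = np.array([-2.08750961, -1.93529254, 0.12816638])


def predict(X, weights):
    """Return P(y = 1 | x) for each row of X."""
    Xb = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    return 1.0 / (1.0 + np.exp(-np.dot(Xb, weights)))


# Hypothetical samples: two with positive coordinates, two with negative coordinates.
samples_0 = np.array([[3.0, 3.0], [2.0, 1.5]])
samples_1 = np.array([[-3.0, -3.0], [-2.0, -1.5]])

print(predict(samples_0, weights))  # probabilities close to 0
print(predict(samples_1, weights))  # probabilities close to 1
```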

`[9.88456391e-05 9.22181384e-04]`

`[0.99999933 0.99793841]`

We see that the model predicts values very close to 0 for samples from the 0 cluster and close to 1 for samples from the 1 cluster, which is what we expect.

# 2. Logistic Regression with `mxnet.numpy`

Now we can easily train the same model using `mxnet.numpy`. Notice that the code below that generates the data is identical to that in the snippet above. The only difference is the import statement, which now maps the `np` namespace to `mxnet.numpy` instead of `numpy`.
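The import swap might look like the sketch below. The `from mxnet import np, npx` import followed by `npx.set_np()` is the usual way to bring in the interface; the fallback to plain NumPy is an addition made here only so the snippet still runs where MXNet is not installed, and the cluster means are hypothetical.

```python
try:
    from mxnet import np, npx  # `np` now refers to mxnet.numpy
    npx.set_np()               # enable NumPy-compatible array semantics in MXNet
except ImportError:
    import numpy as np         # fallback so the sketch still runs without MXNet

num_per_class = 10000
mean_0 = np.array([2.0, 2.0])    # hypothetical mean of the label-0 cluster
mean_1 = np.array([-2.0, -2.0])  # hypothetical mean of the label-1 cluster

# Same data-generating code as with plain NumPy; only the import differs.
X_0 = np.random.normal(0.0, 1.0, size=(num_per_class, 2)) + mean_0
X_1 = np.random.normal(0.0, 1.0, size=(num_per_class, 2)) + mean_1

X = np.concatenate([X_0, X_1])
y = np.concatenate([np.zeros(num_per_class), np.ones(num_per_class)])

print(type(X), X.shape, y.shape)
```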

We don’t even need to change or redefine the `LogisticRegression` model class we created earlier. We can simply create a new model instance and fit it with the training data as `mxnet.numpy` arrays.

`Loss at iteration 0 : 2.7279124`

`Loss at iteration 1000 : 0.03574349`

`Loss at iteration 2000 : 0.023593647`

`Loss at iteration 3000 : 0.019085938`

`Loss at iteration 4000 : 0.016654767`

`Loss at iteration 5000 : 0.015109067`

`Loss at iteration 6000 : 0.014029568`

`Loss at iteration 7000 : 0.013228249`

`Loss at iteration 8000 : 0.012607378`

`Loss at iteration 9000 : 0.0121107865`

`Model weights: [-1.993176 -2.043093 0.211259]`

`<class 'mxnet.numpy.ndarray'>`

We can see that after fitting the model, we learn similar model weights as before, but now the weights are of type `mxnet.numpy.ndarray`.

`[9.642076e-07 4.287043e-06]`

`[0.9997216 0.99987984]`

Also, our model predicts the right classes consistent with the earlier model.

## Using the GPU with `mxnet.numpy`

Serving as a drop-in replacement for NumPy is only the beginning of what can be achieved with `mxnet.numpy`. It also lets you run expensive computations on the GPU with minimal syntax changes to your existing code: all you need to do is add a keyword argument specifying the GPU context everywhere you create an array.

Here we extend the logistic regression example above to use GPUs.

First, let’s make a few modifications to the data-generating process by adding `ctx=npx.gpu()` to every array initialization.

Next, we modify the `LogisticRegression` class so that every new array it creates is created on the GPU context. To do this, we add the `ctx=npx.gpu()` keyword to the weight initialization step and to the creation of the dummy column of ones.

In total we only need to change 3 lines.
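The kind of change involved can be sketched as follows, assuming an MXNet build that ships `mxnet.numpy` and accepts the `ctx` keyword used in this post. The function name is hypothetical, and the fallback to the CPU context and the import guard are additions made here so the snippet can also run on a machine without a GPU or without MXNet.

```python
def forward_on_device():
    """Run a logistic-regression forward pass on the GPU if one is visible."""
    from mxnet import np, npx
    npx.set_np()

    # Fall back to the CPU context when no GPU is available (added for portability).
    ctx = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()

    # Every array creation receives the ctx keyword.
    X = np.random.normal(0.0, 1.0, size=(4, 2), ctx=ctx)  # data
    weights = np.zeros(3, ctx=ctx)                         # weight initialization
    ones = np.ones((X.shape[0], 1), ctx=ctx)               # dummy column of ones

    # The arithmetic itself is unchanged and runs where the inputs live.
    logits = np.dot(np.concatenate([X, ones], axis=1), weights)
    return 1.0 / (1.0 + np.exp(-logits))


try:
    print(forward_on_device())
except ImportError:
    print("MXNet is not installed; skipping the GPU sketch.")
```

Because the weights start at zero in this sketch, every predicted probability is 0.5; the point is only where the arrays are created, not the values.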

`Loss at iteration 0 : 2.023619`

`Loss at iteration 1000 : 0.03632223`

`Loss at iteration 2000 : 0.023934785`

`Loss at iteration 3000 : 0.019356007`

`Loss at iteration 4000 : 0.016889626`

`Loss at iteration 5000 : 0.015321383`

`Loss at iteration 6000 : 0.014225139`

`Loss at iteration 7000 : 0.013410199`

`Loss at iteration 8000 : 0.012777602`

`Loss at iteration 9000 : 0.012270548`

`Model weights: [ 0.11582068 -2.0269437 -1.992209 ] @gpu(0)`

`<class 'mxnet.numpy.ndarray'>`

`[0.00135131 0.00256316] @gpu(0)`

`[0.978114 0.99988353] @gpu(0)`

We can see that the learned weights are now created and stored on the GPU. With only minimal changes, we were able to convert an existing NumPy implementation to run on the GPU without breaking NumPy syntax.

# 3. Comparing performance on NumPy and `mxnet.numpy`

Another key feature of the MXNet NumPy interface is reduced latency, owing to its use of the optimized, asynchronous MXNet backend engine. Comparing the computation speed of common deep learning operations implemented in NumPy with their implementations in `mxnet.numpy` on both CPU and GPU, we observe significant speed-ups using `mxnet.numpy`.
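A timing harness for this kind of comparison could look like the sketch below. The helper `time_dot` and its `sync` parameter are hypothetical names introduced for illustration. The important detail is that an asynchronous backend returns before the result is ready, so the clock should only stop after a synchronization call such as `npx.waitall`. The matrix size is reduced here to keep the sketch quick.

```python
import time

import numpy as onp  # plain NumPy, named `onp` to keep it distinct from mxnet's `np`


def time_dot(xp, n, sync=lambda: None, **array_kwargs):
    """Time one n x n matrix product using the array module `xp`."""
    a = xp.random.uniform(size=(n, n), **array_kwargs)
    b = xp.random.uniform(size=(n, n), **array_kwargs)
    sync()  # make sure the inputs exist before the clock starts
    start = time.perf_counter()
    xp.dot(a, b)
    sync()  # an asynchronous backend must finish before the clock stops
    return time.perf_counter() - start


n = 1024  # smaller than the 4096 used for the figures in this post
print(f"NumPy : Dotted two {n}x{n} matrices in {time_dot(onp, n):.2f} s.")

# Assumed MXNet usage (requires MXNet; the last call also requires a GPU):
#   from mxnet import np, npx
#   npx.set_np()
#   time_dot(np, n, sync=npx.waitall)
#   time_dot(np, n, sync=npx.waitall, ctx=npx.gpu())
```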

`NumPy : Dotted two 4096x4096 matrices in 1.41 s.`

`mxnet.numpy : Dotted two 4096x4096 matrices in 0.48 s.`

`mxnet.numpy on GPU : Dotted two 4096x4096 matrices in 0.01 s.`

Here we see that `mxnet.numpy` is nearly 3 times as fast as NumPy at computing matrix products, and that `mxnet.numpy` with GPU acceleration is 2 orders of magnitude faster.

*The speed benchmarks were performed on an Amazon **p3.2xlarge** instance with NumPy version 1.14.5 and mxnet-cu100mkl version 1.5.0*

# 4. Using Automatic differentiation (Autograd) with `mxnet.numpy`

Autograd is the automatic differentiation library of MXNet. It traces the execution of a function and then performs reverse-mode automatic differentiation to compute the gradients. `mxnet.numpy` arrays and operators can be used in the `autograd` scope, so you can chain together `mxnet.numpy` operations into arbitrary functions and automatically compute their gradients.

Here is a simple example:
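A sketch of such an example, assuming an MXNet build that ships `mxnet.numpy`, is given below. The function name is hypothetical and the import guard is only there so the snippet can be pasted into an environment without MXNet. The input values are chosen so that the gradient matches the array shown next.

```python
def gradient_of_2x_squared():
    """Return the gradient of f(x) = 2x^2 at x = [[1, 2], [3, 4]] via MXNet autograd."""
    from mxnet import autograd, np, npx
    npx.set_np()

    x = np.array([[1.0, 2.0], [3.0, 4.0]])
    x.attach_grad()            # allocate storage for the gradient with respect to x

    with autograd.record():    # trace the operations applied to x
        y = 2 * x * x

    y.backward()               # reverse-mode pass; fills x.grad
    return x.grad


try:
    print(gradient_of_2x_squared())
except ImportError:
    print("MXNet is not installed; skipping the autograd sketch.")
```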

`array([[ 4., 8.], [12., 16.]])`

Here we have automatically computed the gradient of *f(x) = 2x²* with respect to *x* at *x = [1, 2, 3, 4]*, correctly obtaining *4x = [4, 8, 12, 16]*.

This means you can now take existing NumPy functions, or create new functions with NumPy syntax, and apply automatic differentiation to compute their gradients with respect to some input. This gives `numpy` users a smooth transition to defining neural networks using NumPy syntax, and to using the sophisticated gradient-based optimization techniques implemented in the `mxnet.optimizer` library to train these models.

For a tutorial on how to implement neural networks with `mxnet.numpy`, see the crash course tutorial.

# 5. Differences between `mxnet.ndarray` and `mxnet.numpy`

If you are already a regular user of MXNet, you might be wondering: “Didn’t MXNet already have a NumPy-like interface for manipulating multi-dimensional arrays or tensors?”, “What’s the difference between that library, `mxnet.ndarray`, and this one, `mxnet.numpy`?”, and “Is `mxnet.ndarray` going to be deprecated in favor of `mxnet.numpy` now?” The answer to that last question is no!

`mxnet.ndarray` was designed to be as close to NumPy as possible without negatively impacting performance, while maintaining interoperability with MXNet's symbolic API. As a result, some parts of NumPy syntax that were omitted from `mxnet.ndarray` are now supported in `mxnet.numpy`. Here are some of the differences between `mxnet.ndarray` and `mxnet.numpy`:

- `mxnet.numpy` supports automatic broadcasting for `*`, `-`, `/`, and `+`.
- `mxnet.numpy` supports NumPy-style indexing and slicing, whereas `mxnet.ndarray` uses `nd.take` and `nd.pick`.
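For reference, the NumPy behaviours in question are shown below using plain NumPy. The sketch assumes, following the description above, that the same expressions work on `mxnet.numpy` arrays once `np` is imported from `mxnet`; the array values are arbitrary.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]], shape (2, 3)
row = np.array([10.0, 20.0, 30.0])   # shape (3,)

# Broadcasting: `row` is stretched across both rows of `a`.
print(a + row)
print(a * row)

# NumPy-style slicing and integer-array indexing.
print(a[:, 1:])           # all rows, columns 1 onward
print(a[[1, 0], [2, 0]])  # picks a[1, 2] and a[0, 0]
```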

# Summary

MXNet has a new high-level interface that maintains full syntax similarity with NumPy, with the following features:

- Automatic differentiation for training new or existing machine learning models with gradient descent using NumPy syntax.
- Accelerated tensor computation using GPUs with minimal changes to NumPy syntax.
- Asynchronous operator execution with the backend MXNet engine on CPU and GPU.
- Drop-in replacement for NumPy in existing machine learning projects so that those projects can take advantage of deep learning using MXNet.

**For more information**, check out this guide on asynchronous computation and this guide on computation on the GPU using `mxnet.numpy`.