Thanks to Jun Wu, Sheng Zha, Balaji Kamakoti, and Alex Chung for their early feedback and edits.
NumPy is a tool familiar to data scientists and machine learning developers everywhere, as evidenced by its use in nearly 3 out of every 4 open source machine learning projects on GitHub. However, as hardware accelerators like GPUs have become a standard part of the machine learning toolkit, NumPy users have had to switch to new frameworks with different syntax to benefit from the speed gains of training models on GPUs.
The MXNet community is pleased to announce a new NumPy interface for MXNet that allows developers to retain the familiar syntax of NumPy, while leveraging performance gains from accelerated computing on GPUs and asynchronous execution on CPUs and GPUs, in addition to automatic differentiation for differentiable NumPy ops through
MXNet NumPy Interface Features
An important feature for training deep learning models is using automatic differentiation to compute the gradients that are used to improve the model. With the MXNet NumPy interface you can compute the gradients of any arbitrary function of MXNet NumPy operations with respect to any MXNet NumPy
Native Hardware acceleration support
Due to the computationally intensive nature of training deep learning models, it is often necessary to perform computation on hardware accelerators like GPUs. While NumPy does not currently support performing operations on GPUs, the MXNet NumPy interface allows you to run operations on GPU and train very deep models efficiently.
Consistency with NumPy Syntax
The new NumPy interface from MXNet, mxnet.numpy, is intended to be a drop-in replacement for NumPy. As such, mxnet.numpy supports many of the familiar numpy.ndarray operations needed for developing machine learning or deep learning models, and more operations are continually being added. You can check out the supported operations on the documentation page.
To get started,
(1) Install MXNet following the instructions here.
Note that you’ll need to install the GPU version to take advantage of a GPU. If you just want to use asynchronous execution on a CPU, you can select the CPU version.
(2) Follow the example below to see how a few minor changes to your code are all you need to move your computations to the GPU for lightning-fast performance.
1. Logistic Regression (with NumPy)
To illustrate how mxnet.numpy can be used as a drop-in replacement for numpy, let's start with an implementation of a popular machine learning technique for classification: logistic regression.
In this section we use NumPy, and in the next we’ll do the same thing with mxnet.numpy and compare the differences.
First we need to collect some data to train the model on. We will be training on data that is artificially generated by drawing from two bivariate normal distributions with different means. Each cluster corresponds to a different label in our training data. Our logistic regression model should learn to assign the correct labels to samples from each cluster.
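A minimal sketch of this data-generating step in plain NumPy. The cluster means, the identity covariance, the seed, and the 10,000 samples per cluster are illustrative assumptions, chosen only so that the array shapes match the output printed below.

```python
import numpy as np

np.random.seed(0)                  # assumed seed, for reproducibility only
n_per_cluster = 10000              # assumed; gives 20,000 samples in total
cov = [[1.0, 0.0], [0.0, 1.0]]     # assumed identity covariance

# Draw each cluster from its own bivariate normal distribution.
cluster0 = np.random.multivariate_normal([2.0, 2.0], cov, n_per_cluster)
cluster1 = np.random.multivariate_normal([-2.0, -2.0], cov, n_per_cluster)

# Stack the samples and label them 0 or 1 according to their cluster.
X = np.concatenate([cluster0, cluster1])
y = np.concatenate([np.zeros(n_per_cluster), np.ones(n_per_cluster)])

print(X.shape, y.shape)
```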
(20000, 2) (20000,)
Logistic Regression Implementation
Now, let’s implement the logistic regression model. Recall that logistic regression tries to learn the parameters for the following model:
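In standard notation, and assuming a weight vector w and a bias term b (symbols the text above does not fix), the logistic regression model can be written as:

```latex
P(y = 1 \mid x) = \sigma\left(w^\top x + b\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Appending a constant feature of 1 to each sample folds the bias b into the weight vector.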
To train the model, we first randomly initialize the model parameters. Then we use gradient descent to evolve towards parameter values that minimize the negative log likelihood of the labels given the data. We are implementing a regularized form of logistic regression to penalize excessively large weights.
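A minimal sketch of such a model in plain NumPy. The class layout, the hyperparameter values, and the helper method names are assumptions, not the original implementation, so the losses and weights it prints will differ from the output shown below.

```python
import numpy as np


class LogisticRegression:
    """L2-regularized logistic regression trained by batch gradient descent."""

    def __init__(self, lr=0.1, reg=1e-3, n_iter=2000, log_every=500):
        self.lr = lr                # learning rate (assumed value)
        self.reg = reg              # L2 penalty strength (assumed value)
        self.n_iter = n_iter
        self.log_every = log_every
        self.weights = None

    @staticmethod
    def _add_ones(X):
        # Append a dummy column of ones so the last weight acts as the bias.
        return np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def loss(self, Xb, y):
        # Regularized negative log likelihood, averaged over the samples.
        p = self._sigmoid(np.dot(Xb, self.weights))
        eps = 1e-12  # keeps log() away from zero
        nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        return nll + 0.5 * self.reg * np.sum(self.weights ** 2)

    def fit(self, X, y):
        Xb = self._add_ones(X)
        # Random initialization of the parameters.
        self.weights = np.random.normal(0, 1, size=(Xb.shape[1],))
        for i in range(self.n_iter):
            if i % self.log_every == 0:
                print("Loss at iteration", i, ":", self.loss(Xb, y))
            p = self._sigmoid(np.dot(Xb, self.weights))
            # Gradient of the regularized loss with respect to the weights.
            grad = np.dot(Xb.T, p - y) / Xb.shape[0] + self.reg * self.weights
            self.weights = self.weights - self.lr * grad
        return self

    def predict_proba(self, X):
        return self._sigmoid(np.dot(self._add_ones(X), self.weights))


# Small self-contained demo on two synthetic clusters (assumed means).
np.random.seed(0)
X = np.concatenate([np.random.normal(2.0, 1.0, size=(500, 2)),
                    np.random.normal(-2.0, 1.0, size=(500, 2))])
y = np.concatenate([np.zeros(500), np.ones(500)])

model = LogisticRegression().fit(X, y)
print("Model weights:", model.weights)
```

Calling `model.predict_proba` on a few rows from each cluster is one way to spot-check the fit: values near 0 are expected for the cluster labeled 0 and values near 1 for the cluster labeled 1.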
Loss at iteration 0 : 2.180406757477724
Loss at iteration 1000 : 0.035017553467819935
Loss at iteration 2000 : 0.02283794361201609
Loss at iteration 3000 : 0.018289092286958103
Loss at iteration 4000 : 0.01582273104056109
Loss at iteration 5000 : 0.014246603445271892
Loss at iteration 6000 : 0.013140231953032487
Loss at iteration 7000 : 0.012314780437898269
Loss at iteration 8000 : 0.011671959906342289
Loss at iteration 9000 : 0.011155203054815713
Model weights: [-2.08750961 -1.93529254 0.12816638]
After running gradient descent on the logistic regression model for 10,000 iterations, we should have learned sufficiently good model weights for our classification problem and we print the learned weights above. Notice that the data type for the weights is
We can do some test predictions with the trained model by taking a small subset of the training data.
We see that the model predicts values very close to 0 for samples from the 0 cluster and close to 1 for samples from the 1 cluster, which is what we expect.
2. Logistic Regression with
Now we can easily train the same model using mxnet.numpy. Notice that the code below that generates the data is identical to the snippet above. The only difference is the import statement, which now maps the np namespace to mxnet.numpy instead of
We don’t even need to change or redefine the LogisticRegression model class we created earlier. We can simply create a new model instance and fit the instance with the training data as
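A sketch of the import swap described here, assuming MXNet 1.6 or later, where `from mxnet import np, npx` followed by `npx.set_np()` exposes the NumPy-compatible interface. The fallback to plain NumPy is an addition so that the snippet still runs where MXNet is not installed.

```python
try:
    # The np namespace now maps to mxnet.numpy.
    from mxnet import np, npx
    npx.set_np()              # switch MXNet to NumPy-compatible semantics
    backend = "mxnet.numpy"
except Exception:
    # Fallback (not part of the original post): MXNet is missing or unusable.
    import numpy as np
    backend = "numpy"

# From here on the code is written once, against the np namespace.
a = np.ones((2, 3))
b = np.arange(3)
c = np.dot(a, b)

print(backend, type(c).__name__, c.shape)
```

Because all later code refers only to `np`, nothing else has to change when the backend is swapped.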
Loss at iteration 0 : 2.7279124
Loss at iteration 1000 : 0.03574349
Loss at iteration 2000 : 0.023593647
Loss at iteration 3000 : 0.019085938
Loss at iteration 4000 : 0.016654767
Loss at iteration 5000 : 0.015109067
Loss at iteration 6000 : 0.014029568
Loss at iteration 7000 : 0.013228249
Loss at iteration 8000 : 0.012607378
Loss at iteration 9000 : 0.0121107865
Model weights: [-1.993176 -2.043093 0.211259]
We can see that after fitting the model, we learn similar model weights as before, but now the weights are of type
Also, our model predicts the right classes consistent with the earlier model.
Using the GPU with mxnet.numpy
Serving as a drop-in replacement for numpy is only the beginning of what can be achieved with
mxnet.numpy also allows you to run expensive computations on the GPU with minimal syntax changes to your existing code. All you need to do is add a keyword argument specifying the GPU context everywhere you create an array.
Here we extend the logistic regression example above to use GPUs.
First, let’s make a few modifications to the data-generating process by adding ctx=npx.gpu() to every array initialization.
Next, we modify the LogisticRegression class so that every new array it creates is created on the GPU context. To do this, we add the ctx=npx.gpu() keyword to the weight initialization step and to the creation of the dummy column of ones.
In total we only need to change 3 lines.
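A sketch of what passing the context looks like, assuming the MXNet 1.x API in which array-creation functions accept a `ctx` keyword. The `pick_context` helper, the `npx.num_gpus()` check, and the CPU fallback are additions for illustration, so the snippet also runs on machines without a GPU or without MXNet.

```python
try:
    from mxnet import np, npx
    npx.set_np()
except Exception:
    # MXNet is missing or unusable here; the demo below is then skipped.
    np = npx = None


def pick_context():
    """Return a GPU context when one is visible, otherwise the CPU context."""
    return npx.gpu() if npx.num_gpus() > 0 else npx.cpu()


if npx is not None:
    ctx = pick_context()
    # Every newly created array is placed on the chosen context.
    weights = np.random.normal(0, 1, size=(3,), ctx=ctx)
    ones = np.ones((5, 1), ctx=ctx)
    print(weights.ctx, ones.ctx)
```

Arrays produced by operations on these inputs stay on the same context, which is why only the creation sites need the extra keyword.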
Loss at iteration 0 : 2.023619
Loss at iteration 1000 : 0.03632223
Loss at iteration 2000 : 0.023934785
Loss at iteration 3000 : 0.019356007
Loss at iteration 4000 : 0.016889626
Loss at iteration 5000 : 0.015321383
Loss at iteration 6000 : 0.014225139
Loss at iteration 7000 : 0.013410199
Loss at iteration 8000 : 0.012777602
Loss at iteration 9000 : 0.012270548
Model weights: [ 0.11582068 -2.0269437 -1.992209 ] @gpu(0)
[0.00135131 0.00256316] @gpu(0)
[0.978114 0.99988353] @gpu(0)
We can see that the learned weights are now created and stored on the GPU. With only minimal changes, we were able to convert an existing NumPy implementation to run on the GPU without breaking NumPy syntax.
3. Comparing performance on NumPy and
Another key feature of the MXNet NumPy interface is reduced latency, owing to its use of the optimized, asynchronous MXNet backend engine. Comparing the computation speed of common deep learning operations implemented in NumPy with their implementations in mxnet.numpy on both CPU and GPU, we observe significant speed-ups using
NumPy : Dotted two 4096x4096 matrices in 1.41 s.
mxnet.numpy : Dotted two 4096x4096 matrices in 0.48 s.
mxnet.numpy on GPU : Dotted two 4096x4096 matrices in 0.01 s.
Here we see that mxnet.numpy on the CPU is nearly 3 times as fast as NumPy at computing this matrix product (0.48 s versus 1.41 s), and mxnet.numpy with GPU acceleration is 2 orders of magnitude faster (0.01 s).
The speed benchmarks were performed on an Amazon p3.2xlarge instance with NumPy version 1.14.5 and mxnet-cu100mkl version 1.5.0.
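A sketch of how such a timing might be taken; it is not the original benchmark script. The helper name `time_dot` and its `sync` parameter are hypothetical. The point it illustrates is that an asynchronous backend returns before the work is done, so the timer has to wait for pending operations to finish.

```python
import time

import numpy as np


def time_dot(xp, n, sync=None):
    """Time one n x n matrix product using the array namespace `xp`.

    `sync` is an optional callable that blocks until pending work finishes;
    asynchronous backends need it for the timing to be meaningful.
    """
    a = xp.ones((n, n))
    b = xp.ones((n, n))
    if sync is not None:
        sync()                      # make sure the inputs exist before timing
    start = time.perf_counter()
    c = xp.dot(a, b)
    if sync is not None:
        sync()                      # wait for the product to be computed
    return time.perf_counter() - start, c


# 512 keeps the demo quick; the post reports results for 4096.
elapsed, product = time_dot(np, 512)
print("NumPy : Dotted two 512x512 matrices in %.2f s." % elapsed)
```

With MXNet installed, the same helper could be called with the `mxnet.numpy` module as `xp` and `npx.waitall` as `sync`.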
4. Using Automatic differentiation (Autograd) with
Autograd is the automatic differentiation library of MXNet. It traces the execution of a function and then performs reverse-mode automatic differentiation to compute the gradients. mxnet.numpy arrays and operators can be used in the autograd scope, so you can chain mxnet.numpy operations into arbitrary functions and automatically compute their gradients.
Here is a simple example
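A minimal sketch of what such an example might look like, assuming MXNet 1.6 or later. The function name `grad_of_2x_squared` is hypothetical, and the import guard is an addition so the snippet does not fail where MXNet is unavailable.

```python
try:
    from mxnet import autograd, np, npx
    npx.set_np()
except Exception:
    # MXNet is missing or unusable here; the demo below is then skipped.
    autograd = None


def grad_of_2x_squared(x):
    """Return df/dx for f(x) = 2 * x**2 using MXNet autograd."""
    x.attach_grad()              # allocate storage for the gradient
    with autograd.record():      # trace the operations applied to x
        y = 2 * x ** 2
    y.backward()                 # reverse-mode differentiation
    return x.grad


if autograd is not None:
    x = np.array([[1, 2], [3, 4]])
    print(grad_of_2x_squared(x))
```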
array([[ 4., 8.], [12., 16.]])
Here we have automatically computed the gradient of f(x) = 2x² with respect to x at x = [[1, 2], [3, 4]], correctly obtaining 4x = [[4, 8], [12, 16]].
This means you can now take existing NumPy functions, or create new functions with NumPy syntax, and apply automatic differentiation to compute their gradients with respect to some input. This gives numpy users a smooth transition to defining neural networks with NumPy syntax and to using the sophisticated gradient-based optimization techniques implemented in the mxnet.optimizer library to train these models.
For a tutorial on how to implement neural networks with mxnet.numpy, see the crash course tutorial.
5. Differences between
If you are already a regular user of MXNet, you might be wondering: “Didn’t MXNet already have a NumPy-like interface for manipulating multi-dimensional arrays or tensors?”, “What’s the difference between that library, mxnet.ndarray, and this one, mxnet.numpy?”, and “Is mxnet.ndarray going to be deprecated in favor of mxnet.numpy now?” The answer to that last question is no!
mxnet.ndarray was designed to be as close to NumPy as possible without negatively impacting performance, while maintaining interoperability with MXNet's symbolic API. As a result, some parts of NumPy syntax were omitted from mxnet.ndarray that are now supported in mxnet.numpy. Here are some of the differences between
- mxnet.numpy supports automatic broadcasting for arithmetic operators such as -, /, and +, similar to NumPy
- mxnet.numpy supports NumPy-style indexing and slicing whereas
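A short sketch of the NumPy-style broadcasting and slicing referred to above, assuming MXNet 1.6 or later. The fallback to plain NumPy is an addition so the snippet runs without MXNet; the same rules apply in both cases.

```python
try:
    from mxnet import np, npx
    npx.set_np()
except Exception:
    # Fallback (an addition): plain NumPy follows the same rules.
    import numpy as np

m = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])       # shape (3,)

# Automatic broadcasting: the row is stretched across both rows of m.
shifted = m + row

# NumPy-style indexing and slicing.
second_column = m[:, 1]
lower_right = m[1:, 1:]

print(shifted.shape, second_column.shape, lower_right.shape)
```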
MXNet has a new high-level interface that maintains full syntax similarity with NumPy and offers the following features:
- Automatic differentiation for training new or existing machine learning models with gradient descent using NumPy syntax.
- Accelerated tensor computation using GPUs with minimal changes to NumPy syntax.
- Asynchronous operator execution with the backend MXNet engine on CPU and GPU.
- Drop-in replacement for NumPy in existing machine learning projects so that those projects can take advantage of deep learning using MXNet.