MXNet boosts CPU performance with MKL-DNN

Published in Apache MXNet, Jan 16, 2019

Authors: Hagay Lupesko, Alexander Zai, Manu Seth

The Apache MXNet community is excited to announce that MXNet performance on CPUs is now dramatically improved through the integration of Intel MKL-DNN into the default MXNet build. The improvements are significant across a wide range of model architectures: inference latency and throughput improve by 3x to 35x, depending on the model architecture and CPU. More performance statistics are listed later in the post.

To benefit from these optimizations, grab the latest version of MXNet for Java or Scala. Alternatively, follow the default steps to build MXNet from source; no further customization, installation, or special steps are needed!

What is Apache MXNet

Apache MXNet is an open-source deep learning framework used to build, train, and deploy deep neural networks. MXNet abstracts much of the complexity involved in implementing neural networks, is highly performant and scalable, and offers APIs across popular programming languages such as Python, C++, Java, R, Scala, and more.

What is Intel MKL-DNN

Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) is an open-source library for high-performance deep learning on CPUs. The library accelerates neural network applications through vectorized and threaded building blocks, exposed via a C++ interface. The MKL-DNN source code is available on GitHub.

Much of MKL-DNN's optimization magic is achieved through data parallelism in the form of SIMD: Single Instruction, Multiple Data. Modern x86 processors support Advanced Vector Extensions (AVX), which extend the x86 instruction set with SIMD capabilities and enable the processor to perform an operation on multiple numbers at the same time, as opposed to one number at a time. The recent AVX-512 extensions, supported on Intel’s Skylake architecture, can operate on up to sixteen 32-bit floating-point numbers in a single instruction, since each 512-bit register holds 512 / 32 = 16 of them!
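To make the lane arithmetic concrete, here is a minimal Python sketch. It only counts how many vector instructions an element-wise pass would need in an idealized setting; it does not execute any SIMD code, and it ignores memory bandwidth, alignment, and everything else that affects real speed-ups. The helper names `simd_lanes` and `instructions_needed` are made up for this illustration and are not part of MXNet or MKL-DNN.

```python
import math


def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


def instructions_needed(n_elements: int, lanes: int) -> int:
    """Idealized count of instructions for one element-wise pass."""
    return math.ceil(n_elements / lanes)


avx512_lanes = simd_lanes(512, 32)  # 512-bit registers, 32-bit floats
avx2_lanes = simd_lanes(256, 32)    # 256-bit registers, 32-bit floats

n = 1024  # number of float32 values to process
print(avx512_lanes, instructions_needed(n, avx512_lanes))  # 16 64
print(avx2_lanes, instructions_needed(n, avx2_lanes))      # 8 128
print(instructions_needed(n, 1))                           # 1024 (scalar)
```

In this simplified model, going from scalar code to AVX-512 cuts the instruction count for 1024 floats from 1024 to 64. Real gains depend on the operator and on how well the data layout suits vectorization.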

Performance improvements at a glance

As part of the effort to integrate MKL-DNN into the default build, the MXNet community ran exhaustive inference benchmarks across Intel and AMD CPUs. For the full details of these benchmarks, visit the MXNet wiki.

Below are a few diagrams illustrating the inference performance improvements across a few model architectures, benchmarked on Intel Xeon Skylake-SP (AWS c5.18xlarge).

[Figure: MXNet inference benchmarks running on Intel Xeon]
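For readers who want to reproduce this kind of comparison on their own models, the sketch below shows one simple way to measure the two metrics discussed in this post: latency per batch and throughput in samples per second. It is not the harness used for the published benchmarks. The `benchmark`, `speedup`, and `fake_predict` names are hypothetical, and `fake_predict` is a stand-in workload that should be replaced with a real model's forward pass.

```python
import time
from typing import Callable, Tuple


def benchmark(predict: Callable[[int], None], batch_size: int,
              runs: int = 20, warmup: int = 3) -> Tuple[float, float]:
    """Return (mean latency in ms per batch, throughput in samples/s)."""
    # Warm-up iterations are excluded from timing.
    for _ in range(warmup):
        predict(batch_size)
    start = time.perf_counter()
    for _ in range(runs):
        predict(batch_size)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / runs * 1000.0
    throughput = batch_size * runs / elapsed
    return latency_ms, throughput


def speedup(baseline_latency_ms: float, accelerated_latency_ms: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_latency_ms / accelerated_latency_ms


def fake_predict(batch_size: int) -> None:
    """Stand-in workload; replace with a real model's forward pass."""
    sum(i * i for i in range(1000 * batch_size))


latency_ms, throughput = benchmark(fake_predict, batch_size=1)
print(f"latency: {latency_ms:.3f} ms, throughput: {throughput:.1f} samples/s")
```

Running the same function once with a baseline build and once with an accelerated build, then passing the two latencies to `speedup`, gives a figure comparable in spirit to the 3x to 35x range quoted above.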

Ensuring quality and accuracy

To ensure the MKL-DNN integration does not degrade quality and accuracy, MXNet contributors implemented a full suite of unit tests, reaching 100% coverage for the operators powered by MKL-DNN. These tests, alongside hundreds of other MXNet unit and integration tests, run multiple times a day with each MXNet build on the project’s CI/CD pipeline, ensuring that operators and models work as expected.

Beyond unit tests, MXNet contributors tested end-to-end accuracy on more than 30 deep learning models to verify that introducing MKL-DNN does not affect model accuracy. The results show that accuracy is not impacted by MKL-DNN acceleration, as detailed in the Inference Accuracy Benchmarks for MKL-DNN on the MXNet wiki.

Despite all of this testing, there may still be cases where developers using MXNet want to disable MKL-DNN acceleration. For that purpose, a new “MXNET_MKLDNN_ENABLED” environment variable was introduced. It is set to 1 by default and turns off MKL-DNN acceleration when set to 0. More details can be found in MXNet’s environment variables FAQ.
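As a rough illustration, the variable can be set from Python when launching an inference job. The sketch below builds a copy of the current environment with the flag set and would pass it to a child process, so the value is in place before that process starts (a conservative assumption about when the variable is read). Only the variable name comes from this post; the `mxnet_env` helper and the `run_inference.py` script name are hypothetical.

```python
import os


def mxnet_env(mkldnn_enabled: bool) -> dict:
    """Copy of the current environment with the MKL-DNN switch set."""
    env = dict(os.environ)
    # "1" keeps the default behavior; "0" turns MKL-DNN acceleration off.
    env["MXNET_MKLDNN_ENABLED"] = "1" if mkldnn_enabled else "0"
    return env


env = mxnet_env(mkldnn_enabled=False)
print(env["MXNET_MKLDNN_ENABLED"])  # 0

# Hypothetical usage: run an inference script with acceleration disabled.
# import subprocess, sys
# subprocess.run([sys.executable, "run_inference.py"], env=env, check=True)
```

Running the same script once with each setting is also a simple way to compare results with and without MKL-DNN when debugging a suspected accuracy or stability issue.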

Getting started with MXNet

To benefit from these performance improvements, as well as the rest of the MXNet goodness, head to the MXNet install page, choose your platform and language, and follow the steps outlined.

If you are just starting with MXNet and deep learning, we highly recommend the MXNet in 60 minutes blog post as a great way to get started. Enjoy!