Trueface Tutorials: Convert MXNet Models into High-Performance Inference Frameworks

Cyrus Behroozi · Trueface · Sep 27, 2019

Here at Trueface, we use MXNet to train our machine learning models. MXNet is a great framework for prototyping and training because of its robust and easy-to-use API, which allows developers of all skill levels to get started in just a few lines of code.

When it comes time to deploy our software on clients’ infrastructure, however, we are often constrained by the memory and CPU limitations of embedded systems. Our software therefore needs to be fast, lightweight, and dependency-free.

Unfortunately, MXNet is not the best candidate framework for this task. Not only is it comparatively slow at inference, it also requires many dependencies when built for speed (Intel MKL-DNN, among other libraries).

One solution to this problem is to convert these models to better-performing frameworks such as NCNN and TVM. Both of these frameworks have been optimized for inference (on CPU as well as GPU) and are packaged as self-contained static libraries.

When we first searched for ways to bridge the gap between MXNet and NCNN/TVM, the only instructions we found were either incomplete or required significant modification. We therefore created this end-to-end tutorial to guide those needing to convert MXNet models into more resource-friendly frameworks.

In this tutorial, we will be converting a 100-layer facial recognition ResNet model. To follow along, download the open-source LResNet100E-IR model from the InsightFace Github page. After extracting the files, you should have a params file and a json file. This tutorial has been written for Ubuntu.

MXNet to NCNN

NCNN is a high-performance neural network inference framework optimized for mobile platforms. It was designed from the start with mobile deployment in mind, has no third-party dependencies, is cross-platform, and, according to its authors, runs faster on mobile-phone CPUs than any other known open-source framework.

Download and build NCNN

Start by downloading the latest version of NCNN from its Github page. We will be using the mxnet2ncnn executable to perform the conversion. At the time of writing, the converter does not support the _copy operator, so we must add support for it using the fix discussed in this Github issue.

Navigate to ncnn/tools/mxnet, open mxnet2ncnn.cpp in your preferred editor, and add the _copy operator handling from that issue.

Next, build the executable by running mkdir build && cd build && cmake .. && make. You should now have an executable called mxnet2ncnn. While you are here, also take the time to build NCNN itself (we will need it later for inference). Build instructions can be found in the NCNN documentation.

Converting the models

Move the mxnet2ncnn executable to the directory in which you extracted the MXNet model files, and run the following command: ./mxnet2ncnn model-symbol.json model-0000.params. This will produce two new files: ncnn.bin and ncnn.param.

Performing inference with NCNN in C++

The inputs to this model must be 112x112 aligned images. In this tutorial, we will use pre-aligned images which are available for download on the LFW website.

The code below shows how inference with NCNN might look. It is a minimal sketch rather than production code: it assumes the network’s input blob is named data and that the 512-dimensional embedding is read from the fc1 output (the blob names used by the InsightFace models), and it uses OpenCV to load the pre-aligned face image.
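#include <cstdio>
#include <vector>

#include <opencv2/opencv.hpp>

#include "net.h" // NCNN

int main(int argc, char** argv)
{
    const char* imagepath = argc > 1 ? argv[1] : "face.jpg";

    // Load the files produced by mxnet2ncnn
    ncnn::Net net;
    net.load_param("ncnn.param");
    net.load_model("ncnn.bin");

    // Read a pre-aligned 112x112 face image
    cv::Mat img = cv::imread(imagepath);
    if (img.empty() || img.cols != 112 || img.rows != 112)
    {
        fprintf(stderr, "expected a 112x112 aligned image\n");
        return -1;
    }

    // Convert the BGR OpenCV image to an RGB ncnn::Mat.
    // Depending on how the model was exported, you may also need
    // in.substract_mean_normalize(...) here.
    ncnn::Mat in = ncnn::Mat::from_pixels(img.data, ncnn::Mat::PIXEL_BGR2RGB, img.cols, img.rows);

    // Feed the "data" blob and read the embedding from the "fc1" blob
    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("fc1", out);

    // Copy the embedding into a std::vector
    std::vector<float> embedding(out.w);
    for (int i = 0; i < out.w; i++)
        embedding[i] = out[i];

    printf("embedding dimension: %d\n", out.w);
    return 0;
}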

Be sure to link against the NCNN static library (libncnn.a) and OpenCV, and add the NCNN include directory to your project’s CMakeLists.txt.

MXNet to TVM

TVM is an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. It aims to close the gap between productivity-focused deep learning frameworks and performance- or efficiency-oriented hardware backends.

Download and build TVM

Start by cloning the TVM Github repository, then copy the config file into the build directory:

git clone --recursive https://github.com/dmlc/tvm
cd tvm
mkdir build
cp cmake/config.cmake build

Next, install the required dependencies:

sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake python3-pip libxml2-dev

Install one of the LLVM pre-built binaries; for this tutorial, I used LLVM 8.0.0. Once that is done, edit the config.cmake file you copied into build so that it points to the LLVM config:

set(USE_LLVM /path/to/your/llvm/bin/llvm-config)

Now build the TVM library:

cd build
cmake ..
make -j4

Next, we need to set the PYTHONPATH environment variable so Python can find the TVM library. Either export the variables directly in your shell or add them to your ~/.bashrc so the changes persist:

export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python:${PYTHONPATH}

Finally, install the python dependencies:

pip3 install --user numpy decorator attrs tornado psutil xgboost mxnet

Converting and auto-tuning the model

We could convert the model directly to TVM at this point. The conversion would succeed, but we would see warnings like the following:

Cannot find config for target=llvm -mcpu=skylake, workload=('conv2d', (1, 3, 112, 112, 'float32'), (64, 3, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.

As the warnings indicate, the fallback configuration can cause a significant performance regression. We can fix this by auto-tuning the model before compiling it to TVM.

The conversion script first loads the model, auto-tunes it, and then compiles it into TVM format. Note that the model files must follow the naming convention model-symbol.json and model-XXXX.params. Modify the target to match your CPU architecture, and set the path to your model directory where indicated.

Running this script will generate three artifacts: tvm.json, tvm.params, and tvm_lib.so. Note that this can take quite some time, as the tuning process is slow.

Performing inference with TVM in C++

Once again, the inputs to this model must be 112x112 aligned images. In this tutorial, we will use pre-aligned images which are available for download on the LFW website.

The code below shows how inference with TVM might look; we load tvm_lib.so at runtime through TVM’s graph runtime. As with the NCNN example, this is a minimal sketch: it assumes the input blob is named data, that the output is a 512-dimensional embedding, and that OpenCV is available for image loading.
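#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

#include <opencv2/opencv.hpp>

#include <dlpack/dlpack.h>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

int main(int argc, char** argv)
{
    const char* imagepath = argc > 1 ? argv[1] : "face.jpg";

    // Load the compiled operator library produced by the conversion script
    tvm::runtime::Module mod_syslib = tvm::runtime::Module::LoadFromFile("tvm_lib.so");

    // Read the graph json and the serialized parameters
    std::ifstream json_in("tvm.json");
    std::string json_data((std::istreambuf_iterator<char>(json_in)), std::istreambuf_iterator<char>());

    std::ifstream params_in("tvm.params", std::ios::binary);
    std::string params_data((std::istreambuf_iterator<char>(params_in)), std::istreambuf_iterator<char>());
    TVMByteArray params_arr;
    params_arr.data = params_data.c_str();
    params_arr.size = params_data.length();

    // Create a graph runtime for the CPU and load the weights
    int device_type = kDLCPU;
    int device_id = 0;
    tvm::runtime::Module mod =
        (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(json_data, mod_syslib, device_type, device_id);
    mod.GetFunction("load_params")(params_arr);

    // Prepare the input: a pre-aligned 112x112 image in NCHW float32 RGB layout.
    // Depending on how the model was trained, you may also need to normalize
    // the pixel values here.
    cv::Mat img = cv::imread(imagepath);
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    std::vector<float> input(3 * 112 * 112);
    for (int c = 0; c < 3; c++)
        for (int y = 0; y < 112; y++)
            for (int x = 0; x < 112; x++)
                input[c * 112 * 112 + y * 112 + x] = img.at<cv::Vec3b>(y, x)[c];

    int64_t in_shape[4] = {1, 3, 112, 112};
    DLTensor* x;
    TVMArrayAlloc(in_shape, 4, kDLFloat, 32, 1, device_type, device_id, &x);
    TVMArrayCopyFromBytes(x, input.data(), input.size() * sizeof(float));

    // Run the graph and fetch the 512-dimensional embedding
    mod.GetFunction("set_input")("data", x);
    mod.GetFunction("run")();

    int64_t out_shape[2] = {1, 512};
    DLTensor* y;
    TVMArrayAlloc(out_shape, 2, kDLFloat, 32, 1, device_type, device_id, &y);
    mod.GetFunction("get_output")(0, y);

    std::vector<float> embedding(512);
    TVMArrayCopyToBytes(y, embedding.data(), embedding.size() * sizeof(float));
    printf("first embedding value: %f\n", embedding[0]);

    TVMArrayFree(x);
    TVMArrayFree(y);
    return 0;
}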

In order to get this working, we have to compile tvm_runtime_pack.cc into our executable; this file can be found in tvm/apps/howto_deploy/. We will also need some of the header files from the TVM repository. In your CMakeLists.txt, add tvm_runtime_pack.cc as a source file, add the TVM, dlpack, and dmlc-core include directories to the include path, and link against dl and pthread.

Final Remarks

And there you have it: we have successfully converted our MXNet model to both NCNN and TVM. These lighter-weight models are ready to be integrated into resource-constrained environments without serious performance loss. Note that the TVM model does show a slight difference in accuracy, which is a result of the tuning step. We hope this tutorial is useful to engineers looking to move their models from training frameworks into real-world deployments.

Feel free to email us at support@trueface.ai with any questions or comments.
