Fusing Data Transformations with Neural Network for Faster Inference and Simpler Production Deployments

Sandeep Krishnamurthy
Apache MXNet
Apr 11, 2019



Data pre-processing is a common step before feeding data into a model for training and inference. Gluon, MXNet’s imperative Python API, is most commonly used for training models. Gluon provides easy-to-use data transformation APIs that hide the difficulty of doing asynchronous data pre-processing during training. However, when deploying the model into production, you are still required to perform these data pre-processing steps. Often, you end up re-writing the data pre-processing operations in your inference code. It gets even harder and more time-consuming if you choose a non-Python language binding of MXNet for your inference code, such as the MXNet Java Inference APIs.

Fig 1 — Traditional model vs. end-to-end model

What if we had an end-to-end model that includes the data pre-processing, the neural network, and the data post-processing, all as part of the saved model files? The inference code would then simply load the saved model and run inference on the raw data, without having to re-write any of the pre-processing and post-processing operations!

In this blog post, we will show you how to export such an end-to-end model from MXNet Gluon and run inference with the MXNet Java Predictor APIs. Apart from simplifying the inference logic, we will also see from the benchmarks that an end-to-end model speeds up inference by up to 2X on a GPU machine and by ~20% on CPUs.

Exporting an end-to-end model from Gluon

MXNet Gluon provides a collection of pre-trained, state-of-the-art models via the Gluon Model Zoo. In this blog post, let us use Gluon’s built-in model zoo APIs to download and use a ResNet18_v1 model pre-trained on the ImageNet dataset.

“Resize”, “ToTensor” and “Normalize” are the most common pre-processing operations performed on image data. Let us fuse this data pre-processing pipeline with the ResNet18_v1 model and export an end-to-end model.
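The fusion and export step can be sketched with Gluon-CV’s `export_block` utility (listed in the references below), which can prepend the standard ImageNet pre-processing to the network before export. This is a minimal sketch, not the exact script from the original post; the file prefix matches the hosted model files.

```python
from gluoncv.model_zoo import get_model
from gluoncv.utils import export_block

# download a ResNet18_v1 pre-trained on ImageNet from the model zoo
net = get_model('resnet18_v1', pretrained=True)

# fuse the default ImageNet pre-processing with the network and save
# 'resnet18_v1_end_to_end-symbol.json' and 'resnet18_v1_end_to_end-0000.params';
# layout='HWC' means the exported model accepts raw HWC images rather than
# already-transformed CHW tensors
export_block('resnet18_v1_end_to_end', net, preprocess=True, layout='HWC')
```

With `preprocess=True`, the scaling and normalization happen inside the saved graph, so the inference code only needs to supply a resized raw image.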

NOTE: For your convenience, we have already exported the end-to-end model and hosted it on S3. You can skip the step above and download the hosted end-to-end model directly.

wget https://s3.us-east-2.amazonaws.com/mxnet-public/end_to_end_models/resnet18_v1_end_to_end-symbol.json
wget https://s3.us-east-2.amazonaws.com/mxnet-public/end_to_end_models/resnet18_v1_end_to_end-0000.params

Inference with MXNet Java Predictor

Let us download the Synset file, with the list of class names, and a sample image for the prediction in the next step.

wget https://s3.us-east-2.amazonaws.com/scala-infer-models/resnet-18/synset.txt
wget https://s3.amazonaws.com/model-server/inputs/Pug-Cookie.jpg

Next, let us use MXNet’s Java Predictor to load the end-to-end model we exported in the previous step and run inference on a demo image.

In the code below, observe that we do not perform any data pre-processing operations: we read the image directly and pass it on to the end-to-end model, which handles the data pre-processing followed by the neural network prediction.
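The Java snippet originally embedded here as a gist is not reproduced; purely as an illustration of the same flow, here is a minimal sketch using MXNet’s Python Module API. The file names match the downloads above; the input shape and the float cast are assumptions for a 224x224 HWC image.

```python
import mxnet as mx

# load the exported end-to-end model: symbol JSON + params at checkpoint 0
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet18_v1_end_to_end', 0)

mod = mx.mod.Module(symbol=sym, data_names=['data'], label_names=None)
# the fused model takes a raw HWC image; (1, 224, 224, 3) is an assumed shape
mod.bind(for_training=False, data_shapes=[('data', (1, 224, 224, 3))])
mod.set_params(arg_params, aux_params)

# read the raw image and resize it; no ToTensor/Normalize in the client code,
# the model performs those steps itself
img = mx.image.imread('Pug-Cookie.jpg')
img = mx.image.imresize(img, 224, 224).astype('float32')
mod.forward(mx.io.DataBatch([img.expand_dims(axis=0)]), is_train=False)
probs = mod.get_outputs()[0]
```

The Java Predictor code follows the same pattern: load the symbol and params, bind an input descriptor for the raw image shape, and call predict on the unprocessed image data.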

NOTE: If you are new to MXNet’s Java APIs, I recommend following this introductory blog post on MXNet’s Java APIs, which covers the setup and the APIs available for doing inference with MXNet’s Java binding.

The output of the prediction is an NDArray with the probabilities of the different classes. Below, we provide a utility function to convert the probabilities to class names, reading the class names from the Synset file.
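The utility itself is straightforward; here is a minimal sketch of the same post-processing in Python (the function names are ours, not from the original gist):

```python
def load_synset(path):
    # each line of synset.txt is one class name, e.g. "n02110958 pug, pug-dog"
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def top_classes(probabilities, synset, k=5):
    # sort class indices by probability, highest first, and map them to names
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return [(synset[i], probabilities[i]) for i in order[:k]]
```

Here `probabilities` is the flattened output of the prediction (for an NDArray, something like `probs.asnumpy().flatten().tolist()`).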

Sample Image

Output Prediction

Class: n02110958 pug, pug-dog

Performance Benchmarks

Apart from simplifying model deployment, we also observed that an end-to-end model gives a noticeable performance improvement (~20%) when running inference on CPUs and a significant performance boost (~50%) when running inference on GPUs, compared to running the data pre-processing and the neural network prediction separately, as is commonly done today. To benchmark, we ran inference with synthetic data on a non-end-to-end and an end-to-end pre-trained ResNet-18 model. Below is a summary of the benchmarks.
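The full benchmark scripts are linked in the references; the measurement pattern itself can be sketched as a small timing harness (the names and the warm-up/iteration counts are our choices, not taken from the original scripts):

```python
import time

def benchmark(run_inference, warmup=10, iters=500):
    # warm-up runs let lazy initialization and caches settle before timing
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    # average latency per inference call, in milliseconds
    return (time.perf_counter() - start) / iters * 1000.0
```

Here `run_inference` would wrap a single forward pass on synthetic data; because MXNet executes asynchronously, the wrapped function must block until the output is actually ready (e.g. by calling `wait_to_read()` on the output) for the timing to be meaningful. Comparing the latency of the end-to-end model against pre-processing plus prediction run separately gives speed-up numbers like those above.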

MXNet’s Python Module APIs were used to run the benchmarks. We also ran the inference benchmarks with the MXNet Scala and Java inference APIs and observed a similar performance boost from fusing the data transformations into the model. You can find all the scripts and resources used for these benchmarks here — awslabs/deeplearning-benchmark

Conclusion

In this blog post, we saw how to create an end-to-end model in MXNet Gluon by fusing the data pre-processing with the neural network, and then used MXNet’s Java Predictor API to run predictions on the end-to-end model. With an end-to-end model, the inference code is simpler, as we no longer rewrite the data pre-processing operations during inference. We also saw in the benchmark results that, by fusing data pre-processing into the model, we gain a significant performance boost during inference on both CPU and GPU.

What Next?

At this point in time, MXNet supports data pre-processing operators for image classification use cases only. With the promising results we observed, there are multiple avenues of future development — object detection use cases, pre-processing operators for text use cases and more… Stay tuned…

Apache MXNet is an open source project, and all contributions are very welcome! See the MXNet community page for more details on how you can become part of the MXNet community.


References

  1. Apache MXNet Github Repository — https://github.com/apache/incubator-mxnet/
  2. Benchmark Scripts — https://github.com/awslabs/deeplearning-benchmark/tree/master/end_to_end_model_benchmark
  3. Gluon-CV Model Export Utility — https://gluon-cv.mxnet.io/api/utils.html?highlight=export%20block#gluoncv.utils.export_block
  4. Image Classification with MXNet Scala Inference API — https://medium.com/apache-mxnet/image-classification-with-mxnet-scala-inference-api-8ab6ce1bbccf
  5. MXNet Java Inference APIs — https://medium.com/apache-mxnet/introducing-java-apis-for-deep-learning-inference-with-apache-mxnet-8406a698fa5a
  6. AWS C5.2X instance details — https://aws.amazon.com/ec2/instance-types/c5/
  7. AWS P3.2X instance details — https://aws.amazon.com/ec2/instance-types/p3/


Working on making Deep Learning accessible to all developers. Excited about the confluence of agriculture and technology.