Testing nebullvm, the open-source AI inference accelerator, on TensorFlow, Pytorch and Hugging Face

EDIT — NEW RELEASE: nebullvm is evolving, and with the new release it has become more powerful, able to accelerate models by more than 20 times! Nebullvm now supports optimization techniques beyond deep learning compilers (e.g., quantization, half precision, distillation, sparsity) while remaining easy to use. Learn more about the new release at https://github.com/nebuly-ai/nebullvm/releases


You may have read about nebullvm on some blogs: the open-source library that optimizes AI models to speed up inference.

The real question is, does nebullvm really achieve what it claims to do? Does it really reach ~10x inference acceleration of deep learning models just by adding a few lines to your code?

Let’s test its full potential.

An intro to the open-source library

Nebullvm takes an AI model as input and accelerates it by leveraging a technology called deep learning compilers. The library outputs an optimized version of the model that performs inference 5–20 times faster, where acceleration depends strongly on the input model and the hardware on which the optimization is performed.

As stated on GitHub, the goal of nebullvm is to let any developer benefit from deep learning (DL) compilers without having to spend tons of hours understanding, installing, testing and debugging this powerful technology.
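To give a concrete picture of the "few lines of code" claim, here is a minimal sketch of optimizing a PyTorch model. The `optimize_torch_model` entry point and its parameters follow the project README at the time of writing and may differ in newer releases, so treat this as an illustration and check the GitHub documentation for the current API.

```python
# Minimal sketch: optimizing a PyTorch model with nebullvm.
# Note: the function name and parameters below are based on the nebullvm README
# at the time of writing and may have changed in newer releases.
import torch
import torchvision.models as models
from nebullvm import optimize_torch_model

model = models.resnet34(pretrained=True).eval()

# nebullvm tries the deep learning compilers available on this machine and
# returns the fastest compiled version of the model it finds.
optimized_model = optimize_torch_model(
    model,
    batch_size=1,
    input_sizes=[(3, 224, 224)],  # one shape tuple per model input
    save_dir=".",                 # where the compiled artifacts are stored
)

# The optimized model is then used exactly like the original one.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    prediction = optimized_model(x)
```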

Testing nebullvm on your models

We suggest testing the library on your AI models right away by following the installation instructions on GitHub. If you want to get a first feel for the library's capabilities, or take a look at how nebullvm can be readily implemented in an AI workflow, we have built 3 tutorials and notebooks where the library can be tested on the most popular AI frameworks: TensorFlow, PyTorch and Hugging Face.

  • Notebook: Accelerate FastAI’s Resnet34 with nebullvm
  • Notebook: Accelerate PyTorch YOLO with nebullvm
  • Notebook: Accelerate Hugging Face’s GPT2 and BERT with nebullvm

Benchmarks

We have also tested nebullvm on popular AI models and hardware from leading vendors.

  • Hardware: M1 Pro, Intel Xeon, AMD EPYC
  • AI Models: EfficientNet, Resnet, SqueezeNet, Bert, GPT2

At first glance, we can observe that acceleration varies greatly across hardware-model couplings. Overall, the library delivers consistently positive results, with most speedups ranging from 2x to 10x.

To summarize, the results are:

  • Nebullvm delivers positive acceleration on non-optimized AI models
  • Early results show poorer (yet still positive) performance on Hugging Face models. Support for Hugging Face has just been released, and improvements will be implemented in future versions
  • Nebullvm provides a ~2–3x boost on Intel hardware. These results are most likely explained by the already highly optimized implementation of PyTorch for Intel devices
  • The library also performs very well on Apple M1 chips
  • Across all scenarios, nebullvm stands out for its ease of use, allowing you to take advantage of deep learning compilers without having to spend hours studying, testing and debugging this technology

The table below shows the response time in milliseconds (ms) of the non-optimized model and the optimized model for the various model-hardware couplings, averaged over 100 experiments. It also displays the speedup provided by nebullvm, where speedup is defined as the response time of the non-optimized model divided by the response time of the optimized model (e.g., 120 ms unoptimized / 24 ms optimized = 5x speedup).
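As a reference for how such numbers can be reproduced, here is a small, hypothetical timing sketch (not the authors' actual benchmark script): it averages the forward-pass latency of a PyTorch model over 100 runs and computes the speedup as the ratio of baseline to optimized response time.

```python
# Hypothetical timing sketch: average response time over 100 runs and speedup
# as baseline time divided by optimized time. Illustrative only; this is not
# the benchmark script used to produce the table above.
import time
import torch

def mean_response_time_ms(model, x, n_runs=100):
    """Average forward-pass latency in milliseconds over n_runs."""
    with torch.no_grad():
        model(x)  # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / n_runs * 1000

# baseline_ms  = mean_response_time_ms(original_model, x)
# optimized_ms = mean_response_time_ms(optimized_model, x)
# speedup = baseline_ms / optimized_ms   # e.g. 120 ms / 24 ms -> 5x
```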

Hardware used for the experiment

  • M1 Pro → Apple M1 Pro 16GB of RAM
  • Intel Xeon → EC2 Instance on AWS — t2.large
  • AMD EPYC → EC2 Instance on AWS — t3a.large

Remarks on pre-optimized models

Nebullvm is benchmarked against models that have not already been optimized with another accelerator. If you are already using a deep learning compiler on your models, such as Apache TVM, TensorRT or OpenVINO, it is unlikely that you will get a 5–20x speedup with nebullvm over your pre-optimized model.

Even in this case, nebullvm can still be of great help thanks to its ease of use.

More about nebullvm

Full documentation on nebullvm is provided on GitHub. The main contributor to the library is Diego Fiori, with support from the GitHub community.

The library quickly grew to 800+ GitHub stars in just the first month after launch, and aims to continuously expand in performance and coverage. As mentioned on GitHub, the library aims to be:

  • Deep learning model agnostic. Nebullvm supports all the most popular architectures such as transformers, LSTMs, CNNs and FCNs.
  • Hardware agnostic. The library now works on most CPUs and GPUs and will soon support TPUs and other deep learning-specific ASICs.
  • Framework agnostic. Nebullvm supports the most widely used frameworks (PyTorch, TensorFlow and Hugging Face) and will soon support many more.
  • Secure. Everything runs locally on your machine.
  • Easy-to-use. It takes a few lines of code to install the library and optimize your models.
  • Leveraging the best deep learning compilers. There are many DL compilers that optimize the way your AI models run on your hardware, and it would take a developer countless hours to install and test them at every model deployment. The library does it for you!
