Build ONNX Runtime from Source on Windows 10
Deep Learning Framework (ONNX Runtime) for Python and C++ on CPU and GPU
Welcome to the second tutorial on building deep learning frameworks from source. This series aims to offer a step-by-step guide for anyone struggling with compiling deep learning frameworks from source on Windows.
You can access our previous tutorial on Building TensorFlow from Source on Windows for C++ and Python (CPU and GPU). We have discussed some of the situations that require building DL frameworks from source and how to build TensorFlow 2 from source.
What is ONNX Runtime?
ONNX Runtime is an open-source project designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. It enables acceleration of machine learning inferencing across all of your deployment targets using a single set of APIs. ONNX Runtime automatically parses your model to identify optimization opportunities and provides access to the best hardware acceleration available [1].
In short, ONNX Runtime is a remarkable cross-platform framework for DL model inference and deployment, with a wide range of supported APIs, architectures, and hardware accelerators, as shown in the figure below.
Supported Hardware Accelerators
ONNX Runtime is capable of accelerating ONNX models using different hardware-specific libraries (Execution Providers). ONNX Runtime also provides developers with a convenient way to integrate a new hardware-specific Execution Provider.
- Default CPU: Uses MLAS + Eigen. Microsoft Linear Algebra Subprograms (MLAS) is a recently developed minimal BLAS library that implements optimized linear algebra operations, such as general matrix multiply (GEMM), in low-level languages with support for various processors. Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, geometrical transformations, numerical solvers, and related algorithms [3].
- CUDA: Short for Compute Unified Device Architecture, a toolkit developed by NVIDIA for parallel computing and high-performance application acceleration on the GPU. It is a software layer that gives software engineers direct access to and control of the GPU's instruction set.
- TensorRT: An SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications [4].
- OpenVINO: Short for Open Visual Inference and Neural network Optimization, a toolkit for deep learning model optimization and deployment developed by Intel for Intel platforms such as CPUs, integrated GPUs, Intel Movidius VPUs, and FPGAs.
- DirectML: Direct Machine Learning is a low-level API for hardware-accelerated machine learning primitives. It specializes in GPU acceleration across hardware vendors such as AMD, Intel, NVIDIA, and Qualcomm.
- DNNL & MKL-ML: Both libraries are developed by Intel. DNNL (formerly MKL-DNN, the Math Kernel Library for Deep Neural Networks) provides optimized deep neural network primitives, and MKL-ML is a performance library for accelerating deep learning frameworks on Intel architecture. Recently, both libraries have been combined into a new library called oneDNN.
ONNX Runtime provides many other execution providers in preview releases, such as ACL (Arm Compute Library), ArmNN, AMD MIGraphX, NNAPI (Android Neural Networks API), NUPHAR (Neural-network Unified Preprocessing Heterogeneous Architecture), RKNPU (Rockchip NPU), and Xilinx's Vitis AI.
Next, the procedure of building ONNX Runtime from source on Windows 10 for Python and C++ using different hardware execution providers (Default CPU, GPU CUDA) will be discussed in detail.
Steps:
- Prerequisites Installation.
- Build ONNX Runtime Wheel for Python 3.7.
- Install and Test ONNX Runtime Python Wheels (CPU, CUDA).
- Build ONNX Runtime Shared DLL Library for C++.
- Install and Test ONNX Runtime C++ API (CPU, CUDA).
Step 1. Prerequisites Installation
1. Git Installation
2. Visual Studio 2019 Build Tools
3. Python 3.7
The installation guide for the above three prerequisites can be found in our previous tutorial on Building TensorFlow from Source on Windows for C++ and Python (CPU and GPU) (Steps 1, 4, and 5).
4. CMake
CMake is a tool designed to build, test, and package software. CMake is used to control the software compilation process using simple platform- and compiler-independent configuration files, and to generate native makefiles and workspaces that can be used in the compiler environment of your choice [5]. You can download it and start the installation process from this link. Don't forget to add CMake to your user environment PATH.
To verify CMake installation, open CMD, and type the following command:
cmake --version
The expected output from the CMD will be:
5. CUDA & cuDNN (Required for CUDA Execution Provider)
The installation process of CUDA is quite straightforward. Install CUDA v11.0 from here. Next, install cuDNN by downloading the archive from here. Choose v8.0 for CUDA v11.0, unzip it, and move the cuDNN files as follows:
- [unzipped dir]\bin\ → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
- [unzipped dir]\include\ → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
- [unzipped dir]\lib\ → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib
To verify the CUDA installation, open CMD, and type the following command:
nvcc --version
The expected output from the CMD will be:
Step 2. Build ONNX Runtime Wheel for Python 3.7
What is Wheel File?
A WHL file is a package saved in the Wheel format, which is the standard built-package format used for Python distributions. It contains all the files for a Python install and metadata, which includes the version of the wheel implementation and specification used to package it [6].
First, we clone the ONNX Runtime source code from GitHub. Open CMD and type:
git clone --recursive https://github.com/Microsoft/onnxruntime
cd onnxruntime
git checkout v1.6.0
1. Build ONNX Runtime Default CPU Python Wheel
The build options include:
- --config: Release || RelWithDebInfo || Debug
- --use_openmp: enable the OpenMP library for parallel programming on multi-processor or shared-memory machines, for potential performance improvements.
- --parallel: enable parallel processing for the build process.
- --build_wheel: build the ONNX Runtime wheel for Python.
.\build.bat --config Release --build_wheel --parallel --use_openmp
2. Build ONNX Runtime GPU Python Wheel with CUDA Execution Provider
Additional CUDA build options include:
- --use_cuda: enable the CUDA build.
- --cuda_home: CUDA installation path.
- --cudnn_home: cuDNN installation path.
.\build.bat --config Release --build_wheel --parallel --use_openmp --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0"
The default output path is ./build/Windows/Release/Release/dist
Congratulations, we have built the two targeted ONNX Runtime wheels with different execution providers. The expected files are shown below:
Step 3: Install and Test ONNX Runtime Python Wheels (CPU, GPU CUDA).
In this section, we are going to install the Python wheels using pip and CMD. For each wheel, we will go through three steps:
- Install the ONNX Runtime version
- Verify and test the installation
- Uninstall ONNX Runtime
Standard CPU
Open CMD and install the ONNX Runtime wheel:
python -m pip install .\onnxruntime\build\Windows\Release\Release\dist\onnxruntime-1.6.0-cp37-cp37m-win_amd64.whl
After installation, run the Python script below to verify the installation (we will use the same script after each installation):
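A minimal verification script along these lines will do; it prints the build version, lists the available execution providers, and loads a local ONNX model to confirm that an inference session can be created (the MobileNet V2 path is just an example, any local ONNX model will work):
# verify_onnxruntime.py -- minimal installation check
import onnxruntime as ort

print("ONNX Runtime version:", ort.__version__)
print("Device:", ort.get_device())
print("Available providers:", ort.get_available_providers())

# Load any local ONNX model to confirm an inference session can be created
sess = ort.InferenceSession(r"D:\MobileNetV2.onnx")
print("Active providers:", sess.get_providers())
print("Model input:", sess.get_inputs()[0].name, sess.get_inputs()[0].shape)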
The expected output will be:
Uninstall ONNX Runtime:
python -m pip uninstall onnxruntime
GPU CUDA
Open CMD and install the ONNX Runtime GPU wheel:
python -m pip install .\onnxruntime\build\Windows\Release\Release\dist\onnxruntime_gpu-1.6.0-cp37-cp37m-win_amd64.whl
After installation, run the python verification script presented above.
The expected output will be:
Uninstall ONNX Runtime:
python -m pip uninstall onnxruntime-gpu
Expected Error
Creating an inference session from an ONNX model fails with "Failed to load library, error code: 126":
sess = onnxruntime.InferenceSession('D:\MobileNetV2.onnx')
Error: 2020-12-28 13:52:52.2441651 [E:onnxruntime:Default, provider_bridge_ort.cc:662 onnxruntime::ProviderLibrary::Get] Failed to load library, error code: 126
Segmentation fault
This error means some DLLs are missing. Please make sure that:
The CUDA installation directories (bin and lib) are present in your System Environment PATH, so the CUDA and cuDNN DLLs can be found.
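For example, you can check from CMD whether the CUDA and cuDNN runtime DLLs are found, and prepend the CUDA bin directory to PATH for the current session if they are not (the paths assume the default CUDA v11.0 install location used earlier):
where cudart64_110.dll
where cudnn64_8.dll
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;%PATH%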
Step 4: Build ONNX Runtime NuGet for C++
What is NuGet?
NuGet is a package manager designed to enable developers to share reusable code. A NuGet package is a single ZIP file with the .nupkg extension that contains compiled code (DLLs), other files related to that code (headers), and a descriptive manifest that includes information like the package's version number [7].
Firstly, we delete the build directory from the ONNX Runtime directory to clean our previous Python builds.
Secondly, we are going to use the same build command as in Step 2, but replacing --build_wheel with --build_nuget, adding --skip_tests, and using Visual Studio 2019 by adding --cmake_generator "Visual Studio 16 2019".
1. Build ONNX Runtime Default CPU NuGet Package
.\build.bat --config Release --build_nuget --parallel --use_openmp --skip_tests --cmake_generator "Visual Studio 16 2019"
2. Build ONNX Runtime GPU NuGet Package with CUDA Execution Provider
.\build.bat --config Release --build_nuget --parallel --use_openmp --skip_tests --cmake_generator "Visual Studio 16 2019" --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0"
The default output path is ./build/Windows/Release/Release/nuget-artifacts
Congratulations, we have built the two targeted ONNX Runtime NuGet packages with different execution providers. The expected files are shown below:
How do we get 4 NuGet packages here?
ONNX Runtime builds native and managed packages for each version, so what are the differences between managed and native packages?
Managed vs Native Code
Managed code is code whose memory is allocated and freed automatically, with garbage collection and other goodies.
Native code is code whose memory is not "managed": it is not freed automatically (think C++'s delete and C's free), and there is no reference counting and no garbage collection.
We use the Managed packages for .NET applications and the Native packages for native C and C++ applications.
Step 5: Install and Test ONNX Runtime C++ API (CPU, CUDA)
We are going to use Visual Studio 2019 for this test, in which I created a C++ Console Application.
To install and manage the NuGet packages, right-click your Solution, open Manage NuGet Packages, and add the ./build/Windows/Release/Release/nuget-artifacts directory as a new Package Source.
Now we can install each native package separately, run our test script, and uninstall it again.
I prepared a simple C++ script to load the MobileNet V2 ONNX model:
Firstly, including onnxruntime_cxx_api.h is required to use the ONNX Runtime C++ API. Next, I defined a preprocessor macro USE_CPU; you can change the macro name according to the ONNX Runtime execution provider installed (choices: USE_CUDA, USE_CPU). The execution provider header will be included automatically based on the macro name. The script below only loads ONNX Runtime, its corresponding provider, and the MobileNet V2 model to validate our installation process.
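A sketch of that validation program, assuming the header and provider-factory names match what the native NuGet packages ship, could look as follows:
// Minimal sketch of the validation program described above.
#include <iostream>
#include <onnxruntime_cxx_api.h>

#define USE_CPU                     // change to USE_CUDA when the GPU package is installed

#ifdef USE_CUDA
#include <cuda_provider_factory.h>  // assumed to ship with the GPU native package
#endif

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "onnxruntime_test");
    Ort::SessionOptions session_options;

#ifdef USE_CUDA
    // Attach the CUDA execution provider (device 0) before creating the session.
    Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0));
#endif

    // ONNX Runtime expects a wide-character model path on Windows.
    Ort::Session session(env, L"D:\\MobileNetV2.onnx", session_options);
    std::cout << "Model loaded successfully, inputs: " << session.GetInputCount() << std::endl;
    return 0;
}
If the model loads without throwing, the runtime DLLs and the selected execution provider were found correctly.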
You may face some problems related to include headers, LIBs, and DLLs because sometimes NuGet packages don't add headers and DLLs automatically. Please add the headers and LIBs following the steps below:
1. Specify in project properties -> C/C++ -> Additional Include Directories where the library headers are located.
2. Specify in properties -> Linker the path where the libraries (.lib) are located and the name of the library. With this, Visual Studio is able to link the project properly.
3. Copy the (.dll) files from <application>\<package_path>\runtimes\win-x64\native to <application>\x64\<Config_name>\ or to any directory that is added to the environment PATH (see the example after this list).
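As an alternative to copying the DLLs by hand, a post-build event along these lines can automate step 3, assuming the NuGet packages are restored into the solution's packages folder (the package folder name is a placeholder for whichever native package you installed):
xcopy /y /d "$(SolutionDir)packages\<native_package>\runtimes\win-x64\native\*.dll" "$(OutDir)"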
Finally, we have finished building ONNX Runtime from source with different execution providers (Default CPU, CUDA).
Thanks for reading, and stay tuned for my next articles…
References
1. https://www.onnxruntime.ai/about.html
2. https://github.com/microsoft/onnxruntime/tree/master/docs/execution_providers
3. https://en.wikipedia.org/wiki/Eigen_(C%2B%2B_library)
4. https://developer.nvidia.com/tensorrt
5. https://cmake.org/
6. https://fileinfo.com/extension/whl
7. https://docs.microsoft.com/en-us/nuget/what-is-nuget