Build ONNX Runtime from Source on Windows 10

Deep Learning Framework (ONNX Runtime) for Python and C++ on CPU and GPU

Ibrahim Soliman
ViTrox-Publication
9 min read · Jan 18, 2021


[ONNX Runtime] Build from Source on Windows (Python & C++) (CPU, GPU)

Welcome to the second tutorial on building deep learning frameworks from source. This series aims to offer a step-by-step guide for anyone struggling with the compilation of deep learning frameworks from source on Windows.

You can access our previous tutorial on Building TensorFlow from Source on Windows for C++ and Python (CPU and GPU), where we discussed some of the situations that require building DL frameworks from source and how to build TensorFlow 2 from source.

What is ONNX Runtime?

ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. It enables acceleration of machine learning inferencing across all of your deployment targets using a single set of APIs. ONNX Runtime automatically parses through your model to identify optimization opportunities and provides access to the best hardware acceleration available [1].

In short, ONNX Runtime is a remarkable cross-platform framework for DL model inference and deployment, with a wide range of supported APIs, architectures, and hardware accelerators, as shown in the figure below.

ONNX Runtime supported OS, API, Architecture, and Hardware accelerator.

Supported Hardware Accelerator

ONNX Runtime is capable of accelerating ONNX models using different hardware-specific libraries (Execution Providers). ONNX Runtime also provides developers with a convenient way to integrate a new hardware-specific Execution Provider.

ONNX Runtime Execution Providers [2]
  1. Default CPU: Uses MLAS + Eigen. The Microsoft Linear Algebra Subprogram (MLAS) library is a minimal BLAS-like library developed by Microsoft that implements optimized linear algebra operations, such as general matrix multiplication (GEMM), in low-level code with support for various processors. Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, geometrical transformations, numerical solvers, and related algorithms [3].
  2. CUDA: Compute Unified Device Architecture, a toolkit developed by NVIDIA for parallel computing and for accelerating applications on the GPU. It is a software layer that gives software engineers direct access to and control of the GPU's instruction set.
  3. TensorRT: An NVIDIA SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications [4].
  4. OpenVINO: Open Visual Inference and Neural network Optimization, a toolkit for deep learning model optimization and deployment developed by Intel for Intel platforms such as CPUs, integrated GPUs, Intel Movidius VPUs, and FPGAs.
  5. DirectML: Direct Machine Learning is a low-level API for hardware-accelerated machine learning primitives. It specializes in GPU acceleration across hardware from different vendors such as AMD, Intel, NVIDIA, and Qualcomm.
  6. DNNL & MKL-ML: Both libraries are developed by Intel, where DNNL is Math Kernel Library for Deep Neural Networks and MKL-ML is a performance-enhancing library for accelerating deep learning frameworks on Intel architecture. Recently both libraries have been combined in a new library called oneDNN.

ONNX Runtime provides many other execution providers in preview releases, such as ACL (Arm Compute Library), ArmNN, AMD MIGraphX, NNAPI (Android Neural Networks API), NUPHAR (Neural-network Unified Preprocessing Heterogeneous Architecture), RKNPU (Rockchip NPU), and Xilinx's Vitis AI.

Next, we will walk through the procedure of building ONNX Runtime from source on Windows 10 for Python and C++ using different execution providers (default CPU and GPU CUDA) in detail.

Steps:

  1. Prerequisites Installation.
  2. Build ONNX Runtime Wheel for Python 3.7.
  3. Install and Test ONNX Runtime Python Wheels (CPU, CUDA).
  4. Build ONNX Runtime NuGet Packages for C++.
  5. Install and Test ONNX Runtime C++ API (CPU, CUDA).

Step 1. Prerequisites Installation

  1. Git Installation
  2. Visual Studio 2019 Build Tools
  3. Python 3.7

Installation guides for the three prerequisites above can be found in our previous tutorial on Building TensorFlow from Source on Windows for C++ and Python (CPU and GPU) (Steps 1, 4, and 5).

4. CMake

CMake is a tool designed to build, test, and package software. CMake is used to control the software compilation process using simple platform- and compiler-independent configuration files, and to generate native makefiles and workspaces that can be used in the compiler environment of your choice [5]. You can download it and start the installation process from this link. Don't forget to add CMake to your user environment PATH.

To verify CMake installation, open CMD, and type the following command:
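
    cmake --version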

The expected output from the CMD will be:

CMake Installation Verification

5. CUDA & cuDNN (Required for CUDA Execution Provider)

The installation process of CUDA is quite straightforward. Install CUDA v11.0 from here. Next, install cuDNN by downloading the archive from here: choose v8.0 for CUDA v11.0, unzip it, and move the cuDNN files as follows:

  1. [unzipped dir]\bin\ → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
  2. [unzipped dir]\include\ → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
  3. [unzipped dir]\lib\ → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib

To verify the CUDA installation, open CMD, and type the following command:
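
    nvcc --version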

The expected output from the CMD will be:

CUDA Installation Verification

Step 2. Build ONNX Runtime Wheel for Python 3.7

What is a Wheel File?

A WHL file is a package saved in the Wheel format, which is the standard built-package format used for Python distributions. It contains all the files for a Python install and metadata, which includes the version of the wheel implementation and specification used to package it [6].

First, we clone the ONNX Runtime source code from GitHub. Open CMD and type:
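
    :: clone with submodules (ONNX Runtime pulls in its third-party dependencies as submodules)
    git clone --recursive https://github.com/microsoft/onnxruntime
    cd onnxruntime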

  1. Build ONNX Runtime Default CPU Python Wheel
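
A build command along these lines (a sketch; run it from the cloned onnxruntime directory) does the job:

    .\build.bat --config Release --use_openmp --parallel --build_wheel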

The build instructions options include:

  • --config: build configuration, one of Release, RelWithDebInfo, or Debug.
  • --use_openmp: enable the OpenMP library for parallelism on multi-processor or shared-memory systems, for potential performance improvements.
  • --parallel: build in parallel to speed up compilation.
  • --build_wheel: build the ONNX Runtime Python wheel.

2. Build ONNX Runtime GPU Python Wheel with CUDA Execution Provider
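
Again as a sketch from the onnxruntime directory; the CUDA and cuDNN paths below assume the default v11.0 install location from Step 1 (the cuDNN files were copied into the CUDA directory, so both homes point there):

    .\build.bat --config Release --use_openmp --parallel --build_wheel ^
        --use_cuda ^
        --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" ^
        --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0"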

Additional options for the CUDA build include:

  • --use_cuda: enable the CUDA build.
  • --cuda_home: CUDA installation path.
  • --cudnn_home: cuDNN installation path.

The default output path is ./build/Windows/Release/Release/dist.

Congratulations, we have built the two targeted ONNX Runtime wheels with different execution providers. The expected files are shown below:

ONNX Runtime Build Results (Python Wheels)

Step 3: Install and Test ONNX Runtime Python Wheels (CPU, GPU CUDA).

In this section, we will install the Python wheels using pip in CMD. For each wheel, we will repeat three steps:

  1. Install an ONNX Runtime wheel
  2. Verify and test the installation
  3. Uninstall ONNX Runtime

Standard CPU

Open CMD and install the ONNX Runtime CPU wheel:
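
    :: run from the onnxruntime directory; check the dist folder for the exact wheel filename
    pip install build\Windows\Release\Release\dist\onnxruntime-<version>-cp37-cp37m-win_amd64.whl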

After installation, run the Python script below to verify the installation (we will reuse this script after each installation):
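
    # a minimal verification sketch
    import onnxruntime as ort

    print("ONNX Runtime version :", ort.__version__)
    print("Device               :", ort.get_device())               # "CPU" or "GPU"
    print("Available providers  :", ort.get_available_providers())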

The expected output will be:

Installation Verification of ONNX Runtime with CPU

Uninstall ONNX Runtime:
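
    pip uninstall onnxruntime -y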

GPU CUDA

Open CMD and install the ONNX Runtime GPU (CUDA) wheel:
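
    :: the CUDA build produces its own wheel in the same dist folder;
    :: check the folder for the exact filename
    pip install build\Windows\Release\Release\dist\onnxruntime_gpu-<version>-cp37-cp37m-win_amd64.whl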

After installation, run the python verification script presented above.

The expected output will be:

Installation Verification of ONNX Runtime with CUDA

Uninstall ONNX Runtime:
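
    :: use the package name reported by pip list if it differs
    pip uninstall onnxruntime-gpu -y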

Expected Error

Creating an inference session from an ONNX model fails with "Failed to load library, error code:126".

This error means some DLLs are missing. Please make sure the CUDA installation directory (the bin folder that holds the CUDA and cuDNN DLLs) is present in your system environment PATH.
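
For example, for the current CMD session (assuming the default CUDA v11.0 install path from Step 1):

    set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;%PATH%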

Step 4: Build ONNX Runtime NuGet Packages for C++

What is NuGet?

NuGet is a package manager designed to enable developers to share reusable code. A NuGet package is a single ZIP file with the .nupkg extension that contains compiled code (DLLs), other files related to that code (headers), and a descriptive manifest that includes information like the package’s version number [7].

First, we delete the build directory from the ONNX Runtime directory to clean up our previous Python builds.
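
    :: run from the onnxruntime directory
    rmdir /s /q build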

Second, we use the same build command as in Step 2, but replace --build_wheel with --build_nuget, add --skip_tests, and select Visual Studio 2019 by adding --cmake_generator "Visual Studio 16 2019".

  1. Build ONNX Runtime Default CPU NuGet Package
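
For example (a sketch; run from the onnxruntime directory):

    .\build.bat --config Release --use_openmp --parallel --build_nuget --skip_tests --cmake_generator "Visual Studio 16 2019"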

2. Build ONNX Runtime GPU NuGet Package with CUDA Execution Provider
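
    :: same CUDA/cuDNN path assumptions as the GPU wheel build in Step 2
    .\build.bat --config Release --use_openmp --parallel --build_nuget --skip_tests ^
        --cmake_generator "Visual Studio 16 2019" ^
        --use_cuda ^
        --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" ^
        --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0"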

The default output path is ./build/Windows/Release/Release/nuget-artifacts.

Congratulations, we have built the two targeted ONNX Runtime NuGet packages with different execution providers. The expected files are shown below:

ONNX Runtime Build Results (NuGet Packages)

How did we get four NuGet packages here?

ONNX Runtime builds both native and managed packages for each version, so what is the difference between managed and native packages?

Managed vs Native Code

Managed code is code whose memory is allocated and freed automatically, with garbage collection and other runtime services.

Native code is code whose memory is not "managed": memory is not freed automatically (you call C++'s delete or C's free yourself), and there is no reference counting or garbage collection.

We use the managed package for .NET applications and the native package for native C and C++ applications.

Step 5: Install and Test ONNX Runtime C++ API (CPU, CUDA)

We are going to use Visual Studio 2019 for this test. I created a C++ Console Application.

Step 1. Manage NuGet Packages in your Solution.

To install and manage NuGet packages, right-click on your Solution, open Manage NuGet Packages, and then add the ./build/Windows/Release/Release/nuget-artifacts directory as a new Package Source.

Step 2. Add the New NuGet Packages Path.

Now we can install each native package separately, run our test script, and then uninstall it.

Step 3. Install the ONNX Runtime Package for your Solution.

I prepared a simple C++ script to load the MobileNet V2 ONNX model:

First, including onnxruntime_cxx_api.h is required to use the ONNX Runtime C++ API. Next, I defined a USE_CPU preprocessor macro; you can change the macro according to the ONNX Runtime execution provider installed (choices: USE_CPU, USE_CUDA), and the corresponding execution provider header will be included automatically based on the macro. The script below only loads ONNX Runtime, its corresponding provider, and the MobileNet V2 model to validate our installation.
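
A minimal sketch of that script is shown below (assumptions: the model file mobilenetv2-7.onnx sits next to the executable, and the CUDA provider is appended through the C helper header shipped with the GPU package; adjust the path and macro to your setup):

    // Minimal installation test: load ONNX Runtime, the chosen execution provider,
    // and a MobileNet V2 model. Assumes mobilenetv2-7.onnx is next to the executable.
    #define USE_CPU            // change to USE_CUDA when the GPU package is installed

    #include <iostream>
    #include <onnxruntime_cxx_api.h>
    #ifdef USE_CUDA
    #include <cuda_provider_factory.h>   // ships with the CUDA (GPU) package
    #endif

    int main() {
        Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "installation-test");
        Ort::SessionOptions session_options;
        session_options.SetIntraOpNumThreads(1);

    #ifdef USE_CUDA
        // Append the CUDA execution provider on GPU device 0.
        OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
    #endif

        // Creating the session loads the model and validates headers, LIBs, and DLLs.
        Ort::Session session(env, L"mobilenetv2-7.onnx", session_options);
        std::cout << "Model loaded successfully, inputs: " << session.GetInputCount() << std::endl;
        return 0;
    }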

You may face some problems related to include headers, LIBs, and DLLs, because sometimes NuGet packages don't add the headers and DLLs automatically. Please add the headers and LIBs following the steps below:

  1. Specify in Project Properties -> C/C++ -> Additional Include Directories where the library headers are located.

2. Specify in Properties -> Linker the path where the libraries (.lib) are located and the name of the library. With this, Visual Studio is able to link the project properly.

3. Copy the (.dll) files from <application>\<package_path>\runtimes\win-x64\native to <application>\x64\<Config_name>\ or to any directory that is added to the environment PATH.
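
For example (a sketch; substitute the placeholders with your actual solution paths):

    xcopy "<application>\<package_path>\runtimes\win-x64\native\*.dll" "<application>\x64\<Config_name>" /Y /I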

Finally, we have finished building ONNX Runtime from source with different execution providers (default CPU and CUDA).

Thanks for reading, and stay tuned for my next articles…

References

  1. https://www.onnxruntime.ai/about.html
  2. https://github.com/microsoft/onnxruntime/tree/master/docs/execution_providers
  3. https://en.wikipedia.org/wiki/Eigen_(C%2B%2B_library)
  4. https://developer.nvidia.com/tensorrt
  5. https://cmake.org/
  6. https://fileinfo.com/extension/whl
  7. https://docs.microsoft.com/en-us/nuget/what-is-nuget
