Published in Analytics Vidhya

How to Convert Your Keras Model to ONNX

Converting your Keras model to ONNX format with some explanation

I love Keras for its simplicity: in about 10 minutes, I can build a deep learning model with its sequential or functional API in elegant code. However, Keras loads its models very slowly, and sometimes I have to use another deep learning framework because of system constraints, or simply because my boss tells me to. Although I could convert my Keras model to other frameworks with someone else's script, I convert my model to ONNX to try out its claimed interoperability across AI tools.

What is ONNX?

ONNX is an abbreviation of "Open Neural Network Exchange". Created by Facebook and Microsoft, ONNX aims to be an open format for representing deep learning models, so that we can move models between frameworks with ease.

Converting Your Keras Model to ONNX

  1. Download the example code from my GitHub
  2. Download the pre-trained weights from here
  3. Type the following commands to set up
$ conda create -n keras2onnx-example python=3.6 pip
$ conda activate keras2onnx-example
$ pip install -r requirements.txt
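The requirements.txt in the repo pins the packages the example needs. A hypothetical sketch of its contents, assuming the package versions listed in the experiment section of this post:

```text
keras==2.2.4
tensorflow==1.13.1
keras2onnx
onnxruntime
```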

4. Run this command to convert the pre-trained Keras model to ONNX

$ python

5. Run this command to run inference with ONNX Runtime

$ python 3_001_0.bmp

It should output the following messages at the end:

1.000 3
0.000 37
0.000 42
0.000 14
0.000 17

Some Explanations

The conversion script converts a Keras .h5 model to ONNX format, i.e., a .onnx file. Its code is shown below:

There are some points to note when converting a Keras model to ONNX:

  1. Remember to import the onnx and keras2onnx packages.
  2. The keras2onnx.convert_keras() function converts the Keras model to an ONNX object.
  3. The onnx.save_model() function saves the ONNX object into a .onnx file.

The inference script runs a fish image through the ONNX model. I paste its code here:

And there are some points to note when inferencing:

  1. Remember to import the onnxruntime package.
  2. The onnxruntime.InferenceSession() function loads the ONNX model.
  3. The run() function in line 34 predicts the image and returns the predicted result. What's more, taking the 0-th element of the returned list gives the prediction as a numpy matrix.
  4. np.squeeze(pred_onnx) squeezes the numpy matrix into a numpy vector, i.e., removes the 0-th dimension, so that we can get the probability of each class.

Inference Time

Total Inference Time (load model + inference)

Some reports say ONNX runs faster in both model loading time and inference time. Hence, in this section I run an inference time experiment on my laptop.

The hardware for this experiment is shown below:

  • CPU: Core i5–3230M
  • RAM: 16GB

The software and packages are shown here:

  • OS: CentOS 7.6
  • Programming language: Python 3.6
  • Packages:
      • keras 2.2.4
      • tensorflow 1.13.1
      • onnxruntime

Each inference runs three times to reduce the error from other factors, e.g., context switching.

For inferencing with Keras, my computer runs with the following results:

$ time python 3_001_0.bmp
# run for three times
real 0m37.801s
user 0m37.254s
sys 0m1.590s
real 0m35.558s
user 0m35.838s
sys 0m1.362s
real 0m36.444s
user 0m36.542s
sys 0m1.418s

The inference time is about (37.801+35.558+36.444)/3 = 36.60 seconds (rounded to the second digit after the decimal point).

On the contrary, inferencing with ONNX Runtime shows:

$ time python
# run three times
real 0m2.576s
user 0m2.919s
sys 0m0.759s
real 0m2.530s
user 0m2.931s
sys 0m0.700s
real 0m2.560s
user 0m2.944s
sys 0m0.710s

Whoa! What a huge improvement! The inference time is about (2.576+2.530+2.560)/3 = 2.56 seconds.

The code for inferencing with Keras can be found in my GitHub repo.

Inference Time (only inference)

Edit: A friend of mine said I should compare only the inference time between Keras and ONNX, because in practice we load the model once. As a result, I will test only the inference time between Keras and ONNX, split into two parts:

1. Keras (with TensorFlow installed by pip) v.s. ONNX

2. Keras (with TensorFlow installed by conda) v.s. ONNX

Of course, I wrote a script to run the comparison test, which is shown below:

You can see that I run each inference method 10 times and take the average time, and I repeat that three times to reduce the error.
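That timing loop can be sketched with the standard library alone; average_inference_time is a hypothetical helper name, and infer_fn stands for either the Keras or the ONNX prediction call:

```python
import time


def average_inference_time(infer_fn, n_runs=10):
    """Call infer_fn n_runs times and return the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    return (time.perf_counter() - start) / n_runs
```

Running this helper once per framework and printing the result three times reproduces the shape of the comparison output below.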

Keras (with TensorFlow installed by pip) v.s. ONNX

The comparison is shown below:

$ python
Keras inferences with 0.8759469270706177 second in average
ONNX inferences with 0.3100883007049561 second in average
Keras inferences with 0.8891681671142578 second in average
ONNX inferences with 0.313812255859375 second in average
Keras inferences with 0.9052883148193359 second in average
ONNX inferences with 0.3306725025177002 second in average

We find that Keras inference needs (0.88+0.89+0.91)/3 = 0.89 seconds, while ONNX inference needs (0.31+0.31+0.33)/3 = 0.32 seconds. That's a speedup ratio of 0.89/0.32 = 2.78x between ONNX and Keras.

Keras (with TensorFlow installed by conda) v.s. ONNX

Wait a minute! pip install tensorflow installs TensorFlow without Intel processor optimizations. So let's remove TensorFlow first and then install it via conda (the version I install is 1.13.1).

Then run again:

$ python
Keras inferences with 0.9810404300689697 second in average
ONNX inferences with 0.604683232307434 second in average
Keras inferences with 0.8862279415130615 second in average
ONNX inferences with 0.6059059381484986 second in average
Keras inferences with 0.9496192932128906 second in average
ONNX inferences with 0.5927849292755127 second in average

We find that Keras takes (0.98+0.89+0.95)/3 = 0.94 seconds to inference, while ONNX spends (0.60+0.61+0.59)/3 = 0.60 seconds. That's a speedup of 0.94/0.60 = 1.57x. Interestingly, both Keras and ONNX become slower after installing TensorFlow via conda.


In this post, I gave an introduction to ONNX and showed how to convert your Keras model to an ONNX model. I also demonstrated how to make predictions with an ONNX model. Hope you enjoy this post!

If you have any thoughts or questions to share, please contact me at clh960524[at] Also, you can check my GitHub repositories for my other work. If you are, like me, passionate about machine learning, image processing and parallel computing, feel free to add me on LinkedIn.




Lu-Hsuan Chen

Enthusiastic of image processing, machine learning, and parallel computing. Current status: beggar on the street.
