TinyYOLOv2 on ONNX

Ryo Shimizu
Oct 20, 2018

ONNX is an open model format for deep neural networks.

You can browse and use several robust pretrained models from the ONNX Model Zoo.

Microsoft announced “ONNX Runtime”, and it seems easy to use with pretrained models.

https://docs.microsoft.com/en-gb/python/api/overview/azure/onnx/intro?view=azure-onnx-py

But their sample code is terrible and makes no sense.

import onnxruntime as rt
sess = rt.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name
pred_onnx = sess.run(None, {input_name: X})

If you have basic knowledge of Python, you can see that this code snippet does not work.

Because nobody defines X.

I was upset when I saw this code!
And I am angry that Microsoft provides such a terrible code example.

So I wrote a small example of Tiny_YOLOv2 model prediction through ONNX Runtime.

This code is not the best, and not beautiful.

But it works.

Preparation

Before running the code, you should prepare an environment.

I prefer anaconda3-5.0.1 on pyenv. When I used anaconda3-5.0.0, I failed to install onnxruntime.

$ pyenv install anaconda3-5.0.1
$ pyenv local anaconda3-5.0.1
$ pip install onnxruntime

If you want to install onnxruntime-gpu instead, replace the package name in the pip command.

You also have to download the pretrained Tiny_YOLOv2 model from the ONNX Model Zoo.

$ wget https://www.cntk.ai/OnnxModels/tiny_yolov2/opset_1/tiny_yolov2.tar.gz
$ tar xvzf tiny_yolov2.tar.gz

You also have to download an image file for inference and rename it to “test.jpg”.

Then you can run this example code.

import numpy as np
import onnxruntime as rt
from PIL import Image, ImageDraw

# Load the model and look up the name of its input tensor
sess = rt.InferenceSession("tiny_yolov2/model.onnx")
input_name = sess.get_inputs()[0].name

# Read the image and turn it into a (1, 3, 416, 416) array
img = Image.open('test.jpg')
img = img.resize((416, 416))  # for tiny_yolov2
X = np.asarray(img)
X = X.transpose(2, 0, 1)
X = X.reshape(1, 3, 416, 416)

# Run inference; the first output of the first batch item is (125, 13, 13)
out = sess.run(None, {input_name: X.astype(np.float32)})
out = out[0][0]

numClasses = 20
anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

def sigmoid(x, derivative=False):
    return x*(1-x) if derivative else 1/(1+np.exp(-x))

def softmax(x):
    x = np.asarray(x)
    # Subtracting the maximum does not change the result but avoids overflow
    scoreMatExp = np.exp(x - x.max())
    return scoreMatExp / scoreMatExp.sum(0)

# One color per class. The original table had only 19 entries, so the
# last class raised an IndexError; the final color was added to fix that.
clut = [(0,0,0),(255,0,0),(255,0,255),(0,0,255),(0,255,0),(0,255,128),
        (128,255,0),(128,128,0),(0,128,255),(128,0,128),
        (255,0,128),(128,0,255),(255,128,128),(128,255,128),(255,255,0),
        (255,128,128),(128,128,255),(255,128,128),(128,255,128),
        (0,128,128)]
label = ["aeroplane","bicycle","bird","boat","bottle",
         "bus","car","cat","chair","cow","diningtable",
         "dog","horse","motorbike","person","pottedplant",
         "sheep","sofa","train","tvmonitor"]

draw = ImageDraw.Draw(img)
# 13x13 grid cells, 5 anchor boxes per cell, 25 channels per anchor box
for cy in range(0, 13):
    for cx in range(0, 13):
        for b in range(0, 5):
            channel = b*(numClasses+5)
            tx = out[channel  ][cy][cx]
            ty = out[channel+1][cy][cx]
            tw = out[channel+2][cy][cx]
            th = out[channel+3][cy][cx]
            tc = out[channel+4][cy][cx]

            # Box center in pixels (each grid cell is 32x32 pixels)
            x = (float(cx) + sigmoid(tx))*32
            y = (float(cy) + sigmoid(ty))*32

            # Box size in pixels, scaled by the anchor of this box
            w = np.exp(tw) * 32 * anchors[2*b  ]
            h = np.exp(th) * 32 * anchors[2*b+1]

            confidence = sigmoid(tc)
            classes = np.zeros(numClasses)
            for c in range(0, numClasses):
                classes[c] = out[channel + 5 + c][cy][cx]
            classes = softmax(classes)
            detectedClass = classes.argmax()

            if 0.5 < classes[detectedClass]*confidence:
                color = clut[detectedClass]
                # Move from the box center to its top-left corner
                x = x - w/2
                y = y - h/2
                draw.line((x  , y  , x+w, y  ), fill=color)
                draw.line((x  , y  , x  , y+h), fill=color)
                draw.line((x+w, y  , x+w, y+h), fill=color)
                draw.line((x  , y+h, x+w, y+h), fill=color)

img.save("result.png")

How does it work?

First, ONNX Runtime needs a session object.
In this case, we only need an inference session. You have to give it the path of the pretrained model.

sess = rt.InferenceSession("tiny_yolov2/model.onnx")

ONNX Runtime needs the name of the input tensor. So get it.

input_name = sess.get_inputs()[0].name
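The objects returned by get_inputs() carry more than the name: each one also has shape and type attributes. As a minimal sketch, the helper below collects all three for every input, which is one way to find out what the tensor passed to run() should look like. The helper name describe_inputs is made up for this illustration and is not part of ONNX Runtime.

```python
def describe_inputs(sess):
    """List (name, shape, type) for each input of an InferenceSession-like object."""
    return [(arg.name, arg.shape, arg.type) for arg in sess.get_inputs()]

# Usage with a real session (assumes onnxruntime and the model file are available):
#   import onnxruntime as rt
#   sess = rt.InferenceSession("tiny_yolov2/model.onnx")
#   for name, shape, dtype in describe_inputs(sess):
#       print(name, shape, dtype)
```

For the Tiny_YOLOv2 model this should report a single float tensor input whose shape matches the (1,3,416,416) used in this article.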

Okay, let’s read an image file and reshape it.

img = Image.open('test.jpg')
img = img.resize((416, 416)) #for tiny_yolov2
X = np.asarray(img)
X = X.transpose(2,0,1)
X = X.reshape(1,3,416,416)

According to the YOLO documentation, the input shape of this network is (1,3,416,416).

So I resize the image to (416,416).

For an RGB image, the shape of np.asarray(img) is (416,416,3). So we need to transpose the axes with (2,0,1) to get (3,416,416).

And then the image data is reshaped to (1,3,416,416). Now we are ready for inference.

out = sess.run(None, {input_name: X.astype(np.float32)})
out = out[0][0]

Bravo, we solved the mystery of the secret parameter “X”.

It is a simple ndarray, but its type should be np.float32.
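The conversion steps can be wrapped in one small function. This is only a sketch: the name to_nchw is made up for this illustration, and a zero-filled array stands in for np.asarray(img) so the snippet runs without an image file. It also checks for exactly three channels, because an image with an alpha channel or a grayscale image would not fit the (1,3,416,416) shape (calling img.convert("RGB") before np.asarray is one way to handle that).

```python
import numpy as np

def to_nchw(image_hwc):
    """Convert an (H, W, 3) image array into a (1, 3, H, W) float32 tensor."""
    arr = np.asarray(image_hwc)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) array, got shape %s" % (arr.shape,))
    chw = arr.transpose(2, 0, 1)               # (H, W, C) -> (C, H, W)
    return chw[np.newaxis].astype(np.float32)  # add batch axis -> (1, C, H, W)

# Stand-in for np.asarray(img) after img.resize((416, 416))
dummy = np.zeros((416, 416, 3), dtype=np.uint8)
dummy[10, 20] = (1, 2, 3)                      # pixel at row 10, column 20
X = to_nchw(dummy)
print(X.shape, X.dtype)                        # (1, 3, 416, 416) float32
```

The transpose matters: reshaping a (416,416,3) array directly to (1,3,416,416) would give the right shape but mix the color channels into the pixel positions.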

After that, we have the result of inference. The rest of the code decodes it into bounding boxes and draws them on the image.
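The output for one image has shape (125,13,13): a 13x13 grid, and for each cell 5 anchor boxes with 25 values each (x, y, width, height, confidence, and 20 class scores). The sketch below repeats the arithmetic of the full listing as a function that returns boxes instead of drawing them. The function name decode and its parameters are made up for this illustration, and a hand-made output array is used in place of a real model result.

```python
import numpy as np

ANCHORS = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(out, num_classes=20, threshold=0.5, cell=32):
    """Turn a (125, 13, 13) output into a list of (x, y, w, h, class_id, score)."""
    boxes = []
    stride = num_classes + 5
    for b in range(len(ANCHORS) // 2):
        block = out[b * stride:(b + 1) * stride]   # 25 channels of anchor box b
        tx, ty, tw, th, tc = block[:5]
        logits = block[5:]
        # Softmax over the class axis, shifted by the maximum for stability
        e = np.exp(logits - logits.max(axis=0))
        probs = e / e.sum(axis=0)
        score = sigmoid(tc) * probs.max(axis=0)
        class_id = probs.argmax(axis=0)
        for cy, cx in zip(*np.where(score > threshold)):
            w = np.exp(tw[cy, cx]) * cell * ANCHORS[2 * b]
            h = np.exp(th[cy, cx]) * cell * ANCHORS[2 * b + 1]
            # Center of the box, then shifted to the top-left corner
            x = (cx + sigmoid(tx[cy, cx])) * cell - w / 2
            y = (cy + sigmoid(ty[cy, cx])) * cell - h / 2
            boxes.append((float(x), float(y), float(w), float(h),
                          int(class_id[cy, cx]), float(score[cy, cx])))
    return boxes

# Hand-made output: low confidence everywhere except one box in cell (6, 6)
out = np.zeros((125, 13, 13), dtype=np.float32)
out[4::25] = -10.0           # confidence channel of all 5 anchor boxes
out[4, 6, 6] = 10.0          # anchor box 0 in cell (6, 6) is confident
out[5 + 14, 6, 6] = 10.0     # class 14 is "person" in the label list
boxes = decode(out)
print(boxes)
```

Like the full listing, this keeps every box above the threshold, so one object can produce several overlapping boxes.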

If you want to know more, you should check out this article.

Thank you for reading.

Ryo
