Face Recognition with Arcface on TensorRT

楊亮魯
2 min read · Sep 25, 2019


In my previous post, Face Recognition with Arcface on Nvidia Jetson Nano,

I failed to run TensorRT inference on the Jetson Nano because PReLU was not supported in TensorRT 5.1.

But the channel-wise PReLU operator is supported as of TensorRT 6.0!

Prerequisite: make sure you can run the following line:

docker run --rm --gpus all nvcr.io/nvidia/tensorrt:19.09-py3 nvidia-smi

There are two parts in this article:

  • start container, build the arcface TensorRT engine
  • run the inference

Run the Container and Build the Arcface TensorRT Engine

# bash
git clone https://github.com/penolove/insightface.git -b eyeWitnessWrapper-with-tensorrt-example
cd insightface;
# download the arcface model from https://github.com/onnx/models/tree/master/vision/body_analysis/arcface
wget https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx
# start container
docker run \
--gpus all \
-v $PWD/:/insightface/ \
-ti nvcr.io/nvidia/tensorrt:19.09-py3 /bin/bash
# inside the container, install the prerequisites
cd insightface/
pip install -r requirements.txt
apt-get install -y libsm6 libxrender1 libxext-dev
pip install mxnet-cu101 # only needed for the speed comparison

Now let's convert the downloaded ONNX model into a TensorRT engine, arcface_trt.engine:

# python
import tensorrt as trt

batch_size = 1
TRT_LOGGER = trt.Logger()

def build_engine_onnx(model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30
        builder.max_batch_size = batch_size
        # Load the ONNX model and parse it in order to populate the TensorRT network.
        with open(model_file, 'rb') as model:
            if not parser.parse(model.read()):
                # surface parser errors instead of failing silently
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
        return builder.build_cuda_engine(network)

# path to the downloaded arcface model
onnx_file_path = './resnet100.onnx'

engine = build_engine_onnx(onnx_file_path)
engine_file_path = './arcface_trt.engine'
with open(engine_file_path, "wb") as f:
    f.write(engine.serialize())

Inference with the TRT Engine and Speed Comparison with MXNet

# inference with the TRT engine
python naive_detector.py --is_trt_engine --model arcface_trt.engine
# inference with the original MXNet model, which can be downloaded from
# https://github.com/deepinsight/insightface/wiki/Model-Zoo
python naive_detector.py
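Whichever backend runs the forward pass, the ArcFace resnet100 model from the ONNX model zoo expects aligned 112×112 RGB face crops as an NCHW float32 tensor. A minimal NumPy preprocessing sketch (the `preprocess` function and the BGR-input assumption are illustrative, not taken from naive_detector.py):

```python
import numpy as np

def preprocess(face_bgr):
    """Convert an aligned 112x112 BGR uint8 crop (as OpenCV loads images)
    into the (1, 3, 112, 112) float32 NCHW tensor ArcFace expects."""
    assert face_bgr.shape == (112, 112, 3)
    rgb = face_bgr[:, :, ::-1]          # BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))  # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis], dtype=np.float32)

# example with a dummy crop
dummy = np.zeros((112, 112, 3), dtype=np.uint8)
batch = preprocess(dummy)
print(batch.shape, batch.dtype)  # (1, 3, 112, 112) float32
```

For batched inference, stack several preprocessed crops along the first axis before copying them to the device.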

Registered faces:

From left to right, the faces are labeled 1~5.
The left result comes from the original MXNet model; the right one is from the TRT engine.
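Recognition itself then reduces to comparing the embedding of a detected face against the registered embeddings: the closest one by cosine similarity wins. A minimal sketch with toy vectors (the 512-d size matches ArcFace's output; the vectors here are random, just for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 512-d embeddings
rng = np.random.default_rng(0)
e1 = rng.normal(size=512)
e2 = e1 + 0.1 * rng.normal(size=512)  # slightly perturbed copy of e1
e3 = rng.normal(size=512)             # unrelated embedding

print(cosine_similarity(e1, e2))  # close to 1.0
print(cosine_similarity(e1, e3))  # close to 0.0
```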

Inferring the 5 faces 1000 times on my GTX 1070 takes:

  • TRT engine: 38 s with batch_size = 1
  • TRT engine: 22 s with batch_size = 5
  • MXNet: ~60 s with batch_size = 1
  • MXNet: ~29 s with batch_size = 5
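Assuming each run processes the same 5 faces 1000 times (5000 embeddings total), those timings translate into throughput like this:

```python
# wall-clock seconds from the benchmark above
runs = {
    ("trt", 1): 38.0,
    ("trt", 5): 22.0,
    ("mxnet", 1): 60.0,
    ("mxnet", 5): 29.0,
}
total_faces = 1000 * 5  # 1000 iterations x 5 faces

for (backend, bs), seconds in runs.items():
    print(f"{backend} batch_size={bs}: {total_faces / seconds:.0f} faces/s")
```

So the TRT engine comes out roughly 1.3-1.6x faster than MXNet at the same batch size on this card.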
