Face Recognition with ArcFace on NVIDIA Jetson Nano (TensorRT 6.0)

A simple, naive demo with a 183 Club image

楊亮魯
Jul 18, 2019

ArcFace with MXNet

Prerequisites

  • make sure you have already set up at least 4 GB of swap
  • install MXNet and some required packages
# make sure pip is already installed
wget https://s3.us-east-2.amazonaws.com/mxnet-public/install/jetson/1.4.0/mxnet-1.4.0-cp36-cp36m-linux_aarch64.whl
sudo pip3 install mxnet-1.4.0-cp36-cp36m-linux_aarch64.whl
sudo pip3 install cython
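
A quick sanity check that the wheel installed correctly (this snippet is mine, not part of the original setup):

# verify the MXNet install; should print 1.4.0
import mxnet as mx
print(mx.__version__)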

In order to perform a simple demo, I forked the ArcFace repo and wrapped the module with eyewitness.

export CLONE_TAG=eyeWitnessWrapper
git clone -b ${CLONE_TAG} https://github.com/penolove/insightface
cd insightface
# please comment out the opencv-python line in requirements.txt, since cv2 is already installed on the nano
pip3 install -r requirements.txt

Run a detection example with the pre-chosen faces:

# detect faces with mtcnn, recognize faces with arcfaces
python3 naive_detector.py --gpu 0
[Training] The pre-chosen faces, labeled 1~5 from left to right
[Testing] The recognized faces, which show exactly the correct result
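
Under the hood, the recognition step amounts to embedding each detected face with ArcFace and matching it against the embeddings of the registered faces by cosine similarity. Below is a minimal sketch of that idea; the function names and the 0.4 threshold are illustrative, not the exact eyewitness wrapper API.

import numpy as np

def cosine_similarity(a, b):
    # arcface embeddings are compared by angle, so l2-normalize first
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

def recognize(query_embedding, registered_embeddings, threshold=0.4):
    # registered_embeddings: {label: 512-d embedding} built from the pre-chosen faces
    label, score = max(
        ((name, cosine_similarity(query_embedding, emb))
         for name, emb in registered_embeddings.items()),
        key=lambda pair: pair[1])
    return label if score >= threshold else 'unknown'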

Workaround: Using TensorRT instead of MXNet (failure)

No matter whether ArcFace uses the MobileNet or the ResNet backbone, both contain a layer called PReLU, which is not fully supported in TensorRT (the implementation can be found here).

The current PReLU in TensorRT does not yet support channel-wise PReLU.

An alternative way to solve this is to rewrite PReLU with ReLU:

prelu(x) = relu(x) - w * relu(-x)
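
To see why this rewrite is correct, here is a quick numpy check of the identity with one learned slope per channel (the shapes here are illustrative):

import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3, 3).astype(np.float32)  # feature map: (channels, h, w)
w = np.random.rand(4).astype(np.float32)         # one learned slope per channel

def relu(t):
    return np.maximum(t, 0)

def prelu(t, slope):
    # channel-wise prelu: the slope is broadcast over the spatial dims
    return np.where(t > 0, t, slope.reshape(-1, 1, 1) * t)

# the relu rewrite; note the channel-wise broadcast multiplication
rewritten = relu(x) - w.reshape(-1, 1, 1) * relu(-x)
assert np.allclose(prelu(x, w), rewritten)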

But!!! The broadcast multiplication (w times relu(-x), with w broadcast across the spatial dimensions) is also not supported in TensorRT.

I give up. (updated at 2019-07-18)

The comments in the following issue say that the next TensorRT version will support PReLU; I will continue to finish this article then.

Let’s TensorRT 6.0!!! (updated at 2019-12-25)

JetPack 4.3 was released on 2019-12-21, and it supports TensorRT 6.0 on the Jetson Nano. This section essentially replays this article. Also, even though these articles are written in English, most of the audience are Mandarin readers, so I wrote this section in both Mandarin and English.

The following steps were tested with jetpack 4.3.

[optional] Step 0: build the Docker image (then zone out while it builds)

Set up the swap:

# edit set_swap.sh
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon --show
chmod +x ./set_swap.sh
sudo ./set_swap.sh

Dockerfile: it takes a lot of time to compile MXNet (the compilation takes forever).

# Dockerfile
FROM nvcr.io/nvidia/l4t-base:r32.3.1
RUN apt-get update
RUN apt-get install -y git python3-pip cmake protobuf-compiler libprotoc-dev \
    libopenblas-dev gfortran libjpeg8-dev libxslt1-dev libfreetype6-dev \
    python3-pybind11 libatlas-base-dev graphviz build-essential \
    python3-opencv python3-scipy libopencv-dev
# requirements for insightface
RUN git clone https://github.com/penolove/insightface.git -b eyeWitnessWrapper-with-tensorrt-example
WORKDIR /insightface
RUN pip3 install Cython pycuda==2019.1
ENV DEBIAN_FRONTEND=noninteractive
RUN sed -i "s/opencv-python/#opencv-python/g" requirements.txt
RUN sed -i "s/scipy/#scipy/g" requirements.txt
RUN pip3 install -r requirements.txt
RUN wget https://s3.us-east-2.amazonaws.com/mxnet-public/install/jetson/1.4.0/mxnet-1.4.0-cp36-cp36m-linux_aarch64.whl
RUN pip3 install mxnet-1.4.0-cp36-cp36m-linux_aarch64.whl
# compile mxnet without CUDA, due to the image not installed cuda, if needed, you can download CUDA in image and compile by yourself.
WORKDIR /
RUN git clone https://github.com/apache/incubator-mxnet.git --branch v1.4.x --recursive
WORKDIR incubator-mxnet/
RUN cp make/config.mk .
RUN make -j2
WORKDIR /incubator-mxnet/python
RUN python3 setup.py install
RUN echo "export PYTHONPATH=/incubator-mxnet/python:$PYTHONPATH" >> ~/.bashrc
WORKDIR /insightface
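
With the Dockerfile ready, docker build -t penolove/jetson_nano_arcface . builds the image (the tag is my assumption, chosen to match the image pulled in Step 1); expect the MXNet compilation to dominate the build time.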

Step 1: pull/start the container

Mount the TensorRT Python library from your Nano into the container:

docker run --runtime nvidia -it -v /usr/lib/python3.6/dist-packages/tensorrt:/usr/lib/python3.6/dist-packages/tensorrt penolove/jetson_nano_arcface /bin/bash

Step 2: replay the article above (download and run everything needed)

Download the ArcFace pretrained model:

cd /insightface
# download the arcface model from https://github.com/onnx/models/tree/master/vision/body_analysis/arcface
wget https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx

Python script to build the ArcFace TensorRT engine:

# edit /insightface/build_engine.py
import tensorrt as trt

batch_size = 1
TRT_LOGGER = trt.Logger()

def build_engine_onnx(model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28
        builder.max_batch_size = batch_size
        # Load the ONNX model and parse it in order to populate the TensorRT network.
        with open(model_file, 'rb') as model:
            parser.parse(model.read())
        return builder.build_cuda_engine(network)

# the downloaded arcface model
onnx_file_path = './resnet100.onnx'

engine = build_engine_onnx(onnx_file_path)
engine_file_path = './arcface_trt.engine'
with open(engine_file_path, "wb") as f:
    f.write(engine.serialize())
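
One caveat worth knowing: build_cuda_engine returns None when the ONNX parse fails, so if the serialization step crashes with an AttributeError, checking parser.num_errors and parser.get_error(0) inside build_engine_onnx is the first thing to try (this check is not in the original script).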

Step 3: Inference

# inference with trt engine
python3 naive_detector.py --is_trt_engine --model arcface_trt.engine --gpu -1
# ...Predicting image with classifier
# ...Predicting image with classifier done in 0:00:00.677746
# output can be found in detected_image/183club/drawn_image_2.jpg
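
If you want to see what the engine does without going through naive_detector.py, the following is a minimal sketch of deserializing the engine and running one forward pass with pycuda (the version pinned in the Dockerfile). The 3x112x112 input and 512-d output are the standard ArcFace resnet100 shapes; the buffer boilerplate follows the usual TensorRT Python samples rather than the repo's exact code.

# sketch_infer.py - a rough sketch, not the repo's exact inference code
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger()

# deserialize the engine built in step 2
with open('./arcface_trt.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# allocate a host/device buffer pair for every binding (input image, output embedding)
host_bufs, dev_bufs, bindings = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_buf = cuda.pagelocked_empty(size, dtype)
    dev_buf = cuda.mem_alloc(host_buf.nbytes)
    host_bufs.append(host_buf)
    dev_bufs.append(dev_buf)
    bindings.append(int(dev_buf))

# a dummy aligned face; arcface resnet100 expects a 3x112x112 input
face = np.random.rand(3, 112, 112).astype(np.float32)
np.copyto(host_bufs[0], face.ravel())  # assumes binding 0 is the input

stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
stream.synchronize()

embedding = host_bufs[1]  # the 512-d face embedding
print(embedding.shape)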

Notice that the --gpu -1 option applies only to MXNet, since the MXNet here was compiled without CUDA; TensorRT still uses the GPU. The confusing interface is just the result of my laziness, and I'm too lazy to fix it. If you want to compare MXNet on GPU vs. TensorRT on GPU, please rebuild the Docker image above (install CUDA/cuDNN yourself and recompile MXNet with CUDA/cuDNN support); the MXNet script and model can be found in this article.
