OrangePi AiPro: review and guide

Anton Maltsev
9 min read · Jul 3, 2024


I don’t know the right name to call a review about this board. The most Chinese board? The most mysterious? The most controversial?
Anyway — it’s one of the most interesting!

Orange Pi AiPro

My current top Edge boards for Computer Vision / LLM are probably like this:

  1. Jetson Orin (Nano is weak, but the others are ok)
  2. Intel-based (don’t care what)
  3. Hailo / RockChip (I don’t know which is better)

The board we will discuss can probably compete for the fifth position (most likely with Qualcomm). But does it want to, especially outside of China?

Let’s talk about a Huawei-based board — the OrangePi Ai Pro. The company is banned in half of the world. But it’s funded by Chinese taxpayers’ money. And it makes decent products!

The image is from here. It’s from 2020; now the situation is a little different.

The board is produced only for the Chinese market and is a mystery to those who can’t speak Chinese. However, unlike ordinary Chinese boards, it is not terrible — well, almost.

Disclaimer

No one sent me the board. My hobby is getting weird boards, checking how well ML works on them, and then dragging them into production. I have worked on Blackfins and other exotic boards over the last 15 years. I was almost a pioneer with Jetsons in 2014 and with RockChips for Computer Vision about 3–4 years ago.

I make videos about all this stuff on my YouTube and write articles here.

I will rely on many other boards in this comparison. My opinion on many of them can be found in this 2022 review.

Purchase

The board is obviously not officially shipped outside of China, but you can get it on AliExpress. Some time ago, there was a lot of confusion with the OrangePi 5 Pro, but now every description is correct.

Chipset

The board is based on the Ascend processor (Ascend310B4).
The 310 series itself was launched in 2018. B4, it seems, was made two years ago (but this is not certain). A year ago, there was an official reference board — Atlas 200I DK A2. It already has Linux and Python.
But until now, everything that could be bought was around 500 euros, which is expensive. And OrangePi is cheaper.

The first quest — Documentation

The official site with documentation looks like this.

Nothing in English. And all files, as usual, can be found only on Baidu.

Right while I was writing this article, an English-language landing page appeared on the OrangePi website. But the download links are still empty.

The main guide can be found on several sites, for example, here. It is obviously in Chinese. There is a semi-official site. It doesn’t have everything, and not everyone can download from it.

There is a super useful site about the Atlas 200I DK A2. All the software is the same, but I haven’t tried taking the system image from there: it differs by about 1GB, so that’s the one thing I wouldn’t take.

Of course, the best approach is to download it from Baidu. But one of my subscribers from KAVABANGA.TEAM shared the current version with me, so here it is. But it may be outdated at some point.

Flashing

The documentation advises writing the image with Balena Etcher, but it didn’t work for me. In the end, I wrote the image using Raspberry Pi Imager. There is a tool on the Ascend website, but it’s in Chinese, so I haven’t tried it.

So, there are a lot of options here. There are also examples of flashing M.2 and eMMC.

Launching

Launching is obvious. I prefer SSH.

Passwords are here

Use the second one. Then, the working directory will be the folder with examples.

You can connect via USB, COM port, or the monitor :)

Running the first example

I really like the way the Chinese have been making examples lately. Even on boards that are almost unusable, a beautiful example that runs in 2 clicks is the usual thing nowadays:

  • Grove Vision AI (the board is more of a toy, but the example runs out of the box).
  • MilkV, a board for five bucks. One example works out of the box, and you’ll have to go through hell to run the rest.
  • MaiX-III. All the examples run from Python, but each network is a baked C++ binary with the network weights and code inside.

So. Two commands:

cd samples/notebooks/
./start_notebook.sh

(you can configure the notebook to log in from the outside, but I prefer to tunnel)

And that’s it. You can run a dozen samples from there!

yolov5
ResNet-50

This is the easiest and most convenient way for beginners to run neural networks on remote boards. A professional, of course, will connect via VSCode. But when you need to do something quickly and understand what is going on, it’s convenient to work directly.

This is where 99% of guides about any AI boards end. But ours is just beginning.

Let’s talk about how to export your model to the board!

Export

If you search the main documentation, you will not find the words ONNX, TensorFlow, or PyTorch.


It would seem that this should make me sad. But export is described in the official example. And it even seems to work right on the board:

Let’s pretend we can translate it without Google

This is rare. It seems that only Nvidia prefers converting on the board itself; almost all other manufacturers do the export on the host machine.

In practice, it doesn’t run, at least on an 8GB board. The problem is described here.

However, that solution does not work, maybe because the reference board is different.

In the example from OrangePi, you can see attempts to play with swap, but that didn’t work for me. The export process leads to a reboot, probably because of overheating. But more about that later.

In theory, something like this should work:

. /usr/local/Ascend/ascend-toolkit/set_env.sh
export PYTHONPATH=/usr/local/Ascend/thirdpart/aarch64/acllite:$PYTHONPATH
atc --model=yolov5s.onnx --framework=5 --output=yolov5s_bs1 \
    --input_format=NCHW --soc_version=Ascend310B4 --input_fp16_nodes="images"

Let’s leave this as homework for owners of 16GB boards.

Wow-wow-wow! After I finished the article, I found the fix. You need to switch to one compilation core. Then it takes 4x less memory, and everything works:

Finally, it works

This is the fix before the start:

export TE_PARALLEL_COMPILER=1
export MAX_COMPILE_CORE_NUMBER=1

Please note that OrangePi AiPro is an Ascend310B4 board, not Ascend310B1, as indicated in the examples. You can check this with the command “npu-smi info”:

I don’t think the Orange Pi team has read their own guide so far.

Real export

For us mere mortal peasants with 8GB boards, another way is available: install the CANN-Toolkit on the host machine and export through it. Note that the host must have at least 16GB of RAM; more is better.

  1. Download the latest toolkit. I originally downloaded version 6.2, and nothing worked. Please note!
  2. Install the dependencies. There are lots of them:
sudo apt-get update
sudo apt-get install -y gcc g++ make cmake zlib1g zlib1g-dev openssl libsqlite3-dev libssl-dev libffi-dev libbz2-dev libxslt1-dev unzip pciutils net-tools libblas-dev gfortran libblas3

Install Python3 + pip3 if they are not there yet. If they are, don’t forget:

pip3 install --upgrade pip

But for this part of the guide, Huawei should fire someone. You need to use Docker for this. Many hardware manufacturers already do: Rockchip started doing it this way, and Hailo does it this way.

pip3 install attrs
pip3 install numpy
pip3 install decorator
pip3 install sympy
pip3 install cffi
pip3 install pyyaml
pip3 install pathlib2
pip3 install psutil
pip3 install protobuf
pip3 install scipy
pip3 install requests
pip3 install absl-py

Without specified versions, without even a ready-made requirements.txt. At the very least, numpy <2.0 should be installed; 2.0 does not work. I made it work with 1.26.0. What will it be tomorrow?
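
Until Huawei ships one, here is a requirements.txt sketch. Only the numpy constraint is something I actually verified (1.26.0 works, 2.x does not); everything else is left unpinned because no versions are documented anywhere:

```text
# requirements.txt sketch for the CANN-Toolkit host environment.
# Only the numpy pin is verified; Huawei documents no versions.
attrs
numpy<2.0
decorator
sympy
cffi
pyyaml
pathlib2
psutil
protobuf
scipy
requests
absl-py
```

Then a single `pip3 install -r requirements.txt` replaces the twelve commands above.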

Now all that remains is:

chmod +x Ascend-cann-toolkit_6.2.RC2_linux-x86_64.run
./Ascend-cann-toolkit_6.2.RC2_linux-x86_64.run --install

And basically, everything is ready. Before exporting, you need to set up the environment:

source /home/ubuntu/Ascend/ascend-toolkit/set_env.sh
export LD_LIBRARY_PATH=/home/ubuntu/Ascend/ascend-toolkit/7.0.RC1/tools/ncs/lib64:$LD_LIBRARY_PATH
export PATH=/home/ubuntu/Ascend/ascend-toolkit/7.0.RC1/tools/ncs/bin/:$PATH

Different guides have slightly different values. This is what worked for me. It was installed in /home/ubuntu/Ascend/

It works with some warnings. But it works!

Don’t be afraid, it’s OK!
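
Whichever machine the export runs on, the resulting model expects exactly what the atc flags asked for: an NCHW fp16 tensor on a node named “images”. A minimal numpy preprocessing sketch for a 640×640 YOLOv5 input (the letterbox parameters are the usual YOLOv5 defaults; this is my sketch, not code from the Ascend samples):

```python
import numpy as np

def letterbox(img, new_shape=(640, 640), pad_value=114):
    """Resize keeping aspect ratio, pad to new_shape (YOLOv5-style)."""
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    nh, nw = round(h * r), round(w * r)
    # nearest-neighbour resize via index arrays (avoids a cv2 dependency)
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((new_shape[0], new_shape[1], 3), pad_value, dtype=img.dtype)
    top = (new_shape[0] - nh) // 2
    left = (new_shape[1] - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out, r, (left, top)

def preprocess(img_bgr):
    """HWC uint8 BGR -> NCHW fp16 RGB in [0, 1], matching the atc export flags."""
    img, _, _ = letterbox(img_bgr)
    img = img[:, :, ::-1]               # BGR -> RGB
    img = img.transpose(2, 0, 1)[None]  # HWC -> NCHW, add batch dim
    return (img / 255.0).astype(np.float16)

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
blob = preprocess(frame)
print(blob.shape, blob.dtype)  # (1, 3, 640, 640) float16
```

Feeding a tensor with the wrong layout or dtype is the classic way to get garbage outputs from a converted model, so it’s worth checking this before blaming the export.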

Impression

This is far from the worst board. The final guide is fairly simple and straightforward, with no magic.

But there is no ready-made guide anywhere. The system files have to be collected from all over the internet, even though I have a good idea of what to look for. And searching for bugs in Chinese, which Google can’t handle, is a nightmare.

I’d give it a 6–7 out of 10. It took me 5–6 hours to get through.

DISCLAIMER. From my experience with other boards: this guide works so far, but I’m sure it will be broken within half a year.

Layer Support

All the basic convolutional networks work pretty well, and even some transformers work (I checked Dinov2). The threshold is crossed where you need to work with text: YoloWorld does not work out of the box.

And work with LLMs is not quite clear. Neither Torch nor Torch.jit is supported. What is supported:

  • ONNX
  • TensorFlow
  • Caffe
  • MindSpore

But, as you realize, most LLMs don’t export well even to ONNX. So the limitation, I think, will be more at that level. Everything is okay with pure convolutions, but all the logic must be moved out of the model.

I only partially managed to export Whisper from here.

Let’s look at the speed

The board is good!

I measured everything in FP16. Let’s start with Yolov5:

preprocess time: 0.0060651302337646484
inference time:  0.04085516929626465
nms time:        0.0065386295318603516

Look at the comparison with other boards here (sorry, I haven’t added some new ones yet). These few boards are faster:

  • Jetson Orin
  • RK3588 in 12-thread mode (because of the tricky 3-core NPU). In single-thread mode, RK3588 is slower
  • Hailo-8 (but Hailo is int8 and very dependent on how fast the bus is). For example, a modern RPi Ai Kit will be slower.
  • MAIX-III (but it’s unusable and int8-only)
  • Intel processors (but modern enough)

At the same time, in terms of simplicity, usability, speed, price, and support, the only direct competitor is RK3588 (and I like it more, but more on that later).

A few other, useful networks:

ResNet-50         – 12ms
Dinov2S (224×224) – 46ms
Dinov2B (224×224) – 111ms

It’s really nice. For Dinov2, it’s much faster than RK3588. And yes, it’s okay that different platforms perform differently on different networks.

Important note: increasing the batch size does not lead to a speedup
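
Since batching doesn’t buy anything here, throughput is simply the reciprocal of latency. A quick sanity calculation from the (rounded) YOLOv5 timings above, assuming the three stages run sequentially in one thread:

```python
# Timings in seconds, rounded from the YOLOv5 measurements above.
t_pre, t_inf, t_nms = 0.00607, 0.04086, 0.00654

npu_fps = 1 / t_inf                     # NPU-only throughput
e2e_fps = 1 / (t_pre + t_inf + t_nms)   # end-to-end, single-threaded
print(f"NPU-only: {npu_fps:.1f} FPS, end-to-end: {e2e_fps:.1f} FPS")
# prints "NPU-only: 24.5 FPS, end-to-end: 18.7 FPS"
```

Pipelining the pre/postprocessing on the CPU while the NPU runs the next frame would recover most of the gap between the two numbers.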

Power consumption

I am a bit confused. Idle consumption is seven watts, and consumption with the NPU fully loaded is nine watts. It’s completely different from all the other boards I’ve tested.

Idle consumption

Temperature

The temperature is super strange as well. I tested this without the cooler. The board is super hot, even in idle mode, and the temperature stays almost the same during inference.

My general opinion on usability

Oh. This is where it gets interesting. I will speak against the background of everything I have dealt with.

When choosing a board, it is important to consider what it solves and where it can be used.

No one in Europe or the US will use Chinese boards in critical infrastructure (hospitals, smart cities, etc.). There, we use Jetsons, Hailo, and Intel. A few other boards could also work.

It is quite normal for private businesses to use Chinese boards. That’s why simple stuff (pet monitoring, smart stores, private parking lots, etc.) often use RockChips.

But there are problems with OrangePi AiPro:

  • It cannot be used everywhere.
  • Even in the countries where it can be used, adoption is a big question, and Huawei is not targeting them.
  • They don’t even try to write documentation in English.

And it seems that these three problems are enough to choose RockChips everywhere except for super large lots (there, you need to look at slightly different parameters).

You can find this article as a video on YouTube!
