How to run OpenCV on STM32 MCU

Денис Дерюгин
Jul 16, 2019


In this article I’ll describe how I got an OpenCV application running on the STM32F746G-Discovery and STM32F769I-Discovery boards. If you just want brief instructions on how to reproduce it, head to the corresponding wiki page on GitHub.

OpenCV example (Canny edge filter) on the STM32F769I-Discovery board

I’m one of the Embox developers. This RTOS lets you run some “heavy” Linux software (Qt, OpenGL, PJSIP, etc.) without the Linux kernel, so you need far fewer resources and get more direct control over peripherals. Running OpenCV on MCUs seems to be a popular request, but it looks like nobody has done it yet: there are some videos with names like “OpenCV + STM32”, but as far as I can see they use the STM32 board as a camera while the actual image processing is done on a desktop. So I decided to port OpenCV to the STM32F7Discovery board.

What’s the problem?

OpenCV has two major issues when you try to run it on an MCU:

  • Compiled code takes too much memory (~4MiB with minimum modules enabled)
  • OpenCV is written in C++, so you can’t just run it as bare-metal code (exceptions and libstdc++ are required)

Porting OpenCV to Embox

When you port something to a new platform, it’s a good idea to first build it from source in the normal way, i.e. compile it for a GNU/Linux system. With OpenCV that’s not a problem: the source code is available on GitHub, and it’s easy to build with cmake.
Good news: OpenCV can be statically linked, which makes it much easier to port to an MCU. Let’s build it with the default config and see how much code it produces:

> size lib/*so --totals
    text    data     bss      dec     hex filename
 1945822   15431     960  1962213  1df0e5 lib/libopencv_calib3d.so
17081885  170312   25640 17277837 107a38d lib/libopencv_core.so
10928229  137640   20192 11086061  a928ed lib/libopencv_dnn.so
  842311   25680    1968   869959   d4647 lib/libopencv_features2d.so
  423660    8552     184   432396   6990c lib/libopencv_flann.so
 8034733   54872    1416  8091021  7b758d lib/libopencv_gapi.so
   90741    3452     304    94497   17121 lib/libopencv_highgui.so
 6338414   53152     968  6392534  618ad6 lib/libopencv_imgcodecs.so
21323564  155912  652056 22131532 151b34c lib/libopencv_imgproc.so
  724323   12176     376   736875   b3e6b lib/libopencv_ml.so
  429036    6864     464   436364   6a88c lib/libopencv_objdetect.so
 6866973   50176    1064  6918213  699045 lib/libopencv_photo.so
  698531   13640     160   712331   ade8b lib/libopencv_stitching.so
  466295    6688     168   473151   7383f lib/libopencv_video.so
  315858    6972   11576   334406   51a46 lib/libopencv_videoio.so
76510375  721519  717496 77949390 4a569ce (TOTALS)

As the last line shows, the .bss and .data sections take less than 1 MiB each, while the code section is about 73 MiB. Of course, with static linking a particular application will pull in much less, but that is still far too much.

Now let’s try to make a minimal build. Run cmake .. -LA to list the available options and turn off as many of them as possible:

-DBUILD_opencv_java_bindings_generator=OFF \
-DBUILD_opencv_stitching=OFF \
-DWITH_PROTOBUF=OFF \
-DWITH_PTHREADS_PF=OFF \
-DWITH_QUIRC=OFF \
-DWITH_TIFF=OFF \
-DWITH_V4L=OFF \
-DWITH_VTK=OFF \
-DWITH_WEBP=OFF \
<...>

Section sizes:

> size lib/libopencv_core.a --totals
   text    data     bss     dec     hex filename
3317069   36425   17987 3371481  3371d9 (TOTALS)

Run OpenCV in QEMU

It’s better to start with an emulator than to go straight to actual hardware, so let’s use QEMU to run OpenCV on an emulated Integrator/CP board (it’s just a random ARM board with video support in QEMU).

A minimal working example that uses OpenCV looks like this:

version.cpp:

#include <stdio.h>
#include <opencv2/core/utility.hpp>

int main() {
    /* Print the OpenCV build configuration (version, target, enabled features) */
    printf("OpenCV: %s", cv::getBuildInformation().c_str());

    return 0;
}

This program prints some OpenCV info:

root@embox:/#opencv_version
OpenCV:
General configuration for OpenCV 4.0.1 =====================================
Version control: bd6927bdf-dirty

Platform:
Timestamp: 2019-06-21T10:02:18Z
Host: Linux 5.1.7-arch1-1-ARCH x86_64
Target: Generic arm-unknown-none
CMake: 3.14.5
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: Debug

CPU/HW features:
Baseline:
requested: DETECT
disabled: VFPV3 NEON

C/C++:
Built as dynamic libs?: NO
< other build info follows >

The next step is to run a basic example of actual image processing. There are some examples on the official website; I chose the Canny edge detector.

OpenCV supports Qt, GTK and Windows APIs for displaying images, but they are too heavy to run on MCUs, so I had to rewrite parts of this example to draw directly to the frame buffer. After some tinkering with OpenCV’s internal image formats, I got the following results:
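To give an idea of what “direct drawing to the frame buffer” can look like, here is a minimal sketch, not the actual code from the port. It assumes a 32-bit ARGB8888 frame buffer and a tightly packed 8-bit grayscale source image (which is what an edge map usually is); the names gray_to_argb8888 and blit_gray are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Expand one 8-bit grayscale value into an opaque 32-bit ARGB8888 pixel.
// ARGB8888 is an assumption; the real format depends on how the display
// controller is configured.
constexpr std::uint32_t gray_to_argb8888(std::uint8_t g) {
    return 0xFF000000u | (std::uint32_t{g} << 16) | (std::uint32_t{g} << 8) | g;
}

// Copy a tightly packed grayscale image into a frame buffer.
// fb_stride is the frame buffer width in pixels, fb_h its height in rows.
// Rows and columns that do not fit on the screen are clipped.
inline void blit_gray(const std::uint8_t *src, std::size_t src_w, std::size_t src_h,
                      std::uint32_t *fb, std::size_t fb_stride, std::size_t fb_h) {
    const std::size_t w = src_w < fb_stride ? src_w : fb_stride;
    const std::size_t h = src_h < fb_h ? src_h : fb_h;
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            fb[y * fb_stride + x] = gray_to_argb8888(src[y * src_w + x]);
        }
    }
}
```

With OpenCV, src would typically be the data pointer of a continuous single-channel 8-bit cv::Mat, and fb the address of the mapped frame buffer; both are assumptions about the surrounding code rather than something this sketch checks.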

Original image
Result of edge detection

Run OpenCV on STM32F7Discovery

The 32F746GDISCOVERY board has the following memory resources:

  1. 1 MiB of Flash memory
  2. 340 KiB of RAM
  3. 128-Mbit Quad-SPI Flash memory
  4. 128-Mbit SDRAM (64 Mbits accessible)
  5. Connector for microSD card

A microSD card can be used to store images, but it doesn’t help with the large code section.

The display resolution is 480x272, so a frame buffer takes 522 240 bytes (with 32-bit colors), i.e. it doesn’t fit into the 340 KiB of RAM. However, it’s possible to use SDRAM for the heap and the frame buffer; the rest of RAM is then used for other OS needs (stack, process resources, etc.).
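The arithmetic behind that number can be written down as a compile-time check. This is just a sketch: the function and constant names are invented for the example, and the sizes are the ones listed above.

```cpp
#include <cstddef>

// Bytes needed for a width x height frame buffer at the given color depth.
constexpr std::size_t framebuffer_bytes(std::size_t width, std::size_t height,
                                        std::size_t bits_per_pixel) {
    return width * height * bits_per_pixel / 8;
}

constexpr std::size_t kInternalRamBytes = 340 * 1024;         // 340 KiB of RAM
constexpr std::size_t kSdramBytes = 64 / 8 * 1024 * 1024;     // 64 Mbit accessible = 8 MiB

// 480 * 272 * 4 = 522 240 bytes: too big for internal RAM, fine for SDRAM.
static_assert(framebuffer_bytes(480, 272, 32) == 522240, "frame buffer size");
static_assert(framebuffer_bytes(480, 272, 32) > kInternalRamBytes, "does not fit RAM");
static_assert(framebuffer_bytes(480, 272, 32) < kSdramBytes, "fits SDRAM");
```

For comparison, the same function gives 261 120 bytes for a 16-bit color format.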

A minimal Embox config with OpenCV has the following section sizes:

text    data     bss     dec     hex filename
2876890 459208 312736 3648834 37ad42 build/base/bin/embox

A brief digression on sections: .text and .rodata contain instructions and constants (i.e. unmodifiable data), .data contains mutable data with initial values, and .bss contains “zeroed” variables, which are not actually stored in the kernel image but still occupy memory at run time.
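As a rough illustration of that digression, here is where a GCC-style toolchain would typically place a few declarations. The names are invented for the example, and the exact placement always depends on the compiler, its options and the linker script, so treat the comments as the usual case rather than a guarantee.

```cpp
#include <cstdint>

// Read-only constant data: typically .rodata (can stay in flash on an MCU).
constexpr std::uint8_t kSmoothKernel[3] = {1, 2, 1};

// Mutable variable with a non-zero initial value: typically .data.
// The initial value is stored in the image and copied to RAM at startup.
std::uint32_t frames_processed = 1;

// Zero-initialized buffer: typically .bss. It takes no space in the image,
// but it does occupy RAM at run time.
std::uint8_t line_buffer[480];

// Function bodies are machine code and go to .text when they are emitted.
constexpr std::uint32_t kernel_sum() {
    return kSmoothKernel[0] + kSmoothKernel[1] + kSmoothKernel[2];
}
```

Running size on an object file built from this snippet is an easy way to see which counter each declaration actually changes on a given toolchain.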

.data and .bss are OK, they will surely fit into RAM/SDRAM, but .text (about 2.7 MiB) is too large for the 1 MiB of flash memory where the code is placed.

The only way to handle such a large code section is to use the QSPI flash memory. It has a memory-mapped mode, which allows read-only access via the system bus, so it’s possible to place code there. However, there are a few problems with it:

  1. QSPI is not accessible right after reboot, i.e. it’s necessary to perform some software initialization before executing code from this memory.
  2. You can’t flash it with openocd and gdb as usual.
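Since code in QSPI only becomes reachable after initialization, a loader may want to sanity-check that an address really belongs to the memory-mapped QSPI window before jumping there. The sketch below assumes the STM32F7 memory map, where the QUADSPI region starts at 0x90000000, and the 128-Mbit (16 MiB) chip listed above; the names are hypothetical and this is not taken from the Embox sources.

```cpp
#include <cstdint>

// Assumption: QUADSPI memory-mapped region base address on STM32F7.
constexpr std::uint32_t kQspiBase = 0x90000000u;
// 128 Mbit = 16 MiB of QSPI flash.
constexpr std::uint32_t kQspiSize = 128u / 8u * 1024u * 1024u;

// True if the range [addr, addr + len) lies entirely inside the QSPI window.
// Written to avoid overflow when addr + len would wrap around.
constexpr bool in_qspi_window(std::uint32_t addr, std::uint32_t len) {
    return addr >= kQspiBase && len <= kQspiSize &&
           addr - kQspiBase <= kQspiSize - len;
}
```

For example, in_qspi_window(0x90000000u, 4) holds, while an address in the internal flash such as 0x08000000 is rejected.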

Eventually I decided to write a small bootloader which receives data from the host computer via TFTP and writes it to QSPI using stm32cube functions.
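The core of such a loader is a loop that writes the received data to flash in chunks that never cross a page boundary. Below is a simplified sketch of that loop, not the real bootloader: the 256-byte page size is an assumption about the flash chip, write_page is a hypothetical callback standing in for whatever the vendor library provides, and erasing the target sectors beforehand is left out.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPageSize = 256;  // assumed flash page size in bytes

// Size of the next chunk: up to the end of the current page,
// or whatever is left if that is less.
constexpr std::size_t next_chunk(std::uint32_t addr, std::size_t remaining) {
    return remaining < kPageSize - addr % kPageSize
               ? remaining
               : kPageSize - addr % kPageSize;
}

// Write len bytes starting at flash offset addr, page by page.
// write_page(addr, data, n) is a hypothetical callback that programs
// n bytes within one page and returns true on success.
template <typename WritePage>
bool write_image(std::uint32_t addr, const std::uint8_t *data, std::size_t len,
                 WritePage write_page) {
    while (len > 0) {
        const std::size_t n = next_chunk(addr, len);
        if (!write_page(addr, data, n)) {
            return false;  // stop on the first failed page
        }
        addr += static_cast<std::uint32_t>(n);
        data += n;
        len -= n;
    }
    return true;
}
```

With these assumptions, a 512-byte TFTP data block written at a page-aligned offset turns into exactly two page writes.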

Results

Finally, it works! However, it takes too much time: 40 seconds to process and draw the image.

Then I tried to run the same program on the STM32F769I-Discovery board with a slightly different config (it uses other pins for UART and the like). This board has 2 MiB of flash memory, so with -O2 everything works fine without the QSPI trick, and the image is processed in 3 seconds.

I hope this article will help you run your own OpenCV-based projects. Feel free to create issues in the Embox repository or to email me if you need some help.
