How to run OpenCV on STM32 MCU

Jul 16, 2019

In this article I’ll describe how I got an OpenCV application running on the STM32F746G-Discovery and STM32F769I-Discovery boards. If you just want brief instructions on how to reproduce it, head to the corresponding wiki page on GitHub.

OpenCV example (Canny edge filter) on the STM32F769I-Discovery board

I’m one of the Embox developers. This RTOS allows you to run some “heavy” Linux software (Qt, OpenGL, PJSIP, etc.) without the Linux kernel, i.e. you need far fewer resources and get more direct control over peripherals. Running OpenCV on MCUs seems to be quite a popular request, but apparently nobody has done it yet: there are some videos with titles like “OpenCV + STM32”, but as far as I can see they use the STM32 board as a camera, while the actual image processing is done on a desktop. So I decided to port it to the STM32F7Discovery board.

What’s the problem?

OpenCV has two major issues when you try to run it on an MCU:

The compiled code takes too much memory (~4 MiB with the minimum set of modules enabled)
OpenCV is written in C++, so you can’t just run it as bare-metal code (it requires exception support and libstdc++)

Porting OpenCV to Embox

When you port something to a new platform, it’s a good idea to first build it from source the normal way, i.e. compile it for a GNU/Linux system. With OpenCV this is not a problem: the source code is available on GitHub, and it’s easy to build with CMake.
Good news: OpenCV can be statically linked, which makes it much easier to port to an MCU. Let’s build it with the default config and find out how much code it produces:

> size lib/*so --totals
    text    data     bss      dec      hex filename
 1945822   15431     960  1962213   1df0e5 lib/libopencv_calib3d.so
17081885  170312   25640 17277837  107a38d lib/libopencv_core.so
10928229  137640   20192 11086061   a928ed lib/libopencv_dnn.so
  842311   25680    1968   869959    d4647 lib/libopencv_features2d.so
  423660    8552     184   432396    6990c lib/libopencv_flann.so
 8034733   54872    1416  8091021   7b758d lib/libopencv_gapi.so
   90741    3452     304    94497    17121 lib/libopencv_highgui.so
 6338414   53152     968  6392534   618ad6 lib/libopencv_imgcodecs.so
21323564  155912  652056 22131532  151b34c lib/libopencv_imgproc.so
  724323   12176     376   736875    b3e6b lib/libopencv_ml.so
  429036    6864     464   436364    6a88c lib/libopencv_objdetect.so
 6866973   50176    1064  6918213   699045 lib/libopencv_photo.so
  698531   13640     160   712331    ade8b lib/libopencv_stitching.so
  466295    6688     168   473151    7383f lib/libopencv_video.so
  315858    6972   11576   334406    51a46 lib/libopencv_videoio.so
76510375  721519  717496 77949390  4a569ce (TOTALS)

As the last line shows, .data and .bss take less than 1 MiB each, while the code (text) is about 73 MiB (76510375 bytes). Of course, with static linking a particular application will pull in much less than that, but it is still far too much.

Now let’s try a minimal build. Call cmake .. -LA to list the available options and turn off as many of them as possible:

-DBUILD_opencv_java_bindings_generator=OFF \
-DBUILD_opencv_stitching=OFF \
-DWITH_PROTOBUF=OFF \
-DWITH_PTHREADS_PF=OFF \
-DWITH_QUIRC=OFF \
-DWITH_TIFF=OFF \
-DWITH_V4L=OFF \
-DWITH_VTK=OFF \
-DWITH_WEBP=OFF \
<...>

Section sizes of the resulting core library:

> size lib/libopencv_core.a --totals
   text    data     bss     dec     hex filename
3317069   36425   17987 3371481  3371d9 (TOTALS)

Run OpenCV in QEMU

It’s a good idea to start with an emulator before moving to real hardware, so let’s use QEMU to run OpenCV on an emulated Integrator/CP board (just an arbitrary ARM board with video support in QEMU).

Minimal working example of using OpenCV looks like this:

version.cpp:

#include <stdio.h>
#include <opencv2/core/utility.hpp>

int main() {
    // cv::getBuildInformation() returns OpenCV's build configuration summary
    printf("OpenCV: %s", cv::getBuildInformation().c_str());

    return 0;
}

This program prints some OpenCV info:

root@embox:/#opencv_version                                                     
OpenCV: 
General configuration for OpenCV 4.0.1 =====================================
  Version control:               bd6927bdf-dirty

  Platform:
    Timestamp:                   2019-06-21T10:02:18Z
    Host:                        Linux 5.1.7-arch1-1-ARCH x86_64
    Target:                      Generic arm-unknown-none
    CMake:                       3.14.5
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               Debug

  CPU/HW features:
    Baseline:
      requested:                 DETECT
      disabled:                  VFPV3 NEON

  C/C++:
    Built as dynamic libs?:      NO
< other build info follows >

The next step is to run a basic example of actual image processing. There are several examples on the official website; I chose the Canny edge detector.

OpenCV’s GUI backends (Qt, GTK and the Windows API) are too heavy to run on MCUs, so I had to rewrite parts of this example to draw directly to the frame buffer. After some tinkering with OpenCV’s internal image formats, I got the following result:
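The drawing code itself is not shown here. As a rough idea of what drawing directly to the frame buffer can involve, below is a minimal sketch that copies an 8-bit grayscale image (such as the edge map produced by cv::Canny, reachable through cv::Mat’s data, step, cols and rows members) into a 32-bit frame buffer. The function name draw_gray_to_fb is hypothetical, and the ARGB8888 pixel layout is an assumption; check the format your display driver actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copy an 8-bit grayscale image into a 32-bit frame
// buffer that is fb_width x fb_height pixels. Assumes ARGB8888 pixels
// (0xAARRGGBB). The image is drawn at the top-left corner and clipped if it
// is larger than the screen.
void draw_gray_to_fb(const std::uint8_t* gray, int cols, int rows,
                     std::size_t step,  // bytes per image row (may exceed cols)
                     std::uint32_t* fb, int fb_width, int fb_height) {
    assert(gray != nullptr && fb != nullptr);
    const int w = cols < fb_width ? cols : fb_width;
    const int h = rows < fb_height ? rows : fb_height;
    for (int y = 0; y < h; ++y) {
        const std::uint8_t* src = gray + static_cast<std::size_t>(y) * step;
        std::uint32_t* dst = fb + static_cast<std::size_t>(y) * fb_width;
        for (int x = 0; x < w; ++x) {
            const std::uint32_t v = src[x];
            // Replicate the gray level into R, G and B; alpha is fully opaque.
            dst[x] = 0xFF000000u | (v << 16) | (v << 8) | v;
        }
    }
}
```

With an 8-bit single-channel cv::Mat named edges, a call could look like draw_gray_to_fb(edges.data, edges.cols, edges.rows, edges.step, fb, 480, 272). Passing the row step rather than assuming cols bytes per row keeps it correct for matrices whose rows are wider than the visible image, such as a region of interest.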

Run OpenCV on STM32F7Discovery

The 32F746GDISCOVERY board has the following memory resources:

1 MiB of flash memory
340 KiB of RAM
128-Mbit Quad-SPI flash memory
128-Mbit SDRAM (64 Mbit accessible)
Connector for microSD card

The microSD card can be used to store images, but it doesn’t help with a large code section.

The display resolution is 480x272, so the frame buffer takes 522,240 bytes (with 32-bit colors), i.e. it doesn’t fit into RAM. However, it’s possible to use SDRAM for the heap and the frame buffer; the remaining RAM is used for other OS needs (stack, process resources, etc.).
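The frame buffer arithmetic can be checked at compile time. A small sketch: the constant names are only illustrative, the RAM figure is the board’s 340 KiB, and the 64 accessible Mbit of SDRAM is taken as 8 MiB.

```cpp
#include <cstddef>

// Frame buffer size in bytes: width x height x bytes per pixel.
constexpr std::size_t fb_bytes(std::size_t width, std::size_t height,
                               std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Figures for the 32F746GDISCOVERY board.
constexpr std::size_t kFbBytes = fb_bytes(480, 272, 4);                 // 32-bit colors
constexpr std::size_t kRamBytes = std::size_t{340} * 1024;              // on-chip RAM
constexpr std::size_t kSdramBytes = std::size_t{64} * 1024 * 1024 / 8;  // 64 Mbit = 8 MiB

static_assert(kFbBytes == 522240, "480 * 272 * 4 bytes");
static_assert(kFbBytes > kRamBytes, "the frame buffer alone exceeds on-chip RAM");
static_assert(kFbBytes < kSdramBytes, "but it fits easily into SDRAM");
```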

A minimal Embox configuration with OpenCV has the following section sizes:

   text    data     bss     dec     hex filename
2876890  459208  312736 3648834  37ad42 build/base/bin/embox

A brief digression on sections: .text and .rodata contain instructions and constants (i.e. read-only data), .data contains mutable initialized data, and .bss contains zero-initialized variables, which take no space in the kernel image but occupy memory at run time.
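To make the digression concrete, here is a sketch of where a typical GCC/ELF toolchain places different kinds of C++ objects. The placement is not guaranteed by the language and can change with compiler flags and the linker script. Note also that the “text” column printed by size counts read-only data such as .rodata, not just instructions.

```cpp
#include <cassert>

// Typical placement with GCC and an ELF toolchain (may vary with flags and
// the linker script):
const int kLookup[4] = {1, 2, 4, 8};  // .rodata: read-only constants
int boot_counter = 42;                // .data: initial value is stored in the image
int scratch[1024];                    // .bss: only its size is recorded; zeroed before main()

// .text: the machine code of this function.
int sum_lookup() {
    int sum = 0;
    for (int v : kLookup) sum += v;
    return sum;
}

// Static storage without an initializer is guaranteed to start at zero;
// on an MCU it is the startup code that clears the .bss range.
bool scratch_is_zeroed() {
    for (int v : scratch) {
        if (v != 0) return false;
    }
    return true;
}
```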

.data and .bss are fine: they will certainly fit into RAM/SDRAM. But .text (about 2.7 MiB) is too large, because code is placed in the on-chip flash memory, which is only 1 MiB.
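The same reasoning can be written as compile-time checks against the size output above. This is a sketch: the capacities come from the board’s memory list with Mbit converted to bytes (1 Mbit = 2^20 bits), and the constant names are illustrative.

```cpp
#include <cstddef>

// Section sizes of the minimal Embox + OpenCV image (from the size output).
constexpr std::size_t kText = 2876890;  // code + read-only data
constexpr std::size_t kData = 459208;
constexpr std::size_t kBss = 312736;

// 32F746GDISCOVERY memories, in bytes.
constexpr std::size_t kMiB = std::size_t{1024} * 1024;
constexpr std::size_t kFlash = 1 * kMiB;       // on-chip flash
constexpr std::size_t kQspi = 128 * kMiB / 8;  // 128-Mbit QSPI flash = 16 MiB
constexpr std::size_t kSdram = 64 * kMiB / 8;  // 64 Mbit accessible SDRAM = 8 MiB

constexpr bool fits(std::size_t need, std::size_t capacity) { return need <= capacity; }

// Code does not fit into the 1 MiB on-chip flash...
static_assert(!fits(kText, kFlash), ".text is about 2.7 MiB");
// ...but it would fit into memory-mapped QSPI flash.
static_assert(fits(kText, kQspi), ".text fits into 16 MiB of QSPI");
// Writable data (.data + .bss, about 754 KiB) fits into SDRAM.
static_assert(fits(kData + kBss, kSdram), "RAM-resident data fits into SDRAM");
```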

The only way to handle such a large code section is to use the QSPI flash memory. It has a memory-mapped mode, which allows read-only access via the system bus, so it’s possible to place code there. However, there are a few problems with it:

QSPI flash is not accessible after reset, i.e. some software initialization is necessary before executing code from this memory.
You can’t flash it with OpenOCD and GDB as usual.

Eventually I decided to write a small bootloader that receives the data from the host computer via TFTP and writes it to QSPI flash with STM32Cube functions.
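The bootloader source isn’t part of this article. To illustrate the transfer side only, here is a minimal sketch of handling TFTP DATA packets as defined in RFC 1350: a 2-byte opcode (3), a 2-byte block number, then up to 512 bytes of payload; a shorter block ends the transfer, and each block is acknowledged with opcode 4. The TftpReceiver type and its write callback are hypothetical: the callback stands in for whatever STM32Cube routine actually programs the QSPI flash, and UDP socket handling is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// TFTP opcodes and block size from RFC 1350.
constexpr std::uint16_t kOpData = 3;
constexpr std::uint16_t kOpAck = 4;
constexpr std::size_t kBlockSize = 512;

// Hypothetical receiver state for a TFTP-to-QSPI loader.
struct TftpReceiver {
    // Stand-in for the routine that programs QSPI flash (an assumption).
    std::function<void(std::size_t offset, const std::uint8_t* buf, std::size_t len)> write;
    std::uint16_t expected_block = 1;  // DATA blocks are numbered from 1
    bool done = false;

    // Handles one received UDP payload. Fills `ack` and returns its length,
    // or returns 0 if the packet should be ignored.
    std::size_t on_packet(const std::uint8_t* pkt, std::size_t len,
                          std::array<std::uint8_t, 4>& ack) {
        assert(write && "flash write callback must be set");
        if (len < 4 || len > 4 + kBlockSize) return 0;
        // Header fields are big-endian (network byte order).
        const std::uint16_t op = static_cast<std::uint16_t>(pkt[0] << 8 | pkt[1]);
        const std::uint16_t block = static_cast<std::uint16_t>(pkt[2] << 8 | pkt[3]);
        if (op != kOpData) return 0;

        if (!done && block == expected_block) {
            const std::size_t n = len - 4;
            write(static_cast<std::size_t>(block - 1) * kBlockSize, pkt + 4, n);
            ++expected_block;
            done = n < kBlockSize;  // a block shorter than 512 bytes ends the transfer
        } else if (block == 0 || block != expected_block - 1) {
            return 0;  // neither the next block nor a retransmission of the last one
        }
        // ACK the block; duplicates are re-ACKed so a lost ACK does not stall the sender.
        ack = {0, static_cast<std::uint8_t>(kOpAck), pkt[2], pkt[3]};
        return ack.size();
    }
};
```

Writing each block at offset (block - 1) * 512 places the image in flash in order even though it arrives in pieces, and re-acknowledging a duplicate without rewriting it covers the case where the previous ACK was lost. Block-number wrap-around (more than 65535 blocks, i.e. about 32 MiB) is not handled in this sketch.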

Results

Finally, it works! However, it is slow: processing and drawing the image takes 40 seconds.

Then I tried running the same program on the STM32F769I-Discovery board with a slightly different config (it uses other pins for the UART and so on). This board has 2 MiB of flash memory, so with -O2 the program works just fine without the QSPI trick and processes the image in 3 seconds.

I hope this article helps you run your own OpenCV-based projects. Feel free to create issues in the Embox repository or to mail me if you need help.

Embox resources:

Homepage
Main mailing list: embox-devel@googlegroups.com
Telegram chat
Telegram news channel