Optimizing rendering performance on STM32 using DMA2D

Jakub Wegrzyn
Published in siili_auto · 12 min read · Mar 25, 2019

STM32 provides a great platform for a multitude of projects, and the MCU often needs to be paired with an LCD to show a user interface. In such cases, you want to make sure that the user experience is as good as possible. Unfortunately, with limited CPU power, graphics rendering becomes very expensive. But fear not! That’s where DMA2D comes to the rescue.

DMA2D (also known as The Chrom-Art Accelerator) is a specialized DMA dedicated to image manipulation. It provides support for a few simple, but important operations (filling buffer with colour, copying and blending images, pixel format conversion). After reading this article you will be able to quickly adopt it in your projects.

To ease the setup of the development environment, we will be using System Workbench for STM32 IDE that is available for free. As it’s based on Eclipse, it supports Windows, Linux and macOS. Another advantage is that it contains all tools that are required to work with STM32 based boards.

You can download it at http://www.openstm32.org.

In case of any issues with installation, there are detailed instructions that should help you solve typical problems.

In System Workbench for STM32 select menu File > New > C Project:

Enter the desired project name. In Project type tree expand Executable directory, select Ac6 STM32 MCU Project and click Next button:

Leave both configurations selected and click Next:

In Target Configuration window, select Board tab. In Series combo box select STM32F7 and in Board look for STM32F769i-DISCO. Click Next.

In Project Firmware configuration step select Hardware Abstraction Layer (Cube HAL). Download required components and click Finish.

After a successful project setup, you should end up with main.c file opened in the editor that contains very basic code similar to this:

#include "stm32f7xx.h"
#include "stm32f769i_discovery.h"

int main(void)
{
    for(;;);
}

Replace it with the following code, which will let us quickly move on to drawing something on the screen. It takes care of the basic initialization that will be required in the next part of the article:

#include "stm32f7xx.h"
#include "stm32f769i_discovery.h"

void SystemClock_Config();

int main(void)
{
    HAL_Init();
    SystemClock_Config();

    BSP_LED_Init(LED1);
    while(1)
    {
        BSP_LED_Toggle(LED1);
        HAL_Delay(150);
    }
}

void SystemClock_Config()
{
    RCC_ClkInitTypeDef RCC_ClkInitStruct;
    RCC_OscInitTypeDef RCC_OscInitStruct;

    /* Enable Power Control clock */
    __HAL_RCC_PWR_CLK_ENABLE();

    /* The voltage scaling allows optimizing the power consumption when the device is
       clocked below the maximum system frequency, to update the voltage scaling value
       regarding system frequency refer to product datasheet. */
    __HAL_PWR_VOLTAGESCALING_CONFIG(PWR_REGULATOR_VOLTAGE_SCALE1);

    /* Enable HSE Oscillator and activate PLL with HSE as source */
    RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSE;
    RCC_OscInitStruct.HSEState = RCC_HSE_ON;
    RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
    RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
    RCC_OscInitStruct.PLL.PLLM = 25;
    RCC_OscInitStruct.PLL.PLLN = 400;
    RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV2;
    RCC_OscInitStruct.PLL.PLLQ = 8;
    RCC_OscInitStruct.PLL.PLLR = 7;
    HAL_RCC_OscConfig(&RCC_OscInitStruct);

    /* Activate the OverDrive to reach the 200 MHz Frequency */
    HAL_PWREx_EnableOverDrive();

    /* Select PLL as system clock source and configure the HCLK, PCLK1 and PCLK2 clocks dividers */
    RCC_ClkInitStruct.ClockType = (
        RCC_CLOCKTYPE_SYSCLK |
        RCC_CLOCKTYPE_HCLK |
        RCC_CLOCKTYPE_PCLK1 |
        RCC_CLOCKTYPE_PCLK2);
    RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
    RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
    RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV4;
    RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV2;

    HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_6);
}

If you are using a different board, make sure you modify SystemClock_Config function to match your configuration.
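
If you do need to adapt the PLL values, the arithmetic behind the configuration above can be checked on a host machine first. The sketch below assumes the usual STM32F7 main PLL relationship (VCO input = HSE / PLLM, VCO output = VCO input × PLLN, SYSCLK = VCO output / PLLP) and the 25 MHz HSE crystal implied by PLLM = 25. The helper name pll_sysclk_hz is made up for this example and is not part of the HAL:

```c
#include <stdint.h>

/* Hypothetical helper (not a HAL function): computes the SYSCLK frequency
   produced by the main PLL from the values written to RCC_OscInitStruct.PLL.
   pllp is the numeric divider (2, 4, 6 or 8), not a register encoding. */
uint32_t pll_sysclk_hz(uint32_t hse_hz, uint32_t pllm, uint32_t plln, uint32_t pllp)
{
    uint32_t vco_in_hz = hse_hz / pllm;      /* 25 MHz / 25 = 1 MHz   */
    uint32_t vco_out_hz = vco_in_hz * plln;  /* 1 MHz * 400 = 400 MHz */
    return vco_out_hz / pllp;                /* 400 MHz / 2 = 200 MHz */
}
```

With the values used in this article, pll_sysclk_hz(25000000, 25, 400, 2) gives 200 MHz, which matches the comment next to the OverDrive call. The APB1 and APB2 dividers (4 and 2) then yield 50 MHz and 100 MHz peripheral clocks.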

To test it, let’s try running it on the board: use your USB cable to connect it to your computer. Then in Eclipse, select menu Run > Run As > Ac6 STM32 C/C++ Application.

When the build process completes, the new firmware will be uploaded to the board. When it starts, you should see a blinking LED:

Before drawing anything, we need to take care of setting up the LCD screen. Thankfully, it is really straightforward and requires only a few lines of code.

Start by including the stm32f769i_discovery_lcd.h header at the top of the main.c file:

#include "stm32f769i_discovery_lcd.h"

In the main function, right after the call to SystemClock_Config(), add the LCD initialization block:

BSP_LCD_Init();
BSP_LCD_LayerDefaultInit(0, LCD_FB_START_ADDRESS);
BSP_LCD_Clear(LCD_COLOR_BLACK);

The first line initializes the DSI LCD. The second one sets up the background layer, which will use the frame buffer located at LCD_FB_START_ADDRESS. Finally, the last line clears the screen to black.

The LCD library provides lots of utilities for basic drawing operations, including text rendering. We will display a simple message to ensure that our LCD setup code works correctly.

First, let’s set colours for the text and background (yellow text on a purple background):

BSP_LCD_SetTextColor(0xffffc300);
BSP_LCD_SetBackColor(0xff571845);
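
The colour literals above are easier to read once you see how they are packed. As a minimal sketch, assuming the ARGB8888 layout (alpha in the top byte, then red, green and blue), a small helper can build them from individual channels. The name argb8888 is hypothetical and not part of the BSP:

```c
#include <stdint.h>

/* Hypothetical helper: packs four 8-bit channels into one ARGB8888 word
   laid out as 0xAARRGGBB. */
uint32_t argb8888(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) |
           ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |
           (uint32_t)b;
}
```

Under that assumption, argb8888(0xff, 0xff, 0xc3, 0x00) is the opaque yellow 0xffffc300 used for the text, and argb8888(0xff, 0x57, 0x18, 0x45) is the purple background 0xff571845.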

Fortunately, the project generated by System Workbench contains a few fonts (located in the Utilities/Fonts directory) that we can use:

BSP_LCD_SetFont(&Font24);

Finally, we can draw a simple text message:

BSP_LCD_DisplayStringAt(0, 20, (uint8_t *)"DMA2D Sample", CENTER_MODE);

In the end, after running the code, you should see the message on the screen:

It is time to dive into drawing with DMA2D.

One of the operations supported by DMA2D is filling a buffer with colour. For that, we need to use Register-to-Memory mode, which writes the provided colour value to every pixel of the target area.

Start with creating an empty function named DMA2D_FillRect:

void DMA2D_FillRect(uint32_t color, uint32_t x, uint32_t y, uint32_t width, uint32_t height)
{
}

It will fill the part of the frame buffer defined by the x, y, width and height parameters with color.

All DMA2D-related functions provided by HAL require a DMA2D_HandleTypeDef structure filled with configuration data. Zero-initialize it so that the fields we do not set explicitly start from a known state:

DMA2D_HandleTypeDef hdma2d = {0};

For now, we will focus on two fields:

typedef struct __DMA2D_HandleTypeDef
{
    DMA2D_TypeDef *Instance;
    DMA2D_InitTypeDef Init;
    /* ... */
} DMA2D_HandleTypeDef;

The Instance field provides the base address of the DMA2D registers. By default, it is defined by the DMA2D macro declared in the stm32f769xx.h header:

hdma2d.Instance = DMA2D;

The second one, Init, is a structure of type DMA2D_InitTypeDef that holds all DMA2D communication parameters:

typedef struct
{
    uint32_t Mode;
    uint32_t ColorMode;
    uint32_t OutputOffset;
    uint32_t AlphaInverted;
    uint32_t RedBlueSwap;
} DMA2D_InitTypeDef;

As mentioned before, to fill a buffer with colour, Mode needs to be set to Register-to-Memory defined by DMA2D_R2M macro:

hdma2d.Init.Mode = DMA2D_R2M;

DMA2D supports three other modes that will be covered in the next steps:

  • Memory-to-Memory (DMA2D_M2M)
  • Memory-to-Memory with pixel format conversion (DMA2D_M2M_PFC)
  • Memory-to-Memory with blending (DMA2D_M2M_BLEND)

ColorMode sets the output colour format. We will be filling the frame buffer, which by default uses ARGB8888:

hdma2d.Init.ColorMode = DMA2D_OUTPUT_ARGB8888;

For now, let’s also set OutputOffset to 0:

hdma2d.Init.OutputOffset = 0;

The last two fields, AlphaInverted and RedBlueSwap are used only in DMA2D_M2M_PFC mode.

Once all parameters are in place, it is time to initialize DMA2D and the associated handle:

HAL_DMA2D_Init(&hdma2d);

Finally, we are ready to start DMA2D transfer by calling HAL_DMA2D_Start function:

HAL_DMA2D_Start(&hdma2d, color, LCD_FB_START_ADDRESS, width, height);

It expects five parameters:

  • pointer to configuration information
  • colour value (in Register-to-Memory mode) or address of source memory (in Memory-to-Memory modes)
  • address of the destination memory buffer (in our case, the address of the frame buffer)
  • number of pixels per line
  • number of lines

This function starts the transfer in polling mode, so to wait for it to finish we will use the HAL_DMA2D_PollForTransfer function with the timeout set to 10 milliseconds:

HAL_DMA2D_PollForTransfer(&hdma2d, 10);

Finally, we can fill the screen with colour by calling our new function:

DMA2D_FillRect(0xff900c3e, 0, 0, BSP_LCD_GetXSize(), BSP_LCD_GetYSize());

So far, our custom DMA2D_FillRect function has successfully filled the whole frame buffer. But let’s assume that we want to fill only the left half of it:

DMA2D_FillRect(0xff900c3e, 0, 0, BSP_LCD_GetXSize()/2, BSP_LCD_GetYSize());

It is clear that something went wrong. What happened? By default, DMA2D treats the output buffer as contiguous memory with lines laid out one after another. When the transferred lines are narrower than the lines of the destination buffer, we need to provide an offset that tells DMA2D how many pixels lie between the end of one transferred line and the beginning of the next one in the output buffer.

To do that we need to update the value of OutputOffset that we previously set to 0:

hdma2d.Init.OutputOffset = BSP_LCD_GetXSize() - width;
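
To see why the offset matters, it can help to model the transfer in plain C. The function below is only a host-side sketch of how a Register-to-Memory transfer walks the output buffer, assuming 32-bit pixels. The name model_r2m_fill is made up for this illustration, and the real peripheral of course does this work in hardware:

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of a Register-to-Memory fill: writes `width` pixels per
   line, then skips `output_offset` pixels before starting the next line. */
void model_r2m_fill(uint32_t *dst, uint32_t color,
                    uint32_t width, uint32_t height, uint32_t output_offset)
{
    /* Distance in pixels between the starts of two consecutive lines */
    size_t stride = (size_t)width + output_offset;

    for (uint32_t line = 0; line < height; ++line) {
        for (uint32_t px = 0; px < width; ++px) {
            dst[line * stride + px] = color;
        }
    }
}
```

On a 4x2 pixel buffer, filling a 2x2 area with output_offset set to 0 writes the first four pixels in a row (the second "line" lands in the right half of the first row), while output_offset set to 4 - 2 = 2 writes the left two pixels of each row, as intended.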

Success!

You may have noticed that our function ignores the x and y parameters. Let’s see what happens if we try to fill the other half of the screen:

DMA2D_FillRect(0xffc70039, BSP_LCD_GetXSize()/2, 0, BSP_LCD_GetXSize()/2, BSP_LCD_GetYSize());

Another issue! This time we need to offset the address of the output buffer to match the requested position. To do that, we will offset LCD_FB_START_ADDRESS by the number of bytes per pixel (4 in ARGB8888 mode) multiplied by x:

HAL_DMA2D_Start(&hdma2d, color, LCD_FB_START_ADDRESS + x*4, width, height);

If you’ve been paying attention, you know that the y parameter is still unused! To fix that, we again update the output address calculation to skip the correct number of lines:

HAL_DMA2D_Start(&hdma2d, color, LCD_FB_START_ADDRESS + (x + y * BSP_LCD_GetXSize()) * 4, width, height);
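
The address arithmetic in this call can be factored out and checked in isolation. The sketch below is a hypothetical helper (fb_pixel_address is not a BSP or HAL function) that takes the bytes per pixel as a parameter, so the same formula would also cover other pixel formats:

```c
#include <stdint.h>

/* Hypothetical helper: byte address of pixel (x, y) in a frame buffer that is
   fb_width_px pixels wide and stores bytes_per_pixel bytes for each pixel. */
uint32_t fb_pixel_address(uint32_t fb_base, uint32_t x, uint32_t y,
                          uint32_t fb_width_px, uint32_t bytes_per_pixel)
{
    /* Skip y full lines, then x pixels within the line */
    return fb_base + (x + y * fb_width_px) * bytes_per_pixel;
}
```

As a worked example, assume an 800-pixel-wide ARGB8888 frame buffer starting at 0xC0000000 (both values are assumptions for illustration; check LCD_FB_START_ADDRESS and BSP_LCD_GetXSize() in your project). Pixel (400, 0) then sits 400 * 4 = 1600 bytes past the base, and pixel (200, 120) sits (200 + 120 * 800) * 4 = 384800 bytes past it.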

Let’s do the final test by drawing another rectangle in the centre of the screen:

DMA2D_FillRect(0xffffc000, BSP_LCD_GetXSize()/4, BSP_LCD_GetYSize()/4, BSP_LCD_GetXSize()/2, BSP_LCD_GetYSize()/2);

Now the DMA2D_FillRect function should look like this:

void DMA2D_FillRect(uint32_t color, uint32_t x, uint32_t y, uint32_t width, uint32_t height)
{
    /* Zero-initialize so that fields we do not set start from a known state */
    DMA2D_HandleTypeDef hdma2d = {0};
    hdma2d.Instance = DMA2D;

    hdma2d.Init.Mode = DMA2D_R2M;
    hdma2d.Init.ColorMode = DMA2D_OUTPUT_ARGB8888;
    /* Pixels to skip between the end of one line and the start of the next */
    hdma2d.Init.OutputOffset = BSP_LCD_GetXSize() - width;

    HAL_DMA2D_Init(&hdma2d);
    HAL_DMA2D_Start(
        &hdma2d,
        color,
        LCD_FB_START_ADDRESS + (x + y * BSP_LCD_GetXSize()) * 4,
        width,
        height);
    HAL_DMA2D_PollForTransfer(&hdma2d, 10);
}
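
Note that DMA2D_FillRect trusts its arguments: if width is larger than the screen width, BSP_LCD_GetXSize() - width wraps around because both values are unsigned. One possible guard is to clip the rectangle before starting the transfer. The following is a minimal sketch with hypothetical names (rect_t and clip_rect_to_screen are not part of the BSP or HAL):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rectangle type used only for this illustration */
typedef struct {
    uint32_t x, y, width, height;
} rect_t;

/* Shrinks the rectangle so it fits on a screen_w x screen_h screen.
   Returns false when nothing of it is visible, so the caller can skip
   the transfer entirely. */
bool clip_rect_to_screen(rect_t *r, uint32_t screen_w, uint32_t screen_h)
{
    if (r->width == 0 || r->height == 0 || r->x >= screen_w || r->y >= screen_h) {
        return false;
    }
    if (r->width > screen_w - r->x) {
        r->width = screen_w - r->x;
    }
    if (r->height > screen_h - r->y) {
        r->height = screen_h - r->y;
    }
    return true;
}
```

For example, on an 800x480 screen a 200x200 rectangle at (700, 400) would be clipped to 100x80, and a rectangle starting at x = 800 would be rejected.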

Before moving on to drawing images with DMA2D, we need an easy way of embedding an image into our code. One of the most straightforward solutions is to create an array of pixel data. Of course, doing that manually would be impractical, so this simple Python script (it needs the Pillow imaging library) may help:

import argparse
from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument('path')
args = parser.parse_args()

# Convert to RGBA so that every pixel yields four channel values
image = Image.open(args.path).convert('RGBA')
print("const uint32_t IMAGE_WIDTH = %d;" % image.width)
print("const uint32_t IMAGE_HEIGHT = %d;" % image.height)
print("const uint32_t IMAGE_DATA[] = {")
for r, g, b, a in image.getdata():
    # Pack each pixel as 0xAARRGGBB to match the ARGB8888 frame buffer
    print('    0x{:02x}{:02x}{:02x}{:02x},'.format(a, r, g, b))
print("};")

To use it, go to your terminal and call:

$ python img2array.py image.png

It will generate code similar to this:

const uint32_t IMAGE_WIDTH = 200;
const uint32_t IMAGE_HEIGHT = 100;
const uint32_t IMAGE_DATA[] = {
0x00000000,
0x00000000,
/* ... */
0x00000000,
0x00000000,
};

Just create a new empty header in your project and copy the generated data into it.

Drawing images works in a similar manner to filling a buffer with colour. Let’s start by creating a copy of our DMA2D_FillRect function and renaming it to DMA2D_DrawImage. Rename the color parameter to data (remember to update the call to HAL_DMA2D_Start too).

To transfer image data we need to use Memory-to-Memory mode instead of Register-to-Memory:

hdma2d.Init.Mode = DMA2D_M2M;

When drawing images, DMA2D uses background (0) and foreground (1) layers. For a simple transfer of an image to the target buffer, only the foreground layer is used.

DMA2D_HandleTypeDef structure contains an array of layer configuration data:

typedef struct __DMA2D_HandleTypeDef
{
    /* ... */
    DMA2D_LayerCfgTypeDef LayerCfg[MAX_DMA2D_LAYER];
    /* ... */
} DMA2D_HandleTypeDef;

To set up a layer that will be used to draw an image, we need to set three parameters:

  • AlphaMode - this field is used to set alpha mode; we don't need to perform any modifications of an alpha channel so set it to DMA2D_NO_MODIF_ALPHA
  • InputColorMode - colour format of an input image (DMA2D_INPUT_ARGB8888); if your image uses a colour format different from target buffer format, remember to set DMA2D mode to DMA2D_M2M_PFC (Memory-to-Memory with pixel format conversion)
  • InputOffset - when drawing only parts of a source image, use this field to instruct DMA2D how to offset line data; as we are drawing the whole image we can set it to zero

// Foreground
hdma2d.LayerCfg[1].AlphaMode = DMA2D_NO_MODIF_ALPHA;
hdma2d.LayerCfg[1].InputColorMode = DMA2D_INPUT_ARGB8888;
hdma2d.LayerCfg[1].InputOffset = 0;
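
When only part of a source image should be drawn, the source address and InputOffset follow the same logic as the output side. The sketch below computes both for a region that starts at (sx, sy) inside an ARGB8888 image. The type and function names are hypothetical and exist only for this example:

```c
#include <stdint.h>

/* Hypothetical result type: where DMA2D should start reading and how many
   source pixels it should skip at the end of each line. */
typedef struct {
    uint32_t address;       /* value to pass as the source address   */
    uint32_t input_offset;  /* value for LayerCfg[1].InputOffset     */
} source_window_t;

/* Computes the source parameters for a region_width-pixel-wide region that
   starts at (sx, sy) inside an image_width-pixel-wide ARGB8888 image. */
source_window_t source_window(uint32_t image_address, uint32_t image_width,
                              uint32_t sx, uint32_t sy, uint32_t region_width)
{
    source_window_t w;
    w.address = image_address + (sx + sy * image_width) * 4u; /* 4 bytes per pixel */
    w.input_offset = image_width - region_width;
    return w;
}
```

For a 200-pixel-wide image, a 100-pixel-wide region starting at (50, 10) begins (50 + 10 * 200) * 4 = 8200 bytes into the pixel data and needs an InputOffset of 100. The width and height passed to HAL_DMA2D_Start would then be those of the region, not of the whole image.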

Finally, initialize the foreground layer:

HAL_DMA2D_ConfigLayer(&hdma2d, 1);

Final DMA2D_DrawImage function:

void DMA2D_DrawImage(uint32_t data, uint32_t x, uint32_t y, uint32_t width, uint32_t height)
{
    /* Zero-initialize so that fields we do not set start from a known state */
    DMA2D_HandleTypeDef hdma2d = {0};
    hdma2d.Instance = DMA2D;

    hdma2d.Init.Mode = DMA2D_M2M;
    hdma2d.Init.ColorMode = DMA2D_OUTPUT_ARGB8888;
    hdma2d.Init.OutputOffset = BSP_LCD_GetXSize() - width;

    // Foreground
    hdma2d.LayerCfg[1].AlphaMode = DMA2D_NO_MODIF_ALPHA;
    hdma2d.LayerCfg[1].InputColorMode = DMA2D_INPUT_ARGB8888;
    hdma2d.LayerCfg[1].InputOffset = 0;

    HAL_DMA2D_Init(&hdma2d);
    HAL_DMA2D_ConfigLayer(&hdma2d, 1);
    HAL_DMA2D_Start(&hdma2d, data, LCD_FB_START_ADDRESS + (x + y * BSP_LCD_GetXSize()) * 4, width, height);
    HAL_DMA2D_PollForTransfer(&hdma2d, 10);
}

Now it is time to use it to draw our image to the screen:

DMA2D_DrawImage(
(uint32_t)IMAGE_DATA,
(BSP_LCD_GetXSize() - IMAGE_WIDTH) / 2,
(BSP_LCD_GetYSize() - IMAGE_HEIGHT) / 2,
IMAGE_WIDTH,
IMAGE_HEIGHT
);

Unfortunately, we can clearly see that the alpha channel was ignored. There are situations where this would be fine. This time, though, the image has some transparent regions and we want to make sure it blends correctly with the contents of the frame buffer.

To achieve that, we need to use the HAL_DMA2D_BlendingStart function. It takes one additional parameter: the address of the background buffer that will be blended with the source image. As we want to blend our image with the contents of the frame buffer, this is the same address as the destination:

uint32_t destination = LCD_FB_START_ADDRESS + (x + y * BSP_LCD_GetXSize()) * 4;
/* ... */
HAL_DMA2D_BlendingStart(&hdma2d, data, destination, destination, width, height);
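
To get a feel for what the blender computes for each pixel, here is a host-side model in plain C. It follows the blending equations as we understand them from ST's reference manual (αMult = αFG·αBG/255, αOUT = αFG + αBG − αMult, C_OUT = (C_FG·αFG + C_BG·αBG − C_BG·αMult)/αOUT). The exact rounding of the hardware is an assumption here, and the function names are made up for this sketch:

```c
#include <stdint.h>

/* Blends a single 8-bit colour channel using the equations quoted above */
static uint32_t blend_channel(uint32_t c_fg, uint32_t c_bg, uint32_t a_fg,
                              uint32_t a_bg, uint32_t a_mult, uint32_t a_out)
{
    return (c_fg * a_fg + c_bg * a_bg - c_bg * a_mult) / a_out;
}

/* Software model of blending one ARGB8888 foreground pixel over one
   ARGB8888 background pixel (integer division truncates). */
uint32_t model_blend_argb8888(uint32_t fg, uint32_t bg)
{
    uint32_t a_fg = fg >> 24;
    uint32_t a_bg = bg >> 24;
    uint32_t a_mult = (a_fg * a_bg) / 255u;
    uint32_t a_out = a_fg + a_bg - a_mult;

    if (a_out == 0u) {
        return 0u; /* both pixels fully transparent: avoid dividing by zero */
    }

    uint32_t r = blend_channel((fg >> 16) & 0xffu, (bg >> 16) & 0xffu, a_fg, a_bg, a_mult, a_out);
    uint32_t g = blend_channel((fg >> 8) & 0xffu, (bg >> 8) & 0xffu, a_fg, a_bg, a_mult, a_out);
    uint32_t b = blend_channel(fg & 0xffu, bg & 0xffu, a_fg, a_bg, a_mult, a_out);

    return (a_out << 24) | (r << 16) | (g << 8) | b;
}
```

With an opaque background this reduces to the familiar "source over" blend: a fully transparent foreground pixel leaves the background untouched, a fully opaque one replaces it, and a half-transparent red (0x80ff0000) over opaque blue (0xff0000ff) gives 0xff80007f in this model.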

Next, let’s configure the background layer. We can simply copy the foreground setup code and change layer index to zero:

// Background
hdma2d.LayerCfg[0].AlphaMode = DMA2D_NO_MODIF_ALPHA;
hdma2d.LayerCfg[0].InputColorMode = DMA2D_INPUT_ARGB8888;
hdma2d.LayerCfg[0].InputOffset = 0;
/* ... */
HAL_DMA2D_ConfigLayer(&hdma2d, 0);

Make sure to update DMA2D mode:

hdma2d.Init.Mode = DMA2D_M2M_BLEND;

As we are now using blending mode, we need to make sure that alpha inversion is disabled for the foreground layer:

hdma2d.LayerCfg[1].AlphaInverted = DMA2D_REGULAR_ALPHA;

Let’s see how it works:

As you have probably noticed, we still need to modify InputOffset for the background layer, which is currently set to zero. Fortunately, we have already dealt with a similar issue, so we know exactly what to do:

hdma2d.LayerCfg[0].InputOffset = BSP_LCD_GetXSize() - width;

After all the above modifications, our image drawing function looks like this:

void DMA2D_DrawImage(uint32_t data, uint32_t x, uint32_t y, uint32_t width, uint32_t height)
{
    uint32_t destination = LCD_FB_START_ADDRESS + (x + y * BSP_LCD_GetXSize()) * 4;

    /* Zero-initialize so that fields we do not set start from a known state */
    DMA2D_HandleTypeDef hdma2d = {0};
    hdma2d.Instance = DMA2D;

    hdma2d.Init.Mode = DMA2D_M2M_BLEND;
    hdma2d.Init.ColorMode = DMA2D_OUTPUT_ARGB8888;
    hdma2d.Init.OutputOffset = BSP_LCD_GetXSize() - width;

    // Foreground
    hdma2d.LayerCfg[1].AlphaMode = DMA2D_NO_MODIF_ALPHA;
    hdma2d.LayerCfg[1].InputColorMode = DMA2D_INPUT_ARGB8888;
    hdma2d.LayerCfg[1].InputOffset = 0;
    hdma2d.LayerCfg[1].AlphaInverted = DMA2D_REGULAR_ALPHA;

    // Background (read from the frame buffer, so it needs the same line offset as the output)
    hdma2d.LayerCfg[0].AlphaMode = DMA2D_NO_MODIF_ALPHA;
    hdma2d.LayerCfg[0].InputColorMode = DMA2D_INPUT_ARGB8888;
    hdma2d.LayerCfg[0].InputOffset = BSP_LCD_GetXSize() - width;

    HAL_DMA2D_Init(&hdma2d);
    HAL_DMA2D_ConfigLayer(&hdma2d, 1);
    HAL_DMA2D_ConfigLayer(&hdma2d, 0);
    HAL_DMA2D_BlendingStart(&hdma2d, data, destination, destination, width, height);
    HAL_DMA2D_PollForTransfer(&hdma2d, 10);
}

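In this article, we covered the very basic usage of DMA2D. There is still much more to uncover, so feel free to dig into the documentation.

To keep the code samples simple, we used polling mode. In real-world scenarios, it might be beneficial to use DMA2D in interrupt mode. With the HAL API it is fairly simple: replace HAL_DMA2D_BlendingStart calls with HAL_DMA2D_BlendingStart_IT (notice the _IT suffix) and drop the HAL_DMA2D_PollForTransfer call, as completion is then reported through callbacks. The handle must then outlive the transfer, so it can no longer be a local variable, and the DMA2D interrupt has to be enabled and routed to HAL_DMA2D_IRQHandler. When doing that, make sure to set the function pointers for the interrupt callbacks: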

typedef struct __DMA2D_HandleTypeDef
{
    /* ... */
    void (* XferCpltCallback)(struct __DMA2D_HandleTypeDef * hdma2d);
    void (* XferErrorCallback)(struct __DMA2D_HandleTypeDef * hdma2d);
    /* ... */
} DMA2D_HandleTypeDef;

Even in such simple cases, DMA2D can give a substantial performance improvement that helps you build software with an excellent user experience. As always, make sure to measure your code: in some cases, doing lots of small transfers with DMA2D introduces enough overhead that a software renderer is a better fit.

Good luck! Live long and render stuff! 🖖
