Xilinx Vivado HLS Beginners Tutorial : Custom IP Core Design for FPGA


The purpose of this tutorial is to help those who are trying to build their own IP cores for FPGAs. I will explain the basic steps and tips for designing your own IP core (targeted at Xilinx FPGAs) using the Vivado High Level Synthesis (HLS) tool by Xilinx. As a beginner, using Vivado HLS can be difficult if you are not used to reading Xilinx documentation PDFs and extracting information from them.

This is the first tutorial of a series that explains the custom IP core design flow for FPGA embedded systems (ZYNQ-7000 AP SoC). As an example in this tutorial, I will create a basic image processing (2D convolution) IP core using openCV functions with an AXI FULL (Memory Mapped) interface.

Link to the Vivado HLS project files for this tutorial is available at the end of the tutorial.

First of all, I will give a basic introduction to High Level Synthesis (HLS) for beginners. If you are familiar with HLS, skip to the start of the steps.

What is HLS ?

Usually hardware systems for FPGAs are designed using Hardware Description Languages (HDLs) such as Verilog, SystemVerilog and VHDL (HDLs are considered low level languages). As the name suggests, you describe hardware (digital electronic circuits) using HDLs. But designing hardware on FPGAs using HDLs requires digital electronics knowledge. High Level Synthesis was introduced to reduce the electronics knowledge required to design hardware. It also makes the hardware design flow easier when the goal is to achieve a certain behavioral model, without worrying too much about the electronics underneath.

When we provide the behavioral model that we want to implement in hardware using a control-flow language such as C/C++, the HLS tool maps that model to hardware and generates the HDL code for it, which can then be synthesized and implemented on an FPGA.

One of the important things to learn about Vivado HLS is compiler directives. Vivado HLS uses compiler directives starting with “#pragma HLS” to assist mapping C/C++ code to hardware. You can find more information about Vivado HLS pragmas here.
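To give a feel for what such a directive looks like in code, here is a minimal sketch. The function "sum_pixels" and the constant "NUM_PIXELS" are hypothetical names made up for this illustration and are not part of the IP core built later. PIPELINE is one of the standard Vivado HLS pragmas; a normal C++ compiler simply ignores the line (possibly with a warning), so the same file can be used for C simulation.

```cpp
#include <cstdint>

// Hypothetical example, not part of the tutorial's IP core.
// The loop bound is a compile-time constant, so the hardware size is fixed.
const int NUM_PIXELS = 8;

uint32_t sum_pixels(const uint8_t data[NUM_PIXELS]) {
    uint32_t total = 0;
    for (int i = 0; i < NUM_PIXELS; i++) {
#pragma HLS PIPELINE II=1
        // The directive above asks HLS to start a new loop iteration
        // every clock cycle (initiation interval of 1).
        total += data[i];
    }
    return total;
}
```

The C++ behavior is the same with or without the pragma. Only the generated hardware changes.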

Prerequisites

Basic knowledge of how to create a new project in Vivado HLS.

Step 1 : Create a New Project

Open Vivado HLS and create a new project with the top function name “conv”. Select the part or development board you have (I am using the Xilinx ZC702) and finish creating the new project. Keep the default clock period (10ns) for now; you can change it later if you want. In the right side of the window, you will see the explorer as in the following figure.

Right click on source and create a new file, name it “core.cpp” and save it inside the project folder. After saving, Vivado HLS will automatically open the empty new file. Let's start designing our IP core.

Step 2 : Designing IP core

In this step we develop the IP as a C++ function. For now we only design the IP core for functional testing; I will explain how to prepare it for synthesis in a later step. I will explain each code snippet I use.

First, let's include the header files we need.

#include <hls_video.h>
#include <stdint.h>

The first header file defines the hardware optimized openCV functions and types. These functions need to be called using the “hls” namespace (not “cv”), and the dimensions related to the functions must be given through template arguments (since these will be implemented in hardware, dimensions cannot be run-time variables).

The second header file includes the basic integer type definitions that we will be using.
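If passing dimensions as template arguments is new to you, the following plain C++ sketch shows the idea. “FixedImage” is a hypothetical stand-in written for this explanation; it is not the real hls::Mat and does not use any Xilinx header. The point is only that the size is part of the type, so it is known at compile time.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical stand-in for a fixed-size image type; not the real hls::Mat.
template <int ROWS, int COLS>
struct FixedImage {
    uint8_t pixels[ROWS * COLS];   // storage size is fixed at compile time

    static int rows() { return ROWS; }
    static int cols() { return COLS; }

    // Row-major access: pixel (r, c) lives at index r * COLS + c.
    uint8_t& at(int r, int c) { return pixels[r * COLS + c]; }
};
```

A declaration such as FixedImage<1080,1920> fixes the frame size in the same way the template arguments of hls::Mat do in the code below.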

Define the IP core function

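#include <hls_video.h>
#include <stdint.h>

void conv(uint8_t image_in[1080*1920], uint8_t image_out[1080*1920]){

    // Hard coded 3x3 Sobel kernel
    const char coefficients[3][3] = { {-1,-2,-1},
                                      { 0, 0, 0},
                                      { 1, 2, 1} };

    // Input and output images (8-bit, single channel)
    hls::Mat<1080,1920,HLS_8UC1> src;
    hls::Mat<1080,1920,HLS_8UC1> dst;

    // Copy the input array into the "src" Mat
    hls::AXIM2Mat<1920,uint8_t,1080,1920,HLS_8UC1>(image_in,src);

    // Copy the coefficients into the kernel window
    hls::Window<3,3,char> kernel;
    for (int i=0;i<3;i++){
        for (int j=0;j<3;j++){
            kernel.val[i][j]=coefficients[i][j];
        }
    }

    // (-1,-1) places the anchor at the center of the kernel
    hls::Point_<int> anchor = hls::Point_<int>(-1,-1);

    // Filter "src" into "dst", then copy "dst" to the output array
    hls::Filter2D(src,dst,kernel,anchor);
    hls::Mat2AXIM<1920,uint8_t,1080,1920,HLS_8UC1>(dst,image_out);
}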

This function must be named with your top function name; in my case it is “conv”. My plan is to take a 1080x1920 HD grey image as input (from the “image_in” array) and do 2D convolution on it with a hard coded 3x3 kernel. The processed image is stored in the “image_out” array.

Two HLS Mat data structures are defined to store the input image and the processed output image. The kernel is defined as a 3x3 array named “coefficients”. I have defined the Sobel kernel in this example; you can change it to anything you want. Keep in mind that the sum of the kernel elements should not exceed 1; otherwise filtered values can fall outside the 8-bit range and you will have to re-normalize the image to values between 0 and 255. (You can also define the kernel as a floating point array; the types of “coefficients” and the “kernel” data structure must be changed accordingly.)

The AXIM2Mat function reads data from the input array and stores it in the “src” Mat. The “Filter2D” function then performs 2D convolution on the input Mat with the given kernel and stores the result in the “dst” Mat. (You can find the Xilinx documentation of the Filter2D function here.) The point “anchor” indicates the relative position of a filtered point within the kernel; as with the similar openCV function, (-1,-1) means the center of the kernel.
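To make the filtering step concrete, here is a small software reference model of a 3x3 filter in plain C++. This is only a sketch under my own assumptions and not the Xilinx implementation: the name “filter3x3_reference” is hypothetical, border pixels are simply written as 0, and results are clamped to the 0 to 255 range. hls::Filter2D may treat borders and overflow differently, so check the Xilinx documentation before comparing outputs pixel by pixel. Like the openCV filter, the sketch applies the kernel without flipping it.

```cpp
#include <cstdint>

// Software reference for a 3x3 filter on a row-major 8-bit image.
// Assumption: border pixels are written as 0 (hls::Filter2D may differ).
void filter3x3_reference(const uint8_t* src, uint8_t* dst,
                         int rows, int cols, const int8_t kernel[3][3]) {
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            const bool border =
                (r == 0 || c == 0 || r == rows - 1 || c == cols - 1);
            if (border) {
                dst[r * cols + c] = 0;
                continue;
            }
            int acc = 0;
            for (int i = 0; i < 3; i++) {
                for (int j = 0; j < 3; j++) {
                    // Kernel center (i=1, j=1) sits on pixel (r, c).
                    acc += kernel[i][j] * src[(r + i - 1) * cols + (c + j - 1)];
                }
            }
            // Clamp to the 8-bit range instead of wrapping around.
            if (acc < 0) acc = 0;
            if (acc > 255) acc = 255;
            dst[r * cols + c] = static_cast<uint8_t>(acc);
        }
    }
}
```

A model like this is handy in a test bench, because it gives you something to compare the IP core output against instead of judging the result only by eye.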

After filtering, data in the “dst” Mat is transferred to the output array. That concludes designing the IP core, but we need a test bench to test whether the functionality of the IP core is what we expected.

Step 3 : Test Bench Design

Create a new file under Test bench and paste the following code. Then download a 1080x1920 image (color or grey, it doesn’t matter) and copy it to the project folder. Add the image to the test bench sources by right clicking on the Test bench bar and selecting “include files”.

A Vivado HLS test bench can be considered a normal C++ program. You can call the top function (“conv” in my case) in the main function of the test bench, pass inputs to it and check its outputs. Here, the “hls_opencv.h” header file includes the normal openCV functions.

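#include <hls_opencv.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>   // memcpy
using namespace cv;

// Prototype of the top function defined in core.cpp
void conv(uint8_t image_in[1080*1920], uint8_t image_out[1080*1920]);

int main(){
    // Read the test image as an 8-bit grey image
    Mat im = imread("test.jpg",CV_LOAD_IMAGE_GRAYSCALE);
    if (im.empty() || im.rows != 1080 || im.cols != 1920){
        printf("test.jpg is missing or is not 1080x1920\n");
        return 1;   // non-zero return marks the simulation as failed
    }

    // static keeps the two ~2 MB buffers off the stack
    static uint8_t image_in[1080*1920];
    static uint8_t image_out[1080*1920];
    memcpy(image_in,im.data,sizeof(uint8_t)*1080*1920);

    conv(image_in,image_out);

    // Wrap the output array in a Mat and display it
    Mat out = Mat(1080,1920,CV_8UC1,image_out);
    namedWindow("test");
    imshow("test",out);
    waitKey(0);

    return 0;
}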

What I have done here is read a 1080x1920 image named “test.jpg” (change the “imread” function arguments according to your image name) as a grey image and store it in an openCV Mat named “im”. Then the 8-bit pixel values of the image are copied to a 1080x1920 unsigned char (uint8_t) array using the “memcpy” function. Then the “conv” function is called, passing the array with the input image and an empty array as arguments.

After the function is done, a Mat data structure named “out” is initialized using the output array of the function, and the image is displayed.

In the test bench main function, returning 0 is really important, because Vivado HLS uses the return value to check whether the functional simulation was successful.
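Because the return value decides whether the simulation passes, a test bench can also check the result automatically instead of relying only on the displayed image. The helper below is a minimal sketch of that idea. “compare_with_golden” is a hypothetical name, and the “golden” buffer is assumed to come from a reference you trust (for example the normal openCV filter run in the test bench).

```cpp
#include <cstdint>
#include <cstddef>
#include <cstdio>

// Returns 0 when every byte matches and 1 otherwise,
// so main() can return the result directly.
int compare_with_golden(const uint8_t* actual, const uint8_t* golden,
                        size_t count) {
    size_t mismatches = 0;
    for (size_t i = 0; i < count; i++) {
        if (actual[i] != golden[i]) {
            if (mismatches < 10) {  // print only the first few mismatches
                printf("mismatch at %lu: got %u, expected %u\n",
                       (unsigned long)i, (unsigned)actual[i],
                       (unsigned)golden[i]);
            }
            mismatches++;
        }
    }
    return mismatches == 0 ? 0 : 1;
}
```

In main you would then end with something like "return compare_with_golden(image_out, golden, 1080*1920);" so that a wrong output makes the simulation fail.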

Step 4 : Execute Test bench for behavioral simulation

Let's execute our test bench. Click on the “Run C Simulation” button on the top tool bar. A pop-up window will be displayed as in the following figure.

Just select “OK” here and C simulation will start. This will take some time depending on your resources. If everything went well, you will see a pop-up window with the HD image processed with the Sobel kernel. The Sobel kernel is an edge detector, therefore the edges in your image will be shown in white. My original image and the resulting image are shown in the following figures.

Original ( Grey version of this image)
Resulting Image

After the pop-up window is displayed, you need to press any key to continue the simulation (because of the “waitKey(0)” function). Then the log file will be displayed. If the simulation was successful, the content of the log file will be similar to the following:

Compiling ../../../test.cpp in debug mode
Generating csim.exe
@I [SIM-1] CSim done with 0 errors.

Let's move on to preparing our IP core for synthesis.

Step 5 : Preparing IP Core for synthesis

We can use compiler directives to guide the compiler to synthesize the IP core the way we prefer. If we don’t use any compiler directives, Vivado HLS will analyze the code and use default directives in the appropriate positions of the code. You can see these default directives in the right window under the “Directive” tab.

In this IP core, we designed the input and output as 8-bit unsigned integer arrays. Let's declare these interfaces as AXI FULL (Memory mapped) interfaces. Click on the “conv” function in the “core.cpp” file and go to the directive view. The input and output arrays will be displayed here as in the figure.

Double click on “image_in” and a window will be displayed, fill the options as in the following figure and click “OK”.

You can click on “Help” to learn what each option means. I will explain each option briefly:

  • Type of directive : Interface, because we use “image_in” as an input interface to the IP core.
  • Mode : “m_axi” represents an AXI Master Full port.
  • Depth : The size of our array is 1080x1920 = 2073600. This is optional; if you leave it blank, Vivado HLS will use the default depth. It's better to specify the depth here.
  • Offset : This controls the mapped memory location of the connected input. By selecting “slave”, we can set the memory address of the port at run-time using an AXI Lite port.
  • Bundle : Ports that share a bundle name are grouped into the same interface. Here the port is bundled under the name “CRTL_BUS”.

Use the exact same options for “image_out” (other than the name of the array, of course).
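The depth value is just the frame size worked out by hand, and it is easy to sanity check the numbers in code. The constants below are a small sketch of that arithmetic. The frame time at the end is only an idealized estimate under my own assumption that the core consumes one pixel per clock cycle at the default 10ns period; the real figure comes from the synthesis report.

```cpp
#include <cstdint>

const int64_t ROWS = 1080;
const int64_t COLS = 1920;
const int64_t PIXELS = ROWS * COLS;          // depth value used in the directive
const int64_t BYTES_PER_FRAME = PIXELS * 1;  // one uint8_t per pixel
const int64_t CLOCK_PERIOD_NS = 10;          // default period from Step 1

// Assumption: an ideal design that consumes one pixel every clock cycle.
const int64_t IDEAL_FRAME_TIME_NS = PIXELS * CLOCK_PERIOD_NS;
```

Under that assumption one frame takes about 20.7 ms, which gives a rough lower bound to compare against the latency reported after synthesis.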

This process adds a few lines to the source file. The final source code will look like the following:

#include <hls_video.h>
#include <stdint.h>
#include <stdio.h>

void conv(uint8_t * image_in, uint8_t * image_out){
#pragma HLS INTERFACE m_axi depth=2073600 port=image_out offset=slave bundle=CRTL_BUS
#pragma HLS INTERFACE m_axi depth=2073600 port=image_in offset=slave bundle=CRTL_BUS
#pragma HLS DATAFLOW
    const char coefficients[3][3] = { {-1,-2,-1},
                                      { 0, 0, 0},
                                      { 1, 2, 1} };
    hls::Mat<1080,1920,HLS_8UC1> src;
    hls::Mat<1080,1920,HLS_8UC1> dst;
    hls::AXIM2Mat<1920,uint8_t,1080,1920,HLS_8UC1>(image_in,src);

    hls::Window<3,3,char> kernel;
    for (int i=0;i<3;i++){
        for (int j=0;j<3;j++){
            kernel.val[i][j]=coefficients[i][j];
        }
    }

    hls::Point_<int> anchor = hls::Point_<int>(-1,-1);

    hls::Filter2D(src,dst,kernel,anchor);

    hls::Mat2AXIM<1920,uint8_t,1080,1920,HLS_8UC1>(dst,image_out);

}

Note that each line starting with #pragma must be written on a single line (the width of this editor does not allow long lines, so they may appear wrapped here).

You will notice that I have added another pragma called “DATAFLOW”. I am not going to explain the DATAFLOW architecture and its conditions in this article; a good explanation by Xilinx itself is available here. Basically, this pragma lets the stages of the function (reading the input, filtering, writing the output) run concurrently as a pipeline instead of one after the other, which improves latency and throughput. (You can check this by synthesizing with and without the DATAFLOW pragma.)
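The structure that DATAFLOW works on can be pictured as a chain of stages connected by streams. The sketch below models that shape in plain C++, using std::queue as a stand-in for a hardware FIFO. All names here are hypothetical, and the middle stage only inverts each pixel as a placeholder for the real filter. In software the three stages run one after the other; the point of the pragma is that in hardware they can overlap, each stage working as soon as data arrives from the previous one.

```cpp
#include <cstdint>
#include <cstddef>
#include <queue>

typedef std::queue<uint8_t> PixelStream;  // stand-in for a hardware FIFO

// Stage 1: read pixels from memory into a stream
// (the role AXIM2Mat plays in the code above).
void read_stage(const uint8_t* in, size_t count, PixelStream& out) {
    for (size_t i = 0; i < count; i++) out.push(in[i]);
}

// Stage 2: placeholder processing; inverts each pixel instead of filtering.
void process_stage(PixelStream& in, PixelStream& out) {
    while (!in.empty()) {
        out.push(static_cast<uint8_t>(255 - in.front()));
        in.pop();
    }
}

// Stage 3: drain the stream back into memory
// (the role Mat2AXIM plays in the code above).
void write_stage(PixelStream& in, uint8_t* out) {
    size_t i = 0;
    while (!in.empty()) {
        out[i++] = in.front();
        in.pop();
    }
}

// Each stream has exactly one producer and one consumer.
void pipeline(const uint8_t* in, uint8_t* out, size_t count) {
    PixelStream a, b;
    read_stage(in, count, a);
    process_stage(a, b);
    write_stage(b, out);
}
```

Notice that every intermediate stream is written by one stage and read by exactly one other stage. The IP core in this tutorial has the same shape, with “src” and “dst” sitting between the three function calls.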

Let's synthesize our IP core by clicking on the “Run C Synthesis” button on the tool bar. If the design is synthesized properly, a synthesis report similar to this will be displayed:

Timing Details
Resource Utilization of the design

You will see all the synthesis related details in this report. Before exporting our IP core, we need to run RTL co-simulation to check for functional differences between the C simulation and the RTL simulation. Click on the “Run C/RTL co-simulation” button and click “OK” in the pop-up window with “setup only” enabled.

After successfully passing this simulation, you can proceed to export the IP core. Click on the “Export RTL” button on the tool bar and click “OK”.

In the export window, you can choose to evaluate your design using an HDL. This gives more accurate information about the utilization and timing of the design, but the evaluation takes quite a long time compared to synthesis.

After exporting your IP core, you are done with the custom IP core design using Vivado HLS.

The next step is to design the overall hardware architecture, including your IP core, using Vivado. I plan to write two more articles. One will cover designing the overall hardware in Vivado and testing it in standalone (baremetal) mode using the Xilinx SDK. The other will focus on writing a basic userspace Linux driver to control this custom IP core from the ARM processors running Linux on the ZYNQ-7000 SoC (only valid for Zynq devices).

Source Code Available at : https://github.com/sammy17/vivado_hls_tutorial

Drop a comment if you face any issues. Thank you!