👨🏼‍💻Core Components of Ascend Processors

Alper Balmumcu
Published in Huawei Developers · Feb 7, 2023

Introduction

Hi guys! Today we are going to talk about ATC, AIPP, DVPP, and ACL. We will touch on what these terms are and how they work, and at the end of the article we will walk through an inference example using ACL with Python and C++.

Let’s begin!

Ascend Tensor Compiler (ATC)

So, What is ATC?

If you want to run a model on the Ascend AI Processor, you may need to convert the model so that it is compatible with the Da Vinci Architecture.

ATC Architecture and Workflow

The model converter is the Ascend Tensor Compiler (ATC). It is essentially a command-line tool that lets you convert models with ATC commands. Specifically, it can convert network models and single-operator .json files into offline models supported by the Ascend AI Processor. This process covers operator scheduling optimization, weight data rearrangement, memory optimization, and offline model pre-processing.

The platform also supports half-precision computing: numbers of type half are stored using 16 bits, so they require less memory than numbers of type single, which use 32 bits, or double, which use 64 bits.

Currently, ATC supports Caffe, MindSpore, TensorFlow, and ONNX models.

I can hear the question: how can we use it? ATC commands vary from model to model, but here is an example for an ONNX model.
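A sketch of such a command is shown below; the model file name, input shape, and SoC version are placeholders to adapt to your own model and hardware (--framework=5 selects ONNX):

```bash
# Convert an ONNX model into an offline (.om) model for the Ascend 310.
# Model name, input shape, and SoC version are illustrative placeholders.
atc --model=yolov4.onnx \
    --framework=5 \
    --output=yolov4 \
    --input_shape="input:1,3,416,416" \
    --soc_version=Ascend310
```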

Artificial Intelligence Pre-Processing (AIPP)

So, can we do pre-processing on the hardware for image-based datasets while converting the model using ATC? Here is the answer.

When you convert a model, you can enable the data pre-processing provided by AIPP.

Artificial Intelligence Pre-Processing (AIPP) is a hardware image pre-processing function provided by the Ascend 310. It is introduced for AI Core-based image pre-processing, including image resizing, color space conversion (CSC), and mean subtraction and factor multiplication (for pixel value changes), prior to model inference.

The pre-processing includes CSC, image normalization (by subtracting the mean value or multiplying a coefficient), image cropping (by specifying the start point of cropping and cropping the image to the size required by the neural network), and much more.
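As an illustration, a static AIPP configuration is a small text file passed to ATC (via the --insert_op_conf option). The keys below follow the AIPP configuration format, while the image size and normalization values are assumptions you should adapt to your model:

```
aipp_op {
    aipp_mode : static             # AIPP parameters fixed at conversion time
    input_format : RGB888_U8       # format of the raw input image
    src_image_size_w : 416         # assumed network input width
    src_image_size_h : 416         # assumed network input height
    mean_chn_0 : 0                 # per-channel mean subtraction
    mean_chn_1 : 0
    mean_chn_2 : 0
    var_reci_chn_0 : 0.0039216     # per-channel multiplication factor (1/255)
    var_reci_chn_1 : 0.0039216
    var_reci_chn_2 : 0.0039216
}
```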

Further Information: AIPP, AIPP_Configuration

Digital Vision Pre-Processing (DVPP)

We can also do pre-processing for video datasets. Let’s see what DVPP is…

Digital Vision Pre-Processing (DVPP) is a hardware-accelerated image pre-processing module provided by the Ascend 310. The DVPP module pre-processes images through encoding, decoding, and format conversion. It converts video or image data arriving from system memory or the network into a format supported by the Ascend AI Processors before neural network computing on the Da Vinci Architecture.

Pre-processing for video datasets can be done on the NPU or the CPU. If you pre-process video on the CPU, for example using OpenCV, those operations run on the CPU; and since OpenCV has high memory consumption, the same pre-processing can be done with DVPP on the NPU, faster and with lower memory consumption.

This module integrates the following six functions.

Format conversion, image cropping, and scaling (by the VPC)
H.264/H.265 video decoding (by the VDEC)
H.264/H.265 video encoding (by the VENC)
JPEG image decoding (by the JPEGD)
JPEG image encoding (by the JPEGE)
PNG image decoding (by the PNGD)
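To give a feel for how these functions are driven from ACL, here is a minimal C++ sketch of JPEG decoding (JPEGD) through DVPP. The 416x416 geometry is an assumption, error handling is omitted, and real code must also respect DVPP's width/height alignment (stride) requirements:

```cpp
#include "acl/acl.h"
#include "acl/ops/acl_dvpp.h"

// Minimal JPEGD sketch: decode a JPEG buffer already in device memory.
// jpegData/jpegSize and the 416x416 geometry are illustrative assumptions.
void DecodeJpeg(void *jpegData, uint32_t jpegSize, aclrtStream stream) {
    // Create and open a DVPP channel.
    acldvppChannelDesc *channel = acldvppCreateChannelDesc();
    acldvppCreateChannel(channel);

    // Allocate DVPP device memory for the decoded YUV420SP output.
    void *outBuf = nullptr;
    uint32_t outSize = 416 * 416 * 3 / 2;
    acldvppMalloc(&outBuf, outSize);

    // Describe the output picture: buffer, size, format, and geometry.
    acldvppPicDesc *outDesc = acldvppCreatePicDesc();
    acldvppSetPicDescData(outDesc, outBuf);
    acldvppSetPicDescSize(outDesc, outSize);
    acldvppSetPicDescFormat(outDesc, PIXEL_FORMAT_YUV_SEMIPLANAR_420);
    acldvppSetPicDescWidth(outDesc, 416);
    acldvppSetPicDescHeight(outDesc, 416);

    // Submit the decode task to the stream and wait for completion.
    acldvppJpegDecodeAsync(channel, jpegData, jpegSize, outDesc, stream);
    aclrtSynchronizeStream(stream);

    // Release resources.
    acldvppDestroyPicDesc(outDesc);
    acldvppFree(outBuf);
    acldvppDestroyChannel(channel);
    acldvppDestroyChannelDesc(channel);
}
```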

Further Information: Usage of DVPP, Usage of DVPP APIs, DVPP I/O Memory Allocation Modes

Ascend Computing Language (ACL)

We have set up the development environment and learned how to convert and prepare models in the sections so far. The next step is to write the inference code.

ACL, short for Ascend Computing Language, provides a collection of interfaces for writing inference code. It offers C++ interfaces for managing devices, contexts, streams, and memory, loading and executing models or operators, and processing media data. Many interfaces are also available in C. With them, users can develop deep neural network applications for object recognition, image classification, and much more.

ACL Structure

In the actual running process, ACL calls an interface called the graph engine executor to load and execute models and operators. Next, it calls the Runtime to manage resources such as devices, contexts, streams, and memory. The bottom layer is the computing resources, the hardware computing basis of the Ascend AI Processor: it mainly performs the matrix-related computation of neural networks, general computation, execution control of control operators, scalars, and vectors, as well as image and video data pre-processing.

Python Ascend Computing Language (PyACL) is a Python API library encapsulated with CPython on top of ACL. Users can also use Python to manage the running and resources of the Ascend AI processors.

PyACL Example Usage

So, how is the ACL structure used through the Python API?

As an example, we are using the YOLOv4 model inference in this repo. Let’s look inside the main function to see how ACL is used.

main function

As we can see on line 2, an AclLiteResource object is defined, and its init function is called on line 3. To understand what happens during the initialization of the ACL resources, let’s dig in and find out!

The first thing is importing the ACL module and initializing pyACL; then the runtime resources are allocated. After that, on line 6 of the main function, the model is loaded and the model description is obtained.
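For context, a minimal pyACL sketch of these initialization and loading steps might look like the following; the device ID and model path are placeholder assumptions, and error handling is omitted:

```python
import acl

DEVICE_ID = 0                     # placeholder device ID
MODEL_PATH = "./model/yolov4.om"  # placeholder path to the converted .om model

# 1. Initialize pyACL and allocate runtime resources.
ret = acl.init()
ret = acl.rt.set_device(DEVICE_ID)
context, ret = acl.rt.create_context(DEVICE_ID)

# 2. Load the offline model and obtain its description.
model_id, ret = acl.mdl.load_from_file(MODEL_PATH)
model_desc = acl.mdl.create_desc()
ret = acl.mdl.get_desc(model_desc, model_id)
```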

So right now, we are ready to execute and start inference.

We can simply call the model’s execute function, shown on line 9 of the main function, to start the process. But first, you need to pre-process your input. We skip this step here because pre-processing varies from model to model.

If we look inside the called function, we will see the model execution with the PyACL structure, as shown below.
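As a minimal sketch, the core call inside that function is acl.mdl.execute; here input_dataset and output_dataset are assumed to be already-prepared model datasets:

```python
# The heart of the execute step: run inference on the loaded model.
# input_dataset and output_dataset are assumed to be dataset objects
# prepared from the pre-processed input and the model description.
ret = acl.mdl.execute(model_id, input_dataset, output_dataset)
assert ret == 0  # 0 means the inference task completed successfully
```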

The last step is post-processing. As mentioned above, pre-processing varies from model to model, and the same goes for post-processing. You should prepare it according to your model’s requirements.

ACL Example Usage

Now, it’s time for our C++ example!

We will use functions similar to those in the Python example above. The API is different, but the general purpose of the functions is the same.

As an example, we are using the YOLOv4 model inference in this repo, as we did before in the PyACL sample. This makes it easier to understand and compare the usage of the Python and C++ APIs.

main function

The whole process is the same. We are initializing ACL, loading the model, and then executing the inference. As I mentioned above in the PyACL part, you should do the pre-processing and post-processing parts depending on your model.

Let’s dive!

On the 3rd line of the main function, we are initializing ACL and allocating runtime resources. If we look inside “modelProcess.Init(0)”, we can see the ACL functions as shown below.
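For reference, a condensed sketch of what such an Init function typically does is shown below; the function name is illustrative rather than the exact code from the repo, and error checks are omitted:

```cpp
#include "acl/acl.h"

// A sketch of what an Init(deviceId) typically does: initialize ACL and
// allocate the runtime resources (device, context, stream).
aclError InitResource(int32_t deviceId, aclrtContext &context, aclrtStream &stream) {
    aclInit(nullptr);                        // nullptr: no acl.json config file
    aclrtSetDevice(deviceId);                // bind this process to the device
    aclrtCreateContext(&context, deviceId);  // create an execution context
    return aclrtCreateStream(&stream);       // create a stream for task execution
}
```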

With the “modelProcess.LoadModel()” function on the 7th line of the main function, we are loading the model and obtaining the model description as shown below.
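A minimal sketch of this loading step could look like the following; the model path is a placeholder and error checks are omitted:

```cpp
// Load the offline .om model and obtain its description.
uint32_t modelId = 0;
aclmdlLoadFromFile("./model/yolov4.om", &modelId);

aclmdlDesc *modelDesc = aclmdlCreateDesc();
aclmdlGetDesc(modelDesc, modelId);
```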

Then, we are ready for inference. If we look inside “modelProcess.Execute()”, we will see the model execution using the ACL structure. So, here is the execution call:
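A minimal sketch of that call, assuming input and output are aclmdlDataset objects prepared beforehand:

```cpp
// Run inference on the loaded model. input and output are assumed to be
// aclmdlDataset objects built from the pre-processed input and from
// output buffers sized via modelDesc.
aclError ret = aclmdlExecute(modelId, input, output);
```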

The inference is done!

Conclusion

In this article, we have learned about ATC for model conversion and about AIPP and DVPP for hardware-accelerated pre-processing. We also briefly explained ACL and how to build the inference structure using the C++ and Python APIs separately. In our next article, we will dive deeper into ACL and its usage with examples.

Congrats if you followed this tutorial till the end!

Stay tuned for our new article!

“Opportunities don’t happen, you create them” -Chris Grosser
