DLAU: A Scalable Deep Learning Accelerator Unit on FPGA

SANIA SHINDE
FPGA based Deep learning Application
2 min read · Apr 24, 2021


Deep learning uses a multi-layer neural network model. It requires multiple layers to extract high-level features, which combine low-level abstractions into distributed data representations used to solve complex machine learning problems. Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are the most widely used neural models for deep learning. These models are known for their capability in image recognition, voice recognition, and other complex machine learning tasks.

That being said, as the accuracy and complexity requirements of practical applications grow, the size of neural networks increases at an exponential rate. As a result, it becomes difficult to design high-performance, low-power deep learning hardware, especially for large-scale deep learning neural network models.

The Deep Learning Accelerator Unit (DLAU) is a scalable accelerator that speeds up the kernel computational parts of deep learning algorithms. It uses tile techniques, FIFO buffers, and pipelining to reduce memory transfer operations, and it reuses its computing units to implement large-size neural networks.

DLAU ARCHITECTURE AND EXECUTION MODEL -

Figure: DLAU accelerator architecture.

The DLAU system architecture contains an embedded processor, a DDR3 memory controller, a DMA module, and the DLAU accelerator itself. The DLAU consists of three processing units organized in a pipelined manner:

1. Tiled Matrix Multiplication Unit (TMMU)

2. Part Sum Accumulation Unit (PSAU)

3. Activation Function Acceleration Unit (AFAU)

The DLAU reads tiled data from memory via DMA, passes it through each of the three processing units in turn, and writes the results back to memory. FIFO buffers, tile techniques, and the pipelined accelerator are the key features of the DLAU accelerator architecture.
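To make the three-stage dataflow concrete, here is a minimal NumPy sketch of one layer passing through TMMU, PSAU, and AFAU. This is purely illustrative software, not the paper's hardware design: the function names, tile size of 3, and the choice of sigmoid as the activation are assumptions made for the example.

```python
import numpy as np

def tmmu(weight_tile, x_tile):
    """Tiled Matrix Multiplication Unit: partial sums for one input tile.
    (In hardware these would stream through a FIFO to the PSAU.)"""
    return weight_tile @ x_tile

def psau(partials):
    """Part Sum Accumulation Unit: accumulate partial sums across tiles."""
    return np.sum(partials, axis=0)

def afau(z):
    """Activation Function Acceleration Unit: apply a sigmoid activation
    (assumed here for illustration)."""
    return 1.0 / (1.0 + np.exp(-z))

# One layer: 4 output neurons, 6 inputs, processed in tiles of 3 inputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
x = rng.standard_normal(6)

tile = 3
partials = [tmmu(W[:, i:i + tile], x[i:i + tile]) for i in range(0, 6, tile)]
y = afau(psau(np.stack(partials)))

# Tiled pipeline matches the untiled computation.
assert np.allclose(y, 1.0 / (1.0 + np.exp(-(W @ x))))
```

In the real accelerator the three stages run concurrently on different tiles, so the PSAU accumulates one tile's partial sums while the TMMU is already multiplying the next.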

The DLAU is an FPGA-based, scalable, and modular deep learning accelerator. Its three pipelined processing units can be reused for large-scale neural networks. Tile techniques divide the input node data into smaller sets, and the arithmetic logic is time-shared to compute the tiles repeatedly.
