NeuronLink: An Efficient Chip-to-Chip Interconnect for Large-Scale Neural Network Accelerators
Large-scale neural networks are deployed on multi-core processors organized via networks-on-chip (NoCs) to handle their heavy neuron workloads. These neural network chips are connected through chip-to-chip interconnection networks to increase the efficiency of neuron processing. In this research, the authors propose interchip and intrachip communication methods for neural network processors. To evaluate the proposed techniques, they implement four connected NoC-based deep neural network chips on four FPGAs. The research concludes that the proposed interconnection networks manage the data traffic inside deep neural networks very efficiently.
Almost every industry is adopting highly efficient deep neural networks (DNNs) to make its products intelligent. DNNs have many applications involving machine intelligence, such as image recognition, object detection, and speech recognition. These networks stack many layers of neurons and therefore require very high computation power, which becomes even more challenging when they must be deployed on hardware; considerable optimization is required to run them. GPUs, CPUs, ASICs, and FPGAs can accelerate DNN processing, but these platforms can be energy inefficient because they offer high speed at the cost of heavy resource usage. Networks-on-chip (NoCs) offer an energy-efficient alternative: a NoC processes multiple neurons in parallel, and data can be exchanged efficiently from one neuron to another.
The NoC-based design paradigm provides:
1. Energy efficiency — Reduces the off-chip memory access.
2. Scalability — Computation resources are independent of the data flow.
3. Flexibility — Handles different data flows through flexible interconnection.
In this blog, a lightweight and efficient chip-to-chip interconnection scheme, together with virtual-channel router optimization methods used for on-chip interconnection in NoCs, is explained.
NeuronLink Architecture:
NeuronLink is a chip-to-chip interconnect paradigm that includes both interchip and intrachip connections. The architecture of NeuronLink is presented in Fig. 1. It includes a physical layer, a data link layer, and a transaction layer implemented by the NoCs.
First, packets are received by the data link layer from the transaction layer. The header of each packet contains the packet priority, multicast type, and destination address. The body flits of the packet are stored in virtual channels (VCs). The credit management (CRM) element then selects which VC is forwarded for further processing, and the packet arrives at the physical layer, which receives both data and commands. To address the issue of high-priority commands, an asynchronous handshake approach is used. An encoder then adds the packet header and check header to the data. On the receiver side, the physical medium attachment layer processes the data; an elastic buffer can be used to synchronize the data with the recovery clock. A command is analyzed by the CRM unit in the data link layer, and the data is sent to VCs according to address and priority.
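The CRM's VC-selection step can be sketched as a simple credit-based, priority-aware arbiter. The sketch below is only illustrative: the `Packet` fields mirror the header contents described above, but the `VirtualChannel` structure and the highest-priority-first rule are assumptions for demonstration, not the paper's exact arbitration logic.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Packet:
    priority: int        # higher value = more urgent (assumed encoding)
    multicast_type: int  # e.g. 0 = unicast, 1 = multicast (illustrative)
    dest: int            # destination node address
    body: list           # body flits

@dataclass
class VirtualChannel:
    credits: int                         # free buffer slots downstream
    queue: deque = field(default_factory=deque)

def select_vc(vcs):
    """CRM step: among VCs that hold a packet and still have downstream
    credit, pick the one whose head packet has the highest priority."""
    candidates = [vc for vc in vcs if vc.queue and vc.credits > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda vc: vc.queue[0].priority)

# Two VCs: VC0 holds a low-priority packet, VC1 a high-priority one.
vc0 = VirtualChannel(credits=2)
vc0.queue.append(Packet(priority=1, multicast_type=0, dest=5, body=["f0", "f1"]))
vc1 = VirtualChannel(credits=1)
vc1.queue.append(Packet(priority=3, multicast_type=0, dest=9, body=["f0"]))

winner = select_vc([vc0, vc1])
winner.credits -= 1  # sending consumes one downstream credit
```

In a credit-based scheme like this, a VC with an empty downstream buffer budget is simply skipped, which is what prevents buffer overflow at the next router.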
Implementation of NeuronLink in DNN Accelerator:
Fig. 2 shows the general architecture of the DNN accelerator. It contains four chips, each consisting of 16 processing nodes connected through the NeuronLink interconnect scheme. Each chip also provides a PCIe interface for high-bandwidth off-chip data transmission. Every processing node contains:
1. eDRAM buffers — Store input features.
2. Four digital processing units — Perform shift-and-add, pooling, and activation operations.
3. Eight analog processing units — Perform in-situ MAC operations.
Each analog processing unit consists of crossbar arrays, ADCs, and DACs.
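The in-situ MAC idea behind such crossbar-based analog units can be sketched numerically: inputs are applied as voltages along the rows (via DACs), weights are stored as conductances, each column current is then a dot product by Kirchhoff's current law, and an ADC digitizes it. The sizes, ranges, and ADC model below are assumptions for illustration, not the paper's circuit parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))  # conductances = weights, 4x3 crossbar
v = rng.uniform(0.0, 1.0, size=4)       # row input voltages (DAC outputs)

# Each column current is sum_r v[r] * G[r, c]: the MAC happens "in situ",
# in the array itself, rather than in a digital multiplier.
i_col = v @ G

def adc(x, bits=8, full_scale=4.0):
    """Quantize an analog value to an unsigned digital code (toy ADC)."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x / full_scale, 0.0, 1.0) * levels).astype(int)

codes = adc(i_col)  # one digital MAC result per crossbar column
```

The point of the sketch is that an N-row column performs N multiply-accumulates in a single analog read, which is where the area and energy efficiency of analog processing units comes from.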
To accelerate the ResNet-18 model, it is mapped as shown in Fig. 3. Other models can be mapped similarly to accelerate them with the proposed DNN accelerator.
Circular boxes represent routers, whereas square boxes represent processing nodes. The number inside each box is the index of the layer in the ResNet-18 model, and the arrows show the direction of data movement. The green circle marks the transfer of data from the local processing node to DRAM.
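A layer-to-node mapping of this kind can be sketched as placing consecutive layers on mesh-adjacent nodes, so that most inter-layer traffic travels a single hop. The snake-order placement below is a hypothetical illustration of that principle, not a reproduction of the mapping in Fig. 3.

```python
MESH = 4  # 4x4 processing nodes per chip, as in the accelerator above

def snake_order(n):
    """Visit an n x n mesh row by row, reversing every other row, so
    consecutive positions are always one hop apart."""
    coords = []
    for r in range(n):
        cols = range(n) if r % 2 == 0 else reversed(range(n))
        coords.extend((r, c) for c in cols)
    return coords

def map_layers(num_layers, n=MESH):
    """Assign layer i to the i-th node in snake order; wrapping past
    n*n layers stands in for spilling onto the next chip."""
    order = snake_order(n)
    return {layer: order[layer % (n * n)] for layer in range(num_layers)}

mapping = map_layers(18)  # ResNet-18 has 18 weight layers
```

Under this placement, any two consecutive layers among the first 16 sit on neighboring nodes, which keeps the producer-consumer traffic between layers local on the mesh.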
The proposed interconnect scheme, NeuronLink, has lower power consumption and simpler circuit complexity than other interconnects. Its implementation cost is also low, and it simplifies the data flow. NeuronLink-based DNN accelerators achieve better power efficiency and area efficiency than previous NoC-based DNN accelerators such as Eyeriss v2 and DaDianNao.
1. S. Xiao et al., "NeuronLink: An efficient chip-to-chip interconnect for large-scale neural network accelerators," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 9, pp. 1966–1978, Sept. 2020.
2. K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2016, pp. 630–645.
3. M. F. Reza and P. Ampadu, “Energy-efficient and high-performance NoC architecture and mapping solution for deep neural networks,” in Proc. 13th IEEE/ACM Int. Symp. Netw.-Chip, Oct. 2019, pp. 1–8.