Optimizing FPGA-based Accelerator Design For Deep Convolutional Neural Networks

Loop unrolling can be used to increase the utilization of the massive computation resources in FPGA devices. Unrolling along different loop dimensions generates different implementation variants. Whether and to what extent two unrolled execution instances share data affects the complexity of the generated hardware, and ultimately the number of unrolled copies and the hardware operating frequency. The data-sharing relations between different loop iterations of a loop dimension on a given array can be classified into three categories:

1. Irrelevant: If a loop iterator ik does not appear in any access function of array A, the corresponding loop dimension is irrelevant to array A.

2. Independent: If the union of the data space accessed on an array A is totally separable along a particular loop dimension ik, that is, if for any two distinct parameters p1 and p2 the data accessed by DS(A, ik = p1) = ∪ Image(F_S^A, (D_S ∩ ik = p1)) is disjoint from the data accessed by DS(A, ik = p2) = ∪ Image(F_S^A, (D_S ∩ ik = p2)), the loop dimension ik is independent of array A.

3. Dependent: If the union of the data space accessed on an array A is not separable along a particular loop dimension ik, the loop dimension ik is dependent on array A.
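The three categories can be illustrated with a small brute-force check. The C++ sketch below is only an illustration under stated assumptions: the iterator names (TO, TI, ROW, COL, KI, KJ), the helper names, the stride of 1, and the three example access functions are hypothetical and not taken from the text. It enumerates a tiny iteration domain, collects the array elements touched for each fixed value of one iterator, and tests whether those sets are disjoint.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <vector>

// Loop iterators of a convolution loop nest (names are illustrative assumptions).
enum Iter { TO, TI, ROW, COL, KI, KJ, NUM_ITERS };

using Point = std::array<int, NUM_ITERS>;      // one loop iteration, or the loop bounds
using Subscript = std::array<int, NUM_ITERS>;  // affine subscript: sum of coeff[d] * it[d]
using Access = std::vector<Subscript>;         // one subscript per array dimension

enum class Relation { Irrelevant, Independent, Dependent };

// Array elements touched by all iterations in which iterator k equals p.
std::set<std::vector<int>> footprint(const Access& f, const Point& bounds, int k, int p) {
  std::set<std::vector<int>> data;
  long total = 1;
  for (int b : bounds) total *= b;
  for (long n = 0; n < total; ++n) {
    Point it{};
    long rest = n;
    for (int d = 0; d < NUM_ITERS; ++d) {  // decode n into one iteration vector
      it[d] = static_cast<int>(rest % bounds[d]);
      rest /= bounds[d];
    }
    if (it[k] != p) continue;  // keep only the slice ik = p
    std::vector<int> element;
    for (const Subscript& s : f) {
      int index = 0;
      for (int d = 0; d < NUM_ITERS; ++d) index += s[d] * it[d];
      element.push_back(index);
    }
    data.insert(element);
  }
  return data;
}

// Classify loop dimension k with respect to the array accessed through f.
Relation classify(const Access& f, const Point& bounds, int k) {
  assert(k >= 0 && k < NUM_ITERS);
  bool appears = false;
  for (const Subscript& s : f) appears = appears || (s[k] != 0);
  if (!appears) return Relation::Irrelevant;  // iterator absent from every subscript

  for (int p1 = 0; p1 < bounds[k]; ++p1) {
    const auto a = footprint(f, bounds, k, p1);
    for (int p2 = p1 + 1; p2 < bounds[k]; ++p2) {
      for (const auto& element : footprint(f, bounds, k, p2)) {
        if (a.count(element)) return Relation::Dependent;  // two slices share data
      }
    }
  }
  return Relation::Independent;  // all slices are pairwise disjoint
}

// Hypothetical access functions, coefficient order: TO, TI, ROW, COL, KI, KJ.
// in[ti][row + ki][col + kj], assuming stride 1
Access input_access() {
  return {{0, 1, 0, 0, 0, 0}, {0, 0, 1, 0, 1, 0}, {0, 0, 0, 1, 0, 1}};
}
// w[to][ti][ki][kj]
Access weight_access() {
  return {{1, 0, 0, 0, 0, 0}, {0, 1, 0, 0, 0, 0}, {0, 0, 0, 0, 1, 0}, {0, 0, 0, 0, 0, 1}};
}
// out[to][row][col]
Access output_access() {
  return {{1, 0, 0, 0, 0, 0}, {0, 0, 1, 0, 0, 0}, {0, 0, 0, 1, 0, 0}};
}
```

Under these assumptions, classify(input_access(), bounds, ROW) reports a dependent dimension because neighbouring output rows read overlapping input rows, while TI is independent and TO is irrelevant for the same array. Exhaustive enumeration is only practical for toy bounds; it is meant to make the definitions concrete, not to stand in for a polyhedral analysis.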

Loop pipelining is a key optimization technique in high-level synthesis that improves system throughput by overlapping the execution of operations from different loop iterations. The achievable throughput is limited by both resource constraints and data dependencies in the application. A loop-carried dependence prevents a loop from being fully pipelined. A polyhedral-based optimization framework can be used to perform automatic loop transformations that permute the parallel loop levels to the innermost levels, avoiding loop-carried dependences.
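The effect of such a permutation can be sketched on a small matrix-vector product. The example below is an assumption-laden illustration, not the transformation produced by any particular framework: the sizes M and N and both function names are hypothetical, and the PIPELINE directive is written in Vivado/Vitis HLS pragma syntax, which a standard C++ compiler ignores (possibly with a warning). In the first version the reduction loop is innermost, so every iteration updates the same accumulator; in the second, the parallel loop is innermost, so consecutive iterations update different elements.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t M = 4;  // number of outputs (hypothetical size)
constexpr std::size_t N = 8;  // number of inputs (hypothetical size)

// Reduction loop `ti` innermost: each iteration reads and writes out[to],
// so consecutive pipelined iterations carry a dependence through the accumulator.
void matvec_reduction_inner(const float w[M][N], const float in[N], float out[M]) {
  for (std::size_t to = 0; to < M; ++to) {
    out[to] = 0.0f;
    for (std::size_t ti = 0; ti < N; ++ti) {
#pragma HLS PIPELINE II=1
      out[to] += w[to][ti] * in[ti];
    }
  }
}

// Permuted loop nest: the parallel loop `to` is innermost, so consecutive
// iterations update different elements of `out` and no dependence is carried
// by the innermost loop.
void matvec_parallel_inner(const float w[M][N], const float in[N], float out[M]) {
  for (std::size_t to = 0; to < M; ++to) out[to] = 0.0f;
  for (std::size_t ti = 0; ti < N; ++ti) {
    for (std::size_t to = 0; to < M; ++to) {
#pragma HLS PIPELINE II=1
      out[to] += w[to][ti] * in[ti];
    }
  }
}
```

Both functions accumulate each out[to] over ti in the same order, so they compute the same values. What changes is which loop carries the accumulator dependence: in the permuted version it moves to the outer loop, which is what lets the innermost loop approach an initiation interval of 1 when the addition takes more than one cycle.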
