An Ultra-Low-Power FPGA for IoT Applications

Published in Edge Computing in Internet Of Things Using FPGA, Mar 31, 2021

With the rapid development of the Internet of Things (IoT), there is a growing need for hardware that is both energy efficient and flexible. Energy efficiency is needed because most IoT applications have a strict energy budget. Flexibility is needed so that the functionality and algorithms of deployed systems can be upgraded. A Field-Programmable Gate Array (FPGA) is a good fit for both requirements. The proposed FPGA architecture has three parts: (A) low-swing interconnect, (B) a folded switch box, and (C) per-path voltage scaling and power gating.

FPGA Architecture

A: Low-Swing Interconnect. Scaling the circuit supply voltage (Vdd) down toward the transistor threshold voltage reduces energy. However, it also causes a large interconnect delay, because the current of drivers and buffers decreases exponentially in sub-threshold. To reduce energy while keeping the interconnect delay acceptable, a low-swing design replaces the buffer structure with pass-gates (PGs). The transfer characteristics of the PGs in sub-threshold reduce the signal swing along the interconnect paths. As a result, this design reduces the energy of the FPGA interconnect by 43%.
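To see why a smaller signal swing saves energy, here is a minimal first-order sketch. It assumes the usual textbook model in which charging a wire of capacitance C from a supply Vdd through a swing Vswing draws C * Vdd * Vswing from the supply. The function names and the numbers used are illustrative assumptions, not values from the fabricated chip.

```python
def switching_energy(capacitance, vdd, v_swing=None):
    """Energy (J) drawn from the supply to charge a wire once.

    First-order model: E = C * Vdd * Vswing.
    With a full swing (Vswing == Vdd) this is the familiar C * Vdd^2.
    """
    if v_swing is None:
        v_swing = vdd  # full-swing (buffered) interconnect
    return capacitance * vdd * v_swing


def swing_saving(vdd, v_swing):
    """Fractional energy saving of a reduced swing versus a full swing."""
    return 1.0 - v_swing / vdd


if __name__ == "__main__":
    # Illustrative values only (assumed, not measured): 100 fF wire, 0.6 V supply.
    full = switching_energy(100e-15, 0.6)
    low = switching_energy(100e-15, 0.6, v_swing=0.45)
    print(f"full swing: {full:.3e} J, low swing: {low:.3e} J")
    print(f"saving: {swing_saving(0.6, 0.45):.0%}")
```

In this model the saving depends only on the ratio Vswing / Vdd, so it says nothing about the delay cost of the pass-gates; that trade-off is what the measured 43% figure captures.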

B: Folded Switch Box. The widely used Subset SB topology is shown on the left side of Fig. 1. It is built from switch points (SPs), one on each track of the interconnect channel. The proposed Folded-Subset SB layout breaks the alignment of the switch points and squeezes them into a minimum square area, as shown on the right side of Fig. 1.

The idea is to first create a custom layout of seven SPs arranged in a square, as shown on the left side of Fig. 2, and then tile multiple such squares to form the complete SB, as shown on the right side of Fig. 2.

Comparing the layouts of multiple SBs shows that the proposed Folded-Subset SB is more effective at reducing area when the interconnect has a larger channel width (the number of tracks in the interconnect channel). The reduced area also leads to a 77.2% delay reduction and an 83.3% energy reduction in the interconnect, due to the reduced parasitic RC.
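The Subset topology and the seven-SP tiling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the standard Subset rule (track i on one side connects only to track i on the other sides) and assumes each switch point holds one switch per pair of sides, six in total. The function names are hypothetical and the sketch models connectivity, not layout.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")


def subset_switch_box(channel_width):
    """List the programmable connections of a Subset switch box.

    Each track has one switch point; inside it, track i on one side
    can connect to track i on each of the other three sides
    (assumption: one switch per pair of sides, i.e. 6 per switch point).
    """
    switches = []
    for track in range(channel_width):
        for side_a, side_b in combinations(SIDES, 2):
            switches.append(((side_a, track), (side_b, track)))
    return switches


def squares_needed(channel_width, sps_per_square=7):
    """Number of seven-SP squares to tile, one SP per track (ceiling division)."""
    return -(-channel_width // sps_per_square)


if __name__ == "__main__":
    width = 84  # channel width of the fabricated chip
    print(len(subset_switch_box(width)), "switches under the 6-per-SP assumption")
    print(squares_needed(width), "squares of seven SPs")
```

With a channel width of 84, the tracks divide evenly into 12 squares of seven SPs each.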

C: Per-Path Voltage Scaling & Power-Gating. Per-path voltage scaling is implemented on the FPGA interconnect by attaching half of the interconnect-fabric drivers at the outputs of the configurable logic blocks (CLBs) to a higher supply, Vddh. During place-and-route (P&R), the interconnect between logic blocks on the critical path is implemented with the drivers attached to Vddh. According to simulations, these techniques further reduce FPGA energy by 20.4% to 67.9%.
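The driver selection step can be sketched as follows. This is a hedged illustration, not the authors' P&R code. It assumes each CLB output has a pair of drivers, one on Vddh and one on a lower rail (called Vddl here, a name not used in the article), and that whichever driver is not selected is power-gated. The function name and net names are hypothetical.

```python
def select_drivers(nets, critical_nets):
    """Choose, for each routed net, which output driver to enable.

    Assumptions (not from the article): each output has one Vddh
    driver and one Vddl driver, and the unselected one is gated.
    Critical-path nets get the Vddh driver; all others get Vddl.
    """
    config = {}
    for net in nets:
        on_critical_path = net in critical_nets
        config[net] = {
            "Vddh_driver": "on" if on_critical_path else "gated",
            "Vddl_driver": "gated" if on_critical_path else "on",
        }
    return config


if __name__ == "__main__":
    # Hypothetical nets; "sum_carry" stands in for a critical-path net.
    nets = ["sum_carry", "ctrl_en", "status_led"]
    for net, drivers in select_drivers(nets, {"sum_carry"}).items():
        print(net, drivers)
```

Only the nets that limit timing pay the energy cost of the higher supply, which is why the saving varies with how much of a design sits on its critical path.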

Methodology and CAD flow

Although VPR/VTR can generate notional place-and-route information for benchmarks, it cannot generate the configuration bitstreams needed to implement benchmarks on a physical FPGA chip. To solve this problem, a custom CAD flow is built that enables schematic and configuration bitstream generation, as shown in Fig. 3.

This tool takes a parameter file and benchmarks written in Verilog as inputs. The parameter file contains detailed architecture and circuit information. The custom CAD flow also supports fast power estimation of the proposed FPGA: it uses the power estimation feature provided by VTR, but with updated parameters.

Chip measurements

To evaluate the proposed design, an FPGA operating at near-threshold is fabricated in 130nm CMOS. The CLBs are built from 4-input look-up tables in clusters of 8. The chip contains 512 look-up tables in total, which is large enough to implement meaningful applications. A channel width of 84 allows the chip to route significant benchmarks.
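To make these numbers concrete, here is a small sketch of how a 4-input look-up table works and what the stated sizes imply. The 16-bit truth-table encoding is the standard way to model a 4-input LUT. The function and constant names are illustrative assumptions, and the bit count covers LUT contents only, not routing or any other configuration.

```python
TOTAL_LUTS = 512   # from the chip description
CLUSTER_SIZE = 8   # LUTs per CLB
LUT_INPUTS = 4


def lut4(config_bits, a, b, c, d):
    """Evaluate a 4-input LUT whose truth table is a 16-bit integer.

    The four inputs form an index 0..15; the output is the
    configuration bit stored at that index.
    """
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config_bits >> index) & 1


def parity_truth_table():
    """Build the 16-bit table for a 4-input XOR (bit i = parity of i)."""
    return sum((bin(i).count("1") & 1) << i for i in range(16))


if __name__ == "__main__":
    clbs = TOTAL_LUTS // CLUSTER_SIZE
    lut_bits = TOTAL_LUTS * (1 << LUT_INPUTS)
    print(clbs, "CLBs;", lut_bits, "configuration bits for LUT contents alone")
    xor4 = parity_truth_table()
    print("XOR of (1, 0, 1, 1):", lut4(xor4, 1, 0, 1, 1))
```

Reprogramming the same LUT with a different 16-bit value gives a different logic function, which is the flexibility the article is after.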

Results

The table below compares the proposed FPGA with five existing low-power FPGAs. The proposed FPGA consumes 2.5x lower static power than “Grossmann’s design”, even though the proposed FPGA is 4x larger and measured at a higher supply voltage. For a fair comparison, the delay, power, and energy of all the existing works are normalised to the same supply voltage (0.6V), technology node (130nm CMOS), and look-up-table count (512 4-input) as the proposed FPGA. When implementing a 4-bit adder, the measured energy of the proposed FPGA is 15% less than Microsemi IGLOO, and 13x less than Lattice iCE40.

COMPARISONS OF THE PROPOSED FPGA AND THE EXISTING LOW-POWER FPGAS

Conclusion

A fully programmable ultra-low-power FPGA operating at near-threshold is proposed, with custom CAD flow support. A 512-look-up-table FPGA chip is fabricated in 130nm CMOS using low-power techniques including low-swing design, per-path voltage scaling, power-gating, and area-optimized SBs. When implementing a 4-bit adder, the measured energy of the proposed FPGA is 15% lower than the state of the art. Compared to low-power commercial products, the proposed FPGA is on average 315x lower in power and 75x lower in energy, enabling flexible hardware for IoT applications.