Protection against errors in the FPGA configuration memory

Vojtěch Pail
Jul 30, 2019


Motivation

Nowadays, FPGAs have become extremely popular because of their flexibility. An FPGA is useful when both hardware (ASIC-like) speed and software-like flexibility are needed. Unlike an ASIC, an FPGA can be reconfigured, even remotely. The FPGA configuration (the bitstream) is stored in the configuration memory. Radiation from space and other disturbances can cause radiation-induced errors called single event upsets (SEUs). An SEU can flip a bit in the configuration memory to the opposite value (a bit flip, BF), which can make the device malfunction. Several methods can be used to detect and repair, or mitigate, the errors caused by BFs. Common methods are error detection/correction codes (EDC/ECC) and hardware redundancy. In this paper, the effect of BFs on circuit outputs is simulated and protection against the errors they cause is discussed.

Figure: FPGA

Theoretical background

There are several ways to ensure that a unit keeps functioning correctly. The most common are redundancy in space, redundancy in time, and information redundancy.

Redundancy in space is implemented by hardware redundancy. The most common form is triple modular redundancy (TMR): three identical units are used instead of one, and their outputs are connected to a voter which, in case of a mismatch, selects the majority value.

Figure: TMR
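The voter can be sketched as a bitwise 2-out-of-3 majority function. This is a minimal Python illustration. It assumes each unit's output is an integer bit vector. The name `tmr_vote` is hypothetical and does not come from any particular tool.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote over three module outputs.

    Each result bit takes the value held by at least two of the three inputs,
    so an error confined to a single module is masked.
    """
    return (a & b) | (a & c) | (b & c)


correct = 0b1011
faulty = 0b1111  # hypothetical: one module's output corrupted in bit 2

print(bin(tmr_vote(correct, correct, faulty)))  # → 0b1011 (error masked)
# If two modules are wrong in the same bit, the wrong value wins the vote:
print(bin(tmr_vote(correct, faulty, faulty)))   # → 0b1111
```

Masking holds only while at most one module is wrong in any given bit position. The voter itself also remains a single point of failure.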

Redundancy in time is most often realized by repeating the calculation and comparing the results.
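Here is a minimal sketch of time redundancy. It assumes a transient (soft) fault that corrupts only some executions. The computation is repeated, and on a mismatch a third run breaks the tie. The unit `make_faulty_square` is a hypothetical stand-in whose only job is to inject a bit flip on chosen calls.

```python
from typing import Callable, Optional


def make_faulty_square(faulty_calls: set) -> Callable[[int], int]:
    """Hypothetical unit computing x * x that suffers a transient upset
    (bit 0 inverted) on the call numbers listed in faulty_calls."""
    calls = 0

    def compute(x: int) -> int:
        nonlocal calls
        calls += 1
        result = x * x
        if calls in faulty_calls:
            result ^= 1  # transient single-bit upset
        return result

    return compute


def time_redundant(compute: Callable[[int], int], x: int) -> Optional[int]:
    """Run the computation twice; on a mismatch, run it a third time
    and return the majority result (None if all three differ)."""
    a, b = compute(x), compute(x)
    if a == b:
        return a
    c = compute(x)
    if c == a:
        return a
    if c == b:
        return b
    return None


print(time_redundant(make_faulty_square({1}), 7))  # → 49
```

A permanent (hard) fault would corrupt every run in the same way, so repetition alone cannot detect it.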

Information redundancy adds extra information (redundant bits) to the data so that an error can be detected or corrected. Typical examples are EDC and ECC.

EDC
EDC is used to detect errors. The simplest EDC is even parity, which adds one bit to the data so that the total number of 1 bits is even.
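Even parity can be sketched in a few lines of Python. This minimal illustration assumes data given as a list of 0/1 bits. The helper names are hypothetical.

```python
def even_parity_bit(bits: list) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return sum(bits) % 2


def encode(bits: list) -> list:
    """Append the even-parity bit to the data bits."""
    return bits + [even_parity_bit(bits)]


def error_detected(codeword: list) -> bool:
    """An odd number of 1 bits means an odd number of bits has flipped."""
    return sum(codeword) % 2 == 1


word = encode([1, 0, 1, 1])  # → [1, 0, 1, 1, 1]
one_flip = word.copy()
one_flip[0] ^= 1             # single BF
two_flips = one_flip.copy()
two_flips[1] ^= 1            # second BF

print(error_detected(word), error_detected(one_flip), error_detected(two_flips))
# → False True False
```

Parity detects any odd number of flipped bits. An even number of flips cancels out and goes unnoticed.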

ECC
ECCs are used to correct errors. The Hamming code is most often used to correct a single error. Reed-Solomon (RS), Reed-Muller (RM), BCH, and other codes can be used to correct multiple errors.
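The Hamming(7,4) code illustrates single-error correction. The Python sketch below uses the textbook layout, with parity bits at positions 1, 2 and 4. This layout is an assumption for illustration, since the text does not fix a particular Hamming variant. In this layout, the syndrome (the XOR of the positions holding a 1) directly gives the position of a single flipped bit.

```python
from functools import reduce


def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword.
    Positions 1..7 hold [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    # Syndrome = XOR of the 1-based positions holding a 1; 0 means no error.
    syndrome = reduce(lambda acc, pos: acc ^ pos,
                      (pos for pos, bit in enumerate(c, start=1) if bit), 0)
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the flipped bit
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])  # → [0, 1, 1, 0, 0, 1, 1]
received = codeword.copy()
received[4] ^= 1                           # single BF at position 5
print(hamming74_decode(received))          # → [1, 0, 1, 1]
```

With two flipped bits, the syndrome points at a wrong position and the word is silently miscorrected. This is why multi-bit errors call for stronger codes.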

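Errors are incorrect output signals caused by a malfunctioning system. Errors can be hard or soft.

Hard errors are permanent, while soft errors are not permanent and can be repaired (for example, a BF in the configuration memory can be repaired by rewriting the affected configuration).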

Related work

The impact of SEUs on the FPGA configuration memory is discussed in [1, 2, 3, 4].

TMR-based designs for error mitigation are discussed in [5, 6, 7, 8, 9].

Because of the high area overhead of TMR, many works focus on dual modular redundancy (DMR), which uses only two units plus additional logic [10, 11, 12, 13, 14].

First results

To determine the impact of BFs on circuit function, a simulation was performed. Benchmark circuits (MCNC [15], ISCAS [16], IWLS93 [17], adders [18]) with up to 21 inputs were used. For every BF, the circuit was tested exhaustively (all input combinations applied) and the number of affected outputs was recorded. The simulation results are shown in the table below. The columns are: Name, the name of the benchmark; cntx, the number of BFs affecting x outputs; cnt>, the number of BFs affecting 5 or more outputs; in and out, the numbers of inputs and outputs respectively; LUTs, the number of LUTs in the circuit; and lvl, the number of LUT levels in the design.

Table: Simulation results
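The per-BF procedure described above can be sketched in Python on a toy example. Everything here is hypothetical and simplified, and it is not the author's simulation tool:

- The circuit is a 2-bit ripple-carry adder hand-mapped onto four LUTs.
- Each LUT's configuration is modeled only as its truth-table bits, so routing and other configuration bits are ignored.
- `bf_histogram` is an illustrative name.

```python
from itertools import product


def lut_table(func, k):
    """Truth table (the LUT's configuration bits) for a k-input function.
    Table index = integer formed by the LUT inputs, first input as the MSB."""
    return [func(*bits) for bits in product((0, 1), repeat=k)]


# Hypothetical netlist in topological order: (LUT name, input signals, config bits).
NETLIST = [
    ("s0", ["a0", "b0"], lut_table(lambda a, b: a ^ b, 2)),
    ("c0", ["a0", "b0"], lut_table(lambda a, b: a & b, 2)),
    ("s1", ["a1", "b1", "c0"], lut_table(lambda a, b, c: a ^ b ^ c, 3)),
    ("c1", ["a1", "b1", "c0"],
     lut_table(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)),
]
INPUTS = ["a0", "a1", "b0", "b1"]
OUTPUTS = ["s0", "s1", "c1"]


def evaluate(netlist, assignment):
    """Evaluate all LUTs for one input vector and return the output values."""
    signals = dict(assignment)
    for name, ins, table in netlist:
        index = 0
        for sig in ins:
            index = (index << 1) | signals[sig]
        signals[name] = table[index]
    return tuple(signals[o] for o in OUTPUTS)


def flip_bit(netlist, lut_idx, bit_idx):
    """Copy of the netlist with one configuration bit inverted (a single BF)."""
    faulty = [(name, ins, list(table)) for name, ins, table in netlist]
    faulty[lut_idx][2][bit_idx] ^= 1
    return faulty


def bf_histogram(netlist):
    """For every single BF, test exhaustively and count the affected outputs.
    Returns {number of affected outputs: number of BFs}."""
    vectors = [dict(zip(INPUTS, bits))
               for bits in product((0, 1), repeat=len(INPUTS))]
    golden = [evaluate(netlist, v) for v in vectors]
    histogram = {}
    for lut_idx, (_, _, table) in enumerate(netlist):
        for bit_idx in range(len(table)):
            faulty = flip_bit(netlist, lut_idx, bit_idx)
            affected = set()
            for v, ref in zip(vectors, golden):
                out = evaluate(faulty, v)
                affected |= {i for i, (x, y) in enumerate(zip(out, ref)) if x != y}
            histogram[len(affected)] = histogram.get(len(affected), 0) + 1
    return histogram


print(bf_histogram(NETLIST))  # → {1: 20, 2: 4}
```

In this toy adder, only the four BFs in the carry LUT `c0` corrupt two outputs. The reason is that the carry feeds both upper-bit LUTs, which is the carry-propagation effect mentioned in the results.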

The simulation has shown that in some circuits a BF affects only a few outputs, while in others it affects many. An adder is a typical example of the latter: an error in the carry propagates to the upper bits and causes errors on multiple outputs. The complexity of EDC/ECC encoders and decoders usually grows with the number of errors to be detected or corrected. Such large numbers of simultaneous output errors are therefore difficult to correct using ECC.
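One way to see why the cost grows is the Hamming (sphere-packing) bound. Correcting t errors in a codeword of n = k + r bits (k data bits, r check bits) requires 2^r >= sum over i = 0..t of C(n, i). The helper `min_check_bits` below is a hypothetical Python sketch that computes this lower bound. Real codes may need more check bits, and the bound says nothing directly about decoder logic size.

```python
from math import comb


def min_check_bits(k: int, t: int) -> int:
    """Smallest r with 2**r >= sum_{i=0..t} C(k + r, i): the Hamming-bound
    minimum number of check bits for k data bits and t correctable errors."""
    r = 0
    while 2 ** r < sum(comb(k + r, i) for i in range(t + 1)):
        r += 1
    return r


# Minimum redundancy for 4 data bits as the number of correctable errors grows.
print([min_check_bits(4, t) for t in (1, 2, 3)])  # → [3, 6, 9]
```

For t = 1 and k = 4 the bound is met exactly by the Hamming(7,4) code.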

Next steps

Attention will be paid to the internal circuit structure, especially in circuits where a BF leads to errors on multiple outputs. The goal is to determine which components cause these errors and how to design the circuit to prevent them. After the number of errors has been reduced, a way to eliminate the remaining ones with minimal area overhead will be sought. One possible approach to be tested is NMR (N-modular redundancy) combined with a predictor and ECC.
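The planned predictor and ECC are not specified here, so only the NMR voting part is sketched. The sketch is a bitwise majority over N module outputs, assuming N is odd so a strict majority always exists. `nmr_vote` is a hypothetical Python helper, and TMR is the special case N = 3.

```python
def nmr_vote(outputs: list, width: int) -> int:
    """Bitwise majority over N module outputs (N odd), each `width` bits wide."""
    n = len(outputs)
    if n % 2 == 0:
        raise ValueError("N must be odd for a strict majority")
    result = 0
    for bit in range(width):
        ones = sum((o >> bit) & 1 for o in outputs)
        if ones > n // 2:  # more than half of the modules output 1
            result |= 1 << bit
    return result


# N = 5: two modules are wrong (in different bits), the majority still wins.
print(bin(nmr_vote([0b1011, 0b1011, 0b1011, 0b1111, 0b0011], 4)))  # → 0b1011
```

With N modules, errors in up to (N - 1) / 2 modules per bit position are masked, at the cost of roughly N times the area of one unit.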

References

[1] QuickLogic Corporation: “Single Event Upsets in FPGAs”, www.quicklogic.com, 2003.

[2] Bellato, M., Bernardi, P., Bortolato, D., Candelori, A., Ceschia, M., Paccagnella, A., Rebaudengo, M., Sonza Reorda, M., Violante, M., Zambolin, P.: “Evaluating the effects of SEUs affecting the configuration memory of an SRAM-based FPGA”, Design Automation Event for Electronic System in Europe 2004, pp. 584–589.

[3] Graham, P., Caffrey, M., Zimmerman, J., Sundararajan, P., Johnson, E., Patterson, C.: “Consequences and Categories of SRAM FPGA Configuration SEUs”, Military and Aerospace Programmable Logic Devices International Conference, Washington DC, MAPLD 2003 Paper C6.

[4] Bernardi, P., Sonza Reorda, M., Sterpone, L., Violante, M.: “On the evaluation of SEU sensitiveness in SRAM-based FPGAs”, IOLTS 2004: IEEE International On-Line Testing Symposium, pp. 115–120, 2004.

[5] Allen, G., Edmonds, L., Swift, G., Carmichael, C., Tseng, C. W., Heldt, K., Anderson, S., Coe, M.: “Single Event Test Methodologies and System Error Rate Analysis for Triple Modular Redundant Field Programmable Gate Arrays”, IEEE Transactions on Nuclear Science 58 (3), pp. 1040–1046, 2011.

[6] Fras, M., Kroha, H., Loeben, J., Reimann, O., Richter, R., Weber, B.: “Use of Triple Modular Redundancy (TMR) Technology in FPGAs for the Reduction of Faults due to Radiation in the Readout of the ATLAS Monitored Drift Tube (MDT) Chambers”, Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), pp. 834–837, 2010.

[7] Sterpone, L., Violante, M.: “A design flow for protecting FPGA-based systems against single event upsets”, DFT 2005, 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 436–444, 2005.

[8] Nakahara, K., Kouyama, S., Izumi, T., Ochi, H., Nakamura, Y.: “Autonomous-repair cell for fault tolerant dynamic-reconfigurable devices”, In Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, FPGA ’06, Monterey, California, USA, February 22–24, p. 224, 2006.

[9] Berg, M.: “Fault Tolerance Implementation within SRAM Based FPGA Design Based upon the Increased Level of Single Event Upset Susceptibility”, In Proceedings of the 12th IEEE International On-Line Testing Symposium, IOLTS’06, pp. 89–91, July 2006.

[10] Golander, A., Weiss, S., Ronen, R.: “Synchronizing Redundant Cores in a Dynamic DMR Multicore Architecture”, IEEE Transactions on Circuits and Systems II: Express Briefs 56 (6), pp. 474–478, 2009.

[11] Lima, F., Carro, L., Reis, R.: “Reducing Pin and Area Overhead in Fault-tolerant FPGA-based Designs”, International Symposium on Field Programmable Gate Arrays (FPGA), ACM, pp. 108–117, 2003.

[12] Lima, F., Carro, L., Reis, R.: “Designing Fault Tolerant Systems into SRAM-based FPGAs”, In Proceedings of the 40th Design Automation Conference, DAC’03, pp. 650, June 2003.

[13] Mitra, S., Huang, W.-J., Saxena, N. R., Yu, S.-Y., McCluskey, E. J.: “Reconfigurable Architecture for Autonomous Self-Repair”, IEEE Design and Test of Computers, pp. 228–240, May 2004.

[14] Kubalík, P.: “Design of Self Checking Circuits Based on FPGAs”, Ph.D. Thesis, CTU in Prague, Faculty of Electrical Engineering, Department of Computer Science and Engineering, 2007, 71 p.

[15] Yang, S.: “Logic Synthesis and Optimization Benchmarks User Guide Version 3.0”, https://ddd.fit.cvut.cz/prj/Benchmarks/LGSynth91.pdf

[16] “IWLS 2005 Benchmarks”, http://iwls.org/iwls2005/benchmarks.html

[17] McElvain, K.: “IWLS’93 Benchmark Set: Version 4.0”, Mentor Graphics, 1993, https://ddd.fit.cvut.cz/prj/Benchmarks/IWLS93.pdf

[18] Fišer, P.: “Generically constructed 1–17 bit ripple-carry adders”, https://ddd.fit.cvut.cz/prj/Benchmarks/
