An Integrated Fault Tolerance Technique for Combinational Circuits

Parth Pedgaonkar
Fault Tolerance in VLSI Circuits
4 min read · Mar 31, 2021

With fabrication technology reaching nanometer scales, systems are becoming increasingly susceptible to soft errors. Developing effective techniques for designing soft-error-tolerant systems is therefore highly important.

Soft errors can arise due to high-energy particles, coupling, power supply noise, leakage and temporal circuit variations. A soft error leads to transient error(s) which can last for one or several clock cycles. A single event transient occurs when a charged particle hits the combinational logic resulting in a transient current pulse. This can result in an erroneous value at a gate output if the transient has enough width and magnitude. A single event transient becomes a single event upset if the erroneous value is latched at a memory element.

Reliability in systems can be achieved by adding redundancy. Redundancy can be added at the module level, the gate level, the transistor level or the software level. One of the well-known redundancy-based fault tolerance techniques is Triple Modular Redundancy (TMR), which we discussed in the last article.

Soft error protection of combinational logic can also be achieved by adding redundancy at the transistor level. Nicolaidis proposed a scheme in which all gates in a circuit are duplicated except the last-stage gates. The last-stage gates are implemented by duplicating each transistor and connecting the duplicate transistors in series, with each transistor of a pair fed by a different copy of the duplicated gates. As a result, the last-stage gates act as state-preserving gates whenever the two duplicate inputs feeding a transistor pair differ. Selective hardening techniques, as the name suggests, take a different route and protect only the most sensitive gates of the circuit.
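To make the state-preserving behaviour concrete, here is a minimal behavioural sketch in Python of a last-stage inverter with series-duplicated transistors. It is only an illustration under simplifying assumptions: the function name `state_preserving_inverter` is hypothetical, the gate is modelled at the logic level, and a floating output is assumed to hold its previous value.

```python
def state_preserving_inverter(copy_a, copy_b, prev_out):
    """Logic-level model of an inverter whose transistors are duplicated in series.

    copy_a and copy_b are the outputs of the two duplicated copies of the
    preceding logic. prev_out is the value currently held at the output node.
    """
    if copy_a == 0 and copy_b == 0:
        return 1  # both series PMOS transistors conduct: output pulled high
    if copy_a == 1 and copy_b == 1:
        return 0  # both series NMOS transistors conduct: output pulled low
    # The copies disagree: neither network conducts, so the output node
    # floats and (by assumption) keeps its previous value.
    return prev_out


# Fault-free operation: both copies are 1, so the output is 0.
out = state_preserving_inverter(1, 1, prev_out=1)
# A transient flips one copy to 0: the output is preserved, not corrupted.
out_during_transient = state_preserving_inverter(0, 1, prev_out=out)
```

In this model a transient in only one copy never reaches the output, which is the property the duplicated last stage is meant to provide.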

In the proposed implication-based fault tolerance technique, the circuit is first simulated with 1 million random input patterns using the HOPE fault simulator to obtain, for every gate in the circuit, the probability of its output being 1, the probability of its output being 0, and the stuck-at-0 and stuck-at-1 fault detection probabilities. Next, the set of source gates S and the set of target gates T are identified. To reduce computation time and focus on potentially useful implications, only gates whose output probability, either the probability of zero (P0) or the probability of one (P1), is greater than or equal to a threshold Th1 are selected as source gates.
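The source-gate filter can be sketched in a few lines of Python. This is not HOPE and it does not compute fault detection probabilities; it only estimates P1 by random simulation on a tiny made-up netlist and applies the Th1 threshold. The netlist, the helper names and the threshold value 0.85 are all assumptions made for illustration.

```python
import random

# Gate functions over a list of fan-in values (0 or 1).
GATE_FUNCS = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NAND": lambda v: int(not all(v)),
    "NOT": lambda v: 1 - v[0],
}

# Hypothetical netlist in topological order: (gate name, type, fan-ins).
NETLIST = [
    ("g1", "AND", ["a", "b", "c"]),
    ("g2", "OR", ["a", "b"]),
    ("g3", "NAND", ["g1", "d"]),
]
PRIMARY_INPUTS = ["a", "b", "c", "d"]


def simulate(netlist, assignment):
    """Evaluate every gate for one primary-input assignment."""
    values = dict(assignment)
    for name, op, fanins in netlist:
        values[name] = GATE_FUNCS[op]([values[f] for f in fanins])
    return values


def signal_probabilities(netlist, primary_inputs, n_patterns=20000, seed=0):
    """Estimate P1 of each gate output from random input patterns."""
    rng = random.Random(seed)
    ones = {name: 0 for name, _, _ in netlist}
    for _ in range(n_patterns):
        pattern = {pi: rng.randint(0, 1) for pi in primary_inputs}
        values = simulate(netlist, pattern)
        for name in ones:
            ones[name] += values[name]
    return {name: count / n_patterns for name, count in ones.items()}


def select_source_gates(p1, th1):
    """Keep gates whose P0 or P1 is at least th1."""
    return [g for g, p in p1.items() if max(p, 1.0 - p) >= th1]


probs = signal_probabilities(NETLIST, PRIMARY_INPUTS)
sources = select_source_gates(probs, th1=0.85)
```

On this toy netlist the strongly biased gates `g1` (mostly 0) and `g3` (mostly 1) pass the filter, while `g2`, with P1 around 0.75, does not.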

There are two main types of implications in a circuit, direct and indirect, and either identification method can be employed in this technique. Direct implications are identified by assigning a value at a gate and iteratively performing backward justification and forward propagation until every unjustified gate is either justified or has more than one possible justification. FAN is a well-known algorithm that can be used to discover direct implications in a circuit. Indirect implications are much harder to identify than direct ones; they are found through learning, by injecting temporary values at certain gates in the circuit and then examining their logical consequences.
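To show what an implication is, the following Python sketch finds every implication of the form "gate s = v implies gate t = w" in a tiny circuit by exhaustive simulation. This brute-force check is a stand-in chosen for clarity; it is neither FAN nor a learning procedure and only scales to very small circuits. The circuit and the function names are hypothetical.

```python
from itertools import product


def circuit(a, b, c):
    """Hypothetical three-gate circuit; returns the value of every gate."""
    g1 = a & b
    g2 = g1 | c
    g3 = 1 - g2  # inverter
    return {"g1": g1, "g2": g2, "g3": g3}


def find_implications(circuit_fn, n_inputs):
    """Return all (s, vs, t, vt) such that s == vs forces t == vt."""
    rows = [circuit_fn(*bits) for bits in product((0, 1), repeat=n_inputs)]
    gates = list(rows[0])
    found = set()
    for s in gates:
        for vs in (0, 1):
            matching = [r for r in rows if r[s] == vs]
            if not matching:
                continue  # s never takes this value, nothing to learn
            for t in gates:
                if t == s:
                    continue
                seen = {r[t] for r in matching}
                if len(seen) == 1:  # t is constant whenever s == vs
                    found.add((s, vs, t, seen.pop()))
    return found


implications = find_implications(circuit, n_inputs=3)
```

For example, `("g1", 1, "g2", 1)` is in the result because the OR gate `g2` must be 1 whenever `g1` is 1, and its contrapositive `("g2", 0, "g1", 0)` is found as well.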

Once an implication is selected, a functionally redundant wire (FRW) is added between the source gate and the target gate. A masking gate is a gate that is connected to the output of the target gate of an implication. In the proposed technique, an FRW can be added to either the masking gate or the target gate. When an FRW is added to the masking gate, the masking gate is considered the new target gate.
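The effect of an FRW can be illustrated with a small Python sketch. The circuit below is made up for this purpose: the source `s = a OR b` and the target `t = m AND e`, with `m = a AND c`, satisfy the implication "s = 0 implies t = 0", so `s` can be wired into the target AND gate without changing its function. The sketch checks that equivalence and counts how many input patterns let a bit flip at `m` reach the target output, with and without the extra wire.

```python
from itertools import product


def target_output(a, b, c, e, with_frw=False, flip_m=False):
    """Evaluate the target gate of a hypothetical circuit.

    with_frw adds the source s as an extra input of the target AND gate.
    flip_m models a transient that inverts the intermediate gate m.
    """
    s = a | b      # source gate
    m = a & c      # intermediate gate feeding the target
    if flip_m:
        m ^= 1     # injected transient error
    t = m & e      # target gate
    if with_frw:
        t &= s     # functionally redundant wire from s
    return t


patterns = list(product((0, 1), repeat=4))

# The FRW must not change the fault-free function.
equivalent = all(
    target_output(*p) == target_output(*p, with_frw=True) for p in patterns
)


def propagating_patterns(with_frw):
    """Count patterns for which a flip at m changes the target output."""
    return sum(
        target_output(*p, with_frw=with_frw)
        != target_output(*p, with_frw=with_frw, flip_m=True)
        for p in patterns
    )


without_frw = propagating_patterns(with_frw=False)
with_frw = propagating_patterns(with_frw=True)
```

In this example the flip propagates for 8 of the 16 patterns without the FRW and for 6 with it: the wire masks exactly the cases in which the source is 0.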

The proposed technique achieves a higher reduction in soft error rate (SER) than the earlier approaches for all the compared circuits. This is attributed to several factors, including the criteria for selecting the target gates to be protected. In addition, the criteria used for assessing the gain from adding an implication FRW affect which implications are selected. This approach attempts to improve accuracy by identifying the impacted gates along an implication path and taking the fanouts of gates into consideration. Furthermore, some earlier approaches limit the addition of FRWs to two wires per target gate and use only implications that do not require extra inverters. While this can reduce area and delay overhead, it can also limit how much the circuit's soft error tolerance is improved.

In this work, an integrated fault tolerance technique is proposed based on the combined application of an implication-based fault tolerance technique and selective transistor redundancy technique.

An enhanced implication-based fault tolerance technique is proposed. To reduce implication learning time, implications are identified only between a set of candidate source gates and a set of candidate target gates. Then, for each implication, the gain in terms of reduced gate fault detection probabilities is estimated. The implication with the highest gain is selected and its corresponding functionally redundant wire is added. The process is repeated until the gain falls below a given threshold. In comparison to existing implication-based fault tolerance techniques, the proposed technique achieves a higher soft error rate reduction for the same number of functionally redundant wires added based on implications.
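The selection loop described above is a greedy procedure, and a minimal Python sketch of it could look as follows. The gain model here is an assumption made purely for illustration: each candidate implication is given a made-up fraction by which it lowers the fault detection probability of certain gates, and the gain is the total reduction. The names, numbers and threshold are hypothetical and are not taken from the proposed technique.

```python
def greedy_select(detect_prob, candidates, min_gain):
    """Repeatedly pick the implication with the highest estimated gain.

    detect_prob: gate -> current fault detection probability.
    candidates: implication name -> {gate: fraction of detections masked}.
    Stops when the best remaining gain is below min_gain.
    """
    detect_prob = dict(detect_prob)   # work on copies
    remaining = dict(candidates)
    selected = []
    while remaining:
        # Re-estimate every gain, since earlier picks changed detect_prob.
        gains = {
            name: sum(detect_prob[g] * f for g, f in masks.items())
            for name, masks in remaining.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        for gate, fraction in remaining.pop(best).items():
            detect_prob[gate] *= 1.0 - fraction  # apply the chosen FRW
    return selected, detect_prob


# Hypothetical detection probabilities and candidate implications.
initial = {"g1": 0.4, "g2": 0.3, "g3": 0.1}
candidate_implications = {
    "i1": {"g1": 0.5, "g2": 0.5},
    "i2": {"g1": 0.5},
    "i3": {"g3": 0.2},
}
chosen, final = greedy_select(initial, candidate_implications, min_gain=0.05)
```

With these numbers the loop picks `i1`, then `i2`, and stops because the gain of `i3` is below the threshold. Recomputing the gains after every pick matters: the gain of `i2` drops from 0.2 to 0.1 once `i1` has already lowered the detection probability of `g1`.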

The integrated application of the proposed implication-based fault tolerance technique and the selective transistor redundancy technique achieves significantly lower area overhead than applying the selective transistor redundancy technique alone, at the same achieved reliability.
