Fault Tolerance In High-Speed VLSI Circuits

Parth Pedgaonkar
Fault Tolerance in VLSI Circuits
Feb 28, 2021

The sustained push toward ever-smaller technology nodes has reached a point where device reliability is now a leading concern for next-generation designs. Silicon failure mechanisms, such as transistor wear-out and manufacturing defects, are a growing challenge that threatens the yield and product lifetime of future systems.

Some of the sub-micron devices on a VLSI chip are bound to have imperfections, which result in yield-reducing manufacturing defects. There is therefore a growing need for defect tolerance, and increasing attention is being paid to the development and use of defect-tolerance techniques for yield enhancement, to complement existing efforts at the manufacturing stage.

ERRORS IN VLSI CIRCUITS

Failures in VLSI systems can result from various types of faults, which can be classified as either soft (transient) or hard (permanent). Transient faults are induced by temporary environmental conditions, such as cosmic rays and electromagnetic interference (EMI), and can, for example, alter the information held in memory elements. Permanent faults are the result of irreversible device and circuit changes, such as the following: electromigration, which causes thinning and eventual open circuits in metal tracks; the hot-carrier effect, which causes a shift in device threshold voltage and transconductance; and time-dependent dielectric breakdown, which causes gate-oxide-to-substrate short circuits.
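
The difference between the two fault classes can be sketched in a few lines of Python. This is only an illustrative model: the function names, the 4-bit word, and the choice of a stuck-at-0 bit as the permanent fault are assumptions made for the example, not something the classification above prescribes.

```python
def inject_transient(word: int, bit: int) -> int:
    """Model a soft error: flip one bit of a stored word once."""
    return word ^ (1 << bit)


def read_stuck_at_zero(word: int, bit: int) -> int:
    """Model a permanent defect (assumed stuck-at-0): the bit always reads as 0."""
    return word & ~(1 << bit)


stored = 0b1011

# A transient upset corrupts the stored value, but rewriting the cell clears it.
upset = inject_transient(stored, bit=2)
recovered = stored  # value after the cell is rewritten with the original data

# A permanent fault corrupts every read, even after the cell is rewritten.
first_read = read_stuck_at_zero(stored, bit=0)
second_read = read_stuck_at_zero(stored, bit=0)
```

The transient fault disappears once the memory element is rewritten, whereas the permanent fault returns the same wrong value on every access.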

FAULT TOLERANCE OF VLSI DESIGNS

Increasing the yield of ICs proves especially important for new designs and manufacturing processes, which have a high density of process-induced defects and consequently a low yield. Yield improvements of early prototypes of an IC can reduce the product’s introduction time and determine its commercial success. Defect tolerance has proved successful in such cases, and spectacular 30-fold increases in yield have been reported.
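
Why defect tolerance raises yield can be illustrated with a rough sketch. It assumes a Poisson defect model for the yield of a single module and a chip that works whenever at least `required` of its `required + spares` identical modules are defect-free. The model, the function names, and the numbers are all assumptions for illustration and are not taken from the reported results.

```python
import math


def module_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson model (assumed): probability that a module has zero defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


def chip_yield(required: int, spares: int, y: float) -> float:
    """Probability that at least `required` of `required + spares` modules are good."""
    total = required + spares
    return sum(
        math.comb(total, k) * y**k * (1 - y) ** (total - k)
        for k in range(required, total + 1)
    )


y = module_yield(area_cm2=0.5, defects_per_cm2=2.0)  # hypothetical immature process
without_spares = chip_yield(required=8, spares=0, y=y)
with_spares = chip_yield(required=8, spares=2, y=y)
improvement = with_spares / without_spares
```

Under these assumptions, a couple of spare modules raise the chip yield by a large factor when the per-module yield is low, which is the situation described for new processes.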

Fault Tolerance Techniques

To achieve fault tolerance, a primary requirement is that transient faults be detected. Researchers have proposed several error-detection techniques against transient faults: watchdogs, assertions, signatures, duplication, and memory protection codes.

Signatures: In this technique, pre-computed check symbols are assigned to a block of logic. Comparing them with the values produced during operation indicates whether or not a fault has occurred in that logic. Signatures can be implemented either in hardware, as a parallel test unit, or in software.
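
A software version of the signature idea might look like the sketch below. The check symbol here is a CRC-32 over the block's responses to a fixed stimulus set; CRC-32 is chosen only for convenience, and `good_block`, `faulty_block`, and `STIMULI` are hypothetical stand-ins rather than anything a particular scheme mandates.

```python
import zlib


def signature(block, stimuli) -> int:
    """Compress the block's responses to the stimuli into one check symbol."""
    responses = bytes(block(x) & 0xFF for x in stimuli)
    return zlib.crc32(responses)


def good_block(x: int) -> int:
    """Hypothetical fault-free logic block with an 8-bit output."""
    return (x * 3 + 1) & 0xFF


def faulty_block(x: int) -> int:
    """Same block with one output bit corrupted for a single input."""
    out = good_block(x)
    return out ^ 0x04 if x == 5 else out


STIMULI = range(16)
GOLDEN = signature(good_block, STIMULI)  # pre-computed from a known-good block


def fault_detected(block) -> bool:
    """Compare the signature observed during operation with the pre-computed one."""
    return signature(block, STIMULI) != GOLDEN
```

Only the compact signature needs to be stored and compared, not the full list of expected responses.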

Concurrent error detection (CED) technique: This technique automatically extracts the control conditions under which paths in the circuit are sensitized and converts these conditions into assertions. The CED technique protects the combinational portion of the circuit against transient errors. It introduces an Assertion Checker, which takes the inputs and outputs of the combinational circuit as its own inputs and produces a signal indicating whether they conform to a series of assertions.

Block Diagram of CED
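
The role of the Assertion Checker can be sketched in software as follows. The circuit under protection is assumed to be a simple "maximum of two values" block, and the three assertions are hand-written for illustration; in the technique described above they would be extracted automatically from the circuit.

```python
def max_circuit(a: int, b: int) -> int:
    """Hypothetical combinational circuit: outputs the larger of two inputs."""
    return a if a >= b else b


# Properties that every correct (inputs, output) pair must satisfy.
ASSERTIONS = [
    lambda a, b, out: out >= a,
    lambda a, b, out: out >= b,
    lambda a, b, out: out in (a, b),
]


def assertion_checker(a: int, b: int, out: int) -> bool:
    """Return True (raise the error signal) if any assertion is violated."""
    return not all(check(a, b, out) for check in ASSERTIONS)


a, b = 3, 7
out = max_circuit(a, b)
error_free = assertion_checker(a, b, out)

# Simulate a transient error that flips one bit of the circuit output.
corrupted = out ^ 0b100
error_seen = assertion_checker(a, b, corrupted)
```

The checker sees only the inputs and outputs of the circuit, which matches the block-level arrangement described in the text.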

Time Redundancy: This technique is a self-checking approach used to detect transient faults. It uses duplicate hardware in addition to the original hardware to perform the same operation at different points in time. A fault is detected by comparing the outputs of the original and duplicate hardware. If the two outputs are the same, the condition is fault-free; if they differ, a fault is present. A drawback is that if the result of the first calculation is faulty and is used by other computations, the subsequent modules also produce faulty results.

Time Redundancy Technique.
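
A minimal sketch of this comparison is given below. The `Adder` class, its `upset_at` parameter, and the explicit time step `t` are hypothetical devices for injecting a transient fault at a chosen moment; they are not part of the technique itself.

```python
class Adder:
    """Hypothetical 8-bit adder that suffers a bit flip at the given time steps."""

    def __init__(self, upset_at=()):
        self.upset_at = set(upset_at)

    def compute(self, a: int, b: int, t: int) -> int:
        result = (a + b) & 0xFF
        if t in self.upset_at:
            result ^= 0x01  # transient fault: flip the least significant bit
        return result


def time_redundant_add(original: Adder, duplicate: Adder, a: int, b: int, t: int):
    """Run the same operation at two different times and compare the outputs."""
    first = original.compute(a, b, t)
    second = duplicate.compute(a, b, t + 1)
    fault = first != second
    return first, fault


clean_result, clean_fault = time_redundant_add(Adder(), Adder(), 2, 3, t=0)
bad_result, bad_fault = time_redundant_add(Adder(upset_at={0}), Adder(), 2, 3, t=0)
```

In the faulty case the mismatch is flagged, but `bad_result` is already wrong. This illustrates why a first result that is consumed before the comparison completes can corrupt downstream computations.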

Hardware Redundancy: Triple modular redundancy (TMR) and double modular redundancy (DMR) are commonly used hardware redundancy approaches.

The double modular redundancy approach is used to detect a single fault at a time. DMR compares the outputs of the same operation performed in parallel by the original hardware and the duplicate hardware.
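
The DMR comparison can be sketched as below, with two callables standing in for the original and duplicate hardware. The `square` and `square_faulty` functions are hypothetical examples.

```python
def dmr(original, duplicate, x: int):
    """Run both modules on the same input and flag any disagreement."""
    out_a = original(x)
    out_b = duplicate(x)
    return out_a, out_b, out_a != out_b


def square(x: int) -> int:
    """Hypothetical fault-free module."""
    return (x * x) & 0xFF


def square_faulty(x: int) -> int:
    """Same module with one output bit flipped."""
    return square(x) ^ 0x10


_, _, mismatch_ok = dmr(square, square, 7)
out_a, out_b, mismatch_bad = dmr(square, square_faulty, 7)
```

A mismatch shows that a fault exists, but with only two outputs the comparator cannot tell which module is the faulty one.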

The TMR method is used to detect a single fault. It requires three identical modules operating in parallel. A fault is detected if the output of any one module differs from the others. However, this method cannot pinpoint the exact location of the fault.

The voting logic is a majority voter, which takes the value produced by the majority of its inputs as the output value. An inconsistency in the outputs is caught and corrected by the voting logic.
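
A bitwise majority voter over three module outputs can be sketched as follows. The `square` and `square_faulty` modules are hypothetical, and the voter is assumed to be fault-free.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr(modules, x: int):
    """Run three modules in parallel, vote on the result, and flag any disagreement."""
    a, b, c = (module(x) for module in modules)
    voted = majority_vote(a, b, c)
    disagreement = not (a == b == c)
    return voted, disagreement


def square(x: int) -> int:
    """Hypothetical fault-free module."""
    return (x * x) & 0xFF


def square_faulty(x: int) -> int:
    """Same module with one output bit flipped."""
    return square(x) ^ 0x10


voted, disagreement = tmr((square, square_faulty, square), 7)
```

With a single faulty module, `voted` still equals the correct result while `disagreement` reports that a fault occurred. The scheme assumes that at most one module fails at a time.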

These are a few of the fault-tolerance techniques used in high-speed VLSI circuits. The presence of faults can destroy the functionality of a system, so these techniques prove very beneficial when designing a VLSI circuit.

References:

  1. Somashekhar, Dr. Vikas Maheshwari, and Dr. R. P. Singh, "A Study of Fault Tolerance in High-Speed VLSI Circuits," International Journal of Scientific & Technology Research, Volume 8, Issue 08, August 2019.
  2. https://bit.ly/2NOONW3
  3. https://bit.ly/2OaX1Y9
