Application of Micro Rollback in VLSI Systems

CELSY PHILLIPS
Fault Tolerance in VLSI Circuits
3 min read · Mar 28, 2021

As we know, micro rollback brings a system back a few cycles, to a state it reached in the past. To make that possible, the state of the subsystem must be saved at each cycle boundary. This process is called checkpointing, and it is equivalent to taking a “snapshot” of the system state at every cycle. Micro rollback then restores the state of a subsystem by overwriting the current state with a snapshot taken in the past.
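To make the checkpoint-and-restore idea concrete, here is a minimal software sketch in Python. It is only an illustration, not a description of real hardware: the class name `RollbackRegister`, the `depth` parameter, and the use of a bounded queue to hold snapshots are all assumptions made for this example.

```python
from collections import deque


class RollbackRegister:
    """Hypothetical model of a register that checkpoints itself every cycle."""

    def __init__(self, initial=0, depth=4):
        self.value = initial
        # Snapshots from the last `depth` cycle boundaries (newest on the right).
        # Older snapshots fall off the left end automatically.
        self.history = deque(maxlen=depth)

    def clock(self, new_value):
        """Advance one cycle: snapshot the current value, then load the new one."""
        self.history.append(self.value)
        self.value = new_value

    def rollback(self, cycles):
        """Overwrite the current value with the snapshot from `cycles` cycles ago."""
        if not 1 <= cycles <= len(self.history):
            raise ValueError("cannot roll back that many cycles")
        for _ in range(cycles):
            self.value = self.history.pop()


reg = RollbackRegister(initial=0, depth=4)
for v in (10, 20, 30):
    reg.clock(v)      # value becomes 10, then 20, then 30
reg.rollback(2)       # undo the last two cycles
print(reg.value)      # 10
```

The `depth` of the snapshot queue bounds how far back the register can go, so in this model it would have to be at least as large as the worst-case delay before an error is reported.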

In this segment, we will see how micro rollback is applied in VLSI systems.

Application of Micro Rollback in VLSI systems

A key requirement for achieving a high degree of fault tolerance is the ability to detect errors as soon as they occur. In highly reliable systems, this is generally implemented by adding checkers and isolation circuits in the communication paths from each module to the rest of the system. This additional circuitry lowers system performance by requiring either longer clock cycles or additional pipeline stages. This is where the technique of micro rollback comes into the picture.

Micro rollback allows each module in the system to accept inputs and process them without waiting for detection and correction circuits to operate on the data. Error detection is performed in parallel with the transmission and consumption of data by modules throughout the system. This removes the checkers from the critical path of the system and permits the use of checkers that are area efficient but comparatively slow, without compromising system performance. With micro rollback it is also possible to implement error detection techniques in which the same hardware that computes a result is then used to verify that result, potentially generating an error indication a few cycles later.

A common situation where parallel error checks can be implemented is at the interface between memory and the processors or coprocessors in the system, as you can see in Figure 2. Minimizing the access time to memory is often critical to achieving high performance. With micro rollback, the checkers can be removed from this critical path.

Another use of parallel checks is in duplex systems where error detection is carried out by running two identical subsystems in parallel and comparing their outputs. This can be seen in the following figure.

With this technique, the two subsystems may be on different chips, so there can be a significant delay in getting both outputs to the comparator and obtaining the result of the comparison. With micro rollback, the receiver of data from the duplex subsystem can begin to process it before the output of the comparator is available to determine whether the data is valid.
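A rough Python sketch of the duplex case follows. The function name `duplex_receive`, the `compare_delay` parameter, and the running sum used as receiver state are assumptions made for this example. Note that a duplex comparison can only detect a disagreement; it cannot tell which replica was wrong, so the sketch stops at the rollback and leaves recovery (for example, a retry) to the caller.

```python
def duplex_receive(outputs_a, outputs_b, compare_delay=2):
    """Consume replica A's outputs immediately; compare against replica B late.

    The comparator verdict for the output of cycle t is only available at
    cycle t + compare_delay. Returns (mismatch_cycle, restored_state) when a
    mismatch is found, or (None, final_state) when the replicas always agree.
    """
    assert len(outputs_a) == len(outputs_b)
    history = []    # history[t] = receiver state before consuming cycle t's output
    state = 0
    n = len(outputs_a)

    for cycle in range(n + compare_delay):
        checked = cycle - compare_delay      # cycle whose verdict arrives now
        if checked >= 0 and outputs_a[checked] != outputs_b[checked]:
            # Roll back to the state held before the suspect output was used.
            return checked, history[checked]
        if cycle < n:
            history.append(state)            # checkpoint at the cycle boundary
            state += outputs_a[cycle]        # use the data before it is verified
    return None, state


print(duplex_receive([1, 2, 9, 4], [1, 2, 3, 4]))  # (2, 3)
print(duplex_receive([1, 2, 3, 4], [1, 2, 3, 4]))  # (None, 10)
```

In the first call the receiver has already consumed two more outputs by the time the mismatch at cycle 2 is reported, and the rollback discards the effect of both.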

In this way, micro rollback facilitates the implementation of high-performance VLSI systems that are also highly fault tolerant, by allowing a variety of concurrent error detection and correction techniques to be used with minimal performance penalty.

References:

  1. Somashekhar, Vikas Maheshwari, and R. P. Singh. “Analysis of Micro Inversion to Improve Fault Tolerance in High Speed VLSI Circuits.” International Research Journal of Engineering and Technology (IRJET) 6.03 (2019): 5041–5044.
  2. Tremblay, Marc, and Yuval Tamir. “Fault-Tolerance for High-Performance Multi-Module VLSI Systems Using Micro Rollback.” UCLA, Computer Science Department, 1988.
