Fault Tree Analysis (FTA) is a failure analysis methodology that identifies the root causes of an undesired event by modeling failure events with Boolean logic. It lets the analyst model the interrelationships among the causal factors that lead to the undesired event. With this model, the system designer can identify high-risk fault paths and mitigate the risk by designing safety measures.
History and Evolution
FTA was originally developed at Bell Labs in 1962 by H. A. Watson under a US Air Force contract to evaluate the Minuteman ICBM launch control system. In 1970, the US Federal Aviation Administration (FAA) changed its airworthiness guidelines to incorporate FTA as one of the methods for determining the failure probability of a system, which started the widespread adoption of the method. In 1976, the US Army incorporated FTA into its Engineering Design Handbook on Design for Reliability, which further raised awareness of the method in the defense community. FTA has its roots in aviation and defense, and over the last half-century it has played a critical role in understanding and deploying complex systems and in evaluating and reducing safety risks.
Fault Tree Analysis is a top-down, deductive methodology: the analyst starts from an undesired top-level event and works down to specific, detailed causal factors. FTA also lets the analyst develop fault paths across system and subsystem hierarchies, for example covering faults from the conceptual design of the system down to the functional level, the detailed technical level, and even the process level. It is important to note that the technique is not used to identify hazards. It is used to identify the causal factors that, in various combinations, contribute to an already identified hazard.
In its graphical form, a fault tree is composed of events and Boolean gates. Events (representing faults) and Boolean gates are interlinked in a tree structure (hence the name Fault Tree) with the aim of analyzing the undesired event at the top. The following are definitions of the basic FTA components and a graphical representation of each event type:
- Top Level Event: The undesirable event at the top of the fault tree, which is the subject of the analysis
- Intermediate Event: An event that is caused by one or more other events and therefore has both parent and child events connected to it
- Basic Event: The leaf events (root causes) in the fault tree are called Basic Events. They are further classified into circle, diamond, and house events. A circle basic event represents a primary failure mode of a component that cannot be developed further. A diamond basic event represents an undeveloped event, one that could be developed further but is left undeveloped in the current analysis. A house basic event represents an event that is expected to occur during normal operation of the system being designed.
- Gates: Boolean logic symbols that connect events to form the tree. AND and OR gates are the most commonly used gates in FTA. Other gates that can be used include Priority AND, Priority OR, XOR, and M-out-of-N
- Transfers: Allow a branch or subtree of the fault tree to be referenced from another part of the tree
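To make the components above concrete, here is a minimal Python sketch of a fault tree as a data structure. The class names (BasicEvent, Gate), the "VOTE" label for an M-out-of-N gate, and the event names in the small example tree are all assumptions made for illustration. They do not come from any FTA tool or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class BasicEvent:
    """Leaf of the tree: a root-cause fault identified by name."""
    name: str


@dataclass
class Gate:
    """Intermediate or top-level event derived from its child events."""
    name: str
    kind: str                                   # "AND", "OR", or "VOTE"
    children: List["Node"] = field(default_factory=list)
    m: int = 0                                  # threshold, used only by "VOTE"


Node = Union[BasicEvent, Gate]


def occurs(node: Node, state: Dict[str, bool]) -> bool:
    """Return True if the event at `node` occurs for the given basic-event states."""
    if isinstance(node, BasicEvent):
        return state[node.name]
    results = [occurs(child, state) for child in node.children]
    if node.kind == "AND":
        return all(results)                     # every child must occur
    if node.kind == "OR":
        return any(results)                     # at least one child must occur
    if node.kind == "VOTE":
        return sum(results) >= node.m           # at least m of the children
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical example: the top event occurs if power fails,
# or if both redundant sensors fail.
top = Gate("top_event", "OR", [
    BasicEvent("power_fault"),
    Gate("both_sensors_fail", "AND", [
        BasicEvent("sensor_a_fault"),
        BasicEvent("sensor_b_fault"),
    ]),
])

one_sensor_down = {"power_fault": False, "sensor_a_fault": True, "sensor_b_fault": False}
both_sensors_down = {"power_fault": False, "sensor_a_fault": True, "sensor_b_fault": True}

print(occurs(top, one_sensor_down))    # False: the AND gate blocks a single sensor fault
print(occurs(top, both_sensors_down))  # True: both inputs of the AND gate occur
```

The evaluation is recursive and top-down, mirroring how a fault tree is read: the state of each gate depends only on the states of its children. Priority gates and transfers are left out of this sketch because they need ordering information and shared subtrees, respectively.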
Fault Tree Example
The fault tree below uses a System-on-Chip (SoC) in a safety-critical application as the example system. The goal is to understand and evaluate the set of root causes that can propagate to cause an undesirable event. The undesirable event considered in this case is “Undetected failure of the SoC”. Once the root causes of the failure are identified, engineers can improve the reliability and safety of the SoC by designing safety mechanisms or by introducing appropriate safety measures into the design, the verification and validation process, or manufacturing.
We will build on this fault tree in the next part of the series, where we refine the events further.
Key FTA System Parameters
FTA can be used as a qualitative or a quantitative method of analysis. The quantitative form is commonly associated with Probabilistic Risk Assessment (PRA). As a qualitative technique, FTA is commonly used for cut set evaluation, common cause failure analysis, and evaluation of the safety measures and mechanisms designed into the system. As a quantitative technique, FTA supports the evaluation of parameters such as the probability of occurrence of the undesired event, the probability of occurrence of each cut set, and the importance (criticality) of each cut set, among others. We will dive deeper into system parameters and examples in part 2 of the Starter Kit.
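As a small preview of these parameters, the Python sketch below derives the minimal cut sets of a tiny fault tree and then computes the probability of the top event. Everything in it is an assumption made for illustration: the tuple-based tree encoding, the event names E1 to E3, the probabilities, and the premise that basic events fail independently. Only AND and OR gates are handled.

```python
from itertools import product
from typing import Dict, FrozenSet, Set

CutSets = Set[FrozenSet[str]]


def minimize(sets: CutSets) -> CutSets:
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node) -> CutSets:
    """Minimal cut sets of a tree given as a name (str) or a (gate, children) tuple."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child is enough to trigger an OR gate.
        combined = set().union(*child_sets)
    elif gate == "AND":
        # An AND gate needs one cut set from every child at the same time.
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unsupported gate: {gate}")
    return minimize(combined)


def cut_set_probability(cs: FrozenSet[str], p: Dict[str, float]) -> float:
    """Probability that all events in one cut set occur (independence assumed)."""
    prob = 1.0
    for name in cs:
        prob *= p[name]
    return prob


def top_probability(mcs: CutSets, p: Dict[str, float]) -> float:
    """Exact top-event probability by enumerating all basic-event states."""
    names = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        failed = {n for n, b in zip(names, bits) if b}
        if any(cs <= failed for cs in mcs):
            weight = 1.0
            for n, b in zip(names, bits):
                weight *= p[n] if b else 1.0 - p[n]
            total += weight
    return total


# Hypothetical tree: TOP = E1 OR (E2 AND E3) OR (E1 AND E2)
tree = ("OR", ["E1", ("AND", ["E2", "E3"]), ("AND", ["E1", "E2"])])
p = {"E1": 0.01, "E2": 0.1, "E3": 0.2}      # made-up probabilities

mcs = cut_sets(tree)                         # {E1} and {E2, E3}; {E1, E2} is absorbed
exact = top_probability(mcs, p)
approx = sum(cut_set_probability(cs, p) for cs in mcs)   # rare-event approximation

print(sorted(sorted(cs) for cs in mcs))
print(exact, approx)
```

A cut set is a combination of basic events that is sufficient to cause the top event, and it is minimal when no event can be removed from it. The exact calculation enumerates all 2^n basic-event states, so it only suits small trees. The sum of the cut set probabilities is a common upper-bound approximation when the individual probabilities are small.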
To be Continued…