Overview of System Reliability Modeling

Published in Musings on Reliability and Maintenance Topics, Mar 6, 2017

Building and Using a Reliability Model

From the simplest to the most complex system, building and using a reliability model permits the entire team to make better decisions.

Understanding and monitoring system reliability involves knowing both:

the reliability of elements within the system, and
how the elements relate to each other reliability-wise.

We use system reliability models to identify weak links, to focus resources, and to meet our desired reliability goals.

Being able to build the right model to best meet your team’s needs is one of your roles as a reliability professional.

Reliability Block Diagrams (RBD)

Often depicting elements within a system as a block within a diagram, RBD models provide a graphical and mathematical model of the system reliability given the reliability and relationships of the elements within the system.

The diagram may not reflect the functional diagram of a system, as it focuses on the reliability relationships between components or subsystems. For example, within a series system, the RBD will show a string of blocks such that any one block failing results in the system failing.

It looks like a chain, hence the common analogy of “weakest link”.
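To make the chain analogy concrete, here is a minimal sketch of the series calculation. It assumes the blocks fail independently and that each block's reliability is stated for the same mission time. The function name and the three reliability values are hypothetical, chosen only for illustration.

```python
from math import prod


def series_reliability(block_reliabilities):
    """Reliability of a series system: every block must work.

    Assumes the blocks fail independently of one another.
    """
    return prod(block_reliabilities)


# Hypothetical block reliabilities over the same mission time
blocks = [0.99, 0.95, 0.98]
system = series_reliability(blocks)
print(f"Series system reliability: {system:.4f}")
print(f"Weakest single block:      {min(blocks):.4f}")
```

Note that the system reliability is always lower than that of the weakest block, which is why improving the weakest link usually pays off first.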

Another example has two elements in parallel, such that if one fails the other keeps the system operating. Parallel structures can get complicated. Variations include:

Standby Redundancy with Equal Failure Rates and Perfect Switching

Standby Redundancy with Equal Failure Rates and Imperfect Switching

Of course, there are unequal failure rate situations, as well as many other situations.
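A rough sketch of how these parallel cases differ numerically is shown here. It assumes independent elements, constant (exponential) failure rates, a standby unit that cannot fail while dormant, and a switch that either works or does not when called upon. The function names and the numbers are illustrative assumptions, not values from any particular system.

```python
from math import exp, prod


def active_parallel(reliabilities):
    """Active redundancy: the system fails only if every element fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def standby_two_unit(failure_rate, t, switch_reliability=1.0):
    """One active unit plus one cold standby with equal failure rates.

    switch_reliability is the probability the switch works on demand;
    1.0 corresponds to perfect switching.
    """
    lt = failure_rate * t
    return exp(-lt) * (1.0 + switch_reliability * lt)


# Hypothetical values: failure rate per hour and mission time in hours
lam, mission = 0.001, 1000.0
single = exp(-lam * mission)

print(f"Single unit:                 {single:.4f}")
print(f"Two units, active parallel:  {active_parallel([single, single]):.4f}")
print(f"Standby, perfect switching:  {standby_two_unit(lam, mission):.4f}")
print(f"Standby, 90% reliable switch: {standby_two_unit(lam, mission, 0.9):.4f}")
```

Under these assumptions, the standby arrangement outperforms active parallel because the spare does not age until it is switched in, and an imperfect switch gives back part of that advantage.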

Parallel structures may also include more than two elements. The k-out-of-n structure means the system continues to operate as long as at least k of the n parallel elements remain functional, thus permitting up to n - k elements to fail without system failure.
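For identical, independent elements, the k-out-of-n case reduces to a binomial sum. The sketch below assumes every element has the same reliability r. The function name and the example values are hypothetical.

```python
from math import comb


def k_out_of_n(k, n, r):
    """Probability that at least k of n identical, independent elements work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical element reliability
r = 0.9
print(f"1-out-of-3 (pure parallel): {k_out_of_n(1, 3, r):.4f}")
print(f"2-out-of-3:                 {k_out_of_n(2, 3, r):.4f}")
print(f"3-out-of-3 (pure series):   {k_out_of_n(3, 3, r):.4f}")
```

Setting k = 1 recovers the simple parallel structure and k = n recovers the series structure, so the two earlier cases are just the endpoints of this one.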

Fault Tree Analysis (FTA)

A fault tree analysis (FTA) is a logical, graphical diagram that starts with an unwanted, undesirable, or anomalous state of a system.

The diagram then lays out the many possible faults and combinations of faults within the subsystems, components, assemblies, software, and parts comprising the system which may lead to the top-level unwanted fault condition.

The key to these models being effective is to select the important top-level failures or faults to model. For systems with more than a few top-level faults of concern, an RBD may be a better starting point.

FTA models use a set of symbols to relate system elements, events, etc. The creation of a useful FTA is not difficult, yet may take some time to fully depict all the paths that may lead to failure.

FTA provides the design team a way to organize the relationships between elements and events that may lead to (or prevent or mitigate) failures.

FTA is a useful tool for your reliability program.
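The two most common gate symbols, AND and OR, translate directly into probability calculations when the basic events are independent. Here is a small sketch under that assumption. The tree, the event names, and the probabilities are all hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the top event "no output power" occurs if the
# controller fails OR both redundant power supplies fail.
p_supply_a = 0.02
p_supply_b = 0.02
p_controller = 0.001

p_both_supplies = and_gate([p_supply_a, p_supply_b])
p_top = or_gate([p_controller, p_both_supplies])
print(f"Probability of top event: {p_top:.6f}")
```

Even in this tiny tree, the single controller contributes more to the top event than the redundant supply pair, which is the kind of insight an FTA is meant to surface.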

Success Tree Analysis (STA)

A success tree analysis is very similar to an FTA, except the top event is a success state rather than a failure/fault state.

Instead of focusing on how the system can fail, the model focuses on how the elements of a system relate, including events, such that the system functions as expected.
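As a rough illustration of the relationship between the two views, consider a hypothetical system that succeeds when its controller works AND at least one of two power supplies works. The sketch assumes independent events, and the names and probabilities are made up for the example.

```python
from math import prod


def all_succeed(probabilities):
    """Success requires every input to succeed (AND of successes)."""
    return prod(probabilities)


def any_succeeds(probabilities):
    """Success requires at least one input to succeed (OR of successes)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical success probabilities for each element
r_supply_a = 0.98
r_supply_b = 0.98
r_controller = 0.999

r_power = any_succeeds([r_supply_a, r_supply_b])
r_system = all_succeed([r_controller, r_power])
print(f"Probability of system success: {r_system:.6f}")
print(f"Implied probability of failure: {1.0 - r_system:.6f}")
```

The gates flip compared with a fault tree of the same system: redundancy appears as an OR of successes rather than an AND of failures, and the success probability is the complement of the fault tree's top event.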

Markov models

Let’s assume that the future reliability performance of a system relies on the current state of the system, not on its history. This memoryless property is called a Markovian property.

Markov models work well with complex repairable systems when we’re interested in long-term average reliability and availability values.

Kevin Brown gives a nice description of Markov models in an early version of the book “Markov Models and Reliability”.

One of the notable strengths of Markov models for reliability analysis is that they can account for repairs as well as failures. This makes the technique particularly useful for assessing the long-term average reliability of one or more devices with established maintenance and repair strategies.
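The simplest repairable case is a single device with two states, up and down, a constant failure rate, and a constant repair rate. The sketch below uses the standard closed-form results for that two-state model. The function names and the rates are assumptions chosen for illustration.

```python
from math import exp


def steady_state_availability(failure_rate, repair_rate):
    """Long-term fraction of time the device spends in the up state."""
    return repair_rate / (failure_rate + repair_rate)


def availability_at(t, failure_rate, repair_rate):
    """Probability the device is up at time t, given it was up at t = 0."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * exp(-total * t)


# Hypothetical rates per hour: MTBF of 1000 h and mean repair time of 10 h
lam, mu = 0.001, 0.1

print(f"Steady-state availability: {steady_state_availability(lam, mu):.6f}")
for hours in (0, 10, 50, 500):
    print(f"Availability at {hours:>3} h: {availability_at(hours, lam, mu):.6f}")
```

The time-dependent value starts at 1 and settles toward the steady-state value, which is the long-term average the Markov approach is well suited to estimate. Larger systems follow the same idea with more states and are typically solved numerically.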

Petri net models

A Petri net graph is a depiction of a system using a symbolic language. The modeling permits the analysis of complex systems or networks of systems.

It is possible to include elements of the system that are neither functional nor failed. In other words, it permits modeling a system when one or more of the elements are in a degraded state or under repair.

Petri net modeling is useful when the repair/restore times are long compared to operating times, as reliability block diagrams and fault tree analysis approaches assume short or insignificant repair times, in most cases.
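To give a feel for the mechanics, here is a minimal place/transition sketch in which tokens represent units and a single repair crew. It is untimed and deterministic, so it shows only the token-firing rule and not the stochastic timing a real reliability study would add. The class, place names, and transition names are all hypothetical.

```python
class PetriNet:
    """Minimal place/transition net: a marking plus named transitions."""

    def __init__(self, marking):
        self.marking = dict(marking)
        self.transitions = {}

    def add_transition(self, name, inputs, outputs):
        """inputs/outputs map place name -> number of tokens consumed/produced."""
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, n in inputs.items():
            self.marking[place] -= n
        for place, n in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + n


# Two units, one repair crew; units can be working, degraded, or under repair
net = PetriNet({"working": 2, "degraded": 0, "under_repair": 0, "repair_crew": 1})
net.add_transition("degrade", {"working": 1}, {"degraded": 1})
net.add_transition("start_repair", {"degraded": 1, "repair_crew": 1}, {"under_repair": 1})
net.add_transition("finish_repair", {"under_repair": 1}, {"working": 1, "repair_crew": 1})

net.fire("degrade")
net.fire("degrade")
net.fire("start_repair")
# The only crew is busy, so the second degraded unit has to wait
second_repair_possible = net.enabled("start_repair")
net.fire("finish_repair")
print(net.marking)
```

The shared repair crew token is the kind of resource constraint that is awkward to express in an RBD or fault tree but falls out naturally here.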

Failure mechanism models or Physics of Failure (PoF) models

Elements, specifically components, may have one or more ways they can fail. Sometimes there are known and dominant failure mechanisms.

Modeling these mechanisms permits us to evaluate design or use changes, differences in use conditions, etc.

Models may be derived from empirical data for a specific failure mechanism and set of use conditions. Alternatively, a model may be analytically derived and experimentally verified.

PoF models permit you to model specific failure mechanisms in detail. If you know your customers may use the product in different ways or environments, then the PoF model allows you to estimate failure rates or distributions for each customer group.
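As one illustration, suppose the dominant failure mechanism is thermally activated and reasonably described by the Arrhenius relationship. That is an assumption for this sketch, not something every mechanism satisfies. The activation energy, reference failure rate, and customer temperatures below are hypothetical.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(activation_energy_ev, temp_ref_c, temp_use_c):
    """Ratio of the failure rate at temp_use_c to the rate at temp_ref_c."""
    t_ref = temp_ref_c + 273.15
    t_use = temp_use_c + 273.15
    return exp((activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref - 1.0 / t_use))


# Hypothetical inputs: failure rate known at a 40 C reference condition
reference_rate_per_hour = 1e-6
reference_temp_c = 40.0
activation_energy_ev = 0.7

# Hypothetical customer groups and their typical operating temperatures
customer_temps_c = {"office": 25.0, "warehouse": 40.0, "industrial": 70.0}

for group, temp_c in customer_temps_c.items():
    factor = arrhenius_acceleration(activation_energy_ev, reference_temp_c, temp_c)
    rate = reference_rate_per_hour * factor
    print(f"{group:<10} {temp_c:>5.1f} C  factor {factor:7.3f}  rate {rate:.2e} per hour")
```

The same pattern applies to other mechanisms: swap in the appropriate stress model and its parameters, then evaluate it at each customer group's use conditions.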

Summary

You have options when modeling the reliability of your system.

Simple systems will do fine with basic RBD models supplemented by PoF models. Complex systems, or systems that require very high availability, often require the use of Markov or Petri net models and may require specialized resources to create and maintain the system reliability models.

A model is only valuable if it supports decision making across the team. Creating a model should support the team’s ability to focus resources, make design decisions, and evaluate risks.

Which model do you typically use and how well is it working for you? Leave a comment below.

Originally published at Accendo Reliability.