Failure Modes And Effects Analysis

Esteban Spina
Published in Globant
5 min read · Sep 15, 2022


It is essential to analyze the cause-and-effect process thoroughly and determine what changes the different systems need. It is just as important to plan ahead and predict the effects of your solution. This way, we can identify potential failures before they happen.

One way of predicting potential failures is to use Failure Modes And Effects Analysis (FMEA). This tool builds on the idea of risk analysis to identify points where a solution could fail. FMEA is also a great system to implement across your organization: the more systems and processes that use FMEA from the start, the less likely you are to have problems that need root cause analysis (RCA) later.

Impact Analysis is another useful tool here. It helps you explore the possible positive and negative consequences of a change on different parts of a system or organization.

Another great strategy to adopt is Kaizen or continuous improvement. This is the idea that continual small changes create better systems overall. Kaizen also emphasizes that the people closest to a process should identify places for improvement. Again, with Kaizen alive and well in your company, the root causes of problems can be identified and resolved quickly and effectively.

How to conduct this kind of Analysis

As in any of our projects, the first step toward a successful outcome is bringing together the right people to overcome the challenges. For FMEA, that means a cross-functional team whose members bring a variety of perspectives as well as expertise in the processes, systems, products, or services involved. It is also important to have people on the team with a deep understanding of the customer’s needs.

Once the team has been assembled, its members define the scope of the process that will be the focus of the project. They also set boundaries for the project and decide how much detail the analysis will go into.

Identify faults and potential defects

The team determines this by looking at each functional requirement for the process in question and identifying, based on similar past processes and experiences, where failures are likely to occur. It is also important to remember that one failure mode can have a ripple effect and cause other components to fail. The team also looks at the failure effects of each failure mode.

This includes the consequences of the failure within the process or related processes, as well as for other products, services, and customers.
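
The ripple effect described above can be traced with a small sketch. It assumes the team has written down, for each failure mode, which other failure modes it can trigger directly. The `TRIGGERS` map, its entries, and the `downstream_effects` helper are hypothetical names made up for illustration; they do not come from any FMEA standard.

```python
from collections import deque

# Hypothetical map: each failure mode -> failure modes it can trigger directly.
TRIGGERS = {
    "database timeout": ["checkout error", "report delay"],
    "checkout error": ["lost order"],
    "report delay": [],
    "lost order": [],
}


def downstream_effects(start, triggers):
    """Return every failure mode reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in triggers.get(current, []):
            if nxt not in seen:  # skip modes already recorded (also guards against cycles)
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


effects = downstream_effects("database timeout", TRIGGERS)
print(effects)
```

Listing the downstream effects this way can make it easier to see which single failure mode touches the most related processes.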

Determine severity

In this step, the team rates the failure effects, determining the severity of the consequences of each failure mode. The team usually develops a scoring system at this step.

The following scoring system is suggested:

1: no effect;
2: very small effect noted only by discriminating or very observant users;
3: minor effect, with only a small part of the system affected;
4, 5, 6: moderate effect, with most users uncomfortable and/or annoyed;
7, 8: high effect, which implies the loss of the main function of the system, leaving users dissatisfied;
9, 10: very high effect, which means that the process, system, or product has become dangerous, leading to customer anger and safety hazards.
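
As a minimal sketch, the scale above can be encoded as a lookup so that every team member reads a score the same way. The function name `severity_band` and the short band labels are assumptions chosen for illustration.

```python
def severity_band(score: int) -> str:
    """Map a 1-10 severity score to the band it falls into on the suggested scale."""
    if not 1 <= score <= 10:
        raise ValueError("severity score must be between 1 and 10")
    if score == 1:
        return "no effect"
    if score == 2:
        return "very small effect"
    if score == 3:
        return "minor effect"
    if score <= 6:
        return "moderate effect"
    if score <= 8:
        return "high effect"
    return "very high effect"


for score in (1, 5, 8, 10):
    print(score, severity_band(score))
```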

Probability of occurrence

The team then ranks each failure mode based on how likely it is to occur. Most teams find it effective to create a system that assigns a numerical value to the potential of each fault, with those rated “1” as least likely to occur and those rated “10” as most likely to occur.

This step often includes performing a root cause analysis to determine the exact causes and the likelihood of their occurrence.

Create systems for fault detection

At this point, a quick review shows which failure modes are most likely to occur and how severe each one is. The team can then create or enhance process controls to catch potential failures before they happen. These include inspections, tests, and other mechanisms used to evaluate the system or process, from a team of people performing periodic checks on a system to automated processes that flag values outside acceptable ranges.

The job of controls is to prevent a failure mode from happening, or at least detect a failure after it has occurred but before it can affect the customer or end user. It is also helpful for teams to rank potential controls with a rating that estimates how well they are expected to perform in fault detection.
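
One common way to combine the three ratings, which this article does not spell out, is to multiply them into a single priority number and work on the highest values first. The sketch below assumes a detection rating where 1 means the control almost always catches the failure and 10 means it rarely does. The `FailureMode` class, the `prioritize` helper, and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (no effect) .. 10 (very high effect)
    occurrence: int  # 1 (least likely) .. 10 (most likely)
    detection: int   # assumed: 1 (almost always caught) .. 10 (rarely caught)

    def __post_init__(self):
        # Reject ratings outside the 1-10 scales used in the steps above.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Priority number: the product of the three ratings."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes sorted from highest to lowest priority number."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical ratings, for illustration only.
modes = [
    FailureMode("hypothetical failure A", severity=7, occurrence=4, detection=3),
    FailureMode("hypothetical failure B", severity=9, occurrence=2, detection=2),
]
for mode in prioritize(modes):
    print(mode.name, mode.rpn)
```

A product is only one possible choice; a team may prefer to review any failure mode with a very high severity regardless of its combined score.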

Putting it all together

One way to make the effectiveness of FMEA understandable is to use an example from everyday life. The process in question is a trip to work, with the aim of always arriving at the office on time. The team involved is everyone at home. The first step is to identify the possible failure modes of the trip. Some of the most common include:

  • Leaving the house late;
  • traffic congestion;
  • not finding the keys;
  • bad weather;
  • taking the children to school before traveling;
    and so on.

The first job is to rate the severity of each. Most people can find their keys in no time (if this is a problem, a cause/effect diagram might help). However, getting children to school is a complex process, and not packing lunch the night before or not serving breakfast on time can cause long delays.

Then it’s time to estimate how likely each failure mode is, based on experience. For many people, the usual culprit is congestion on the same road every day or hitting the snooze button one too many times. Whatever the case, an honest assessment of past experience leads to more accurate predictions about which aspect of the morning routine will cause a problem.

With these problems identified, solutions can be put in place:

  • Put the keys in the same place before going to bed every night;
  • search for alternative transportation routes;
  • check traffic before leaving home;
  • share the tasks of preparing breakfast and packing lunch with the children.

Process controls can then be put in place. In this case, they are mainly visual inspections.
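
To make the commute example concrete, here is a small sketch that ranks the failure modes by severity multiplied by likelihood. All the scores are invented for illustration and would differ for every household; the `rank_by_risk` helper is likewise a hypothetical name.

```python
# Hypothetical (severity, occurrence) scores on 1-10 scales; not taken from real data.
commute_failure_modes = {
    "leaving the house late": (6, 5),
    "traffic congestion": (5, 8),
    "not finding the keys": (3, 2),
    "bad weather": (4, 3),
    "taking the children to school": (7, 6),
}


def rank_by_risk(modes):
    """Return (name, severity * occurrence) pairs, riskiest first."""
    scored = [(name, sev * occ) for name, (sev, occ) in modes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_risk(commute_failure_modes)
for name, score in ranking:
    print(f"{score:3d}  {name}")
```

With these made-up scores, the school run and traffic congestion come out on top, which suggests where the household should put its first countermeasures.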

While obviously simplistic, this example follows the same approach a team should use with work-related processes. FMEA can be applied to everything from filing expense reports accurately and making payments on time to reducing the risk of errors when operating heavy machinery on the plant floor.

Summary

A Failure Modes And Effects Analysis (FMEA) is a risk management tool that identifies potential failures within a system or process and classifies them by probability of occurrence and potential severity. A team uses FMEA because it allows them to accurately predict challenges and puts organizations in a proactive rather than reactive stance.
