Root Cause Analysis and Its Impact on Your Bottom-line

Life @ Pie · Published in The Startup · 5 min read · Jul 11, 2020

Shellye Archambeau, a leader in global quality compliance, says of the cost of poor quality (COPQ):
“Experts have estimated that COPQ typically amounts to 5–30 percent of gross sales for manufacturing and service companies. Independent studies reveal that COPQ is costing companies millions of dollars each year, and its reduction can transform marginally successful companies into profitable ones. Yet most executives believe their company’s COPQ is less than 5 percent, or just don’t know what it is.”

For a company selling to consumers, a COPQ of 5–30% of gross sales suggests a substantial share of customers may be experiencing poor product quality; consider the cost of lost repeat sales alone. For B2B companies, the costs include lost credibility, personnel almost continuously engaged in root cause analysis, and a higher cost of manufacturing.

Most manufacturing organizations perform statistical process control (SPC) assessments of their operations. This is easier for continuous processes, which generally have sensors and control systems already in place. Many manufacturing operations, however, are batch processes; these often have few sensors, and most data is tracked manually.

In a large number of processes, whether continuous or batch, SPC analysis covers only certain portions of the process, and the data is not easily accessible in digital form. In such processes, the cost of tracking failures and performing root cause analysis is high.
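To make the SPC idea concrete, here is a minimal sketch of the classic 3-sigma control check, with limits estimated from an in-control baseline. The readings and limits are illustrative, not data from the case described below.

from statistics import mean, stdev

def control_limits(baseline):
    """3-sigma control limits estimated from in-control history."""
    center, sigma = mean(baseline), stdev(baseline)
    return center - 3 * sigma, center + 3 * sigma

# Illustrative data: a stable baseline, then new readings to check
baseline = [22.1, 22.4, 22.0, 22.3, 22.2, 22.1, 22.3, 22.2]
lcl, ucl = control_limits(baseline)
for reading in [22.2, 22.3, 25.9]:
    status = "out of control" if not lcl <= reading <= ucl else "in control"
    print(f"{reading:.1f} -> {status} (limits {lcl:.2f}..{ucl:.2f})")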

Many electronic devices are used in fields such as medical care, critical testing, and defense, where quality is paramount. Manufacturing these devices often involves polymers as underfill and sealant materials, and variability in raw material quality or processing can cause critical failures.

A manufacturer of medical electronics found that customers were experiencing sudden device failures. While the failures did not critically harm users, they damaged the manufacturer's credibility in the hospitals and senior care homes where the devices were used. Buying partners suspended further transactions until the manufacturer could address the issue.

To solve the problem, the manufacturer had to undertake two key tasks:
a. Identify whether the failures were related to specific batches or lots
b. Use traceability to help identify the root cause of these failures

Only 1–2% of the products were experiencing such failures. While this was good news, it also made the problem harder to solve: what might be the reason that a few of the devices failed, and not all of them?

Like most other manufacturers in this industry, the company kept records of all its manufacturing operations in paper-based batch records. However, given the number of batches manufactured during the period of interest, it would have taken days, if not weeks, to go through all the product release documents to identify the frequency of failures from different lots. Analyzing those batch records for anything else that was different, such as unidentified deviations or shifting process trends, would require another few weeks.

There was one possible path: scan all the release documents and analyze them with natural language processing (NLP, a class of machine learning techniques) to identify device failures by production lot. Driven by a forward-thinking senior manufacturing engineer, the team scanned the product release documents for six months of production, covering over 200 lots, which seemed to span the period when the failing devices were produced. With the support of Pienomial, the solution was set up within an afternoon, and by the next morning the manufacturing team had data showing failures by production lot.

Figure 1. A sample of the analyzed data, tracing failures by manufacturing lot
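As a rough illustration of that document-mining step, the sketch below tallies failures by lot from OCR'd release documents. The directory name, field labels, and regular expressions are all hypothetical; the actual solution used Pienomial's NLP models rather than fixed patterns.

import re
from collections import Counter
from pathlib import Path

# Hypothetical patterns for a lot identifier and a failure mention
LOT_RE = re.compile(r"Lot\s*(?:No\.?|Number)?[:\s]+([A-Z0-9-]+)", re.IGNORECASE)
FAIL_RE = re.compile(r"\b(fail(?:ed|ure)?|reject(?:ed)?)\b", re.IGNORECASE)

def failures_by_lot(doc_dir):
    """Naive tally: one OCR'd release document per .txt file."""
    counts = Counter()
    for path in Path(doc_dir).glob("*.txt"):
        text = path.read_text(errors="ignore")
        lot = LOT_RE.search(text)
        if lot:
            counts[lot.group(1)] += len(FAIL_RE.findall(text))
    return counts

if __name__ == "__main__":
    for lot, n in failures_by_lot("release_docs").most_common():
        print(lot, n)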

The manufacturing team was thus able to identify that 4 lots contributed 95% of the failures. That was a significant learning: the failures were not occurring randomly across lots. It also meant that units from only those 4 lots, rather than over 200, needed to be recalled.

The question that now needed to be answered was: what was different about these lots? Again, the manufacturing team decided to work smart. The manufacturing process was complex, and the batch record for each batch spanned over 150 pages. Manually assessing this data for the 4 batches with failures and comparing them with other “normal” batches would take weeks.

The manufacturing team used Pienomial's support to set up a second analysis, this time of the batch records themselves. Over a 12-hour period, the Pienomial team helped set up the solution to extract process data from the batch records. The batch records of the 4 impacted batches, along with 6 other “normal” batches, were analyzed using NLP. Pienomial's AI engine then ran an analysis to identify process conditions where the 4 batches were statistically different from the “normal” ones. Within 24 hours, the solution had been trained and had identified two variables that were statistically different, though each was within its process limits and hence would not have triggered a deviation alert.
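Pienomial's engine is proprietary, but the underlying comparison can be sketched: for each extracted process variable, test whether the impacted batches differ from the normal ones. Welch's t-test stands in here for the actual method, and the variable names and values are invented for illustration.

from scipy.stats import ttest_ind

process_data = {
    # variable name -> (values for 4 impacted batches, values for 6 normal batches)
    "open_time_hours":  ([3.4, 3.3, 3.5, 3.2], [1.5, 2.0, 2.5, 1.8, 2.2, 2.9]),
    "cleanroom_temp_C": ([24.6, 24.8, 24.3, 24.9], [22.4, 23.1, 22.8, 23.5, 22.9, 23.8]),
    "cure_oven_temp_C": ([150.1, 149.8, 150.3, 150.0], [150.2, 149.9, 150.1, 150.0, 149.7, 150.2]),
}

for name, (impacted, normal) in process_data.items():
    res = ttest_ind(impacted, normal, equal_var=False)  # Welch's t-test
    flag = "DIFFERENT" if res.pvalue < 0.05 else "similar"
    print(f"{name:18s} p={res.pvalue:.4f} -> {flag}")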

Figure 2. Time of last use of potting material and the number of failures, by manufacturing lot; the horizontal blue line marks the 3-hour point.

The thermosets used to underfill and package electronic components on the device have an open cure time: the window within which they can be used once mixed. For the material used in this process, the upper limit of the open time was 4 hours. For all normal batches, the material was used within 1–3 hours. There were 7 batches, including the 4 with failures, where the material was used between 3 and 3.5 hours.

Figure 3. Temperature of cleanroom and the number of failures — by manufacturing lot; the horizontal blue line marks 24C.

The second variable showing a statistical difference was the temperature of the clean room. The process conditions required the temperature to be between 21C and 26C. For normal batches the temperature was between 22C and 24C; for 6 batches, including the 4 with failures, it was between 24C and 25C.

The manufacturing leadership knew they probably had the answer. While the open time and the clean room conditions were each within bounds, the combination of a higher temperature and a longer open time might be affecting the thermoset. Over the next 48 hours, they ran characterization studies and confirmed that under those conditions the thermoset formed micro-particles that caused non-uniform flow, resulting in poor thermoset properties.
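The finding lends itself to a simple guard at lot release: each variable was within its own specification, but the combination signaled risk. A minimal sketch of such a check follows; the thresholds mirror the figures (3 hours of open time, 24C cleanroom temperature), while the lot records are hypothetical.

# In-spec limits were 4 hours and 21-26C; the risk shows up when the
# softer thresholds below are exceeded together.
OPEN_TIME_RISK_H = 3.0
CLEANROOM_RISK_C = 24.0

lots = [
    {"lot": "A101", "open_time_h": 2.1, "cleanroom_c": 22.8},
    {"lot": "A114", "open_time_h": 3.4, "cleanroom_c": 24.6},  # combination risk
    {"lot": "A120", "open_time_h": 3.2, "cleanroom_c": 23.1},  # one factor alone
]

for lot in lots:
    at_risk = (lot["open_time_h"] > OPEN_TIME_RISK_H
               and lot["cleanroom_c"] > CLEANROOM_RISK_C)
    if at_risk:
        print(f"Lot {lot['lot']}: hold for characterization "
              f"(open time {lot['open_time_h']} h, {lot['cleanroom_c']} C)")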

The engineering team of this manufacturing company leveraged Pienomial's intelligent assessment models to quickly identify which variables and factors to focus on. As a result, a problem that would have taken 4–6 weeks to solve was addressed within 4 days, saving the organization a recall of over 19,000 units and allowing sales that were under threat to continue.

The company not only kept its customers, it also improved its credibility by showing them that it truly understood its processes. It has since gone one step further: it uses Pienomial to continuously track its processes, maintaining continuous traceability so it can proactively assess and confirm its confidence in every lot manufactured.

Pienomial is an AI-powered platform that works across the manufacturing value chain to enable intelligent decision making and risk management.