Looking Beyond Cpk in Manufacturing

Edward Elson Kosasih
Published in The Startup
4 min read · Aug 9, 2019

Today I’d like to revisit an index that any Six Sigma practitioner loves to talk about: Cpk, the Process Capability Index. Rather than explaining Cpk at length, I’ll put the formula straight below.

Cpk Formula
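
The formula image did not survive in this copy, so here is the standard textbook definition of Cpk written out in LaTeX; it is presumably what the figure showed. USL and LSL are the upper and lower specification (tolerance) limits, \mu is the process mean and \sigma is the process standard deviation.

```latex
C_{pk} = \min\left( \frac{\mathrm{USL} - \mu}{3\sigma},\ \frac{\mu - \mathrm{LSL}}{3\sigma} \right)
```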

In short, Cpk measures how far the process mean sits from the nearer of the upper and lower tolerance limits, expressed in units of three standard deviations. The higher the Cpk, the more comfortably the process fits inside its tolerance band. This is why factory managers love to see high Cpk numbers: they suggest that the process is good. It is commonly accepted that a Cpk lower than about 1.33 (sometimes 2) means that the process is bad.

To illustrate how Cpk is calculated, suppose we have collected measurements from 5000 Devices Under Test (DUTs). The mean of those values is 10 and the standard deviation is 1. The specification document stipulates that these measurements must lie between 0 and 20, so anything outside this range is defective.

Substituting these values into the formula gives Cpk = min((20 − 10) / (3 × 1), (10 − 0) / (3 × 1)) ≈ 3.33.
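
The same arithmetic as a minimal Python sketch. The function name `cpk` and its parameter names are my own choices for illustration; the numbers are the ones from the example above (mean 10, standard deviation 1, limits 0 and 20).

```python
def cpk(mean, std, lsl, usl):
    """Standard Cpk: distance from the mean to the nearest spec limit, in units of 3 sigma."""
    if std <= 0:
        raise ValueError("std must be positive")
    return min(usl - mean, mean - lsl) / (3 * std)


value = cpk(mean=10, std=1, lsl=0, usl=20)
print(round(value, 2))  # 3.33
```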

High Cpk = No Issue?

So, if our Cpk is large enough, say larger than 2, does it mean that there is no issue with our process? Not necessarily. In this article, I will highlight three issues that can hide behind a high Cpk. These are based on real problems that I have observed in production lines (although the measurements I show are fake).

1. Undetected Anomalies

Take a look at the measurements in the two charts below. In the top chart there is one anomalous data point at coordinate (1500, 18), whereas everything is fine in the bottom chart. It turns out that the Cpk of both sets of measurements is 3.3, because a single point among 5000 barely moves the mean or the standard deviation. This shows that rare anomalies like this cannot be flagged by Cpk alone. We need to look at the raw values.

Capturing this anomaly is important as we want to ensure that DUTs that pass to the next station are truly good. We’d like to avoid having walking-wounded devices that could potentially result in a Return Merchandise Authorization (RMA) later on.

Top chart: Cpk = 3.3 with anomalies
Bottom chart: Cpk = 3.3 without anomalies
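
To make the point concrete, here is a small sketch using synthetic data, not the data behind the charts. It injects one anomaly at index 1500 and shows that Cpk hardly moves, while a simple check on the raw values finds the point. The outlier rule (distance from the median in MAD-based "robust sigmas", with a threshold of 6) is my own illustrative choice, not something prescribed by the article.

```python
import random
import statistics


def cpk(values, lsl, usl):
    """Cpk estimated from raw measurements."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * std)


def flag_outliers(values, threshold=6.0):
    """Indices whose distance from the median exceeds `threshold` robust sigmas."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    robust_sigma = 1.4826 * mad  # MAD scaled to match sigma for Gaussian data
    return [i for i, v in enumerate(values) if abs(v - med) > threshold * robust_sigma]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [rng.gauss(10, 1) for _ in range(5000)]
with_anomaly = list(clean)
with_anomaly[1500] = 18.0  # the single anomalous point

print("Cpk clean:       ", round(cpk(clean, 0, 20), 2))
print("Cpk with anomaly:", round(cpk(with_anomaly, 0, 20), 2))
print("Flagged indices: ", flag_outliers(with_anomaly))
```

The two Cpk values differ only in the second decimal place, yet the raw-value check points straight at the suspicious device.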

2. Loss of Localization Information

Consider the two sets of measurements below. Both have the same Cpk of 2.7 and both contain a number of anomalies. The problem is that by looking at Cpk alone, we cannot tell where the anomalies are in time. Looking at the raw measurements, we can clearly see that in the top chart anomalies are present consistently throughout. In the bottom chart, meanwhile, the process was stable until anomalies started appearing from the 4000th measurement onward.

This information is useful because the two processes require different diagnostics and treatments.

Top chart: Cpk = 2.7. Anomalies are consistently present
Bottom chart: Cpk = 2.7. Anomalies appear only towards the end
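
One way to recover the lost time information is to count anomalies per window of consecutive measurements. The sketch below uses synthetic data built from the same set of values arranged in two different orders, so the two series have the same Cpk by construction. The window size of 1000, the anomaly value of 18 and the MAD-based outlier rule are all assumptions made for illustration.

```python
import random
import statistics


def cpk(values, lsl, usl):
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * std)


def outlier_counts_per_window(values, window=1000, threshold=6.0):
    """Count points beyond `threshold` robust sigmas from the median, per window."""
    med = statistics.median(values)
    robust_sigma = 1.4826 * statistics.median(abs(v - med) for v in values)
    counts = [0] * ((len(values) + window - 1) // window)
    for i, v in enumerate(values):
        if abs(v - med) > threshold * robust_sigma:
            counts[i // window] += 1
    return counts


def build_series(normals, anomaly_value, is_anomaly_index, n):
    """Place `anomaly_value` at chosen indices and fill the rest with `normals` in order."""
    source = iter(normals)
    return [anomaly_value if is_anomaly_index(i) else next(source) for i in range(n)]


rng = random.Random(0)
normals = [rng.gauss(10, 1) for _ in range(4950)]

# 50 anomalies spread evenly versus 50 anomalies packed into the last 1000 points.
spread = build_series(normals, 18.0, lambda i: i % 100 == 50, 5000)
late = build_series(normals, 18.0, lambda i: i >= 4000 and i % 20 == 0, 5000)

print("Cpk spread:", round(cpk(spread, 0, 20), 2), "| Cpk late:", round(cpk(late, 0, 20), 2))
print("Anomalies per window (spread):", outlier_counts_per_window(spread))
print("Anomalies per window (late):  ", outlier_counts_per_window(late))
```

The Cpk values are identical, but the per-window counts separate a process that has always been noisy from one that degraded recently.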

3. Shift in Process

Looking at the measurements below, we see a fairly stable process with a sudden shift in mean at around the 2500th measurement. This usually happens because something changed in the process (a new parameter setting, a new component supplier, etc.). In this case, the Cpk figure is misleading, because Cpk assumes that the measurements follow a unimodal Gaussian distribution, whereas what we see here is a bimodal one. If we calculate the mean (expected value), it falls at around 10, which does not represent the measurements well because there are almost no values in that region.
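
A short sketch of how such a shift could be located from the raw data. The two regimes (means of 7 and 13, standard deviation 0.5) are made-up values chosen so that the overall mean lands near 10, and the CUSUM-style estimator is one simple option among many, not the author's method.

```python
import random
import statistics
from itertools import accumulate


def estimate_shift_index(values):
    """CUSUM-style estimate: where the cumulative deviation from the overall mean peaks."""
    mean = statistics.fmean(values)
    cusum = list(accumulate(v - mean for v in values))
    k = max(range(len(cusum)), key=lambda i: abs(cusum[i]))
    return k + 1  # first index of the second regime


rng = random.Random(0)
values = [rng.gauss(7, 0.5) for _ in range(2500)] + [rng.gauss(13, 0.5) for _ in range(2500)]

overall_mean = statistics.fmean(values)
# Fraction of measurements that actually lie within 1 unit of the overall mean.
near_mean = sum(abs(v - overall_mean) < 1 for v in values) / len(values)
shift = estimate_shift_index(values)

print("Overall mean:", round(overall_mean, 2))
print("Fraction of values near the overall mean:", near_mean)
print("Estimated shift index:", shift)
print("Mean before:", round(statistics.fmean(values[:shift]), 2))
print("Mean after: ", round(statistics.fmean(values[shift:]), 2))
```

Once the shift point is known, each regime can be assessed on its own instead of pooling two different processes into one Cpk.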

Conclusion

There is plenty of useful information in the raw measurements of a process, and Cpk alone might not capture it. I have shown a few issues that can be uncovered once we look at the original data.

Edward Elson Kosasih

PhD in Operations Research and Machine Learning at University of Cambridge