Capability analysis of continuous data using R

Data analysis using R in Six Sigma style — Part 2

Rafal Burzynski
7 min read · May 13, 2023

This article is part of the series titled “Data analysis using R in Six Sigma style”. If you have not read Part 1, here it is:

All script files are available in this GitHub repo:

If you want to run R scripts in VS Code, you can read a dedicated article that will show you how to set things up.

In this part we will focus on calculating process capability expressed as a Z-value (other measures such as Cp and Pp are also used, but I’m a Z-value person). Before we do that, however, we should always check the stability of the data, and there is a dedicated section for that below.

Stability of the data

In the Six Sigma approach we look at processes to identify improvements. When improving a process, it is important to understand what is causing its variation. Some portion of variation is natural and present in every process; this kind of variation is called common cause variation. In contrast, special cause variation is something unusual that is not linked to the natural variation of the process. It can be detected with a control chart, and in this article we will show one of these charts: the I-MR chart (individuals and moving range chart).

The I-MR chart is used for continuous data that is not gathered in subgroups, and the measurements taken in Part 1 of this series are exactly that kind of data. The chart has two parts: one shows the individual measurements and the other the moving range, i.e. the absolute difference between two consecutive measurement points. Each part of the chart has control limits, and if a point falls outside a control limit it should be investigated, because it is one of the indications that we may be dealing with an unstable process in which special cause variation is present.

In base R I have not found a ready-made function to plot an I-MR chart, but we can write our own.

# I-MR chart
I_MR <- function(vector) {
  # moving range: absolute difference between consecutive measurements
  k <- c(0)
  for (i in 2:length(vector)) {
    k[i - 1] <- abs(vector[i] - vector[i - 1])
  }
  # average moving range
  MR <- sum(k) / length(k)

  # plotting
  par(mfrow = c(1, 2)) # 1 row, 2 columns
  # I chart: individual values with control limits at mean +/- 2.66 * MR
  plot(vector, xlab = "Observation", ylab = "Individual value",
       main = "I chart")
  abline(h = mean(vector), lty = 2)
  abline(h = mean(vector) + 2.66 * MR, col = "#750505")
  abline(h = mean(vector) - 2.66 * MR, col = "#750505")
  # MR chart: moving ranges with upper control limit at 3.27 * MR
  plot(k, xlab = "Observation", ylab = "Range value", main = "MR chart")
  abline(h = MR, lty = 2)
  abline(h = 0, col = "#750505")
  abline(h = 3.27 * MR, col = "#750505")
  par(mfrow = c(1, 1))
}

# using function to plot the chart
I_MR(df$First)

In our case of 50 measurement points, we see one MR point outside the upper control limit indicated by the red horizontal line. This would require further investigation to understand what happened, and it should make us aware that the data might contain special cause variation. For the purposes of this article we will use the data as it is for further analysis.
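If you want to pinpoint the suspicious observation, a quick check like the one below can help. This is a minimal sketch, assuming the measurements are still stored in df$First and using the same 3.27 constant as in the function above.

# moving ranges (same values as the loop inside I_MR computes)
k <- abs(diff(df$First))
# average moving range
MR <- mean(k)
# indices of moving-range points above the upper control limit
which(k > 3.27 * MR)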

Process capability

As mentioned above, process capability can be expressed with a Z-value that tells us how many standard deviations fit between the mean and a given specification limit. On the chart below we can see two specification limits represented by vertical red lines. The lower specification limit (LSL) is set at 45 mm and the upper specification limit (USL) is set at 90 mm.

Before we move on to the calculation of the Z-value, there is one important thing to note. Since the Z-value tells us how many standard deviations fit between the mean and a specification limit, the measurement data must be normally distributed for the value to be meaningful. This is what we checked in Part 1 of this article series with the Shapiro-Wilk test.
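If in doubt, the check from Part 1 can be repeated in one line (assuming the measurements are in df$First):

# normality check: a p-value above 0.05 means we cannot reject normality
shapiro.test(df$First)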

Normally the Z-value is calculated for the LSL and the USL using the following formulas:

Z_USL = (USL - μ) / σ
Z_LSL = (μ - LSL) / σ

where the Greek letter μ denotes the mean value and the Greek letter σ denotes the standard deviation.

The final Z-value is the lower of the two.

Now let’s look at the piece of code in R.

> # capability study
> # mean value
> m <- mean(df$First)
> print(m)
[1] 69.492
> # standard deviation
> s <- sd(df$First)
> print(s)
[1] 2.065444
> # calculation of Z value
> USL <- 90
> LSL <- 45
> Zu <- (USL - m)/s
> Zl <- (m - LSL)/s
> Z <- min(Zu, Zl)
> print(Z)
[1] 9.929102

Since the specification limits are quite wide, the Z-value is really high, which means that the process capability is very good.

To really understand what a good process is, we can look at it from a manufacturing perspective. When there are no rules or procedures in place, process capability typically sits between Z = 1 and Z = 2.
I have not mentioned the properties of the normal distribution before, but now is a good opportunity to do so.
Since the normal distribution, seen as a bell curve, is a probability density function, the area under the curve is equal to 1 (or 100 % if you prefer to look at it that way). Between +/- 1 standard deviations, the area covers 68.27 %. Between +/- 2 standard deviations, it is 95.45 %. Finally, between +/- 3 standard deviations, it is 99.73 %.
Now let us imagine that our process is centered right in the middle between the specification limits (this means that the Z-values to the LSL and to the USL are equal). With process capability at Z = 2, less than 5 % of production will be outside the specification limits (100 % - 95.45 % = 4.55 %). That may not seem like a lot, but if we look at defects per million opportunities (DPMO) the way Six Sigma tables do, i.e. allowing for the conventional long-term shift of 1.5 standard deviations, it comes out that roughly 300 000 items coming out of the process will be defective. That is a significant number, and it led to exploring what actions could be taken to improve processes, up to the ultimate goal of a short-term process capability of Z = 6 (meaning six sigma). The table below links process capabilities with DPMOs.

Data taken from “The Six Sigma Handbook” by T. Pyzdek and P. Keller
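These figures are easy to verify with R’s pnorm() function. The snippet below is a quick sketch, not part of the original script:

# area under the normal curve within +/- 1, 2 and 3 standard deviations
sapply(1:3, function(k) pnorm(k) - pnorm(-k))
# share of production outside the limits for a centered process with Z = 2
2 * pnorm(-2)                 # about 4.55 %
# DPMO for Z = 2 with the conventional 1.5 sigma long-term shift
pnorm(-(2 - 1.5)) * 1e6       # roughly 308 500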

Taking into consideration the discussion on process capability above, let’s look at other specification limits by changing the USL first to 80 mm and then to 75 mm.

> # calculation of Z value
> USL <- 80
> LSL <- 45
> Zu <- (USL - m)/s
> Zl <- (m - LSL)/s
> Z <- min(Zu, Zl)
> print(Z)
[1] 5.087527

> # calculation of Z value
> USL <- 75
> LSL <- 45
> Zu <- (USL - m)/s
> Zl <- (m - LSL)/s
> Z <- min(Zu, Zl)
> print(Z)
[1] 2.66674

With the USL equal to 80 mm, the process capability is still very good. But the change to 75 mm results in a Z-value of 2.67, which is not so good, and the process should be corrected.
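Since the same three lines are repeated for every pair of limits, it may be convenient to wrap the calculation in a small helper function. This is a hypothetical addition, not part of the original script:

# helper: Z-value of a sample for given specification limits
z_value <- function(x, LSL, USL) {
  m <- mean(x)
  s <- sd(x)
  min((USL - m) / s, (m - LSL) / s)
}

z_value(df$First, 45, 90)   # ~9.93
z_value(df$First, 45, 80)   # ~5.09
z_value(df$First, 45, 75)   # ~2.67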

Finally, let’s plot the capability chart with the USL equal to 75 mm.

# plotting the normal fit against the specification limits
norm_fit <- dnorm(df$First, mean = m, sd = s)
plot(df$First, norm_fit, xlab = "Length [mm]", ylab = "Density",
     xlim = c(LSL, USL), main = "Capability analysis")
lines(df$First[order(df$First)], norm_fit[order(df$First)], col = "orange")
abline(v = m, lty = 2)
abline(v = LSL, col = "#750505")
abline(v = USL, col = "#750505")
grid()

Here it is possible to see that Z_LSL is quite big (11.86), while Z_USL is only 2.67. One possible fix is to shift the process towards smaller mean values so that it sits better between the specification limits.
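As a hypothetical illustration (assuming the standard deviation s stays the same), we can check what the Z-value would be if the process could be re-centered exactly half-way between the limits:

# re-center the process half-way between LSL = 45 and USL = 75
m_new <- (USL + LSL) / 2                          # 60 mm
Z_new <- min((USL - m_new) / s, (m_new - LSL) / s)
print(Z_new)                                      # about 7.26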

Conclusion

This concludes the second part of data analysis with R in Six Sigma style, in which we have shown how to check the stability of the data with an I-MR chart and introduced process capability described with the Z-value.

Part 3 will introduce hypothesis testing used in comparing groups.

The full script file is also available below as a GitHub Gist:

