TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Statistical Sampling for Process Improvement using Python

Use sample data to estimate the average lead time for processing customer orders in a customer service department.

Samir Saci
Published in TDS Archive
5 min read · Sep 28, 2021

A visual representation of order processing lead time estimation using statistical sampling with Python. The left shows the population of customer service representatives, represented by icons, with a distribution (σ, μ). The right displays a sample of the population used to estimate the average lead time, with a 90% confidence interval (x̄ ∈ [μ-b, μ+b]). The image explains how sample data is used to infer the overall lead time for processing customer orders in a customer service department.
Statistical Sampling to estimate the average order processing lead time — (Image by Author)

Customer service is where your customers form their impression of your company and of the products you sell.

As a data scientist, can you improve the performance of customer service?

An important performance indicator is the average lead time between receiving a customer order and transmitting it to the warehouse for preparation.

In this article, we will introduce a methodology using statistical sampling to estimate this overall average lead time using 200 observations.

SUMMARY
I. Scenario
Problem Statement
You are supporting the Customer Service Manager of an elevator parts supplier that produces and delivers engine parts for elevators.
Question
Can you estimate the average processing time with a confidence interval of 90% using your sample data?
II. Statistical Sampling
1. Notations
2. Application of the Central Limit Theorem
3. Confidence Interval
4. Final estimation
III. Conclusion
1. Other Lean Six Sigma Statistical Tools
2. Assessment of Container Loading Efficiency

Scenario

Problem Statement

You are supporting the Customer Service Manager of an elevator parts supplier that produces and delivers engine parts for elevators.

Her team is in charge of order processing:

  • A customer sends an order by phone or email with a requested delivery time
    (e.g., a customer orders 5 units of SKU X and requests delivery the same day at 10:00)
  • The team confirms the order and allocates it to the closest warehouse for preparation and shipment.
  • The order is prepared and shipped from the warehouse using an express courier company.

You recently received many complaints from your customers because of late deliveries.

According to the warehouse manager, this is mainly due to delays in customer service processing orders.

Over three months, you measured the order processing time of randomly selected operators and gathered 200 observations.

Can you estimate the average processing time with a confidence interval of 90% using your sample data?

Statistical Sampling

As we cannot measure the average processing time of all your operators for every order, we would like to estimate the total population average using these sample records.

Notations

To make the reasoning easier to follow, let’s introduce some notation:

A mathematical description of notations used in statistical sampling. The image defines variables such as μ and σ as the total population mean and standard deviation, and x̄ and s as the sample mean and standard deviation. X and S represent random variables for the sample mean and standard deviation of samples taken from the total population. These notations help in the calculation of confidence intervals and estimations based on sample data
Notations — (Image by Author)

Application of the Central Limit Theorem

In a previous article, we used the Central Limit Theorem (CLT) to estimate the probability of a random variable P(X≥k), assuming that X was following a normal distribution.

The CLT also tells us:

Equations illustrating the expected value of the sample mean, represented as E[X̄] = μ: the expected value of the sample mean equals the population mean. The standard deviation of the sample mean (σx̄) is the population standard deviation (σ) divided by the square root of the sample size (n), i.e. σx̄ = σ/√n. These equations are used in statistical sampling to estimate population parameters.
Equations— (Image by Author)
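
To make these two properties more concrete, here is a minimal simulation sketch. The population is synthetic (an exponential distribution with a mean of about 20 min, chosen only because it is clearly not normal), and names such as `population` and `n_samples` are illustrative assumptions, not the article's data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical, deliberately skewed population of processing times (min)
population = [random.expovariate(1 / 20) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 200            # sample size, same as in the article
n_samples = 2_000  # number of repeated samples drawn from the population

# Mean of each repeated sample of size n
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
predicted_std = sigma / n ** 0.5  # sigma / sqrt(n)

print(f"population mean mu     : {mu:.2f}")
print(f"mean of sample means   : {mean_of_means:.2f}")
print(f"std of sample means    : {std_of_means:.3f}")
print(f"sigma / sqrt(n)        : {predicted_std:.3f}")
```

The mean of the sample means should land close to μ, and their spread close to σ/√n, even though the underlying population is far from normal.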

Confidence Interval

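Our objective is to find the margin b such that the sample mean X̄ falls within [µ-b, µ+b] with a probability of 90%; equivalently, the population mean µ then lies within [X̄-b, X̄+b] with a confidence of 90%.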

A series of mathematical equations explaining how to calculate confidence intervals using the Central Limit Theorem. The equations demonstrate the relationship between the sample mean (X̄), population mean (μ), and Z-statistics associated with the unit normal distribution. The goal is to estimate the probability of the population mean lying within a certain range, based on sample data.
Equations— (Image by Author)
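
Written out, the steps described above can be sketched as follows. This is the standard textbook derivation using the notation introduced earlier, not a transcription of the original image:

```latex
\begin{aligned}
P\left(\mu - b \le \bar{X} \le \mu + b\right) &= 0.9 \\
P\left(-\frac{b}{\sigma/\sqrt{n}} \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le \frac{b}{\sigma/\sqrt{n}}\right) &= 0.9 \\
P\left(-z \le Z \le z\right) &= 0.9,
\quad Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0, 1),
\quad z = \frac{b\sqrt{n}}{\sigma} \\
b &= z\,\frac{\sigma}{\sqrt{n}}
\end{aligned}
```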

By construction of the standard (unit) normal distribution, P(-z≤Z≤z) = 0.9 gives z ≈ 1.645, rounded to 1.64 in the formula that follows.
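
The z value does not have to come from a printed table. As a small sketch, it can be recovered with the Python standard library (`statistics.NormalDist`); the variable names are illustrative:

```python
from statistics import NormalDist

confidence = 0.90

# Two-sided interval: leave (1 - confidence) / 2 of probability in each tail,
# so we need the 95th percentile of the standard normal distribution
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(f"z = {z:.3f}")  # prints: z = 1.645
```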

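Finally, replacing the unknown σ with the sample standard deviation s (a reasonable approximation for a sample as large as n = 200), we get the estimated range for the population mean: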

A confidence interval formula for a 90% confidence level, represented as μ ∈ [x̄ - (1.64 * s) / √n, x̄ + (1.64 * s) / √n]. This equation calculates the range in which the population mean (μ) is expected to lie with 90% confidence, based on the sample mean (x̄), sample standard deviation (s), and sample size (n). The constant 1.64 corresponds to the Z-statistic for a 90% confidence interval.
Equations — (Image by Author)
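
The formula above can be wrapped in a small helper. This is a sketch under the same normal approximation; the function name `mean_confidence_interval` and the example inputs are assumptions made for illustration, not the article's code or data.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, confidence=0.90):
    """Return (lower, upper) bounds for the population mean.

    Uses the normal approximation: x_bar +/- z * s / sqrt(n).
    """
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    b = z * s / sqrt(n)  # half-width of the interval
    return x_bar - b, x_bar + b


# Illustrative inputs only (not the article's measurements)
lower, upper = mean_confidence_interval(x_bar=50.0, s=10.0, n=100)
print(f"90% CI: [{lower:.2f}, {upper:.2f}]")  # prints: 90% CI: [48.36, 51.64]
```

Note that the helper uses the unrounded z value (about 1.645), so results can differ in the last decimal from a calculation done with 1.64.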

Final estimation

Summary statistics of the 200 observations (processing time in minutes):

count   200
mean     22.705
std       6.81
min       4.0
25%      18.0
50%      23.0
75%      27.0
max      41.0

A boxplot representing the distribution of sample data with 200 observations. The boxplot shows the interquartile range (IQR), with the median marked by a green line, and outliers represented as dots outside the whiskers. This plot provides a visual summary of the data, highlighting key statistics such as the median, quartiles, and any potential outliers in the sample.
Boxplot of the sample data — (Image by Author)

We have,

n = 200
x̄ = 22.705 (min)
s = 6.81 (min)
With z = 1.64, the margin is b = 1.64 × 6.81 / √200 ≈ 0.79 min, so the 90% confidence interval is approximately [21.92, 23.49] (min).
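
The arithmetic can be checked directly from the summary statistics. The sketch below only uses the three values listed above and the rounded z = 1.64 from the formula; it does not have access to the raw observations.

```python
from math import sqrt

n = 200         # number of observations
x_bar = 22.705  # sample mean (min)
s = 6.81        # sample standard deviation (min)
z = 1.64        # rounded z value for a 90% confidence level

b = z * s / sqrt(n)  # half-width of the interval
lower, upper = x_bar - b, x_bar + b

print(f"half-width b = {b:.3f} min")              # prints: half-width b = 0.790 min
print(f"90% CI = [{lower:.3f}, {upper:.3f}] min")  # prints: 90% CI = [21.915, 23.495] min
```

The interval is centered on the sample mean x̄ = 22.705 by construction, with a half-width of about 0.79 min.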


Conclusion

For a confidence level of 90% and with a moderate experimental effort (200 observations), we obtain a fairly precise estimate of the average lead time for order processing: a margin of about ±0.79 min around a sample mean of 22.7 min.

This approach can be used when process performance measurement is expensive and takes effort and time.

Can we trust the data?

However, it would be best to put effort into the experimental protocol to ensure that your sample data have been built based on a random selection of operators.
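
One simple way to support such a protocol is to draw the operators to observe with a seeded random generator, so the selection is not influenced by who happens to be available and can be audited later. The roster below is a placeholder (hypothetical operator IDs), and simple random sampling is only one possible design.

```python
import random

# Hypothetical roster of operator IDs (placeholder values, not the article's data)
operators = [f"OP-{i:03d}" for i in range(1, 41)]

rng = random.Random(2021)  # seeded generator so the draw can be reproduced

# Simple random sampling without replacement: every operator has the same
# chance of being selected for observation
observed = rng.sample(operators, k=10)

print(sorted(observed))
```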

Have you heard about Lean Six Sigma Statistical Tools?

If you are interested in statistics for continuous improvement, check out this series of articles covering Lean Six Sigma concepts with Python.

Can we use the same approach to evaluate other processes?

Containers Loading Optimization with Python

Due to the container shortage during COVID, sea freight prices surged.

A 3D rendering of two large shipping containers, one red and one green, with pallets stacked nearby. A red forklift is seen carrying a pallet toward one of the containers. This image visualizes the container loading process, with different pallet arrangements intended to maximize space utilization inside the sea containers. The scene demonstrates the need for optimization strategies to load a maximum number of pallets efficiently.
Can you optimize pallet loading strategies? — (Image by Author)

This put a lot of pressure on transportation management teams.

Do you want to assess the efficiency of sea container loading?

Side-by-side 3D renderings of two sea containers showing the difference between an initial pallet packing solution (right) and an optimized solution (left). The left container is packed more efficiently with both European and North American pallets, while the right container leaves space unused. The blue pallets inside the containers represent a more optimized arrangement in the left solution, which results in higher space utilization.
Example of inefficient pallet loading [Right] — (Image by Author)

This example illustrates how inefficient loading strategies can increase costs.

The two pallets on the side won’t be loaded in the same container.

What about the additional cost? Assess the performance of your forklift drivers.

You can collect data on pallet loading (container size, number of pallets) to measure the performance and find patterns.

The insights can lead you to implement an algorithm for loading optimization like the one developed in the article linked below.

