TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Central Limit Theorem for Process Improvement with Python

Estimate the workload for returns management, assuming a normal distribution of the number of items per carton received from your stores.

Samir Saci
Published in TDS Archive
6 min read · Aug 24, 2021

An infographic showing how the Central Limit Theorem applies to process improvement in returns management, in three steps: (1) assume a normal distribution N(μ, σ) for the number of items per return carton, (2) calculate the probability of receiving between n1 and n2 items per carton, and (3) estimate the workforce needed for returns management. Icons show the data distribution, a workstation for processing returned items, and workers.
Central Limit Theorem Framework — (Image by Author)

Improve your returns management process with statistical analysis.

Returns management, often called reverse logistics, covers the handling of items returned from retail locations to your distribution centre.

After reception, products are sorted, organized, and inspected for quality.

If they are in good condition, they can be restocked in the warehouse and added back to the available inventory, ready to be ordered again.

In this article, we will see how the Central Limit Theorem can help us estimate the workload for the returns management process.

SUMMARY
I. Scenario
Problem Statement
As the Inbound Manager of a multinational clothing retail company, you are in charge of workforce planning for returns management.
Question
Can you estimate the probability that a carton you receive each week contains fewer than 30 items?
II. Central Limit Theorem
1. Definition
2. Application
3. Probability of getting <30 items per carton?
4. 95% probability of having fewer than k items per case?
III. Conclusion
1. Generative AI: Statistical Testing x GPT
Create a statistical tool super agent powered by GPT
2. Next Steps

Scenario

Problem Statement

You are the Inbound Manager of a multinational clothing retail company known for its fast-fashion clothing for men, women, teenagers, and children.

A major problem for you is the lack of visibility on your workload for the returns process.

Indeed, because of system limitations, you do not get an advance shipping notice (ASN) before receiving returns from your stores.

a. You receive the cartons on pallets, which you unload from the truck

An illustration showing the unloading area of a distribution center, where a pallet of cartons is being removed from a truck using a pallet jack. The pallet contains returned items from stores that will be inspected and processed. This image emphasizes the initial stage of the returns process, where items are unloaded for inspection and reprocessing
Unloading Area— (Image by Author)

b. You open the box and inspect the returned items

An illustration of a quality inspection workstation in a distribution center, featuring a table with folded shirts, a box containing a shirt for inspection, and a sewing machine. This workstation is used by operators to inspect, relabel, and repack returned items to determine if they can be restocked. The image highlights the second stage of the returns process, where items undergo quality checks before restocking.
Quality Inspection Workstation — (Image by Author)

For each item (shirt, dress …), your operators need to perform the following:

  • Quality check to ensure that the product can be restocked
  • Relabelling
  • Re-packing

You know the productivity per item, and you would like to estimate the workload in hours based on the number of cases you will receive weekly.

Based on the historical data of the last 24 months, you have:

  • An average of 23 items per carton
  • A standard deviation of 7 items
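As a minimal sketch, these two parameters could be computed from historical records with Python's standard-library statistics module. The carton counts below are hypothetical placeholders, not the article's data; in practice you would load the last 24 months of received return cartons.

```python
import statistics

# Hypothetical items-per-carton observations, for illustration only.
# Replace with the real history of received return cartons.
items_per_carton = [18, 25, 31, 22, 15, 27, 23, 20, 29, 20]

mu = statistics.mean(items_per_carton)      # average items per carton
sigma = statistics.stdev(items_per_carton)  # sample standard deviation (n - 1 denominator)

print(f"mean = {mu:.1f} items/carton, std = {sigma:.1f} items/carton")
```

statistics.stdev uses the sample (n - 1) formula, which is the usual choice when the history is treated as a sample of an ongoing process.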

Your team is usually sized to handle 30 items per case.

If the number of items per carton exceeds this threshold, you must hire temporary workers to meet your daily capacity target.
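The workload in hours is simply the number of items divided by productivity. Here is a small sketch of that arithmetic; the weekly carton volume and the productivity figure are hypothetical values chosen for illustration (the article does not give them), and productivity is assumed to be expressed in items per hour.

```python
# Hypothetical inputs: NOT from the article, replace with your own figures.
CARTONS_PER_WEEK = 500        # hypothetical weekly volume of return cartons
ITEMS_PER_HOUR = 60           # hypothetical productivity (quality check + relabel + repack)
DESIGN_ITEMS_PER_CARTON = 30  # the team is sized for 30 items per carton


def weekly_hours(cartons: int, items_per_carton: float, items_per_hour: float) -> float:
    """Workload in hours = total items to process / items processed per hour."""
    return cartons * items_per_carton / items_per_hour


capacity_hours = weekly_hours(CARTONS_PER_WEEK, DESIGN_ITEMS_PER_CARTON, ITEMS_PER_HOUR)
average_hours = weekly_hours(CARTONS_PER_WEEK, 23, ITEMS_PER_HOUR)  # at the historical mean

print(f"Team capacity:    {capacity_hours:.1f} h/week")
print(f"Average workload: {average_hours:.1f} h/week")
```

Whenever the actual items per carton exceed 30, the workload exceeds the planned capacity, which is the gap the temporary workers must cover.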

Question

Can you estimate the probability that a carton you receive each week contains fewer than 30 items?

🏫 Discover 70+ case studies using analytics for supply chain optimization 🚚, sustainability🌳and business optimization 🏪: Cheat Sheet

Central Limit Theorem

The Central Limit Theorem establishes that when we add independent random variables, their normalized sum tends toward a normal distribution even when the original variables are not normally distributed.

Definition

To make this easier to follow, let’s introduce some notation:

Mathematical notation explaining the relationship between the population mean (μ), sample mean (x̄), and population standard deviation (σ) with reference to the Central Limit Theorem. The formula establishes that sample means approach a normal distribution as the sample size increases, even if the population distribution is not normal.
Notations — (Image by Author)

In our case, the total population is the entire scope of cartons received from the stores with a mean µ = 23 items per carton and a standard deviation of σ = 7 items per carton.

If you take n samples of cartons X1, ..., Xn (for instance, a sample can be a batch of cartons received on a given date), the following holds:

A formula showing how random samples drawn from a population with mean (μ) and variance (σ²) behave: as the sample size (n) increases, the normalized sample mean Z converges to the standard normal distribution.
Equation — (Image by Author)
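For reference, one standard way to write the theorem for i.i.d. observations X1, ..., Xn with mean μ and finite variance σ² (the notation may differ slightly from the figure):

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}
    = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}
\xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

For large n, this means the sample mean is approximately N(μ, σ²/n), so its spread shrinks like σ/√n.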

In other words, if we randomly measure the number of items per carton using n samples and assume that observations are independent and identically distributed (i.i.d.), the probability distribution of the sample means will approach a normal distribution as n increases.

Note: To ensure that we have independent and identically distributed observations, we assume that the samples are built from return batches coming from all stores, in a scope covering 100% of the active SKUs.
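To see the theorem at work, here is a small simulation sketch. It assumes, purely for illustration, a right-skewed gamma population of items per carton with mean 23 and standard deviation 7 (so clearly not normal); the sample size N and the number of samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(7)  # reproducible draws

# Illustrative assumption: gamma-distributed items per carton (skewed, not normal),
# parameterised so that mean = 23 and standard deviation = 7.
MU, SIGMA = 23.0, 7.0
ALPHA = (MU / SIGMA) ** 2  # gamma shape; mean = ALPHA * BETA
BETA = SIGMA ** 2 / MU     # gamma scale; variance = ALPHA * BETA ** 2

N = 30            # cartons per sample (hypothetical)
N_SAMPLES = 2000  # number of samples drawn (hypothetical)

sample_means = [
    statistics.mean(random.gammavariate(ALPHA, BETA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

se_theory = SIGMA / math.sqrt(N)  # standard error of the mean: sigma / sqrt(n)
mean_of_means = statistics.mean(sample_means)
std_of_means = statistics.stdev(sample_means)
# For a normal distribution, about 68.3% of values lie within one standard deviation
share_within_1se = sum(abs(m - MU) <= se_theory for m in sample_means) / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.2f} (population mean {MU})")
print(f"std of sample means:  {std_of_means:.3f} (theory {se_theory:.3f})")
print(f"share within 1 SE:    {share_within_1se:.1%} (normal: 68.3%)")
```

The sample means centre on 23, their spread matches σ/√n, and roughly 68% of them fall within one standard error, as expected for a normal distribution, even though the individual cartons are skewed.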

Application

We can then assume that the number of items per carton follows a normal distribution with a mean of 23 items per carton and a standard deviation of 7 items per carton.

A simple bell curve graph representing a normal distribution. The graph shows a symmetrical distribution where most observations cluster around the central peak, with fewer observations occurring as you move further away from the mean, which is characteristic of the normal distribution.
Population Normal Distribution — (Image by Author)

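What is the probability of having fewer than 30 items per carton?

Probability of getting <30 items per carton?

The probability of having fewer than 30 items per carton is 84.13%: the z-score is z = (30 - 23) / 7 = 1, and the standard normal cumulative probability at z = 1 is 0.8413.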
A bell curve representing the population normal distribution, with red horizontal and vertical lines marking the 30 items/carton threshold used to compute the probability.
Population Normal Distribution — (Image by Author)

Code
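A minimal sketch of this calculation, using the standard-library statistics.NormalDist (Python 3.8+); the author's own code may rely on a different library such as SciPy.

```python
from statistics import NormalDist

# Items per carton modelled as N(mu=23, sigma=7), as assumed in the article
items = NormalDist(mu=23, sigma=7)

threshold = 30  # items per carton the team is sized for
z = (threshold - items.mean) / items.stdev  # z = (30 - 23) / 7 = 1.0
p_below = items.cdf(threshold)              # P(X < 30) for a continuous distribution

print(f"z = {z:.2f}, P(X < {threshold}) = {p_below:.2%}")
```

Since the normal distribution is continuous, P(X < 30) and P(X <= 30) are the same value.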

95% probability of having fewer than k items per case?

Your KPI target is to have at least 95% of the returns processed the same day.

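How many items per carton should you assume when sizing your team to handle 95% of the expected workload?

There is a 95% probability that X <= 34.51 items/carton, since 23 + 1.645 × 7 ≈ 34.51, where 1.645 is the 95th percentile of the standard normal distribution.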
A bell curve representing the population normal distribution, with red horizontal and vertical lines marking the 34.51 items/carton threshold (the 95th percentile).
Population Normal Distribution — (Image by Author)
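The 95% threshold is the inverse of the cumulative distribution function (the 95th percentile). A minimal sketch with statistics.NormalDist, under the same N(23, 7) assumption:

```python
from statistics import NormalDist

items = NormalDist(mu=23, sigma=7)  # items per carton, as assumed in the article

# k such that P(X <= k) = 0.95, i.e. the 95th percentile (inverse CDF)
k_95 = items.inv_cdf(0.95)

# Same result via the standard normal quantile: k = mu + z * sigma
z_95 = NormalDist().inv_cdf(0.95)
k_manual = items.mean + z_95 * items.stdev

print(f"z = {z_95:.3f}; 95% of cartons hold at most {k_95:.2f} items ({k_manual:.2f} by hand)")
```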

If you size your team for 35 items per carton, you can expect at least 95% of cartons to fall within its capacity, in line with your KPI target.
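A short sketch comparing the current sizing (30 items/carton) with the 95th percentile rounded up to a whole item, again using statistics.NormalDist under the article's N(23, 7) assumption:

```python
import math
from statistics import NormalDist

items = NormalDist(mu=23, sigma=7)  # items per carton, as assumed in the article

# Round the 95th percentile (about 34.51) up to a whole number of items
design_capacity = math.ceil(items.inv_cdf(0.95))

for capacity in (30, design_capacity):
    coverage = items.cdf(capacity)  # share of cartons at or below this capacity
    print(f"Sized for {capacity} items/carton: {coverage:.1%} of cartons within capacity")
```

Rounding up to 35 gives slightly more than 95% coverage, whereas the current sizing of 30 items covers only about 84% of cartons.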

You can find the full code in this GitHub repository.

Conclusion

Generative AI: Statistical Testing x GPT

Following the adoption of large language models (LLMs), I started to experiment with the design of a LangChain Agent connected to a TMS (Transportation Management System).

A diagram showing an automated supply chain control tower workflow with GPT and LangChain, starting with ambiguous input (represented by question marks), proceeding through SQL queries and machine learning analysis, and generating insights that are communicated to users in an understandable form.
Supply Chain Control Tower Agent with LangChain SQL Agent [Article Link] — (Image by Author)

The performance is quite impressive; the agent can answer operational questions by autonomously querying a database of delivery shipments.

What if we create a Statistical Tests super agent?

The image shows the agent architecture processing the user’s request for two different analyses simultaneously. After receiving detailed instructions, the agent completes both tasks and prompts the user to guide it further. The flow illustrates how the agent resolves complex requests efficiently to promote Supply Chain Analytics with custom GPTs like “The Supply Chain Analyst”.
Lean Six Sigma Super Agent — (Image by Author)

The vision is to equip custom GPTs with:

  • Python scripts implementing Lean Six Sigma (LSS) tools
  • Context, articles, and knowledge about LSS mathematical tools

Imagine helping continuous improvement engineers with an agent that finds the right test, runs it on uploaded datasets, and reports the results.


Next Steps

This methodology allows you to size your team based on assumptions backed by powerful statistical tools.


This analysis can be performed several times a year, especially if the business is evolving (with more collections, e-commerce, or new store openings).

About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, have a look at my website.

💌 New articles straight in your inbox for free: Newsletter
📘 Your complete guide for Supply Chain Analytics: Analytics Cheat Sheet


Written by Samir Saci

Top Supply Chain Analytics Writer — Case studies using Data Science for Supply Chain Sustainability 🌳 and Productivity: https://bit.ly/supply-chain-cheat
