TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Improve Warehouse Productivity using Spatial Clustering with Python.

8 min read · Aug 11, 2020


Improve Warehouse Productivity using Spatial Clustering with Python. A visual representation of a warehouse layout with aisles labeled A01 to A19. The image demonstrates the concept of spatial clustering for warehouse productivity. Red dotted clusters group picking locations, marked with coordinates (xi, yi), indicating proximity within a defined distance threshold. The goal is to reduce the walking distance of warehouse operators by clustering orders with nearby picking locations.
Improve Warehouse Productivity using Spatial Clustering with Python Scipy — (Image by Author)

A significant cost-reduction lever in logistics management is improving the productivity of goods-handling processes in the warehouse.

In a warehouse, 60% to 70% of the picking operators' working time is wasted walking from one location to another.

As data scientists, how can we use Python to reduce this time?

In a previous article, we built the basis to estimate the total picking route walking distance for a set of orders using:

  • Warehouse Mapping: Link each order line with your warehouse's associated picking location coordinate (x, y).
  • Distance Calculation: A function that calculates the walking distance between two picking locations.
A bar chart showing the total routing distance for various numbers of orders per wave in a picking process. The x-axis displays the number of orders per wave from 1 to 9, while the y-axis shows the total picking route distance in meters. The routing distance significantly decreases as the number of orders per wave increases, illustrating the impact of batch picking on reducing walking distance. The label on top reads, “Analysis Total Picking Route Distance.”
(8) Results for 5,000 order lines with a ratio from 1 to 9 orders per route — (Image by Author)

The impact of grouping orders in waves on total walking distance is:

  • Up to 50% reduction after grouping three orders per wave
  • We reach a 75% reduction if we achieve nine orders per wave

Can we find additional levers to reduce this distance even more by grouping orders in a smart way?

In this article, we will take a deep dive into the Order Wave Processing solution, focusing on using spatial clustering with Python to group orders.

I. Order Wave using Picking Locations Clustering
Group orders by geographical clusters of the picking locations
1. Picking locations clustering using Scipy
Apply clustering techniques on the picking locations to create groups
2. Picking locations clustering for Multi-line Orders
Centers of gravities can be used if we have multiple locations in an order
II. Model Simulation
1. Comparing 3 methods of Wave Processing
2. Tuning Distance Threshold for Clustering

III. Conclusion
1. Improve the pathfinding algorithm
2. Advanced Diagnostic using Process Mining

Order Wave using Picking Locations Clustering

In the previous simulation, we took a straightforward approach to picking route design and order waving:

A visual representation of a warehouse layout, showing multiple aisles labeled A1 to A19, with red and black circles representing picking locations (xi, yi). This diagram demonstrates how spatial clustering is used to group orders in batches based on proximity of picking locations, aiming to reduce warehouse operator walking distances. Orders are clustered within a defined distance threshold, maximizing efficiency during order picking operations.
Two levers for improving our solution performance — (Image by Author)
  • Picking Route Design: Given a choice of several picking locations, the warehouse picker will always choose to go to the closest (Next Closest Location Strategy)
  • Order Waving: Orders are sorted and grouped in waves by the time they are received from the OMS (TimeStamp)
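To make the Next Closest Location Strategy concrete, here is a minimal sketch. The function name next_closest_route, the dist callable, and the return to the starting point at the end of the route are assumptions for illustration; the demo uses a plain Manhattan distance rather than the warehouse walking distance.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def next_closest_route(
    start: Point,
    locations: List[Point],
    dist: Callable[[Point, Point], float],
) -> Tuple[List[Point], float]:
    """Greedy route: always walk to the nearest remaining location."""
    remaining = list(locations)
    route = [start]
    total = 0.0
    current = start
    while remaining:
        # Pick the closest location that has not been visited yet
        nxt = min(remaining, key=lambda loc: dist(current, loc))
        total += dist(current, nxt)
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    # Assumption: the picker walks back to the starting point
    total += dist(current, start)
    route.append(start)
    return route, total


def manhattan(a: Point, b: Point) -> float:
    """Placeholder metric used only for this demo."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


route, total = next_closest_route((0, 0), [(5, 0), (1, 0), (3, 0)], manhattan)
print(route)  # [(0, 0), (1, 0), (3, 0), (5, 0), (0, 0)]
print(total)  # 10.0
```

The greedy choice is cheap to compute, but it never looks ahead, which is why the order in which orders are grouped matters so much for the final distance.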

Can we improve the performance by grouping orders by spatial location?

Let’s start implementing this with single-line orders, as they have the advantage of being in a single storage location.

A table displaying order data for a warehouse. The table includes columns for date, order number, SKU, pieces per order, reference ID, location, alley number, cell location, and coordinates (x, y) for each picking location. This dataset illustrates how each order line is mapped to a specific picking location in the warehouse, which is used to optimize the walking distance for order picking.
(2) Order Lines DataFrame — (Image by Author)

Grouping several single-line orders by cluster can ensure that our picker will stay in a delimited zone.

Where are single-line orders located?

Function 1: Calculating the number of single-line orders per storage location

Code
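A minimal sketch of such a function with pandas, under stated assumptions: the column names OrderNumber and Alley_Number are hypothetical stand-ins for the columns of the order lines DataFrame, and the function name is illustrative.

```python
import pandas as pd


def single_line_share_per_aisle(
    df_orderlines: pd.DataFrame,
    order_col: str = "OrderNumber",   # hypothetical column name
    aisle_col: str = "Alley_Number",  # hypothetical column name
) -> pd.Series:
    """Share (%) of single-line orders per aisle."""
    # Number of lines of the order each row belongs to
    lines_per_order = df_orderlines.groupby(order_col)[order_col].transform("size")
    # Keep only orders made of exactly one line
    df_single = df_orderlines[lines_per_order == 1]
    # Percentage of single-line orders picked in each aisle
    return df_single[aisle_col].value_counts(normalize=True).mul(100).round(1)


df_demo = pd.DataFrame(
    {
        "OrderNumber": [1, 2, 2, 3, 4, 5],
        "Alley_Number": ["A11", "A10", "A09", "A11", "A10", "A11"],
    }
)
share = single_line_share_per_aisle(df_demo)
print(share)  # A11: 75.0, A10: 25.0 (order 2 has two lines and is excluded)
```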

Insights: Let us take the example of the distribution shown in figure (1)

  • Scope: 5,000 order lines for 23 aisles
  • Single-line orders: 49% of orders are located in aisles A11, A10, and A09
A heatmap including 23 aisles in a warehouse, each labeled A01 to A23. Each aisle shows percentages of order concentration for 5,000 order lines. High-concentration areas are highlighted in red, with notable hot spots in aisles A11, A10, and A09. The data indicates that 49% of single-line orders are located in these three aisles. There are potential clustering areas where operators can optimize warehouse picking routes by grouping nearby locations to minimize travel distance using Python Scipy.
(1) Distribution of single-line order lines per storage location — 5,000 order lines (%)

We can spot clusters with a high concentration of picked order lines, meaning these locations may be visited more often than the others.

We should then group these locations by area to ensure the operators stay within their vicinity along the wave.

How can we create clusters of locations?

Picking locations clustering using Scipy

A flowchart depicting the steps of order wave processing, focusing on spatial clustering for warehouse picking route optimization with Python. The diagram illustrates different phases of order processing, from single to multi-picking operations, clustering, and centroid adjustments.
(3) Order Lines Processing for Order Wave Picking using Clustering by Picking Location — (Image by Author)

Group picking locations by clusters to reduce the walking distance for each picking route.

Example: the walking distance between any two locations in a cluster is below 15 m

Spatial clustering is grouping a set of points so that objects in the same cluster are more similar to each other than to objects in other clusters.

A heatmap of a warehouse showing the percentage of order lines picked per aisle. Clusters of high-concentration orders are visible in Aisles A11, A10, and A09. These locations are highlighted to suggest optimization by grouping nearby locations to reduce walking distance for the warehouse picking route.
(4) Example of three Picking Locations Clusters — (Image by Author)

Here, the similarity metric will be walking distance from one location to another.

For instance, I would like to group locations, ensuring the maximum walking distance between two locations is 10 m.

What are the challenges?

Challenge 1: Euclidean Distance vs. Walking Distance

We cannot use conventional clustering methods based on Euclidean distance for our specific model.

Indeed, walking distance (computed with the distance_picking function) differs from Euclidean distance.

A warehouse layout with aisles labeled from A01 to A19, with multiple rows of storage locations. Three picking locations, marked as i (xi, yi), p (xp, yp), and j (xj, yj), are shown along different aisles. A blue dashed circle indicates the Euclidean distance threshold around the point p. The paths between points i, p, and j are displayed, showing that the walking distance for the picker is shorter from point i to p compared to point i to j, despite equal Euclidean distances between points.
(5) Euclidean vs Custom Distance Example — (Image by Author)

For this specific example, the Euclidean distances between i (x_i, y_i) and the two points p (x_p, y_p) and j (x_j, y_j) are equal.

But if we compare the picker's walking distance, p (x_p, y_p) is closer.

Picker's Walking Distance is the specific metric we want to reduce for this model.
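To illustrate the difference, here is a minimal sketch comparing both metrics. It assumes a simplified layout: parallel aisles running along the y-axis, connected only by two cross-aisles at y_low and y_high. The function walking_distance and its parameters are illustrative assumptions, not the distance_picking function from the previous article.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def walking_distance(a, b, y_low, y_high):
    """Walking distance in an assumed layout of parallel aisles.

    Assumption: aisles run along y and can only be left through
    the cross-aisles located at y_low and y_high.
    """
    (x1, y1), (x2, y2) = a, b
    if x1 == x2:
        # Same aisle: walk straight along the aisle
        return abs(y2 - y1)
    # Different aisles: leave through the top or the bottom cross-aisle
    via_top = (y_high - y1) + (y_high - y2)
    via_bottom = (y1 - y_low) + (y2 - y_low)
    return abs(x2 - x1) + min(via_top, via_bottom)


i = (0.0, 5.0)
p = (0.0, 8.0)  # same aisle as i
j = (3.0, 5.0)  # neighbouring aisle

print(euclidean_distance(i, p), euclidean_distance(i, j))  # 3.0 3.0
print(walking_distance(i, p, 0.0, 10.0))  # 3.0
print(walking_distance(i, j, 0.0, 10.0))  # 13.0
```

Both points are 3 m away from i in a straight line, but reaching j requires walking out of the aisle and back in, which is what the clustering metric needs to capture.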

What can we do to adapt the distance to our specific layout?

We can use the custom-made walking distance function (distance_picking) as the clustering metric for better performance.

Example: Locations Clustering within a 25 m distance (5,000 order lines)

Two scatter plots show the spatial distribution of picking locations, each point representing a picking location in a warehouse. The left plot shows a vertical rectangular cluster around the middle of the plot, indicating locations that are grouped based on proximity. The right plot shows two horizontal rectangular clusters of picking locations, demonstrating a different clustering arrangement. The points are color-coded by a variable, likely representing the concentration of picking order lines
(6) Left [Clustering using Walking Distance] / Right [Clustering using Euclidean Distance] — (Image by Author)
  • The left example uses walking distance and groups locations within the same aisle, which reduces the picking route distance.
  • The right example uses Euclidean distance and can group locations spread over several aisles.

Let’s implement it in a function!

Function 2: Clusters for Single Line Orders using Walking Distance

From a set of order lines (df_orderlines), extract the single-line orders and create clusters of storage locations within a distance threshold using the custom distance function (dist_method).

The Python code below uses Scipy’s ward and fcluster functions to cluster picking locations using the distance_func metric (walking distance).

Code
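A minimal sketch of this step, assuming the simplified aisle layout described by the hypothetical walking_distance function below (cross-aisles at y = 0 and y = 10). The names cluster_locations and distance_threshold are illustrative, not the original function's signature; only pdist, ward, and fcluster are actual Scipy calls.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, ward
from scipy.spatial.distance import pdist


def walking_distance(a, b, y_low=0.0, y_high=10.0):
    """Hypothetical walking distance: aisles along y, cross-aisles at y_low/y_high."""
    (x1, y1), (x2, y2) = a, b
    if x1 == x2:
        return abs(y2 - y1)
    via_top = (y_high - y1) + (y_high - y2)
    via_bottom = (y1 - y_low) + (y2 - y_low)
    return abs(x2 - x1) + min(via_top, via_bottom)


def cluster_locations(coords, distance_threshold, metric=walking_distance):
    """Return one cluster label per location."""
    # Condensed matrix of pairwise distances using the custom metric
    condensed = pdist(np.asarray(coords, dtype=float), metric=metric)
    # Hierarchical clustering with Ward linkage
    Z = ward(condensed)
    # Cut the dendrogram at the chosen distance threshold
    return fcluster(Z, t=distance_threshold, criterion="distance")


# Three locations in one aisle (x = 0) and two in another (x = 20)
coords = [(0, 1), (0, 3), (0, 5), (20, 1), (20, 2)]
labels = cluster_locations(coords, distance_threshold=10)
print(labels)  # locations of the same aisle share a label
```

One caveat on this design: Ward linkage is formally defined for Euclidean distances, so with a custom metric the threshold is compared with the linkage height rather than with a strict pairwise bound. If the maximum walking distance inside a cluster must be guaranteed, scipy.cluster.hierarchy.linkage with method="complete" is an alternative worth testing.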

Now that we have created our clusters, we must link each order line to its optimal cluster.

Function 3: Single Line Orders Mapping with ClusterID

From a set of order lines (df), extract the single-line orders together with their cluster IDs and order numbers.

In this function, you map each row of your DataFrame to its cluster ID for wave creation.

Code
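A minimal sketch of this mapping, assuming the cluster labels have already been computed for each unique picking location. The column names (OrderNumber, x, y, ClusterID) and the function name are hypothetical.

```python
import pandas as pd


def map_cluster_ids(df_single: pd.DataFrame, df_clusters: pd.DataFrame) -> pd.DataFrame:
    """Attach a ClusterID to each single-line order using its (x, y) location."""
    # Left join keeps every order line, in its original order
    return df_single.merge(df_clusters, on=["x", "y"], how="left")


# Single-line orders with their picking location coordinates
df_single = pd.DataFrame(
    {"OrderNumber": [101, 102, 103], "x": [0, 0, 20], "y": [1, 3, 1]}
)
# One row per unique location with the label returned by the clustering step
df_clusters = pd.DataFrame({"x": [0, 0, 20], "y": [1, 3, 1], "ClusterID": [1, 1, 2]})

mapped = map_cluster_ids(df_single, df_clusters)
print(mapped[["OrderNumber", "ClusterID"]])
```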

Challenge 2: Picking locations clustering for Multi-line Orders

Unlike single-line orders, multi-line orders can cover several picking locations.

However, we can apply the same methodology to the centroids of storage locations.

Example: Order with three lines covering three different picking locations

A warehouse layout with aisles labeled A01 to A19 for an order with three different picking locations: points labeled 𝑖 ( 𝑥 𝑖 , 𝑦 𝑖 ) i(x i ​ ,y i ​ ) in black, 𝑝 ( 𝑥 𝑝 , 𝑦 𝑝 ) p(x p ​ ,y p ​ ) in blue, and 𝑙 ( 𝑥 𝑙 , 𝑦 𝑙 ) l(x l ​ ,y l ​ ) in red. The centroid of these locations is labeled 𝑏 ( 𝑥 𝑏 , 𝑦 𝑏 ) b(x b ​ ,y b ​ ) at the center of the triangle. The image illustrates how to group multi-line orders by calculating the centroid of the picking locations to optimize picking
(7) Centroid of three Picking Locations — (Image by Author)

Code
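A minimal sketch of the centroid calculation with pandas. The column names (OrderNumber, x, y) and the function name are assumptions; the centroid is simply the mean of the coordinates of all lines in the order.

```python
import pandas as pd


def order_centroids(df_orderlines: pd.DataFrame) -> pd.DataFrame:
    """One row per order with the centroid (mean x, mean y) of its picking locations."""
    return df_orderlines.groupby("OrderNumber", as_index=False)[["x", "y"]].mean()


df_lines = pd.DataFrame(
    {
        "OrderNumber": [1, 1, 1, 2],
        "x": [0.0, 6.0, 3.0, 4.0],
        "y": [0.0, 0.0, 9.0, 2.0],
    }
)
centroids = order_centroids(df_lines)
print(centroids)
# Order 1 (three locations) -> (3.0, 3.0); order 2 (one location) -> (4.0, 2.0)
```

Note that a centroid does not have to match an actual storage location; it is only used as a single representative point for the order.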

After using this function, we return to the single-line order situation, with a single point (x, y) per order.

We can then apply clustering to these points, trying to group orders per geographical zone with maximum distance conditions.

How can we guide operators along their picking route?

To help operators find their way, your operations can use voice-picking.


Model Simulation

To sum up our model construction, see the chart below.

Flowchart illustrating the steps involved in wave processing for warehouse picking productivity improvement using spatial clustering with Python. The diagram shows sequential steps from centroid calculation to order wave processing with parameters, such as clustering and concentration strategies, aimed at optimizing picking routes.
(8) Model Construction with Parameters — (Image by Author)

Several steps are needed before picking routes can be created using Wave Processing.

At each step, we have a collection of parameters that can be tuned to improve performance.

Comparing three methods of Wave Processing

Comparative diagram showing the three different methods of wave processing for warehouse optimization. The flowchart outlines two distinct strategies, one highlighted in blue and the other in red, with their respective decision points for evaluating picking productivity and wave processing performance.
(9) Three Methods for Wave Processing — (Image by Author)

We’ll first assess the impact of Order Wave processing by clusters of picking locations on total walking distance.

Scenario for Simulation

  • Order lines: 20,000 Lines
  • Distance Threshold: Maximum distance between two picking locations (distance_threshold = 35 m)
  • Orders per Wave: orders_number in [1, 9]
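As an illustration of how orders_number interacts with clustering, here is a minimal sketch of wave creation. It assumes each order already carries a ClusterID and that a wave never mixes orders from different clusters; both the column names and this rule are assumptions for illustration.

```python
import pandas as pd


def assign_waves(df_orders: pd.DataFrame, orders_number: int) -> pd.DataFrame:
    """Group orders of the same cluster into waves of at most orders_number orders."""
    out = df_orders.sort_values(["ClusterID", "OrderNumber"]).reset_index(drop=True)
    # Position of each order inside its cluster, split into chunks of orders_number
    out["_chunk"] = out.groupby("ClusterID").cumcount() // orders_number
    # Each (cluster, chunk) pair becomes one wave
    out["WaveID"] = out.groupby(["ClusterID", "_chunk"]).ngroup()
    return out.drop(columns="_chunk")


df_orders = pd.DataFrame(
    {"OrderNumber": [1, 2, 3, 4, 5], "ClusterID": [2, 1, 1, 1, 2]}
)
waves = assign_waves(df_orders, orders_number=2)
print(waves)
# Cluster 1 (orders 2, 3, 4) -> waves 0 and 1; cluster 2 (orders 1, 5) -> wave 2
```

With this rule, small clusters produce incomplete waves, so the number of orders per wave is an upper bound rather than a guarantee.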

We will be testing this dataset using three different methods.

  • Method 1: we do not apply clustering
  • Method 2: we apply clustering on single-line orders only
  • Method 3: we apply clustering to single-line orders and centroids of multiline orders.
A bar chart of the walking distance for picking routes with  different distance methods applied to 20,000 order lines, with a 35-meter distance threshold using Python. The x-axis is the number of orders per wave from 1 to 9. The y-axis shows the total walking distance in meters. Method 1, shows the highest walking distance without clustering. Method 2, applies clustering to single-line orders only, reducing the walking distance. Method 3, in blue, shows the lowest total walking distance.
(10) Test 1: 20,000 Order Lines / 35 m distance Threshold — (Image by Author)

Results

  • Best Performance: Method 3 for nine orders/Wave with 83% reduction of walking distance
  • Method 2 vs. Method 1: Clustering for single-line orders reduces the walking distance by 34%
  • Method 3 vs. Method 2: Adding clustering for multi-line orders (using centroids) reduces the walking distance by 10%

Tuning Distance Threshold for Clustering

We validated our first assumption that Method 3 is the best for our particular scenario (20,000 order lines, 35 m Distance Threshold).

Let us look at the Distance Threshold impact on total walking distance.

This visual shows a warehouse layout with aisles labeled A01 to A19, containing rows of storage units. The black dots represent individual picking locations, each marked with coordinates (xi, yi). The picking locations are grouped into clusters (surrounded by dashed red rectangles), with clusters formed based on a specified distance threshold using Python Scipy. These clusters reduce the total walking distance for the picker by ensuring that items within close proximity are picked together.
(10) Different distance threshold for Picking Location Clustering — (Image by Author)

There is a trade-off between the walking distance between two locations and the wave size:

  • Low Distance: The walking distance between two locations is low, but you have fewer orders per wave (more waves)
  • High Distance: The walking distance between two locations is higher, but you have more orders per wave (fewer waves)
A bar chart illustrating the route distance for 5,000 order lines with 9 orders per wave across various distance thresholds, ranging from 1 to 95 meters. The walking distance decreases as the threshold increases, with the lowest distance around 60 meters, showing an optimal reduction in route distance with the warehouse picking route optimization algorithm designed in Python.
(11) Results for 5,000 lines grouped in Waves of 9 orders with Distance Threshold in [1, 95] (m) — (Image by Author)

We can find a local minimum for Distance_Threshold = 60 m, which reduces the distance by 39% compared to Distance_Threshold = 1 m.

A bar chart depicting the route distance for 20,000 order lines with 9 orders per wave across distance thresholds from 1 to 95 meters. The walking distance steadily decreases, reaching its lowest around 50 meters, signifying an optimal balance between wave size and walking distance using the warehouse picking route optimization algorithm.
(11) Results for 20,000 lines grouped in Waves of 9 orders with Distance Threshold in [1, 95] (m) — (Image by Author)

We can find a local minimum for Distance_Threshold = 50 m, which reduces the distance by 27% compared to Distance_Threshold = 1 m.
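The clustering side of this trade-off can be sketched as follows. The example only counts how many clusters are produced for several thresholds on randomly generated locations; it does not reproduce the simulation results above. The layout (10 aisles spaced 5 m apart, cross-aisles at y = 0 and y = 50), the walking_distance function, and the helper name are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, ward
from scipy.spatial.distance import pdist


def walking_distance(a, b, y_low=0.0, y_high=50.0):
    """Hypothetical walking distance: aisles along y, cross-aisles at y_low/y_high."""
    (x1, y1), (x2, y2) = a, b
    if x1 == x2:
        return abs(y2 - y1)
    via_top = (y_high - y1) + (y_high - y2)
    via_bottom = (y1 - y_low) + (y2 - y_low)
    return abs(x2 - x1) + min(via_top, via_bottom)


def clusters_per_threshold(coords, thresholds):
    """Number of clusters obtained for each distance threshold."""
    # The linkage is computed once and cut at several thresholds
    Z = ward(pdist(np.asarray(coords, dtype=float), metric=walking_distance))
    return {t: int(fcluster(Z, t=t, criterion="distance").max()) for t in thresholds}


# 60 random picking locations (synthetic data for illustration only)
rng = np.random.default_rng(0)
coords = np.column_stack([rng.integers(0, 10, 60) * 5.0, rng.uniform(0, 50, 60)])

counts = clusters_per_threshold(coords, thresholds=[1, 20, 60, 95])
for threshold, n_clusters in counts.items():
    print(f"threshold = {threshold} m -> {n_clusters} clusters")
```

A low threshold gives many small clusters (short hops but incomplete waves), while a high threshold gives a few large clusters (full waves but longer hops), which is why the total distance has a minimum somewhere in between.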

Next Steps

Improve the pathfinding methodology

The good news is that this solution can be improved even more.

The next closest location strategy has limits that picking route records easily expose.

Two route optimization solutions for a picking path in a warehouse. The top graph is labeled “OR-Tool TSP Optimization Solution Route” and displays a more efficient picking route where points A, B, C, and D are visited in a near-optimal path with minimized overlaps. The bottom graph is labeled “Next Closest Location Solution Route” and shows a less efficient picking route with several crossing lines and longer paths between points A, B, C, and D. This visual comparison highlights differences.
Example of the limits of the Next Closest Location Path Finding — (Image by Author)

In the example above, we can see that the operator has to cover the same area several times along the picking route.

This is not efficient!

Therefore, I have implemented a pathfinding algorithm presented in the article linked below.

How to measure inefficiencies and detect failures?

Advanced Diagnostic using Process Mining

Process mining is a type of data analytics that focuses on discovering, monitoring, and improving business processes.

A diagram illustrating the components of a supply chain management system, including various icons representing analytics, transportation, warehouses, software configurations, and operational settings.
Supply Chain Systems — (Image by Author)

This involves analyzing data from various sources, such as process logs, to understand how a process is being executed, identify bottlenecks and inefficiencies, and suggest ways to improve it.

A series of four line graphs showing data on various logistical lead times and delivery performance metrics across different scenarios. The top graph represents loading lead time, the second graph shows airport lead time, the third displays clearance lead time, and the final two graphs indicate delivery on-time performance and total delivery lead time.
Example of Process Mining for Distribution Process — (Image by Author)

Your Warehouse Management System (WMS) will record every step of the picking process.

  1. Start of the wave by the operator
  2. The first item picked, the second item picked …
  3. The last item picked
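As a minimal sketch of what can be done with such records, the example below computes the time elapsed between consecutive events of a wave. The log structure (WaveID, Event, Timestamp) and the values are hypothetical; an actual WMS export will have its own schema.

```python
import pandas as pd

# Hypothetical extract of a WMS picking log for one wave
log = pd.DataFrame(
    {
        "WaveID": [1, 1, 1, 1],
        "Event": ["wave_start", "pick", "pick", "pick"],
        "Timestamp": pd.to_datetime(
            [
                "2020-08-11 08:00:00",
                "2020-08-11 08:01:30",
                "2020-08-11 08:02:00",
                "2020-08-11 08:06:00",
            ]
        ),
    }
)


def seconds_between_events(df_log: pd.DataFrame) -> pd.DataFrame:
    """Add the time (in seconds) elapsed since the previous event of the same wave."""
    out = df_log.sort_values(["WaveID", "Timestamp"]).copy()
    out["SecondsSincePrevious"] = (
        out.groupby("WaveID")["Timestamp"].diff().dt.total_seconds()
    )
    return out


timed = seconds_between_events(log)
print(timed[["Event", "SecondsSincePrevious"]])
# The last pick took 240 s, far longer than the others: a candidate bottleneck
```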

A solution using process mining can help automate the diagnosis of productivity issues by targeting bottlenecks.


About Me

Let’s connect on Linkedin and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, look at my website.

💌 New articles straight in your inbox for free: Newsletter
📘 Your complete guide for Supply Chain Analytics: Analytics Cheat Sheet


Written by Samir Saci

Top Supply Chain Analytics Writer — Case studies using Data Science for Supply Chain Sustainability 🌳 and Productivity: https://bit.ly/supply-chain-cheat