TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Deep Reinforcement Learning for AGV Routing

6 min read · Jun 9, 2021


Example of AGV setup with 8 workstations — (Image by Author)

In a Distribution Center (DC), walking from one location to another during the picking route can account for 60% to 70% of the operator’s working time. Reducing this walking time is the most effective way to increase overall productivity.

I have shared several methods using optimization strategies to reduce the walking distance of operators in your warehouse.

These methods are limited when there is a large picking area.

Therefore, automated solutions, such as automated guided vehicles (AGVs) that bring the shelves directly to the operators, are now very popular.

This article will explain how Reinforcement Learning can be used to organize the routing of these robots to ensure optimal productivity.

📫 For business inquiries, contact me: Samir Saci

From Man-to-Goods to Goods-to-Man

E-commerce companies were early adopters of this shift from a manual operation (man-to-goods) to goods-to-man.

Their volumes fluctuate significantly (promotions, festivals), they handle a broad range of references, and they face labour shortages.

Automation is therefore a must for them.

Goods-to-person picking using Automated Guided Vehicles

Goods-to-person picking solutions deliver items directly to your operators at their pick stations. This eliminates the non-value-added time needed for operators to search for items.

Example of Shelves on Vehicle — (Image by Author)

The goods are stored on shelves that can be moved by these vehicles directly to picking stations, where the operators take the necessary quantity to prepare their orders.

Example of a Pick Station — (Image by Author)

AGV Installation Layout

Example of AGV Setup — (Image by Author)

In this layout, you have:

  • 8 Picking Stations grouped by two with 1 Operator per Station
  • 16 (8 x 2) alleys of shelves
  • 1 charging station for the vehicles

🏫 Discover 70+ case studies using data analytics for supply chain sustainability🌳and business optimization 🏪 in this: Cheat Sheet

Build your Optimization Model

Create a Topological Map of your AGV Layout

Example of Topological Map — (Image by Author)

Our layout is modelled by a graph G(N, E):

  • N is the set of nodes (circles above)
  • E is the set of edges (solid lines and arrows)
  • S represents the shelves (filled grey nodes indicate the places where shelves are stored)
  • R represents the points where AGVs rotate shelves
  • W represents the waiting points, where an AGV carrying a shelf waits for the AGV ahead of it to finish its picking activity at the station
  • P represents the picking points, where the picker takes the products

This mapping will be included in an AGV Picking Simulation Model that will be used to test our routing strategies.
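Before feeding it to a simulator, the topological map can be encoded as a plain directed graph. Here is a minimal sketch in Python; the node names, roles, and distances are illustrative placeholders, not the actual layout:

```python
# Minimal sketch of a topological map for an AGV layout.
# Node names, roles, and distances are illustrative assumptions.

# Each node is tagged with its role: shelf slot (S), rotation point (R),
# waiting point (W), or picking point (P).
nodes = {
    "S1": "shelf", "S2": "shelf",
    "R1": "rotation",
    "W1": "waiting",
    "P1": "picking",
}

# Directed edges with distances in meters (the base weights for routing).
edges = {
    ("S1", "R1"): 4.0,
    ("S2", "R1"): 6.0,
    ("R1", "W1"): 3.0,
    ("W1", "P1"): 1.5,
}

# Adjacency-list form of G(N, E), convenient for pathfinding.
graph = {}
for (u, v), d in edges.items():
    graph.setdefault(u, []).append((v, d))

print(graph["R1"])  # [('W1', 3.0)]
```

The adjacency list built here is the input format used by shortest-path routines such as Dijkstra's algorithm, discussed next.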

Pathfinding using Dijkstra's Algorithm

Dijkstra’s algorithm is an optimization algorithm that solves the single-source shortest path problem for a directed graph with weighted edges (non-negative weights).

Example of Dijkstra Graph — (Image by Author)

The weight can be the absolute length of the path, or it can be computed considering other constraints on the edges or the nodes.

We can use three types of weight from node u to node v, noted w(u, v):

  • Shortest Distance Route Weight: w(u, v) = d(u, v) (1)
    with d(u, v) the distance between u and v
    - Objective: take the route with the shortest distance
  • Shortest Travel Time: w(u, v) = d(u, v)/s(u, v) + r(u, v) (2)
    with s(u, v) the AGV translational speed and r(u, v) the time needed for all rotations
    - Objective: take the route with the shortest travel time
  • Congestion Avoidance: w(u, v) = d(u, v)/s(u, v) + r(u, v) + C·o(u, v) (3)
    with o(u, v) the number of AGVs planned to pass through the edge and C a constant for adjusting the weight
    - Objective: take the route that avoids congestion with other AGVs

Reinforcement learning approach

At a time t, we define the state of the warehouse by:

  • Spatial locations of all the active vehicles (AGV that have routes assigned)
  • Spatial locations of all the active shelves (shelves that have items to be picked)
  • Workstations Order Lines Allocations (stations where items need to be transferred)

These parameters vary over time; therefore, we use a reinforcement learning approach to select the optimal route among these candidates based on the current state.
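To make this concrete, the warehouse state above can be flattened into a fixed-length feature vector for the learning agent. This is only an illustrative encoding under assumed names and sizes, not the representation used in the referenced work:

```python
from dataclasses import dataclass

@dataclass
class WarehouseState:
    agv_positions: dict   # AGV id -> current node (active vehicles only)
    active_shelves: set   # shelf ids with items left to pick
    station_orders: dict  # workstation id -> remaining order lines

    def encode(self, nodes, n_agvs, n_shelves, n_stations):
        """Flatten the state into a fixed-length feature vector."""
        vec = []
        # One slot per AGV: the index of its current node, -1 if inactive.
        for i in range(n_agvs):
            node = self.agv_positions.get(i)
            vec.append(nodes.index(node) if node in nodes else -1)
        # One binary flag per shelf: does it still hold items to pick?
        vec += [1 if s in self.active_shelves else 0 for s in range(n_shelves)]
        # Remaining order lines allocated to each workstation.
        vec += [self.station_orders.get(w, 0) for w in range(n_stations)]
        return vec

nodes = ["S1", "R1", "W1", "P1"]
state = WarehouseState({0: "R1"}, {1}, {0: 3})
print(state.encode(nodes, n_agvs=2, n_shelves=2, n_stations=2))  # [1, -1, 0, 1, 3, 0]
```

Any encoding works as long as the same state always maps to the same vector, so the agent can relate states it has already seen to the routes it chose there.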

Agent Rewarding strategies

Your learning agent is rewarded for arriving at a destination node using three reward value approaches.

  • Productivity: the number of items picked per labour hour from the AGV starting at an origin point to arriving at a destination.
  • Idle time: time that a picker waits for the next AGV after picking items from a shelf with an AGV.
  • Speed: average speed of an AGV from an origin point to a destination
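A tabular Q-learning agent choosing among the three candidate routes could be sketched as follows. The state keys, learning rates, and reward values are illustrative assumptions; the referenced papers use deep reinforcement learning rather than this simplified tabular form:

```python
import random
from collections import defaultdict

ROUTES = ["shortest_distance", "shortest_time", "congestion_avoidance"]

class RoutingAgent:
    """Epsilon-greedy tabular Q-learning over the three candidate routes."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (state, route) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:      # explore
            return random.choice(ROUTES)
        return max(ROUTES, key=lambda r: self.q[(state, r)])  # exploit

    def update(self, state, route, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, r)] for r in ROUTES)
        target = reward + self.gamma * best_next
        self.q[(state, route)] += self.alpha * (target - self.q[(state, route)])

# Deterministic demo with a speed reward: the agent learns to prefer
# the route that yielded the higher average AGV speed (assumed values).
agent = RoutingAgent(alpha=0.5, gamma=0.0, epsilon=0.0)
agent.update("8_stations_active", "shortest_time", 1.4, "8_stations_active")
agent.update("8_stations_active", "shortest_distance", 1.0, "8_stations_active")
print(agent.choose("8_stations_active"))  # shortest_time
```

Swapping the reward signal (productivity, idle time, or speed) changes nothing in the agent itself, which is what makes comparing the three reward strategies straightforward.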
Process — (Image by Author)

Simulation

Scenario

This first simulation is based on three days of picking: day 1 for training and days 2 and 3 for testing.

The first simulation scenario — (Image by Author)

The results of the RL model will be compared with two simple route-planning strategies.

  • Random: randomly select a route among the shortest distance route, the shortest travel time route, and the congestion avoidance route
  • Congestion: choose the congestion avoidance route at all times
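The two baselines are trivial to implement, which is what makes them useful yardsticks; a quick sketch (function names are mine, not from the paper):

```python
import random

ROUTES = ["shortest_distance", "shortest_time", "congestion_avoidance"]

def random_strategy(rng):
    """Baseline 1: pick one of the three candidate routes at random."""
    return rng.choice(ROUTES)

def congestion_strategy(rng):
    """Baseline 2: always pick the congestion avoidance route."""
    return "congestion_avoidance"

rng = random.Random(42)
picks = [random_strategy(rng) for _ in range(1000)]
# The random baseline spreads its choices roughly evenly over the routes.
print({r: picks.count(r) for r in ROUTES})
```

Unlike the RL agent, neither baseline looks at the warehouse state, so any gap between them and the learned policy measures the value of state-dependent routing.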

Results

Results for each strategy — (Image by Author)

Surprisingly, the productivity reward performs worse than the speed reward approach.

Maximizing each AGV's productivity may not be the best approach for collaborative work between vehicles to ensure high global productivity.

When congestion is the main bottleneck (i.e., when a high density of vehicles is running simultaneously), the congestion strategy performs well while requiring fewer computing resources compared to the RL approach.

Next Steps

These results are based on a specific layout with only two days of test picking activity.

To better understand this approach, I will explain how to build an AGV picking simulator and implement routing strategies in the next article.

This model should be tested on a variety of order profiles to measure the impact on productivity when tuning:

  • Number of lines per order (moves per order)
  • Quantity of units picked per line
  • Range of active SKUs
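These three parameters can be turned into synthetic order profiles for stress-testing the simulator. A possible sketch, using uniform distributions purely for illustration (a real profile would be fitted to historical order lines):

```python
import random

def generate_orders(n_orders, lines_per_order, units_per_line, n_skus, seed=42):
    """Generate a synthetic order profile for simulator stress tests.
    lines_per_order and units_per_line are inclusive (min, max) ranges."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        n_lines = rng.randint(*lines_per_order)   # moves per order
        orders.append([
            {"sku": rng.randrange(n_skus),        # drawn from the active SKU range
             "qty": rng.randint(*units_per_line)}  # units picked per line
            for _ in range(n_lines)
        ])
    return orders

# A "shopping festival" style profile: many small orders over few hot SKUs.
orders = generate_orders(n_orders=100, lines_per_order=(1, 3),
                         units_per_line=(1, 2), n_skus=50)
print(len(orders), len(orders[0]))
```

Varying `n_skus` and the line/unit ranges then lets you replay the same routing strategies against promotion-like, festival-like, and low-season profiles.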

The choice of strategy may vary if you have a promotional event on a particular group of SKUs, a shopping festival (Black Friday, 11.11), or during the low season.

Go Beyond

For more conventional picking processes, you can find examples of process optimization using advanced analytics tools and process analysis concepts.

About Me

Let’s connect on LinkedIn and Twitter; I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, have a look at my website.

💌 New articles straight in your inbox for free: Newsletter
📘 Your complete guide for Supply Chain Analytics: Analytics Cheat Sheet

