Smart Automation: Bin Packing with Reinforcement Learning

Divya Chandana · Published in The AI Guide · 6 min read · Mar 28, 2022

This is not about Kubernetes' automatic bin packing, which places containers onto nodes based on their required resources while ensuring availability.

3D-BPP practical application in Logistics[4]

Main Purpose

Automation improves accuracy, efficiency, productivity, and material utilization, and it can operate 24x7. In the logistics industry, the ever-growing flow of packages puts a strain on producers and distributors; to keep up with increasing competition, companies need to automate the process as much as possible to maintain customer satisfaction.

What is 3D Bin Packing?

A ton of problems in large-scale industries can be solved using deep learning algorithms. One such problem, which we will discuss in detail, is the 3D Bin Packing Problem (3D-BPP), where the packages are generally cuboid-shaped and we want to pack them all into rectangular bins of the same size. The cost of inventory wrapping, transportation, and warehousing can be effectively reduced by maximizing the storage use of bins. 3D-BPP is a complex and under-explored area, and we still have to rely on heuristic techniques.

Objective

The objective is to place items so that they minimize the surface area of the larger bin. Along with minimizing the spare space in the bin, a major objective is placing the items in a balanced way while piling up the packages.
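
As a deliberately simplified illustration, here is a minimal Python sketch that scores a packing by its space utilization, the complement of the spare space we want to minimize; the item and bin dimensions are hypothetical.

```python
# Minimal sketch: scoring a packing by space utilization.
# Items and the bin are axis-aligned cuboids given as (width, depth, height);
# all dimensions below are hypothetical.

def utilization(bin_dims, placed_items):
    """Fraction of the bin volume occupied by the placed items."""
    bin_volume = bin_dims[0] * bin_dims[1] * bin_dims[2]
    used = sum(w * d * h for (w, d, h) in placed_items)
    return used / bin_volume

bin_dims = (10, 10, 10)                     # one bin of fixed size
items = [(5, 5, 5), (5, 5, 5), (10, 5, 5)]  # items already placed
print(f"utilization: {utilization(bin_dims, items):.0%}")  # 50%
```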

The 3D bin packing method divides a set of packages into several bins of varying sizes to maximize a specified objective function, namely reducing the number of bins needed to pack items of varying sizes. The setup includes an automatic robotic arm, a conveyor belt, and a sensor. The packages must be packed into a single bin, and they must be packed immediately, without any buffering time or readjusting the packages already in the bin. The catch is that we won't have the information for all of the items at once; instead, we learn it as we go and pack accordingly. The robot can only see a few upcoming items, and each item must be packed within a certain amount of time after it arrives. We must weigh both cost and efficiency here; if the robot frequently unloads and reassembles the packages, it is inefficient. Deep reinforcement learning with an actor-critic architecture is the simplest way to approach this challenge.
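
To make this online constraint concrete, here is a toy Python loop in which the agent observes only the next few items on the conveyor and must commit to the front item immediately; the item sizes, the lookahead window, and the `policy` placeholder are all made up for illustration.

```python
from collections import deque
import random

# Toy online-packing loop: the agent sees only the next LOOKAHEAD items
# and must place the front item immediately, with no buffering or re-packing.
# Item sizes and the lookahead size are hypothetical.
LOOKAHEAD = 3

rng = random.Random(0)
items = [(rng.randint(1, 5), rng.randint(1, 5), rng.randint(1, 5))
         for _ in range(10)]

conveyor = deque(items)
while conveyor:
    visible = list(conveyor)[:LOOKAHEAD]   # all the agent is allowed to see
    current = conveyor.popleft()           # must be placed now
    # placement = policy(visible, bin_state)   <- the DRL decision point
    print(f"place {current}; upcoming: {visible[1:]}")
```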

Bin packing problems are extremely difficult to solve in practice because they are strongly NP-hard[2]. As a result, heuristics are frequently used as an alternative to exact algorithms, producing a plausible answer in a reasonable amount of time even if it is not guaranteed to be optimal.

Whoa!! Let’s step back, that’s a lot of talk.

The classic 1D bin packing problem is NP-hard: a set of items of various given weights must be packed into the smallest number of bins of identical capacity, with the total weight of the items placed in a bin not exceeding the bin's capacity. Known heuristic strategies[1] for handling this problem include best-fit decreasing and first-fit decreasing[4]. Large 1D-BPP instances can be solved in minutes using integer linear programming, and good approximations can be obtained in milliseconds, thanks to state-of-the-art computing hardware.
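
For intuition, here is a minimal Python sketch of the first-fit decreasing heuristic mentioned above; the weights and bin capacity are made-up examples.

```python
# First-fit decreasing (FFD) for 1D bin packing: sort items by weight
# (descending), then put each item into the first bin with room left,
# opening a new bin when none fits.

def first_fit_decreasing(weights, capacity):
    bins = []  # each bin is a list of item weights
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            bins.append([w])  # no existing bin fits: open a new one
    return bins

# Hypothetical example: seven items, bins of capacity 10.
print(first_fit_decreasing([2, 5, 4, 7, 1, 3, 8], capacity=10))
# [[8, 2], [7, 3], [5, 4, 1]] -> 3 bins
```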

To tackle 3D-BPP, a deep reinforcement learning algorithm has been proposed. There are several restrictions: packages cannot be stacked on the conveyor and must arrive in the correct order, and once an item arrives it is instantly packed, with no changes allowed afterward. The problem is defined as a Constrained Markov Decision Process (CMDP), and a constrained DRL solution based on the actor-critic framework is presented. As an auxiliary task in this prediction-and-projection scheme for constrained DRL training, the agent predicts a feasibility mask for placement actions, and the actor's action probabilities are modulated using this mask.
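
To make the masking idea concrete, here is a minimal NumPy sketch (not the paper's exact network) of how a predicted feasibility mask can modulate the actor's action probabilities; the logits and mask values are hypothetical.

```python
import numpy as np

# Minimal sketch: modulating actor logits with a feasibility mask.
# 'logits' come from the policy head; 'mask' marks feasible placements (1)
# versus infeasible ones (0). Shapes and values are hypothetical.

def masked_policy(logits, mask):
    """Softmax over logits, with infeasible actions driven to ~zero probability."""
    masked_logits = np.where(mask.astype(bool), logits, -1e9)
    exp = np.exp(masked_logits - masked_logits.max())
    return exp / exp.sum()

logits = np.array([1.2, 0.3, -0.5, 2.0])  # one logit per candidate placement
mask = np.array([1, 0, 1, 1])             # placement 1 would collide or overflow
print(masked_policy(logits, mask))        # infeasible action gets ~0 probability
```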

Neural Network Architecture

How Stuff Works

Various 1D-BPP variants exist, such as the cutting stock problem (CSP), in which we cut stock material (the bins) to produce desired items of various sizes while minimizing the total number of bins used. These are NP-hard problems, and existing research focuses on developing good heuristic and approximation methods, as well as analyzing their worst-case performance.

The fundamental distinction between 1D and 2D/3D packing problems is the verification of packing feasibility: assessing whether objects can be accommodated inside the bin without interpenetrating one another and without exceeding the bin's dimensions.
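
One common way to implement such a feasibility check for axis-aligned boxes is a heightmap over the bin floor. The sketch below is a toy under stated assumptions (the grid resolution, the flat-full-support rule, and all dimensions are illustrative), not the method from the paper.

```python
import numpy as np

# Minimal sketch: heightmap-based feasibility test for axis-aligned boxes.
# The bin floor is a grid; heightmap[x, y] is the current stack height there.
# The flat-full-support rule and all dimensions are illustrative assumptions.

def can_place(heightmap, x, y, w, d, h, bin_height):
    """Can a w x d x h box be dropped with its corner at grid cell (x, y)?"""
    if x + w > heightmap.shape[0] or y + d > heightmap.shape[1]:
        return False                        # footprint sticks out of the bin
    support = heightmap[x:x + w, y:y + d]
    if support.max() + h > bin_height:
        return False                        # box would overflow the bin top
    return bool((support == support.max()).all())  # require flat, full support

heightmap = np.zeros((10, 10), dtype=int)   # empty 10 x 10 bin floor
print(can_place(heightmap, 0, 0, 4, 4, 5, bin_height=10))  # True
heightmap[0:4, 0:4] = 5                     # commit that first box
print(can_place(heightmap, 2, 2, 4, 4, 4, bin_height=10))  # False: uneven support
```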

Upcoming bin in green [2]

Algorithms

In practice, exactly solving 3D-BPP at the scale of an actual parcel-packing pipeline (thousands of packages) is infeasible, so approximation techniques are the more practical choice. Heuristic local search iteratively improves an existing packing by searching within a neighborhood function over the set of solutions. Numerous fast approximate methods exist, such as guided local search, tabu search, and randomized search, while genetic algorithms tend to find better global solutions.
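
As a toy illustration of heuristic local search, the sketch below uses a single-item relocation neighborhood for 1D packing; real tabu or guided variants add memory structures and penalties on top of this idea.

```python
import random

# Toy local search for 1D bin packing. The neighborhood move relocates one
# item into another bin with room; any feasible move is kept, and a bin that
# empties is dropped, so the bin count never increases. Simplified on purpose.

def local_search(bins, capacity, iters=1000, seed=0):
    rng = random.Random(seed)
    bins = [list(b) for b in bins if b]
    for _ in range(iters):
        if len(bins) < 2:
            break
        src, dst = rng.sample(range(len(bins)), 2)
        item = rng.choice(bins[src])
        if sum(bins[dst]) + item <= capacity:
            bins[src].remove(item)
            bins[dst].append(item)
            bins = [b for b in bins if b]   # drop emptied bins
    return bins

start = [[6], [4], [5], [3], [2]]           # wasteful initial packing
print(local_search(start, capacity=10))     # e.g. [[6, 4], [5, 3, 2]]
```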

Deep Reinforcement Learning (DRL) has shown significant success in acquiring complex behavior skills and performing challenging control tasks with high-dimensional raw sensory state spaces. Both on-policy and off-policy approaches are used, with on-policy algorithms optimizing the policy using agent-environment interaction data sampled from the current policy; off-policy approaches, on the other hand, are more data-efficient but less stable. Because robot-environment interaction data is easy to come by here, stability matters more than data efficiency, so formulating 3D-BPP as constrained DRL and solving it by projecting the trajectories collected from actors onto the constrained state-action space fits well in an on-policy actor-critic framework.
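
To sketch the on-policy actor-critic quantities (not the paper's exact architecture or losses), here is a minimal NumPy computation of the policy and value losses from one sampled trajectory, assuming the feasibility mask has already been folded into the action probabilities.

```python
import numpy as np

# Minimal sketch of on-policy actor-critic quantities for one trajectory.
# probs[t] is the (mask-modulated) probability of the action taken at step t,
# values[t] the critic's estimate, rewards[t] the reward. All numbers are
# hypothetical; a real implementation backpropagates through the networks.

def actor_critic_losses(probs, values, rewards, gamma=0.99):
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):  # discounted return-to-go
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = returns - values            # how much better than expected
    actor_loss = -(np.log(probs) * advantages).mean()  # policy-gradient loss
    critic_loss = (advantages ** 2).mean()             # value-regression loss
    return actor_loss, critic_loss

probs = np.array([0.6, 0.4, 0.8])
values = np.array([1.0, 0.7, 0.3])
rewards = np.array([0.2, 0.1, 1.0])
print(actor_critic_losses(probs, values, rewards))
```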

Future Scope

With this supervision and projection, the method can be extended to handle lookahead items, multi-bin packing, and item re-orientation, allowing the agent to learn a feasible policy very efficiently.

Limitations

Oops! What about convex items? What about the stability of the objects?
Curious? We'll continue our learning journey in the next episode.

References

[1] https://www.researchgate.net/publication/314657085_A_column_generation-based_heuristic_for_the_three-dimensional_bin_packing_problem_with_rotation
