Neural Combinatorial Optimization with Reinforcement Learning (4-Example: BinPacking)

During the previous parts of these series we have gathered the necessary experience to build our first complete optimization model. The problem here presented is a Bin Packing problem.

In the bin packing problem, objects of different volumes must be packed into a finite number of bins or containers each of volume V in a way that minimizes the number of bins used. In computational complexity theory, it is a combinatorial NP-hard problem. (Wikipedia)

In this example, there are different types of packages, each of them having a particular size (i.e. pkt0 has a size of 3 slots). The goal of this optimization problem is to determine the bin in which each packet must be placed in order to minimize the number of total bins used.

Bin Packing problem using Reinforcement Learning

For that purpose, an agent must be able to match each sequence of packets (e.g. service [1,0,0,5,4]) to a sequence indicating the bin in which those packets occupy the minimum number of them (e.g. placement [0,0,1,1,1]). Therefore, this agent must follow one of the sequence model architectures seen in part 3.

For the agent, the environment is a black box. At first, the placement sequences computed are going to be random. Thanks to the rewards that it is going to obtain from the environment, neurons are going to be trained to achieve better rewards.

The interface between agent - environment is quite narrow. The agent receives a state vector, representing a sequence of packets to be placed. It computes the placement vector and acts on the environment. The only feedback it receives for that action is a reward.

Dimensionality of the problem

The number of dimensions in the problem is equal to the maximum sequence length. Note that the dimensions of the action space and the dimensions of the action space are equal, the length of sequence with more packages to be placed.

In the state space, each dimension can take discrete values corresponding to the packet. The same in the action space, where each dimension can take discrete values representing the bin.

In statistics, the number of these sequences of states and actions are about permutations with repetition. The number of permutations in both state and action spaces are exponential to the power of the dimensionality of the problem.

The essence of the problem is to find for each state (service sequence) the corresponding action (placement sequence) that maximizes the reward. With the complexity that the number of states and actions is exponential to the dimensionality of the problem.

The number of permutations in the state and action space can be calculated as:

Therefore, the number of all permutations in the problem:

To visualize the complexity of the problem, let’s set a specific service sequence. The number of all the possible placement permutations for that service can be calculated with the formula above. As they will belong to a high dimensional space, to visualize it a dimensionality a reduction technique as t-SNE shall be used.

t-SNE action space for a single state

As depicted, the action space is extremely discontinuous. Placements vectors that are quite similar, and threfore are close on the t-SNE map, can have completely different rewards. All methods based on the exploration are non-efficient on this environment. The aproach of Neural Combinatorial Optimization is to build an agent that embed the information of the environment in such way that states (service sequences) for which the agent has not been trained could point to optimal placements.

The agent

As the size of state and action sequences is the same, it is not mandatory to use a sequence to sequence model to build the agent.

In the code linked below, the solution is based on a multi-stacked LSTM cells. This structure picks each element on the service sequence and place it just remembering the items already located in the environment. This AI is performed to behave like a first-fit algorithm.

In case we want the agent to perform actions bearing in mind the whole sequence, a bidirectional RNN or a sequence to sequence model could be used.

Code for Bin Packing problem using Neural Combinatorial Optimization is available on GitHub!