Machine Learning and Combinatorial Optimization Problems

Nathan Perlmutter
Crater Labs
Jul 31, 2019


Perhaps the most ubiquitous of all mathematical topics that appear in industrial applications is combinatorial optimization. Combinatorial optimization is a class of problems that consists of finding an optimal object from a finite set of objects. Famous examples of such problems include the traveling salesman problem (TSP), the capacitated vehicle routing problem (CVRP), the knapsack problem, the maximal flow problem, and any integer programming problem. By definition, the search space for a combinatorial optimization problem is finite, and thus an exact optimal solution always exists. However, for most interesting problems (including those listed above) the size of the search space grows exponentially with the size of the problem’s input, so finding an exact optimal solution is usually computationally infeasible.
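To make the exponential blow-up concrete, here is a minimal sketch of exact TSP solving by brute force: with n nodes there are (n − 1)! tours to check, which is already astronomical for a few dozen nodes. The distance matrix is a made-up toy instance, not data from this article.

```python
# Brute-force exact TSP: enumerate every tour starting at node 0.
# Feasible for tiny n only, since the number of tours is (n-1)!.
import itertools
import math

def tsp_brute_force(dist):
    """Return (best_cost, best_tour) over all tours starting at node 0."""
    n = len(dist)
    best_cost, best_tour = math.inf, None
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

# Toy symmetric 4-node instance (hypothetical distances).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
cost, tour = tsp_brute_force(dist)  # optimal tour of this toy instance
```

For n = 20 the same loop would visit 19! ≈ 1.2 × 10^17 tours, which is exactly why exact search is abandoned in practice.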

Due to the computational infeasibility of finding exact solutions, in practice one has to settle for heuristic algorithms yielding approximately optimal solutions. Heuristics are often fast and effective but, unlike exact methods, they lack any theoretical guarantee of optimality. New heuristic algorithms are designed by humans through trial and error, with very little general theory to guide their development. For this reason there is potentially much to be gained by using artificial intelligence (AI) to automate and systematize the heuristic design process.
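A minimal sketch of what such a hand-designed heuristic looks like: nearest-neighbour construction for the TSP. It runs in polynomial time but carries no optimality guarantee, which is precisely the kind of fixed rule that learned approaches aim to improve on. The toy distance matrix is hypothetical.

```python
# Nearest-neighbour TSP heuristic: from the current node, always move
# to the closest unvisited node. Fast, greedy, no optimality guarantee.
def nearest_neighbour_tour(dist, start=0):
    n = len(dist)
    unvisited = set(range(n)) - {start}
    tour, current = [start], start
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[current][j])
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour

# Toy symmetric 4-node instance (hypothetical distances).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
tour = nearest_neighbour_tour(dist)
```

On adversarial instances the greedy rule can be forced arbitrarily far from the optimum, which is why its output is only "approximately optimal".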

An AI algorithm differs from traditional heuristic approaches in that it has the ability to learn from experience, improving its own performance over time as it gains exposure to more data. For this reason an AI algorithm can be viewed as a meta-heuristic for learning new heuristics. The application of AI to combinatorial optimization problems is currently an active area of research in academia. Some recent influential papers include: 1) Learning combinatorial optimization algorithms over graphs; 2) Reinforcement learning for solving the vehicle routing problem; 3) Attention, learn to solve routing problems. These three papers apply deep reinforcement learning models to generate solutions to the traveling salesman problem and certain versions of the capacitated vehicle routing problem. The above-mentioned papers report results that are competitive, or close to competitive, with current (non-AI) state-of-the-art solvers in some cases. As more research is carried out, the results should continue to improve.

Our work on the vehicle routing problem

At Crater Labs during the past year, we have been pursuing a research program applying ML/AI techniques to combinatorial optimization problems. We have been building on the recent work from the above-mentioned papers to solve more complex (and hence more realistic) versions of the capacitated vehicle routing problem, supply chain optimization problems, and other related optimization problems. Thus far we have been successful in reproducing the results in those papers, and we are now actively working to push their work further to tackle more complex problems. Some of these problems include:

[a] The CVRP with multiple delivery trucks stationed at multiple depots. The delivery orders are equipped with time constraints and the trucks are subject to compatibility constraints on the items that they can carry.

[b] Very large instances of the CVRP consisting of thousands or tens of thousands of nodes as opposed to just hundreds. Indeed, the papers referenced above only consider the CVRP with one hundred nodes or fewer.

[c] The stochastic vehicle routing problem. This is where the delivery orders and/or positions of the nodes change randomly while the delivery truck is making its route. The challenge here is to forecast the future supply and demand of the different nodes on the graph, and then generate routes according to those forecasts.

[d] Supply chain optimization problems. Similar to the vehicle routing problem but where the trucks are making both pickups and deliveries, and the start and end locations of the delivery truck are not necessarily the same.

Below we describe some details of our working models for the vehicle routing problem.

AI Models for the capacitated vehicle routing problem

Our current working model for the vehicle routing problem consists of two main components, an Encoder and a Decoder. The job of the encoder is to receive the CVRP instance (in the form of a graph) and return a vector representation for each of the nodes in the input graph. The decoder receives the output of the encoder as input. Its job is to sequentially return the destinations of the delivery truck’s tour in the order determined by its “learned policy”. The encoder and decoder consist of neural networks with trainable coefficients. The optimal coefficients are learned using policy-gradient reinforcement learning employing the REINFORCE algorithm (see Policy gradient methods for reinforcement learning with function approximation). The model is “modular” in the sense that one could substitute various neural architectures for the encoder and decoder, so long as the logic by which the output route is processed remains constant.
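The core of the REINFORCE update can be sketched without the full encoder-decoder machinery. Below, a bare softmax over logits stands in for the policy (the real model's policy is the neural network described above): the gradient of log π(a) with respect to the logits of a categorical distribution is one_hot(a) − probs, and REINFORCE scales that by the advantage (tour cost minus a baseline). This is a hedged illustration of the principle, not our actual training code.

```python
# Toy REINFORCE step for a categorical policy over next destinations.
# The "policy" here is just a softmax over raw logits; in the real
# model the logits come from the encoder-decoder network.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(logits, action, cost, baseline, lr=0.1):
    """One gradient-descent step on expected tour cost."""
    probs = softmax(logits)
    advantage = cost - baseline  # negative advantage = better-than-baseline tour
    # d(log pi(action)) / d(logits) for a categorical distribution:
    grad_logpi = [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]
    # Descend on expected cost: move logits opposite advantage * grad.
    return [l - lr * advantage * g for l, g in zip(logits, grad_logpi)]

# A tour cheaper than the baseline (cost 1 < baseline 2) should make
# the chosen action more probable after the update.
new_logits = reinforce_step([0.0, 0.0, 0.0], action=0, cost=1.0, baseline=2.0)
```

In practice the baseline is itself a design choice; a common option (used in Attention, learn to solve routing problems) is the cost of a greedy rollout of the current policy.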

In the paragraphs below we describe in some detail the neural architectures that we use for the encoder and decoder.


Encoder Architectures

As mentioned in the above paragraph, the job of the encoder is to receive a graph representing a particular CVRP instance, and return a vector representation for each of the nodes in the input graph. The input graph is represented by a pair (N, E), where N = (x_{1}, …, x_{n}) \in \R^{n \times k} is the list of nodes and E \in \R^{n \times n} is the weighted adjacency matrix of the graph. Each x_{i} \in \R^{k} is a vector describing the features ascribed to the given node, i.e. its geographic location, the volume and weight of its demand, the start and end times of its time window of availability, etc. The output of the encoder is a list of new vectors (\hat{x}_{1}, …, \hat{x}_{n}) \in \R^{n \times m} encoding the topological structure of the graph. The output vector \hat{x}_{i} \in \R^{m} corresponds to the input node x_{i}. But, as a result of having been processed by the encoder, \hat{x}_{i} contains information about its neighbours (and neighbours of neighbours, and so on), while the input x_{i} is just an isolated node with no knowledge of the graph topology. Observe that the encoder’s output no longer contains the adjacency matrix; it contains just a sequence of “encoded” node vectors. The integer m is a hyper-parameter of the model, usually chosen to be larger than k, the length of the input node features.

The neural architecture used for the encoder is a graph attention network similar to those found in Inductive representation learning over graphs and Graph attention networks. The models proposed in these papers are very flexible, leaving lots of room for our own adjustments and additions tailored to the peculiarities of our specific problem. Another candidate encoder model that we have experimented with is the transformer. This is an architecture that was first used for neural machine translation (and other NLP tasks) in Attention is all you need, and later applied to the vehicle routing problem in Attention, learn to solve routing problems.
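A minimal sketch of one attention layer over a graph may help fix ideas. It uses simplified scaled dot-product attention with a mask built from the adjacency matrix, rather than the exact GAT formulation (which uses a learned attention vector and LeakyReLU); the weight matrix is a random stand-in for learned parameters. One layer mixes each node's embedding with its neighbours'; stacking layers propagates information to neighbours of neighbours, and so on.

```python
# One simplified graph-attention layer: project node features, compute
# attention scores, mask out non-edges, and aggregate neighbour features.
import numpy as np

def attention_layer(X, adj, W):
    """X: (n, k) node features; adj: (n, n) adjacency with self-loops;
    W: (k, m) projection matrix. Returns (n, m) updated embeddings."""
    H = X @ W                                   # project features to dim m
    scores = H @ H.T / np.sqrt(H.shape[1])      # scaled dot-product scores
    scores = np.where(adj > 0, scores, -1e9)    # mask non-neighbours
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ H                          # attention-weighted mix

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                     # 4 nodes, k = 3 features
W = rng.normal(size=(3, 5))                     # projection to m = 5
# Path graph 0-1-2-3 with self-loops:
adj = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
out = attention_layer(X, adj, W)                # shape (4, 5)
```

With only self-loops in the adjacency matrix, each node attends solely to itself and the layer reduces to the linear projection X @ W, which is a handy sanity check.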

Decoder Architectures

As mentioned above, the decoder receives the output of the encoder as input. Its job is to sequentially return the destinations of the delivery truck’s tour. The decoder generally contains three submodules: a recurrent neural network (RNN), an attention mechanism, and an output softmax classifier. This design is inspired by models traditionally used for neural machine translation, which have been adapted for the purpose of solving vehicle routing problems. There is some question as to how useful the RNN really is in this model; by “beefing up” the attention mechanism, one may be able to do without it entirely. This was shown in Attention, learn to solve routing problems, and it is something that we have been experimenting with as well.
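One decoding step can be sketched as follows: attend over the encoder's node embeddings with a query vector, mask out nodes that have already been visited (the same mechanism would exclude capacity-infeasible nodes), and apply a softmax to get a distribution over next destinations. The query here is a stand-in for the decoder's context (RNN hidden state or attention-based context in the real model), and all values are random placeholders.

```python
# One decoder step: masked attention over node embeddings, producing
# a probability distribution over the next destination.
import numpy as np

def decode_step(query, node_embeddings, visited_mask):
    """query: (m,) context vector; node_embeddings: (n, m) encoder output;
    visited_mask: (n,) bool, True for already-visited nodes.
    Returns a length-n probability vector, ~0 on visited nodes."""
    scores = node_embeddings @ query / np.sqrt(len(query))
    scores = np.where(visited_mask, -1e9, scores)   # forbid revisits
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                      # softmax over nodes

rng = np.random.default_rng(1)
emb = rng.normal(size=(5, 8))                       # n = 5 nodes, m = 8
query = rng.normal(size=8)
visited = np.array([True, False, True, False, False])
probs = decode_step(query, emb, visited)
```

During training the next node is sampled from this distribution (so REINFORCE can score the whole tour); at evaluation time one typically takes the argmax or a beam search instead.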