An Overview of the Traveling Salesman Problem (TSP)

Team members: Sandeep Kota Sai Pavan (, Angelos Mavrogiannis (

Figure 1: Solution to 48 States Traveling Salesman Problem [15]

This post presents an overview of the Traveling Salesman Problem (TSP) and delves into its connection with decision-making for robotics.


The Travelling Salesman Problem (TSP) is a classical, well-studied problem in the fields of computer science and operations research. Given a set of cities and their pairwise distances, the goal is to calculate the shortest hence most efficient route that begins from an origin city, visits each city exactly once, and finally returns back to the origin city. Although this problem is easy to describe, it is not trivial to solve. It even gets more difficult when the number of cities increases, as there are additional combinations to consider while determining the optimal route. Just to give an example, over 3.5 million different possible routes must be considered when dealing with a set of just 10 cities.

Problem Statement

The Traveling Salesman Problem is the most popular example of a combinatorial optimization problem.

Let the label i, j=1,2,3,….n represent n cities, and:

Figure 2: Symmetric TSP with four cities. [15]

Then the traveling salesman problem can be simplified as a minimization problem as follows:

Mathematicians and researchers have proved that the traveling salesman problem is a NP-hard problem, i.e. the solution to this problem cannot be computed in polynomial time. There are many heuristics and exact algorithms to solve the TSP where one may achieve a result as close to 1% of the most optimal solution. The brute-force search approach is the easiest solution to the problem, but computationally it is not an efficient algorithm and takes O(n!) time for getting a solution, since there are (n-1)! possibilities. Early research shows that dynamic programming techniques helped reduce the time complexity of the algorithm, but it is still not feasible as it is can take a large amount of time for a considerably higher number of vertices.

The traveling salesman problem can be generalized to any routing problems and any task assigning problems. Instead of optimizing the equation by taking distance as a parameter, the other cost parameters specific for the application can be employed. The upcoming sections of this blog will discuss about the different solutions (both definite and approximate) that people have come up with.

Overview of the key results

Branch and Bound Techniques - Concorde TSP Solver

The early solutions of TSP involved using branch-and-bound techniques [1]. Branch and bound is a state space search method in which all the children of a node are generated before expanding any of its children based on a selection criterion for expansion. Being similar to the brute force approaches like breadth first search or depth first search, the time complexity of this method is not a significantly improved one. Branch and bound technique is the algorithm behind the Concorde TSP Solver developed by David L Applegate [2] in early 2000’s where the proof of optimality of this algorithm was shown by calculating an optimal tour of 85,900 cities [3]. The Concorde TSP solvers is a very fast and accurate solver used as a benchmark even in modern research. The Concorde solver was able to generate a near-optimal solution.

Ant Colonies for TSP

Ant colony optimizations is a class of optimization problems that is used to find paths in graphs using probabilistic techniques. This algorithm for solving TSP was developed by Marco Dorigo in 1999 is a bio-mimicry inspired solution to the TSP [4]. A group of artificial ant colony simulations were found capable of solving the TSP. The main concept idea of ant colony optimizations is the concept of pheromone secretion by ants. Ants are found to leave a trail of pheromone secretion behind while walking. The TSP graph is modified with cities representing food sources and the edges representing the pheromone secretions by the ants and the path followed. It was found that ant colonies follow the path with the most amount of pheromone secretion. They start with exploring in random directions. Once a food source is found, the ant that collects the food from source and returns to the base location secretes more pheromone on the path. Thus, the ants that are starting to explore from the base would start exploring from the path with the highest pheromone presence. It was found that after a few iterations, the ants are able to find the shortest path among a given set of vertices. This simulation of pheromone performed by artificial ant colonies showed that an optimal solution can be found by this method.

Figure 3: Ant colony optimization example from the original paper [4]. When ants find an obstacle, they will start exploring with equal probabilities around obstacles and the shortest path around the obstacle will have more pheromone as more ants may have passed through from that path in a given time.

Pointer Networks

Pointer networks [5] is a new architecture that is used to learn the conditional probabilities of an output sequence given the input sequence. This network is very similar to the popular sequence-to-sequence [6], where a series of RNN encoder and decoder network was used to generate a probability-based output sequence learned on the examples of input sequence. The main disadvantage of this network in combinatorial optimization problems was the fixed dimensionality of the output sequence which is equal to the length of the input sequence. This limitation did not allow these networks to be applied on general combinatorial optimization problems. A slight modification to this network architecture was made where an attention mechanism is generated over the inputs at the end of every iteration acting as a pointer to the next element of the output sequence. This modification removes the dimensionality constraint on the output sequence that can be generated.

Figure 4: Pointer Net architecture as shown in the pointer network paper. [5]

Figure 4 shows the network architecture of the Pointer network. The blue cells representing the encoding network is an RNN that converts the input sequence into a code which is fed into the purple decoder RNN. In the image shown the input sequence having size ([5,2]), the first axis dimension is 5 because the first data element represents the start of the sequence, and then passes through the encoder RNN. At each step, the content-based attention mechanism returns a vector of of softmax distribution with dictionary size equal to the input sequence. The argmax of this attention mechanism at each step, points to the next element in the output sequence. In the example shown in the figure, the attention mechanism at the first step points to the first node of the output sequence, i.e. Node 1 (x1, y1) followed by (x4, y4), (x2,y2), (x1,y1), end of sequence. The attention mechanism at each time step is represented by the gray arrows while the red highlights point the argmax values at each timestep.

The network architecture used in this paper is an LSTM network with either 256 or 512 hidden layers, trained over 1 million examples with a batch size of 128 and learning rate equal to 1. The input output pairs for training is (P, Cp), where P represents the cartesian coordinates representing the cities chosen at random and Cp represent the permutation of integers in the range of input sequence size representing the optimal path. The optimal path labels are generated using optimal solvers like Concorde or Held-Karp algorithm. The results of this network architecture has shown near perfect solutions to TSP when considering up to 25 cities, and good results when considering up to 35 cities.

Neural Combinatorial Optimization with Reinforcement Learning

This work by Bello [7] from 2017 is an updated version of the previously discussed pointer networks architecture. The main problem with training the mentioned pointer network architecture in a supervised manner is the expensive dataset needed to be generated for training. This work focusses on training the network using reinforcement learning with negative graph lengths as a reward function and finding the optimal parameters of the network through a policy gradient method.

An Efficient Graph Convolutional Network Technique
for the Travelling Salesman Problem

In one of the more recent works, C.K. Joshi [8] showed that graph convolution networks can be used as a very good approximation for the TSP problem. Trained on the optimal tours found using the Concorde solver, this GCN based approach was able to predict solutions with 0.01% optimality for graphs with 50 nodes, and 1.39 % for graphs with 100 nodes.

Figure 5: The proposed Graph Convolutional Network [7]

Graph convolution networks are most suited for problems which involve learning graphs. Simple fully connected networks cannot be used in such cases because of the lack of knowledge of the connected nodes and their properties. The graph convolution layer takes the information of the edges and the vertices as the input and predicts the adjacency matrix as the output. A beam search operation on the adjacency matrix would return the best possible TSP tour from the predicted output graph. A flow diagram of the model is shown in figure 5.

Given the input mentioned above as the node feature, as well as the edge features at layer l, the features at the layer l+1 are written as:

where σ denotes the sigmoid function, BN represents batch normalization layers and ReLU represents the Rectified Linear Unit activation.

Trained on more than a million TSP tour examples generated by the concord solver, the GCN generated a very optimal solution in terms of the length of the TSP tour. But when considering the time to find the solution, it is not close to the concord solver, even though it can be better than the other deep learning-based techniques.

Variants of TSP

Most of the times, the symmetric version of the TSP (sTSP) is considered. This means that the distance between a city A and a city B is the same, regardless the direction of travel, leading to a formation of an undirected graph.

However, sometimes paths may differ depending on the direction, forming a directed graph. This corresponds to the asymmetrical TSP (aTSP) which has increased computational complexity.

In the case when more than one salesmen are allowed to participate in the problem, we get the variant of multiple traveling salesman problem (mTSP).

A known variant of the TSP is the Bottleneck Traveling Salesman Problem (bottleneck TSP), in which the goal is to find a Hamiltonian cycle in a weighted graph which minimizes the weight of its weightiest edge.

Numerous profit-based variants of the TSP have been proposed [9], which associate a profit value with each vertex and do not require the visitation of all nodes, but explore the tradeoff between the total collected profit and the total cost coming from the distance traveled.

In the generalized TSP, vertices are divided into clusters and the salesman has to visit at least one vertex from each cluster.

The vehicle routing problem (VRP) constitutes a well-studied generalization of the TSP, according to which there is a fleet of vehicles that have to deliver goods to a set of customers, and the goal is to determine the optimal set of routes for these vehicles, minimizing the total cost of the routes.

In the time windows-based TSP, each destination has to be visited within a specified time window.

The maximum TSP is the exact opposite of the traditional TSP, since the salesman has to visit every destination exactly once and return to the origin destination, however maximizing the total distance covered in this case.

An interesting variant called the traveling purchaser problem (TPP) consists of a set of marketplaces, the cost of traveling between them and a list of available goods along with their prices at each marketplace. The goal here is to minimize the combined cost that comes from traveling as well as from purchasing the goods.

Opposite to the traditional TSP, kinetic-based variants assume that the destinations can move. In the moving-target TSP, a chaser has to reach a set of destinations in a minimum amount of time, and the destinations are moving with constant speed.

TSP in Robotics

In many robotics applications, time- as well as energy-efficiency are very important factors. Especially in industrial robotics, where production speed is crucial, optimization of both the high-level task planning as well as the motion planning of robots is paramount. In these cases, end-effectors are required to move efficiently between goal locations where they have to execute certain tasks (e.g. welding). To this end, the Traveling Salesman Problem formulation can be applied and solved in order to optimize performance. Gueta et al. [10] study the problem of a 6-DOF robot having to reach multiple locations efficiently on an object moving on one dimension. As the goal locations change, collisions start to become imminent. However, exploiting the redundancy of the robotic system, a large set of different robot configurations is possible, and along with the solution of the TSP, the dynamic nature of the problem is successfully dealt with. To deal with the large number of possible configurations, Gentilini et al. [11] model the problem as a Traveling Salesman Problem with Neighborhoods (TSPN) with each neighborhood representing all the configurations corresponding to each goal location.

When robots are deployed in the field with objectives such as sensing, collecting data and exploring the environment, they can benefit greatly from introducing the TSP to the path planning problem. Tokekar et al. [12] explore the joint deployment of an Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGD) in order to conduct aerial and soil measurements of the nitrogen levels in an agriculture plot. The UAV has limited battery life and the UGV may take too much time to take measurements. To remedy this, the UGV carries around and deploys the UAV in selected locations for aerial measurements, while itself optimizing its own process of taking soil measurements through a formulation of a variant of the TSPN. In a similar note, Cai et al. [13] explore a human-robot collaborative environment, where a target to be classified is located at each one of N locations. A UAV performs the initial sensing and classification, however, as the classification may not be satisfactory, a closer site inspection from a ground robot or communication with a human operator would be required. Due to limited motion energy budget allocation to the ground robot, as well as limited bandwith for communicating with the human, the ground robot must optimize its performance by solving a selective TSP (or orienteering) problem.

Figure 6: An example of a human-robot collaborative Traveling Salesman Problem studied in [13].

Overview of the important applications

There are numerous applications of the Traveling Salesman Problem, but here we are going to mention the most important ones as seen in literature [14].

Drilling of Printed Circuit Boards: During the production of printed circuit boards (PCB), holes need to be drilled on the boards in order to connect a conductor from one layer to a conductor on another layer. However, since not all the holes to be drilled have the same diameter, a different drill tool is required for each different diameter. In order to optimize production, all holes with a certain diameter have to be drilled in consecutive time intervals so that the time spent changing the drill head is minimized, and the drill tool has to travel the least amount of time to traverse each set of holes. This can be approached with a set of TSPs, one for each set of holes with a certain diameter.

Computer Wiring: In order to connect different modules on computer boards, a set of pins has to be connected, with the requirement that each pin is connected to a maximum number of two wires. Opting for minimum length of wire used, this problem can be solved by TSP, if the starting point coincides with the terminating point, otherwise it can be solved through identifying the shortest Hamiltonian path between the two points.

Order-picking in warehouses: An important application of the TSP is associated with the order-picking process in warehouses. An order arrives including items located in the warehouse. In order to ship all items to the customer, a worker (or a vehicle) needs to travel to all the item locations and collect them. Accordingly, to optimize the procedure time-wise, a TSP can solve this problem, with the item locations as the nodes and the time it takes for the worker to travel from one item location to the other as the distances.

Vehicle Routing: Another important application is the vehicle routing problem, according to which a fleet of trucks is deployed with the objective of delivering goods to a set of customers. The problem lies in the computation of the optimal delivery route for every truck, accommodating all customers and reducing the total distance covered. Given no constraints in time and capacity of the trucks, the problem can be solved with a TSP formulation. (classical TSP for one truck, and mTSP for multiple trucks).

X-ray Crystallography: Another application can be found in the field of crystallography, where a detector is moving to a large number of points in order to measure the intensity of X-ray reflection of the crystal at these points and hence draw some conclusions about the structure of the crystalline material. TSP can be used to minimize the total time that the detector is moving, by determining the optimal sequence of points.

Discussion of Open Research Problems

Even though the Concord solver gives the best optimal solution in the least possible time, it has not been possible to develop learning-based approaches with similar performance. It is required to solve the TSP problem with learning-based approaches for generalizing the solution to the other NP-Hard problems. Reinforcement learning-based approaches may help in reducing the dependency on expensive datasets for training, but it is an active research problem to analyze and improve the performance of such techniques.


  1. Manfred Padberg and Giovanni Rinaldi. “A Branch-and-Cut Algorithm for the Resolution of Large-Scale Symmetric Traveling Salesman Problems”. In: SIAM Review33.1 (1991), pp. 60–100.doi:10.1137/1033004. eprint:
  2. David Applegate, William Cook, and Andr ́e Rohe. “Chained Lin-Kernighan for Large Traveling Salesman Problems”. In:INFORMS Journal on Computing15.1 (2003), pp. 82–92.doi:10.1287/ijoc.
  3. David L. Applegate et al. “Certification of an optimal TSP tour through 85,900 cities”. In: Operations Research Letters37.1 (2009), pp. 11–15.doi:10.1016/j.orl.2008.09.006.
  4. M. Dorigo and L. M. Gambardella. “Ant colonies for the travelling salesman problem.” In:BioSystems43 2 (1997), pp. 73–81.
  5. Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly.Pointer Networks. 2017. arXiv:1506.03134[stat.ML]
  6. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.Sequence to Sequence Learning with Neural Networks.2014. arXiv:1409.3215 [cs.CL].
  7. Irwan Bello et al.Neural Combinatorial Optimization with Reinforcement Learning. 2017. arXiv:1611.09940 [cs.AI].
  8. Chaitanya K. Joshi, Thomas Laurent, and Xavier Bresson.An Efficient Graph Convolutional Net-work Technique for the Travelling Salesman Problem. 2019. arXiv:1906.01227 [cs.LG].
  9. K. Ilavarasi and K. S. Joseph. “Variants of travelling salesman problem: A survey”. In: InternationalConference on Information Communication and Embedded Systems (ICICES2014). 2014, pp. 1–7.
  10. L. B. Gueta et al. “Coordinated motion control of a robot arm and a positioning table with arrangement of multiple goals”. In:2008 IEEE International Conference on Robotics and Automation.2008, pp. 2252–2258.
  11. I. Gentilini, K. Nagamatsu, and K. Shimada. “Cycle time based multi-goal path optimization for redundant robotic systems”. In:2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013, pp. 1786–1792.
  12. P. Tokekar et al. “Sensor planning for a symbiotic UAV and UGV system for precision agriculture”.In:2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013, pp. 5321–5326.
  13. H. Cai and Y. Mostofi. “A human-robot collaborative Traveling Salesman Problem: Robotic site inspection with human assistance”. In:2016 American Control Conference (ACC). 2016, pp. 6170–6176.
  14. R. Matai, S. Singh, and Madan Lal Mittal. “Traveling Salesman Problem: an Overview of Applications, Formulations, and Solution Approaches”. In: 2010.
  15. “Travelling salesman problem” Wikipedia, Wikimedia Foundation, September 22, 2020,