Traveling as a Salesman

The Traveling Salesman Problem (TSP) is a famous and very well studied problem in mathematics, computer science, and operations research. The problem is follows:

Biff Koman is a salesman tasked to visit several sites and he wants to visit them in the best order so as to minimize his total travel distance without backtracking. Biff is able to calculate the distance between any two sites but calculating the total distance of every possible route isn’t possible. The number of possible routes is huge! If there are $N$ sites to visit we have $N!$ many routes, or the permutations of the sites. Biff unfortunately doesn't know python or linear programming so he resorts to a plausible greedy approach: the next site to visit should be the one closest to whichever site he is currently in. This doesn't often work though, poor Biff.

This problem belongs to a class of problems called NP-Hard which sorta means we shouldn't expect to solve it in polynomial time without some clever approximations or heuristics. For a small number of sites, a good computer, and a strong solver, linear programming has a solution! Because TSP is a very well studied problem with tons of practical applications, for larger problems there are better solutions available and highly optimized software packages to help out. TSP also has many variants which we will explore later also.


In [1]:
from pulp import *
import numpy as np

1. First lets make some fake data


In [2]:
#a handful of sites
sites = ['org','A','B','C','D','E','F','G','H','I','J','K']

In [3]:
#non symetric distances
distances = dict( ((a,b),np.random.randint(10,50)) for a in sites for b in sites if a!=b )

The model

We are going to model this problem as an integer program. Lets imagine we have a tour through our sites, and site i is in the tour followed by site j. We can model this with a binary variable $x_{i,j}$ that should be 1 only when site i is connected to site j.

$$x_{i,j} = \begin{cases} 1, & \text{if site i comes exactly before j in the tour} \\ 0, & \text{otherwise} \end{cases} $$

Where this holds for all i,j combinations except where i = j.

Each site is visited exactly once, so that means if we fix j and look at all the $x_{i,j}$, then these represent the connections into that site j so only one of those can be 1. We can express this equivalently by requiring that the sum of those $x_{i,j}$ equal 1 for each fixed j. i.e.,

$$\sum_{i \neq j} x_{i,j} = 1 \space \forall j$$

Alternatively, there should be one and only one way of exiting a site, so we have also

$$\sum_{j \neq i} x_{i,j} = 1 \space \forall i$$

So we have our variables, what should the objective be?

Our objective is the total tour distance:

$$\sum_{i,j \space i \neq j} x_{i,j} Distance(i,j)$$

This is a lot of variables! If we have $N$ sites then we are creating $N^2 -N$ many binary variables and in general the more integer variables you have the harder the problem gets, often exponentially harder.


In [4]:
#create the problme
prob=LpProblem("salesman",LpMinimize)

In [5]:
#indicator variable if site i is connected to site j in the tour
x = LpVariable.dicts('x',distances, 0,1,LpBinary)

In [6]:
#the objective
cost = lpSum([x[(i,j)]*distances[(i,j)] for (i,j) in distances])
prob+=cost

In [7]:
#constraints
for k in sites:
    #every site has exactly one inbound connection
    prob+= lpSum([ x[(i,k)] for i in sites if (i,k) in x]) ==1
    #every site has exactly one outbound connection
    prob+=lpSum([ x[(k,i)] for i in sites if (k,i) in x]) ==1

Subtours and why we need way more constraints

There is still something missing in our solution. Lets imagine we have 6 sites A,B,C,D,E,F. Does the following satisfy our existing constraints?

$$A->B->C->A \\ and \\ D->E->F->D$$

Each site is visited only once and has only one inbound and one outbound connection, so yes, it does. The problem is what is known as a subtour. We need to require that all of our sites are on the same tour! A common brute force way of doing this is to require that every possible subset be connected, but this requires an exponential amount of constraints because of the way the number of subsets grows. Instead we will introduce a new $N$ many dummy variables $u_{i}$ which will track the order at which site i is visited.

$$u_{i} : \text{order site i is visited}$$

Consider what $u_{i}-u_{j}$ should be. It should depend on what $x_{i,j}$ is. If they are connected the delta should be exactly -1. If they are not connected it could be anything up to N-1 cause the tour only has N-1 steps.

$$u_{i}-u_{j} \leq N(1-x_{i,j}) - 1$$

We need to add this for every site connection possible except for the site we start off at. We are adding on the order of $N^2$ many more constraints.


In [8]:
#we need to keep track of the order in the tour to eliminate the possibility of subtours
u = LpVariable.dicts('u', sites, 0, len(sites)-1, LpInteger)

In [9]:
#subtour elimination
N=len(sites)
for i in sites:
    for j in sites:
        if i != j and (i != 'org' and j!= 'org') and (i,j) in x:
            prob += u[i] - u[j] <= (N)*(1-x[(i,j)]) - 1

In [10]:
%time prob.solve()
print(LpStatus[prob.status])


CPU times: user 10.4 ms, sys: 6.05 ms, total: 16.4 ms
Wall time: 246 ms
Optimal

And the result:


In [11]:
sites_left = sites.copy()
org = 'org'
tour=[]
tour.append(sites_left.pop( sites_left.index(org)))

while len(sites_left) > 0:
    
    for k in sites_left:
        if x[(org,k)].varValue ==1:
            tour.append( sites_left.pop( sites_left.index(k)))
            org=k
            break
            
tour.append('org')

tour_legs = [distances[(tour[i-1], tour[i])] for i in range(1,len(tour))]

print('Found optimal tour!')
print(' -> '.join(tour))


Found optimal tour!
org -> E -> B -> H -> I -> J -> D -> G -> K -> A -> F -> C -> org

The total tour length:


In [12]:
sum(tour_legs)


Out[12]:
181

In [ ]: