Traveling Salesman Problem


If you already know enough about genetic algorithms for the Traveling Salesman Problem and don't like reading like me, check this out: Traveling Salesman Problem demo.

The Traveling Salesman Problem (TSP) is a famously difficult problem to tackle. As the name might suggest, it can be understood best through the following little story:

A salesman wants to solicit magazine subscriptions in several cities. He simply wishes to visit each city once and end up back where he started. As a business man, he also would like to minimize traveling costs so he wants to find a route that minimizes the total distance traveled.

A naive method for finding the guaranteed best path, would be to try all possible paths. For a set of 10 cities, there are 9!/2 (181,440) paths to consider. The number of solutions to consider grows with the factorial of the number of cities. Too many cities makes the problem intractable. There are other approaches that don't guarantee perfection but approach it fairly well and fairly quickly. Genetic algorithms are one such approach. They are a method of searching a solution space.

In a nutshell, these algorithms implement a survival of the fittest evolutionary approach. For the TSP, generations of solutions, or paths, evolve until a generation arrives with paths that are all essentially the same. Evolution occurs through two operations, crossover and mutation. Crossover typically takes two of the fitter (shorter) paths and combines them somehow into a single offspring solution. Mutation simply ensures variety by modifying a single solution in some way.

My main goal was simply to build a nice little web tool to visualize and play around with the genetic algorithm approach to the TSP. Firstly, in the background you can see the currently best path discovered by the genetic algorithm. Over that is a plot of the distances of the best path in each generation. At the top is a representation of convergence. A single city is considered to have converged if it connects to two other cities 95% of the time or more. If a city only connects to two other cities, it is likely that those are near optimal cities to visit either before or after the city in question. The visualization at the top gives a bar for each city. The length of the bars shows the percentage converged. I am quite proud of my visualization if I do say so myself.

I did, however, implement two of my ideas with some success. (Note: I by no means claim or even suspect that I am the first to discover them.) The Sequential Constructive Crossover (SCX) operator seems to work quite well so I focused most of my effort on mutation operators and an initialization method. Originally, the initialization of paths in generation 1 are created by randomly selecting one of the unvisited cities one at a time. My approach chooses the next city based on the proximity to the current city. However, if it always picked the closest city, generation 1 would have no variety and thus would not explore the solution space very well. So instead closer cities are simply given a greater probability of being chosen. The following visualizations should demonstrate the impact of the probabilistic method. You should particularly be able to notice the lower initial distance on the orange plot.

Random

Probabilistic

While using what I like to think of as a probabilistic greedy initialization consistently produces a fitter initial pool of potential solutions, the end result is usually fairly negligible. Sometimes it helps the genetic algorithm converge a few generations sooner, but not always. My experimentation with mutation operators displayed more significant results.

One of the simplest and easiest to implement mutation operators merely swaps the order of two randomly chosen cities within the path. So instead of visiting city ABC 5th and city XYZ 9th, you swap them and visit ABC 9th and XYZ 5th. This operator does it's job in ensuring that the genetic algorithm does not converge too quickly without trying various different orderings, but as the algorithm converges, it becomes increasingly unlikely that an advantageous swap will take place.

So the operator that makes the most visible sense to me involves intersections. It is never the optimal solution if there are intersecting edges in the path. For example, imagine someone's given the task of setting up the corner flags on a soccer field. They could go in a sort of X shape or they could walk along three of the sidelines/endlines. Triangles can prove that the X pattern would be longer. Anyway, my mutation operator searches through a path and attempts to fix the first intersection it finds. There are a number of ways it could try to do this, but it simply takes one of these ways at random which means sometimes it will not fix the intersection problem - sometimes it will even make it worse.

Random Exchange

First Intersection

This operator eliminates most, if not all, obvious errors. The biggest downside to this operator is the fact that it only applies when trying to minimize physical distance. One might also like to minimize cost. If we were assessing the genetic algorithm's fitness with flight costs, there would be no useful sense of intersection.

For the right problem though, this operator can be quite useful. It performed much better than the operators I built to try to discourage very acute angles <insert joke about very cute angles>. One final thing about the intersection operator is that the logic to determine intersections could be used in place of a convergence measure. There is no need to stop the genetic algorithm if there are known intersections in the best solution because that means it's not optimal.

Though SCX already works well, I may someday make my own crossover operator. The idea I played around with didn't seem to work too well though, so I will investigate further and see if I need to go back to the drawing board.

If you have made it this far, you should check out the TSP demo I suggested at the top of this post. You can tinker around with it. For example, you can try a dataset with up to 662 cities and see how the algorithm fares.