*almost for sure
TLDR: Using genetic algorithms approach I calculated a near-optimal route through all the Moscow Subway stations (there were 200 stations at that moment). Then on the 12th of May 2016, we tested this route in a real offline marathon. It took 16 hours and 22 minutes to visit and took photos of each of 199 stations (one station was on maintenance) while broadcasting our trip real-time to social networks.
About Moscow Underground System
Moscow Underground System is known as one of the most beautiful and efficient underground transport systems in the world. Opened in 1935, now it has 13 underground lines and 214 stations, many of them decorated with statues, marble columns, and frescoes. At the moment Moscow Metro is on the third place in TripAdvisor’s “things to do in Moscow” list. Moscow Metro opens at 05:25 and closes at 01:00 for night maintenance. The minimum interval between trains is 90 seconds during the rush hours and up to 10 minutes at night before closing. The exact schedule is really complex and kept in secret due to the security reasons.
It started like a joke and fast became a kind of challenge — is it really possible to visit each of Moscow Underground stations and take photos in one day? I started to investigate and soon discovered a lot of similar projects: there are similar competitions in New-York Subway since 1966. As for Moscow Subway, previously, somebody known as estrella-de-sur made a trip through each of station in 12 hours 36 minutes (making one step on each of them). But we wanted to make a good photo of each station, which means we probably would have to wait for another train. I tried to take it into account. So I decided to calculate the optimal route, but I needed to get a detailed data on graph and schedules first.
First of all, I had to get the graph of the Subway (considering stations as nodes and train lines and walk transitions as two types of edges). Then I wanted to enrich this data with some meta-info: geographic coordinates, the exact time of opening and closing, times of transitions, delays between trains and so on. I also added some data on bus routes between the end stations, hoping to use them as shortcuts between far-distant ends of lines.
The main graph and a lot of useful meta-data I took from a freeware calculator named pMetro. I found almost everything in its data file Moscow.pmz, which appears to be a zip-file with a lot of files in .ini format. I still needed to get info on some new stations, so I took the info on them from Wikipedia articles and an online service Yandex.Metro.
Here you could find my data as gists:
The resulting graph was looking like this:
Traveling salesman’s genome
Now I had to solve a classic traveling salesman problem for a dynamic incomplete graph with 200 nodes. As you probably know, this problem is NP-complete and probably transcomputational. In other words, we don’t know a way to find the optimal solution except for full brute-force search, and a number of candidates is large enough to keep us busy for billions of years. In practice, we could try to find some sub-optimal solution — not the best one, but good enough for our purposes.
Here I should note that the first normal motivation for a person is to try to draw the best route manually or at least sketch it and then funetune it automatically. But since the delay of the edge depends on its position in the route (ie, from the time of day), and 51 stations are located on a dense section of the graph (inside the ring and on it), the common sense approach results are usually really bad.
So I decided to use the genetic algorithms approach: let’s describe each possible route as a genome vector, define the fitness function (total time in our case) and run an evolution with random mutations and pairing between the best specimens. After several thousands of generations, we could have a nice solution, perhaps.
First of all, I had to choose the way to encode a route as a vector. After several unsuccessful attempts, I ended with the permutation of length 200 as a genome. This vector captures the order of first visits to each station, assuming the shortest path between each pair of neighbors.
Then I tried to use several different types of mutations and pairings:
- a random shuffle of a random subinterval in the given genome — could be done pretty fast and useful in the early stages of optimization;
- a reflection of a random subinterval in the given genom e— could be useful for leaving the local suboptimal extremums;
- an exchange of X random positions in the given genom e— a delicate but slow mutation, useful for light finetuning.
- a crossover of two genoms — select the point on genomes, cut the tails at this point and swap them, then make the genome unique and add all the lost stations at random places.
On each iteration, we take X best specimens and run on them a bunch of random mutations and parings to build the next generation of specimens. It’s also useful to add to the population some random routes after each iteration just to avoid the inbreeding. The 50K generations calculation took in average something like 20 minutes on my desktop. During the optimization, we could visualize the best route of each, say, 100th generation:
Let’s look at one of the best routes which duration is 19 hours and 51 minutes. It’s bad because as I wrote in the beginning, Moscow Underground System opens at 05:25 and closes at 01:00, which gives us 19 hours and 35 minutes. Remember, this duration was calculated under an assumption we would wait for a next train on each station, which is not necessarily true.
So we had a choice:
- to try to run this pure route without skipping the train on some stations,
- to add some bus or auto chords.
To check this second option I made two more graphs and run the same optimizations on them:
Gathering the bus actual schedule and routes was hard. Dmitry Kryukov from Yandex.Transport service helped me a lot. Then I ran a name-based matching through the base of bus stops and attached some bus stops to some stations to make the additional transitions.
Afterall, it appeared what both ideas (buses and taxi) could save us only 20–30 minutes (and it’s unsafe due to the risk of traffic jams).
So we decided to ran the original route.
I asked some of my friends and colleagues to help me with this project — to join the run or to help with online support. We made a small website to track our progress online:
I thank everyone who helped us in this crazy venture in one way or another — the marathon team itself, the online support team, and many people who helped us in preparation. Detailed thanks can be found in our closing post on Facebook.