A Technical Follow-Up: How We Built the World’s Prettiest Auto-Generated Transit Maps
by Anton Dubrau
Six weeks ago, we launched Transit Maps, and wrote this blog post about why we took on the mammoth task of creating automatically-generated yet aesthetically-pleasing maps. We were blown away by the public reaction to our efforts, though not altogether surprised given the amount of time, thought and inspiration it took to create them. Today, we’re fulfilling our promise to publish a technical follow-up from Anton, our resident mapping wizard, who explains in much greater detail what went into building these maps.
When you think of Transit, you might think sleek, colourful interface. Given that we’re extremely particular about making the app as beautiful and usable as possible, that’s no big surprise. But UI isn’t the only thing we’re about: our team extends well beyond expert designers, and our app is much more than just pretty. Below the surface, there’s a lot of ‘hard’ technology quietly driving it.
First, our powerful backend quality checks hundreds of transit data feeds, automatically fixes broken data, and flags issues that need investigation. This system enables us to manage 350 transit feeds in 125 cities with a small team.
Then there’s our compression algorithm. It shrinks transit schedules up to 100 times smaller than the zip-files transit agencies provide. This allows Transit to download the schedules of the user’s entire region, store the data on the user’s device, and return search results… all in the amount of time it takes other apps to request and load a single schedule. And while our users may now be accustomed to our app’s speed, when the feature was first introduced, it effectively cut response time from several seconds to 0.1 seconds. That’s fast.
But there’s one particular technology that we’ve been working on for years. To our great joy (and relief), we finally launched it this summer: automatically-generated transit maps.
The idea itself is hardly new: Google launched their transit maps almost six years ago. But it turns out it’s quite the problem to solve, and Google (even after all these years) still can’t generate very pretty, or even very useful transit maps.
So how did we manage it? With sweat, tears, and creative thinking.
1. Shapes on a Map
The idea of creating automatically-generated maps is something that has captivated me since before joining Transit App three years ago. At the time, Google was the only player in the industry, and to be frank, their transit maps were kind of crappy. We had just finished our aforementioned super compression algorithm and felt ready to tackle a new problem. We figured, rather naïvely, that it would take about three months. Little did we know…
The first step was to show the source data on a map. Many of the trips in the underlying transit data already contained shapes representing the routes that transit vehicles took. If we simply drew all of the shapes defined by all of the trips, we’d get a simplistic sort of transit map:
Doing this was relatively simple: we set up a processing pipeline to extract the shapes from the source data; stored the shapes in a data-interchange format; ensured that the data was available to the device; and used the device’s mapping libraries to draw a new layer with the data.
Easy. We had this up and running within a couple of weeks.
But while it’s close, it’s not a real transit map. All the lines were drawn on top of each other. You can’t tell which lines go where – and the only visible line was the one that was drawn last. For proper diagrammatic maps where you can follow the lines with your finger, you’d need them to be drawn in parallel, and to intersect as little as possible.
2. Matching with OpenStreetMap
Building our maps with shapes posed other problems: what should we do when an agency doesn’t provide any shape data, or if an agency provides very poor shapes? The only geographical data we would have in those cases would be the stop locations. Sure, we could draw straight lines between the stops, but that’s ugly, messy, and confusing.
It’s also the problem with Google’s Transit Maps. In Berlin, Google makes straight line connections between stops; in London, they use some sort of spline-interpolation that doesn’t follow the actual tracks; and in LA, they use the shapes provided by the agency even though the quality of the shapes is really quite bad.
What’s funny is that when you zoom into the maps, you’ll see that Google often has data for the underlying railroad tracks, which begs the question: why don’t they combine the tracks with the shape data? Did they decide that it’s not important?
While Google might not think it’s important, we certainly do. Sure, we don’t have access to Google’s rich underlying map data, but we can use the next best thing: OpenStreetMap (OSM). And thanks to their community of volunteer map-geeks, OSM has virtually all the tracks for the public transit rail lines that we use in our app.
By creating a shape along the OSM street grid that connects all the points along a given route, we could generate our transit shapes! So we created a dynamic programming algorithm which follows roads or tracks likely to be used by transit lines. The shape-making algorithm considers what type of vehicle runs on the line, and minimizes matching errors (i.e. the distances between the generated shape and the actual locations of the source points).
Here’s an example. In the diagram below, we have a trip with three stops, and no shape information whatsoever. We extract the set of tracks the trip uses from OSM (grey lines). Our matching algorithm then finds a trajectory (black line) that follows the OSM, while minimizing its length and the errors to the stops (e1, e2, e3).
It’s tough to get this algorithm to work for all cases, so sometimes, we have to supply parameters to make specific lines work. Overall, it gives us high-quality shapes for all of the public transit lines we need today, and most of those we’ll need in the future — even in developing countries where OSM is often the best data available.
3. Processing in Pixel Space: Skeletonization
OSM gave us the shapes, but the lines were still being drawn on top of each other. Real transit maps have lines drawn in parallel. What we needed to do was identify common segments, where they travel on the same street, and then ‘snap’ those lines together.
So how does Google do it? They seem to compute shared segments by looking at the stops. As long as two lines share the same stops, they are ‘snapped’ together. But when the next stop isn’t shared, the lines ‘unsnap’:
But that’s not how trains and buses really travel! Lines stay together for some distance before diverging. What we needed was an algorithm that would find where the lines begin to branch off in real life.
We tried to compute the route separations in vector space, which seemed simple at first: take two lines that travel closely together and then find the centerline of the shared segment. This turned out to be surprisingly complicated, however, as we kept running into simple examples that would break our algorithm. A small loop in the route would throw off the centerline to infinity, and we also needed to deal with multiple lines, multiple branches, different stop patterns…
After two months of bashing our brains against our keyboards, we finally threw in the towel. We just couldn’t find a stable, general solution that would work reliably, until…
Pixel space to the rescue!
Instead of processing the lines in vector space, we decided to to try something crazy. We used pixel space.
Usually pixel-based processing is done for image-based data. It’s very unusual for GIS processing, because at 1px/meter resolution, our image would be 64 terabytes. Memory is cheap these days… but not that cheap!
So how did we do it? We implemented a special sparse image library that could deal with these very large monochromatic images with relatively few white areas.
We then created an algorithm to draw transit shapes on a giant black-and-white canvas representing the whole world, where every pixel is equivalent to one square-meter. Each line was drawn thickly on the canvas, so wherever the lines were close, their pixels merged together.
Once all of the lines were drawn, we used a ‘skeletonization’ process to successively thin the lines until each was only one pixel thick. So while the lines were no longer merged, they stayed connected, maintaining the same topology. These thin lines represent where the transit lines travel together, and reveal the structure of the network.
Although we now had the center lines of the network, we had destroyed more information than we’d gained. What we had now, were a bunch of pixels denoting the skeleton, which meant we knew every line must be travelling along this skeleton, but we still had to figure out which lines were travelling where.
Using the skeleton, we now rebuilt the lines, as opposed to the shapes we formerly had. We then processed the resulting network in order to get rid of the glitches introduced by the skeletonization.
This step was long and tedious. Altogether, the drawing, skeletonization, network building and glitch removal took somewhere around six months to develop. (So much for having the whole thing done in three!)
But the final results were satisfying. We had an internal representation of lines travelling together and diverging. It looked like this:
When we rendered the lines in parallel, we got this:
Pretty good for a Version 1. Much better than Google, seeing as you can more or less tease out where each line is going. We were ready to roll out Transit Maps! And then… Apple Maps happened.
In the summer of 2015, after having worked on our maps for the better part of a year, we were finally ready to release our first version of Transit Maps. Then Apple rolled out their transit maps, and they were really pretty.
They instantly raised the bar for what transit maps should look like. In our drawings and designs, the end goal was something similar to (or better than) what Apple subsequently released, but we were planning to get there after releasing our Version 1.
Compared to Apple, our proposed Version 1 was kind of mediocre. Our Designer-CEO decreed that beating Google was not good enough — we also had to at least play in the same league as Apple.
After closer scrutiny, we hypothesized that Apple was drawing their maps manually. There were huge lags between the release of new cities, and there was something strangely off about the way the maps looked — as though they were drawn by humans, not computers. This meant that although our maps weren’t quite as pretty, our algorithm was still ahead of theirs.
At this point, we also knew that the hard part was behind us. We had figured out a network that would allow us to draw lines in parallel. Now, all we had to do was make it look good.
4. Transit Line Ordering Using Integer Linear Programming
Before publishing our maps we needed to get rid of the ugly, unnecessary criss-crossing of lines, which was turning them into a horrendous spaghetti-mess.
If we could sort the lines to minimize the visual clutter near intersections, we would have a publishable map. To do this, we had to decide which lines would go left and which would go right, in order to minimize their crossings.
Google had (and still has) a similar problem — except their lines criss-cross each other even where there are only stops and no intersections.
For us, criss-crossing only happened where lines actually joined and diverged, so we were already doing better than Google’s algorithm. That’s because we were storing shared sections based on geography.
So how did we get rid of the spaghetti? First, we tried a heuristic solution — sorting lines based on where they terminate — but this often failed, working in some places, but not others.
To improve on the heuristic solution, we set up a mathematical model that would ‘score’ a given ordering of lines, penalizing the crossing of lines, as well as other visual clutter.
As you might expect, avoiding a cross-over in one place on the map could create another one somewhere else. Everything is connected! So what did we do? We found the ordering of lines that had the lowest global penalty score.
Integer-linear-programming was what allowed us to explore all the possibilities and find a solution that globally minimized the penalty function. But the processing time for integer-linear-programming is exponential in the problem size: solving one problem can take one second; another more difficult problem can take a year! That made it risky to use, even in ‘offline’ pre-processing inside the backend.
We were worried. Processing Chicago’s data took us hours. A larger area like the Northeast Corridor (Boston to Washington) could take weeks! Fortunately, we found a different plan of attack: one which allowed the integer-linear-programming solver to explore the problem space more efficiently and find optimal solutions faster. What previously took an hour, now took 0.2 seconds.
Seeing optimization like this in action is uncanny: when you see the algorithm make decisions, it’s like witnessing some brilliant mathematician effortlessly solving problems with the clearest, most concise solutions.
With the other processing steps already completed in pre-processing on the server, the data was now stored in binary files, and sent off to the device for the actual rendering of maps at any desired zoom level.
5. Circle-Arc Rounding of Lines
We still weren’t quite finished, however. Hand-drawn diagrammatic transit maps still don’t really look like the maps shown above. Their lines are nicely rounded with smooth transitions at intersections. We wanted our maps to have a similar rounded look.
When lines traced around corners, we wanted them to stay perfectly parallel, even in potentially degenerative cases, like in Chicago. There, a large number of lines travel together around sharp curves, so drawing them in parallel could result in lines bunching on the inside of the bend.
Usually rounding is done using bezier curves, which look like they’re easing into the curves. But in order to stay faithful to the look of diagrammatic transit maps, bezier curves weren’t quite right. Transit maps have straight lines that fall sharply into circular arc segments. So we used arc segments for rounding.
Also, unlike bezier curves, any line parallel to a circle arc is itself a circle arc. As long as the radius is big enough, we were guaranteed to have no degenerative cases.
We came up with a custom algorithm that, given a shape, would remove and add points to round it off using circular arc segments. It guarantees a given minimum radius by simplifying the geometry as necessary. The minimum radius depends on the total width of all the parallel lines.
The resulting shape is smooth. It is entirely composed of straight lines and arc-segments, which means we can always draw lines in parallel without any artifacts or degenerative cases.
This approach gave us something like this:
The rounding only happens along shared segments. You might also notice that we removed all the intersections. Dealing with the intersections was a major problem because we had to ensure that each line continued from one segment to the next and linked up properly. We also used the arc-generating algorithm to have the same rounded look. Here’s the final result:
Pretty great, right? But while they were pretty… they still looked strangely naked. That’s because they were missing stops.
So we decided to hold off on the release yet again — and add one last step.
6. Adding Stops
Adding stops might seem straightforward, but it actually requires a fair amount of processing to propagate the stop information through the lengthy pipeline we had created.
We also encountered many cases where multiple stops in the data actually corresponded to one single physical station, so we needed to collapse them into one stop.
Here’s what we did. For stops with multiple lines, we drew a white bar with a black outline (for contrast) across all of the lines. For stops on a single line, we drew a simple circle using the colour of that particular transit line. We also added a white overlay to reduce the contrast of the map layer below. This is the final result:
To allow users to turn lines on-and-off selectively from our apps’ settings page, we decided that the rounding, as well as some stop-processing and rendering should be done on the device. So in New York City, you can disable all New Jersey-based transit lines (or all NYC lines if you live in New Jersey). With so many transit lines in certain areas, this allows users to create fully customized maps.
So that’s how we did it. Sure, implementing automatically-generated transit maps took a lot of work, but it was worth it. Our maps are a heck of a lot more powerful than the PDFs you’re used to getting from agencies, never mind the paper ones you fold up and jam into your wallet. What are the main differences?
Our Transit Maps are scalable, so we can easily add new cities in the same visual style, wherever in the world we expand to next. They’re customizable, so users can turn on/off networks and modes to create their very own personalized transit maps. And they’re also contextual: unlike a PDF of an agency map, our maps incorporate your location giving you a sense of where you are relative to nearby lines and adjusting the look depending on your zoom level.
And ultimately, our diagrammatic transit maps provide more than just the basic information about transit systems. They’re emblematic of cities themselves: important pieces of functional art that connect people to their environments. We want to help build that connection, and we believe that our new Transit Maps do just that.
We’re excited to keep improving, but are pleased with what we’ve accomplished so far. We launched with 55 cities. The response to our blog post comparing our maps to Google’s and Apple’s has been incredibly positive. For the backend team, it’s great to have people see and appreciate the work and effort we put into what drives the experience of the app. It motivates us to continue to push our technology further.
Beyond that, we still have many more ‘hard’ problems to solve. We will continue to work under the hood, not just toward having the prettiest app with the best UI, but the most functional, powerful and accurate transit app out there.