The Data Network Effect of Transit Navigation Apps
How a mapping startup is suddenly a contender for the urban mobility space
A couple of weeks ago Citymapper, a transit navigation app particularly popular in Europe’s capital cities, made an announcement that seemed totally out of left field: they launched a bus (they also launched a transportation planning tool called Simcity, which I’ll get to later).
A mapping company launching a bus service? With an actual, physical bus? It sounds like OpenTable opening a restaurant, or Airbnb launching a chain of hotels. Improbable at best. But a closer look reveals why it makes sense, and why Citymapper may have instantly turned itself into a serious contender for the future transportation market.
While the demise of other startups trying to reinvent surface transit, including Chariot, Bridj, Split, and Leap, would not seem to bode well for this business model, none of these had Citymapper’s install base or usage numbers. The advantage lies less in brand awareness or distribution (although those don’t hurt) than in the location and routing data they get from this usage, including realtime data.
The key to the transportation business, which has a high fixed cost per vehicle in service but a much lower variable cost, is passenger load factor. The formula for load factor is total passenger miles divided by total seat miles, but the easiest way to think of it is utilization. If your bus is highly utilized on each trip, i.e. if it is full, you make money. If it’s not, you don’t. With a low number of riders, and hence a low number of data points on the location and destination of demand, a transit service is essentially guessing at where it should position its buses [1].
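To make the load factor formula concrete, here is a minimal sketch in Python. The function name, the 30-seat bus, and the rider numbers are all made up for illustration; none of them come from Citymapper or any real operator.

```python
def load_factor(passenger_miles: float, seat_miles: float) -> float:
    """Load factor = total passenger miles / total seat miles."""
    if seat_miles <= 0:
        raise ValueError("seat_miles must be positive")
    return passenger_miles / seat_miles


# Hypothetical single run of a 30-seat bus over a 10-mile route.
SEATS = 30
ROUTE_MILES = 10.0

# Hypothetical riders: 12 people ride the full route, 6 ride half of it.
rider_miles = [10.0] * 12 + [5.0] * 6

seat_miles = SEATS * ROUTE_MILES        # 30 seats * 10 miles = 300 seat miles
passenger_miles = sum(rider_miles)      # 120 + 30 = 150 passenger miles

example_load_factor = load_factor(passenger_miles, seat_miles)
print(f"Load factor: {example_load_factor:.0%}")  # Load factor: 50%
```

Note that the 18 riders on a 30-seat bus do not give a 60% load factor: because six of them only ride half the route, the seats are in use for only half of the available seat miles.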
Compare that with Citymapper. On any given day, in cities around the world, they know the exact routes of millions of people, including the time of day, and this gives them an immense advantage in creating efficient, potentially dynamic, routes with profitable load factors. They know where people are travelling, when, and even what modes they are taking. They even have information on split-mode trips, such as a bus to a subway station, or a subway to a bike share. And if they can offer a superior “product” (aka transportation service), they will bring more people into their orbit, navigating more trips with their app, creating more data for them, leading to better, more efficient routes with higher load factors, etc. Proprietary data leads to better service which leads to higher usage which leads to even more proprietary data, etc. A data network effect.
A logical question is how much marginal data they could possibly collect from their first-party transit service, over and above what must be millions of pieces of location and travel intention data they already get from their app users using all forms of transportation, all over the city. The key to answering that question is to consider that for transportation, the data network effect does not need to happen at the level of the entire network, but only along a particular travel corridor [2]. In other words, it is the spatial density of the data that matters for attaining data network advantage, not the total amount of data. By starting small, in a certain neighborhood or travel corridor, leveraging the data from that area to improve the service in the same area, and therefore to increase usage of the service in that area and collect more data, they can expand area by area until they reach a more significant scale of transit operations, maintaining their data network effect as they go.
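The density-versus-volume point can be illustrated with a toy example. This is only a sketch under my own assumptions: the zone labels, the trip lists, and the idea of treating a corridor as an (origin, destination) pair are hypothetical simplifications, not a description of how Citymapper actually models demand.

```python
from collections import Counter


def corridor_counts(trips):
    """Count observed trips per (origin, destination) corridor."""
    return Counter((origin, destination) for origin, destination in trips)


# Hypothetical dataset 1: more trips in total, spread thinly across the city.
citywide_trips = [
    ("A", "B"), ("C", "D"), ("E", "F"),
    ("G", "H"), ("A", "C"), ("B", "D"),
]

# Hypothetical dataset 2: fewer trips in total, all along one corridor.
focused_trips = [("A", "B")] * 4

corridor = ("A", "B")
citywide_density = corridor_counts(citywide_trips)[corridor]
focused_density = corridor_counts(focused_trips)[corridor]

print(len(citywide_trips), citywide_density)  # 6 1
print(len(focused_trips), focused_density)    # 4 4
```

The second dataset is smaller overall but says four times as much about the A-to-B corridor, which is the only thing that matters when deciding whether to run a bus there.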
The purpose of the Simcity tool is less clear; my assumption is that Citymapper has some intention of sharing it with the public or select partners, or else they wouldn’t have bothered to announce it publicly. My best guess is that it will serve as a carrot for achieving partnerships with transportation agencies and authorities in cities they would like to operate in, similar to their partnership with Transport for London, with whom they have coordinated their first line. I imagine that the tool itself will be free to partners, but official status and a robust partnership with local authorities could be invaluable, as these bodies control physical and fiscal assets from curb access to dedicated lanes to ride subsidies.
[1] This is not precisely true. There are varying amounts of publicly available data on where people live and work, and on neighborhood income levels, which predict transit usage vs private car usage in many cities; this can be supplemented with traffic counts on arterial roads, and with data collected from measuring current ridership, which, if you don’t have much, is meaningless. Unfortunately, other than current ridership, none of this is capable of connecting origins with destinations, and hence routes and times of day of travel. The current state of the art for doing this typically requires hiring large transportation consulting companies to do demand studies, most of which only look at a few points in time (and are certainly not realtime), use small sample sizes, and are often survey-based.
[2] Again, not precisely true, because every trip that requires transfers creates dependency on another geography. But it is close enough that it is possible to get started this way.