App-opens: a perilous proxy for shared mobility demand

Published in

Zoba Blog

8 min read · Aug 14, 2019

by Evan Fields, lead data scientist at Zoba. Evan holds a PhD in Operations Research from MIT, believes aggressively in cities, bakes enthusiastically, and can be found on Twitter at @evanjfields.

Zoba provides demand forecasting and optimization tools to shared mobility companies, from micromobility to car shares and beyond.

In my last post, I outlined the difference between demand for a mobility service and the utilization of that service. The natural question is then “okay, how do I know the demand?” In the mobility space, there’s a common misconception that app-open or search data perfectly indicate the demand for a mobility service. We often see this misconception at Zoba and the topic was a large part of my PhD research, so I want to explore some of the perils of relying on search as a proxy for demand.

The intuitive logic¹ linking search to demand goes like so: “To use my mobility service, my customers must first open my app and search for a vehicle. Customers search if and only if they want to use the service. Therefore, knowing the spatio-temporal pattern of searches is the same as knowing the spatio-temporal pattern of demand.” In other words, this argument envisions a world that looks like so (arrows represent causal effects):

In the above diagram, search is caused only by demand. In this post, I want to try to convince you that the situation isn’t so simple. Search and app-open data may contain valuable information about demand, but are not perfect proxies thereof. In fact, these data are typically subject to lots of noise and confounding factors, and the real world looks more like the following byzantine diagram:

Notice how many factors affect search in this diagram! A mobility operator who naively assumes that search is a perfect proxy for demand will produce flawed demand estimates, inevitably leading to suboptimal utilization.

Before we dive in, some brief nomenclature:

App-open data describe the conditions under which a mobility service’s app was opened: who opened the app, when and where they opened it, what kind of device they used, etc.
Search data describe searches users initiate to investigate a mobility service’s available resources. Like app-open data, search data typically contain information on who/when/where/how the search was made. In addition, search data may contain information on what the user actually searched for.

Some mobility providers record app-opens but not searches or vice-versa. App-opens almost always trigger searches; when I open a shared mobility app, I typically see something like this:

These images show apps from Lime and Bird, the largest US scooter operators, and Zipcar, the largest US car share operator; almost all shared mobility apps have similar home screens. Opening the app triggers a search so the user can be shown nearby vehicles. Therefore app-opens can be considered a specific kind of search; in this post I use “search” as an umbrella term which also encompasses app-opens.

With this broad definition of search in mind, we’re ready to elaborate the three reasons search data aren’t a great proxy for demand: 1) users can search without demand; 2) users can have demand without searching; 3) interpreting search data is difficult.

Search without demand

There are myriad ways in which a user can trigger a search when they don’t actually want to use or reserve a vehicle. We’ve already encountered one of these ways: app-opens trigger searches. But users open apps for many reasons beyond immediately trying to use a vehicle: to investigate pricing or availability, to review past rides, in response to notifications, to update profile or payment information, accidentally, etc. Therefore, unless you believe the spatio-temporal distribution of these non-demand app-opens has exactly the same shape as the spatio-temporal distribution of demand (my prior on this is extremely low; your mileage may vary), non-demand app-open searches distort the search data so that they no longer perfectly match latent demand.

Each particular type of non-demand app-open likely represents only a small fraction of app-opens and total searches, but they add up. More generally, there are lots of ways in which users can trigger searches without having immediate demand, and in aggregate these non-demand searches likely distort the search data enough that they no longer represent latent demand well. Here are a few other kinds of non-demand searches:

Every time a user drags a map in-app, a new search is triggered to display the available vehicles within the updated region shown. My PhD dissertation considered car sharing search data, and in my research I found that this kind of “map drag” search made up a plurality of total searches! And search data can’t be cleaned by simply discarding map drag searches because sometimes such searches do represent user intent: a user will drag the map to exactly where they want a vehicle.
A user can search to investigate the pricing or availability of vehicles without any intent to actually use one of the vehicles the search might reveal.
A user can search accidentally, for example by mistyping the address or time at which they would like a vehicle.
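To make the distortion concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the zone names, the `trigger` labels ("app_open", "map_drag", "explicit"), and the handful of toy events are invented for illustration and are not real search data. The sketch only shows that the spatial picture changes depending on which searches you count. It is not a cleaning recipe; as noted above, some map drag searches do carry intent, so filtering down to explicit searches discards signal too.

```python
from collections import Counter

# Hypothetical search log as (zone, trigger) pairs. All values are made up.
searches = [
    ("downtown", "app_open"), ("downtown", "map_drag"), ("downtown", "map_drag"),
    ("downtown", "explicit"), ("campus", "explicit"), ("campus", "explicit"),
    ("campus", "app_open"), ("suburb", "map_drag"), ("suburb", "map_drag"),
]

def zone_shares(events):
    """Return each zone's fraction of the given search events."""
    counts = Counter(zone for zone, _ in events)
    total = sum(counts.values())
    return {zone: n / total for zone, n in counts.items()}

def trigger_shares(events):
    """Return each trigger type's fraction of the given search events."""
    counts = Counter(trigger for _, trigger in events)
    total = sum(counts.values())
    return {trigger: n / total for trigger, n in counts.items()}

# Spatial distribution using every logged search.
all_shares = zone_shares(searches)

# Spatial distribution using only searches the user typed in explicitly.
explicit_shares = zone_shares([e for e in searches if e[1] == "explicit"])

print(trigger_shares(searches))
print(all_shares)
print(explicit_shares)
```

In this toy log the suburb accounts for a share of all searches but for none of the explicit ones. Neither view is "the" demand distribution; the point is that the answer depends heavily on a choice the raw search count hides.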

Demand without search

It’s also quite common for a user to have demand for a mobility service without conducting any searches. For example, a user could leave their home on foot and head to a bakery for a tasty cookie, hoping to find a scooter on the way to speed up the trip. If the user doesn’t find a scooter on the way, then they’ll have unmet demand despite never searching for a vehicle. In addition, users only search intentionally when they have some hope of finding a vehicle. For example, imagine a user who would like to use scooters for their morning commute. If that user searches for a week but finds no vehicles near their home, they’ll likely give up searching indefinitely, even though they have demand every workday. As a result, search data almost surely fail to accurately capture demand in areas with historically low vehicle coverage.

More dramatically, users don’t search when there’s nothing to search for. That is, in any market that lacks a given mobility service, users will conduct almost no searches, even if they would like to use that service. In fact, I’m an example of this behavior. Zoba is based in Boston and I live across the river in Cambridge. Neither Boston nor Cambridge currently allows scooter companies like Bird or Lime to operate within city limits. So even though I have lots of demand for these kinds of micromobility services (they are, after all, very cool), I trigger exactly zero searches.
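The discouraged commuter described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a claim about real user behavior: the user has demand every day, searches once per day, and stops searching for good after `give_up_after` consecutive fruitless days. The function name, the parameter, and the threshold of five days (one work week) are all hypothetical.

```python
def searches_logged(demand_days, vehicle_available, give_up_after=5):
    """Count searches made by a user who has demand on every one of
    `demand_days` days but gives up permanently after `give_up_after`
    consecutive searches that find no vehicle (an assumed rule)."""
    searches = 0
    consecutive_failures = 0
    for day in range(demand_days):
        if consecutive_failures >= give_up_after:
            break  # discouraged: the demand persists, the searches stop
        searches += 1
        if vehicle_available(day):
            consecutive_failures = 0
        else:
            consecutive_failures += 1
    return searches

# Same underlying demand (20 workdays), very different search counts.
well_covered = searches_logged(20, lambda day: True)
never_covered = searches_logged(20, lambda day: False)

print(well_covered, never_covered)
```

Both simulated users want a scooter on all 20 days, yet the user in the uncovered area logs only a quarter as many searches. An operator reading raw search counts would conclude that the uncovered area has less demand, when the gap is produced entirely by supply.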

Difficulty of interpretation

Even in the cases where a search is intentionally triggered by a user because they want a vehicle, it can be hard to interpret what the search tells us about that user’s demands and preferences. Here are a few examples (all of which I’ve done myself or encountered in my academic research):

Users often search for vehicles at convenient locations such as landmarks or their current locations (recall the automatically triggered app-open searches) rather than the exact locations where they are hoping to find vehicles.
Users often make repeated almost-identical searches, especially if the offered price for a vehicle or ride may change search-to-search. Are repeated searches evidence of high demand or evidence of high price sensitivity?
A user can see a vehicle and spontaneously decide to use it. As the user pulls out their phone and opens the app, they trigger a search. In other words, the supply of vehicles has a causal effect on the searches performed.
Users make sequential searches when trying to find a vehicle that meets their needs. Suppose a user searches at one location, finds nothing, and then searches at a second location. Should the second search count towards that user’s demand? They probably wanted a vehicle at the second location enough to search for it, but had they found a vehicle they liked via the first search, they wouldn’t have continued searching.
Suppose a user searches for a vehicle, finds a vehicle matching their search, and then doesn’t use that vehicle. Why? Did they really want the vehicle, but it was too expensive? Were they willing to use the vehicle at the offered price but something came up last minute? Were they just exploring options?

These last two bullets illustrate an important point: search is cheap. Triggering a search costs a user just a few seconds and no money, so users trigger lots of searches, including searches which don’t perfectly match their preferences. In contrast, actually using a vehicle is relatively expensive in time and money, so users are incentivized to only use a vehicle when doing so meaningfully satisfies their needs.
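Because search is cheap, raw search counts can overstate how many distinct needs a user had. One common preprocessing idea, sketched below in Python, is to group a user's searches into sessions and count sessions instead of individual searches. This is an illustrative assumption, not Zoba's method: the function name and the 10-minute gap threshold are hypothetical, and sessionizing does not answer the interpretive questions above (for example, which location in a session the user actually wanted).

```python
from datetime import datetime, timedelta

def count_sessions(timestamps, max_gap=timedelta(minutes=10)):
    """Count bursts of searches for one user. A new session starts whenever
    the gap since the previous search exceeds `max_gap` (an assumed cutoff)."""
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > max_gap:
            sessions += 1
        previous = ts
    return sessions

# Hypothetical search times for a single user on one day.
one_user = [
    datetime(2019, 8, 14, 8, 0),
    datetime(2019, 8, 14, 8, 1),
    datetime(2019, 8, 14, 8, 3),
    datetime(2019, 8, 14, 17, 30),
]

raw_count = len(one_user)
session_count = count_sessions(one_user)

print(raw_count, session_count)
```

Here four searches collapse into two sessions: a morning burst and a single evening search. The result is sensitive to the gap threshold, which is one more modeling choice that search-based demand estimates have to defend.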

Conclusion

The central premise of the “demand = search” argument is that users search if and only if they have demand. We’ve seen that users can search without demand for mobility and can have demand without searching. Therefore, this premise doesn’t universally hold, and we need to be careful when interpreting search data. Even in cases where a user searches because they have demand, the exact searches they make result from complicated interactions between the user’s preferences, the user’s price sensitivity, the supply of available vehicles, and the software the user relies on to perform the searches.

As a result, search data do indeed contain information about demand, but accessing that demand signal requires disentangling the many confounding factors which plague search data. Given these challenges, search data are a fraught foundation on which to build demand estimation models. Usage data, such as ride and booking data, are low-noise and unambiguously capture what users were willing to pay for. At Zoba, we’ve chosen to build our demand models on top of these data² while continuing to explore how search data might enhance our models.

Addendum: what about Uber?

Hold up a minute! What about Uber? Don’t they use search data to estimate demand, and aren’t they, like, really good at spatial data science? Uber does use search data as part of their demand estimation strategy, but there are a couple of important distinctions to be made between ride-hailing services like Uber and other mobility offerings. Perhaps most importantly, searches for Uber mean something different than searches for most other mobility providers. When a user searches for an Uber ride, they explicitly indicate to Uber their origin and destination; Uber responds with a cost in dollars and minutes to serve the desired trip. In contrast, when a user searches within a shared mobility service, they’re exploring what supply is available rather than requesting resources to meet a specified need. So in some sense, Uber searches are representative of user desires in a way that shared mobility searches may not be. Nonetheless, even Uber searches are subject to many of the confounding factors outlined above, which is perhaps a part of why Uber employs hundreds of data scientists and needs a demand modeling team.

Footnotes:

¹ Besides “I heard Uber uses search so I probably should too.” More on this in the addendum.

² To be clear, the utilization data are not the same as demand. They are, however, a highly reliable source from which to infer demand using specialized models, which is one of Zoba’s core competencies.

Zoba is developing the next generation of spatial analytics in Boston. If you are interested in spatial data, urban tech, or mobility, reach out at zoba.com/careers.
