Bikes, scooters, and personal data: Protecting privacy while managing micromobility

Jascha Franklin-Hodge
Oct 8, 2018
Scooters in Santa Monica, CA after the city launched its pilot program. (Source: Santa Monica Next)

Over the past year, U.S. cities have seen an influx of dockless electric bikes and scooters from companies like Bird, Lime, and Spin. Even rideshare pioneers Uber and Lyft have jumped into the micromobility business. In response, cities have quickly crafted policies and license programs that accommodate these new services while balancing civic concerns over public safety and public space. Data is key to successfully managing micromobility programs, but with it come risks to a cornerstone of civic life: individual privacy.

Data Use and Reidentification

Many cities require operators of dockless services to provide information about trips and vehicles, typically with precise details of when and where the vehicles are used. As I wrote in “The Scooter Data Opportunity”, there are many valuable uses for this data, including:

  • Enforcement of program rules such as caps on the number of vehicles and allowed service area;
  • Management of vehicle parking and monitoring of fleet safety;
  • Planning for infrastructure investments such as bike lanes or designated parking areas; and
  • Understanding the role of micromobility in the larger transportation system.

Regulations for dockless services are still evolving, and a complete picture of how cities will use data has not yet emerged. However, it is clear that accurate data will be a part of any regulatory program. At scale, it may be impossible to enforce rules, such as prohibited riding or parking zones, without precise information about vehicle movements.
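To make the enforcement point concrete, here is a minimal sketch of how a city analyst might check whether trips end inside a prohibited parking zone. It uses a standard ray-casting point-in-polygon test. The zone coordinates, the function names, and the (longitude, latitude) tuple layout are all assumptions made up for illustration; they do not come from any operator's actual data feed.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude); an assumed layout


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def trips_ending_in_zone(trip_ends: List[Point], zone: List[Point]) -> List[Point]:
    """Return the trip end points that fall inside the zone."""
    return [p for p in trip_ends if point_in_polygon(p, zone)]


# Hypothetical rectangular no-parking zone and two made-up trip end points.
NO_PARKING_ZONE = [(-118.50, 34.00), (-118.49, 34.00),
                   (-118.49, 34.01), (-118.50, 34.01)]
trip_ends = [(-118.495, 34.005), (-118.48, 34.005)]
violations = trips_ending_in_zone(trip_ends, NO_PARKING_ZONE)
```

A check like this only works if the reported end points are precise enough to place a vehicle on one side of a zone boundary or the other, which is exactly why enforcement and privacy pull in different directions.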

Designated scooter parking in Long Beach, CA (Photo credit: Dongho Chang)

While micromobility data is typically anonymized, recent research has shown that certain types of anonymous location data can be re-identified when combined with other data sources. A research project using New York City taxi cab records allowed the identification of several patrons of “gentlemen’s clubs” based on travel patterns observed in the anonymous data. And in the landmark study Unique in the Crowd: The privacy bounds of human mobility, researchers using location information from cell towers were able to uniquely identify 95% of individuals using just four data points.

While neither of these studies used data identical to that provided by bike and scooter companies, they nevertheless illustrate the potential risk of seemingly innocuous information. As cities receive more data, the questions of what they should request and how they should protect it become more urgent.

Levels of Risk

Not all data is created equal.

The riskiest mobility data is that which contains personally identifiable information. It’s easy to understand how a precise log of all your trips throughout the city would allow someone to retrace your movements, learn about your habits and preferences, and intrude on your life in innumerable ways. Even if a person’s name is replaced with an anonymous ID number, it is often possible to reidentify the individual by looking for patterns. For example, if a dataset reveals that someone starts each day at a certain place and returns there at the end of the day, one can reasonably assume this is their home. A quick search of property records or other public data sources could reveal their name, spouse, other real estate holdings, and more.
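The home-location pattern is simple enough to sketch in a few lines, which is part of what makes linked data risky. The example below assumes a hypothetical dataset in which each trip carries a pseudonymous `rider_id`, a `start_time`, and a coarse `origin` label; those field names and the sample records are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def likely_home(trips):
    """For each pseudonymous rider, return the most common first origin of the day."""
    first_of_day = {}
    # Walk trips in time order and keep only the first origin per rider per day.
    for t in sorted(trips, key=lambda t: t["start_time"]):
        key = (t["rider_id"], t["start_time"].date())
        first_of_day.setdefault(key, t["origin"])
    per_rider = defaultdict(Counter)
    for (rider, _day), origin in first_of_day.items():
        per_rider[rider][origin] += 1
    return {rider: counts.most_common(1)[0][0] for rider, counts in per_rider.items()}


# Made-up records: rider "r1" starts two of three days from "block_A".
sample_trips = [
    {"rider_id": "r1", "start_time": datetime(2018, 10, 1, 8, 5), "origin": "block_A"},
    {"rider_id": "r1", "start_time": datetime(2018, 10, 1, 17, 30), "origin": "block_C"},
    {"rider_id": "r1", "start_time": datetime(2018, 10, 2, 8, 10), "origin": "block_A"},
    {"rider_id": "r1", "start_time": datetime(2018, 10, 3, 9, 0), "origin": "block_B"},
]
guesses = likely_home(sample_trips)
```

No name appears anywhere in the input, yet the output is a short list of places to look up in public records.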

On the lowest end of the risk spectrum is aggregated data. A summary report showing how many trips were taken between neighborhoods in a given day is unlikely to reveal anything personal about the people who made those trips. Provided that enough data points from many individuals are combined, aggregated data carries very little privacy risk.
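As a rough illustration of what "enough data points" can mean in practice, the sketch below counts trips between neighborhood pairs and drops any pair with fewer than a minimum number of trips. The threshold of 5, the `origin` and `destination` field names, and the sample data are assumptions chosen for the example, not a recommended standard.

```python
from collections import Counter

MIN_COUNT = 5  # hypothetical suppression threshold; pick one to fit local policy


def aggregate_trips(trips, min_count=MIN_COUNT):
    """Count trips per (origin, destination) pair, suppressing small cells."""
    counts = Counter((t["origin"], t["destination"]) for t in trips)
    # Cells below the threshold are dropped rather than published.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up data: six trips from Downtown to Midtown, two from Downtown to Harbor.
sample = (
    [{"origin": "Downtown", "destination": "Midtown"}] * 6
    + [{"origin": "Downtown", "destination": "Harbor"}] * 2
)
summary = aggregate_trips(sample)
```

One caveat: with unlinked trip records, a count of trips is not a count of people, so a cell that clears the threshold could still reflect a single frequent rider.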

The bike and scooter data received by cities today falls somewhere between these two extremes. Although cities capture individual trip data (along with the precise locations where trips started and ended), they generally do not ask for any kind of user identification. These trips are what are known as “unlinked,” meaning there is no easy way to know that two or more trip records were generated by the same person.

Bike and scooter data most closely resembles the detailed trip records that many cities already receive from taxi fleet operators. Information from dockless services is perhaps slightly less sensitive, as most bike and scooter trips begin at the location of the nearest available vehicle (rather than at the person’s true point of origin as they would with a taxi trip).

But no matter the source, careful pattern analysis of anonymous, unlinked trip data can sometimes tease out information about individuals in ways potentially perilous to users, as demonstrated by the gentlemen’s club example cited above.

Animated map of Bird scooters in DC created by Conor McLaughin using public data.

Protecting Privacy

Given the importance of detailed information in the administration of bike and scooter programs, it is unrealistic to ask cities to forgo data gathering and analysis. But there are steps they can take to manage privacy risk:

  • Be thoughtful about data you request. Cities should consider the ways they intend to use information. Because micromobility regulations are still evolving, cities may choose to ask for more data than they might need today. This provides future flexibility, but it also increases risk. Cities should weigh this tradeoff carefully, in the context of local laws and attitudes about privacy.
  • Be transparent about data use. Clearly communicating how the city uses data helps reassure the public and establishes guardrails. Cities should publish a plain-language policy that explains what data is collected, who has access, what it will be used for, how long it will be retained, and when/how it will be shared with other agencies, law enforcement, or the public. As a condition of licensing, cities should require operators to clearly inform their users about data sharing practices.
  • Follow good information security practices. Trip data should be treated as sensitive, which means protecting it from hackers or accidental release. Agencies collecting mobility data should engage their IT teams to make sure good security practices are used when storing, sharing, or analyzing data.
  • Don’t make detailed data public. Aggregated data is generally safe to publish. To the extent allowed by local public records/sunshine laws, the most detailed mobility data should not be released. It may be possible to argue against release under the “personal information” exemptions found in many laws. When releasing individual trip records, it’s a good idea to reduce their precision to lower the risk of reidentification. One simple method is to round times to the nearest hour and publish GPS coordinates at no more than two or three decimal places of precision.
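The precision-reduction method in the last item can be sketched as follows. The trip field names (`start_time`, `start_lat`, and so on) and the sample record are hypothetical; a real feed will have its own schema.

```python
from datetime import datetime, timedelta


def round_to_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest hour (30 minutes or more rounds up)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored


def coarsen_trip(trip: dict, decimals: int = 3) -> dict:
    """Return a copy of a trip record with reduced time and location precision."""
    coarse = dict(trip)
    for key in ("start_time", "end_time"):
        coarse[key] = round_to_hour(trip[key])
    for key in ("start_lat", "start_lon", "end_lat", "end_lon"):
        coarse[key] = round(trip[key], decimals)
    return coarse


# Made-up trip record for illustration.
trip = {
    "start_time": datetime(2018, 10, 8, 8, 47, 12),
    "end_time": datetime(2018, 10, 8, 9, 2, 40),
    "start_lat": 34.019456, "start_lon": -118.491234,
    "end_lat": 34.024871, "end_lon": -118.496502,
}
public_trip = coarsen_trip(trip)
```

For a sense of scale, 0.001 degrees of latitude is roughly 111 meters and 0.01 degrees is roughly 1.1 kilometers, so the choice between two and three decimal places is a choice between neighborhood-level and block-level precision.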

The Open Data Privacy playbook, published last year by the Berkman Klein Center at Harvard University, offers a comprehensive roadmap for cities working with potentially sensitive data.

A data visualization published in DC comparing Capital Bikeshare and dockless service usage by ward. (Source: DDOT)

Looking Ahead

Beyond shared bikes and scooters, personal mobility of all kinds increasingly generates data — much of it valuable for governments charged with managing public streets and providing oversight for privately-operated services. In a world where all mobility is digital, three approaches may help manage privacy risks:

  1. Create legal protections for mobility data. Governments deal with highly sensitive information all the time: health records, criminal justice information, student data. These are governed by legal regimes that limit how data can be used and shared, and specify what steps must be taken to protect it. We may need a similar law for mobility data — one that lets government do its job while providing people with reasonable assurances that their information will be kept safe.
  2. Entrust data to a third-party organization. The nonprofit organization Shared Streets recently announced a partnership with Uber and Lyft to make data about rideshare usage and road speeds available to cities. These companies have previously resisted data sharing with cities on privacy and commercial grounds. Shared Streets was able to get data access by providing a mechanism for sharing with built-in privacy protections. Alternatively, Sean McDonald has proposed the idea of a Civic Trust, a novel legal structure in which data would be held for the benefit of the public and overseen by an open, participatory governance structure that would decide how it could be used and shared.
  3. Use data science to reduce risk. Data privacy and reidentification are relatively new fields of study. As our understanding evolves, researchers are developing techniques such as differential privacy and synthetic data vaults to allow for deep analysis while protecting personal information. As these techniques mature and become more accessible to cities, they may be useful in the mobility space.
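To give a flavor of the third approach, here is a minimal sketch of the Laplace mechanism, one of the basic building blocks of differential privacy, applied to a single trip count. It is a toy under stated assumptions, not a production design: the function names are mine, the sensitivity of 1 assumes that adding or removing one trip changes the count by at most one (so it protects individual trips, not riders who take many trips), and the choice of `epsilon` is a policy decision the code cannot make for you.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    # expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random = None) -> float:
    """Return the count plus Laplace noise with scale 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    # Assumed sensitivity of 1: one trip changes the count by at most 1.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up example: publish a noisy version of a neighborhood's daily trip count.
published = noisy_count(true_count=240, epsilon=0.5)
```

Smaller values of `epsilon` add more noise and give stronger protection; larger values keep the published number closer to the truth. Real deployments also have to track the privacy budget spent across every statistic released, which this sketch does not attempt.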

Cities must be mindful of risk, but should embrace the use of micromobility data. It is not a binary choice between privacy and effective management of bike and scooter services. With a careful approach, cities can do their job of enhancing public space and public safety, while also protecting individual privacy.

With appreciation for the people who offered me their perspective on this topic, including: Andrew Salzberg, Ben Green, Benjie de la Pena, Dmitry Yakovlev, Emily Castor Warren, Hunter Owens, Jan Whittington, Kade Crockford, Kevin Webb, Marcel Porras, Michael Schwartz, Michal Migurski, Mollie Pelon McArdle, Paul Supawanich, Russ Brooks, Stephanie Dock, and Todd Petersen.

Jascha Franklin-Hodge

Mobility + Tech. Former Visiting Fellow at the Harvard Kennedy School, Chief Information Officer for the City of Boston, co-founder of Blue State Digital. @jfh