Better, Faster, Smoother: Collecting Data for Transportation

Good Data Initiative
Jan 20, 2021

Advances in modern transportation technology rely heavily on collected data, whether for simple tasks such as notifying users of free parking spaces or for more complex applications such as calculating rideshare apps’ surge pricing. In this week’s essay, GDI Research Analyst and engineering master’s student Shahvez Ul Haq walks us through several examples of the often-overlooked technical side of transportation: how different types of data are collected and used to optimize travel and reduce urban traffic congestion.

Mapping the City with Smartphones

The first common method of collecting transportation data is right in your hand. Small and mighty, smartphones enable their users to take an active role in planning routes using common mapping applications such as Google Maps and Apple Maps, or smaller, privately developed public transport apps such as Citymapper, Transit, and Moovit. These applications ask users to agree to terms and conditions under which each smartphone’s location services track the user’s location. The resulting data is used both to advise users on their best route and, once anonymized and aggregated, to analyze conditions on the ground. When combined with existing map data (see Wired’s 2014 article on the Google Maps ‘Ground Truth’ effort for a look behind the scenes), these live data points become new variables, such as the current travel time along each road, that each app’s algorithm uses to find the shortest path, measured in travel time, between two points.
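
To make the routing idea concrete, here is a minimal Python sketch of one standard shortest-path technique, Dijkstra’s algorithm, where each road’s weight is its current travel time (length divided by the live average speed reported by phones). The four-node network, the edge lengths, and the speeds are made up for illustration; real mapping apps work on far larger graphs and use their own, unpublished algorithms.

```python
import heapq

# Hypothetical road network: node -> list of (neighbour, length in metres).
ROADS = {
    "A": [("B", 1000), ("C", 1500)],
    "B": [("D", 1000)],
    "C": [("D", 1000)],
    "D": [],
}


def travel_time_s(length_m, speed_kmh):
    """Seconds needed to cover length_m at speed_kmh."""
    return length_m / (speed_kmh / 3.6)


def fastest_route(graph, speeds_kmh, start, goal):
    """Dijkstra's algorithm with edge weights equal to live travel time.

    speeds_kmh maps (from_node, to_node) -> current average speed reported
    by phones on that road. Returns (total_seconds, path), or
    (inf, []) if the goal cannot be reached.
    """
    queue = [(0.0, start, [start])]
    settled = {}
    while queue:
        elapsed, node, path = heapq.heappop(queue)
        if node == goal:
            return elapsed, path
        if node in settled:
            continue  # already reached this node by a quicker route
        settled[node] = elapsed
        for nxt, length in graph[node]:
            t = elapsed + travel_time_s(length, speeds_kmh[(node, nxt)])
            heapq.heappush(queue, (t, nxt, path + [nxt]))
    return float("inf"), []
```

With every road flowing at 50 km/h the sketch picks A-B-D (about 144 s); if phones on A-B report only 5 km/h, the same call switches to A-C-D (about 180 s), even though that route is physically longer.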

Understanding how this passive side of transportation data is collected prompted Berlin artist Simon Weckert to create his 2020 performance and installation piece, called “Google Maps Hack”:

“99 second hand [sic] smartphones are transported in a handcart to generate virtual traffic jam in Google Maps. Through this activity, it is possible to turn a green street red which has an impact in the physical world by navigating cars on another route to avoid being stuck in traffic.” #googlemapshacks

Text and image via artist Simon Weckert

Through this unique art, Weckert makes visible how transportation data from smartphones is collected and used to optimize travel within urban locations like Berlin.
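
The effect Weckert exploited can be illustrated with a hypothetical aggregation step: phones report their speed on a road segment, the median report is compared with the segment’s free-flow speed, and the ratio decides the colour shown on the map. The segment name, the 50 km/h free-flow speed, and the 0.75 and 0.4 cut-offs are assumptions for this sketch, not Google’s actual rules.

```python
from collections import defaultdict
from statistics import median

# Hypothetical free-flow speeds per road segment (km/h).
FREE_FLOW_KMH = {"segment_1": 50.0}


def classify_segments(reports, free_flow=FREE_FLOW_KMH):
    """Colour each segment from anonymous (segment_id, speed_kmh) reports.

    The median reported speed is divided by the free-flow speed; the
    resulting ratio is mapped to "green", "orange", or "red" using
    illustrative thresholds.
    """
    speeds = defaultdict(list)
    for segment, speed in reports:
        speeds[segment].append(speed)

    colours = {}
    for segment, values in speeds.items():
        ratio = median(values) / free_flow[segment]
        if ratio >= 0.75:
            colours[segment] = "green"
        elif ratio >= 0.4:
            colours[segment] = "orange"
        else:
            colours[segment] = "red"
    return colours
```

Ten cars reporting 48 km/h leave the segment green; adding 99 phones moving at walking pace (4 km/h) drags the median down to 4 km/h and turns the segment red, even though no car is actually stuck.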

Ridesharing Apps: Optimizing Routes and Economic Gains

For those who would prefer to let someone else drive, smartphones have also opened a new world of convenience. Traveling between locations has never been easier — all you have to do is open a ridesharing app, type your destination, and click ‘Find ride’.

The experience is purposefully designed to be seamless. While you wait for your cab, a team of data scientists at one of the many app-based ridesharing companies is working on existing challenges (such as better route optimization), on improving the user experience based on your feedback data, and on refining geo-mapping beyond even what the mapping applications mentioned above offer. Machine learning is used here not only to improve the current user experience but also to help these companies’ systems learn to serve customers better in the future.

The process itself is straightforward. After a ride is booked, the company’s AI team receives an enormous volume of information about you, from your preferred pick-up points and most frequent destinations to your behaviour, preferences, interests, and even your phone’s battery level. Using this data, the team estimates demand, allocates resources, and sets fares to maximize profit according to the current “surge” level. This pricing mechanism, known as ‘surge pricing’ or ‘dynamic pricing’, comes into play whenever demand for rides outstrips the supply of available drivers. The variables taken into account include public events, time of day (peak versus off-peak hours), traffic conditions, emergencies, and weather. While the exact algorithm for calculating these prices is not public, Uber has applied for a patent on its method of calculating surge pricing.
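
Since the real algorithm is not public, the following is only a toy model of the supply-and-demand idea behind surge pricing: the multiplier grows with the ratio of open ride requests to available drivers, never falls below 1.0, and is capped. The function names, the 3.0 cap, and the rounding rule are hypothetical; a production system would also fold in the other variables listed above, such as events, weather, and traffic.

```python
def surge_multiplier(ride_requests, available_drivers, cap=3.0):
    """Toy surge model (hypothetical; not Uber's patented method).

    The multiplier equals the demand/supply ratio, floored at 1.0,
    capped at `cap`, and rounded to one decimal place.
    """
    if available_drivers <= 0:
        return cap  # no supply at all: charge the maximum multiplier
    ratio = ride_requests / available_drivers
    return round(min(max(1.0, ratio), cap), 1)


def surged_fare(base_fare, multiplier):
    """Apply a surge multiplier to a base fare, rounded to cents."""
    return round(base_fare * multiplier, 2)
```

For example, 90 open requests and 60 free drivers give a multiplier of 1.5, so a 12.00 base fare becomes 18.00; when drivers outnumber requests the multiplier stays at 1.0.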

This algorithmically optimized pricing encourages drivers to be available for rides in areas where higher demand is expected, and to stop when demand drops. Uber also collects data from its drivers irrespective of whether they are carrying passengers. Multiple variables, including the driver’s speed, acceleration, location, ride data, and whether they also work for a competitor (such as Ola), are all gathered and, as recent court cases have shown, have been used by the companies in lawsuits against their former drivers. Such extensive data collection applies primarily to ridesharing apps such as Uber, Ola, and Bolt; it might also apply to smaller cab operators such as London’s black cabs, although little information is currently available about the latter.
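
Speed and acceleration do not require dedicated sensors; they can be derived from timestamped GPS fixes. The sketch below assumes fixes arrive as (time in seconds, latitude, longitude) tuples, uses the haversine formula for distance, and flags “harsh braking” with a hypothetical threshold of -3 m/s²; none of it is taken from Uber’s actual telematics pipeline.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_and_accels(fixes):
    """Derive motion from time-sorted (t_seconds, lat, lon) fixes.

    Returns (speeds in m/s for each interval between fixes,
             accelerations in m/s^2 between consecutive intervals).
    Each speed is attributed to the midpoint time of its interval.
    """
    speeds, mid_times = [], []
    for i in range(1, len(fixes)):
        t0, lat0, lon0 = fixes[i - 1]
        t1, lat1, lon1 = fixes[i]
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / (t1 - t0))
        mid_times.append((t0 + t1) / 2)
    accels = [
        (speeds[i] - speeds[i - 1]) / (mid_times[i] - mid_times[i - 1])
        for i in range(1, len(speeds))
    ]
    return speeds, accels


def harsh_braking_events(accels, threshold=-3.0):
    """Indices where deceleration reaches a hypothetical threshold (m/s^2)."""
    return [i for i, a in enumerate(accels) if a <= threshold]
```

A driver covering 20 m, 20 m, and then only 5 m in successive seconds would produce speeds of roughly 20, 20, and 5 m/s, and the second acceleration value (about -15 m/s²) would be flagged.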

While this collected data may optimize both travel routes and economic gains for rideshare companies, privacy concerns remain at the forefront of debates about the breadth of data collected and used through these apps. It is also worth remembering that these technologies put users without a smartphone, or users who do not want to agree to the terms of service, at a disadvantage: they cannot access the services, even as the services themselves become a routine part of everyday urban life.

Public Transportation and Data Collection

Public transportation networks have similarly moved towards collecting user-generated data to gain insights into how local transport systems are being used. For example, public transport ticketing systems provide local governments with real-time data for monitoring the transport sector. Transport for London (TfL), for instance, has introduced ‘tap & go’ contactless ticketing through its Oyster card and contactless Travelcards, moving away from single-use tickets. With such systems, data on each passenger passing through a station is sent to the system operator as soon as they tap in or out.

This data is primarily used to track the number of people entering or leaving a train station or bus stop, in order to gauge the total number of people using public transport at a given point in time. Analysing this rich data enables planners to identify recurring points of congestion, track long-term usage trends, and, from there, draw conclusions that will help inform the design of better transport networks in the future.¹
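
As a minimal sketch of the counting described above, the functions below tally tap-in events per station and hour and report the busiest combination. The record layout (card ID, station, ISO timestamp, direction) and the station names are assumptions for illustration, not TfL’s actual data schema.

```python
from collections import Counter
from datetime import datetime


def hourly_entries(taps):
    """Count 'in' taps per (station, hour of day).

    taps: iterable of (card_id, station, iso_timestamp, direction),
    where direction is "in" or "out" (a hypothetical record layout).
    """
    counts = Counter()
    for _card, station, ts, direction in taps:
        if direction == "in":
            hour = datetime.fromisoformat(ts).hour
            counts[(station, hour)] += 1
    return counts


def busiest(counts):
    """Return (station, hour, entries) for the highest count."""
    (station, hour), n = counts.most_common(1)[0]
    return station, hour, n
```

Run over a full day of taps, the same counts would show the morning and evening peaks at each station; repeated over months, they support the long-term trend analysis mentioned above.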

As GDI Research Analyst Meghan Keenan noted in her earlier essay on TfL’s use of contactless cards, valid concerns exist over the collection and use of this passenger information. For example, researchers in Melbourne, Australia, found that data from the “myki” public transport card, used in conjunction with social media, could reveal significant personal information about card users, including a local Australian MP². Such events highlight the need for publicly operated systems to couple strong data security with regular critical reviews of what data must be collected (for operational purposes, network development, etc.), how it is stored, and for how long.
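
The kind of re-identification described above is often called a linkage attack: a few public facts about one person (for example, where and when they posted from a station) are matched against “de-identified” trip records until only one card fits. The toy records and the hour-level granularity below are invented for illustration.

```python
def matching_cards(trips, known_sightings):
    """Return card IDs whose trips include every known sighting.

    trips: (card_id, station, date, hour) de-identified tap records.
    known_sightings: a few (station, date, hour) points about one person,
    e.g. gleaned from public social media posts.
    """
    by_card = {}
    for card, station, date, hour in trips:
        by_card.setdefault(card, set()).add((station, date, hour))
    wanted = set(known_sightings)
    return sorted(card for card, seen in by_card.items() if wanted <= seen)
```

In the toy data used to check this sketch, one sighting still leaves two candidate cards, but a second sighting narrows the result to a single card, after which every other trip on that card is exposed.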

Vehicles and Traffic Monitoring

Unlike the user data generated by smartphones and contactless cards within ridesharing apps and public transportation networks, vehicle and traffic monitoring seeks to capture data about conditions on the road itself. One of the oldest methods for doing this, dating back to a 1799 invention by George Medhurst, is the use of pneumatic tubes. The tubes are laid across the road one meter apart, perpendicular to the direction of travel (to reduce detection errors), on straight road segments with a flat pavement surface in good condition. This remarkably simple system costs about $30,000 and is widely used throughout the world to monitor traffic in conjunction with traffic lights.
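
The arithmetic behind a two-tube counter is simple: each axle hits the first tube and then the second, so with the one-meter spacing described above, speed is the spacing divided by the delay between the two hits, and the distance between consecutive axles is that speed multiplied by the time between their hits on the first tube. The sketch groups axles into vehicles using a hypothetical 6 m maximum axle gap; commercial counters apply their own classification schemes.

```python
TUBE_SPACING_M = 1.0  # distance between the two tubes, as in the text


def axle_speeds(hits_a, hits_b, spacing_m=TUBE_SPACING_M):
    """Speed (m/s) of each axle from its travel time between the tubes.

    hits_a[i] and hits_b[i] are the times (s) at which axle i crossed
    the first and second tube respectively.
    """
    return [spacing_m / (tb - ta) for ta, tb in zip(hits_a, hits_b)]


def group_into_vehicles(hits_a, speeds, max_axle_gap_m=6.0):
    """Group consecutive axles into vehicles (lists of axle indices).

    An axle that follows the previous one by no more than max_axle_gap_m
    (a hypothetical threshold) is assigned to the same vehicle.
    """
    vehicles = [[0]] if hits_a else []
    for i in range(1, len(hits_a)):
        gap_m = speeds[i] * (hits_a[i] - hits_a[i - 1])
        if gap_m <= max_axle_gap_m:
            vehicles[-1].append(i)
        else:
            vehicles.append([i])
    return vehicles
```

For instance, a car at 10 m/s with a 2.5 m wheelbase followed three seconds later by one at 20 m/s produces four axle hits, which the sketch groups into two vehicles.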

Another deceptively simple yet innovative method of data collection uses in-ground and surface-mounted sensors, which monitor individual parking spaces and relay their occupancy status to gateways. The gateways in turn send live status information to a cloud platform, allowing real-time parking information to be viewed on multiple devices and digital boards. Vehicle detection sensors enable efficient parking, save drivers time, and help avoid congestion in parking lots. Moreover, this data can be extrapolated to create an overview of traffic in an area.
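
The sensor-to-cloud flow described above can be sketched as a small state model: each event relayed by a gateway marks one space as occupied or free, and the cloud platform derives the free-space count shown on digital boards. The class, field names, and event format are hypothetical, and the `set[str]` annotation assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class ParkingLot:
    """Cloud-side view of one lot, built from gateway-relayed sensor events."""
    name: str
    spaces: set[str]
    occupied: set[str] = field(default_factory=set)

    def apply_event(self, space_id: str, is_occupied: bool) -> None:
        """Update one space from a sensor's occupancy reading."""
        if space_id not in self.spaces:
            raise ValueError(f"unknown space {space_id!r}")
        if is_occupied:
            self.occupied.add(space_id)
        else:
            self.occupied.discard(space_id)

    @property
    def free(self) -> int:
        """Number of spaces currently reported as free."""
        return len(self.spaces) - len(self.occupied)


def board_summary(lots):
    """Free-space counts per lot, as a digital board might display them."""
    return {lot.name: lot.free for lot in lots}
```

Because each event carries the full state of one space, replaying the same event twice leaves the count unchanged, which suits networks where gateways may resend messages.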

Detecting vehicles and pedestrians is one of the most important traffic monitoring tasks. A 2018 study found that highly accurate vehicle detection can be achieved with sets of passive sensors that are low cost and simple to install. In conventional traffic monitoring systems, however, this task is carried out by intrusive detectors installed in the pavement, such as inductive loops and piezoelectric sensors, which I describe in more detail below. These are also low-cost sensors and, when paired with adaptive traffic lights, allow signal timings to be adjusted based on sensor input to reduce congestion:

  • Inductive loops are considered one of the most reliable traffic detection methods available, even when not linked to adaptive traffic lights. An inductive loop is a coil of wire installed in or under the road surface; when a metal object such as a vehicle passes over it, the loop’s inductance changes, and the connected detector registers the vehicle.
  • Piezoelectric sensors collect data by converting mechanical energy into electrical energy. When used to count vehicles, the sensor is mounted in a groove cut into the road surface. When a car drives over the sensor, it compresses it, generating a voltage whose size is proportional to the degree of deformation; when the car moves off, the voltage reverses. This change in voltage can be used to detect and count vehicles.
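
Both detectors produce a signal that rises while something is over them (the loop’s frequency shift, or the piezo strip’s voltage), and a common way to turn that signal into counts is threshold detection with hysteresis, so that noise near the threshold is not counted twice. In this sketch the on and off levels are hypothetical tuning parameters. For a loop, one pulse roughly corresponds to one vehicle, whereas a piezo strip produces a pulse per axle, which would then need grouping into vehicles.

```python
def count_pulses(samples, on_level, off_level):
    """Count pulses in a sampled sensor signal using hysteresis.

    A pulse is counted when the signal rises to on_level or above; the
    detector only re-arms after the signal falls to off_level or below,
    so small dips while a vehicle is still present are ignored.
    """
    if off_level >= on_level:
        raise ValueError("off_level must be below on_level")
    count, active = 0, False
    for value in samples:
        if not active and value >= on_level:
            count += 1
            active = True
        elif active and value <= off_level:
            active = False
    return count
```

For the trace [0, 0.2, 1.1, 1.4, 0.9, 0.6, 1.2, -0.8, 0, 0.1, 1.3, 0.2], `count_pulses(trace, 1.0, 0.3)` returns 2; raising the off level to 0.95 lets the brief dip to 0.9 re-arm the detector, and the same trace is counted as 3.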

Final Words

Collecting data to optimize travel and reduce urban traffic congestion has become both commonplace and critical to the daily operation of major cities around the world. Yet questions remain about how best to balance the utility these systems provide against ongoing data privacy concerns.

About the Author: Shahvez Ul Haq

Shahvez Ul Haq is reading an MPhil in Engineering for Sustainable Development at the University of Cambridge, and holds a BEng in Mechanical Engineering from Monash University. His current research interests focus on reducing carbon emissions through a swift transition to renewable energy and developing efficient transportation methods. Shahvez is currently a Research Analyst at GDI and also writes on cutting-edge technology research for the Science section of Varsity, Cambridge University’s largest student newspaper.

[1] For an example of the types of reports that are created using this data, see this 2017 report on Understanding and Managing Congestion in the Greater London Area.

[2] In a 2019 Compliance Notice, the Information Commissioner of Victoria (Australia) found that Public Transport Victoria had breached privacy laws by releasing 1.8 billion lines of de-identified travel data, covering 15 million cards used between July 2015 and June 2018, to support a data science competition in mid-2018.

Good Data Initiative

Think tank led by students from the Univ. of Cambridge. Building the leading platform for intergenerational and interdisciplinary debate on the #dataeconomy