Regulating the Quality (and Equality) of Taxi, Uber and Lyft Service by Algorithm

Eric Spiegelman
Apr 12, 2016

Two years ago, a friend of mine received an anonymous tip on a website he edited. The tipster, who claimed to be an Uber driver, alleged that Uber cut him off from picking up passengers in Washington D.C.’s Dupont Circle neighborhood because his driver rating fell below 4.7. Dupont Circle, he said, was one of the more lucrative areas to get a fare, and Uber’s restriction was designed to punish him, fiscally, until his customer service skills improved. My friend never determined whether this claim was true — but I want it to be true. If you can discipline a driver by cutting him off from a particular neighborhood, then you can also reward a driver for serving a particular neighborhood. And if an app can do the first one automatically, via algorithm, then it can do the second one algorithmically, as well.

The quality of Los Angeles taxicab service is not equal across neighborhoods. Our most notoriously “hard to serve” area is Taxicab Service Zone D, which is more commonly known as South Los Angeles, or (when I was a kid) South Central. Zone D is everything south of the 10, east of Culver City and north of the 91. It takes roughly twice as long to get a cab after having lunch at Locol, in Watts, as it does after seeing a movie at the ArcLight, in Hollywood. Years ago, the discrepancy was even worse. The only reason taxis provide even this level of service to Zone D is that City regulations force them to.

Taxi response times in Hollywood (Zone C) and South Los Angeles (Zone D) over time. Graphic by James Stanek.

The time it takes for a cab to show up after you call it is a measure of quality. (Think about how progressively irritated you get the longer it takes for one to show up.) Response time is also a function of supply: the more cars on the road, the less time you spend waiting. South Los Angeles has long response times because taxi drivers prefer to work in other neighborhoods. South LA has a lower population density than Hollywood. Its residents are less affluent. Racial discrimination is more prevalent. At City Hall, we believe that everyone should have equal access to transportation, so we require our taxi companies to keep a certain number of cabs operating south of the 10. We do that by setting a maximum average response time, and we monitor compliance through quarterly reviews of taxi service.
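
To make the quarterly review concrete, here is a minimal Python sketch of a zone-level check, assuming the standard works as a ceiling on each zone’s average response time. The 15-minute threshold, the `zone_compliance` function and the sample numbers are hypothetical placeholders, not the City’s actual figures or software.

```python
from statistics import mean

# Hypothetical ceiling on a zone's average response time, in minutes.
# The real standard is not given here; 15.0 is a placeholder.
MAX_AVG_RESPONSE_MIN = 15.0


def zone_compliance(response_times: dict[str, list[float]],
                    max_avg: float = MAX_AVG_RESPONSE_MIN) -> dict[str, tuple[float, bool]]:
    """Return each zone's average response time and whether it meets the standard."""
    report = {}
    for zone, times in response_times.items():
        avg = mean(times)                      # average wait for a cab in this zone
        report[zone] = (avg, avg <= max_avg)   # compliant if at or under the ceiling
    return report
```

With made-up waits of 8, 10 and 9 minutes in Zone C and 18, 22 and 17 minutes in Zone D, Zone C averages 9 minutes and passes, while Zone D averages 19 minutes and fails. A bot could run the same check after every trip rather than once a quarter.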

One of the benefits of algorithmic regulation is that gathering and analyzing data can happen far faster and more often than under traditional forms of regulation, since a bot is doing the work. In my last essay on the subject, I showed how an algorithm can monitor and regulate the safety of taxi service constantly instead of periodically. An algorithm can also monitor and regulate the quality of taxi service constantly instead of periodically. This is what Uber’s driver rating system is all about.

Average Uber driver ratings in San Francisco. Graphic from leaked report published in Business Insider.

At the end of an Uber ride, you’re given the chance to rate your driver on a scale from 1 to 5. These individual scores get averaged together into a driver’s overall rating, which Uber compares to all the other drivers nearby. Uber, essentially, grades its drivers on a curve. If a driver’s rating falls too far below the mean, the app automatically triggers a set of consequences that can result in deactivation.
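
Here is a minimal sketch of grading on a curve, in Python. The essay doesn’t say how far below the mean is “too far,” so the fixed 0.3-star margin and the `flag_below_curve` name are assumptions for illustration, not Uber’s actual rule.

```python
from statistics import mean


def flag_below_curve(driver_ratings: dict[str, list[int]],
                     margin: float = 0.3) -> list[str]:
    """Return drivers whose average rating is more than `margin` stars below
    the mean of all nearby drivers' averages (the curve)."""
    # Each driver's overall rating: the plain average of his 1-5 star ratings.
    averages = {driver: mean(ratings) for driver, ratings in driver_ratings.items()}
    # The local mean every nearby driver is graded against.
    local_mean = mean(list(averages.values()))
    # Hypothetical cutoff: a fixed margin below the local mean.
    cutoff = local_mean - margin
    return sorted(d for d, avg in averages.items() if avg < cutoff)
```

With three nearby drivers averaging 5.0, 4.75 and 3.5, the local mean is about 4.42, so only the 3.5 driver falls below the cutoff of about 4.12.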

In many East Coast jurisdictions, it works like this: A new Uber driver has to maintain a 4.3 average rating or higher during his first 25 trips. If he doesn’t, the app automatically deactivates his account. If his rating is above 4.3 but lower than 4.6, the app puts him on probation for the next 25 trips. If the driver doesn’t bring his average up to 4.6 by the end of the probationary period, the app automatically deactivates his account. Once he makes it past this point, an active Uber driver must continue to maintain a 4.6 average or higher, calculated over his most recent 100 trips. If his rating ever drops below 4.6, the app puts him on probation for the next 50 trips. The algorithm keeps watch.

Composite frames from Uber driver orientation video, illustrating Uber’s deactivation algorithm.
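
The deactivation rules described above amount to a small state machine. Here is a hedged Python sketch of them as described, not Uber’s code. Two details are assumptions: a new driver’s probation is judged on the average of all his trips so far, and an established driver who is still below 4.6 when his 50-trip probation ends is deactivated (the rules as described don’t say what happens then). The class and status names are hypothetical.

```python
from collections import deque
from statistics import mean

# Thresholds as described for "many East Coast jurisdictions".
NEW_FLOOR = 4.3         # a new driver below this after 25 trips is deactivated
TARGET = 4.6            # average every driver must eventually hold
NEW_WINDOW = 25         # length of the initial evaluation period, in trips
NEW_PROBATION = 25      # probation length for a new driver
ACTIVE_WINDOW = 100     # rolling window for an established driver
ACTIVE_PROBATION = 50   # probation length for an established driver


class DriverAccount:
    """Hypothetical model of the deactivation rules; statuses are plain strings."""

    def __init__(self):
        self.all_ratings = []                       # every rating ever received
        self.recent = deque(maxlen=ACTIVE_WINDOW)   # only the last 100 ratings
        self.status = "new"
        self.probation_left = 0

    def record(self, rating):
        """Record one trip's rating and return the account's new status."""
        if self.status == "deactivated":
            raise ValueError("account is deactivated")
        self.all_ratings.append(rating)
        self.recent.append(rating)

        if self.status == "new" and len(self.all_ratings) == NEW_WINDOW:
            avg = mean(self.all_ratings)
            if avg < NEW_FLOOR:
                self.status = "deactivated"
            elif avg < TARGET:
                self.status = "new_probation"
                self.probation_left = NEW_PROBATION
            else:
                self.status = "active"
        elif self.status == "new_probation":
            self.probation_left -= 1
            if self.probation_left == 0:
                # Assumption: judged on the average of all trips so far.
                passed = mean(self.all_ratings) >= TARGET
                self.status = "active" if passed else "deactivated"
        elif self.status == "active":
            # Rolling average over (up to) the most recent 100 trips.
            if mean(self.recent) < TARGET:
                self.status = "probation"
                self.probation_left = ACTIVE_PROBATION
        elif self.status == "probation":
            self.probation_left -= 1
            if self.probation_left == 0:
                # Assumption: not specified in the described rules;
                # deactivate if the rolling average is still below target.
                passed = mean(self.recent) >= TARGET
                self.status = "active" if passed else "deactivated"
        return self.status
```

Feeding it 25 ratings of 4 deactivates a new account, while 25 straight 5s make a driver active; three 1-star trips after that drop his rolling average to about 4.57 and start a 50-trip probation.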

This algorithm regulates driver behavior through the threat of deactivation. Ratings are inherently subjective, which encourages drivers to err on the side of concierge-level service. This is why you often see water bottles, cell phone chargers, and reading material in the back of an Uber car. Uber offers its drivers several tips on “How to be a Five Star Driver,” such as “take the most direct route,” “keep your car clean” and “dress well.” Drivers who ignore these recommendations do so at their peril.

The Los Angeles Taxicab Commission also prefers our taxi drivers to keep a clean car, dress well, and take the most direct route. We have specific rules to this effect (Rules 730, 734 and 765, to be precise). We even have a rule that requires taxi drivers to wear socks. We require our drivers to be courteous, under threat of suspension or fine if an enforcement officer catches them breaking a rule. Uber, however, merely suggests ways for their drivers to be courteous, with their passengers as the final arbiters of quality.

Uber’s method is better for passengers. Driver ratings take into account all discourtesies, regardless of how minor. With taxicabs, minor discourtesies are underreported. The process for making a complaint is cumbersome, so taxi passengers only speak up about more outrageous infractions. In addition, the City simply doesn’t have the resources to investigate and prosecute minor discourtesies on our own. We focus primarily on discourtesies that rise to the level of safety concerns.

Uber’s method is also better for drivers. Good taxi service is underreported, too. The City only measures complaints and violations; we don’t have a way to measure 5-star taxi rides. A while back my phone died on a hike. A taxi driver was kind enough to take me back to my car, with my dog. My only outlet for commendation was Twitter.

While a driver rating system may be better for regulating the quality of service for individual passengers, it falls short of the City’s method for making sure that passengers, collectively, get equal levels of quality no matter where they live. There is evidence that Uber, like taxi companies, provides longer response times in Watts than in Hollywood, but there’s no evidence that Uber has an algorithm to solve this problem. The Los Angeles Department of Transportation, by comparison, keeps a close eye on the aggregate quality of taxi service provided to each area of town, and has an excellent track record of making it better.

If Uber wanted to, they could probably adjust their driver rating algorithm to improve equality of service. Uber has a reputation for futzing with its algorithms. They’re on an ongoing quest to perfect their routing algorithm for UberPOOL. Late last year they tested two new driver ratings systems, one based on a binary “thumbs up or thumbs down,” the other on a choice among three emoticons. There’s evidence that the deactivation algorithm described above didn’t always work like that. The solution to regulating equal service quality may involve one aspect of the algorithm that Uber hasn’t yet experimented with: rehabilitation.

Uber deactivation notice received by Maryland drivers.

If an Uber driver is deactivated for poor ratings, he can be reinstated if he takes a “Quality Improvement Recovery” class. The class takes “85–100” minutes. It reminds me of going to traffic school after you get a speeding ticket. I’ve been to traffic school. If the “Recovery” class is anything like it, then it’s a relatively limited form of discipline. What if, instead, Uber allowed a driver to rehabilitate his rating through community service? This is why the Dupont Circle story got me so excited. If Uber can restrict access as punishment for undesirable behavior, then it can grant access as a reward for desirable behavior.

One way this can work is by giving extra weight to good ratings received in “hard to serve” neighborhoods. For example, maybe a 5-star rating from a passenger picked up in South Los Angeles is actually recorded as 6 stars. A driver on probation would be far more inclined to serve an area that grants him extra credit, since he faces deactivation if he doesn’t raise his average. This would increase the number of cars serving South Los Angeles, which would shorten response times, our metric of both quality and equality.
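
The extra-credit idea is easy to express in code. This Python sketch assumes Zone D is the only “hard to serve” zone and that only 5-star ratings earn the bonus, as in the South Los Angeles example; the zone set, the one-star bonus and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical set of "hard to serve" zones; Zone D is South Los Angeles.
HARD_TO_SERVE_ZONES = {"D"}
BONUS = 1  # hypothetical: a 5-star rating in such a zone is recorded as 6


def recorded_score(stars: int, pickup_zone: str) -> int:
    """Score stored for one rating under the extra-credit scheme."""
    if stars == 5 and pickup_zone in HARD_TO_SERVE_ZONES:
        return stars + BONUS
    return stars


def weighted_average(trips: list[tuple[int, str]]) -> float:
    """Average of recorded scores over (stars, pickup_zone) trips."""
    return float(mean(recorded_score(stars, zone) for stars, zone in trips))
```

A driver with two 4-star rides in Hollywood (Zone C) and two 5-star rides in South LA averages 4.5 unweighted but 5.0 under this scheme, which would pull a driver on probation back above the 4.6 line.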

A weighted rating system might also reward high levels of community service with extra privileges. A driver who gets enough extra credit can maintain an average above 5 stars. (Sort of like those high school kids who take a lot of honors classes and end up with a GPA above 4.0.) We might, for example, want to grant that driver priority access to LAX. Maybe the app pings him before it pings drivers with lower ratings when a passenger calls for a ride. Maybe it allows him to accept tips. These are just surface-level ideas. Uber engineers can probably come up with better ones.
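
Here is one hypothetical way the “ping him first” idea could look: a dispatcher that offers a ride to nearby drivers in order of weighted rating, using distance only to break ties. The `Driver` fields and the `ping_order` function are assumptions for illustration, not anything Uber has described.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    weighted_rating: float   # may exceed 5.0 under the extra-credit scheme
    minutes_away: float      # estimated time to reach the passenger


def ping_order(nearby: list[Driver]) -> list[str]:
    """Order in which to offer a ride request: highest weighted rating first,
    breaking ties by whoever is closest."""
    ranked = sorted(nearby, key=lambda d: (-d.weighted_rating, d.minutes_away))
    return [d.name for d in ranked]
```

Ranking purely by rating is a deliberate simplification. A real dispatcher would have to balance the reward against pickup time, or the incentive would lengthen the very response times it is meant to shorten.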

Many of the most successful regulations rely on a “carrot and stick” approach — they offer incentives for doing good alongside punishments for doing bad. Uber’s deactivation algorithm (and, to be honest, much of Los Angeles taxi regulation) is heavy on stick but light on carrot. As we design the algorithms that govern driver behavior on a transportation network, there may be challenges better solved by a more balanced approach.

Eric Spiegelman

President of the Los Angeles Taxicab Commission. Opinions expressed here are personal to me and do not reflect the opinion of the Commission as a whole.