Machine Learning for the Win

Sep 24, 2019

How Yandex.Taxi’s algorithms help track car cleanliness, plan out driver work time, assign ride requests, and so much more.

Yandex.Taxi is powered by hundreds of algorithms built and supported by a dedicated machine learning and data analysis division. This team has two main focuses:

№1 — pinpoint aspects of the product that can be improved by big data and algorithms.

In other words, build an app that already knows where the user wants to go as soon as they open it; is today a drop-the-kids-off-at-preschool day, or a hit-the-town-with-friends-for-brunch day? These trips require different cars, with child safety seats or maybe even a minivan for large groups. Also, will the driver pull up immediately, or in a couple minutes so everyone has time to get outside first? The perfect app tells you upfront how long you’ll have to wait for pickup depending on the service class, and where to meet the driver to get the lowest fare.

№2 — optimize the cost, speed and structure of Yandex.Taxi’s internal business processes. This includes user support, vehicle quality control, driver relations and marketing.

Roman Khalkechev, Director of the Machine Learning and Data Analysis Division at Yandex.Taxi, shares some examples about how data and algorithms make life easier for riders, drivers and the whole team behind the scenes.

Machine Learning Assigns Rides More Precisely

When a user taps the “Order” button, our system starts solving the task of finding a driver, or dispatch, as it’s known in the industry. It homes in on the most suitable car among all available nearby drivers, considering in the process all sorts of factors, including the likelihood the driver will accept the ride request.

Today, our algorithm focuses first and foremost on specific data related to the driver and rider in question, but ideally all ride requests (i.e. all other riders and drivers) in the area would also be considered. This approach would help avoid the issue of minimizing pickup time for one rider at the expense of increasing it for a dozen others.

To tackle this problem, we’re in the midst of testing what we call “buffer dispatch.” If ten users in one area place separate ride requests and ten drivers are available nearby, this tech analyzes all ten requests and drivers simultaneously, instead of assigning cars to requests one at a time.

In other words, the algorithm solves the driver assignment issue for all users at the same time by optimizing the rider-driver match system.
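To make the difference concrete, here is a toy sketch, not Yandex.Taxi's dispatch code. The pickup-time matrix, the function names and the brute-force search are all assumptions for illustration; a real system would need a scalable assignment solver and many more factors than pickup time.

```python
from itertools import permutations

# Hypothetical pickup times in minutes: rows are riders, columns are drivers.
PICKUP = [
    [2, 3],  # rider 0
    [3, 9],  # rider 1
]


def greedy_assignment(pickup_minutes):
    """Handle requests one at a time; each rider takes the nearest free driver."""
    free = set(range(len(pickup_minutes[0])))
    assignment = []
    for row in pickup_minutes:
        driver = min(sorted(free), key=lambda d: row[d])
        free.remove(driver)
        assignment.append(driver)
    return assignment


def batch_assignment(pickup_minutes):
    """Look at all riders and drivers together and minimize total pickup time.

    Brute force over every permutation, so only suitable for tiny examples.
    Assumes the same number of riders and drivers.
    """
    n = len(pickup_minutes)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(pickup_minutes[r][p[r]] for r in range(n)),
    )
    return list(best)


def total_minutes(pickup_minutes, assignment):
    """Sum of pickup times for a rider -> driver assignment."""
    return sum(pickup_minutes[r][d] for r, d in enumerate(assignment))


print(total_minutes(PICKUP, greedy_assignment(PICKUP)))  # one by one
print(total_minutes(PICKUP, batch_assignment(PICKUP)))   # all at once
```

In this example the one-by-one approach gives rider 0 the closest car and leaves rider 1 with a 9-minute wait, while the batch approach swaps the drivers so both riders wait 3 minutes.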

Once all ride requests and cars in one area are loaded into a single “buffer,” rider-driver matches can take an expanded array of factors into account. For example, the algorithm sees that one driver is ending their shift, so there’s a good chance it will assign them the rider headed in the same direction as where they live.

The more ride requests and drivers in the “buffer” at one time, the more efficiently rides are distributed. Here machine learning is used to expand the “buffer,” and not just by adding more free drivers, but by folding in other drivers currently with a rider but headed into the area now to drop them off.

The same goes for potential riders, as our system already knows how to forecast the likelihood that a user is about to request a ride once they open the app. If there’s a high probability they’ll do so within one minute, their request is also added to the “buffer,” and the algorithm starts looking for their ride even before they tap “Order.”
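The rule described here boils down to a threshold on a predicted probability. The sketch below assumes a model that outputs the chance a user will order within one minute; the `AppSession` class, the `order_probability` field and the 0.8 cutoff are hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class AppSession:
    user_id: str
    # Hypothetical model output: probability of a ride request within one minute.
    order_probability: float


def likely_riders(sessions, threshold=0.8):
    """Return users whose predicted order probability clears the threshold."""
    return [s.user_id for s in sessions if s.order_probability >= threshold]


sessions = [
    AppSession("anna", 0.93),
    AppSession("boris", 0.41),
    AppSession("vera", 0.80),
]
print(likely_riders(sessions))  # these users would join the buffer early
```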

Machine Learning Assigns Rides More Fairly

In the taxi world, airports are in a league of their own because there’s always a high volume of ride requests to distant destinations from a single point. These rides come in waves depending on flight arrival schedules, especially if the airport is smaller.

Airport ride requests used to be distributed using the same algorithm as in the city, meaning based on pickup time, service class, passenger options and other factors. But because pickup times for cars waiting around at the airport were all close to identical, ride requests were distributed rather unpredictably for drivers.

For example, a driver who just dropped off a passenger for their flight could get a new ride request back to the city that same instant, while another driver (with all else equal) who had already been waiting half an hour was stuck with nothing. This wasn’t fair to drivers, so our team got to fixing this issue as fast as they could.

In July 2018, Yandex.Taxi debuted a new algorithm to solve the airport problem by distributing ride requests in a queue (again, with all else equal). Now when a driver drops off a passenger in the airport zone, they join the queue automatically and can track what place they’re in.
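A first-in, first-out queue like this can be sketched in a few lines. The `AirportQueue` class and its method names are illustrative assumptions; the real system also weighs service class and other factors that this toy ignores.

```python
from collections import deque


class AirportQueue:
    """Drivers are served in the order they arrived in the airport zone."""

    def __init__(self):
        self._drivers = deque()

    def join(self, driver_id):
        """Add a driver to the back of the line and return their place."""
        if driver_id not in self._drivers:
            self._drivers.append(driver_id)
        return self.position(driver_id)

    def position(self, driver_id):
        """1-based place in line, so the driver can track it."""
        return self._drivers.index(driver_id) + 1

    def next_driver(self):
        """Pop the driver who has waited longest, or None if nobody is waiting."""
        return self._drivers.popleft() if self._drivers else None


queue = AirportQueue()
queue.join("driver_a")
queue.join("driver_b")
print(queue.next_driver())         # the longest-waiting driver gets the ride
print(queue.position("driver_b"))  # everyone else moves up
```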

This made things more convenient, but still didn’t help drivers know for certain how long they’d have to wait for their next passenger. In fact, queue wait time fluctuated significantly depending on the time of day, as more arriving flights mean less time waiting for your next rider.

To show drivers how long until they get a passenger, the Yandex.Taxi team built a model taking into account queue length, historical data on ride request volumes based on time, and flight arrival schedules.
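As a rough illustration of how those three inputs could combine, here is a back-of-the-envelope estimate. It is a hand-written formula, not the trained model the team built, and the `requests_per_flight` constant is invented for the example.

```python
def estimate_wait_minutes(
    queue_position,
    baseline_requests_per_hour,
    flights_landing_next_hour,
    requests_per_flight=5.0,  # hypothetical uplift per arriving flight
):
    """Estimate the wait as queue position divided by the expected request rate.

    baseline_requests_per_hour stands in for historical volume at this time
    of day; arriving flights add extra expected requests on top of it.
    """
    expected_per_hour = (
        baseline_requests_per_hour
        + flights_landing_next_hour * requests_per_flight
    )
    if expected_per_hour <= 0:
        return float("inf")  # no requests expected, so no finite estimate
    return 60.0 * queue_position / expected_per_hour


# Tenth in line, 20 requests per hour on a typical day, 4 flights landing soon.
print(estimate_wait_minutes(10, 20, 4))
# Same place in line with no flights due: the wait is noticeably longer.
print(estimate_wait_minutes(10, 20, 0))
```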

This freed up drivers to make an informed decision on whether to wait for their next rider, or head right back to the city. Dead mileage might go up a bit, but less time is wasted without a passenger and therefore without earning.

In large airports where we have more historical data to work with, the system forecasts wait time with just a 10% margin of error. In smaller airports, the margin of error is closer to 15–17%. In newly opened airports we don’t have all the stats we need, but this doesn’t stop our team from studying how to make forecasts without them.
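The article doesn't say how the margin of error is measured. One common way to express forecast error as a percentage is mean absolute percentage error (MAPE), sketched below as an assumption; the sample numbers are made up.

```python
def mean_absolute_percentage_error(actual, forecast):
    """Average of |actual - forecast| / actual, expressed in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(a - f) / a for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical wait times in minutes: what happened vs. what was forecast.
actual_waits = [10, 20, 40]
forecast_waits = [11, 18, 44]
print(round(mean_absolute_percentage_error(actual_waits, forecast_waits), 1))
```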

Machine Learning Answers Support Requests Faster

Yandex.Taxi Support processes requests based on complexity and urgency. If a rider left something behind in a car, we need to be on top of that fast. But if we get a complaint that a driver didn’t answer a message in the app’s chat, we can address that a bit later.

Our old Tech Support system handled requests in the order received. The operator had a list of request “tickets,” and doled them out to their team based on importance and urgency.

But as Yandex.Taxi expanded into new cities and countries, the number of users grew along with support request volumes. Our Support system required automation to cut expenses associated with initial, routine request processing.

For starters, the machine learning team trained a neural network to independently determine how critical a support request was. They did this by feeding it a massive volume of old support requests written by passengers and already processed by our team with an indication of “critical” or “urgent.”

Then we integrated this neural network to process the real flow of support requests, and now the Support interface shows the most critical and urgent requests at the top automatically.
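Once each ticket carries a score, ordering the interface is a plain sort. In this sketch the `Ticket` class and the `urgency` field are hypothetical, and the scores are typed in by hand where the real system would get them from the neural network.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: int
    text: str
    urgency: float  # hypothetical classifier score between 0 and 1


def prioritize(tickets):
    """Most urgent first; ties keep their arrival order because sorted() is stable."""
    return sorted(tickets, key=lambda t: t.urgency, reverse=True)


inbox = [
    Ticket(1, "Driver did not reply in chat", 0.15),
    Ticket(2, "I left my phone in the car", 0.95),
    Ticket(3, "Question about a receipt", 0.15),
]
print([t.ticket_id for t in prioritize(inbox)])
```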

The next step was automating the ticket answering system. The Yandex.Taxi Support team uses around 200 answer templates for typical issues based on specific situations and circumstances. Every time an operator processed a request, they used to have to search through this template list, find the closest relevant version, edit it a bit and send it out.

To speed things up, our machine learning developers fed a different neural network historical data on Support request answers to train it to suggest the five templates most likely to relate to the issue at hand. In 70% of cases, one of the recommended templates was indeed relevant.
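Here is a small sketch of both halves of that idea: picking the five highest-scoring templates, and measuring how often the template an operator actually used was among the suggestions. The score dictionary and function names are assumptions, and the scores would come from the model rather than being hard-coded.

```python
import heapq


def suggest_templates(scores, k=5):
    """Return the k template ids with the highest model scores, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


def hit_rate(suggestions, chosen):
    """Share of tickets where the operator's chosen template was suggested."""
    hits = sum(1 for offered, used in zip(suggestions, chosen) if used in offered)
    return hits / len(chosen)


# Hypothetical scores for one ticket across a handful of templates.
scores = {"t1": 0.10, "t2": 0.90, "t3": 0.50, "t4": 0.30,
          "t5": 0.70, "t6": 0.20, "t7": 0.05}
print(suggest_templates(scores))

# Two past tickets: the first suggestion list contained the answer used.
print(hit_rate([["t2", "t5"], ["t3"]], ["t5", "t1"]))
```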

In some cases, the system can answer entirely without human intervention, for example, when dealing with non-urgent tickets where more information is needed first. All the algorithm does is write to the user so they know that their request was received and the team is looking into it.

Support algorithms also handle ride feedback when riders leave 1, 2 or 3 stars, and can answer around 40% of requests on their own automatically. But fully automating Support is impossible, as there are too many situations where only a person can decide how best to respond.

Check out the picture below to see how funny the automated answer system testing stage sometimes was. At one point, the system received a 5-star feedback rating from a passenger, but the system didn’t react the way it was trained to at all.

Machine Learning Studies Driver Needs More Systematically

Recently, Yandex.Taxi debuted its “Big changes” program. Its main focus is to process feedback from drivers to better adapt our services and the Taximeter driver app to meet their needs. That means we needed to find out quickly what real issues drivers were dealing with, and how to solve them. In fact, this was the program that led to our airport queue algorithm.

Drivers send their feedback to our Support team, social media pages and other channels. But reading all of it is impossible, and its study and sorting is extremely time consuming, so our team once again harnessed the power of machine learning to cluster all messages by subject. This helped us get a good look at which parts of the service could use the most work.
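The article doesn't name the clustering method, so the sketch below uses a deliberately simple stand-in: messages that share enough words join the same group. The word-overlap measure, the 0.2 threshold and the sample messages are all assumptions for illustration.

```python
def tokens(message):
    """Lowercased words longer than three letters, punctuation stripped."""
    words = (w.strip(".,!?").lower() for w in message.split())
    return {w for w in words if len(w) > 3}


def jaccard(a, b):
    """Overlap between two word sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_messages(messages, threshold=0.2):
    """Greedy clustering: join the most similar existing group or start a new one."""
    clusters = []  # each entry is (vocabulary set, list of messages)
    for message in messages:
        words = tokens(message)
        best = max(clusters, key=lambda c: jaccard(words, c[0]), default=None)
        if best is not None and jaccard(words, best[0]) >= threshold:
            best[0].update(words)
            best[1].append(message)
        else:
            clusters.append((set(words), [message]))
    return [group for _, group in clusters]


feedback = [
    "Cannot see where the rider is waiting",
    "Rider is never waiting at the pickup point",
    "Airport queue feels unfair",
    "Unfair airport queue order",
]
for group in cluster_messages(feedback):
    print(group)
```

With these four messages, the two about rider location end up in one group and the two about the airport queue in another.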

There were a large number of complaints that drivers couldn’t see exactly where their riders were. Taximeter (the driver app) showed where a passenger requested pickup, but that didn’t guarantee they’d be waiting there, especially if the ride request came from an area with lots of cars and people around, like an airport, stadium or central square. Our team heard these complaints loud and clear, and added an option in the Yandex.Taxi app where riders can share their location with drivers. Now drivers can see exactly where their passenger is on the Taximeter map.

Machine Learning Monitors Car Quality 24/7

One of Yandex.Taxi’s main focuses is making sure our drivers’ vehicles meet our quality standards. That means clean, damage-free exteriors, clean empty trunks, available child safety seats (if the driver is authorized to accept ride requests with children), an exact match between the car type and plate number with what’s in the app, etc.

But Yandex.Taxi works through partners — taxi companies or self-employed drivers — numbering in the thousands, so direct quality control is challenging. To solve this issue, our team launched a remote quality check (RQC) system.

Taximeter regularly asks drivers to take several photos of their vehicle, inside and out, and send them through the app. These pictures are sent to Yandex.Taxi assessors who evaluate them one by one to make sure they match what the driver is claiming and meet all our service standards.

If everything is up to code, the driver will continue to receive ride requests. If not, they must first resolve any issues.

This driver wouldn’t pass their photo check

This process has potential for automation as well. Algorithms can determine if an RQC passes or not without any human intervention. So our machine learning team went ahead and created another neural network capable of 1) checking pictures to see if everything is okay, and 2) suggesting what may be the issue and what needs to be fixed, for example, properly applying branded Yandex.Taxi stickers.

When it comes to automating access to ride requests, errors can result in some serious consequences. That’s why we worked backwards and first trained the network to find pictures that definitely disqualify a driver from using the app.

The algorithm is based on a multi-stage assessment process. First it looks at photo quality and blocks any dark or blurry pictures, as well as pictures without a vehicle or where the car didn’t fit fully in the frame. In these cases, the system automatically notifies the driver that they need to send new pictures.

Then the algorithm checks the license plate number by recognizing it in the image and comparing it with what’s in the vehicle’s card in the system (created by the self-employed driver or taxi company partner). Further on, the vehicle’s make, model and color are also checked against what’s in the system.
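The stages above can be pictured as a chain of checks that stops at the first problem. In this sketch the recognition results are passed in as plain fields, since the neural networks themselves are out of scope; the class names, field names and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhotoCheck:
    brightness: float         # hypothetical score from 0 to 1
    sharpness: float          # hypothetical score from 0 to 1
    car_fully_in_frame: bool
    recognized_plate: str
    recognized_vehicle: tuple  # (make, model, color) as recognized in the photo


@dataclass
class VehicleCard:
    plate: str
    vehicle: tuple  # (make, model, color) as registered in the system


def normalize_plate(plate):
    """Ignore spaces and letter case when comparing plate numbers."""
    return plate.replace(" ", "").upper()


def review(photo, card, min_brightness=0.3, min_sharpness=0.3):
    """Run the stages in order and return the first issue found."""
    # Stage 1: photo quality.
    if photo.brightness < min_brightness or photo.sharpness < min_sharpness:
        return "retake: photo is too dark or blurry"
    if not photo.car_fully_in_frame:
        return "retake: car is not fully in the frame"
    # Stage 2: license plate against the vehicle card.
    if normalize_plate(photo.recognized_plate) != normalize_plate(card.plate):
        return "issue: plate does not match the vehicle card"
    # Stage 3: make, model and color against the vehicle card.
    if photo.recognized_vehicle != card.vehicle:
        return "issue: vehicle does not match the vehicle card"
    return "no issues found"


card = VehicleCard("A123BC 77", ("Kia", "Rio", "white"))
photo = PhotoCheck(0.8, 0.9, True, "a123bc77", ("Kia", "Rio", "white"))
print(review(photo, card))
```

Putting the cheap quality checks first means a dark or cropped photo is sent back for a retake before any recognition result is trusted.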

Machine Learning Helps Acquire New Users

Yandex.Taxi riders have the option to request a car with a child safety seat. This is a major advantage that our competitors don’t offer, so it’s no surprise that our marketing team wanted to tell our users about it, especially parents with children.

VKontakte, Facebook and Instagram are just a few of the channels we host ad campaigns on. Here we can either show ads to random users, or use integrated social media targeting. But there’s also a third option — connect algorithms that help make targeting even more precise.

Thanks to Yandex.Audience, we can find out how likely it is a specific user has kids or a car. Plus, Yandex.Taxi also has data on users who request rides in the Kids service class. These two data sources can be used to make a look-alike model to find anonymous profiles who don’t use Yandex.Taxi, but based on audience features look a lot like users who request Kids service class rides. These are the people best targeted to see our ads on social media.
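One simple way to picture a look-alike model is to average the features of the known audience and rank everyone else by similarity to that average. The sketch below does exactly that with cosine similarity; it is a simplified stand-in rather than the production model, and the two features and all the numbers are invented.

```python
import math


def centroid(vectors):
    """Average feature vector of the seed audience."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_lookalikes(seed_vectors, candidates):
    """Return candidate profile ids, most similar to the seed audience first."""
    center = centroid(seed_vectors)
    return sorted(
        candidates,
        key=lambda profile_id: cosine(candidates[profile_id], center),
        reverse=True,
    )


# Hypothetical features: [likelihood of having kids, likelihood of owning a car].
kids_class_riders = [[0.9, 0.8], [0.7, 0.6]]
anonymous_profiles = {
    "p1": [0.8, 0.7],
    "p2": [0.1, 0.9],
    "p3": [0.9, 0.1],
}
print(rank_lookalikes(kids_class_riders, anonymous_profiles))
```

The profiles at the top of the ranking would be the ones shown the ad first.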

To evaluate algorithm effectiveness, the marketing department compared the look-alike model results to random targeting and to the social media platforms’ built-in targeting. Ultimately, the system cut the price per install threefold, thanks to the fact that the ads were shown to users who were actually interested.