Improving Ride Dispatch with Data at Rapido

Siddharth Panchanathan

Published in

Rapido Labs

8 min readAug 18, 2020

Given our last blog post of the series, which can be found here :

Designing Data Experiments During a Pandemic

These are unprecedented times. With the Covid-19 virus taking the world by storm, pretty much nothing is the same…

medium.com

We thought it would be helpful to explain how we implemented all of the above into an on-ground experiment. We mentioned above about how the lack of a logical time-based control group forced us to pivot to geo-temporal control formation. I would like to take this opportunity to talk about an experiment we ran as part of the Dispatch team @ Rapido.

What is a Ride Dispatch?

The system that decides which order request (when you tap the Request Rapido button, aka the Book my Ride button, on your app) should be sent to which particular Captain(s) to ensure that the Captain reaches the customer in the quickest and most efficient way possible, is called ‘Dispatch’. It is an homage to the days of old when Taxi services were run over the telephone and a Customer who had called in for a pickup would be patched through to an Agent who would find a willing cabbie (often after multiple calls) and that driver was “dispatched” for that order.

Dispatch is one of the key levers of a ride-hailing marketplace. It is one of those systems that EVERY ride request has to propagate through, hence the room for error is low, with the stakes being very high.

One of the first questions we had to answer while even thinking of a product to build was, “What metrics do we look at to see if marketplace conditions are being improved”? Is the ETA the gold metric for this system, or do we look at other things like Matching Time, Distance Driven by the captain to get to the customer, and cancellations from both the demand and supply sides? We definitely had to be cognizant of these metrics while evaluating any changes to our system.

What was Dispatch @ Rapido like before we started rebuilding it?

Going into the rebuilding process, the current dispatch system was a simple radial system, where a customer requests a ride on the app, and the system draws a circle of radius say 2 km, and looks at all the captains in that area, calculates the crow-flying distance to the customer, and propagates the ping in order.

As a first solution, this is fine, but discerning data enthusiasts can probably find many issues with this system — how to design the optimal radius, what happens if there is a huge divider like a ring-road or a railway crossing that results in a short euclidean distance but long route based distance. In the latter case, this would be categorized as a sub-optimal match, as now the captain has to spend more time driving empty kilometers to reach the customer, and the customer gets frustrated about being matched to a captain who looks close by but takes twice the time to reach the pickup location.

This specific use case can be reduced to a higher-level question: for a given pickup location, is there a corresponding nearby area that should be geo-fenced when considering it to be a part of the “dispatch radius”?

Furthermore, is there a location that is potentially further away in a euclidean sense, but closer by in terms of driving time?

Won’t this problem be alleviated by paying for a maps API?

Too expensive at a per-request level. Right now, even though we are at 20% of our pre-COVID levels (and recovering every week!), servicing each request via google-maps API would be prohibitively expensive for a growing startup like Rapido, especially in these times where innovation is warranted. The goal was to deploy a smart solution, without breaking the bank, that would still have a high impact on the ground.

Building the driving time estimates

The most crucial component of a smart Dispatch system is having reliable driving time estimates. This is essentially built by leveraging the huge store of data available to us from our historical rides. As part of our internal logging, we record the time taken from :

The captain to the customer aka the ETA
The customer’s pickup to the customer’s drop aka the Ridetime

Each part of this gives us more coverage within a city in terms of pickup-to-drop driving times. The ETA gives us short-distance coverage, and the Ridetime gives us longer-distance coverage. We combine the two sources of data and group-by at a time-of-day and a day-of-week level, remove outliers, add a few filters for the minimum amount of rides being done in that bucket to be considered valid, and store the output in a dataset to be consumed by any concerned team.

Designing the experiment

Once we have a pickup-to-drop driving time map, at a time-of-day and day-of-week level, we now get to the dirty work of actually designing an experiment. The first step was to answer the question of, “for a pickup location, can we find a close-by area that has a worse driving time to the source than a further away area”. I will use this segue to introduce some of the terminologies we use in this regard :

source_hex : the Uber h3 derived hex8 in which the ride request originates

bad_hex : the Uber h3 derived hex8, which is closer to the source_hex geometrically, but not while driving

good_hex : the Uber h3 derived hex8, which is further away from the source_hex geometrically, but has a faster driving time than the bad_hex

We do this analysis at a time_of_day and day_of_week level, so a trio of HexA HexB and HexC could be mapped as : Source_hex -> HexA, Bad_hex -> HexB, Good_hex -> HexC on a Monday morning, but on a Sunday evening, it is not necessary that HexB and HexC’s relative driving times to HexA are the same. We were cognizant to not make too many dangerous assumptions here.

An example of a source_hex, bad_hex and good_hex

Here the brown hex is the source_hex, the yellow hex is the bad-level-1-hex and the orange one is the good-level-2-hex. Now, from this map, it is not clear what is the reason for the increase in ridetime from yellow to brown as opposed to orange to brown. But when we look at the google maps view it becomes evident :

We see that the brown and orange hex8s are bifurcated by a huge railway track ( Vijayawada is one of the biggest railway junctions in the country and regularly reports trains crossing road tracks ). On the other hand, the orange hex8 has clear unfettered access to the source_hex.

Once we have the universe of such hex trios, we are back to the problem or how to do a test-control split. Given that time-based control is not an option, we tried to use features ( relevant to dispatch) of each hex-trio and passed it through a vector-similarity measure to calculate the similarity scores of each pair restricted by both of them having the same day and time at which the hex-trio is valid ( aka both test and control source hexes have bad and good hexes on the same day and time period ).

It doesn’t make a lot of sense to say HexA on a Monday morning is similar to HexB on a Wednesday afternoon. So we only do the split if HexA and HexB are both source_hexes on the same day and time period.

Once the above is done for each pair in the universe, we start building the test control split to ensure that no hex in the test group is also in the control group through some other mapping, as this would contaminate the experiment results.

Given that we now have our test-control split, the measure we take is that the test-group source_hexes will have the good_hex included and bad_hex excluded when creating the “dispatch radius”, whereas the control-group will not have the bad_hex excluded and good_hex included. Given that everything else remains the same, the test group should show a reduced ETA compared to the control group post-experiment.

We then ran this experiment for 2 weeks and tried to get 1000+ orders cumulatively in both the test group and the control group, so we don’t suffer from data-sparsity while analysing what happened.

Experiment Results

We ran this experiment in Hyderabad where we saw an ETA reduction in test group vs control group of around 9% when comparing the Median ETAs and almost 13% when comparing Mean ETAs. Pre experiment the test and control groups had a difference of only about 3% when looking at both Mean and Median ETAs, thus showing us that the changes we made actually added value to on-ground ETAs.

We know that no experiment can be called successful without statistical tests of significance, so we went into the experiment having defined our hypothesis as follows:

H0 ( Null Hypothesis ) : Hex based swaps have no effect on realized ETAs

H1 ( Alternate Hypothesis ) : Hex based swaps DO have an effect on realized ETAs

Rejecting the null hypothesis at a significance level of above 95% is the gold standard that we were striving for, and we are happy to report that we achieved statistically significant results at a level of around 98%, with p-values in the 0.01 range when using a few statistical tests of significance.

When viewed visually, what we got was something similar to this :

Test vs Control group change visualized -> Blue vertical line represents the mean of the test-group, and the Purple vertical line represents the mean of the control-group

What this image is telling us, is that when viewed on a relative scale and after adjusting for pre-experiment ETA delta, we have shifted the center of the test group ETA distribution towards the lower side when compared to the control ETA group, thus showing that our changes have made an impact in lowering ETAs as we expected.

Conclusion

The high-level goal as mentioned at the start was to improve a key aspect of dispatch: ETA. We wanted to add a good amount of value by doing something that was not cost-intensive, rather by doing something that leveraged the technology and information we already had. This is the hallmark of any data-science team, to use common sense and best practices to uncover hidden insights using as simple an approach as possible.

If you enjoyed this blog post, check out what we’ve posted so far over here, and keep an eye out on the same space for some really cool upcoming blogs in the near future. If you have any questions about the problems we face as Data Scientists at Rapido, about transitioning to a start-up after a few years in a different field, or about anything else, please reach out to me on LinkedIn or on siddharth.p@rapido.bike, I look forward to answering any questions!