Are those fries really missing? Preventing refund abuse with machine learning
Have you ever had an issue with one of your food deliveries? I’m sure you have. In fact, that’s the first thing people tell me as soon as I mention I work for Glovo: “Can you believe they missed my fries?”
There can be issues with couriers, issues with partners, miscommunications, and so on. So of course, refunds are very important to us: they are the best way to compensate customers for any issue they may have had. But as with so many other systems, fraudsters and abusers come in and try to take advantage of refunds. We’ve realised that blocking the worst abusers and increasing the number of refunds we grant automatically not only reduces costs but also makes the experience faster and better for customers.
Refunds and the risk score
When you request a refund in Glovo, there are three possible outcomes:
- Self-service or automatic refund: you receive the refund without having to talk to anyone. Automatic refunds aren’t just better because they help us cut costs; they also make getting a refund seamless for the customer.
- Customer support window: you talk to an agent and sort out any issues you’ve had with your order.
- Refund blocked: a message indicates that you won’t be getting a refund.
But of course, when the process is easy, some people take advantage. These are what we call refund abusers: customers who exploit how little friction there is in this process to get compensated. They are characterised by a high number of refunds, they usually have newer accounts, they often have multiple accounts (on the same device, or using the same phone number, credit card, etc.) and they tend to buy items that can be resold (bottles of liquor, SIM cards, video games, etc.).
To address the issue of customers abusing the automatic refund system, we created a risk score that identifies the users who are likely to commit refund-related abuse. It is a score from 0 to 100, where higher values represent customers with a higher risk. The risk score works in conjunction with a set of rules designed and updated by fraud analysts. We describe it as a kind of traffic light: low-risk customers are “green” and have their refunds granted automatically, medium-risk ones are “yellow” and are sent to customer support, and high-risk ones are “red” and are automatically blocked.
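As a rough illustration of the traffic light, here is a minimal sketch of how a score could be mapped to the three outcomes. The cut-off values `YELLOW_FROM` and `RED_FROM`, the `Outcome` enum and the `route_refund` function are hypothetical names and numbers chosen for the example, and the analyst-maintained rules that work alongside the score are left out.

```python
from enum import Enum


class Outcome(Enum):
    AUTOMATIC_REFUND = "green"
    CUSTOMER_SUPPORT = "yellow"
    BLOCKED = "red"


# Hypothetical cut-offs: the real thresholds are not stated in this post.
YELLOW_FROM = 60
RED_FROM = 90


def route_refund(risk_score: float) -> Outcome:
    """Map a 0-100 risk score to one of the three refund outcomes."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    if risk_score >= RED_FROM:
        return Outcome.BLOCKED
    if risk_score >= YELLOW_FROM:
        return Outcome.CUSTOMER_SUPPORT
    return Outcome.AUTOMATIC_REFUND
```

With these example cut-offs, `route_refund(15)` falls in the green band, `route_refund(75)` is sent to customer support and `route_refund(95)` is blocked.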
When it’s live, the model generates its prediction between the time the customer places an order and the time they would (on average) request a refund. The score is ready for every customer who orders, regardless of whether they’ve ever requested a refund.
To calculate this score we use a machine learning model. After testing several alternatives, we chose Isolation Forest, an unsupervised anomaly detection algorithm. It is similar to a Random Forest in that it builds an ensemble of trees over the data. Here, however, each tree splits the data at random, and the points that end up on the shortest branches (those isolated after the fewest splits) are the most anomalous. For the purposes of the risk score, we treat the most anomalous data points as the riskiest.
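To make the "shortest branches" idea concrete, here is a simplified, pure-Python sketch of the isolation principle. It is not our production code: it follows one point through freshly drawn random splits instead of building reusable trees, it skips the leaf-size correction of the full algorithm, and the toy data, the function names and the rescaling to 0-100 are assumptions made for the example. In practice a library implementation such as scikit-learn's `IsolationForest` would be the more usual choice.

```python
import math
import random

MAX_DEPTH = 12  # cap on the number of splits per simulated tree


def path_length(point, data, rng, depth=0):
    """Count the random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= MAX_DEPTH:
        return depth
    # Only features that still vary can separate the remaining rows.
    splittable = [f for f in range(len(point))
                  if len({row[f] for row in data}) > 1]
    if not splittable:  # only exact duplicates remain
        return depth
    feature = rng.choice(splittable)
    values = [row[feature] for row in data]
    split = rng.uniform(min(values), max(values))
    # Keep only the rows that fall on the same side of the split as `point`.
    went_left = point[feature] < split
    same_side = [row for row in data if (row[feature] < split) == went_left]
    return path_length(point, same_side, rng, depth + 1)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average path length of `point` over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


def risk_score(point, data, n_trees=200, seed=0):
    """Isolation Forest score 2^(-E[h]/c(n)), rescaled to 0-100 (assumed scaling)."""
    n = len(data)  # must be greater than 2 for the normalisation below
    c_n = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 100 * 2 ** (-mean_path_length(point, data, n_trees, seed) / c_n)


# Toy data: (refund_rate, cancellation_rate) for 50 ordinary customers
# plus one extreme customer. All values are made up.
normal = [(0.02 + 0.01 * (i % 5), 0.05 + 0.01 * (i % 7)) for i in range(50)]
outlier = (0.90, 0.80)
data = normal + [outlier]

print(mean_path_length(outlier, data), mean_path_length(normal[0], data))
print(risk_score(outlier, data), risk_score(normal[0], data))
```

The extreme customer is cut off from the rest after far fewer splits than an ordinary one, so it gets the shorter average path and the higher score.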
We supply the model with a variety of features to help it discern which data points are more anomalous. These include the total refund rate (refunded amount in euros / total amount spent in euros) and the cancellation rate (percentage of orders cancelled), which describe how the customer normally behaves regarding refunds; whether it’s the customer’s first order; the promo codes used; and the number of other customers in the network.
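The features listed above could be derived from a customer's order history along these lines. This is only a sketch: the `Order` fields, the `build_features` function and the feature names are hypothetical, and details such as whether cancelled orders count towards total spend are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    amount_eur: float
    refunded_eur: float = 0.0
    cancelled: bool = False
    promo_code: Optional[str] = None


def build_features(orders: List[Order], other_customers_in_network: int) -> dict:
    """Turn one customer's order history into model features."""
    total_spent = sum(o.amount_eur for o in orders)
    total_refunded = sum(o.refunded_eur for o in orders)
    n_orders = len(orders)
    return {
        # Refunded amount in euros / total amount spent in euros.
        "refund_rate": total_refunded / total_spent if total_spent else 0.0,
        # Share of orders that were cancelled.
        "cancellation_rate": (
            sum(o.cancelled for o in orders) / n_orders if n_orders else 0.0
        ),
        "is_first_order": n_orders == 1,
        "promo_codes_used": sum(o.promo_code is not None for o in orders),
        "other_customers_in_network": other_customers_in_network,
    }


history = [
    Order(amount_eur=20.0, refunded_eur=5.0),
    Order(amount_eur=30.0, cancelled=True, promo_code="WELCOME"),
]
features = build_features(history, other_customers_in_network=2)
```

For this made-up history, 5 of the 50 euros spent were refunded and one of the two orders was cancelled.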
When we talk about customer networks, we refer to another model that connects customers to each other based on shared devices, credit cards and phone numbers, among other elements. For the refunds model, these network features help identify abusers who create new accounts to try to get more refunds after they’ve been blocked. These new accounts usually share at least one of these elements with previous accounts, making them part of the same network.
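One simple way to picture such a network is as connected components: two accounts belong to the same network if a chain of shared identifiers links them. The sketch below uses a small union-find structure to group accounts. It is an illustration under that assumption rather than a description of the actual network model, and the identifier strings and the `build_networks` function are hypothetical.

```python
from collections import defaultdict


def build_networks(customers):
    """Group customers that share at least one identifier, directly or indirectly.

    `customers` maps a customer id to the set of identifiers seen on that
    account (devices, cards, phone numbers, ...).
    """
    parent = {customer: customer for customer in customers}

    def find(customer):
        # Follow parent links to the root, shortening the path on the way.
        while parent[customer] != customer:
            parent[customer] = parent[parent[customer]]
            customer = parent[customer]
        return customer

    first_seen = {}  # identifier -> first customer it was seen on
    for customer, identifiers in customers.items():
        for identifier in identifiers:
            if identifier in first_seen:
                # Shared identifier: merge the two customers' networks.
                parent[find(customer)] = find(first_seen[identifier])
            else:
                first_seen[identifier] = customer

    networks = defaultdict(set)
    for customer in customers:
        networks[find(customer)].add(customer)
    return list(networks.values())


accounts = {
    "a": {"device:1", "card:9"},
    "b": {"card:9", "phone:5"},
    "c": {"phone:5"},
    "d": {"device:2"},
}
networks = build_networks(accounts)
```

Here accounts `a`, `b` and `c` end up in one network even though `a` and `c` share nothing directly, while `d` stays on its own. The "number of other customers in the network" feature would then be the size of the customer's network minus one.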
The complicated part about an unsupervised model is verifying the results. We have three important points of analysis when we look at the model’s results:
- Are we identifying the most abusive customers?
- How many non-abusive customers would be penalised (false positives)?
- Which are the customers identified as outliers?
First, we study the period after a refund is granted to analyse repeat offenders and validate the model’s results. We also regularly produce a sample that is reviewed and manually labelled by fraud experts. The fraud team checks the cases in depth and identifies when the model has mislabelled an order. This way we can determine false positives and false negatives, and see specific cases where the model is making wrong predictions and why. Among other things, we also monitor feature importance to see which factors increase and decrease the risk score.
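Once a sample has been labelled by the fraud team, the false positives and false negatives can be summarised with a few standard metrics. A minimal sketch follows, assuming each reviewed case is reduced to a pair of booleans (flagged by the model, judged abusive by the expert). The `review_metrics` function and the sample counts are made up for the example.

```python
def review_metrics(reviewed):
    """Summarise a manually reviewed sample.

    `reviewed` is a list of (flagged_by_model, abusive_per_expert) booleans.
    """
    tp = sum(flagged and abusive for flagged, abusive in reviewed)
    fp = sum(flagged and not abusive for flagged, abusive in reviewed)
    fn = sum(not flagged and abusive for flagged, abusive in reviewed)
    tn = sum(not flagged and not abusive for flagged, abusive in reviewed)
    return {
        "false_positives": fp,
        "false_negatives": fn,
        # Of the customers the model flagged, how many were really abusive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the abusive customers, how many did the model flag?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Of the non-abusive customers, how many were wrongly flagged?
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Made-up review sample: 3 correct flags, 1 wrong flag, 1 missed abuser,
# 5 customers correctly left alone.
sample = (
    [(True, True)] * 3
    + [(True, False)]
    + [(False, True)]
    + [(False, False)] * 5
)
metrics = review_metrics(sample)
```

The false positive rate is the number that speaks most directly to the second question above, since it measures how many non-abusive customers would be penalised.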
Overall we’ve seen some amazing results, but some areas need more work. Since it was implemented, the model has been fairly precise at identifying the riskiest customers, and with the addition of the network features it can identify abusive customers who create duplicate accounts. The biggest issue so far has been false positives: the score can be punishing. For example, a customer can have a bad week with three refunds and then not be granted an automatic refund for a while, because the model still puts a lot of weight on those three refunds. We’re currently working on ways to make the score less punishing in cases where it is not warranted.
The model has been in production since 2022 and we have achieved significant savings. In addition to blocking the most malicious customers, we are also reducing the cost of all the refund requests that would previously have gone to customer service and are now handled automatically. The first iteration covered only refunds for missing or mistaken items, but we have since expanded it to cover a broad range of cases.