Introducing Shield: talabat’s Fraud Prevention Service

Yassin Zain Alabdeen
Published in talabat Tech · 7 min read · Jun 27, 2022

If we’re not as enthusiastic about catching cheaters as cheaters are about cheating us, then we’re definitely getting screwed.
Ruth Langmore, Ozark

Introduction

Fraud is a significant challenge for us at talabat because we operate a hyper-growth, multi-sided marketplace. There are many opportunities for vendors, riders, and customers to exploit our system for financial gain. We have observed fraud patterns that range from casual customers taking advantage of services and features on a small scale to professional fraudsters who make a living out of it. Some create fake accounts and sell them on social media platforms, and in some cases they even strike a deal with a vendor. Some riders also collaborate to form fraudulent communities and help each other commit fraud, making money by delivering fake orders placed by the riders themselves.

The Fraud Data Team’s mission is to protect talabat without negatively impacting our genuine customers, riders, and partners. We built a real-time fraud service (Shield) from scratch to achieve this. Pre-Shield, there was no reliable or unified system in place to detect and prevent fraud. Essentially what existed was:

  1. A set of uncoordinated and relatively basic customer-level rules scattered across multiple services.
  2. A team of human investigators (whom we refer to as the SWAT heroes) who would manually investigate any irregular orders and then react by canceling suspicious transactions and blocking fraudulent accounts.

Shield Architecture

Shield is a central place where fraud can be fought without other micro-services worrying about implementing their own logic. Fraud prevention requires a holistic view and keeping the logic scattered across multiple services leads to leakage and effort duplication.

Shield Architecture Diagram

Shield Components

At the core of Shield is a Rule Engine that executes a set of rules against each type of request. Based on the outcome of these rules, Shield will return the decision and might also take additional actions such as blocking an account or a device. Multiple micro-services call Shield to check for suspicious patterns and allow/decline the transaction based on the response. The main components of Shield are:

  • Data Layer: Feeds Shield DB with both real-time and historical data.
  • Shield DB: This is the feature store that contains both real-time and historical features.
  • Rule Processor: The rules are stored on GitHub, and the Rule Processor fetches any updates and pushes them to the Shield API every minute.
  • Rule Engine: Parses each rule and fetches the features needed to evaluate its logical expressions.
  • Blocking API: To block accounts and cancel transactions.
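To make the request flow concrete, here is a deliberately simplified sketch of how a rule engine of this shape evaluates rules against features and returns a decision. All names, the rule structure, and the use of `eval` (standing in for a real, sandboxed expression parser) are illustrative assumptions, not Shield's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    expression: str          # logical expression over feature names
    action: str = "block"    # action taken when the expression matches

def evaluate(rules, features):
    """Return ("decline", rule_name) on the first match, else ("allow", None)."""
    for rule in rules:
        # eval() is a stand-in for a real, sandboxed expression parser.
        if eval(rule.expression, {}, dict(features)):
            return ("decline", rule.name)
    return ("allow", None)

# Hypothetical rules over hypothetical feature names:
rules = [
    Rule("too_many_devices", "devices_last_24h > 5"),
    Rule("new_account_high_value", "account_age_days < 1 and order_value > 100"),
]
print(evaluate(rules, {"devices_last_24h": 7, "account_age_days": 30, "order_value": 20}))
# -> ('decline', 'too_many_devices')
```

In the real system, the calling micro-service would allow or decline the transaction based on this response, and Shield might additionally trigger the Blocking API.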

Rules

Our Rule Engine enables Fraud Analysts and Data Scientists to write and deploy rules without writing code. It only requires simple SQL knowledge (select * from table) to pull the relevant features from Shield DB. We've designed our Rule Engine to cater to the varied requirements of fraud prevention. These include, but are not limited to:

  • Structure: Our rules are written in JSON format and follow a simple and well-defined structure.
  • Selection: The request should meet a ShouldRunExpression condition to execute a rule. For example, a rule might be only relevant to a specific city or country. This prevents unnecessary DB calls.
  • Shadow Mode: The rule results are logged, but no action is taken. This is used to test the sanity of a rule in production without actually enabling it.
  • Throttling: Things go wrong; it is part of nature. To prevent damage to the business, we define a threshold to prevent the rule from taking action if the threshold is exceeded. We define this threshold based on historical data and our risk appetite.
  • Whitelisting: The same way we define rules to block transactions, we also define whitelisting rules. This enables us to have a central place to control all whitelisting logic by utilizing the same infrastructure.
Anatomy of a Rule
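To illustrate how these properties come together, here is a hypothetical rule written in the JSON style described above. Every field name and value is an illustrative assumption for this article, not Shield's actual schema.

```json
{
  "name": "new_account_high_value_order",
  "shouldRunExpression": "country == 'AE'",
  "query": "select account_age_days, order_count from customer_profile where customer_id = :customer_id",
  "expression": "account_age_days < 1 and order_count == 0 and order_value > 100",
  "action": "DECLINE",
  "shadowMode": true,
  "throttle": { "maxActionsPerHour": 50 }
}
```

Reading it top to bottom: the rule only runs for requests matching the shouldRunExpression (avoiding unnecessary DB calls), pulls its features with a simple SQL query, evaluates a logical expression, and, because shadowMode is on, would log its verdict without acting. The throttle caps how much damage a misbehaving rule could do once enabled.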

Why Start with Rules?

From the beginning, we decided to approach fraud by understanding the actor, motive, and mechanism for each pattern. This enabled us to write pattern-specific rules and, in many cases, close the loopholes in our system that fraudsters exploit.

We started with rules because they are easy to write and understand. With no system in place, Machine Learning (ML) would not have been the right first step. Rules capture simple fraud patterns, while ML can capture higher-order interactions between the underlying features. Moreover, rules are fully interpretable, while ML models require explainers to gain insight into their decision-making process. Rules are also easier to tune for precision, which helps avoid affecting genuine customers. Our system therefore started as a rule engine and evolved into one that employs both logic-based rules and ML models in a complementary fashion, providing a comprehensive approach to detecting fraud. ML models can uncover hidden patterns that we can then prevent with a simple rule, and a model's output can also be used as a feature in a rule.

We have different ML models that address Anomaly Detection, Fake Email Detection, and Behavioral Fraud Detection. Anomaly Detection addresses the fact that we do not know all existing fraud patterns; it helps us uncover and understand new ones using explainability. Behavioral detection addresses the limitations of relying on identifiers: assuming nothing is known about an entity, can we detect fraudulent behavior purely from its interactions with the app? Combining this with rules provides a more comprehensive approach, and we are already seeing the advantages.

Fraud Intelligence

Soon after Shield was born, we realized that relying on our own collected information is not enough. Fraudsters are smart and always find sneaky ways around the rules we implement. For example, we started by scraping the fake mobile numbers available on multiple free websites, as a reaction to an increase in fraudulent transactions on international numbers. The fraudsters responded by using fake local numbers from paid websites, so we built a solution to scrape those websites too, which helped us catch more fraud. We then started scoring email and IP addresses and collecting intelligence to assess the risk associated with them, and implemented algorithms to detect fake emails and block the associated accounts in real time.
In addition, we started assessing the risk associated with each device. This includes, for example, checking for cloned apps on Android and verifying the integrity of our app.
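As a flavor of what email scoring can look like, here is a deliberately simplified heuristic sketch. The domain list, signals, and weights are all hypothetical, chosen for illustration only; Shield's actual algorithms are not described in this article.

```python
import re

# Hypothetical list of disposable-email providers (not Shield's actual data).
DISPOSABLE_DOMAINS = {"mailinator.com", "tempmail.io", "10minutemail.com"}

def email_risk_score(email: str) -> float:
    """Score an email address between 0 (clean) and 1 (very suspicious)."""
    local, _, domain = email.lower().partition("@")
    score = 0.0
    if domain in DISPOSABLE_DOMAINS:
        score += 0.6                      # known disposable provider
    if re.search(r"\d{4,}", local):
        score += 0.2                      # long digit runs suggest generated accounts
    if len(re.findall(r"[._-]", local)) >= 3:
        score += 0.2                      # heavy separator use is another weak signal
    return min(score, 1.0)

print(email_risk_score("john.smith@gmail.com"))      # low risk
print(email_risk_score("user19283746@tempmail.io"))  # high risk
```

A score like this could then feed into the Rule Engine as just another feature, which is exactly how model outputs and intelligence signals compose with rules.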

Each of these topics deserves a separate article, and we will publish content with more details, so keep an eye on this space!

Data Layer

To effectively fight fraud, it is crucial to have access to high-quality data in real-time. We ingest multiple events from the beginning of a customer’s lifecycle. We build deep and comprehensive entity profiles (customers, riders, devices, etc.) from these events that we use in our rules.

There are two types of profiles:

  1. Real-time Profile: contains features aggregated over a short window, such as the last day or week.
  2. Batch Profile: contains features aggregated over a longer time frame, such as 30 or 90 days; these are computed on our Data Platform (BigQuery). This profile also contains information that we don't have in real-time.
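A minimal sketch of what maintaining a real-time profile feature can look like, assuming a sliding 24-hour window over order events; the class and feature names are illustrative, not Shield's actual code.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 3600

class RealtimeProfile:
    def __init__(self):
        self._events = defaultdict(deque)  # customer_id -> event timestamps (seconds)

    def record_order(self, customer_id: str, ts: float) -> None:
        self._events[customer_id].append(ts)

    def orders_last_24h(self, customer_id: str, now: float) -> int:
        window = self._events[customer_id]
        while window and window[0] < now - WINDOW_SECONDS:
            window.popleft()               # evict events older than the window
        return len(window)

profile = RealtimeProfile()
profile.record_order("c1", ts=1_000)
profile.record_order("c1", ts=90_000)      # more than 24h after the first event
print(profile.orders_last_24h("c1", now=100_000))  # -> 1
```

Batch features work the other way around: they are computed offline over the long history and loaded into the same store, so rules can mix both kinds of features in one expression.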

The Data Layer consists of real-time event streaming and batch data loading into Shield DB, which is an AWS Aurora PostgreSQL instance.

Real-time Pipelines

Real-time streaming is a traditional SNS ➜ SQS ➜ Lambda flow. (Another article will be dedicated to delving into the details, so stay tuned!)
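The Lambda end of such a flow can be sketched as follows. Each SQS record wraps an SNS envelope whose Message field carries the original event; the handler unwraps both layers and would then upsert features into Shield DB (stubbed out here). The event shape follows the standard SNS-to-SQS delivery format; the payload fields are made up for illustration.

```python
import json

def handler(event, context=None):
    processed = []
    for record in event["Records"]:
        envelope = json.loads(record["body"])      # the SNS envelope
        payload = json.loads(envelope["Message"])  # the original event
        processed.append(payload["customer_id"])
        # In the real pipeline, this is where the feature-store upsert happens.
    return {"processed": processed}

# A minimal fabricated SQS event for illustration:
sqs_event = {"Records": [
    {"body": json.dumps({"Message": json.dumps({"customer_id": "c42", "order_value": 12.5})})}
]}
print(handler(sqs_event))  # -> {'processed': ['c42']}
```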

Real-time Streaming Pipeline

Another important component of our real-time pipelines is using Graphs. Graphs are a very powerful tool for detecting fraud. We build different graphs that connect entities (users, riders, devices, etc.) and use the features of the graph in the rules (in-degree, shortest path, etc.). We considered using a fully-fledged graph database such as Neo4j or Amazon Neptune. However, we decided to use Postgres recursive common table expressions (CTEs) to traverse graphs, as it proved to be quite effective.
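The recursive-CTE approach can be sketched as below. Production Shield runs on Postgres; SQLite (which supports the same WITH RECURSIVE syntax) is used here only to keep the example self-contained, and the edge schema is a hypothetical customer-to-device link table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table edges (src text, dst text)")
conn.executemany("insert into edges values (?, ?)", [
    ("customer:a", "device:1"),
    ("customer:b", "device:1"),   # a shared device links customers a and b
    ("customer:b", "device:2"),
    ("customer:c", "device:2"),
])

# Find every entity reachable from customer:a, walking edges in both directions.
reachable = conn.execute("""
    with recursive component(node) as (
        select 'customer:a'
        union
        select case when e.src = c.node then e.dst else e.src end
        from edges e join component c on e.src = c.node or e.dst = c.node
    )
    select node from component order by node
""").fetchall()
print([row[0] for row in reachable])
# -> ['customer:a', 'customer:b', 'customer:c', 'device:1', 'device:2']
```

A query like this surfaces the whole connected component around a suspicious account, whose size or membership can then be used as a feature in a rule.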

Batch Pipelines

The second part of the data layer is batch loading. The challenge is loading large amounts of data into a production DB without impacting the performance. This was an interesting problem to solve, and the implementation will have a separate article.
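Since the implementation details are deferred to a later article, here is only one common pattern for this kind of problem, hedged as an assumption rather than Shield's actual approach: load the fresh snapshot into a staging table, then swap it in with cheap renames so readers never see a half-loaded table. SQLite stands in for Postgres to keep the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table batch_profile (customer_id text primary key, orders_90d int)")
conn.execute("insert into batch_profile values ('c1', 3)")   # yesterday's snapshot

# 1. Build the fresh snapshot off to the side, without touching the live table.
conn.execute("create table batch_profile_staging (customer_id text primary key, orders_90d int)")
conn.executemany("insert into batch_profile_staging values (?, ?)", [("c1", 5), ("c2", 1)])

# 2. Swap the tables: readers only ever see a complete snapshot.
conn.executescript("""
    alter table batch_profile rename to batch_profile_old;
    alter table batch_profile_staging rename to batch_profile;
    drop table batch_profile_old;
""")
print(conn.execute("select * from batch_profile order by customer_id").fetchall())
# -> [('c1', 5), ('c2', 1)]
```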

Batch Pipelines

Conclusion

Shield replaced talabat’s manual processes and fragmented logic for fraud prevention with one centralized system. Shield acts as a hub for fraud prevention logic to maximize accuracy and reduce leakage. This helps other teams such as marketing, product, and operations focus on their objectives without worrying about the negative impact of fraud. Our north star is protecting talabat without negatively impacting the experience of our genuine customers. There will be other articles to explain the different components, so stay tuned!

Acknowledgments

Thanks to everyone who took part in building Shield to protect talabat and its customers from fraud. Thanks to fraud team members Alexey and Omar for their contributions and to Yusuf for the guidance along the way.
