Referral Graphs for Intelligent Customer Acquisition

Published in walkin, Sep 24, 2019 (6 min read)

The growth of the mobile app ecosystem in the past few years has been phenomenal, and it is slated to continue. The rise of mobile has changed the way retailers interact with their customers. Apps provide an efficient alternative to print media for advertising discounts and offers. Naturally, retailers are spending more on incentives to onboard and retain customers on their mobile apps. According to eMarket, an estimated $7.1 billion was spent on mobile app install ads in 2018.

Unsurprisingly, more money in the mobile ecosystem has also attracted more fraudulent users who attempt to game the system for profit. According to Tune, app-install fraud cost marketers nearly $2 billion in 2017, while a study by Inc claims that installation fraud is likely to put $13 billion of mobile marketing spend at risk in 2019.

Mobile fraud can broadly be classified into the following categories:

Installation fraud: Users installing the app only to avail of the incentives meant for first-time users.
Referral fraud: Customers referring themselves multiple times under different identities, or using bots to imitate human referrals, in order to collect referral incentives.
Coupon misuse: Willful violations of the terms and conditions for redemption.

From the retailers' perspective, there is a qualitative awareness that mobile marketing spend is prone to this risk, but companies are usually either unaware of the size of the risk or do not treat tackling fraud as a priority. Furthermore, addressing the issue in a data-backed manner is not necessarily aligned with their forte: imagine asking a company that has been serving lip-smacking pizzas for years to conceive a data science solution to deter tech-savvy, cunning mobile fraudsters. As a result, the solutions these retailers employ are often ad hoc and reactive, and they tend to cause collateral damage.

It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
Arthur Conan Doyle

The case study:

To understand the depth of the problem, we analyzed one month's app installation data for one of our customers, which led to the following key findings:

36% of customers joined the app using referrals
Close to 90% of those transacted two times or fewer in the 6 months post-installation
Low-usage customers consumed sign-up bonuses worth ₹8 lakh in a month

We observe that app-installation fraud is a serious business concern as it directly impacts close to 30% of total customer acquisition costs.

The Trust Prediction Engine

One way to mitigate the app-installation fraud problem is a system that predicts a trust score for customers as soon as they install the application. This score can then be used to intelligently control the sign-up incentives. A simple, measurable indicator of fraudulent behavior is the ratio of the cash used by the customer to their total spending: a customer who pays almost entirely with incentives has a ratio near 0. We refer to this as the 'C-to-S ratio' and treat it as the trust score, which is measured on a scale of 0 to 1.
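As a minimal sketch of the C-to-S ratio, the function below divides cash spent by total spending. The name `c_to_s_ratio`, the argument names, and the choice to return `None` for a customer with no spending are illustrative assumptions, not details from the case study.

```python
from typing import Optional

def c_to_s_ratio(cash_spent: float, total_spent: float) -> Optional[float]:
    """Cash-to-spending ratio, used here as the trust score in [0, 1].

    Assumption: total_spent is cash plus any incentives redeemed,
    so cash_spent can never exceed total_spent.
    """
    if total_spent <= 0:
        return None  # no transactions yet, so the ratio is undefined
    return cash_spent / total_spent

# Hypothetical customer: paid 300 in cash out of 1000 total spending.
score = c_to_s_ratio(300.0, 1000.0)
```

A score near 1 means the customer pays mostly with their own money, while a score near 0 means the spending is almost entirely incentive-funded.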

An estimator of this trust score is obtained by building a model that combines features of the user's referral network. To compute these features, a directed acyclic graph is built with customers as the nodes and directed edges indicating the referral relationship between them. We note that, given a node A in the graph, any other node in the graph belongs to exactly one of the following three sets of nodes:

Nodes that are reachable from node A by a path starting with an outward edge of node A
Nodes from which node A is reachable by a path ending with an inward edge of node A
Nodes that are not path-connected to node A in either direction

We refer to the nodes in the first set as the child nodes of node A and those in the second set as its parent nodes. Neo4j, an open-source graph database, is used to construct the referral network, with CRUD operations done using Python APIs and Cypher queries. Below is a visualization of a few subgraphs from the entire graph, which has close to 5 million nodes and 3.2 million edges.
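The three-way split of nodes can be sketched in plain Python with an in-memory adjacency map. This is only an illustration: it stands in for the Neo4j graph rather than reproducing it. The edge direction (referrer to referee), the helper names `build_graph` and `reachable`, and the toy referral pairs are all assumptions.

```python
from collections import defaultdict, deque

def build_graph(referrals):
    """Build (outgoing, incoming) adjacency maps from (referrer, referee) pairs."""
    outgoing, incoming = defaultdict(set), defaultdict(set)
    for referrer, referee in referrals:
        outgoing[referrer].add(referee)
        incoming[referee].add(referrer)
    return outgoing, incoming

def reachable(adjacency, start):
    """All nodes reachable from start by following edges in adjacency (BFS)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Toy data: P referred A, A referred B, B referred C; X referred Y separately.
referrals = [("P", "A"), ("A", "B"), ("B", "C"), ("X", "Y")]
outgoing, incoming = build_graph(referrals)

children = reachable(outgoing, "A")   # {"B", "C"}
parents = reachable(incoming, "A")    # {"P"}
all_nodes = {node for pair in referrals for node in pair}
unconnected = all_nodes - children - parents - {"A"}  # {"X", "Y"}
```

Because the graph is acyclic, a node cannot be both a child and a parent of A, which is why the three sets are disjoint.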

Correlation analysis and significance testing

The set of a customer's parent nodes reachable within three edges (up to third order) is referred to as the local neighborhood of the customer. The C-to-S ratio of the local neighborhood is computed as the ratio of the aggregate cash to the aggregate spending of all the nodes in the local neighborhood. This number is chosen as an estimator of the expected C-to-S ratio of the customer in consideration.
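The neighborhood estimator can be sketched as a depth-limited walk up the referral edges followed by a ratio of sums. The function names, the dictionary-based inputs, and the toy figures are hypothetical, and the `incoming` map is assumed to send each customer to the set of customers who referred them.

```python
from collections import deque

def local_neighborhood(incoming, customer, max_order=3):
    """Parent nodes reachable from customer within max_order referral edges."""
    seen = set()
    queue = deque([(customer, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_order:
            continue  # do not look beyond third-order parents
        for parent in incoming.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return seen

def neighborhood_c_to_s(neighbors, cash, spend):
    """Aggregate cash divided by aggregate spending over the neighborhood."""
    total_cash = sum(cash.get(n, 0.0) for n in neighbors)
    total_spend = sum(spend.get(n, 0.0) for n in neighbors)
    return total_cash / total_spend if total_spend > 0 else None

# Toy chain of referrals: A -> B -> C -> D -> E (E is the new customer).
incoming = {"E": {"D"}, "D": {"C"}, "C": {"B"}, "B": {"A"}}
cash = {"D": 100.0, "C": 50.0, "B": 50.0, "A": 1000.0}
spend = {"D": 400.0, "C": 200.0, "B": 200.0, "A": 1000.0}

hood = local_neighborhood(incoming, "E")          # {"D", "C", "B"}; A is 4 hops away
estimate = neighborhood_c_to_s(hood, cash, spend)  # 200 / 800 = 0.25
```

Taking the ratio of sums, rather than the mean of per-node ratios, weights each neighbor by how much they actually spent.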

A set of 25k customers who installed the app between January 2018 and March 2018, and who had a minimum of three neighbors in their local referral network, was sampled randomly. Across these customers, the Pearson correlation coefficient was computed between the estimator and the realized C-to-S ratio based on their real transactions up to May 2019.
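For reference, the Pearson coefficient between the estimated and realized ratios can be computed as in the sketch below. The function name and the sample values are made up for illustration and are not the case study's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up values: one (estimated, realized) C-to-S pair per sampled customer.
estimated = [0.10, 0.25, 0.40, 0.55, 0.80]
realized = [0.15, 0.20, 0.50, 0.45, 0.90]
r = pearson(estimated, realized)
```

A value of `r` close to 1 would indicate that customers whose referrers pay mostly in cash tend to do the same themselves.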

Prediction Error Analysis

An error analysis was done in which the estimated C-to-S ratio, which is also the trust score, was compared with the real C-to-S ratio. The error is defined as the difference between the customer's real and predicted C-to-S ratios. Observations are as follows:

The error is found to have a near-zero mean, indicating that the neighborhood C-to-S ratio is an unbiased estimator of the customer's expected C-to-S ratio.
The best-fitting Gaussian probability distribution for the error histogram has zero mean and a standard deviation of 0.2.

Takeaway: Using the fitted Gaussian distribution, the system can give predictions such as: there is a 67% chance that the newly joined customer will have a C-to-S ratio in the range 0.1–0.4.
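The step from a fitted error distribution to an interval prediction can be sketched as follows. The code assumes that real ratio = estimate + error with a Gaussian error, and it ignores the fact that the real ratio is bounded to [0, 1]. The function names and the sample errors are hypothetical; only the standard deviation of 0.2 comes from the analysis above.

```python
import math
from statistics import mean, pstdev

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(estimate, low, high, sigma=0.2, bias=0.0):
    """P(low <= real <= high) when real = estimate + error, error ~ N(bias, sigma^2)."""
    centre = estimate + bias
    return normal_cdf(high, centre, sigma) - normal_cdf(low, centre, sigma)

# Fitting step on made-up errors (real minus predicted C-to-S ratio).
errors = [0.10, -0.20, 0.05, -0.05, 0.10]
fitted_bias, fitted_sigma = mean(errors), pstdev(errors)

# With sigma = 0.2, one standard deviation either side of an estimate of 0.25
# covers roughly 68% of the probability mass.
p = interval_probability(0.25, 0.05, 0.45, sigma=0.2)
```

The probability depends on both the point estimate and the width of the interval, so narrower ranges carry lower confidence for the same standard deviation.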

Error histogram and Gaussian distribution fit

Conclusion:

Where does all this take us? To understand the impact of the predicted trust score (the expected C-to-S ratio of a customer), we analyzed the actual spending pattern of close to 3100 customers whose trust score was predicted to be less than 0.4 and who made a certain minimum number of transactions between installation and May 2019. The results show that 75% of the customers who were predicted to have low trust scores used up the entire sign-up incentive awarded to them, while the total cash they spent in the roughly 16 months since installing the app was less than two-thirds of the sign-up bonus. The analysis gives an insight into the accuracy of the predictions and points to potential savings in customer acquisition costs if the incentives policy is suitably adapted.

In conclusion, while we have presented the utility of the referral network in charting an informed customer acquisition strategy, we also note that algorithms based on referral graphs show promise in improving customer churn estimation, devising smart referral policies, and building intelligent customer loyalty programs.


Written by Sudin kadam