Automation of Strategy Generation for Anomaly Detection in E-Commerce

SiyuanZ
Walmart Global Tech Blog
7 min read · Jul 5, 2022
Picture credit: https://corporate.walmart.com/newsroom/innovation/20170807/5-ways-walmart-uses-big-data-to-help-customers

Introduction

Since 2020, e-commerce and online retail businesses have surged and continued to grow during the COVID-19 pandemic. Online platforms have become increasingly popular as a wider population adopts online shopping. Not only has the pandemic continued to drive e-commerce and online retail sales growth at the fastest rate in history, it has also changed consumers’ shopping behavior significantly. With the aim of building a trustworthy brand and improving customer experience, e-commerce and online retail businesses are more focused than ever on adapting to these changes and generating strategies or rules for anomalous orders in a timely fashion.

There are a few challenges in rule strategy generation for anomaly detection in online retail businesses:

  1. Customers’ shopping patterns have become increasingly complicated. Strategies must therefore evolve quickly to combat the rising anomalies.
  2. Traditional approaches to strategy generation rely heavily on human review. However, it is extremely difficult for human reviewers to catch and understand every pattern change, creating a bottleneck in both speed and accuracy.
  3. Data is enormous. Even though hundreds or thousands of features are available for anomaly detection, it is overwhelming and impractical to examine all of them manually and pick the most relevant ones to use.

In the following sections, we will introduce a procedure that leverages machine learning techniques and an innovative measurement score to generate strategies without human intervention. Additionally, the procedure combines features in a non-linear fashion to identify anomalous patterns with efficiency.

Methodology

In contemporary risk management practices, strategies are the essential building blocks of a risk defense engine that protects customers as well as the corporations. Technically speaking, strategies are equivalent to mini machine learning models that target ongoing order trends, analyze purchasing behaviors, and uncover anomalous patterns. Generating efficient, long-lasting, and agile rule strategies is a major contributor to the success of risk defense.

Figure 1. Overview of a contemporary risk defense engine infrastructure. The strategy component plays an important role in defending risk and improving customer experience as it adapts to changing purchasing behaviors.

A decision tree is a machine learning algorithm that can analyze and identify patterns across thousands of input feature variables. Due to its nonlinear nature, a decision tree is more flexible than linear algorithms such as logistic regression. A decision tree not only outputs a binary outcome, which allows one to make a risk decision, it also manifests its decision rules explicitly. Hence, those decision rules can be used as strategies and implemented directly in a risk defense engine.

Precision and Recall are commonly used metrics to perform model selection and model evaluation. The two metrics are also quite useful in strategy generation and development. Technically, Precision measures the accuracy of the strategy, and Recall measures the coverage. In addition to Precision and Recall, we introduce a Stability metric to assess a strategy’s robustness by permuting and bootstrapping a training data set.

Precision-Recall-Stability (PRS) score

Before introducing the newly designed, comprehensive PRS score, it is worth reviewing the Confusion Matrix and the Precision and Recall formulas commonly used to evaluate classification algorithms.

Figure 2. Confusion Matrix
  • Precision = True Positives / (True Positives + False Positives)
  • Coverage or Recall = True Positives / (True Positives + False Negatives)
  • Robustness or Stability
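The Stability component can be made concrete in more than one way; the sketch below is one plausible instantiation (an assumption, not the article's exact formula): bootstrap the training set and measure how much a candidate rule's precision fluctuates across resamples.

```python
import random

def rule_precision(rows, rule):
    """Precision of a rule on labeled rows: P(label = 1 | rule fires)."""
    fired = [label for feats, label in rows if rule(feats)]
    return sum(fired) / len(fired) if fired else 0.0

def stability(rows, rule, n_boot=200, seed=0):
    """Estimate robustness as 1 minus the standard deviation of the
    rule's precision across bootstrap resamples of the training data.
    A rule whose precision barely moves under resampling scores near 1."""
    rng = random.Random(seed)
    precisions = []
    for _ in range(n_boot):
        resample = [rng.choice(rows) for _ in rows]
        precisions.append(rule_precision(resample, rule))
    mean = sum(precisions) / n_boot
    std = (sum((p - mean) ** 2 for p in precisions) / n_boot) ** 0.5
    return max(0.0, 1.0 - std)
```

Under this definition a perfectly consistent rule gets Stability 1.0, while one whose precision swings wildly under resampling is penalized toward 0.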

The new PRS score is essentially a weighted harmonic mean of Precision, Recall, and Stability, making it a comprehensive measurement of accuracy, coverage, and robustness. The parameters ⍺, β, and 𝞬 assign different weights to the three components, depending on the focus: a higher value of ⍺, β, or 𝞬 puts more emphasis on the corresponding component in the PRS score. Ultimately, the PRS score provides a more extensive and flexible measurement of a decision tree’s performance than standard metrics such as the F1 score.
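As a sketch, assuming the standard weighted-harmonic-mean form (a direct generalization of the F-beta score), the PRS score can be computed as:

```python
def prs_score(precision, recall, stability, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted harmonic mean of Precision, Recall, and Stability.

    A larger alpha, beta, or gamma puts more emphasis on the
    corresponding component; with gamma = 0 (and alpha = beta) the
    score reduces to the classic F1 on Precision and Recall alone.
    """
    terms = [(alpha, precision), (beta, recall), (gamma, stability)]
    denominator = 0.0
    for weight, value in terms:
        if weight == 0:
            continue            # component is ignored entirely
        if value == 0:
            return 0.0          # a zero component collapses the harmonic mean
        denominator += weight / value
    return (alpha + beta + gamma) / denominator
```

For example, a branch with Precision 0.9, Recall 0.3, and Stability 0.95 is rewarded for accuracy and robustness but pulled down sharply by its low coverage, exactly the behavior a harmonic mean provides.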

Evaluate decision tree using PRS score and AUC score

Normally, a decision tree or other predictive model is trained on a training data set with true labels and feature variables. Area Under the Curve (AUC) is commonly used to report model performance: once a model produces a numeric score for each observation in the validation data set, we can rank the scores and calculate AUC accordingly. This mechanism extends easily to a decision tree, even though it produces categorical outcomes instead of numeric scores. For example, we may use the leaf-node probability as a surrogate score. Note that when we predict with a decision tree model, we walk down from the root node to a leaf node, where we predict the majority class. In addition to a probability, each leaf node may also return a numeric score (such as the PRS score), which can be used to rank the observations.
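Leaf-node scoring can be sketched as follows. The toy tree here is a reduced, hypothetical version of the weather example: only the rainy-and-not-windy branch with score 0.86 comes from the text, and the remaining layout and leaf scores are illustrative.

```python
# Internal nodes test a boolean feature; each leaf carries its majority
# class and a ranking score (e.g., the branch's PRS score).
TREE = {
    "feature": "rainy",
    "yes": {
        "feature": "windy",
        "yes": {"cls": "N", "score": 0.31},   # illustrative leaf
        "no": {"cls": "P", "score": 0.86},    # best branch from the text
    },
    "no": {"cls": "N", "score": 0.16},        # illustrative leaf
}

def predict(node, feats):
    """Walk from the root down to a leaf, then return both the
    majority-class prediction and the leaf's numeric score."""
    while "feature" in node:
        node = node["yes"] if feats[node["feature"]] else node["no"]
    return node["cls"], node["score"]
```

Here `predict(TREE, {"rainy": True, "windy": False})` returns `("P", 0.86)`, so observations can be ranked by the second element even though the predicted outcome itself is categorical.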

Assume that we have trained a decision tree to predict two-class outcomes, P (positive) or N (negative), with majority class labels as follows:

Figure 3. A trained decision tree to predict positive or negative outcomes. The decision tree has 5 leaf nodes. Orange leaf nodes predict positive outcomes, and blue leaf nodes predict negative outcomes.

Meanwhile, Table 1 shows a validation data set ranked in descending order of PRS score. The validation data set contains 4 score bins: 0.86, 0.66, 0.31, and 0.16. We have also used the true labels and predicted outcomes to calculate the True Positive Rate (TPR) and False Positive Rate (FPR) for each score bin.

  • TPR = True Positives / Total Positives
  • FPR = False Positives / Total Negatives

With TPR and FPR, we can draw a Receiver Operating Characteristic (ROC) curve, shown in yellow in Figure 4. Each point on the curve represents one score bin. Given the curve, we can calculate the area under it, which is the AUC score. In this case, the AUC score is 0.73.
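The AUC computation from binned TPR/FPR points can be sketched with the trapezoid rule. The (FPR, TPR) values below are made up for illustration; they are not Table 1's actual numbers.

```python
def auc_from_roc(points):
    """Trapezoidal area under an ROC curve given (FPR, TPR) points
    sorted by FPR, starting at (0, 0) and ending at (1, 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# One cumulative (FPR, TPR) point per score bin, plus the two endpoints.
roc_points = [(0.0, 0.0), (0.1, 0.4), (0.3, 0.7), (0.6, 0.9), (1.0, 1.0)]
```

For these illustrative points the area comes out to 0.75; the article's actual validation set yields 0.73.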

Figure 4. ROC curve

Once the best decision tree model has been selected based on the AUC score, we can easily pick the strategies out of the branches of the tree with the highest PRS scores. In this case, the best strategy is the decision rule that leads to the leaf node with PRS score = 0.86. That is, the strategy predicts a positive outcome when the weather is rainy but not windy.

A real-life use case for bot detection

A real-life use case of this process produces the following output from the decision tree model. The best strategy is the highlighted branch in the figure below. The weights are the number of negative samples and the number of positive samples in the branch.

Then, we translate this output into JavaScript to use in the decision engine. Essentially, we decline orders due to high anomaly risk (a bot attack) if they meet all four criteria:

  • string distance between account name and ship-to name is large
  • count of distinct credit cards from the order IP address > 1
  • ship-to address is new (0 days on file)
  • device is new (≤ 1 day on file)
if (levd_cust_shiptp_name > 8 &&
    ip_cnt_cc_150d > 1 &&
    cust_addr_tof <= 0 &&
    cust_device_tof <= 1) {
  recommendation = 'Decline';
}

Feedback loop to refine strategy generation

Every decision rule or strategy will inevitably generate both false positives and false negatives. Therefore, a feedback loop mechanism is commonly employed to bridge the gap.

The feedback loop injects domain knowledge and in-depth understanding from human agents. Meanwhile, real-life anomalies continue to provide false negative signals. For the next round of strategy generation, the model can take these false positive and false negative signals into the training phase and add more weight to the falsely decisioned cases, improving the accuracy of the strategy decisions.
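The reweighting step can be sketched as simple per-sample multiplicative weights; the boost factor here is an illustrative choice, not a production value.

```python
def feedback_weights(true_labels, strategy_decisions, base=1.0, boost=3.0):
    """Per-sample training weights for the next round of strategy
    generation: cases the current strategy decisioned wrongly (false
    positives and false negatives) are up-weighted so the next decision
    tree focuses on them. The boost of 3 is an arbitrary illustration."""
    return [boost * base if y != d else base
            for y, d in zip(true_labels, strategy_decisions)]
```

For instance, `feedback_weights([1, 0, 1, 0], [1, 1, 0, 0])` returns `[1.0, 3.0, 3.0, 1.0]`: the false positive (second case) and the false negative (third case) each receive triple weight in the next training run.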

Conclusion

The automation of strategy generation overcomes the bottleneck of relying on human eyes to exhaust a large set of feature variables and capture rapidly changing patterns. The table below lists the advantages of the proposed strategy generation process over the traditional process.

|                     | Approach | Coverage       | LOE      |
| ------------------- | -------- | -------------- | -------- |
| Proposed process    | Auto     | ~1000 features | 3 hours  |
| Traditional process | Manual   | ~50 features   | >= 1 day |

To defend a typical anomaly case, human review can take a day or even a week to come up with a reasonable strategy, but this automation can reduce that time by 80%. Armed with machine learning algorithms and a comprehensive metric evaluation, the automated strategy generation has achieved greater efficiency, coverage, and adaptiveness.
