ANOMALY DETECTION

Anomaly detection is all about finding patterns of interest (outliers, exceptions, peculiarities, etc.) that deviate from expected behavior within a dataset (or datasets). As an application domain within anomaly detection, fraud detection dominates the banking industry. Fraud detection applies anomaly detection to uncover behavior by which an actor intends to mislead or misrepresent.
The ultimate goal or output of anomaly detection is not just an algorithm or working model but the business outcome, including kicking off the processes that necessarily follow. For example, it would not be good enough to simply identify bad actors, fraudsters, fraudulent transactions, or network intrusions; the full AI system should also take action based on these identifications, like escalating cases to a fraud investigation team, blocking accounts, or alerting the proper teams to nefarious actions.
In addition, anomaly detection requires a system that is agile and constantly learning because:
• The very nature of the use cases for anomaly detection means fraudsters or other bad actors are specifically and deliberately trying to produce inputs that do not look like outliers. Adapting to and learning from this reality is critical.
• As spending trends and an increasingly global world continue to change the face of banking, datasets will shift over time, so a system needs to evolve along with its users.
It is important to note that, despite the most common use cases being the detection of fraud or system failure, anomalies are not always bad — that is, they do not always have to indicate that something is wrong. Anomaly detection can also be used, for example, to detect or predict slight changes in customer or user behavior that may then result in a shift in selling, development, or marketing strategy, allowing more accurate market predictions and the ability to stay a step ahead of new trends.
TYPES OF ANOMALIES THAT MAY BE DETECTED
1. Point anomalies
Point anomalies are single anomalous instances within a larger dataset. For example, a $1 trillion transaction would be a point anomaly, as that is more money than even the richest conglomerates make in a year. Anomaly detection systems often start by identifying point anomalies, which can then be used to detect more subtle contextual or collective anomalies.
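As a minimal illustration, the sketch below flags point anomalies with the modified z-score, which uses the median and median absolute deviation rather than the mean and standard deviation (a single extreme value inflates the standard deviation enough to hide itself). The 3.5 threshold is a common rule of thumb, not a universal constant, and the sample amounts are invented:

```python
from statistics import median

def point_anomalies(amounts, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    Uses median and MAD instead of mean/stdev, because one extreme
    value would distort the mean and standard deviation.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # all points essentially identical: nothing to flag
    return [x for x in amounts if abs(0.6745 * (x - med) / mad) > threshold]

# One wildly large amount among routine card transactions
amounts = [120, 80, 95, 110, 100, 90, 105, 85, 1_000_000]
flagged = point_anomalies(amounts)  # -> [1000000]
```

Note that with the ordinary z-score and the same data, the $1,000,000 transaction would score only about 2.7 standard deviations from the mean, below a typical threshold of 3, which is exactly why the robust variant is used here.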
2. Contextual (or conditional) anomalies
Points that are anomalous only in a certain context. A transaction is again a good example: while $10,000 is within the range of possible transaction amounts, the same amount is clearly anomalous on an account whose credit limit it exceeds.
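To make the example concrete, the sketch below flags amounts that are unremarkable globally but exceed the limit of the specific account they occur on; the record layout, account names, and limit values are all invented for illustration:

```python
def contextual_anomalies(transactions, credit_limits):
    """Flag amounts that are plausible globally but anomalous in the
    context of their own account's credit limit."""
    return [t for t in transactions if t["amount"] > credit_limits[t["account"]]]

credit_limits = {"A": 2_000, "B": 50_000}
transactions = [
    {"account": "A", "amount": 10_000},  # anomalous in the context of account A
    {"account": "B", "amount": 10_000},  # unremarkable for account B
]
flagged = contextual_anomalies(transactions, credit_limits)
```

The same $10,000 value appears twice; only the context (the account's limit) makes one of the two anomalous.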
3. Collective anomalies
When multiple related datasets, or parts of the same dataset, are anomalous with respect to the entire dataset when taken together, even though no individual dataset contains an anomaly on its own. For example, say there is data showing a credit card making a purchase in the US, but also a dataset showing money being withdrawn from ATMs in France at the same time. Neither record is anomalous by itself, but the datasets measuring these various components, taken together, signal an issue.
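The US-purchase/France-ATM example can be sketched as a simple "impossible travel" check over a merged, time-ordered event stream. The one-hour window, card identifier, and event fields are assumptions for illustration:

```python
from datetime import datetime, timedelta

def impossible_travel(events, window=timedelta(hours=1)):
    """Flag consecutive events on the same card in different countries
    that occur too close together to be physically possible.
    Expects events sorted by time."""
    flagged = []
    for a, b in zip(events, events[1:]):
        if (a["card"] == b["card"] and a["country"] != b["country"]
                and b["time"] - a["time"] <= window):
            flagged.append((a, b))
    return flagged

events = sorted([
    {"card": "1234", "country": "US", "time": datetime(2024, 1, 1, 12, 0)},
    {"card": "1234", "country": "FR", "time": datetime(2024, 1, 1, 12, 30)},
], key=lambda e: e["time"])
suspicious = impossible_travel(events)  # the US/FR pair, 30 minutes apart
```

Each event on its own would pass a point or contextual check; only merging the two sources exposes the collective anomaly.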
HOW
There are several particularities to bear in mind when working with anomaly detection:
1. CHOOSE AND UNDERSTAND THE USE CASE
The first step in successful anomaly detection is to really understand what kind of a system the line of business needs and to lay out a framework for the requirements and goals before diving in. These are important preliminary discussions because not all anomaly or fraud detection work is the same; exactly what qualifies as an anomaly and the subsequent processes kicked off by anomaly detection vary vastly by (and even among) use cases.
2. GET THE DATA
Having as much data for anomaly detection as possible will allow for more accurate models because one never knows which features might be indicative of an anomaly. Using multiple types and sources of data is what allows systems to move beyond point anomalies into identifying more sophisticated contextual or collective anomalies. In other words, variety is key.
3. EXPLORE, CLEAN, AND ENRICH DATA
When doing anomaly detection, this stage is even more important than usual, because often the data contains noise (usually errors, either human or not) which tends to be similar to the actual anomalies. Hence, it is critical to distinguish between the two and remove any problematic data that could produce false positives.
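A minimal cleaning pass might look like the following sketch. The row schema (`id`, `amount`) and the rules for what counts as a data error, duplicate feed entries and non-positive amounts, are assumptions for illustration; in practice these rules come from the use case defined in step 1:

```python
def clean(rows):
    """Drop exact duplicate feed entries and rows with impossible values,
    so that data errors are not later mistaken for genuine anomalies."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["id"], row["amount"])
        if key in seen:
            continue  # duplicate from a double-delivered feed, not a real repeat
        if row["amount"] <= 0:
            continue  # treated here as an entry error, not a candidate anomaly
        seen.add(key)
        cleaned.append(row)
    return cleaned

rows = [
    {"id": 1, "amount": 50},
    {"id": 1, "amount": 50},    # duplicate entry
    {"id": 2, "amount": -10},   # impossible value
    {"id": 3, "amount": 75},
]
cleaned = clean(rows)  # keeps rows 1 and 3
```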
4. GET PREDICTIVE
There are two primary architectures for building anomaly detection systems:
• Supervised anomaly detection
Used when you have a labeled dataset, i.e., you know whether each datapoint is normal or anomalous.
• Unsupervised anomaly detection
Used when the dataset is unlabeled, i.e., whether each datapoint is an anomaly is unknown (or existing labels are unreliable).
When using a supervised approach, apply a binary classification algorithm. Exactly which algorithm is less important than making sure to take the appropriate measures regarding class imbalance (i.e., the fact that for anomaly detection, it is highly likely that you have far more “normal” cases than anomalous ones).
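One common measure against class imbalance is reweighting the classes so that the rare anomalous class contributes as much to training as the common normal class. The sketch below reimplements the "balanced" weighting heuristic used by libraries such as scikit-learn, weight = n_samples / (n_classes * count(class)); the 1% fraud rate is an invented but typical figure:

```python
from collections import Counter

def balanced_class_weights(labels):
    """'Balanced' heuristic: weight = n_samples / (n_classes * count(class)),
    so each class contributes equally to the loss overall."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

labels = [0] * 990 + [1] * 10   # 1% fraud rate, a typical imbalance
weights = balanced_class_weights(labels)
# each of the 10 anomalies is weighted 50.0; each normal case about 0.505
```

These weights would then be passed to whichever binary classifier is chosen (most accept per-class or per-sample weights).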
When using an unsupervised approach, there are two ways of training algorithms:
• Novelty detection
The training set is made exclusively of inliers so that the algorithm learns the concept of “normality” (hence the prefix “one-class” found in some methods). At test time, the data may also contain outliers. This is also referred to as semi-supervised detection.
• Outlier detection
The training set is already polluted by outliers. The assumption is that the proportion of outliers is small enough that the algorithms remain robust at training time, effectively ignoring the outliers and fitting only on the inliers.
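As a minimal illustration of novelty detection, the toy one-class model below fits a Gaussian to an inlier-only training set and flags test points that fall too far from it. The class name, the z=4 cutoff, and the +1/-1 prediction convention (borrowed from scikit-learn's one-class estimators) are choices for this sketch, not a standard implementation:

```python
from statistics import mean, stdev

class OneClassGaussian:
    """Toy novelty detector: fit on inliers only, then flag test points
    that fall far outside the fitted distribution."""
    def fit(self, inliers, z=4.0):
        self.mu, self.sigma, self.z = mean(inliers), stdev(inliers), z
        return self
    def predict(self, xs):
        # +1 = inlier, -1 = novelty
        return [1 if abs(x - self.mu) <= self.z * self.sigma else -1 for x in xs]

train = [100, 102, 98, 101, 99, 103, 97, 100]   # clean, inlier-only training set
model = OneClassGaussian().fit(train)
predictions = model.predict([101, 500])  # -> [1, -1]: 500 is a novelty
```

The outlier-detection setting differs only in that `train` would already contain a few extreme values, so the fitting step must be robust to them (as the modified z-score was for point anomalies).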
5. VISUALIZE
Visualizations are especially useful in the process of building and testing anomaly detection models because sometimes they are the clearest way to see outliers, especially in very large datasets.
6. DEPLOY AND ITERATE
To have a real impact with a fraud or anomaly detection system, the model should be scoring data in real time in production. Fraud and anomaly detection in banks are generally extremely time-sensitive, so going to production to make predictions on live data rather than retroactively on test or stale data is more important than ever.
But putting a model in production isn’t the end. Iteration and monitoring of fraud — and any other anomaly — detection systems is critical to ensuring that the model continues to learn and be agile enough to continue detecting anomalies even as environments and behaviors change.
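The sketch below is a minimal real-time scoring wrapper of the kind described above. The model, threshold, and alert hook are placeholders, and the running counts stand in for the far richer monitoring a production system would need to detect drift:

```python
def make_scorer(model, threshold, on_alert):
    """Wrap a model for streaming use: score each event as it arrives and
    kick off the follow-up process (case escalation, account block) on a hit."""
    stats = {"seen": 0, "flagged": 0}   # running counts to monitor over time
    def score(event):
        stats["seen"] += 1
        if model(event) >= threshold:
            stats["flagged"] += 1
            on_alert(event)
    return score, stats

alerts = []
score, stats = make_scorer(model=lambda e: e["risk"], threshold=0.9,
                           on_alert=alerts.append)
for event in [{"risk": 0.2}, {"risk": 0.95}]:
    score(event)  # only the second event triggers the alert hook
```

Watching the flagged-to-seen ratio shift over time is one simple signal that behaviors have changed and the model needs retraining.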
LOOKING AHEAD
In the years to come, as more and more use cases depend on anomaly detection, expect to see an overall streamlining of anomaly detection processes as organizations scale and improve the level of trust they are willing to put in AI-driven systems. This means more and more organizations investing in the right architecture to retrieve the data critical for anomaly detection work, the means to process it quickly, and the ability to apply models where they deliver the biggest impact and business value.
This does not mean that trust is inherent, or that once stakeholders trust one model the rest will naturally follow; transparency and clear visualizations will always be critical to ensure adoption and integration. But it does mean that what once seemed impossible, predicting (and thus preventing) disastrous fraudulent charges while also conserving organizational time and resources, is on the horizon. However, anomaly detection isn't the only opportunity for AI in banking; there are numerous challenges facing banks that anomaly detection alone is poorly situated to solve.
