Fraud Prevention: Exploration-Exploitation Tradeoff in AI-based Systems

Danny Butvinik

Published in Analytics Vidhya

8 min read · May 14, 2022

Introduction

Fraud, or criminal deception, is a serious ongoing challenge and will remain a costly problem for financial institutions. As regulators and financial authorities introduce new technology to detect and prevent financial crime, criminals develop more sophisticated methods to evade scrutiny and commit offenses. AI-based solutions help minimize these losses by incorporating machine learning systems that use immense volumes of financial data. Machine learning models detect peculiar patterns, behaviors, or velocities associated with numerous factors in high-dimensional, sparse data.

However, fraudulent patterns are constantly evolving, since fraudsters adapt their behavior as soon as new fraud prevention strategies appear. Not only do fraudulent patterns mutate; the financial data itself also changes over time in a complex environment shaped by the global economy, geopolitics, natural disasters, pandemics, wars, and other large-scale events.

Ultimately, machine learning models are just an abstraction of reality. If the underlying reality changes constantly and the model does not keep pace with it, model drift is inevitable. The performance of many machine learning models for fraud detection will decline over time because of changing human behavior and ever-changing data. Contemporary machine learning models do not generalize well to new environments without explicit training examples. Model performance may start to degrade as data and the “ground truth” change over time. These are critical weaknesses that must be accounted for in artificial intelligence-based systems.

To address these changes in the data (also known as distribution shift or a change in statistics) and the diverse types of fraudulent patterns, AI-based solutions need the capacity to overcome concept drift. Concept drift means that the statistical properties of the data the model is trying to predict change over time in unforeseen ways. This causes problems because predictions become less accurate as time passes. Identifying concept drift is the process of detecting changes in the relationships within datasets. The change can be continuous over time, periodic or recurring (as with seasonal data), or sudden and sweeping.

Concept drift detection ends by alerting the model to the change in the data and updating the model accordingly. Each time the model updates itself, it goes through the process of exploration. When the model does not update itself and simply continues to run, it goes through the process of exploitation. The goal for every learning system is to find the optimal tradeoff between how much we update the model and how much we let it exploit what it has already learned. This is called the Exploration-Exploitation tradeoff.

**Figure 1**: High-level scheme of exploration-exploitation mechanism. The model receives an instance to predict, then it decides whether to change itself (adapt itself to the drift) or to continue to exploit (predict) based on the already gained knowledge. Image credit: Author
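The scheme in Figure 1 can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `ThresholdModel`, `make_stream`, `run_stream`, the synthetic amounts, and the error-rate trigger are all hypothetical and are not the author's system. The sketch also assumes that the true label arrives right after each prediction, which is rarely the case for real fraud data.

```python
import random
from collections import deque


class ThresholdModel:
    """Toy 1-D classifier: flags a transaction as fraud when amount > threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, amount):
        return amount > self.threshold

    def refit(self, labelled):
        """Move the threshold to the midpoint between the two class means."""
        fraud = [a for a, is_fraud in labelled if is_fraud]
        legit = [a for a, is_fraud in labelled if not is_fraud]
        if fraud and legit:
            self.threshold = (sum(fraud) / len(fraud) + sum(legit) / len(legit)) / 2


def make_stream(rng, n_per_phase=500, fraud_rate=0.3):
    """Synthetic data: fraud amounts centre on 200, then drift down to 80."""
    stream = []
    for fraud_mean in (200.0, 80.0):
        for _ in range(n_per_phase):
            is_fraud = rng.random() < fraud_rate
            mean = fraud_mean if is_fraud else 50.0
            stream.append((rng.gauss(mean, 10.0), is_fraud))
    return stream


def run_stream(stream, model, window=50, max_error=0.2):
    """Predict every instance; refit only when recent errors exceed max_error."""
    recent = deque(maxlen=window)  # (amount, label, was_wrong)
    updates = 0
    for amount, label in stream:
        prediction = model.predict(amount)  # exploitation: use current knowledge
        recent.append((amount, label, prediction != label))
        if len(recent) == window:
            error_rate = sum(wrong for _, _, wrong in recent) / window
            if error_rate > max_error:  # exploration: adapt to the drift
                model.refit([(a, y) for a, y, _ in recent])
                recent.clear()
                updates += 1
    return updates


model = ThresholdModel(threshold=125.0)
updates = run_stream(make_stream(random.Random(0)), model)
print(f"model updates: {updates}, final threshold: {model.threshold:.1f}")
```

The `window` and `max_error` parameters are the knobs of the tradeoff in this toy: a small window with a low error tolerance updates the model eagerly, while a large window with a high tolerance lets it exploit for longer.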

Concept Drift

Capturing evolving trends and patterns in real-time within financial data is critical for a robust and efficient fraud detection system. As new goods are imported and novel frauds arise, a drift-aware fraud detection system is needed to detect both known frauds and unknown frauds within a limited budget. The performance of machine learning models will inevitably decline over time due to their inability to adjust themselves to concept drift.

Concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Identifying concept drift is the process of detecting changes in the relationships within datasets. This could be continuous change over time, periodic or recurring changes as with seasonal data or sudden sweeping changes.
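One common way to state this formally comes from the general concept drift literature rather than from this article, so the symbols $X$, $y$, and $p_t$ are assumptions introduced here. Let $p_t(X, y)$ be the joint distribution of the transaction features $X$ and the fraud label $y$ at time $t$. Drift between two points in time $t_0$ and $t_1$ then means

$$
\exists X:\; p_{t_0}(X, y) \neq p_{t_1}(X, y), \qquad \text{where } p_t(X, y) = p_t(y \mid X)\, p_t(X).
$$

A change in $p_t(y \mid X)$ moves the true decision boundary and is often called real drift. A change in $p_t(X)$ alone shifts where the data falls without changing the labelling rule and is often called virtual drift or covariate shift.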

This is particularly relevant for models related to human behavior (fraudsters). Contemporary machine learning models do not generalize well to new changing environments without explicit training examples. Model performance may start to degrade as data and the “ground truth” change over time. These are critical weaknesses that must be accounted for in AI-based systems for fraud detection.

To overcome the serious problem of changing statistics in financial data, a concept drift detection mechanism needs to be developed. Such a mechanism must cope with every form the drift can take: continuous change over time, periodic or recurring changes as with seasonal data, or sudden sweeping changes.
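As a rough illustration of what such a mechanism might look like, the sketch below compares the mean of a recent window of one numeric feature against a frozen reference window. `MeanShiftDetector` and its parameters are hypothetical names. It is a deliberately simple two-window comparison, not a published detector and not the method used by the author, and it only notices shifts in the mean of a single feature.

```python
import random
import statistics
from collections import deque


class MeanShiftDetector:
    """Signals drift when the recent window's mean departs from a reference window."""

    def __init__(self, window=100, z_threshold=5.0):
        self.window = window
        self.z_threshold = z_threshold
        self.reference = []                 # frozen baseline sample
        self.recent = deque(maxlen=window)  # sliding window of newest values

    def update(self, value):
        """Feed one observation; return True if drift is signalled."""
        if len(self.reference) < self.window:
            self.reference.append(value)    # still collecting the baseline
            return False
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False
        ref_mean = statistics.fmean(self.reference)
        ref_std = statistics.stdev(self.reference)
        # Standard error of the difference between two window means.
        std_err = max(ref_std * (2 / self.window) ** 0.5, 1e-12)
        z = abs(statistics.fmean(self.recent) - ref_mean) / std_err
        if z > self.z_threshold:
            # Drop both windows so a fresh baseline is collected after the alarm.
            self.reference = []
            self.recent.clear()
            return True
        return False


# Synthetic feature stream: the mean jumps from 0 to 3 at index 300.
rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) for _ in range(300)]
stream += [rng.gauss(3.0, 1.0) for _ in range(300)]

detector = MeanShiftDetector()
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
print("drift signalled at indices:", alarms)
```

A sudden shift like the one simulated here is the easy case. Gradual or recurring drift would need a different comparison, for example a slowly adapting reference window.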

Exploration-Exploitation Tradeoff

In 1991, Professor James March of Stanford University extended Professor Robert Duncan’s 1976 theory and articulated the challenge faced by all organizations: navigating the tension between innovation and efficiency. He framed it as “the relation between the exploration of new possibilities and the exploitation of old certainties in organizational learning.”

This is how he described the difference between exploration and exploitation:

Exploration: in this phase, the focus is on experimentation, risk-taking, discovery, and innovation. Paradigms that excel at this stage are usually more flexible and more comfortable with uncertainty.

Exploitation: the focus is on execution, refinement, and efficiency. Paradigms that are most successful at this stage are usually good at weighing options and making optimal choices.

The following example illustrates the exploration-exploitation notion in real life: startups are usually in the exploration phase, and they switch to the exploitation phase when they become mature businesses. As we’ll see, this switch, especially when combined with an unwillingness to switch back to exploration mode when needed, can be a costly mistake.

**Figure 2**: The illustration shows the opposing phases of the exploration-exploitation concept. When the learning system is exploring, it obtains new knowledge that will later serve it for efficient exploitation; during exploration, exploitation decreases, since the two processes cannot both run at full strength at the same time. This asynchrony suggests operating on both levels to reach optimality. Image credit: Author

The exploration-exploitation trade-off is a fundamental dilemma whenever you learn about the world by trying things out. The dilemma is between choosing what you know and getting something close to what you expect (exploitation) and choosing something you aren’t sure about and possibly learning more (exploration).
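A classic way to make this dilemma concrete is the epsilon-greedy strategy for a multi-armed bandit, a standard technique from reinforcement learning rather than anything specific to this article. In the sketch below the three "arms" and their success rates are made-up values, and `epsilon_greedy` is a hypothetical helper: with probability `epsilon` the learner tries a random option (exploration), otherwise it picks the option with the best estimate so far (exploitation).

```python
import random


def epsilon_greedy(true_rates, epsilon=0.1, steps=5000, seed=0):
    """Run an epsilon-greedy learner on Bernoulli arms; return (counts, values, total)."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # how often each arm was chosen
    values = [0.0] * n_arms  # running mean reward observed per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)   # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total += reward
    return counts, values, total


rates = [0.2, 0.5, 0.8]  # hypothetical success rates; the learner cannot see them
greedy_counts, _, greedy_total = epsilon_greedy(rates, epsilon=0.0)
mixed_counts, _, mixed_total = epsilon_greedy(rates, epsilon=0.1)
print("pure exploitation pulls:", greedy_counts)
print("epsilon = 0.1 pulls:    ", mixed_counts)
```

With `epsilon=0.0` the learner never leaves the first arm in this setup, because the untried arms keep their initial estimate of zero and ties go to the first arm. A small amount of exploration is what lets it discover that a better option exists.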

In artificial intelligence-based systems that incorporate machine learning models, exploration includes elements captured by terms such as search, variation, risk-taking, experimentation, play, flexibility, discovery, and innovation. On the other hand, exploitation includes such elements as refinement, choice, production, efficiency, selection, implementation, and execution. The essence of exploitation is the refinement and extension of existing competencies, technologies, and paradigms. Its returns are positive, proximate, and predictable. The essence of exploration is experimentation with new alternatives.

By only exploring (without exploiting), learning systems expose themselves to new data, new patterns, and trends, and expand their existing knowledge. Exploration has many advantages for any learning system, as it broadens its perspectives and experience. However, only exploring brings uncertainty and risk, and very often has negative effects.

By only exploiting (without exploring), learning systems cannot overcome the omnipresent problem of concept drift. They only refine and apply what is already known.

Riskiness Curve

When exploration and exploitation are performed in an unbalanced way, the risk of performance degradation in machine learning models increases as time passes. The relationship between the level of balance in exploration-exploitation and the rate of performance degradation is not linear. Because concept drift behaves unpredictably, the uncertainty level in artificial intelligence-based systems increases for as long as exploration and exploitation remain imbalanced.

**Figure 3**: Relation between the steepness of a curve and the level of the balance in the exploration-exploitation. Image credit: Author

**Figure 5**: The plot shows an optimal case where the balance between exploration and exploitation is 1. The optimal balance assures linearity in the riskiness curve and its stationary behavior. The fluctuations on the curve are bounded, which means that riskiness stays bounded over the period. This implies that the likelihood of unbounded degradation of model performance is low. Image credit: Author

In conclusion, to acquire stability in artificial intelligence-based systems and minimize uncertainty levels, machine learning models should adopt the policy of optimal tradeoff between exploration and exploitation under constantly evolving concept drift.

The tradeoff notion expresses the following idea: if the model only exploits, the quality of fraud detection will eventually degrade as the statistics of the financial data distribution change; if it only explores (updates very often), the classification or regression decision boundary will change constantly and may miss existing patterns that have not yet evolved. A tradeoff is needed to perform optimally.

The Exploration-Exploitation Tradeoff in Fraud Detection: NICE Actimize

Using innovative technology to protect institutions and safeguard both consumers’ and investors’ assets, NICE Actimize identifies financial crime, prevents fraud, and provides regulatory compliance. It provides real-time, cross-channel fraud prevention, anti-money laundering detection, and trading surveillance solutions that address payment fraud, cybercrime, sanctions monitoring, market abuse, customer due diligence, and insider trading. AI-based systems and advanced analytics solutions find abnormal behavior earlier and faster, eliminating financial losses from theft to fraud, regulatory penalties to sanctions. As a result, organizations reduce losses, increase investigator efficiency, and improve regulatory compliance and oversight.

NICE Actimize research and data science teams work to better understand how exploration and exploitation can be conducted, and perform in-depth, large-scale applied research on early, real-time fraud detection within a complex environment.
