Meta-Learning Structure For Fraud Detection I

Grace Kolawole · Published in Analytics Vidhya · Aug 11, 2021

In traditional machine learning, we usually take a huge dataset that is specific to a particular task and train a regression or classification model on it. This is radically different from how humans take advantage of their past experiences to learn a new task very quickly from only a handful of examples.

This is a very involved process, and it contrasts with how humans take in new information and learn new skills. Human beings do not need a large pool of examples to learn; we learn quickly and efficiently from a handful of them. Taking inspiration from how human beings learn, meta-learning attempts to automate traditional machine learning challenges: it seeks to apply machine learning to learn the most suitable parameters and algorithms for a given task.

A meta-learner gains experience across a distribution of related tasks and uses this experience to improve its future learning performance. This 'learning to learn' can lead to a variety of benefits, such as data and compute efficiency, and it is better aligned with human and animal learning, where learning strategies improve on both lifetime and evolutionary timescales.

The performance of a learning model depends on its training dataset, the algorithm, and the algorithm's parameters. Many experiments are required to find the best-performing algorithm and parameter settings. Meta-learning approaches help find these and optimize the number of experiments; in short, this results in better predictions.

What is Meta?

Meta refers to a level above.

Meta typically means raising the level of abstraction one step and often refers to information about something else.

For example, you are probably familiar with “meta-data,” which is data about data.

You store data in a file, and a common example of metadata is data about the data stored in that file, such as the following (the short code sketch after this list shows how to read these values):

  • The name of the file.
  • The size of the file.
  • The date the file was created.
  • The date the file was last modified.
  • The type of file.
  • The path to the file.
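
As a quick illustration, here is how you might read exactly that kind of metadata in Python with the standard library; the file name below is just a placeholder for any file that exists on disk.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical file name; point this at any existing file.
path = Path("example.csv")
stat = path.stat()  # raw metadata reported by the operating system

print("Name:    ", path.name)
print("Size:    ", stat.st_size, "bytes")
# st_ctime: creation time on Windows, metadata-change time on Unix.
print("Created: ", datetime.fromtimestamp(stat.st_ctime))
print("Modified:", datetime.fromtimestamp(stat.st_mtime))
print("Type:    ", path.suffix or "unknown")  # inferred from the extension
print("Path:    ", path.resolve())
```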

Now that we are familiar with the idea of “meta,” let’s consider the use of the term in machine learning, such as “meta-learning.”

What is Meta-Learning?

Formally, meta-learning can be defined as using the metadata of an algorithm or a model to understand how automatic learning can become flexible in solving learning problems, thereby improving the performance of existing learning algorithms or learning (inducing) the learning algorithm itself.

Meta-learning is an advanced field of artificial intelligence where automatic learning algorithms are applied to acquire learning experience for a set of learning algorithms to improve learning performance. One of the popular meta-learning methodologies is based on cross-validation, especially for selection processes among different machine learning models. However, the challenge is that it is very time-consuming to do cross-validation among models in large data sets, especially in financial big data with high noise.
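
As a minimal sketch of that cross-validation-based selection, here is one way it might look with scikit-learn; the candidate models and the synthetic dataset are illustrative assumptions, not a prescribed setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real (e.g. transactional) dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation and keep the best;
# on large, noisy financial datasets this loop is what becomes expensive.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```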

Meta-learning impacts the hypothesis space (the set of all hypotheses that may be returned by a machine learning model) for learning algorithms. This might be through the tuning of hyperparameters or the selection of features. It may also change an algorithm's learning rules by altering how the algorithm searches the hypothesis space.

Meta-learning provides an alternative paradigm in which a machine learning model gains experience over multiple learning episodes. Very simply defined, meta-learning means learning to learn. It is a learning process that applies learning algorithms to metadata, where metadata is data that describes other data.

It takes advantage of metadata such as algorithm properties (performance measures and accuracy) or patterns previously derived from the data to learn, select, alter, or combine different learning algorithms to effectively solve a given learning problem.

Each learning algorithm is based on a set of assumptions about the data, which is called its inductive bias.

The process of learning to learn, or the meta-training process, can be crudely summed up in the following three main steps:

  • A dynamic inductive bias: Inductive bias is the set of assumptions a learning algorithm uses to make predictions on inputs it has never come across. Dynamic bias induction means that the bias is constructed as a function of the learning task; in other words, the inductive bias of a learner is altered to match the given task. Essential aspects of the learner, such as the representation of the hypothesis or its parameters, can be changed to achieve this.
  • Extracting useful knowledge and experience from the metadata of the model: Metadata consists of knowledge about previous learning episodes and is used to efficiently develop an effective hypothesis for a new task. This is also a form of Inductive transfer.
    AI systems can master some really complex tasks, but they require massive amounts of data and are terrible at multi-tasking. So it is important for AI agents to "learn how to learn", gathering knowledge more efficiently and becoming more adept.
  • Inclusion of a learning sub-model.

Approaches to Meta-Learning Algorithms

Optimized meta-learning

A hyperparameter is a parameter whose value is used to control the learning process. It is a parameter that is defined before the start of the learning process. Hyperparameters have a direct impact on the quality of the training process. Hyperparameters can be tuned. An example of a hyperparameter is the number of branches in a decision tree.

A good number of machine learning models have many optimizable hyperparameters. As mentioned, hyperparameters have a great impact on the training process, which means that the process of choosing them dramatically affects how well an algorithm learns.

However, with the ever-increasing complexity of models, especially neural networks, a challenge arises: such models are increasingly difficult to configure. Human engineers can optimize a few configuration parameters through experimentation, but deep neural networks have hundreds of hyperparameters, and such systems have become too complicated for humans to optimize fully.

There exist many ways to optimize hyperparameters. We shall give a simple definition of a couple of methods and cover them in detail in a future article.

Grid Search: This method uses a manually predetermined set of hyperparameter values. Grid search tries all possible combinations of values in that set, and the best-performing combination is kept. However, this traditional method is very time-consuming and inefficient, because the number of combinations grows quickly.
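
For instance, here is a hedged sketch of grid search using scikit-learn's GridSearchCV; the grid values and the synthetic dataset are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Manually predetermined grid: every combination is tried,
# i.e. 3 x 3 = 9 candidate settings per cross-validation fold.
param_grid = {
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```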

Random Search: Grid search is an exhaustive method that tries all possible combinations of values. The random search method replaces this exhaustive process with random sampling: the model tries random combinations and fits them to the dataset to test for accuracy. Since the search is random, the model may miss a few potentially optimal combinations; on the upside, it uses much less time than grid search and often finds good solutions. Random search can even outperform grid search when only a few of the hyperparameters actually matter for the algorithm.
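
And the random-search counterpart with scikit-learn's RandomizedSearchCV, again with illustrative distributions rather than a prescribed setup.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Distributions to sample from, rather than a fixed grid of values.
param_dist = {
    "max_depth": randint(2, 20),
    "min_samples_split": randint(2, 20),
}

# Only n_iter random combinations are tried: some optima may be missed,
# but the search is far cheaper than an exhaustive grid.
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                            param_dist, n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```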

We shall cover these two and other methods of optimization in a different article. But for now, to learn more about grid search and random search, check out this conceptual guide to hyperparameter tuning.

Few-shot Meta-learning

Deep learning algorithms are great at carrying out one task by using a sizeable dataset. Even so, it is desirable to be able to train a neural network to learn multiple tasks using a handful of data examples per task. Few-shot meta-learning algorithms help us fulfill this desire.

The purpose of few-shot meta-learning is to train a model that can rapidly adapt to a new task. This is to be achieved using a handful of data points and iterations in training. A meta-learning stage is used to train a model on a given number of tasks. The expectation is that the trained model will quickly adapt to new tasks with a few trials or training examples. Entire tasks are taken as training examples in meta-learning.
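
To make "entire tasks as training examples" concrete, here is a small sketch of how N-way, K-shot episodes are commonly sampled from a labelled dataset; the function name, dataset format, and sizes are illustrative assumptions.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Sample one few-shot task: a support set (k_shot examples per class)
    and a query set (n_query examples per class) over n_way random classes."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for new_label, cls in enumerate(classes):
        picks = random.sample(by_class[cls], k_shot + n_query)
        support += [(x, new_label) for x in picks[:k_shot]]
        query += [(x, new_label) for x in picks[k_shot:]]
    return support, query

# At the meta level, each call produces one "training example": a whole task.
# support, query = sample_episode(my_dataset)  # my_dataset: list of (x, label)
```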

An example of few-shot meta-learning is the use of memory-augmented neural networks.

Gradient descent minimizes a given function by iteratively moving in the direction of steepest descent, and it is used to update the parameters of a model. Networks trained with traditional gradient descent need tons of data to learn, and the training process is extensive and iterative. When exposed to new data, the models have to relearn their parameters to incorporate the new information, which is a very inefficient process.
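
As a tiny worked example of the update rule itself, here is plain gradient descent minimizing f(w) = (w - 3)^2, with an illustrative starting point and learning rate.

```python
def grad(w):
    # Derivative of f(w) = (w - 3) ** 2
    return 2 * (w - 3)

w, lr = 0.0, 0.1  # initial parameter and learning rate
for step in range(100):
    w -= lr * grad(w)  # step against the gradient: steepest descent

print(w)  # approaches the minimizer w = 3
```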

Compared to conventional models, neural networks with augmented memory capacities can speedily encode and retrieve new information. Memory-augmented neural networks can make sense of new data and leverage it to produce highly accurate predictions using only a few training examples. An example of an architecture with augmented memory is the Neural Turing Machine (NTM), an algorithm with the ability to store information in and retrieve information from memory: the NTM augments a neural network with an external memory bank. The original NTM paper provides a detailed description of the architecture.

Model agnostic meta-learning

Model agnostic meta-learning (MAML) refers to a framework that applies to any model that is trained using gradient descent. We can argue that this is similar to or a variation of few-shot meta-learning. Like few-shot meta-learning, the goal is to learn a general model that can simply undergo fine-tuning for several different tasks. This includes a scenario where training data is insufficient. Let’s visualize the MAML framework.

[Figure: the MAML approach]

In the figure, the symbol theta represents the parameters of the model, and the thick black line represents the meta-learning stage. If we have tasks 1, 2, and 3 that differ from each other, a gradient step is taken towards each of the three; the grey lines represent this.

MAML gives a good initialization of a model's parameters, so that quick and optimal learning on a new task can be attained with only a handful of gradient steps. More on model-agnostic meta-learning can be found in this paper, which also offers an in-depth explanation of the image above.
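
To ground this, below is a schematic first-order sketch of the MAML loop on toy linear-regression tasks. Everything here (the task family, step sizes, and loop counts) is an illustrative assumption, and full MAML additionally differentiates through the inner update, a second-order term this sketch deliberately omits.

```python
import random

# Toy task family: regression targets y = a * x, with a different a per task;
# the model is y_hat = w * x, so theta is the single scalar w.
def make_task():
    a = random.uniform(-2.0, 2.0)
    return [(x / 10.0, a * x / 10.0) for x in range(-10, 11)]

def grad(w, batch):
    # Gradient of mean squared error 0.5 * (w*x - y)**2 with respect to w.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

w = 0.0                       # meta-parameters (theta in the figure)
inner_lr, outer_lr = 0.1, 0.01

for meta_step in range(1000):
    tasks = [make_task() for _ in range(3)]       # e.g. tasks 1, 2, 3
    meta_grad = 0.0
    for data in tasks:
        support, query = data[:10], data[10:]
        w_task = w - inner_lr * grad(w, support)  # inner (grey) gradient step
        # First-order approximation: ignore d(w_task)/dw when backpropagating.
        meta_grad += grad(w_task, query)
    w -= outer_lr * meta_grad / len(tasks)        # outer (meta) update of theta
```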

MetaFraud: Meta-Learning for Fraud Detection

Fraud is a significant business risk that must be mitigated. A well-designed and implemented fraud detection system, based on the transactional data model of operational systems, can significantly reduce the chance of fraud occurring within an organization. The sooner that indicators of fraud are available, the greater the chance that losses can be recovered and control weaknesses can be addressed. The timely detection of fraud directly impacts the bottom line, reducing losses for an organization. And effective detection techniques serve as a deterrent to potential fraudsters.

Financial fraud can have serious ramifications for the long-term sustainability of an organization, as well as adverse effects on its employees and investors, and on the economy as a whole. Several of the largest bankruptcies in U.S. history involved firms that engaged in major fraud. Accordingly, there has been considerable emphasis on the development of automated approaches for detecting financial fraud. However, most methods have yielded performance results that are less than ideal. In consequence, financial fraud detection continues as an important challenge for business intelligence technologies.

Given increased regulatory requirements and compliance demands, the question is no longer if an organization should implement a complete fraud detection and prevention program, but rather how quickly technology can be leveraged to detect financial fraud. The use of technology is essential for maximizing the efficiency and effectiveness of fraud detection, and meta-learning is emerging as a game-changer.

In another article, we’ll be exploring the framework for detecting financial fraud with learning-to-learn.

Stay tuned!

Connect with me on LinkedIn: https://www.linkedin.com/in/grace-kolawole/ and Twitter: https://twitter.com/Graceblarc_
