Building Trust in a Digital World: The Role of Machine Learning in Behavioral Biometrics

Javier Liébana
Published in Feedzai Techblog
12 min read · Jun 21, 2024

In the world of financial services, the bank or financial institution’s relationship with the customer relies on digital trust, which is anchored in two fundamental principles. First, it must ensure the person engaging through digital banking channels is genuinely the individual they claim to be. Second, it must confirm that this person is authorized to complete the intended financial transaction.

Addressing these crucial requirements is the core mission of Feedzai’s Digital Trust solution. The solution collects and analyzes comprehensive user behavioral data, scrutinizes device information for potential threats, such as malware attacks, and evaluates contextual factors like network, operating system, or browser information to gain a complete understanding of the user’s environment. However, the high volume and heterogeneous nature of the collected data, among other challenges, make detecting potentially fraudulent sessions with high accuracy a formidable endeavor.

In this blog post, we explore how a new machine learning (ML) model performs a continuous evaluation of collected data and leverages insights from previous frauds to vastly improve Digital Trust’s fraud prevention capabilities. We start with an introduction to the technical details behind our Digital Trust solution and the challenges of fraud detection and prevention. We then explain how we apply ML to boost fraud detection and how we deployed the new Fraud Model to dozens of Feedzai customers.

Table of Contents

1. Digital Trust data collection
- 1.1 The user journey
- 1.2 Behavioral biometrics data
- 1.3 User’s behavior
- 1.4 Device and network data
2. Challenges to detect fraud in Digital Trust
3. Machine Learning for Digital Trust
- 3.1 The holistic approach
- 3.2 New Fraud Model
4. Deploying the model
5. In summary

Digital Trust data collection

To better identify the challenges that are typically faced when designing a fraud prevention system based on Digital Trust, first we need to understand the user’s journey and the data that the system is able to collect.

The user journey

The user’s journey commences the moment they register on the bank’s website or application. During this registration phase, the user enters their data, triggering the initiation of Digital Trust’s protective measures, even if the user isn’t fully registered yet.

At this instant, the Digital Trust system can already:

  • Detect whether the device used is associated with previous frauds;
  • Identify if the network originates from a location with elevated fraud rates;
  • Notice any deviations in the user’s behavior, such as pasting their name instead of typing it.

Based on these illustrative examples, we can promptly inform the financial institution and issue an alert regarding the creation of a new account if malicious intent is suspected.

Likewise, when a user accesses either the website or the mobile application, they undergo a login procedure. During this phase, we can analyze the user’s typing patterns and how they compare to previous sessions. We can also determine if the device used by the current user has been shared by previous users. As the user progresses through various activities, such as reviewing past transfers and initiating new transactions, Digital Trust continuously monitors their actions, proactively identifying and reporting any suspicious behavior.

To perform the analyses and detections, the Digital Trust system collects different types of data.

Behavioral biometrics data

Digital Trust handles different types of biometric data depending on the device. For desktop-like devices, the system collects keystroke events, monitoring the duration of key presses, the type of key (e.g., a number, letter, or special character), and the typing speed. The system also tracks mouse-related events, encompassing mouse movements and mouse clicks. These biometric events allow the Digital Trust system to verify user identity by comparing current patterns with previous ones, detect the use of Remote Access Trojans (RATs), and identify bot usage or other behaviors that are not humanly possible.
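To make the keystroke signals above more concrete, here is a minimal sketch of how key-press duration and typing speed could be summarized for one typing burst. The `KeyEvent` structure, the field names, and the choice of features are illustrative assumptions, not the actual Digital Trust schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class KeyEvent:
    # Hypothetical event shape: only the key class is kept, never the raw key.
    key_class: str   # "letter", "digit", or "special"
    down_ms: float   # timestamp of the key press, in milliseconds
    up_ms: float     # timestamp of the key release, in milliseconds


def keystroke_features(events: list[KeyEvent]) -> dict[str, float]:
    """Summarize a typing burst into dwell, flight, and speed features."""
    if len(events) < 2:
        raise ValueError("need at least two key events")
    ordered = sorted(events, key=lambda e: e.down_ms)
    # Dwell time: how long each key stays pressed.
    dwell = [e.up_ms - e.down_ms for e in ordered]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [b.down_ms - a.up_ms for a, b in zip(ordered, ordered[1:])]
    duration_ms = ordered[-1].up_ms - ordered[0].down_ms
    return {
        "mean_dwell_ms": mean(dwell),
        "mean_flight_ms": mean(flight),
        "keys_per_second": len(ordered) * 1000.0 / duration_ms,
    }


events = [
    KeyEvent("letter", 0.0, 100.0),
    KeyEvent("letter", 150.0, 250.0),
    KeyEvent("digit", 300.0, 400.0),
]
features = keystroke_features(events)
```

Comparing such summaries between the current session and a user's previous sessions is one simple way to express "typing patterns" as numbers a model can consume.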

In the case of mobile-like devices, the interaction differs, and consequently, so does the biometric data collected. On these devices, primary interaction occurs on the device’s screen. The Digital Trust system captures touch events, recording position and pressure strength, as well as typing on the virtual keyboard and screen gestures like strokes or pinch movements. Additionally, mobile devices usually have other sensors that report, for example, screen orientation or gyroscope movements, providing more usage information. Once more, this is valuable data for confirming user identity and verifying behavior.

User’s behavior

More information is derived from user operations on the bank’s digital platform. Specific sets of views within the platform may be linked to critical actions, such as initiating a transaction or altering the contact information needed for two-factor authentication.

Other actions, such as adding a new frequent beneficiary or adjusting particular settings, may signify an atypical user journey. These signals must not only be reported to the bank or financial institution’s analysts but also incorporated as features within the Digital Trust detection systems. Digital Trust uses this information to proactively mitigate and prevent malicious attacks.

Device and network data

Another source of information is data derived from the user’s device and connection. When the user accesses the bank’s website, Digital Trust gathers information about the browser and its configuration. When the user accesses the bank or financial institution’s application, the Digital Trust integration extends this data collection with software details like the operating system version and default language, as well as physical data such as battery availability or the current telephone line status (e.g., whether the user is on a call at the same time).

Regarding network information, we can deduce specific details from the user’s connection, such as the current Internet Service Provider (ISP), network type, or the user’s approximate location. All this information empowers our detection mechanisms both to identify unusual behaviors or configurations and to establish typical fraudulent patterns.

In conclusion, Digital Trust seamlessly integrates with financial institutions’ digital platforms, continuously monitoring user behavior and digital environments upon login to detect anomalies and suspicious activities. It gathers diverse data, including biometric data from keystrokes and mouse movements, and leverages this information to verify user identity and identify unusual behavior. User operations on the platform, such as critical actions or atypical user journeys, provide additional signals for the system to proactively prevent malicious attacks. Data derived from the user’s device and connection further aids in recognizing unusual behavior or configurations and establishing typical fraudulent patterns.

Challenges to detect fraud in Digital Trust

Developing a fraud detection system for Digital Trust involves several challenges, related both to the data and to other aspects of the problem.

First and foremost, minimizing potential user friction is paramount. On the device side, information collection should not impact the user experience. Simultaneously, device resources such as CPU time, bandwidth, or battery usage should be minimally affected. On the detection side, non-fraudulent users should operate without impediment, so false positives must be kept to a minimum while as many fraudulent cases as possible are detected.

Another significant challenge is the volume and diversity of information. As explored before, data sources encompass static user information (e.g., the current device), dynamic information (e.g., network connectivity or available device battery), biometric data (e.g., keystrokes, mouse movements, or mobile gestures), and behavioral data (e.g., time spent on each view and the sequence of user actions).

This heterogeneous nature of the data, coupled with varying data scales (for example, a user may have a single device but generate thousands of mouse events), creates a complex scenario. This is where data feature engineering assumes a pivotal role. The primary objectives are twofold: maximize detection capabilities while minimizing processing, networking, and user interference costs.
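As an illustration of the feature engineering described above, the following sketch reduces a variable-length stream of mouse events to a small, fixed-size feature set, so that thousands of events cost the model no more than a single device attribute. The event layout `(t_ms, x, y)` and the chosen features are assumptions made for this example only.

```python
import math
from statistics import mean, pstdev


def mouse_features(points: list[tuple[float, float, float]]) -> dict[str, float]:
    """Aggregate (t_ms, x, y) mouse samples into a fixed-size summary."""
    if len(points) < 2:
        raise ValueError("need at least two mouse samples")
    path = 0.0
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        path += dist
        if t1 > t0:  # skip samples with identical timestamps
            speeds.append(dist / (t1 - t0))
    # Straight-line distance between the first and last sample.
    straight = math.hypot(points[-1][1] - points[0][1],
                          points[-1][2] - points[0][2])
    return {
        "event_count": float(len(points)),
        "path_length_px": path,
        "mean_speed_px_per_ms": mean(speeds) if speeds else 0.0,
        "speed_std_px_per_ms": pstdev(speeds) if speeds else 0.0,
        # 1.0 means a perfectly straight movement, which humans rarely produce.
        "straightness": straight / path if path > 0 else 0.0,
    }


trace = [(0.0, 0.0, 0.0), (10.0, 3.0, 4.0), (20.0, 6.0, 8.0)]
summary = mouse_features(trace)
```

Aggregates like these can also be computed close to where the events are collected, which reduces the amount of raw data that has to be transmitted and processed.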

Another challenge is the requirement of continuous evaluation. When dealing with transactional fraud, the precise moment for making a prediction is clear: it must occur when the user initiates the payment, at which point the available information is leveraged to produce a prediction. However, Digital Trust is designed to protect both the user and the financial institution throughout the entirety of the user’s session. This necessitates the evaluation of the session at each stage of the user’s journey, striking a balance between alerting as early as possible to preempt potential malicious actions and the risk of alerting with incomplete session information.
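The trade-off between alerting early and alerting on incomplete information can be sketched as a small session monitor that rescores the session after every new piece of evidence but only alerts once enough evidence has accumulated. The `SessionMonitor` class, the `min_events` guard, the threshold, and the toy scoring function are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionMonitor:
    score_fn: Callable[[dict[str, float]], float]
    threshold: float = 0.75
    min_events: int = 3  # do not alert before this much evidence is seen
    state: dict[str, float] = field(default_factory=dict)
    events_seen: int = 0
    alerted: bool = False

    def observe(self, update: dict[str, float]) -> bool:
        """Merge new evidence, rescore, and return True the first time an alert fires."""
        self.state.update(update)
        self.events_seen += 1
        if self.alerted or self.events_seen < self.min_events:
            return False
        if self.score_fn(self.state) >= self.threshold:
            self.alerted = True
            return True
        return False


def toy_score(state: dict[str, float]) -> float:
    # Made-up weights standing in for a real model.
    return min(1.0, 0.5 * state.get("new_device", 0.0)
               + 0.25 * state.get("pasted_credentials", 0.0)
               + 0.25 * state.get("new_beneficiary", 0.0))


monitor = SessionMonitor(score_fn=toy_score)
alerts = [
    monitor.observe({"new_device": 1.0}),
    monitor.observe({"pasted_credentials": 1.0}),
    monitor.observe({"new_beneficiary": 1.0}),
    monitor.observe({"page_views": 4.0}),
]
```

In this run the score already reaches the threshold after the second event, but the alert is held back until the third. Lowering `min_events` alerts earlier at the cost of deciding on less information.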

Finally, there is the issue of class imbalance. Our detection system must deal with millions of instances in the legitimate class, as opposed to just hundreds or a few thousand attacks. This places substantial pressure on the detection models, both during training and during inference. To address this issue, our Data Scientists apply different advanced mitigation techniques.
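The post does not say which mitigation techniques are used. As one common example, and purely as an assumption, the sketch below undersamples the legitimate class for training and then corrects the model's output probability for the sampling rate, so that scores still reflect the real-world fraud rate. The function names and data layout are hypothetical.

```python
import random


def undersample(rows: list[dict], keep_rate: float, seed: int = 0) -> list[dict]:
    """Keep every fraud row (label 1) and a random fraction of legitimate rows."""
    rng = random.Random(seed)
    return [r for r in rows if r["label"] == 1 or rng.random() < keep_rate]


def correct_probability(p_sampled: float, keep_rate: float) -> float:
    """Map a probability learned on undersampled data back to the original scale.

    Dropping legitimate rows inflates the odds of fraud by 1 / keep_rate,
    so the true odds are keep_rate times the odds seen by the model.
    """
    return keep_rate * p_sampled / (keep_rate * p_sampled - p_sampled + 1.0)


rows = [{"label": 1}] * 10 + [{"label": 0}] * 10_000
sample = undersample(rows, keep_rate=0.01)
fraud_kept = sum(r["label"] for r in sample)

# A score of 0.5 on the balanced sample is a much smaller real-world probability.
corrected = correct_probability(0.5, keep_rate=0.01)
```

With a keep rate of 1% the training set shrinks by roughly two orders of magnitude while all fraud cases are retained, which helps with both training cost and the imbalance itself.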

Machine Learning for Digital Trust

The holistic approach

As we’ve discussed in previous sections, the identification of fraudulent and high-risk activities poses a formidable challenge. The volume of data, the pronounced class imbalance, and the varying time resolutions and data modalities all call for advanced techniques.

It’s clear that there is no one-size-fits-all solution to these complex issues. The strength of the Digital Trust solution lies in its holistic combination of diverse feature sets, combined to enhance detection efficacy. As an example, when detecting Remote Access Trojans, the system combines behavioral biometric data alongside the ability to identify the installation of malicious software. The combination of these features ensures a comprehensive security framework that adapts to the specific detection needs.

Traditionally, solutions in this space have relied on very basic rules for each of the detection signals. The problem with this approach, even if it initially offers simple explainability, is its limited capacity. The solution needs to be deployed in different geographies with their own particularities, it has to cover multiple use cases, and it has to keep up with the evolution of fraud. All of these issues create the need for more complex rules. At that point, any benefit of the rule system starts to crumble: rules become too hard to maintain, too complicated to understand, and even their computational performance may be affected.

In our case, Digital Trust leverages a wide spectrum of techniques: from expert defined rules, basic yet effective heuristics, to state-of-the-art ML models. The rules encapsulate the expert knowledge of fraud detection; heuristics allow simple and agile solutions for simple cases; and ML models are the key for the detection of more complex and data-intensive scenarios. This multifaceted approach enables us to address the complexity of the threat landscape.

Once the individual detections are complete, their results have to be reflected in the risk associated with the session. To facilitate this, the Digital Trust system offers the capability to formulate a customized risk strategy that aligns with the specific needs and preferences of each institution. Banks and financial institutions can establish their own rule set, building upon the default rule set provided by Digital Trust experts. This dynamic adaptability ensures that the solution remains agile in addressing evolving threats and regulatory requirements while maintaining a strong foundation of security.
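A customized risk strategy built on top of a default rule set could look like the following sketch. The `Rule` structure, the indicator names, the thresholds, and the three risk levels are hypothetical and do not describe the real Digital Trust rule engine.

```python
from dataclasses import dataclass
from typing import Callable

Indicators = dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Indicators], bool]
    risk: str  # "low", "medium", or "high"


RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

# Default rules, standing in for the set provided by fraud experts.
DEFAULT_RULES = [
    Rule("rat_detected", lambda s: s.get("rat_indicator", 0.0) >= 0.5, "high"),
    Rule("device_seen_in_fraud",
         lambda s: s.get("device_fraud_history", 0.0) >= 1.0, "medium"),
]


def session_risk(indicators: Indicators, rules: list[Rule]) -> tuple[str, list[str]]:
    """Return the highest risk level among fired rules and the names of those rules."""
    fired = [r for r in rules if r.condition(indicators)]
    level = max((r.risk for r in fired), key=RISK_ORDER.__getitem__, default="low")
    return level, [r.name for r in fired]


# An institution extends the defaults with a rule for its own scenario.
bank_rules = DEFAULT_RULES + [
    Rule("contact_change_on_new_device",
         lambda s: s.get("new_device", 0.0) >= 1.0
         and s.get("contact_info_changed", 0.0) >= 1.0,
         "high"),
]

level, fired = session_risk(
    {"new_device": 1.0, "contact_info_changed": 1.0, "device_fraud_history": 1.0},
    bank_rules,
)
```

Returning the names of the fired rules alongside the level keeps the result explainable to analysts, which is the main appeal of rules in the first place.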

New Fraud Model

In advancing our custom risk strategy, we at Feedzai have developed and integrated a new Fraud ML Model for Digital Trust. This latest model utilizes the comprehensive and heterogeneous array of signals and data from Digital Trust, combined with several advanced techniques developed over the years at Feedzai, to detect and alert on potentially fraudulent sessions.

The model is able to use combinations of the multiple signals available, which can be more informative than looking at each signal separately. For example, we can process the IP information to detect the origin of the network requests and thus the approximate location of the user. This information can be combined with other risk indicators, such as an Account Takeover (ATO) indicator. In this case, a connection from an unusual country together with an ATO indicator is potentially riskier than either signal on its own.
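The country and ATO example can be sketched with an explicit interaction feature. The feature names, weights, and logistic scoring below are invented for illustration and are not the Fraud Model itself; many model families learn such interactions without an explicit cross term.

```python
import math


def build_features(ip_country: str, usual_countries: set[str],
                   ato_indicator: bool) -> dict[str, float]:
    unusual = 0.0 if ip_country in usual_countries else 1.0
    ato = 1.0 if ato_indicator else 0.0
    return {
        "unusual_country": unusual,
        "ato_indicator": ato,
        # Interaction term: only active when both signals fire together.
        "unusual_country_and_ato": unusual * ato,
    }


# Made-up weights for a toy logistic score.
WEIGHTS = {"unusual_country": 1.0, "ato_indicator": 1.5,
           "unusual_country_and_ato": 2.0}
BIAS = -4.0


def toy_risk(features: dict[str, float]) -> float:
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


usual = {"PT", "ES"}
risk_country_only = toy_risk(build_features("BR", usual, ato_indicator=False))
risk_ato_only = toy_risk(build_features("PT", usual, ato_indicator=True))
risk_both = toy_risk(build_features("BR", usual, ato_indicator=True))
```

With these toy weights, the combined case scores well above either signal alone, and the interaction term adds risk beyond what the two individual weights contribute.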

Besides the different types of signals leveraged by the model, we use advanced sampling techniques to deal with the high volume and extreme class imbalance, as well as lightweight and efficient machine learning algorithms to ensure both the high detection performance and the low latencies required to make multiple predictions for tens of millions of Digital Trust sessions per day. Our model continuously evaluates the current users’ sessions and, once it detects a potential fraudulent session, it promptly reports this information.

The reporting of this detection is done through a risk indicator called “Potential Fraud Risk”. This indicator may be combined with other custom detections specified by the financial institutions to not only improve the detection efficiency but also to tailor it to their unique business scenarios. The new Fraud Model fuels the Potential Fraud Risk indicator that integrates seamlessly with other existing detection systems.

This new model is another example of Feedzai’s all-encompassing and holistic ML methodology that effectively utilizes heterogeneous Digital Trust data and complements other existing risk indicators. Since this model was deployed, we have seen a significant uplift in fraud detection performance across a diverse set of financial institutions.

Deploying the model

Creating and validating the new Fraud Model is only the first part of Feedzai’s Research work. Deploying it and ensuring that it satisfies the strict latency requirements is, in itself, an often overlooked task. This section describes the major challenges of serving the new Fraud Model and our solutions.

As previously mentioned, the Fraud Model must continuously evaluate every session for each user of our Digital Trust customers. This results in an enormous volume of evaluations. Unlike typical transactional fraud detection models, which assess transactions individually, Feedzai Digital Trust continuously evaluates the potential risk of every session, significantly increasing the number of requests to our models. The new Fraud Model reviews nearly ten thousand requests per second.

Furthermore, the inference process must be swift: if a session appears suspicious, the fraud indicator must be communicated to the bank immediately. This requirement places significant demand on our system to minimize latency and overhead.

To meet these demands for high throughput and low latency, we developed a completely new ML Infrastructure that embraces cloud technology and is capable of managing this load with ease. Our new system is designed to scale horizontally and to serve the model and adjust resource consumption as needed. This new infrastructure enabled the deployment of the new Fraud Model, ensuring the high throughput and low latency required for continuous fraud risk assessment.

Our new ML Infrastructure leverages Kubernetes to orchestrate our services more effectively, ensuring that the Fraud Model is always available, even under varying loads. Kubernetes facilitates seamless scaling, allowing us to dynamically adjust the number of instances based on real-time demand, thus optimizing resource utilization. Moreover, its self-healing features automatically restart failed containers, replace them, and reschedule them to new hosts if needed, ensuring high availability and reliability. This integration not only streamlines deployment processes but also significantly enhances the performance and resilience of our fraud detection services, enabling us to maintain low latency and high throughput, critical for real-time fraud risk assessment.
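The post does not specify which scaling mechanism is used. Assuming the standard Kubernetes Horizontal Pod Autoscaler, the replica count follows a simple proportional rule, sketched here in Python. The requests-per-second metric, the target value, and the replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Proportional scaling rule used by the Horizontal Pod Autoscaler.

    desired = ceil(current_replicas * current_metric / target_metric),
    with no change when the ratio is within the tolerance band around 1.0.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical target: 1000 requests per second per model-serving replica.
scale_up = desired_replicas(10, current_metric=1500.0, target_metric=1000.0)
steady = desired_replicas(10, current_metric=1050.0, target_metric=1000.0)
scale_down = desired_replicas(10, current_metric=500.0, target_metric=1000.0)
```

The tolerance band avoids constant rescaling when load hovers around the target, and the replica bounds cap both cost and the minimum capacity kept available for sudden spikes.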

Alongside Kubernetes, our new ML Infrastructure enables on-demand, independent deployment of models. This modular strategy greatly enhances our system’s adaptability and speeds up the development cycle. When a new model iteration is ready, it can be deployed without affecting the operation of existing models, minimizing potential disruptions.

From the instrumentation point of view, for real-time monitoring and logging, ML Infrastructure integrates with standard open-source tools such as Prometheus for metrics collection and Grafana for data visualization, alongside centralized logging solutions. These tools provide us with granular insights into system performance and behavior, enabling proactive issue resolution and optimization of the experience.

In addition to these monitoring and visualization tools, our infrastructure includes a comprehensive Event Collection system designed to capture and store all interactions within our platform. This system logs every request and response, encompassing both the incoming features and the resulting inferences. The stored data becomes a valuable asset for ongoing model monitoring, offering a view of model performance in real time.
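A minimal sketch of such an event log, assuming one JSON record per inference, is shown below. The record fields, the model version string, and the in-memory sink are illustrative choices, not the actual Event Collection format.

```python
import io
import json
import time
import uuid
from typing import TextIO


def log_inference(sink: TextIO, session_id: str, features: dict[str, float],
                  score: float, model_version: str) -> None:
    """Append one request/response pair as a JSON line."""
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "session_id": session_id,
        "model_version": model_version,
        "features": features,  # the incoming request
        "score": score,        # the resulting inference
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")


def read_events(sink_text: str) -> list[dict]:
    """Parse the log back into records for monitoring or retraining."""
    return [json.loads(line) for line in sink_text.splitlines() if line]


sink = io.StringIO()  # stands in for a durable store
log_inference(sink, "session-1", {"unusual_country": 1.0}, 0.75, "fraud-model-v1")
log_inference(sink, "session-2", {"unusual_country": 0.0}, 0.25, "fraud-model-v1")

events = read_events(sink.getvalue())
mean_score = sum(e["score"] for e in events) / len(events)
```

Because each record stores the features together with the score, the same log can feed score-distribution monitoring now and, once fraud labels arrive, serve as training data later.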

Furthermore, this repository of interaction data is key for future model retraining efforts. By analyzing historical interactions, we can identify patterns, anomalies, and areas for improvement, ensuring that our models evolve in line with changing patterns and emerging threats. This capability not only enhances model accuracy over time but also contributes to a continuous cycle of improvement and adaptation, keeping our systems at the forefront of fraud detection technology.

Together, these technologies form a robust infrastructure that supports our Fraud Model’s high-performance requirements, ensuring that our Digital Trust solution is both efficient and reliable.

In summary

In this blog post, we have explored the intricate landscape of Digital Trust and its role in ensuring secure online financial transactions and how Feedzai’s new Fraud Model further enhances its fraud detection capabilities.

Our solution stands out for its holistic and adaptable approach. It combines diverse data sources, including biometric and behavioral data, device and connection information, and user interactions, with advanced machine learning techniques. This multifaceted strategy not only supports robust fraud detection but also enhances the overall security of the digital banking experience. Moreover, Digital Trust’s capability to create customized risk strategies allows financial institutions to tailor their security measures, ensuring a dynamic and responsive defense against evolving threats.
