Feature Engineering: 8 Critical Capabilities for Fraud and Risk Management

Jeremy Chen
Published in DataVisor
May 27, 2022

This might sound like a truism for data scientists, but it can’t be stressed enough: Not all data sets are created equal and, most importantly, no two use cases are identical.

Data scientists spend a substantial amount of time transforming data and making it useful. The processes of data transformation are also referred to as feature engineering because logical operations and other functions are applied to features in order to transform the data set with the goal of making it as useful as possible for the specific project at hand.

Feature engineering (a.k.a. feature building or signal creation) is truly part art and part science because it is the perfect intersection of technical expertise and domain knowledge. Without a thorough understanding of the problems that the algorithm is trying to solve and all the factors that play a part in them, it would be impossible to know what questions to ask the data.

In order to be successful within fraud teams, data scientists must be empowered with advanced tools for feature engineering. This article focuses on the eight most important capabilities that they require.

1. Real-time feature computation on demand

Modern fraud data scientists must be equipped with tools that allow them to compute features on the spot to avoid an issue referred to as staleness. Staleness occurs when data takes time to refresh and features are calculated on old data sets that no longer reflect reality. Stale features degrade the performance of downstream fraud detection and are a serious concern for organizations.

In cases such as bot attacks, systems must be able to capture and compute real signals within milliseconds and at the event level in order to stay ahead of fast-moving attackers. Vendors that offer “near real-time” capabilities fail to deliver the detection precision required by modern businesses, which risk falling prey to increasingly sophisticated fraudsters.
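To make the staleness point concrete, here is a minimal Python sketch of a velocity feature that is evaluated at request time rather than read from a periodically refreshed table. The class name `VelocityCounter`, the 60-second window, and the sample IP address are illustrative assumptions, not part of any particular product, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Hypothetical helper: counts events per key in a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key: str, timestamp: float) -> None:
        # Assumes timestamps for a given key are recorded in increasing order.
        self._events[key].append(timestamp)

    def count(self, key: str, now: float) -> int:
        """Return the number of events for `key` in (now - window, now]."""
        timestamps = self._events[key]
        # Evict expired timestamps so the value reflects the current moment.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        return len(timestamps)


logins_per_ip = VelocityCounter(window_seconds=60)
for t in (0.0, 10.0, 20.0, 65.0):
    logins_per_ip.record("203.0.113.7", t)

# Computed on demand at t=70: only the logins at t=20 and t=65 still count.
fresh_count = logins_per_ip.count("203.0.113.7", now=70.0)
```

A table refreshed once a minute could still report four logins at this point, which is the kind of gap that on-demand computation is meant to close.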

2. Native integration between features and rules and models

Feature computation is critical, but good features are of no use unless they are properly integrated with the models and business rules that ultimately deliver the detection capabilities.

Systems in which features, rules, and models are designed to work together from the start outperform their peers. They deliver detection without the errors and delays that arise when data mappings and pipelines at the feature serving layer have to be integrated with a separate business rules and model training environment.

3. Fraud and compliance functionalities by design

Marketing, logistics, and other specialists use feature engineering too, but their use cases and goals are not the same as those of fraud leaders. Fraud teams face unique challenges, and a feature engineering platform that is purpose-built to overcome them can make all the difference.

Sure, off-the-shelf products can be adapted into fraud environments, but it generally makes more sense to allow fraud experts to focus on what really matters instead: developing and improving their fraud and risk strategies.

For example, fraud-specific feature packages that are ready to use from day one enable fraud teams to run their detection models and rule logic with speed and confidence, instead of having to build everything from scratch.

When choosing a feature engineering platform, look for a provider with deep domain expertise, so that you can build on years of experience and avoid the missteps and course corrections that product development naturally entails.

4. Multidimensional and native graph features

Real life is complex and if data sets intend to reflect it, they must be too. Modern fraud teams need:

  • Multidimensional graph features that can leverage multiple vectors, including history, geolocation, and relationships between users, to combat sophisticated crime rings.
  • Graph databases that can use graph structures for semantic queries with nodes, edges, and properties to represent and store large quantities of data.
  • Intuitive graph structures that can portray the complex relationship between entities in innovative and actionable ways that make feature engineering more efficient.
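As a rough illustration of a graph feature built from relationships between users, the Python sketch below links users through shared attributes (a device fingerprint or an IP address, for instance) and reports the size of each user's connected cluster. The function name `cluster_sizes` and the sample data are hypothetical, and a production system would more likely run this kind of query in a graph database than in memory.

```python
from collections import Counter


def cluster_sizes(edges):
    """Map each user to the number of users in its connected component.

    `edges` is an iterable of (user_id, shared_attribute) pairs. Users that
    share an attribute, directly or through a chain of other users, end up
    in the same component.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    users = set()
    for user, attribute in edges:
        user_node = ("user", user)
        users.add(user_node)
        root_a, root_b = find(user_node), find(("attr", attribute))
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    sizes = Counter(find(u) for u in users)
    return {u[1]: sizes[find(u)] for u in users}


edges = [
    ("alice", "device-1"),
    ("bob", "device-1"),
    ("bob", "ip-9"),
    ("carol", "ip-9"),
    ("dave", "device-2"),
]
sizes = cluster_sizes(edges)
```

Here alice, bob, and carol form one cluster of three even though alice and carol share nothing directly, while dave stands alone. An unusually large cluster is the sort of signal that could point to a coordinated ring.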

5. Fast backfill and deployment without IT

In most modern organizations, IT teams’ work is spread across different functional areas. Fraud data scientists must be empowered to deploy the features they have prepared and warm them up through backfills without having to depend on IT support. The result: Agile fraud teams.
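For readers unfamiliar with the term, a backfill replays historical events through a newly defined feature so that it already holds meaningful values when it goes live. The following Python sketch shows the idea under simple assumptions: the `DistinctDevices` feature, the `backfill` helper, and the event fields (`ts`, `user`, `device`) are all made up for illustration.

```python
from collections import defaultdict


class DistinctDevices:
    """Hypothetical feature state: the devices seen so far for each user."""

    def __init__(self):
        self._devices = defaultdict(set)

    def update(self, user, device):
        self._devices[user].add(device)

    def value(self, user):
        return len(self._devices[user])


def backfill(feature, historical_events):
    """Replay past events in timestamp order so the feature starts warm."""
    replayed = 0
    for event in sorted(historical_events, key=lambda e: e["ts"]):
        feature.update(event["user"], event["device"])
        replayed += 1
    return replayed


history = [
    {"ts": 3, "user": "u1", "device": "d2"},
    {"ts": 1, "user": "u1", "device": "d1"},
    {"ts": 2, "user": "u2", "device": "d1"},
]
feature = DistinctDevices()
replayed = backfill(feature, history)
```

After the replay, the feature already knows that u1 has used two devices, so the first live event is scored against real history rather than an empty state.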

6. Business-user friendly UI with governance

No matter how hard the developers of modern feature engineering solutions strive to create simple products, the fact of the matter is that their product is a technical one built for a technical audience: fraud data scientists. These specialists often have many years of experience and are comfortable with concepts and terminology that are not necessarily shared by other members of their teams.

In order to fight fraud more effectively, modern enterprises need feature engineering solutions that are delivered in tandem with business-user friendly interfaces, especially in connection with governance and performance monitoring capabilities.

7. Data scientist language support

Choose solutions that let data scientists work in the programming languages they are already proficient in.

With data scientists’ salaries at an all-time high, it makes little sense to hire highly skilled workers and put them through the learning curve of new programming languages before they can start delivering value for their teams.

Furthermore, firms concerned with vendor lock-in should steer clear of feature engineering solutions that use proprietary programming languages, because a proprietary language clearly raises switching costs.

Prefer solutions that support Python, Java and SQL to define features and save yourself and your data scientists a lot of effort.

8. Consistency in computation results across development stages and between batch and real time

Modern data science teams need reliable systems that deliver results their entire teams can trust, and nothing kills trust more than a lack of consistency. It is critical that feature computation be backed by the same technology and quality in development and in production, and in offline (batch) and real-time settings alike.
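One common way to get this consistency is to keep a single feature definition and call it from both the batch and the real-time code paths. The Python sketch below illustrates that pattern; `amount_features`, `batch_features`, and `OnlineFeatures` are illustrative names, and the in-memory state stands in for whatever store a real system would use.

```python
def amount_features(amounts):
    """Single feature definition shared by the batch and real-time paths."""
    total = sum(amounts)
    return {
        "txn_count": len(amounts),
        "txn_total": total,
        "txn_avg": total / len(amounts) if amounts else 0.0,
    }


def batch_features(history):
    """Offline path: compute features for every user from a full history."""
    grouped = {}
    for user, amount in history:
        grouped.setdefault(user, []).append(amount)
    return {user: amount_features(amounts) for user, amounts in grouped.items()}


class OnlineFeatures:
    """Real-time path: update state per event, reuse the same definition."""

    def __init__(self):
        self._amounts = {}

    def update(self, user, amount):
        self._amounts.setdefault(user, []).append(amount)
        return amount_features(self._amounts[user])


history = [("u1", 20.0), ("u2", 5.0), ("u1", 40.0)]
offline = batch_features(history)

online = OnlineFeatures()
latest = {}
for user, amount in history:
    latest[user] = online.update(user, amount)
```

Because both paths call `amount_features`, the values a model was trained on offline match what the real-time path serves once it has seen the same events, and a check such as `latest == offline` can run as a regression test.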

Conclusion:

All in all, feature engineering plays a central role in modern fraud prevention because the data it transforms lays the foundation for detection efforts and for the decisions ultimately taken to mitigate fraud. Investing in the right solution for fraud-specific purposes can really pay off, especially over the long term and at scale.

My team at DataVisor develops cutting-edge technology to deliver on these capabilities and bring customers’ fraud and risk management strategies to the next level with a focus on addressing real business challenges.

Many thanks to Eduardo Guraieb for the original ideas behind this article.
