Why AI for AML has moved from speculation to production

In 2016, there was much more skepticism about using machine learning in anti-money laundering. Now, in 2019, we constantly receive RFPs and RFIs for KYC/AML risk screening solutions with embedded machine learning. Regulators already review machine learning systems in production. The Joint Statement released last year clearly stated that AI and digital identity are the future of the compliance industry, and encouraged innovation by de-risking experiments with technology vendors within regulatory sandboxes.

At Merlon, we’ve been building and running mission-critical machine learning risk screening systems for AML/KYC for a few years. We’ve learned a lot about the issues that arise, and we joined a panel hosted by Penny Crossman from American Banker back in March to share our thoughts about AI in Regtech. Below, we unpack the ideas from that panel in greater detail and share our experience running production machine learning models for KYC/AML at tier 1 global banks.


Why AI for AML has moved from speculation to production

Big market, perfect timing

About $4.38T (or 5% of global GDP) is laundered annually, according to the UN Office on Drugs and Crime. It’s estimated that the global financial services system spends as much as $100B each year on financial crime compliance, yet still suffers $35B in annual regulatory fines. The escalating operating costs, fines, and risk have finally grown to the point where stakeholders are motivated to take new action.

The good news is that the high risk, the analyst time consumed, and the rising costs all stem from the same fixable problem: inefficient legacy risk screening engines. Decades-old tech underlies the query engines and rules engines used in KYC and transaction monitoring.

The drive to embrace ML has been faster than expected

When we started Merlon back in 2016, there was much more skepticism about using machine learning in anti-money laundering. Now, in 2019, we receive lots of RFPs and RFIs for KYC/AML risk screening solutions with embedded machine learning, backed by an efficiency business case.

We underestimated how fast the change would come. The mainstream view is that regulators are risk averse, not tech savvy, and therefore afraid of machine learning systems. In reality, we found that regulators are thoughtful and forward looking, already review machine learning systems in production, and encourage ML models by providing sandboxes for them.

The Joint Statement released last year is a great example of how the US banking regulators see their role in promoting innovation in financial crime compliance. They clearly stated that AI and digital identity are the future of the compliance industry, and committed themselves to encouraging innovation by de-risking experiments with technology vendors and accepting failures during pilot programs. This trend is seen across the globe, with agencies like FCA, FINMA, and JFSA establishing regulatory sandboxes and fintech licenses to accelerate deployment of innovative compliance solutions in top-tier banks.

How the solution landscape is playing out

There’s a large set of standard point solution vendors that cover all forms of KYC and monitoring. These include players like Dow Jones, Thomson Reuters, LexisNexis, Accuity, Actimize, and so on. There are also some newer startup solutions with embedded ML, like Signal8, Rapid7, Arachnys, Quantexa, DDIQ, Thetaray, ComplyAdvantage and many others.

Some of these are pure SaaS and run only in the cloud, which doesn’t work for big banks right now. Over time, though, the winning players in on-prem and SaaS will cross over, and everything will be hybrid cloud.

As a lightweight landscape view, the table below shows four primary types of AML risk screening, and the ML models, data vendors, and tech vendors associated with each type of screening.

Why risk screening models offer the biggest wins

Risk models drive workloads

There are many different risk models in KYC and AML. They depend heavily on classical machine learning and NLP problems like entity extraction, entity matching, named entity linking, topic classification, and learning to rank and filter various search queries on top of content and alerts from rules engines.

During client onboarding, ongoing monitoring, payment filtering or monitoring transactions, the workload that analysts and investigators deal with is generated from query engines or rules engines. If the models behind these screening engines are naive, they generate a lot of false positives for analysts to review.
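To make the false-positive point concrete, here is a minimal sketch in Python. It is not our production matcher: the record fields (`name`, `birth_year`), the 0.8 threshold, and the function names are all assumptions chosen for illustration. It contrasts a naive screener that alerts on name similarity alone with one that also uses a second attribute to discard obvious mismatches.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two lightly normalized names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def naive_screen(customer: Dict, watchlist: List[Dict], threshold: float = 0.8) -> List[Dict]:
    """Naive screener: alert on every watchlist entry with a similar enough name."""
    return [
        entry for entry in watchlist
        if name_similarity(customer["name"], entry["name"]) >= threshold
    ]


def screen_with_context(customer: Dict, watchlist: List[Dict], threshold: float = 0.8) -> List[Dict]:
    """Same name test, but drop hits whose birth year is known on both sides and disagrees."""
    hits = []
    for entry in naive_screen(customer, watchlist, threshold):
        customer_year = customer.get("birth_year")
        entry_year = entry.get("birth_year")
        if customer_year is not None and entry_year is not None and customer_year != entry_year:
            continue  # same name, clearly a different person
        hits.append(entry)
    return hits


# Hypothetical toy data, for illustration only.
customer = {"name": "John Smith", "birth_year": 1980}
watchlist = [
    {"name": "John Smith", "birth_year": 1952},
    {"name": "Jon Smith", "birth_year": 1980},
    {"name": "Maria Garcia", "birth_year": 1980},
]
naive_alerts = naive_screen(customer, watchlist)
context_alerts = screen_with_context(customer, watchlist)
```

On this toy data the naive screener raises two alerts and the context-aware one raises a single alert. A real screening model would learn how to weigh many such attributes rather than hard-code one rule, but the effect on analyst workload is the same in kind.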

Each risk screening problem is an opportunity for false positives to get through, and the largest opportunity to show efficiency comes in the most complex problems with a composition of models. An example of this is negative media screening where the media screening models break down into many other models — entity matching, topic classification, and modeling relations and accusations.

Model composition drives the biggest wins; the Adverse Media screening example

In negative media screening, the objective is to discover whether the screened entity is involved in some kind of financial crime risk. This mission objective spawns many interesting NLP problems. Some of those are generic like entity detection & resolution, and some are vertical-specific like financial crime risk identification and involvement detection. Our system identifies risky entities (those that are implicated by the surrounding text) directly in unstructured text and this allows us to focus directly on the most important parts of text. Our machine learning models identify text locations related to financial crime risk and place the risk into our FCC taxonomy of approximately 30 crime types. Then we determine the level of involvement of the entity in the financial crime risk — in other words the stage of allegation to which the financial crime risk has progressed with respect to the entity.
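The composition described above can be sketched as a small pipeline. This is a toy illustration only: the keyword lists stand in for learned models, the three risk types stand in for the roughly 30-type taxonomy, and every name here (`RISK_KEYWORDS`, `INVOLVEMENT_CUES`, `screen_article`, and so on) is hypothetical rather than part of our system.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword stand-ins for learned classifiers.
RISK_KEYWORDS = {
    "bribery": ["bribe", "kickback"],
    "money laundering": ["launder"],
    "fraud": ["fraud", "embezzl"],
}
# Ordered from most to least advanced stage of allegation.
INVOLVEMENT_CUES = [
    ("convicted", ["convicted", "sentenced", "pleaded guilty"]),
    ("charged", ["charged", "indicted"]),
    ("alleged", ["alleged", "accused", "suspected"]),
]


@dataclass
class Finding:
    entity: str
    sentence: str
    risk_type: str
    involvement: str


def split_sentences(text: str) -> List[str]:
    """Very rough sentence splitter; a real system would use a proper tokenizer."""
    return [s.strip() for s in text.split(".") if s.strip()]


def classify_risk(sentence: str) -> Optional[str]:
    """Stage 2: place the text into a crime type, or None if no risk is found."""
    lowered = sentence.lower()
    for risk_type, keywords in RISK_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return risk_type
    return None


def classify_involvement(sentence: str) -> str:
    """Stage 3: how far the allegation has progressed with respect to the entity."""
    lowered = sentence.lower()
    for stage, cues in INVOLVEMENT_CUES:
        if any(c in lowered for c in cues):
            return stage
    return "mentioned"


def screen_article(text: str, entity: str) -> List[Finding]:
    """Stage 1 (entity match) feeds stage 2 (risk type) feeds stage 3 (involvement)."""
    findings = []
    for sentence in split_sentences(text):
        if entity.lower() not in sentence.lower():
            continue
        risk_type = classify_risk(sentence)
        if risk_type is None:
            continue
        findings.append(Finding(entity, sentence, risk_type, classify_involvement(sentence)))
    return findings


article = (
    "Acme Corp opened a new office in Lisbon. "
    "Prosecutors charged Acme Corp with paying a bribe to officials. "
    "Separately, Beta Ltd was accused of fraud."
)
acme_findings = screen_article(article, "Acme Corp")
beta_findings = screen_article(article, "Beta Ltd")
```

The point of the sketch is the shape rather than the rules: each stage can only be as good as the stage before it, so errors compound across the chain, and improving any one model improves every downstream result.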

People and Infrastructure; Risk models in production

Challenges with AI-powered AML in Production

Slowness in rollouts of new machine learning models can be due to model testing and validation, training and education, or deploying the tech into an on-prem legacy tech environment.

When machine learning is used for serious production financial crime risk models, the models cannot simply be rolled out whenever you make changes. The models must be white box, with results that are fully reproducible by both internal bank analysts and third-party audit firms, and a rigorous record of all model evaluation must be distributed to the bank with each release. These rollouts also carry the additional complexity of explaining the model updates, and any resulting changes to the UI, to the financial crime analysts.
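As a rough illustration of what a reproducible evaluation record might contain, the Python sketch below ties a model version to a fingerprint of the exact evaluation set and the metrics computed on it. The record layout, the field names, and the helper functions are assumptions made for this example, not a description of any bank’s or auditor’s required format.

```python
import hashlib
import json
from typing import Callable, Dict, List


def fingerprint(rows: List[Dict]) -> str:
    """SHA-256 over a canonical JSON serialization of the labeled evaluation rows."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(predictions: List[bool], labels: List[bool]) -> Dict:
    """Confusion counts plus precision and recall for a binary risk flag."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    fn = sum(1 for p, l in zip(predictions, labels) if not p and l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


def release_record(model_version: str, rows: List[Dict], predict: Callable[[str], bool]) -> Dict:
    """Bundle everything a reviewer needs to re-run and compare the evaluation."""
    predictions = [predict(row["text"]) for row in rows]
    labels = [row["label"] for row in rows]
    return {
        "model_version": model_version,
        "eval_set_sha256": fingerprint(rows),
        "n_examples": len(rows),
        "metrics": evaluate(predictions, labels),
    }


# Hypothetical evaluation rows and a trivial stand-in model.
eval_rows = [
    {"text": "executive accused of fraud", "label": True},
    {"text": "company opens new branch", "label": False},
    {"text": "fraud awareness week announced", "label": False},
    {"text": "director laundered funds", "label": True},
]
record = release_record("v1.2.0", eval_rows, lambda text: "fraud" in text)
```

Because the fingerprint is computed from the data itself, an auditor who re-runs the same model on the same rows should get a byte-identical record, and any silent change to the evaluation set shows up as a different hash.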

Financial crime models are rolled out less rigorously and more slowly than mission-critical models at places like Google or at hedge funds, or than an underwriting model in lending or insurance. This stems in part from the background of financial crimes as part of legal risk rather than as part of a quantitative risk modeling group. These so-called ‘model validation’ projects can cost hundreds of thousands of dollars in external audit fees from the Big 4, slow down the production release cycle, and yield too little evaluation data to give real statistical confidence in the performance and properties of the risk models.
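To see why a small evaluation set gives little statistical confidence, consider a confidence interval on a measured proportion such as precision. The sketch below uses the standard Wilson score interval; the sample sizes are made-up numbers for illustration, not figures from any actual validation project.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return center - half_width, center + half_width


# Same observed precision of 90%, measured on evaluation sets of very different size.
small_low, small_high = wilson_interval(45, 50)
large_low, large_high = wilson_interval(4500, 5000)
```

With 50 reviewed alerts, an observed precision of 90% is consistent with a true value anywhere from roughly 0.79 to 0.96. With 5,000 reviewed alerts at the same observed rate the interval is far narrower, which is the kind of evidence a quantitative risk group would normally expect before signing off on a model.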

Finally, there are also challenges with deployment. Cloud deployment at large financial institutions doesn’t look feasible in the near term. This requires complex and thoughtful architectures where machine learning models are always trained in the best environment possible, and where the complexity of models trained in opaque environments (where the data engineers don’t have visibility) is reduced to a minimum. We underestimated how slowly deployment maturity would move from old-school tooling to Kubernetes. More and more banks seem to be moving to things like Kubernetes, but it’s not clear that it’s available for the mainline mission-critical systems. We’ve had to deploy using some pretty old-school tech.

Leveraging golden annotation data is a major challenge with machine learning models for risk screening. There are many issues with using data annotated internally within the bank, and we have often found the need to hire independent analysts to produce golden datasets externally for training machine learning models for financial crime. This works well for many problems in KYC, where you don’t need proprietary transaction data and can easily look up millions of identities around the world for testing. With transaction data, on the other hand, the challenge is training models on PII data that cannot leave the institutional boundary. One area where we have some complexity is arriving at a canonical definition of materiality of risk. One bank may define a particular news article as materially risky about a person, whereas another bank may disagree.

‘It’s a people problem, not a tech problem’

Some people say that automation and machine learning aren’t going to help solve money laundering, since it’s the humans in the loop that are corrupted, such as in the BNP case. We disagree. Automating more of the workflow, and making more of the risk screening ML-powered rather than human-powered, leaves much less corruptible surface area. You can’t bribe models and code; you can only bribe people.

Will Automation drive job loss?

Yes, there will be increases in the efficiency of risk models and, with them, job loss driven by the productivity gains. A lot of high-quality financial crime analysts will remain, and ML will bring them much more efficient risk screening tools and automate their workflow. They’ll spend more of their time looking at risky content and doing work that looks more like investigation than menial checking of results from the risk screeners.

It’s important to remember, though, that banks were operating from an inflated baseline. As regulatory scrutiny came down, the systems weren’t efficient enough, so headcount mushroomed out of control. What we are seeing now is really more of a return to something that looks normal: a workforce that is much better at finding risk, running on top of more efficient systems.

Merlon Intelligence

Written by

Reduce false positives in KYC risk screening by up to 80% with the only battle-hardened AI-enabled engine that’s rolled out globally with Tier 1 banks.
