Unlock Hidden KYC Connections Using Graph Analytics

Vladimir Salin
Feb 10, 2018

Complying with AML/KYC requirements is no longer a mere bureaucratic formality for the sole purpose of obtaining an operating license from the relevant competent authorities. Nowadays, regulators expect a demonstrable risk-based AML approach regardless of business size or nature, as long as the business involves an asset of intrinsic value. AML/KYC compliance has therefore become an essential step for any startup or established business in the FinTech area, and blockchain and crypto-based startups targeting Initial Coin Offerings (ICOs) are no exception.

The AML/KYC market is flooded with products purporting to offer simple, straightforward KYC checks. However, they often neglect vital information about the closest relatives or business partners of a sanctioned or politically exposed person. How does it sound, ethically and legally, if your ICO token is purchased by the spouse, the son, or the business associate of a well-known terrorist? We can agree that such a short-sighted screening approach is not only inadequate but also would not survive the scrutiny of any regulator, especially when this type of data is more often than not available in the public domain.

For a handful of screening requests, a skilled team of analysts would be able to paint an accurate picture of such complex relationships and associations. However, this approach does not scale to tens, let alone hundreds or thousands, of screening requests per day. In this article, we will explore how such a comprehensive screening process can be automated and made more ‘intelligent’. To implement a reliable solution, we would need a tech stack that is able to manage our sanctions lists, ‘understand’ associative information therein, and quickly search highly-connected datasets.

Graph Model

Suppose we have a list of sanctioned entities that includes a good mix of individuals and companies. Draw a circle around each entity, then draw lines between entities to denote the apparent relationships between them, e.g. Person A is associated with Company X.

Figure: A sample graph model

Let’s also add direction to these lines to show efferent (i.e. outbound) relationships. Et voilà, we have a directed graph structure! Now we can traverse paths and ultimately expose any hidden or indirect connections.
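To make the idea of traversing paths concrete, here is a minimal in-memory sketch in plain Python, independent of any graph database. The entities `Person B` and `Person C` and the relationship names are assumptions made up for illustration. Edges are stored with a direction, but the search follows them both ways, since a hidden connection matters whichever way the line points.

```python
from collections import deque

# Hypothetical sample data: directed edges as (source, relationship, target).
EDGES = [
    ("Person A", "ASSOCIATED_WITH", "Company X"),
    ("Person B", "ASSOCIATED_WITH", "Company X"),
    ("Person B", "RELATIVE_OF", "Person C"),
]


def build_adjacency(edges):
    """Index every edge under both endpoints so a path can cross it in either direction."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def find_path(edges, start, goal):
    """Breadth-first search: return the shortest chain of entities from start to goal, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _relation, neighbour in adjacency.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(find_path(EDGES, "Person A", "Person C"))
# ['Person A', 'Company X', 'Person B', 'Person C']
```

Person A and Person C share no direct line, yet the search exposes the chain through Company X and Person B, which is exactly the kind of indirect connection a flat list lookup would miss.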

However, in today’s world of complex geopolitics, we cannot overlook the fact that certain countries are perceived to be riskier than others. To construct a more realistic model, let’s introduce countries as well as any applicable relationships to our graph.

Figure: A sample model extended with country connections
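The extended model can be sketched in the same in-memory style. Everything named here is an assumption for illustration: the node types, the relationship names, the placeholder countries, and the contents of the high-risk set. The sketch collects the high-risk countries reachable from an entity within a bounded number of hops.

```python
from collections import deque

# Hypothetical node types and edges; none of these names come from a real dataset.
NODES = {
    "Person A": "Person",
    "Company X": "Company",
    "Country P": "Country",
    "Country Q": "Country",
}
EDGES = [
    ("Person A", "ASSOCIATED_WITH", "Company X"),
    ("Person A", "CONNECTED_TO", "Country P"),
    ("Company X", "CONNECTED_TO", "Country Q"),
]
HIGH_RISK_COUNTRIES = {"Country Q"}  # assumed risk list


def risky_countries(start, max_hops=2):
    """Return the high-risk countries reachable from start within max_hops edges."""
    neighbours = {}
    for source, _relation, target in EDGES:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    found = set()
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if NODES.get(node) == "Country":
            if node in HIGH_RISK_COUNTRIES:
                found.add(node)
            # Never hop *through* a country: otherwise everyone who shares
            # a country would count as connected to everyone else.
            continue
        if hops == max_hops:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return found


print(risky_countries("Person A"))              # {'Country Q'}
print(risky_countries("Person A", max_hops=1))  # set()
```

Person A is only directly linked to Country P, but two hops away, via Company X, sits a country on the assumed high-risk list.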

Technology Stack

To build a reliable graph-based solution, we require a tech stack that would enable us to effortlessly hold and query millions of nodes. Indeed, an adequate sanctions/watchlist dataset could easily consist of millions of entities and their respective associations. Once it is converted to a graph and supporting reference data, such as countries, is added, the graph size could easily swell to tens, if not hundreds, of millions of elements.

A quick internet search for state-of-the-art graph databases will reveal that there are not many readily-available enterprise-grade solutions that are able to handle such volumes. We compared four of the leading and commonly-used solutions, namely, DataStax, Neo4j, OrientDB, and Titan. Hereunder, we present a summary of our findings:

Rough comparison of the database engines we evaluated for our use case

Some of our key takeaways are:

  • DataStax has a proven track record among large enterprises.
  • Neo4j is somewhat of a newcomer. However, it already has a good presence on various platforms, including Docker containers. It is also available on leading Infrastructure as a Service (IaaS) platforms such as Microsoft’s Azure Cloud Marketplace and as GrapheneDB’s managed service.
  • Neo4j has great support among popular frameworks such as the Spring Framework, Grails, Django, Node.js and so forth.
  • OrientDB and Neo4j are increasingly offering similar capabilities and performance. However, Neo4j’s intuitive Cypher query language made it our choice.

Proof of Concept

Let’s see how our proposed graph model can be implemented using Neo4j. First, let’s create a couple of watchlist entities using the following Cypher statements:
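A minimal sketch of what such statements could look like, kept as parameterised Cypher strings in Python. The labels `Person` and `Company`, the properties `name` and `sanctioned`, and the company entry are assumptions for illustration; only Jane Roe is taken from this article.

```python
# Assumed labels and property names; one parameterised CREATE per entity type.
CREATE_ENTITY = {
    "Person": "CREATE (n:Person {name: $name, sanctioned: $sanctioned})",
    "Company": "CREATE (n:Company {name: $name, sanctioned: $sanctioned})",
}

# Hypothetical watchlist rows; "Example Trading Ltd" is invented for this sketch.
WATCHLIST = [
    {"label": "Person", "name": "Jane Roe", "sanctioned": True},
    {"label": "Company", "name": "Example Trading Ltd", "sanctioned": False},
]


def build_create_statements(entities):
    """Pair each entity with its Cypher text and a parameter dict."""
    statements = []
    for entity in entities:
        query = CREATE_ENTITY[entity["label"]]  # KeyError on an unknown label
        params = {"name": entity["name"], "sanctioned": entity["sanctioned"]}
        statements.append((query, params))
    return statements


for query, params in build_create_statements(WATCHLIST):
    print(query, params)
```

Each `(query, params)` pair could then be handed to a Neo4j client, for example through `session.run(query, params)` in the official Python driver. The connection setup is left out so the sketch runs without a database.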

Let’s then add relationships to the newly-created nodes:
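As a hedged sketch, a relationship between two existing nodes can be created with a `MATCH` followed by a `CREATE`. The relationship types listed here are assumptions, as are the entity names in the usage line. In the Neo4j versions current when this article was written, a relationship type could not be supplied as a query parameter, so the sketch checks it against an allow-list before writing it into the query text.

```python
# Assumed relationship vocabulary for this sketch.
ALLOWED_TYPES = {"ASSOCIATE_OF", "RELATIVE_OF", "DIRECTOR_OF"}


def build_relationship_statement(source, rel_type, target):
    """Return (query, params) creating a directed edge source -> target."""
    if rel_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown relationship type: {rel_type!r}")
    query = (
        "MATCH (a {name: $source}), (b {name: $target}) "
        f"CREATE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": source, "target": target}


query, params = build_relationship_statement(
    "Jane Roe", "DIRECTOR_OF", "Example Trading Ltd"
)
print(query)
print(params)
```

Only the relationship type is interpolated. The names travel as parameters, so user-supplied text never becomes part of the Cypher statement itself.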

In a very similar fashion, let’s complement our graph with a couple of country relationships (for brevity, we assume that some countries are already present in the graph with their respective names and ISO codes):
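A sketch of the country links under the same assumptions: `Country` nodes are taken to carry their ISO code in a property called `iso`, and the relationship type `CONNECTED_TO` is a placeholder. Both names are hypothetical. The Jane Roe to Iran link mirrors the example discussed in this article.

```python
# Assumed schema: (:Country {name, iso}) nodes already exist in the graph.
LINK_TO_COUNTRY = (
    "MATCH (e {name: $name}), (c:Country {iso: $iso}) "
    "CREATE (e)-[:CONNECTED_TO]->(c)"
)


def build_country_links(links):
    """Turn (entity name, ISO alpha-2 code) pairs into (query, params) pairs."""
    statements = []
    for name, iso in links:
        code = iso.strip().upper()
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"not an ISO alpha-2 code: {iso!r}")
        statements.append((LINK_TO_COUNTRY, {"name": name, "iso": code}))
    return statements


for query, params in build_country_links([("Jane Roe", "ir")]):
    print(query, params)
```

Matching on the ISO code rather than the country name avoids spelling variants when linking entities to reference data.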

Let’s check out how the complete graph looks so far:

Figure: Neo4j representation of the graph model

We can clearly see that we have been successful at replicating our target graph model. Let’s now see how we can go about screening an individual and identify any relevant connections:
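One way such a screening query might be written is with a variable-length pattern, which returns every path of up to a few hops starting at a person whose name contains the search term. This is a sketch under assumptions, not the query behind the screenshot: the `Person` label and `name` property are hypothetical, and the hop limit of 5 is an arbitrary guard. The hop bound is written into the query text rather than passed as a parameter, so it is validated first.

```python
def build_screening_query(name_fragment, max_hops=2):
    """Return (query, params) fetching all paths within max_hops of matching persons."""
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be an integer between 1 and 5")
    query = (
        f"MATCH path = (p:Person)-[*1..{max_hops}]-(connected) "
        "WHERE toLower(p.name) CONTAINS toLower($fragment) "
        "RETURN path"
    )
    return query, {"fragment": name_fragment}


query, params = build_screening_query("Roe")
print(query)
print(params)
```

The pattern is undirected on purpose: a connection is relevant whether the relationship points to or from the person being screened. Larger hop bounds widen the net but grow the result set quickly on a densely connected graph.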

Figure: Neo4j results for querying Roe

Now, it is quite clear that there is a sanctioned individual named “Jane Roe” who is connected to Iran and would thus require Enhanced Due Diligence.

Real Data

Needless to say, the aforementioned example is purely for illustrative purposes. However, it is not far off from a real-life scenario. With a comprehensive dataset, such as SwiftDil’s extensive database, simple queries can unlock the most obscure and opaque relationships and connections:

Uncover Closest Associates

Figure: Neo4j results for querying closest associates

Highlight Country Connections

Figure: Neo4j results for querying the countries connection

Conclusion

In this article, we have covered simple yet powerful graph concepts and explained how they can be applied to KYC screening. These concepts can be evolved, and the graph model can be further improved based on business needs. For instance, a scorecard could be built on top of Neo4j to improve the accuracy of matches and widen the scope of risk indicators taken into account. An advanced scorecard may also leverage Machine Learning and advanced text/phonetic search algorithms. At SwiftDil, we have employed state-of-the-art techniques to implement a powerful scorecard around Neo4j, which has yielded unparalleled matching rates.

Originally published at blog.swiftdil.com on February 10, 2018.

