What’s in a name? A whole lot when it comes to name matching

Anup Gunjan
Published in Tookitaki
7 min read · Mar 6, 2024

In 2012, HSBC was fined $1.9 billion by U.S. authorities for allowing Mexico’s Sinaloa and Colombia’s Norte del Valle drug cartels to launder over $881 million through its American operations. Among other violations, the bank also failed to flag transactions involving names of cartel members and drug traffickers on the U.S. government’s blacklist.

Insufficient name screening can have severe consequences, including hefty fines, damage to reputation, and loss of customers. Failing to detect sanctioned individuals may result in the firm being deemed negligent and fined. Moreover, mistakenly identifying a non-sanctioned customer as sanctioned can lead to a poor customer experience, potentially prompting them to take their business elsewhere.

In today’s world, where sanctions and watchlists are tailored to target specific individuals and entities with precision, the importance of accuracy in AML screening cannot be overstated. To meet the increasingly stringent Anti-Money Laundering regulations and address the expanding list of sanctions, organizations must leverage the most advanced name-screening technology available for their sanctions and AML screening processes.

In this article, we delve into the critical name-matching challenges faced by financial institutions. Are these challenges solely attributed to data issues, or do they extend to broader issues concerning accurate matching? Can advanced technologies such as fuzzy matching and artificial intelligence (AI) provide viable solutions to enhance name-screening processes? These questions prompt us to explore the intricacies of name matching within the context of Anti-Money Laundering (AML) compliance and sanctions screening, offering valuable insights for organizations navigating regulatory landscapes.

Fred Wilson vs Fred Weel Sun

Names can be surprisingly intricate. Matching names becomes particularly challenging when dealing with a vast global dataset sourced from government watchlists and adverse media spanning over 180 countries. Factor in the diverse linguistic, cultural, political, and semantic variations, and it quickly becomes a complex puzzle in which true matches are easy to miss, even when you are actively looking for them.

Now, consider the added complexity when your main objective is to identify individuals who actively seek to evade detection. Name matching poses distinct hurdles in such scenarios. Spelling variations, initials, nicknames, titles, and names in different languages and scripts all contribute to the complexity. For example, “Bill” and “William” can refer to the same person, yet exact name matching will not treat them as a match, which highlights the need for more sophisticated matching techniques.
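To make the “Bill” versus “William” point concrete, here is a minimal Python sketch. The `NICKNAMES` table and both function names are hypothetical and exist only for illustration; a production system would rely on a curated, much larger nickname list.

```python
# Hypothetical nickname table (illustrative only, not a real reference list).
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "bob": "robert",
    "liz": "elizabeth",
}


def exact_match(a: str, b: str) -> bool:
    """Case-insensitive exact comparison of two given names."""
    return a.strip().lower() == b.strip().lower()


def nickname_aware_match(a: str, b: str) -> bool:
    """Map each name to its canonical form before comparing."""
    def canonical(name: str) -> str:
        key = name.strip().lower()
        return NICKNAMES.get(key, key)

    return canonical(a) == canonical(b)


print(exact_match("Bill", "William"))           # exact comparison misses the pair
print(nickname_aware_match("Bill", "William"))  # the lookup table links them
```

A lookup like this only covers the nicknames it has been told about, which is one reason it is usually combined with other techniques.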

Naming conventions and cultural nuances

In certain cultures, names carry deep significance and are passed down through generations, resulting in common names shared by many individuals. For instance, Vietnamese names typically comprise three parts: the family name (surname), the middle name, and the given name. Consider the name “Nguyen Van Anh,” where “Nguyen” represents the family name, “Van” the middle name, and “Anh” the given name. In such traditions, an additional name component may not provide as much identifying information as it typically would.

Let’s delve into specific examples:

Filipino Names:

  • Variation in Surnames: Filipino names commonly include the mother’s maiden surname, typically used as the middle name, followed by the father’s surname. As a result, relatives who share a paternal surname can still differ in the maternal component; for example, half-siblings or cousins may carry different maternal surnames.
  • Spanish Influence: Many Filipino surnames have Spanish origins due to centuries of Spanish colonization, adding another layer of complexity to name matching.
  • Middle Names: Some Filipinos may have multiple middle names, further complicating the identification process.

Saudi Names:

  • Commonality of Names: Arabic naming conventions often result in a higher frequency of common names, making it challenging to distinguish individuals based solely on given names or father’s names.
  • Patronyms: The inclusion of the father’s name as part of the individual’s name (e.g., “ibn Abdullah”) can lead to similarities among individuals sharing the same father’s name.
  • Tribal Lineage: In formal contexts, additional tribal lineage may be included, increasing the length and complexity of names and potentially causing difficulties in accurate matching.

Is fuzzy matching the answer?

Languages, scripts, writing standards, and cultural pressures differ between regions and change over time. These challenges highlight the importance of implementing robust and flexible name-matching algorithms that can accommodate variations in naming conventions across different cultures. Additionally, thorough verification procedures and context-aware matching techniques may be necessary to ensure accurate identification and avoid false positives or mismatches.

While fuzzy algorithms can help with some real-world challenges such as typos and incomplete strings, issues like transliteration differences, nicknames, and culture-specific spelling variants cannot be reliably resolved by fuzzy matching alone.

When fuzziness is increased to compensate, legacy screening solutions can produce more than 98% false positives, vastly reducing the efficiency and effectiveness of compliance programs. One major reason is that most name-matching systems rely on lists and rules that struggle with name variety, and many have no cross-lingual name-matching capabilities. Loosening the match threshold therefore mostly adds false positives rather than true matches.
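The trade-off between fuzziness and alert volume can be sketched in a few lines of Python. This is an illustration only: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the watchlist entries, function names, and thresholds are made up for the example.

```python
from difflib import SequenceMatcher

# Made-up watchlist entries for illustration only.
WATCHLIST = ["Fred Wilson", "Fred Weel Sun", "Frida Nilsson", "Ted Williams"]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(customer: str, watchlist: list[str], threshold: float) -> list[str]:
    """Return every watchlist entry whose similarity meets the threshold."""
    return [entry for entry in watchlist if similarity(customer, entry) >= threshold]


# Lowering the threshold ("more fuzziness") can only keep or grow the alert list.
for threshold in (1.0, 0.8, 0.6, 0.4):
    print(threshold, screen("Fred Wilson", WATCHLIST, threshold))
```

Every alert raised at a strict threshold is still raised at a looser one, so added fuzziness never removes work for investigators; it only adds candidates, many of which are likely to be unrelated names.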

Moving beyond fuzzy matching alone

What is needed is a combination of fuzzy and specialised name-matching algorithms, each suited to a particular kind of naming variation. Let us look at algorithms that can provide better matching results for specific variations:

  • Phonetic encoding: This technique focuses on how names sound rather than how they are spelt. It uses algorithms like Soundex or Metaphone to create codes that represent the pronunciation of names. For example, in the Philippines, where names may have Spanish roots, this method helps match names like “Rodriguez” and “Rodrigues” even though they are spelt differently. However, these algorithms were designed mainly for English names, so they may not work as well for other languages, and a dropped consonant (as in “Rorigez”) can still change the code.
  • Pattern matching: This method compares names to find similarities, even if they have slight differences in spelling or word order. It is useful for separating genuine variants from names that are merely similar. For instance, in Saudi Arabia, where many names are common, it helps tell apart distinct names such as “Ahmed” and “Mohammed” while still grouping spelling variants such as “Mohammed” and “Muhammad.”
  • Token-based matching: This technique breaks names into smaller parts (tokens) and compares them. It’s handy for names with multiple parts or variations. For example, in the Philippines, where people often have multiple middle names, this method helps identify individuals even if their names are arranged differently.
  • Probabilistic matching: This approach uses statistics to figure out how likely it is that two names belong to the same person. It looks at different features of names, like how often certain letters or patterns appear, to estimate similarity. For instance, in Saudi Arabia, where names often include a father’s name, this method helps differentiate between people with similar names.
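Two of the techniques above, phonetic encoding and token-based matching, are simple enough to sketch in Python. The code below is a rough illustration, not a production implementation: `soundex` follows the commonly described American Soundex rules, and `token_overlap` is a plain Jaccard overlap of name tokens chosen here for simplicity. All function names and sample names are assumptions made for the example.

```python
# Letter-to-digit table used by American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex code (first letter plus three digits)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of name tokens; ignores word order and commas."""
    ta = set(a.lower().replace(",", " ").split())
    tb = set(b.lower().replace(",", " ").split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(soundex("Rodriguez"), soundex("Rodrigues"))  # same code for both spellings
print(soundex("Rorigez"))                          # a dropped consonant changes the code
print(token_overlap("Juan Carlos Santos Reyes", "Reyes, Juan Carlos Santos"))
```

In this sketch the two spellings “Rodriguez” and “Rodrigues” share a code, while the misspelling “Rorigez” does not, which is one reason phonetic keys are usually paired with an edit-distance or token-based comparison rather than used alone.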

Advanced algorithms enable your system to accurately match names regardless of different scripts and languages. These algorithms consider various linguistic and contextual factors, ensuring the precision required for global screening. Fuzzy matching can be used along with specialised algorithms to improve the accuracy of near matches.

Coming back to the application of name matching to sanctions and the customer screening process, an effective system must also be highly flexible and configurable. The business requirements and risk appetite of different industries and customers are unique and require the name screening system to be able to customize its output accordingly.

Consider the following:

  • Prioritizing one name component (e.g., surname) over another (e.g., given name) in the matching process.
  • Implementing new matching rules, such as considering “Madonna” as a 99% match with “Ciccone” for a more accurate identification.
  • Assigning higher significance to unique or uncommon names compared to frequently occurring ones. For instance, in a comparison between “Zephyr Smith” and “John Smith,” “Zephyr” would carry more weight.
  • Considering additional identity attributes such as date of birth and nationality to enhance the accuracy of matching.
  • Managing multiple data fields within a specific category, such as incorporating various aliases or nicknames for comprehensive screening.

Finally, the system should provide a detailed scoring and explanation of the score for the investigators to confidently use the results and flag cases.
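As a rough illustration of how component weighting, rarity weighting, and an explainable score might fit together, consider the Python sketch below. Everything in it is an assumption made for the example: the weights, the name-frequency table (the figures are invented), the use of `difflib.SequenceMatcher` as the similarity measure, and the `Party`, `MatchResult`, `rarity`, and `match` names.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical configuration: base weight per name component.
COMPONENT_WEIGHTS = {"surname": 0.6, "given": 0.4}
# Invented relative frequencies; a real system would derive these from data.
NAME_FREQUENCY = {"john": 0.03, "smith": 0.01, "zephyr": 0.00001}
DEFAULT_FREQUENCY = 0.001


@dataclass
class Party:
    given: str
    surname: str


@dataclass
class MatchResult:
    score: float
    explanation: list


def rarity(token: str) -> float:
    """Rarer names get a larger factor (negative log of frequency)."""
    return -math.log(NAME_FREQUENCY.get(token.lower(), DEFAULT_FREQUENCY))


def match(customer: Party, listed: Party) -> MatchResult:
    """Weighted similarity across components, with a per-component breakdown."""
    rows = []
    for field, base in COMPONENT_WEIGHTS.items():
        a, b = getattr(customer, field), getattr(listed, field)
        sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        rows.append((field, a, b, sim, base * rarity(a)))
    total = sum(w for *_, w in rows)
    score = sum(sim * w for *_, sim, w in rows) / total
    explanation = [
        f"{field}: '{a}' vs '{b}' similarity {sim:.2f}, weight {w / total:.2f}"
        for field, a, b, sim, w in rows
    ]
    return MatchResult(score, explanation)


result = match(Party("Zephyr", "Smith"), Party("John", "Smith"))
print(round(result.score, 2))
for line in result.explanation:
    print(" ", line)
```

With these made-up numbers, the rare given name “Zephyr” ends up carrying more weight than the common surname “Smith”, so a mismatch on it pulls the overall score down more than a mismatch on “John” would. The explanation lines give an investigator the per-component reasoning behind the number.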

Identity Matching: Moving Beyond Simple Name Matching

Name-based algorithms are essential but have their limitations.

Most conventional screening solutions offer no provision for matching multiple attributes such as date of birth (DOB), address, country, alias, ID, date of incorporation for corporate entities, phone numbers, and more. As data volumes grow, matching on additional attributes such as address and DOB becomes crucial, because many individuals share the same name.

Advanced screening systems leverage proprietary search and indexing engines. This approach empowers organizations to merge evidence from multiple matching attributes, resulting in a highly robust, scalable, and intuitive screening process. Such systems can also tackle common challenges such as:

  • DOB Variations: Variations in date formats, such as the European (Day/Month/Year) versus American (Month/Day/Year) conventions, present challenges. Determining how close two dates are can also be intricate; for instance, 1946 and 1956 differ by only one digit, which could be a typo, yet they are ten years apart.
  • Date Complexity: Dates come in diverse formats, with possible variations such as the inclusion of day names or abbreviations for months and days.
  • Address Dilemmas: Address matching introduces its own set of complexities, including differences in formatting, variations in abbreviations, and the presence or absence of postal codes.
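The date-of-birth issues above can be illustrated with a small Python sketch. It assumes numeric dates written as two one- or two-digit parts followed by a four-digit year, in unknown day/month order. The function names and the result labels are hypothetical, chosen only for this example.

```python
from datetime import date


def parse_parts(text: str) -> tuple[int, int, int]:
    """Split 'a/b/yyyy' (also '-' or '.' separated) into three integers."""
    a, b, year = (int(p) for p in text.replace("-", "/").replace(".", "/").split("/"))
    return a, b, year


def interpretations(text: str) -> set:
    """All valid dates the text could mean, reading it as D/M/Y and as M/D/Y."""
    try:
        a, b, year = parse_parts(text)
    except ValueError:
        return set()
    found = set()
    for day, month in ((a, b), (b, a)):
        try:
            found.add(date(year, month, day))
        except ValueError:
            continue  # e.g. month 25 is not a valid reading
    return found


def digit_differences(d1: date, d2: date) -> int:
    """Count differing characters between two zero-padded ISO dates."""
    return sum(c1 != c2 for c1, c2 in zip(d1.isoformat(), d2.isoformat()))


def compare_dob(x: str, y: str) -> str:
    ix, iy = interpretations(x), interpretations(y)
    if not ix or not iy:
        return "unparseable"
    if parse_parts(x) == parse_parts(y):
        return "exact"
    if ix & iy:
        return "possible day/month swap"
    if min(digit_differences(p, q) for p in ix for q in iy) == 1:
        return "one digit differs"
    return "different"


print(compare_dob("03/04/1946", "04/03/1946"))
print(compare_dob("03/04/1946", "03/04/1956"))
```

A label such as "one digit differs" is only a hint for an investigator: the two years may be a typo of each other, or they may belong to two people born a decade apart, so it would normally be weighed together with the name score and other attributes.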

FinCense — A Holistic Screening Solution

As the compliance landscape continues to evolve, the need for identity matching is evident. At Tookitaki, our flagship solution, FinCense, intelligently fuzzy-matches seven different customer attributes, including name, date, address, nationality, and alias.

FinCense uses AI and statistical models, with carefully curated and tested algorithms, to match names within a language or between different languages such as English, Arabic, Chinese, Korean, Russian, and Persian. It can handle even vague and inaccurate data: partial dates, swapped month and day, names typed into the wrong field, or names of two or more words split between database fields.

In our next instalment, we will dive deep into the additional requirements of building a comprehensive screening program.

You can read my analysis of the key regulatory challenges associated with screening here: Effective Strategies for Sanctions Screening. How to comply?

Stay tuned for the next chapter in our journey towards mastering the art of sanctions compliance in the digital age.

If you are looking to eliminate manual effort from your compliance operations, reach out to us!

Also, you can download our ebook on Navigating Screening Challenges — How AI is transforming screening, which covers key screening components, current trends, and challenges in AML screening.
