Matching Algorithms

Peter Bruce
May 30, 2019

Some applications of machine learning and artificial intelligence are recognizably impressive: predicting which discharged patients will be readmitted to the hospital, for example, or diagnosing retinopathy. Others, such as self-driving cars, seem almost magical. The matching problem, though, is one where your first reaction might be “What’s so hard about that?” Take the task of finding duplicates: if a customer named Elliot Sanderson places an order on a website, matching him to the Elliot Sanderson already on the customer list would seem easy.

A bit of reflection quickly reveals the difficulty. What if it is a common name, like Robert Smith? What if the name matches but the email doesn’t? What if the first name is missing and there’s only an initial?
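To make these difficulties concrete, here is a minimal Python sketch, assuming a hypothetical `Customer` record with a first name, last name, and optional email. Rather than applying a single yes/no rule, it turns a pair of records into comparison features: a fuzzy last-name similarity (using the standard library's `difflib.SequenceMatcher`), a first-name score that treats a matching initial as partial agreement, and an email flag that stays `None` when either email is missing. The field names and scores are illustrative assumptions, not Senzing's or any particular product's method.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record layout for illustration.
    first: Optional[str]          # may be missing, or just an initial like "E."
    last: str
    email: Optional[str] = None


def _norm_name(s: Optional[str]) -> str:
    """Lower-case and keep only letters/digits, so "E." and "e" compare equal."""
    return "".join(ch for ch in (s or "").lower() if ch.isalnum())


def first_name_agreement(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """1.0 = same name, 0.5 = one side is a matching initial,
    0.0 = conflict, None = at least one first name is missing."""
    x, y = _norm_name(a), _norm_name(b)
    if not x or not y:
        return None
    if x == y:
        return 1.0
    if (len(x) == 1 or len(y) == 1) and x[0] == y[0]:
        return 0.5
    return 0.0


def compare(a: Customer, b: Customer) -> dict:
    """Describe how two records agree as a set of features, not a yes/no verdict."""
    ea = (a.email or "").strip().lower()
    eb = (b.email or "").strip().lower()
    return {
        # Ratio in [0, 1]; tolerates small typos in the surname.
        "last_name_sim": SequenceMatcher(None, _norm_name(a.last), _norm_name(b.last)).ratio(),
        "first_name": first_name_agreement(a.first, b.first),
        # None means "unknown", which is different from "disagrees".
        "email_match": (ea == eb) if ea and eb else None,
    }


web = Customer("E.", "Sanderson", "elliot.s@example.com")
crm = Customer("Elliot", "Sanderson", "esanderson@example.org")
print(compare(web, crm))
# {'last_name_sim': 1.0, 'first_name': 0.5, 'email_match': False}
```

On its own this decides nothing: two different people named Robert Smith would agree strongly on every name feature. In a machine-learning approach, features like these would be fed to a classifier trained on labeled match/non-match pairs, which learns how much weight each signal deserves instead of relying on hand-set thresholds.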

Determining whether two records refer to the same individual is known as “entity resolution.” One company, Senzing, is built around software specifically for entity resolution. Other matching problems seek compatibility between two different people or entities. Both kinds of matching are best done using machine learning rather than simple rule-based logic. The best-known compatibility matching problem? Online dating!

Entity resolution is used in:

  • Marketing (merging duplicate customers into the same record)
  • Law enforcement (is person “X” the same as the known criminal “Y”?)
  • Financial compliance and transportation security (is person “X” on a watch list?)
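Merging duplicate customers usually needs one more step after deciding which pairs of records match: the pairs have to be combined into groups, since record A may match B and B may match C. A minimal sketch, assuming some pairwise matcher has already produced a list of matched id pairs (the ids and pairs below are hypothetical), groups them with a union-find (disjoint-set) structure:

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def cluster_matches(ids: Iterable[Hashable],
                    matched_pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[List[Hashable]]:
    """Group record ids into entities: any chain of matched pairs ends up in one cluster."""
    parent: Dict[Hashable, Hashable] = {i: i for i in ids}

    def find(x: Hashable) -> Hashable:
        # Follow parent links to the cluster's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union step: each matched pair joins the two clusters it connects.
    for a, b in matched_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect ids by root; each list is one merged customer.
    clusters: Dict[Hashable, List[Hashable]] = {}
    for i in parent:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


records = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "r2"), ("r2", "r3")]  # hypothetical output of a pairwise matcher
print(cluster_matches(records, pairs))
# [['r1', 'r2', 'r3'], ['r4']]
```

This takes the transitive closure of the matches, so r1 and r3 land together even though they were never compared directly. That is convenient, but a single false match can chain two different people into one record, so in practice a large cluster may deserve a second look before its records are merged.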

Peter Bruce

Founder and Chief Academic Officer, The Institute for Statistics Education at Statistics.com. Online courses, certificates, degrees in analytics & data science.