Anti-Fraud, Spam Filters, and Anti-Virus Systems Will Never Be as Accurate as Facial Recognition, and That Should Raise Some Concern

Photo by Alessio Ferretti on Unsplash

For as long as weapons have existed, defensive systems have been developed alongside them. The sword begat the shield, the nuke begat the bunker, and more recently the computer virus gave rise to McAfee.

For those less familiar with cybersecurity vernacular, an Intrusion Detection System (IDS) is responsible for listening on a network for signs of hacking activity. These systems rely on historic data, a.k.a. evidence from known attacks.

These systems are beatable, at least at first. Criminals and state-sponsored hackers use novel exploits, called 0-days (because the vendor has had zero days to patch the bug), to break into software. Hackers also find creative ways to disguise their tracks, like compromising IoT devices such as printers to leverage their way into networks.
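At its simplest, signature-based detection boils down to matching new traffic against evidence of old attacks. Here is a toy sketch of the idea; the signatures and payloads are made up, not taken from any real IDS:

```python
# Toy sketch of signature-based detection: compare incoming payloads
# against byte patterns observed in known, historic attacks.
# The signatures and payloads below are invented for illustration.
KNOWN_ATTACK_SIGNATURES = [
    b"' OR '1'='1",        # classic SQL injection probe
    b"../../etc/passwd",   # path traversal attempt
]

def looks_malicious(payload: bytes) -> bool:
    return any(signature in payload for signature in KNOWN_ATTACK_SIGNATURES)

print(looks_malicious(b"GET /login?user=' OR '1'='1"))   # True: matches history
print(looks_malicious(b"GET /shiny-new-0day-payload"))   # False: no prior evidence
```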

As a result of this digital cat-and-mouse game over the past decade, the White Hats of the world in anti-virus and anti-fraud have amassed a huge trove of data that describes attacks in multiple dimensions.

The organized, centralized good guys finally have a weapon the mostly isolated, siloed bad guys don’t: Big Data.

Photo by Markus Spiske on Unsplash

Machine Learning and AI are the newest methods that sophisticated, well-resourced defenders use to identify malicious intent.

When it comes to anti-fraud systems, Machine Learning can be used to spot fishy transactions with pattern recognition. By building “Decision Trees” from historic fraudulent transactions, Data Scientists can mine for “features” that describe to a model how a bad guy acts.

With enough data and enough good features, a Machine Learning model can predict money laundering, theft, and account hijacking almost instantaneously. Rather than waiting for you to report your credit card as stolen, banks and credit card companies are freezing cards and alerting customers promptly.
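As a rough illustration of the idea, and not any real bank’s pipeline, here is a minimal decision-tree sketch; the features, amounts, and labels are invented:

```python
# Minimal sketch of a decision-tree fraud classifier.
# The features, amounts, and labels below are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [amount_usd, foreign_country, midnight_hour, new_merchant]
X = [
    [12.50, 0, 0, 0],    # small, local, daytime purchase
    [980.00, 1, 1, 1],   # large overseas purchase at midnight, new merchant
    [45.00, 0, 0, 1],
    [1500.00, 1, 1, 1],
    [30.00, 0, 1, 0],
    [2200.00, 1, 0, 1],
]
y = [0, 1, 0, 1, 0, 1]   # historic labels: 0 = legitimate, 1 = fraudulent

model = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Score a new transaction the moment it arrives
new_transaction = [[1250.00, 1, 1, 1]]
print(model.predict(new_transaction))  # -> [1]: freeze the card, alert the customer
```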

This is a truly remarkable feat. It does not mean, however, that these systems are infallible or without weakness.

The first thing to understand about Machine Learning is that it can only learn from historic data. If you feed it information about the characteristics of recalled cars versus reliable cars, it can plausibly predict whether a new model will fail in the same fashion as past ones.

To continue the car example: as cars improve and evolve with more sophisticated computerized systems, they will occasionally fail in novel ways. Will a Machine Learning algorithm be able to predict these new cases? Most likely not.
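A toy version of the same problem, with invented data, makes the point: a failure mode that never varied in the historic data is invisible to the model.

```python
# Toy sketch: a model trained only on historic failure patterns
# cannot use a failure mode it has never observed.
from sklearn.tree import DecisionTreeClassifier

# Invented features: [engine_temp_spikes, brake_wear, software_faults]
historic_cars = [
    [1, 0, 0], [1, 1, 0], [0, 1, 0],   # known mechanical failure patterns
    [0, 0, 0], [0, 0, 0],              # reliable cars
]
labels = ["recall", "recall", "recall", "reliable", "reliable"]

model = DecisionTreeClassifier().fit(historic_cars, labels)

# A new car fails purely through software faults, a pattern that never
# appears in the training data, so the tree never learned to split on it.
print(model.predict([[0, 0, 1]]))  # -> ['reliable'], and almost certainly wrong
```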

Photo by Pritesh Sudra on Unsplash

Variance: A.I.’s Worst Nightmare

This concept of “varying data” is called variance. Unfortunately for the good guys, crime is a highly variant industry.

Criminals and hackers change constantly to stay ahead of the game. Circumstances within networks also change. Spammers always find new ways to defeat detection because there are so many ways to convey a message.

Variance adds outliers a model has never seen. An unexpected pattern or instance disrupts the existing paradigm and causes the model to be wrong, even though it has been trained on large amounts of past data.
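A toy spam filter shows how little variance it takes to break a model of the past; the blocklist and messages here are made up:

```python
# Toy illustration: a naive keyword filter is defeated by trivial variance.
# The blocklist and messages below are invented for illustration.
BLOCKLIST = {"free money", "wire transfer", "claim your prize"}

def is_spam(message: str) -> bool:
    text = message.lower()
    return any(phrase in text for phrase in BLOCKLIST)

print(is_spam("Claim your prize now!"))   # True:  matches a known pattern
print(is_spam("Cla1m y0ur pr1ze n0w!"))   # False: same intent, new spelling
```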

This is why McAfee, Kaspersky, Microsoft Defender, and company have never managed to “solve” viruses. The game will continue as long as people continue to use computers. The available permutations and attack vectors are effectively infinite because the software is constantly changing and evolving.

Quite literally, new unpatched bugs are released into the wild every day. It’s not possible for a single model to predict all of the new holes that emerge. A model may be astonishingly fast at filtering many circumstances to reach a conclusion, but the game changes, and without an update the model becomes obsolete.

But there is data that is not nearly as variant as digital signatures. It’s rooted in our very DNA.

Sure, your face is unique. No one has quite the same combination of skin tone, eye position, jaw structure, and nose shape. However, the range of these features is limited by the human genome.

Because of this relatively low level of variance, it’s much easier for Data Scientists to create shockingly accurate Facial Recognition software.

It helps that we’ve spent years posting pictures of ourselves and our friends and labeling the data for companies like Facebook by “tagging” them. These massive data warehouses now hold the stockpile of ammunition they need to create the most accurate facial recognition systems known to man.

These systems are built on data that stays within a defined set of characteristics, unlike your anti-virus, which needs to be constantly updated with new rules.

Unlike code, which is malleable in real time, changes in the gene pool rely on natural selection and mutation. Once your face has been recognized, it will take a physical covering or plastic surgery to keep the A.I. from recognizing you in an image.
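Here is a rough sketch of the matching step such a system might use. The embeddings are invented numbers standing in for the output of a real embedding network (a neural net that maps a face photo to a fixed-length vector):

```python
# Sketch of the matching step in a typical face recognition pipeline.
# The embeddings are invented numbers standing in for a real network's output.
import numpy as np

def same_person(a: np.ndarray, b: np.ndarray, threshold: float = 0.6) -> bool:
    # Faces of the same person land close together in embedding space, and
    # because facial features barely change, the match stays stable across
    # years of photos with no constant rule updates.
    return float(np.linalg.norm(a - b)) < threshold

tagged_photo_2015 = np.array([0.11, 0.42, 0.87, 0.05])  # invented embedding
new_photo_today   = np.array([0.13, 0.40, 0.85, 0.07])  # same face, years later
stranger          = np.array([0.90, 0.10, 0.22, 0.65])  # a different face

print(same_person(tagged_photo_2015, new_photo_today))  # True
print(same_person(tagged_photo_2015, stranger))         # False
```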

On the surface it may seem that building a model around inherently digital transactions and user activity should be easier than building a model for images, but in the long run that is not the case.

We are at a very interesting time, when we’re still learning the limitations and capabilities of A.I. While the future certainly is bright, we have some serious ethical decisions that need to be made sooner rather than later.