AI in InfoSec

This is a summary of the talk Clarence Chio gave at South Park Commons Speaker Series titled “AI in InfoSec”.


For a long time, the exceptionally low tolerance for errors and the inherent difficulty of collecting data discouraged the use of artificial intelligence in information security. In recent years, however, AI, machine learning, deep neural networks, and big data have become some of the hottest keywords in the security industry.

Conventional Security Solutions

The primary detection mechanism of most conventional security solutions is signature/string matching with manually crafted rulesets. The biggest drawback of this approach is that it requires a new rule for every new threat. Expert-defined heuristics were later added to allow for more preemptive defense, but human intervention was still necessary to make final decisions.

Even though the need to have up-to-date rulesets and heuristics gave the security industry the ability to extract recurring payments from customers, the advent of metamorphic malware that can transform itself to avoid detection meant that a more adaptive solution was needed.

Enter Machine Learning

Typical syntactic signature matching fared miserably against polymorphic adversaries. Furthermore, as security solutions bloated to defend against thousands of different attack mechanisms, system complexity and bug counts grew to the point where they exceeded the analytical capabilities of the maintainers. Fortunately, machine learning excels at precisely these tasks: pattern matching, detecting anomalies, and mining information in complex spaces.
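As a toy illustration of the anomaly-detection side of this, the sketch below uses a simple z-score outlier test, far cruder than any production model, on made-up request sizes:

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations
    from the mean -- a crude stand-in for a learned anomaly detector."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # perfectly uniform traffic: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# A single oversized request stands out from otherwise uniform traffic.
request_bytes = [500, 510, 495, 505, 498, 502, 20000]
print(find_anomalies(request_bytes))  # -> [20000]
```

Unlike a signature, nothing here encodes what the attack looks like; the detector only models what normal traffic looks like and flags deviations from it.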

Unique Challenges of AI in Security

We have all used successful applications of artificial intelligence in consumer products like virtual assistants that can comprehend and respond to human voice, and photo/video apps that can recognize faces and other objects. However, the use of AI in security poses unique challenges.

Errors are extremely costly in security. Siri failing to understand your query or a photo app suggesting the wrong person to tag has no serious repercussions; people have, in fact, come to expect such errors. In security, however, an error such as incorrectly granting access to an attacker can have dire consequences.

Explainability is also very important in security. It is important to know why someone was denied access or why a request was authorized. This is not an issue for most consumer applications; people do not care how a photo app concluded that the picture you uploaded shows a cute cat wearing a hat.

The lack of training data is also a huge challenge. The amount of data security researchers get to collect is a drop in the bucket compared to the millions of text messages, images, and pieces of personal information people willingly provide to companies like Google and Facebook. If attacks occurred at a similar magnitude, we would be in serious trouble.

The Reliability and Safety of AI

Artificial intelligence, as implemented and available today, isn't perfect. Adversarial examples are inputs specifically designed to trick AI systems into making a mistake. They are like optical illusions, except for machines.

While it is obvious to humans that the following image shows a stop sign, machine learning models such as those used in self-driving cars can easily be tricked into reading it as a speed limit sign suggesting a higher speed. [1]

Machine learning-based models see this sign as a 45mph speed limit sign

Imagine the potential consequences of such an attack. The same concept can be used to bypass machine learning-based security solutions.

One report claims that 70% of cybersecurity researchers say attackers can bypass machine learning-driven security solutions, with nearly 30% saying it is “easy”. [2]

Machine vs. Machine

Attackers, too, have started adding artificial intelligence to their arsenal. Somewhat ironically, AI-based attacks are very effective against security systems that were specifically designed to prevent automated attacks. CAPTCHAs, the annoying distorted text you often see in online forms, are one such example.

Machine learning can also be used to test and generate adversarial examples, using many of the same methods used to train malware classifiers. [3] There have also been “model poisoning” attacks that manipulate the statistical models and shift the decision boundaries of AI-based systems by repeatedly feeding them misleading data.
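Model poisoning can be illustrated with a deliberately naive toy detector (invented for this sketch, not any real product) that retrains on whatever it classifies as benign. By streaming samples just under the current threshold, an attacker drags the learned boundary upward until a genuine attack fits beneath it:

```python
class AdaptiveDetector:
    """Toy detector: blocks values above k times the mean of observed
    traffic, and naively retrains on anything it allows through."""
    def __init__(self, baseline, k=2.0):
        self.samples = list(baseline)
        self.k = k

    def threshold(self):
        return self.k * sum(self.samples) / len(self.samples)

    def observe(self, value):
        if value > self.threshold():
            return "blocked"
        self.samples.append(value)  # the self-poisoning retraining step
        return "allowed"

det = AdaptiveDetector([10.0] * 20)  # normal traffic clusters around 10
print(det.observe(100.0))            # blocked at first
for _ in range(200):                 # feed samples just under the bar
    det.observe(det.threshold() * 0.95)
print(det.observe(100.0))            # now allowed: the boundary has moved
```

Each poisoned sample is individually unremarkable, which is what makes this class of attack hard to spot; the defense is to validate or pin the training distribution rather than trust the model's own labels.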

Machine vs. Human

Spear phishing is a targeted form of phishing that sends individually customized baits to specific people rather than generic baits to random ones. Despite being very effective, spear phishing was not as widespread as ordinary phishing because the process was highly manual. That was before machine learning. One simulated attack that sent AI-generated, individualized tweets to 10,000 Twitter accounts, including U.S. Department of Defense personnel, achieved a click-through rate as high as 35%. [4]

This Twitter user clicked on an AI-generated spear-phishing link

The Future of AI in Security

It appears that the cat-and-mouse game the security industry hoped to end with the help of artificial intelligence is here to stay, for now. This may be because, even with artificial intelligence, the general strategy in security has largely remained the same, albeit more automated: pattern matching and heuristics. As the shortcomings of that approach become apparent, many security researchers are starting to think outside the box.

Humans remain the weakest link and the largest attack surface in security. People are not only careless and gullible; we also tend to write imperfect code riddled with bugs. The future of AI in security may lie not in providing defense against attacks but in making exploits extremely rare and difficult to find: AI could correct human behavior to make people “more perfect” and less error-prone. After all, preventative care is always better than reactive care.

Maybe cyberattacks won’t be a thing when we all become cyborgs

[1] Evtimov, Ivan, et al. “Robust Physical-World Attacks on Machine Learning Models.” arXiv preprint, 7 Aug. 2017, https://arxiv.org/abs/1707.08945

[2] Carbon Black, Inc. “Beyond the Hype: Security Experts Weigh in on Artificial Intelligence, Machine Learning and Non-Malware Attacks” Carbon Black, Inc., 28 Mar. 2017, https://www.carbonblack.com/wp-content/uploads/2017/03/Carbon_Black_Research_Report_NonMalwareAttacks_ArtificialIntelligence_MachineLearning_BeyondtheHype.pdf

[3] Xu, Weilin, et al. “Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers.” University of Virginia, 21 Feb. 2016, https://www.cs.virginia.edu/~evans/pubs/ndss2016/

[4] Seymour, John, and Tully, Philip. “Weaponizing Data Science for Social Engineering: Automated E2E Spear Phishing on Twitter.” ZeroFox, 4 Aug. 2016, https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter.pdf