AI-Powered Malware: The New Frontier for Cybersecurity
By Kate Stapleton and Yuan Stevens
This blog post discusses malware powered by artificial intelligence. We explain how malicious actors could employ autonomous malware at scale to gain access to computer systems, blend into the host environment to evade detection, and steal data, in some respects more effectively than human attackers can. Beyond this, attackers could repurpose stolen computer code for malicious purposes or sell it on a darknet market to an adversary.
Why does this topic matter? We hope to help you, as individuals or communities, better assess potential threats to your computer systems, and to shed light on a growing and pressing area of concern that sits at the intersection of computer security and AI.
The scale of attack
AI has the potential to significantly increase the scale of harm caused by malware attacks. Take the example of commodity malware, such as IoT botnets that rely on password-guessing techniques and spear-phishing campaigns, which is traditionally operated by malicious human agents with limited resources.
Yet with the use of narrow AI, those human operators could be replaced by a set of machine learning algorithms that control lateral movement, command-and-control (C2) traffic, and data exfiltration within a given network, almost fully automating the entire process. The result can be long-term, significant corruption of data, as well as the ability to scale attacks far beyond what individuals, acting alone or in groups, can currently accomplish.
Gaining access and evading detection
Someone seeking to steal or corrupt data on a computer system must first obtain unauthorized access. AI-driven malware has the potential to streamline this process and compromise networks faster than even the most skilled human attacker.
Computer environments can be plagued by single points of failure, such as a security system that relies on a single ‘master’ algorithm as a catch-all for any unpredictable behavior in the environment. If there are no diversified algorithms to flag an anomaly in the master algorithm’s patterns or behavior, an organization or company might not even know that its system has been compromised by an attacker. (Read on below for why diversified algorithms matter in security systems.)
The danger of AI-powered malware lies in its ability to automate human skills and processes, and to learn the most successful intrusion techniques faster than we currently can. Malicious hackers typically use common, popular access ports and set up C2 channels during an organization’s operational hours in order to exfiltrate data without being noticed. They also use domain fronting with domain names that resemble the organization’s existing internal ones in order to evade detection.
With the help of AI, attackers could deploy weaponized AI that learns to blend into whatever target network it has infiltrated and operates unnoticed even better than a human would. Autonomous malware can learn the same access ports, protocols, accounts with admin and root access, and regular organizational rhythms such as business hours far faster and more efficiently than humans can.
Indeed, AI-powered malware might be more effective than human attackers: while malicious hackers often have to make educated guesses about elements such as a vulnerable computer’s operating system, AI-enabled malware could quickly learn these facts about the target environment. Upon infection, it can also systematically learn the best domain-fronting channels for command-and-control, causing far more long-term and significant damage without the exposure to human error.
Data theft and maximum damage
One of the hallmarks of cyber attacks is their ability to move faster than organizations can respond, taking the network by surprise and doing significant damage in a short amount of time.
Malware that can learn about its target environment can be trained to siphon small amounts of data from infected networks, sometimes extracting less than 1MB at a time. A real-world equivalent would be an employee slowly skimming small amounts of cash from the bank they work at, rather than taking a large sum all at once. This tactic takes advantage of outdated legacy tools that tend to flag only large uploads, of 500MB or more, as suspicious. Such malware can learn to make small, continual transfers, replacing the 12-to-24-hour attack model with, say, a month-long one, making detection nearly impossible.
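To illustrate the defensive side of this point, here is a minimal sketch in Python of why a per-upload limit misses the low-and-slow tactic while a rolling cumulative total per host can still catch it. The thresholds, window, and log format are all hypothetical placeholders, not any vendor's actual rules:

```python
from collections import defaultdict, deque

# Hypothetical detector. Legacy rule: alert only when a single upload
# exceeds a fixed threshold (e.g. 500 MB). Added rule: alert when a
# host's cumulative uploads over a sliding window exceed a budget.
PER_UPLOAD_LIMIT = 500 * 1024 * 1024       # 500 MB in one transfer
WINDOW_SECONDS = 30 * 24 * 3600            # 30-day sliding window
CUMULATIVE_LIMIT = 2 * 1024 * 1024 * 1024  # 2 GB total per host per window

history = defaultdict(deque)  # host -> deque of (timestamp, bytes)

def observe_upload(host, timestamp, nbytes):
    """Record one outbound transfer and return any alerts it triggers."""
    alerts = []
    if nbytes > PER_UPLOAD_LIMIT:
        alerts.append(f"{host}: single upload of {nbytes} bytes")

    # Slide the window forward, then check the running total.
    transfers = history[host]
    transfers.append((timestamp, nbytes))
    while transfers and transfers[0][0] < timestamp - WINDOW_SECONDS:
        transfers.popleft()
    total = sum(b for _, b in transfers)
    if total > CUMULATIVE_LIMIT:
        alerts.append(f"{host}: {total} bytes uploaded over the window")
    return alerts
```

A campaign that exfiltrates under 1MB at a time never trips the per-upload rule, but the cumulative check eventually fires. Real deployments would need to tune these numbers per environment and baseline them against normal traffic.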
AI-powered malware can also respond to the infected network in order to blend in while exfiltrating data. For example, it could learn to transfer large amounts of data only while the infected machine is video-conferencing, and to send that data out only through the same video-conferencing channel.
AI systems are themselves not immune from attack. Malicious actors who have gained access to an internal AI network can do significant damage, for example by tampering with the labels used in the supervised learning process to train algorithms to distinguish malicious code from clean code. By switching these labels, attackers could allow their malware to go undetected in the system while internal operations appear to run smoothly. In other words, even if a company employs protective cybersecurity AI, attackers can cover their tracks by altering what the system recognizes as irregular behavior.
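As a toy illustration of how damaging label tampering can be, the following sketch (Python with scikit-learn, on entirely synthetic data, not any real detection pipeline) flips half of the ‘malicious’ training labels to ‘clean’ and measures how many truly malicious samples the retrained model still catches:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a malware classifier's feature space:
# label 0 = clean, label 1 = malicious.
X = np.vstack([rng.normal(0, 1, (500, 5)), rng.normal(2, 1, (500, 5))])
y = np.array([0] * 500 + [1] * 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model trained on untampered labels.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# An attacker with access to the training data flips half of the
# 'malicious' labels to 'clean' before the next retraining run.
y_poisoned = y_train.copy()
malicious_idx = np.where(y_train == 1)[0]
flipped = rng.choice(malicious_idx, size=len(malicious_idx) // 2, replace=False)
y_poisoned[flipped] = 0
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# Fraction of truly malicious test samples each model still detects.
mal_test = X_test[y_test == 1]
print("clean model detects:   ", clean_model.predict(mal_test).mean())
print("poisoned model detects:", poisoned_model.predict(mal_test).mean())
```

The poisoned model's detection rate drops even though nothing about the live traffic changed, which is exactly what makes this kind of tampering hard to notice from the outside.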
To inflict maximum damage, malware can learn the difference between high- and low-value targets and approach them accordingly. Through the use of narrow AI, attackers could tailor their attacks far more effectively, for example installing ransomware on high-value machines and keyloggers on low-value ones. This automated contextual awareness allows the malware to mimic trusted behaviors and to maximize the damage it causes through scaled, targeted attacks.
Repurpose code — or sell it on the black market
Once attackers have successfully exfiltrated the data they want, they can repurpose the hijacked code or sell it on the black market, most likely to an adversary. For example, an attacker could reverse engineer compromised algorithms to determine which indicators flag malicious code in a given system. They could then strip those elements from their own code before deploying it in order to evade detection, or perform this service for hire. They could also sell their knowledge of the inner workings of the infected machine’s code to third parties, informing which malicious techniques to deploy in the target environment, and which to avoid, for maximum payoff.
With all this in mind, there are indeed some solutions. An important first step is for people in the data science and computer security worlds to come together and learn from each other. Researchers in both fields will need to share knowledge to develop practices that better secure computer systems against AI-enabled attacks.
Another proven defense against malware is the use of ensemble learning models. These statistical and machine learning models combine multiple individual models, each tailored to a specific area, into an aggregate model whose output is based on all of their predictions considered as a whole. This diversification decentralizes detection: each individual model produces an independent result, avoiding reliance on the kind of single master or central algorithm mentioned above.
“The idea of ensemble learning is to build a prediction model by combining the strengths of a collection of simpler base models.” — Trevor Hastie, Robert Tibshirani, Jerome Friedman via Microsoft
Microsoft Security presented on this approach at the Black Hat conference in 2018, providing specific use cases in which the ensemble model identified and thwarted a large-scale attack that would otherwise have evaded detection. These models rely on monitoring the probability scores of many individual models; no single score necessarily indicates suspicious behavior on its own, but taken together the scores can flag the presence of malware.
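To make the ensemble idea concrete, here is a minimal sketch using scikit-learn's VotingClassifier on synthetic data. This is an illustrative stand-in, not Microsoft's actual system: several structurally different base models each emit a probability score, and the averaged score is what gets thresholded:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for extracted file/behavior features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three diverse base models, so there is no single 'master'
# algorithm for an attacker to learn and evade.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average the models' probability scores
)
ensemble.fit(X_train, y_train)

# Each base model's score may sit below an alerting threshold on its
# own, while the combined score crosses it.
scores = ensemble.predict_proba(X_test)[:, 1]
print("samples flagged as malicious:", int((scores > 0.5).sum()))
```

Because the base models fail in different ways, malware tuned to slip past one of them is less likely to slip past the blended score, which is the decentralization benefit described above.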
It is imperative that people interested in computer security begin preparing for the threats made possible by AI-powered malware. This is especially important to do early on, not least because it can sometimes take years to understand the reasoning behind algorithmic decisions, particularly complex ones in which computers write their own code. We imagine this blog post will be one of many attempts to make sense of AI-powered malware, which is bound to be a pressing issue in the years to come.
Kate Stapleton is a cybersecurity analyst, blockchain developer, and privacy advocate.
Yuan Stevens is a legal, ethics and technology researcher. She is a research affiliate at Data & Society.