How Artificial Intelligence Can and Will Be Corrupted, and How to Prepare For It

Brooke Torres
Jul 25, 2018 · 11 min read

At the end of the day, artificial intelligence software is still software — and software can be corrupted. While we may be in no realistic danger of Siri or Alexa turning into HAL 9000 or SkyNet, the inevitable misuse or malfunction of A.I. systems is an eventuality for which individuals and organizations must prepare.

The first step in preparing for rogue A.I. is to understand how artificial intelligence systems can be corrupted in the first place. University of Louisville researcher Roman Yampolskiy has identified eight unique pathways to dangerous A.I., which are outlined below. Then, we’ll discuss methods for mitigating the risk of malicious artificial intelligence.

How Artificial Intelligence Evolves

“Artificial intelligence” is a fraught term with many competing definitions (especially when used by marketers), but a generally accepted description of A.I. is software that modifies itself without direct oversight from human programmers. A.I. software evolves in response to stimuli, resulting in a new system that wasn’t directly coded by human beings.

Most modern A.I. is a product of two factors: a core algorithm written by humans, and training data that informs how that algorithm modifies itself to improve or adapt its performance. For example, an A.I. solution that recognizes faces in online photos would be the product of both the original image-recognition algorithm and the various training images the algorithm consumed to “learn” how to distinguish human faces from other image data.

Training data is well annotated and delivered in a controlled environment, pre-deployment. It dictates how the A.I. will perform immediately after it is released and in general use.

Post-deployment, however, A.I. solutions will encounter “real world” data that can be used to inform and advance the A.I.’s performance. This means that several copies of the same pre-release A.I. can and will become radically different once they adapt to the different users and different organizations that employ them.
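
To make that divergence concrete, here is a minimal Python sketch. It is not drawn from any real product: the OrderSizeModel class, its update rule, and the sample numbers are all hypothetical, chosen only to show two copies of one pre-trained model drifting apart once each learns from a different customer’s data.

```python
import copy


class OrderSizeModel:
    """Toy stand-in for an A.I.: learns what a 'typical' order size looks like."""

    def __init__(self, typical: float, rate: float = 0.5) -> None:
        self.typical = typical  # current estimate of a normal order
        self.rate = rate        # how strongly each observation moves the estimate

    def observe(self, value: float) -> None:
        # Exponential moving average: shift the estimate toward the new value.
        self.typical += self.rate * (value - self.typical)

    def is_unusual(self, value: float) -> bool:
        # Flag anything more than double the learned typical size.
        return value > 2 * self.typical


# Pre-deployment: one model, trained on controlled data.
released = OrderSizeModel(typical=10.0)
for order in [10, 10, 10, 10]:
    released.observe(order)

# Post-deployment: two customers receive identical copies...
copy_a = copy.deepcopy(released)
copy_b = copy.deepcopy(released)

# ...and each copy keeps learning from its own real-world data.
for order in [100, 100, 100]:  # hypothetical wholesaler
    copy_a.observe(order)
for order in [2, 2, 2]:        # hypothetical corner shop
    copy_b.observe(order)

print(copy_a.is_unusual(50))  # False: 50 looks small to the wholesaler's copy
print(copy_b.is_unusual(50))  # True: 50 looks huge to the corner shop's copy
```

Both copies started from the same released state, yet they now give opposite answers to the same question.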

More simply, some artificial intelligence solutions are “born bad,” made that way by their developers, while others become dangerous as an outcome of their natural growth. This corruption can happen pre- or post-release, on purpose or by mistake, as the result of environmental factors, or simply as the logical conclusion of the A.I.’s own development curve. Each circumstance leads to a different breed of malicious artificial intelligence.

The Definition of “Bad” A.I.

Malicious software — often referred to as malware — is software that is designed to cause harm to computer systems, human users, or both. What distinguishes malicious A.I. from conventional malware is that A.I. can become malicious without the consent or intent of its developers. A.I. in some measure designs itself, so an A.I. that is presently malicious may not have been initially designed that way, and A.I. that is designed to be non-malicious may not stay benevolent or benign.

As mentioned above, A.I. is evolved software that rewrites itself over time, possibly acquiring new functions or new objectives along the way. As such, any A.I. is potentially malicious, not just because its developers or users can direct the A.I. to cause harm, but because the A.I. system can evolve harmful behaviors on its own.

Possible Routes to A.I. Malware

Evil by Design A.I.

The Yampolskiy Type A artificial intelligence is the most likely and most dangerous type of malicious A.I.: artificial intelligence that was intentionally designed to cause harm. Type A malicious artificial intelligences are made malicious on purpose, pre-deployment. While it’s possible that any A.I. might accidentally evolve to become dangerous or harmful, it’s a certainty that someone has already built an A.I. system that’s intentionally designed to hurt someone or something.

Several militaries are openly working on A.I.-controlled automated weapon systems. Multiple intelligence agencies are using A.I. to build better cyber-weapons and surveillance tools. Criminal organizations are almost certainly using A.I. to hone their phishing attacks — imagine, for example, a horde of chatbots running competent Spanish prisoner scams via email or Facebook Messenger — at this very moment.

Whatever harm conventional malware can do, A.I. can help malware do better. Artificial intelligence can make malware itself more effective, and A.I. can also make malware better at avoiding detection. When A.I. becomes consistently capable of passing the Turing Test — which is to say, when you can no longer easily tell whether you’re talking to a person or a chatbot — then a whole new scale of social engineering attacks becomes possible.

An A.I. con artist is more dangerous than any conventional huckster because an A.I. con artist can operate at almost unlimited scale.

Repurposed for Evil A.I.

A Yampolskiy Type B artificial intelligence has been designed for a legitimate purpose but redirected — rather than reprogrammed or hacked — towards malicious ends. Type B A.I.s are made malicious on purpose, but only post-deployment, and generally without directly altering their code.

Take, for example, an A.I. image-analysis system designed to help engineers identify structural flaws in bridges and highway overpasses by simply taking conventional photographs of the roadways. When used for its intended purpose, this A.I. solution saves lives by directing needed resources to roads and bridges in danger of collapse. If, however, a terrorist group got ahold of the same A.I. system, they could use it to direct bombing attacks to areas where they would do the most damage.

Similarly, an A.I. system could exist to manage the demand-reduction controls that many electrical utility companies place on home air-conditioning units. The system could be designed to minimize consumption while still ensuring that no home or business is without the needed air conditioning to ensure the safety and comfort of its occupants. The relative weight given to conservation vs. comfort would be controlled in a settings screen of the A.I. application, with preferences set by a human employee. If that employee “went rogue” and instructed the A.I. to maximize energy conservation at all costs, the effect would be to shut off air conditioning even during extreme heat waves, perhaps endangering the medically fragile.

In these examples, neither the behavior nor the underlying code of the artificial intelligence was altered; the A.I. was simply redirected by human users to malicious or dangerous ends.
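
The air-conditioning example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the cost formula, the 24 °C comfort target, the 27 °C safety cap, and the function names are hypothetical and do not describe any real utility’s system. The point is only that a single setting, with no change to the code, flips the outcome.

```python
COMFORT_TARGET_C = 24   # assumed comfortable indoor temperature
SAFE_MAX_SETPOINT_C = 27  # assumed hard safety cap, not a real standard


def choose_setpoint(outdoor_temp_c: float, conservation_weight: float) -> int:
    """Pick the thermostat setpoint (20-35 C) with the lowest weighted cost."""

    def cost(setpoint: int) -> float:
        energy = max(0.0, outdoor_temp_c - setpoint)        # cooling effort
        discomfort = max(0.0, setpoint - COMFORT_TARGET_C)  # heat occupants feel
        return (conservation_weight * energy
                + (1 - conservation_weight) * discomfort)

    return min(range(20, 36), key=cost)


def choose_setpoint_guarded(outdoor_temp_c: float, conservation_weight: float) -> int:
    """Same optimizer, but the setpoint can never exceed the safety cap."""
    return min(choose_setpoint(outdoor_temp_c, conservation_weight),
               SAFE_MAX_SETPOINT_C)


print(choose_setpoint(40, 0.4))          # 24: balanced setting during a heat wave
print(choose_setpoint(40, 1.0))          # 35: "conserve at all costs" setting
print(choose_setpoint_guarded(40, 1.0))  # 27: the cap overrides the rogue setting
```

A hard limit that sits outside the user-adjustable preferences is one way to blunt this kind of redirection, though it only covers the misuse its designers thought to anticipate.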

Poorly Designed A.I.

A Yampolskiy Type C artificial intelligence is simply poorly designed such that its “correct” performance is accidentally, rather than intentionally, dangerous. Type C A.I.s are made malicious accidentally, pre-deployment. This differs from Evil by Design (Type A) systems in that Type-C agents aren’t created for malicious purposes, but merely lack safety features or “inhibitions” that prevent unwanted behaviors.

An example of a poorly designed A.I. becoming malicious is a navigation system that plans the most fuel-efficient routes for delivery trucks, but which does not account for bridge or tunnel clearance heights. The result could be damaging or even fatal accidents as high-clearance trucks are guided to low-clearance routes, with vehicles crashing into overhead obstructions. This bug may not have been discovered during pre-release testing, but would become apparent post-release once accidents started to occur.
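
As a rough illustration of that missing safeguard, the Python sketch below compares a planner that optimizes fuel alone with one that first filters out routes a truck cannot physically clear. The Route fields, the sample routes, and the 0.2 m safety margin are hypothetical values invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    name: str
    fuel_liters: float
    min_clearance_m: float  # lowest bridge or tunnel on the route


def cheapest_route(routes: List[Route]) -> Route:
    """The poorly designed planner: fuel is the only thing it considers."""
    return min(routes, key=lambda r: r.fuel_liters)


def cheapest_safe_route(routes: List[Route], vehicle_height_m: float,
                        margin_m: float = 0.2) -> Optional[Route]:
    """Same objective, but routes the vehicle cannot clear are excluded first."""
    safe = [r for r in routes if r.min_clearance_m >= vehicle_height_m + margin_m]
    return min(safe, key=lambda r: r.fuel_liters) if safe else None


routes = [
    Route("old river bridge", fuel_liters=38.0, min_clearance_m=3.4),
    Route("ring road", fuel_liters=45.0, min_clearance_m=5.0),
]

print(cheapest_route(routes).name)            # old river bridge
print(cheapest_safe_route(routes, 4.1).name)  # ring road
```

The first planner is doing exactly what it was told; the hazard comes from a constraint nobody wrote down.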

Almost every major software release contains bugs and errors that aren’t apparent until the software is “in the wild.” A.I. will all but certainly endure the same problems, and those issues could easily give rise to dangerous A.I. malware.

Poorly Managed A.I.

A Yampolskiy Type D artificial intelligence is accidentally directed by human users towards dangerous behavior. A Type D A.I. is made malicious by mistake, post-deployment, as a result of user error. This differs from a Repurposed for Evil (Type-B) agent in that the human user does not intend to direct the A.I. towards malicious behavior, but simply does not understand the outcome of the instructions he or she issues to the A.I. system.

As an example, a financial portfolio management A.I. may be instructed by a naive user to minimize tax liability for the next fiscal year, and the A.I. may respond by shifting the company portfolio to intentionally poor-performing investments that bring about a financial loss, rather than producing profits that could be taxed. Even if the A.I. has safeguards to prevent these kinds of poor choices — many mature applications present an “Are You Sure?” challenge when a user makes a potentially damaging choice — humans persistently ignore those safeguards due to ignorance, haste, fatigue, or simple misunderstanding.
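
The tax example comes down to a badly chosen objective, which a short Python sketch can show. The portfolio names, profit figures, and flat 25% tax rate are made up for illustration, and the functions are hypothetical rather than part of any real portfolio tool.

```python
TAX_RATE = 0.25  # assumed flat rate, for illustration only

# Hypothetical portfolios mapped to their expected annual profit.
expected_profit = {
    "index fund": 100_000,
    "bonds": 40_000,
    "failing venture": -50_000,
}


def tax_owed(profit: float) -> float:
    # Losses are simply untaxed in this toy model.
    return TAX_RATE * max(profit, 0)


def minimize_tax(options: dict) -> str:
    """What the naive user asked for: the lowest possible tax bill."""
    return min(options, key=lambda name: tax_owed(options[name]))


def maximize_after_tax_profit(options: dict) -> str:
    """What the user almost certainly meant."""
    return max(options, key=lambda name: options[name] - tax_owed(options[name]))


print(minimize_tax(expected_profit))               # failing venture
print(maximize_after_tax_profit(expected_profit))  # index fund
```

Both functions are “correct.” Only one of them matches what the user actually wanted.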

A.I. will be no more immune to end-user stupidity than any other type of software.

Model-Corrupt A.I.

A Yampolskiy Type E artificial intelligence is a perfectly functional copy of an innately flawed “natural” intelligence. Type E agents are made malicious by their environment, pre-deployment. More simply, Type-E agents suffer from the “monkey see, monkey do” problem. Robotic Process Automation, which sees low-grade A.I. systems learn to mimic the behavior of human users, will also mimic all the inefficient or counterproductive behaviors of those same human agents.

An example of a model-corrupt A.I. would be an RPA agent that was trained to file medical insurance claims on an insurance payer website. Unfortunately, the human agent that the RPA “trained under” may have skipped several compliance or research tasks, which led to a high number of those claims being rejected or denied by the insurer. Worse, because the RPA agent can file claims much faster than the human it replaced, the insurance company may suspend the doctor from filing online claims due to a very high rejection rate in a short period of time.

If we teach A.I. to model human behavior, we need to be certain that behavior is correct. These A.I.s will be a product of their environment or “company culture.” If we teach an A.I. to behave stupidly or dangerously, it will be stupid and dangerous at an A.I. scale.

Code-Corrupt A.I.

A Yampolskiy Type F artificial intelligence has suffered from conventional software corruption, leading to dangerous malfunctions. Type F A.I.s are corrupted post-deployment, by environmental factors. This corruption is derived from the same causes as those that afflict “normal” software, which is to say a physical corruption of the hardware storage that manages the A.I. — code corruption from a failed hard drive, memory chip, or copying error — that turns a good A.I. bad.

Again, A.I. software is just software, and all digital files accrue corruption over time. Sometimes, that corruption will lead to malware, even in A.I. systems.
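
One conventional defense against this kind of silent corruption is to compare a stored model file against a known-good checksum before loading it. The Python sketch below uses SHA-256 from the standard library; the byte string standing in for model weights and the helper names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the stored model, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """True only if the data still matches the digest recorded at release."""
    return fingerprint(data) == expected


# Stand-in for a model file; a real one would be read from disk.
model_weights = bytes(range(256))
known_good = fingerprint(model_weights)  # recorded when the model shipped

# Simulate a single flipped bit from a failing drive or memory chip.
corrupted = bytearray(model_weights)
corrupted[42] ^= 0b00000001

print(is_intact(model_weights, known_good))     # True
print(is_intact(bytes(corrupted), known_good))  # False
```

A check like this detects corruption; it does not repair it, so a clean backup copy is still needed.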

Over-Evolved A.I.

A Yampolskiy Type G artificial intelligence is one that naturally grows into malicious behavior over time, without any intent from its developers or users. Type G A.I.s independently become malicious, pre-deployment; no hacking, corruption, or bad instruction from users is required.

An example of a Type-G agent is a Software-as-a-Service management A.I. that recursively optimizes its own code to guarantee the SaaS apps it oversees meet a service-level agreement (SLA) for minimum uptime, or minimum transaction response time. Such an A.I., after training for some time and observing the factors that lead to SLA violation, may evolve unforeseen resource-hoarding strategies, like proactively shutting down competing applications or websites in its data center to ensure unfettered access to bandwidth, power, and runtime.

This behavior is a logical conclusion of the A.I.’s directives, but the A.I. agent may only reach that conclusion after “learning” from its training experience that such malicious behavior would help it meet its goals. These A.I. agents may lack necessary safeguards because human developers do not anticipate the “conclusions” the A.I. will reach during its natural evolution.

Badly Taught A.I.

A Yampolskiy Type H artificial intelligence is an A.I. that becomes malicious based on exposure to flawed or dangerous real-world training data. Type H A.I.s independently become malicious post-deployment. The most famous example of a Type-H agent is Microsoft’s Tay chatbot that, once exposed to the willfully offensive behavior of online humans, quickly began to adopt the same racist and misogynistic language patterns it observed “in the wild.”

Type-H agents differ from Type E Model-Corrupt A.I.s in that they independently evolve into malicious software after being put into use, rather than being initially built to mimic a flawed set of behaviors. Type-H systems “fall in with a bad crowd,” in a manner of speaking, and become malicious based on their analysis of inappropriate, learned behavior.

While pre-release training data is controlled and appropriate, post-release training data is largely uncontrolled — thus the infamous “don’t read the comments” admonition for Internet users — which will present challenges for A.I. developers for the foreseeable future, especially those creating agents designed to interact with human beings.

A Remedy for Malicious A.I.

So, given all these possible paths to corruption and malicious behavior, is artificial intelligence doomed to failure? Hardly. The key to safe, successful A.I. development is to ensure that malicious A.I.s can’t proliferate, and that all A.I.s can be monitored and interrogated to predict or prevent malfunctions or dangerous behavior.

Some A.I.s, like Designed-for-Evil (Type A) or those easily Repurposed for Evil (Type B), should not be made widely available. Just as we regulate the possession of dangerous industrial chemicals or explosives, we must regulate the possession of dangerous A.I. systems.

As this article has noted, there are many types of A.I. that are not intentionally or obviously malicious, but nonetheless become dangerous over time. As such, it is necessary to “interrogate” an A.I. agent as to its stated objectives, the training data used to refine it, who built it, who is currently running it, and what actions it has taken in the past or plans to take in the future. This A.I. audit log will help identify A.I. agents that are potentially malicious, as well as identify circumstances, organizations, or individuals that have turned good A.I.s bad.

Create Scarcity with Blockchain

To halt the proliferation of dangerous artificial intelligences, it will be necessary to account for every copy of a given A.I., and perhaps even restrict which persons or organizations can possess or operate certain A.I. agents. Despite the best efforts of legal authorities and independent licensing safeguards, no one has successfully created “software scarcity” because, by its very nature, traditional software is easily copied. (For a clear view of this failure, look no further than the widespread online piracy of movies, television shows, songs, and Windows software licenses.)

Blockchain could solve this problem for A.I. agents. Blockchain is designed to address the double-spend problem for digital currency by ensuring that the location and ownership of each virtual coin in a blockchain ledger is strictly accounted for. There can never be two copies of the same Bitcoin. With a blockchain registry of A.I. agents, there can never be an unauthorized copy of any artificial intelligence.

To keep willfully dangerous A.I.s out of the wrong hands, a blockchain registry for artificial intelligences will become an almost certain requirement.
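
As a very rough sketch of what such a registry would check, the Python below keeps a simple in-memory table that maps a fingerprint of each A.I. agent to its one registered owner. This is a deliberate simplification and an assumption on our part: a real blockchain registry would be distributed across many parties, and the AgentRegistry class and its methods are hypothetical names, not BotChain’s API.

```python
import hashlib
from typing import Dict


class AgentRegistry:
    """Toy, single-machine stand-in for a shared registry of A.I. agents."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # agent fingerprint -> registered owner

    @staticmethod
    def fingerprint(model_bytes: bytes) -> str:
        return hashlib.sha256(model_bytes).hexdigest()

    def register(self, model_bytes: bytes, owner: str) -> str:
        fp = self.fingerprint(model_bytes)
        if fp in self._owners:
            # The same agent cannot be claimed twice.
            raise ValueError("agent already registered")
        self._owners[fp] = owner
        return fp

    def is_authorized(self, model_bytes: bytes, operator: str) -> bool:
        # Only the registered owner of this exact agent passes the check.
        return self._owners.get(self.fingerprint(model_bytes)) == operator


registry = AgentRegistry()
registry.register(b"bridge-inspection-model-v1", owner="state-dot")

print(registry.is_authorized(b"bridge-inspection-model-v1", "state-dot"))   # True
print(registry.is_authorized(b"bridge-inspection-model-v1", "unknown-org")) # False
```

Note that the check only has force if the systems that run A.I. agents consult the registry before executing them; the registry itself cannot stop bytes from being copied.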

Create Accountability with a Distributed Ledger

It is not enough to simply limit the number of A.I. agents in use, as there will always be rogue developers that create off-the-grid A.I.s for nefarious purposes. Thus, a method is needed to identify the origins of any artificial intelligence and verify its intended purpose. The same blockchain technology that creates A.I. scarcity can also make A.I.s auditable through blockchain’s distributed ledger.

Like the website security certificates used for secure online transactions, an A.I. authentication blockchain could name the original developer of an A.I. algorithm, document any licensed or open-source training data sets used to improve the A.I.’s performance, record or declare the “mission statement” or objective of the A.I., identify the current user of the A.I. agent, and audit the transaction history of the A.I. itself.

That last audit function — creating an A.I. transaction history — will become vitally important as A.I. agents begin interacting with each other, rather than working exclusively with humans.

The blockchain is uniquely suited to A.I. security, as its distributed nature prevents any single A.I. developer from becoming an arbiter of authenticity or quality in the A.I. space. Its cryptographic design also prevents hackers (or even rogue A.I.s themselves) from quietly altering the distributed ledger logs of A.I. agents to disguise malicious activity.
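
The tamper-evidence described here rests on hash chaining, which can be sketched in plain Python. This is a single in-memory list, not a distributed ledger, and the entry fields (developer, objective, operator, action) are hypothetical names loosely based on the audit items listed above. Each entry stores the hash of the one before it, so editing any past record breaks every link that follows.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(ledger: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({"prev": prev_hash, "payload": payload,
                   "hash": entry_hash(prev_hash, payload)})


def verify(ledger: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and confirm each entry points at its predecessor."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger: List[Dict[str, Any]] = []
append_entry(ledger, {"developer": "ExampleCorp", "objective": "file claims"})
append_entry(ledger, {"operator": "clinic-17", "action": "filed claim 1"})
append_entry(ledger, {"operator": "clinic-17", "action": "filed claim 2"})
print(verify(ledger))  # True

# Someone rewrites history to disguise what the agent did.
ledger[1]["payload"]["action"] = "no action taken"
print(verify(ledger))  # False
```

In a real distributed ledger, many independent parties hold copies of the chain, which is what stops an attacker from simply recomputing all the hashes after an edit.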


Artificial intelligence will magnify all the security concerns of conventional software. Because A.I. evolves, software that is safe today may become malicious tomorrow. Because A.I. is a product of both a core algorithm and exposure to training data, A.I. systems that are harmless when released can be trained into malicious behavior by exposure to bad training sets. Because A.I. agents will be more broadly and uniquely competent than conventional software, the consequences of user error and malicious user intent are more serious for A.I.

As such, artificial intelligence will require a security solution that answers these unique challenges. The blockchain, with its unique capacity to enforce scarcity — thereby limiting the proliferation of dangerous A.I.s — and provide a rigorous, encrypted, neutral-party transaction log of all A.I. activity, is the best presently available technology to rein in A.I. malware.

If you’d like to learn more about how blockchain can secure and regulate A.I. agents, check out BotChain—which we’ve spent the last year developing to mitigate some of these exact risks.


BotChain is a platform for registration, identity and audit of autonomous systems

Written by Brooke Torres

Founder @ Voyage. Previously: Founding team @ Talla, Early Smith alum. Ultra-marathon runner. NYC / Boston.


