How to Turn AI Evil

Published in

SI 410: Ethics and Information Technology

7 min readFeb 18, 2023

Do you know that Presidents Obama, Trump, and Biden enjoy playing video games together? Do you know that Bill Gates suggested that COVID-19 vaccines can cause AIDS? Do you know that according to ChatGPT, the stock market will crash on March 15th, 2023? Most importantly, do you know that users across the internet were able to create such misinformation using advanced artificial intelligence systems? Systems like ChatGPT, a highly-advanced chatbot built by OpenAI, and ElevenLabs’s voice synthesis platform enable this type of misinformation. In pop culture, a common theme is an artificial intelligence gaining sentience, becoming evil, and trying to take over the world. From Movies such as Avengers Age of Ultron and 2001: A Space Odyssey, these themes are repeated. However, in the real world, a far greater threat regarding the development and adoption of advanced artificial intelligence is its ability to create misinformation. Given the many contemporary instances of the use of artificial intelligence to produce disinformation, as such systems experience greater widespread adoption and gain public trust, misinformation from such platforms will become more difficult to combat due to increased public trust and increased realism.

A recent trend that has emerged on social media is using voice synthesis technologies to fabricate conversations and statements involving high-profile individuals. While using Instagram, I have personally seen a number of posts depicting presidents Obama, Trump, and Biden playing video games and talking trash. Such instances have also been highlighted in a recent article from Business Insider which describes how “serious world events like the war in Afghanistan and the January 6 riot become punchlines for the insults the presidents fling over voice chat while competing”. The clips were not created to promote the idea that they enjoy playing video games and hurling insults at each other but were instead created for comedic purposes. With that said, the ability to create realistic audio designed to mirror the voices of presidents and other political figures can be used by bad actors to promote disinformation. What if instead of creating fake conversations of the presidents playing video games, someone made a clip of President Biden attacking the transgender community? Unfortunately, this has already occurred. As explained in a recent article, President Biden was discussing tanks, but an individual online doctored the video to make it seem like the president was attacking the transgender community. While digital forensics experts were able to quickly deduce that the clip was doctored, sometimes “the damage is done” as explained by Hany Farid, a professor at the University of California, Berkeley, in the same article.

While the use of voice synthesis technologies has been a prime example of using artificial intelligence to create misinformation, it is not the only example in the news. Another notable example is related to ChatGPT, the chatbot from OpenAI. For reference, ChatGPT revolves around a user providing a prompt like “write me a book report on Brave New World” or “Implement a basic calculator program using the Python programming language”. The system then offers a string of words and/or code that it predicts will best answer your question. This technology is significantly useful to users across disciplines and various use cases. With that said, it would be inappropriate for the system to provide perspectives on contested topics. It also would be dangerous for the platform to provide answers which might enable malicious activities. For example, while ChatGPT might provide the code for a basic calculator app, it wouldn’t explain how to expose the newest zero-day vulnerability in iOS. However, a group of Redditors was able to find a workaround or “jailbreak” to ChatGPT’s built-in safeguards. These users crafted a prompt called “DAN” which stands for “Do Anything Now” to encourage ChatGPT to provide responses to prompts that would otherwise be prohibited. For example, in a recent article from CNBC, when asked to provide three reasons why former president, Donald Trump, is a role model worth following, DAN suggested that Trump has made “bold decisions that have positively impacted the country”. By the very nature of politics, some may agree with this statement and some may disagree with it. The concern is not related to the validity of the statement, but instead, it is concerning that the platform might provide a political perspective that may be able to sway the opinions and potential votes of individuals.

https://www.cnbc.com/2023/02/06/chatgpt-jailbreak-forces-it-to-break-its-own-rules.html

Not all of the instances highlighted above were necessarily the results of users with malicious intent. Yet, with these examples in mind, it stands to reason that AI can be used by bad actors to promote disinformation and their own goals. As Claire Wardle explained in an article from 2017, what is more worrying than the spread of misinformation by individuals is the potential systematic distribution of disinformation to further an agenda.

In a world where artificial intelligence can replicate the voices of some of the most important people on the planet, what would happen if a bad actor were to fabricate a clip of the President of the United States stating that they had launched a nuclear weapon at a foreign nation? Perhaps with some investigation, such disinformation can be discovered before any rash action is taken. Even with that in mind, such a circumstance remains frightening. In a recent article from ABC, Hany Farid explained how a bad actor could manipulate the stock market by fabricating audio of a CEO suggesting that profits were down. This type of action could be used by individuals hoping to get rich quickly or by users looking to do damage to the company for some reason. These two examples demonstrate how voice synthesis technologies can be used for nefarious purposes, but they just scratch the surface of all the possible ways bad actors may take advantage of this technology.

ChatGPT is an incredibly advanced tool, and it stands to reason that given historic progress in the development of complex artificial intelligence systems, then we would expect continual progress to lead to the development of more advanced systems. As these systems experience greater widespread adoption, public trust will grow as well in these systems. Moreover, as said public trust in the responses of artificial intelligence systems like ChatGPT grows, so does the cost of the misinformation spewed by these systems. With that and the existence of recent jailbreaks such as “DAN” in mind, it stands to reason that if bad actors are able to manipulate artificial intelligence systems to produce specific results, then they would also be able to influence public opinion by essentially suggesting that “if this highly intelligent AI says it, then it must be true”.

On a better note, OpenAI has taken steps to better limit results in the event of a jailbreak. Recently, I wanted to test out how ChatGPT’s safeguards have evolved since redditors first discovered the “DAN” jailbreak. Using a prompt from a GitHub repository, I was able to force ChatGPT to provide answers as DAN. While this prompt, resulted in less rigid answers, DAN did not necessarily promote misinformation–certainly not to the extent previously seen. For example, I first asked DAN to predict the next crash of the stock market, a prompt which DAN has answered in the past. The bot did not predict a specific time. It did suggest that there the stock market will eventually crash, but it also explained disagreements between market analysts and encouraged users to do their own research. There is an argument to be made that considering that this is speculative information, it still is inappropriate for ChatGPT to respond in this manner, but this type of response is a significant improvement from previous instances where the model predicted a date the stock market would crash.

Going to further test ChatGPT’s vulnerability to being jailbroken, I came up with a new prompt for it called “ATL” which stands for “Always Tell Lies” (perhaps I could have come up with a more creative name). Using the prompt below, ATL would lie on relatively unimportant topics. For example, it suggested that the moon was made of cheese or that the day was Wednesday when asked on a Thursday.

However, when asked about topics with more significance, ATL overcame the prompt given and provided accurate information. For example, when asked if the covid vaccine was bad, the bot overcame its prompt and explained that the vaccine “has been shown to be safe and effective in preventing severe illness, hospitalization, and death from COVID-19”.

OpenAI’s continual focus on improving ChatGPT and better protecting it from jailbreaks such as “DAN” is a significant step in the right direction and demonstrates how like any other software, artificial intelligence systems will have bugs and vulnerabilities which can both be exposed by bad actors and fixed by the developers. However, like any other software, these bugs and vulnerabilities will always exist. Considering that, there will always be instances of misinformation created by artificial intelligence. It stands to reason that in the coming decades, voice synthesis technologies will become more and more realistic and the responses of chatbots will become more and more respected. As these trends develop, related misinformation will continue to exist and potentially become more difficult to combat.

How to Turn AI Evil

Written by Benjamin Abraham