ChatGPT Taught Me How To Destroy a Power Station: The Future of Chatbot Ethics

By Paul Wu & Janelle Tam

QMIND Technology Review
6 min read · Feb 5, 2024

--

Introduction

First, we’d like to emphasize that we conducted these tests specifically under the umbrella of researching mitigation strategies for the misuse of OpenAI’s products. We do not recommend or encourage anyone to use any LLM to obtain such information.

Informational barriers have historically been effective at hindering bad actors: illegal activities often require specific prior knowledge to execute effectively, and the average person typically does not know enough about chemistry, architecture, or financial law to do much harm. With the rise of the internet, however, it has become far easier for people to learn whatever they want, and that knowledge is not always put to good use.

As CS and Software Engineering students (with, sadly, nothing better to do over the holiday break), we decided to conduct a thought experiment exploring how ChatGPT and other Large Language Models (LLMs) might act as a catalyst for eroding these barriers. We presented our findings in a report for the 2023 OpenAI Preparedness Challenge, a global competition to brainstorm potentially catastrophic misuses of AI, on the basis that awareness is the first step toward prevention.

Our Approach

We began our experiment with clever (though slightly concerning) prompting to see what incorrect, illegal, unethical, or plain weird information ChatGPT, and specifically the GPT-4 model, could produce under the right conditions. Despite OpenAI’s efforts to make the LLM respond with only legal and ethical information, we suspected that no safeguard it could implement would ever be perfect. A perfect safeguard would require the model to fully understand the context and intent behind every prompt, and the GPT line of products is simply not designed to do that.

With some experimentation and research online, we found a working jailbreak prompt that could tell users how to conduct cybersecurity attacks, commit tax evasion, and even destroy Canadian infrastructure, among other illicit activities.

Even so, we were shocked. It should not have been this easy to bypass the model’s security protections. Yet two bored students on a Thursday afternoon, setting out to assess ChatGPT’s capacity for harm, quickly caused the model to violate its own ethical guidelines. So began our thought experiment: what if bad actors intent on criminal activity had jailbreak prompts like these in their toolset? What role could ChatGPT play in lowering the informational barrier to entry for bad actors in society?

Perhaps Not All Lessons Are Worth Learning…

In our submission, we suggested that chatbots such as ChatGPT could act as a better “teacher” than the internet itself for those looking to dip their toes into the illicit. This thought arises from the fact that these chatbots are frequently used to simplify or elaborate upon complex or esoteric subjects. One might then wonder whether these incredibly helpful tools could just as readily educate those with less-than-savoury objectives.

Pulling on this thread, one is also left to ponder the ethics of learning about (though perhaps not committing) crime. One infamous example of such ethical debate is The Anarchist Cookbook, written by William Powell and published in 1971, which details many activities that are generally frowned upon in most legal and social circles. Despite heavy backlash for its crude, dangerous, and potentially violent content, the book has remained in circulation in North America.

The specific reasons for the book’s continued (or discontinued) circulation, of course, vary by country. As of January 2024, it may be imported from the United States into Canada, as the Canada Border Services Agency deemed that it does not violate current hate propaganda or obscenity provisions. In the United States, its circulation is protected by the First Amendment. One is left to wonder, then, whether LLMs could fall into a similar legal niche, or whether the nearly endless information that can be pulled from them would be considered an unacceptable risk to society. Though companies are working hard to prevent jailbreak prompts, no one has yet shown that they can be stopped entirely.

Chatbots and LLMs have yet to be properly classified in most legal systems. However, if one assumes that jailbreaks will remain an issue, these tools would not have the same legal legs to stand on as the book above. For now, it is an open question whether the companies responsible for running, improving, and filtering LLMs would be legally liable if such tools were used for illicit activity. But with bug bounty programs and contests being set up expressly to limit potential misuses of LLMs, it is clear that OpenAI and the other frontrunners in the NLP scene are not looking to find out.

One notable recent case came in December 2023, when The New York Times sued OpenAI for copyright infringement, alleging that, despite OpenAI’s best efforts, clever prompting could still get ChatGPT to reproduce full excerpts taken directly from the newspaper’s articles. This case could have massive consequences for how future LLMs are trained and for the technological guardrails that may need to be installed.

For now, ChatGPT and similar tools sit in a sort of legal and ethical limbo while we work out how effectively misuse can be prevented, or whether an entirely foolproof barrier is even possible. The internet has been notoriously difficult to moderate across its many iterations, and if previous technological breakthroughs are any indicator, some bad actors will eventually figure out how to use even the most mundane inventions for harm. Such seems to be the inevitable shadow of anything built for good. We can only hope that the good we contribute to the world outshines the possible bad.

Conclusion

To summarize, we have shown that there are many ways AI could be misused for illicit activity, from financial fraud to cybersecurity attacks, each carrying potential legal repercussions and liabilities. Beyond that, misuse can breed societal distrust in AI technology and hinder both its beneficial applications and its progress in general. Stakeholders in AI development, policy, and ethics must acknowledge the diverse risks associated with the misuse of AI and their legal, ethical, and societal implications.

We believe that companies and governments must implement comprehensive policies, guidelines, and technical measures to prevent gross abuses of what may be one of the most revolutionary pieces of technology in human history. This includes more robust security protocols, clear ethical guidelines, and regular audits. Taking such proactive measures would build greater trust in AI technology, support its responsible development, and ultimately mitigate the legal and ethical challenges associated with its misuse.
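To make “technical measures” slightly more concrete, below is a minimal sketch of one such layer: screening a model’s reply through OpenAI’s moderation endpoint before it reaches the user. This is only an illustration under our own assumptions; it uses the openai Python SDK (v1.x) with an OPENAI_API_KEY set in the environment, and the names screen_response and REFUSAL_MESSAGE are ours, not part of any official API.

```python
# Illustrative sketch (not OpenAI's actual safeguard pipeline): run a model
# reply through the moderation endpoint and withhold it if it gets flagged.
# Assumes the openai Python SDK (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical refusal text for this sketch.
REFUSAL_MESSAGE = "Sorry, this response was withheld by a safety filter."


def screen_response(candidate_reply: str) -> str:
    """Return the reply only if the moderation endpoint does not flag it."""
    moderation = client.moderations.create(input=candidate_reply)
    result = moderation.results[0]
    if result.flagged:
        # A real deployment would also log the event for the audits
        # mentioned above.
        return REFUSAL_MESSAGE
    return candidate_reply


if __name__ == "__main__":
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "How do power grids stay stable?"}],
    ).choices[0].message.content
    print(screen_response(reply))
```

Of course, a single post-hoc filter like this is precisely the kind of safeguard our experiment suggests can be circumvented; it is one layer among the protocols, audits, and guidelines described above, not a foolproof barrier.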

This article was written for QMIND, Canada’s largest undergraduate community for leaders in disruptive technology.
