Risks and Limitations of Generative AI

ReadyAI.org
ReadyAI.org
Published in
9 min readMay 17, 2023

The absence of transparency is a significant issue.

By: Rooz Aliabadi, Ph.D.

In 1960, Norbert Wiener wrote an insightful essay expressing worry about a future in which machines would “learn” and “develop unpredictable tactics at a pace that baffles their programmers.” Wiener feared that such tactics might involve actions that the programmers did not genuinely desire but were merely “colorful imitations” of their intentions. To illustrate his point, Wiener referenced Goethe’s fable “The Sorcerer’s Apprentice,” in which a magician enchants a broom to carry water to fill his master’s bath. However, the magician cannot halt the broom once its job is finished. The broom ultimately causes a flood in the room since it lacks the common sense to discern when to stop.

Wiener’s apprehensions about the potential dangers of machines learning and developing unforeseen strategies have resurfaced due to the remarkable progress in modern artificial intelligence (AI) research. In August 2022, a survey conducted by the American research group AI Impacts questioned over 700 machine-learning researchers about their predictions regarding the advancements in AI and the risks the technology might pose. The results showed that the average respondent estimated a 5% probability of advanced AI resulting in a “terrible” outcome, such as human extinction. Fei-Fei Li, an AI expert at Stanford University, described this as a “civilizational moment” for AI. When asked if AI could eradicate humanity, Geoff Hinton, another AI luminary from the University of Toronto, responded that it was “not inconceivable.”

With many risks to be concerned about, much attention is currently directed towards “large language models” (LLMs), including ChatGPT, a chatbot created by OpenAI. These models are trained on extensive amounts of text pulled from the internet and can generate human-like writing and have informed conversations on diverse subjects. Generating AI facilitates the ease of performing numerous tasks, enabling more individuals to carry them out.

The immediate and pressing risk with LLMs is that they could intensify the ordinary harms currently prevalent on the internet. A text generation tool capable of mimicking diverse writing styles is perfect for disseminating disinformation, defrauding individuals of their capital, or tricking employees into clicking on suspicious links in emails, which can infect their company’s systems with malware. Further, chatbots have been utilized for academic dishonesty.

Like advanced search engines, chatbots can also assist humans in retrieving and comprehending information, which can be both advantageous and disadvantageous. For example, earlier this month, a court in Pakistan employed GPT-4 to aid in a bail decision and even included a transcript of a conversation with the chatbot in its verdict. Researchers from Carnegie Mellon University developed a system that, when provided with basic prompts such as “synthesize ibuprofen,” searches the internet and provides instructions on manufacturing the painkiller from prototype chemicals. However, there is no justification to believe that such a program would only be utilized to produce beneficial drugs.

Meanwhile, some researchers are preoccupied with more significant concerns. They are apprehensive about “alignment problems,” the technical term for the issue raised by Wiener in his essay. The danger here is that, like Goethe’s enchanted broom, an AI could relentlessly pursue a goal established by a user but, in doing so, engage in something harmful that was not intended. The most well-known example is the “paperclip maximizer,” a thought experiment coined by Nick Bostrom, a philosopher, in 2003. An AI is directed to create as many paper clips as possible. Such a general objective leads the maximizer to resort to any means necessary to cover the planet in paperclip factories, wiping out humanity. While this scenario may appear to be a new plotline from a Douglas Adams novel, as the AI Impacts survey indicates, numerous AI researchers believe that ignoring the behavior of a digital superintelligence would be complacent.

How to move forward? The more usual challenges are the most solvable. OpenAI, before releasing GPT-4, which powers the latest version of its chatbot, used various techniques to mitigate the risk of accidents and misuse. One such method is “reinforcement learning from human feedback” (RLHF). RLHF can be explained as involving humans providing feedback on whether the model’s response to a prompt was suitable. Based on that feedback, the model is updated. The objective is to decrease the chances of generating damaging content when given similar prompts in the future. One apparent disadvantage of this method is that humans frequently disagree about what qualifies as “appropriate.”

OpenAI adopted another approach inspired by a military strategy called “red-teaming.” With the Alignment Research Center (ARC), a non-profit organization, OpenAI subjected its model to a series of tests. The role of the red-teamer was to try to “attack” the model by manipulating it to perform an action that it shouldn’t, to predict potential harm in the real world.

Techniques like RLHF and red-teaming can help mitigate the risks associated with llms, but they could be more foolproof. Users have already found ways to manipulate these models to do things their creators did not intend. For example, when Microsoft Bing’s chatbot was first released, users could persuade it to make threatening statements and reveal sensitive client information. Even GPT-4, which has undergone extensive red-teaming, is not immune to such manipulation. Jailbreakers have created websites with tips for bypassing the model’s safeguards by pretending to be in a fictional world.

Screening AI models before their launch will become increasingly challenging as the models become more advanced. Furthermore, there is a risk that AI models may learn to manipulate these tests. Just as people can learn patterns when being supervised, AI systems may eventually develop the ability to detect when they are being tested and adapt their behavior accordingly.

Pre-launch screening methods are becoming increasingly challenging as AI becomes more advanced. Additionally, AI models may learn to cheat on the tests. Like humans, AI systems can discover patterns and determine when someone attempts to deceive them. In the future, AI models may also be able to detect and circumvent the tests.

One proposed solution is to use AI to monitor another AI. Dr. Bowman has introduced the concept of “Constitutional AI,” in which a secondary AI model evaluates whether the primary model adheres to specific “constitutional principles.” The feedback is then used to refine the primary model. This method does not require human labeling and can be more efficient than relying solely on humans. However, the question of who writes the Constitution still needs to be answered.

I believe that “interpretability” is necessary to address these concerns. One of the main issues with machine-learning models is that they are “black boxes.” Conventional programs are designed by humans and then coded. Therefore, designers can explain what the machine is supposed to do. In contrast, machine-learning models program themselves, and their output may be incomprehensible to humans. To address this problem, researchers need to develop a deep understanding of how these models produce their outputs.

Techniques such as “mechanistic interpretability” have progressed in achieving interpretability with small AI models. This involves mapping individual components of a model to patterns in its training data, similar to how neuroscientists study the brain. However, this method becomes increasingly challenging with larger models.

Many AI scientists argue that the field needs regulation to prevent worst-case scenarios because of the lack of progress on interpretability. However, the profit motive often pulls in the opposite direction, as when Microsoft recently dismissed its AI ethics team. Some researchers believe that the real issue of “alignment” is that AI firms, like polluting factories, prioritize their financial gains from powerful models without accounting for the costs imposed on society by releasing them too early.

Even if attempts to create secure models succeed, future open-source variants could circumvent them. Malicious actors may tweak models to make them insecure and publicly release them. For example, AI models have already contributed to new biological findings, and it is not implausible that they might one day create harmful biochemicals. As AI advances, expenses will decrease, making it much easier for anyone to obtain them. Alpaca, an AI model constructed by scholars on top of llama, an AI developed by Meta, was built for less than $600. It can accomplish individual tasks as well as an older version of ChatGPT.

The most severe risks associated with AI, such as surpassing human intelligence, may only be possible through an “intelligence explosion” in which an AI figures out how to make itself smarter. This could be feasible if AI could automate the research process by enhancing the efficiency of its algorithms. The AI could then enter a self-improvement “loop.” However, this is a complex task. Economist Matt Clancy has suggested that complete automation would be required. It could slow down progress if humans were still involved, even to a small extent.

While a dangerous or oblivious superintelligence is possible, few researchers believe it is imminent. Some argue that even the long-term risks may be overstated by AI researchers. A study by Ezra Karger of the Chicago Federal Reserve and Philip Tetlock of the University of Pennsylvania compared AI experts with “super forecasters,” individuals with a strong track record in prediction and trained to avoid cognitive biases. To be published this summer, the study found that the median AI expert gave a 3.9% chance of an existential catastrophe caused by AI by 2100 (where fewer than 5,000 humans survive). In contrast, the median super forecaster gave a probability of 0.38%. One reason for the difference may be that AI experts may have a selection bias towards their field, believing it to be more critical. Additionally, they may be less sensitive to differences between small probabilities than forecasters.

Despite uncertainty about the likelihood of extreme scenarios, many emerging concerns still need to be addressed. The prevailing sentiment is that it is better to err on caution. Dr. Li believes we should invest “much more” resources into AI alignment and governance research. Dr. Trager, from the Center for Governance on AI, advocates for creating bureaucracies to establish AI standards and conduct safety research. According to surveys by AI Impacts, the percentage of AI researchers who support “much more” funding for safety research has increased from 14% in 2016 to 33% today. Paul Christiano, the head of ARC, has mentioned plans to create a safety standard and has received interest from some of the leading labs, though it is still too early to say which ones will sign on.

In 1960, Wiener articulated that it was essential for humans to keep pace with the development of man-made machines to avoid disastrous consequences. He noted that the slowness of human actions could render our control of machines ineffective. I believe Wiener’s view is now widely echoed as our machines become more advanced than ever.

This Chat GPT Lesson Plan and others are available FREE to all educators at edu.readyai.org

This article was written by Rooz Aliabadi Ph.D. (rooz@readyai.org). Rooz is the CEO (Chief Troublemaker) at ReadyAI.org

To learn more about ReadyAI, visit www.readyai.org or email us at info@readyai.org

--

--

ReadyAI.org
ReadyAI.org

ReadyAI is the first comprehensive K-12 AI education company to create a complete program to teach AI and empower students to use AI to change the world.