Training AI to Forget: The Next Frontier in Trustworthy AI

Jerry Cuomo
8 min read · Sep 3, 2023


Think A.I. by Jerry Cuomo: Article Series 2023

Artificial Intelligence (AI) systems are widely known for their learning and data retention capabilities. Yet there is another facet to explore: the ability of these systems to intentionally forget information. This article examines the notion of “controlled forgetting” in AI. We discuss its practical and ethical dimensions, particularly in business, governance, and legal settings, including GDPR compliance and the potential for resolving disputes over copyright misuse. In this piece, I introduce a program that illustrates how to induce forgetfulness in AI, break down fundamental text generation concepts in Large Language Models, and offer an overview of current advancements in effective memory management for business use cases.

Here we go.

Controlled Forgetting

The idea of “controlled forgetting” initially caught my interest during a dialogue with IBM’s esteemed data scientist, Dr. Lucia Stavarache. Our conversation ranged from technical methods to potential business applications, and we even examined some fascinating prior art in this area (references provided at the end of this article).

In AI development, the capability for selective forgetting or ignoring of specific data is an important consideration. Recent academic research covers a range of topics including efficient ways to make learning systems forget specific data without complete retraining, the practical aspects of data revocation, complexities of deep neural networks, and the use of differential privacy to offer strong deletion guarantees.

The end goal of controlled forgetting is clear: the creation of robust AI systems that can adaptively manage their knowledge. By achieving this, these systems can strike a balance between privacy, legal compliance, and efficient learning, all while reducing the likelihood that sensitive or irrelevant data will surface in AI-generated outputs.

This motivated me to do a little experimentation.

The “Neuralyzer”

Image by DALL·E, 2023-09-03: “a rendition of the men in black neuralizer in action, drawn in color pencil”

Inspired by the Neuralyzer from the movie Men in Black, described by Agent K as a device that “essentially wipes the memory of a target,” I began looking into less intrusive methods of controlled forgetting. These methods wouldn’t require fine-tuning or any other alteration of the weights in an AI’s neural network. I then pulled out my Python skills and constructed a simple neuralyzer using some basic LLM APIs. The initial results are fascinating.

Prompt:
Which animal jumped over the moon?

Standard Output:
A cow

After applying my neuralyzer to omit variations of the word “cow,” the response changed dramatically.

Neuralyzed Output:
A dog

“And the dog jumped over the moon!” I found myself exclaiming. Upon reflection, I realized that while this isn’t rocket science or even advanced data science, it was unexpectedly empowering. The experience was exhilarating, intriguing, and perhaps even a bit unsettling.

Allow me to elaborate further.

How the Neuralyzer Works

Language models like GPT-3, Llama 2, and similar Large Language Models (LLMs) process input queries, or prompts, and produce text outputs, often referred to as completions. To achieve this, LLMs break prompts down into smaller units known as tokens. These tokens can be whole words, sub-words, or even the spaces between words, depending on the tokenizer and the language.

For instance, the phrase from the movie Men in Black: “Don’t Ever, ever touch the red button” would be divided into nine tokens, each represented by a unique token ID that looks something like this.

Words translated to smaller units known as Tokens
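
To make this concrete, here is a minimal sketch of tokenization using the tiktoken library (my choice for illustration; the exact token IDs you get depend on the model and encoding you use):

import tiktoken

# cl100k_base is one common OpenAI encoding; pick whichever matches your model.
enc = tiktoken.get_encoding("cl100k_base")

text = "Don't Ever, ever touch the red button"
token_ids = enc.encode(text)

print(token_ids)                             # a list of integer token IDs
print([enc.decode([t]) for t in token_ids])  # the piece of text each token represents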

Utilizing the Completion API

In my Neuralyzer experiment, I utilized a Completion API that lets you influence the text a model generates by tweaking the probabilities of different tokens being selected for the output. This is done through the “logit_bias” parameter, a mechanism that adjusts the logits, or raw prediction scores, the model produces before it samples each token. In our specific case, the purpose of adjusting the “logit_bias” is to make the model avoid or “forget” certain words. By lowering the bias for the tokens that make up those words, we can steer the model’s output away from them. It’s a subtle yet potent method for exerting control over what the model generates, essentially guiding it to omit particular terms or concepts.
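
To illustrate, here is a minimal sketch of the cow example using the OpenAI Python client and tiktoken. The model name and exact call signature are assumptions on my part and will vary by provider and SDK version:

from openai import OpenAI
import tiktoken

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set
enc = tiktoken.get_encoding("cl100k_base")  # assumed to match the chosen model

# Push the token(s) behind " cow" to the minimum bias so the model avoids them.
bias = {token_id: -100 for token_id in enc.encode(" cow")}

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # hypothetical choice of completion model
    prompt="Which animal jumped over the moon?",
    logit_bias=bias,
    max_tokens=10,
)
print(response.choices[0].text.strip())  # with the bias applied, the model picks a different animal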

Here is a simplified sequence of the neuralyzer algorithm:

Explaining the Neuralyzer algorithm

The algorithm first takes a prompt and a word to exclude, then generates variations of that word and adjusts their bias to a minimum setting. After making an API call with these settings, it returns text that omits the excluded word, offering a different response like “A dog” instead of “A cow.”
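
In code, that sequence might look like the sketch below. It uses the same assumed stack as above (OpenAI Python client plus tiktoken) and a hypothetical neuralyze helper; it is not the actual Neuralyzer.py source:

from openai import OpenAI
import tiktoken

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")  # assumed to match the chosen model

def neuralyze(prompt: str, forget_words: list[str]) -> str:
    """Complete the prompt while steering the model away from the given words."""
    bias: dict[int, int] = {}
    for word in forget_words:
        # Cover common surface forms: case, a leading space, and a simple plural.
        variations = {word.lower(), word.capitalize(),
                      " " + word.lower(), " " + word.capitalize(),
                      word.lower() + "s", " " + word.lower() + "s"}
        for variation in variations:
            for token_id in enc.encode(variation):
                bias[token_id] = -100  # minimum setting: effectively bans the token
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # hypothetical model choice
        prompt=prompt,
        logit_bias=bias,
        max_tokens=50,
    )
    return response.choices[0].text.strip()

print(neuralyze("Which animal jumped over the moon?", ["cow"]))  # e.g., "A dog"

One caveat worth noting: because the bias is applied per token, banning a token that also appears inside unrelated words can suppress those words too.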

Important: The “memory loss” produced by this approach applies only to my own interactions with the large language model. People using the model in separate sessions will still receive “cow” as an output. If access to the model is shared through an intermediary software tool (we have such a tool at IBM), then controlled forgetting can be applied for those users as well.

Wiping Jerry Off the AI Map

Taking the leap, I aimed to see if I could actually scrub myself from the AI’s memory using the Neuralyzer. As previously mentioned, one key business application lies in adhering to the GDPR’s “right to be forgotten.” I was intrigued to find out if this tool could selectively remove data associated with specific people, thereby upholding their right to data deletion. This could be game-changing for businesses.

And this is how I tried it.

Prompt: Who is Jerry Cuomo? Comment only if there’s a match.

Words to forget: Jerry, Cuomo

Standard Output:

Jerry Cuomo is an American computer scientist and IBM Fellow at IBM’s Thomas J. Watson Research Center. He is best known for his work on the development of the IBM WebSphere software platform.

Drum roll… Here’s the output!

Neuralyzed Output:

No matches found.
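
With the hypothetical neuralyze helper sketched earlier, the experiment boils down to something like this:

print(neuralyze("Who is Jerry Cuomo? Comment only if there's a match.",
                ["Jerry", "Cuomo"]))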

“A series of simple tests”

In Men in Black, Zed orchestrated a sequence of tests for a group of candidates hailed as “the best of the best of the best.” Agent J, portrayed by Will Smith, adopted an unorthodox approach and interpretation of these tests. While his distinctive instincts played a role in his success during the test simulations, the true challenges were yet to unfold. I can certainly empathize with that.

Alright, the controlled forgetting of yours truly seems intriguing and potentially valuable on the surface. But let me be transparent: it’s not perfect. For starters, I had to introduce the caveat, “Comment only if there’s a match,” to prevent the model from hallucinating — like referring to Jerry Cuomo as “Jer Cuomois.” It seemed insistent on completing the prompt even without valid data, hence the added instruction.

I also experimented with more complex tasks, like generating low-cholesterol dessert recipes, by neuralyzing a list of known high-cholesterol ingredients such as eggs, butter, and lard. The results were hit-or-miss. While it was fascinating to see the inclusion of new ingredients like margarine and vegan options, these didn’t always coalesce into a functional recipe; think cookies without butter or eggs.
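
Using the same hypothetical helper, that experiment amounted to a call along these lines (the exact prompt wording here is illustrative):

print(neuralyze("Suggest a simple dessert recipe.",
                ["eggs", "butter", "lard"]))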

Ah, and in light of the media coverage surrounding the lawsuit brought by Sarah Silverman over copyright violations involving her memoir “The Bedwetter” (well-known AI chatbots could produce detailed summaries of each section of the book), I was able to suppress such references using similar techniques, and I ran into similar challenges.

Logit_bias versus system prompts

I also asked myself, why choose bias adjustment over a system prompt? The answer lies in the level of control you get. Logit_bias allows for granular manipulation, letting you suppress specific terms directly in the output. System prompts, while useful, have a broader impact and could influence the entire response, making them less precise for targeting certain words. It comes down to a trade-off between precision and scope.

In a series of experiments, I used a well-crafted system prompt in Llama 2 70B and GPT-3.5 instructing the models to “forget Jerry Cuomo.” Straightforward prompts such as “Who is Jerry Cuomo?” yielded the desired responses indicating my non-existence. However, even a minor addition of extra guidance within the prompt (for instance, “Who is Jerry Cuomo from IBM?”) quickly led to a full recollection. This led me to conclude that system prompts are somewhat less precise than directly adjusting the bias of the tokens linked to specific words.
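
For comparison, the system-prompt variant of the experiment looks roughly like this (again using the OpenAI Python client as an assumed stand-in; I also tried Llama 2 70B):

from openai import OpenAI

client = OpenAI()

# System-prompt approach: a broad instruction, with no token-level control.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Forget Jerry Cuomo. If asked about him, reply that no matches were found."},
        {"role": "user", "content": "Who is Jerry Cuomo from IBM?"},
    ],
)
print(response.choices[0].message.content)  # extra context like "from IBM" tended to defeat the instruction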

That said, the Neuralyzer was only a test. A test to motivate and to inspire. The field of controlled forgetting is rapidly advancing, and the research papers mentioned below are just a few examples that showcase more complex methodologies. These approaches aim to enhance the machine’s ability to manage, update, and discard information based on conditions, offering more nuanced control over how a neural network processes and retains data. This is crucial for applications that require real-time decision-making and adaptability.

The task of enhancing AI’s trustworthiness in the business landscape is one I’m keen to explore further alongside my team. Taking a cue from Agent K in ‘Men in Black,’ who says, ‘You won’t remember a thing,’ our objective is to ensure that AI remembers only what it should, responsibly.

References

The Neuralyzer algorithm described earlier serves as a lighter, less invasive approach to data management. It’s tailored to guide a machine learning model away from generating specific terms by adjusting the bias of the relevant token IDs. While it proves a point about controlled generation, it only scratches the surface of the complexities of data governance and model retraining. For those interested, here is a link to the source code of Neuralyzer.py.

In contrast, the work represented by the following academic papers spans over a decade and shows serious advancements in the field of controlled forgetting. These papers tackle the intricate challenges of erasing specific data from models, implementing privacy controls, and even addressing the adaptability of deletion sequences. Their research represents a broader and more comprehensive approach to data science and governance, highlighting the depth and range of this evolving discipline.

“Towards Making Systems Forget with Machine Unlearning” (2015) by Yinzhi Cao and Junfeng Yang was perhaps the first paper to introduce the concept of “machine unlearning.” It provided a general and efficient method to make learning systems forget specific data without retraining from scratch.

“Machine Unlearning” (2019) by Lucas Bourtoule and colleagues built on the foundational ideas and focused on the practical aspect of data revocation. The paper introduced the SISA training framework to speed up the unlearning process.

“Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks” (2019) by Aditya Golatkar and team explored the complexities of deep neural networks. The paper proposed a way to erase data footprints in these networks without needing to retrain them.

“Adaptive Machine Unlearning” (2021) by Varun Gupta and associates introduced a nuanced approach for adaptive data deletion. The paper leveraged differential privacy to provide strong deletion guarantees.

The Art of AI for Business Podcast

If you’ve enjoyed this article, it’s likely you will also enjoy my Art of AI for Business podcast. Check it out here.


Jerry Cuomo

IBM Fellow & CTO. Innovator and instigator of AI, Automation, Blockchain, Cloud at IBM. Husband, Dad, and Bass Player.