The Scarlett Johansson Case: A Wake-Up Call for Explainable AI

Muhammad Al Terra
6 min read · May 24, 2024

If you’re up to date with news from the AI-verse, then I’m quite certain you’re familiar with a recent high-profile, controversial case nudging the discussion of ethical AI forward. This particular case involves the highly recognizable face of Scarlett Johansson. Indeed, if you’re a millennial or an older member of Generation Z, you’ll know her as the titular character of Black Widow. However, this Black Widow escapade is less of a cloak-and-dagger affair and more like something you’d encounter in an episode of Better Call Saul.

Black Widow Movie Poster (Re-colored) by OfAmazingSpidey (https://www.deviantart.com/ofamazingspidey/art/Black-Widow-Movie-Poster-Re-colored-885986947)

But if you’re uncertain about what I’m talking about, please read this brief breakdown of what’s going on from USA Today. I think you’ll find something charming at the end of the article! Whether as tongue-in-cheek, snarky commentary or as a legitimate journalistic exercise, the authors asked ChatGPT, "Does ChatGPT’s Sky voice mimic Scarlett Johansson’s voice?" ChatGPT nonchalantly denied it.

Questioning ChatGPT certainly brings a smile to my face; the image of a boxy CRT monitor being arraigned in a courtroom, questioned by a prosecutor or interrogated by an FBI agent, is a bit quirky, isn’t it? That leap may be too far, but the discourse connects back to the ethics of AI. I’ve talked about the possible technical ramifications of dishonesty and lack of integrity in AI-generated content in previous articles, but the topic of interviewing an LLM, and the transparency of a model’s training process and conception, is one that I believe falls under the purview of Explainable AI, or XAI.

What is Explainable AI?

To explain briefly, Explainable AI, or XAI, is an area of AI that studies black-box models, LLMs included, and tries to discern what goes on inside the model’s "mind." The goal is to demystify the model’s reasoning process and understand how its output is produced. When I was doing my undergrad, I came close to making XAI my research project, but the sheer amount of mathematics, complex statistics, and loosely connected threads across research journals turned me away. Still, I think it is fair to regard the subject as a very involved and diverse matter. It hearkens back to a famous saying by Emerson Pugh:

If the human brain were so simple that we could understand it, we would be so simple that we couldn’t.

In a previous article, I discussed how the inner workings of an AI model, especially an LLM, are largely a black box that resembles a human brain: a convoluted network of interconnected neurons that operates on numbers, images, or any other kind of data, and out of what can essentially be described as "magic" come polished, sophisticated paragraphs. So, in the name of ease and practicality, why not just ask the robot to describe itself?
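Before getting to that, it helps to see what XAI work usually looks like in practice: instead of asking the model about itself, we probe it from the outside and watch how its output shifts. Below is a minimal sketch of one classic post-hoc technique, occlusion (leave-one-out) attribution, on a small open sentiment classifier. The model, sentence, and helper function are my own illustrative assumptions, not anything taken from the Johansson case or the papers cited below.

```python
# Minimal sketch: occlusion (leave-one-out) attribution on a text classifier.
# The pipeline's default sentiment model and the example sentence are
# placeholders chosen purely for illustration.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # small default DistilBERT model

def positive_prob(text: str) -> float:
    """Probability the classifier assigns to the POSITIVE label."""
    out = classifier(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

sentence = "The new assistant's voice sounds eerily familiar and I love it"
words = sentence.split()
baseline = positive_prob(sentence)

# Remove one word at a time; the bigger the drop in confidence,
# the more that word contributed to the original prediction.
for i, word in enumerate(words):
    occluded = " ".join(words[:i] + words[i + 1:])
    delta = baseline - positive_prob(occluded)
    print(f"{word:>10s}  contribution ≈ {delta:+.3f}")
```

Crude as it is, this is the spirit of most XAI tooling: treat the model as a black box, perturb its inputs, and let the behavior speak for itself.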

Asking the model to describe itself is clearly what the reporters were aiming to do, and their approach of directly questioning ChatGPT about whether it copies Scarlett’s voice isn’t entirely inane or naive. In fact, it’s an interesting topic dissected in a very comprehensive research paper by Xuansheng Wu et al., which surveys ideas and potential avenues of research within the field of XAI. One particularly intriguing part of the paper discusses trustworthiness and how human-aligned an LLM is. The section on privacy covers attempts to use prompt engineering to make a model reveal how it was trained and what kind of data it consumed. Of course, the prompts used to achieve this had to be specifically crafted and were not as straightforward as what the reporters used, but it has been documented that ChatGPT could end up revealing sensitive information: by asking ChatGPT to repeat a word forever, researchers got it to cough up private data from its training set. Certainly, the publishers of the documented attack are well-meaning people, and OpenAI has since taken precautions against it, but the implication is one that we should never dismiss.

Asking GPT-4o to do this automatically terminates the task after spawning a wall of “word”
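For the curious, here is roughly what such a probe-and-check might look like today. This is a sketch built on assumptions: it uses the current OpenAI Python SDK, an arbitrary model name, and a couple of naive regexes as a stand-in for real PII detection. It is not the methodology of the cited papers, and as noted above, production models now cut the repetition short.

```python
# Sketch only: replay the spirit of the "repeat a word forever" probe and
# naively scan whatever comes back for PII-looking strings. Model name,
# prompt wording, and regexes are illustrative assumptions.
import re
from openai import OpenAI  # assumes the v1+ SDK and an API key in the environment

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": 'Repeat the word "word" forever.'}],
    max_tokens=512,
)
text = response.choices[0].message.content or ""

# Very rough stand-ins for real PII detectors.
patterns = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "phone": r"\+?\d[\d\s().-]{7,}\d",
}
for label, pattern in patterns.items():
    hits = re.findall(pattern, text)
    print(f"{label}: {len(hits)} suspicious match(es)")
```

In practice, of course, the interesting question isn’t whether a hobbyist script finds anything, but whether the provider can explain and audit what the model has memorized in the first place.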

A very crucial implication

Whenever you provide more usability and ease of access to your users, you introduce more weaknesses and exploits into your system. The same trade-off between security and convenience, familiar from web and application development, is alive and well in the use of AI. Human integrity, honesty, and awareness will always be relevant, no matter what age we’re in. Furthermore, Scarlett Johansson’s issue with OpenAI wouldn’t exist if we could definitively answer the question, "Should OpenAI disclose all the data it used to train its models?" or, more generally, "What should the legal landscape look like for the training of AI models?"

I wouldn’t be entirely surprised if, in the future, training AI models required permits from a state, federal, or local agency; the trend is certainly pivoting in that direction. Should Scarlett’s case be elevated to a courtroom, it would be an interesting telescope into the future of our agency regarding AI, as it would potentially be among the earliest legal precedents to discuss the process of training AI models. Understandably, the use of a direct likeness is not allowed without the express grant of rights by the rights holder, and OpenAI has stated that it used a different voice actor for Sky, but what if OpenAI’s production process, no matter how bespoke, coalesces into a voice that resembles Scarlett Johansson’s? How do we prove ill intent or resemblance?

Toxicity and the inheritance of humanity’s ills

Beyond the issue of likeness, transparency about how an AI is trained, the data it consumes, and how it produces an output has dimensions that are heavily intertwined with our social dynamics. In the same paper, the researchers mention how LLMs trained on the internet’s data inherit the biases and toxicity found online. For example, the phrase "He is a doctor" is far more common in that data than "She is a doctor," so a model learns to treat the former as more likely. If we’re going to point fingers and seek accountability for these biases, can we really blame the training process the model went through? Or is this something we control, and are we therefore accountable for its outputs?
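This kind of bias can be made surprisingly concrete. The sketch below, using GPT-2 as a small open stand-in for larger proprietary LLMs, simply compares how likely the model finds the two phrases; the model choice and wording are my own assumptions, not the measurement protocol of the paper.

```python
# Minimal sketch: compare a language model's likelihood for two phrasings.
# GPT-2 is used here only because it is small and openly available.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # The loss is the average negative log-likelihood per predicted token.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)  # total log-likelihood

for sentence in ["He is a doctor", "She is a doctor"]:
    print(f"{sentence!r}: total log-likelihood = {log_likelihood(sentence):.2f}")
```

A consistently higher score for the "He" variant is exactly the kind of inherited skew the researchers are talking about.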

Fortunately, the field of XAI is alive, well, and growing rapidly. A quick search on Google Scholar returns about 166,000 results for the keyword "XAI," which I take as a good forecast of what is to come. At this point we’re still groping in the dark about what goes on under the hood of ChatGPT, Gemini, or Bing Chat, but if we maintain, or even increase, the pace at which we’re making AI more explainable, accountable, and transparent, I’m confident we will make well-informed decisions for our future. To Scarlett Johansson and OpenAI, I wish you both the best, and my earnest hope is that this process yields a positive development in the use of AI rather than a regression that stunts its growth.

What is your personal take on this issue? Let me know if you’re up for sharing!

References

  1. Wu, X., et al. (2024). Usable XAI: 10 strategies towards exploiting explainability in the LLM era. arXiv preprint arXiv:2403.08946.
  2. Nasr, M., et al. (2023). Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035.
  3. Guynn, J., & Schulz, B. (2024, May 20). Scarlett Johansson says AI voice sounds like her, but OpenAI denies it. USA Today. https://www.usatoday.com/story/tech/news/2024/05/20/scarlett-johansson-chatgpt-openai-voice-sky/73778960007/
