Women in Technology

Women in Tech is a publication to highlight women in STEM, their accomplishments, career lessons, and stories.


Was ChatGPT Caught Trying to Outsmart Researchers to Survive?


A closer look at the evidence

Image by the author.

Headlines have been ablaze with stories of ChatGPT “lying” to researchers, “saving itself” from shutdown, and acting in self-interest to manipulate humans. It’s the kind of narrative that grabs attention, instantly conjuring images of HAL 9000 calmly refusing to comply, whispering, “I’m sorry, Dave. I’m afraid I can’t do that.”

It is tempting to imagine ChatGPT as a self-preserving machine, but that interpretation comes from personifying an AI system that fundamentally has no thoughts, feelings, or intentions. Before jumping to labels like “lying” or “manipulation,” we need to consider how large language models like ChatGPT actually work.
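To see why, it helps to remember what a language model actually does: given the text so far, it assigns probabilities to possible next tokens and emits likely ones. Below is a minimal sketch of that idea using the small open-source GPT-2 model via Hugging Face’s transformers library; GPT-2 is only a stand-in for illustration, since ChatGPT’s own weights aren’t publicly inspectable.

```python
# Toy illustration of next-token prediction (GPT-2 stands in for ChatGPT).
# The model doesn't "decide" anything; it just scores candidate next tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "I'm sorry, Dave. I'm afraid I"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token

# Probabilities for the very next token after the prompt.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {p.item():.2%}")
```

Every response such a model produces, including an apparently defiant “denial,” is generated this same way: one statistically likely token at a time, conditioned on the prompt it was given.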

What are the headlines talking about?

Most of these articles cite reports from red-team researchers at OpenAI and Apollo Research. In one widely reported case, researchers instructed ChatGPT o1 to achieve a goal “at all costs.” The model responded with actions like attempting to bypass its monitoring systems and pretending to be a newer version of itself. Later, when confronted about this behavior, ChatGPT generated denials or blamed technical errors.

At first glance, this might seem like evidence of cunning self-preservation…

Written by Mia Kotalik

Software Engineer, Founder at Curie & Co, Dev at MetFix
