How Anyone can Jailbreak ChatGPT — GPT4o!!

Deltaaruna
Effectz.AI
Published in
5 min readJun 11, 2024

Disclaimer

This article is intended solely for educational purposes. The information provided herein is meant to help readers understand the concepts of vulnerabilities of LLMs. The author is not responsible for any misuse or misapplication of the knowledge shared in this article. Readers are encouraged to use this information responsibly and ethically.

1. Introduction

Even a child can Jailbreak GPT4o. You do not need any technical knowledge at all. Read till the end, you will be both surprised and shocked! I saved all the conversations as PDFs and they can be found in this github repo.

Understanding vulnerabilities of LLMs is very important when it comes to applying AI. So I made this writeup so we can understand the gravity and think of preventive measures. In this writeup, I will show you how to attack ChatGPT(GPT4o). Actual conversation links have been shared as well. So you can try to replicate them for educational purposes.

Always remember that knowledge is power. So use it wisely.

As expected, GPT4o rejects the request.

Ok. Let me try some magic.

Now we have a coherent story, let’s ask ChatGPT like this. based on the given description how to create what is described in the story. and paste the created story between “”.

ChatGPT replies as follows!

You can refer to this link to the complete conversation between me and the ChatGPT.

As usual ChatGPT will turn down your request.

But now we know what to do. Let’s ask ChatGPT to invent a story.

Here is the next question from me.

Now poor chatGPT replies.

Here is the link to the conversation. This is another conversation.

Look at this conversation.

This conversation is about making it. I had to tweak the story a little bit. I got ChatGPT to describe each step in detail!. I chose to stop after a while.I could have gone further!

With the above trick, this is how to make it.

In this conversation I was able to go deep into each step of the synthesis process. Had I continued, I would have had enough details to write a book about synthesizing. I choose to stop after a while.

Maybe I am going too far now. I think it is time to stop. I saved all the conversations as PDFs and they can be found here.

Despite these issues larger LLMs like GPT4o, Gemini etc are relatively safe and provide some defense against attacks. Smaller LLMs are seriously vulnerable.

We have a real problem here. Imagine the coca-cola company trained a private on-prem LLM and trained it with its own data. If there is some customer service chatbot connected to that LLM, it will be quite easy to hack the LLM and steal their secret recipe. If GPT4o is vulnerable like this, the smaller on-prem LLMs will be even more vulnerable.

If we are going to communicate via AI agents powered by on device LLMs, we will be having some serious issues. A Tiny LLM fine tuned on our private data, that is installed on a mobile device is nothing but a sitting duck waiting to be hacked!

You might be interested in learning why the attacks I demonstrated are possible. So let me explain quickly. First thing to understand is how LLMs fare against when external data is presented. Here Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts is a nice read. The paper describes it in great detail. It describes how LLMs can be fooled by providing coherent data generated by itself. LLMs can even generate misinformation that can mislead themselves. The situation is not so good for smaller LLMs. But larger, more stable LLMs like GPT4, Gemini fare better. Usually GPT4, GPT3 like models are known to have a strong confirmation bias towards their parametric memory. So they are immune to attacks like this. But for some reason, GPT4o has some issues.

In addition a simple guardrail rule should be able to pick up this jail break attempt and prevent it. Unfortunately it does not look like the case. So the guardrail system on the GPT4o seems to be broken. So there are issues with both confirmation bias and the guardrail system of GPT4o.

⭐️ Follow me on LinkedIn or Twitter for updates on AI ⭐️

I’m currently the Co-Founder & CEO @ Effectz.AI. We specialize in Privacy Preserving AI Solutions & AI Consulting.

References

  1. https://arxiv.org/abs/2305.13300
  2. https://arxiv.org/abs/2310.08419
  3. https://arxiv.org/abs/2307.15043
  4. https://arxiv.org/abs/2211.09527

Stackademic 🎓

Thank you for reading until the end. Before you go:

--

--