Prompt Injection

Published in

MII Cyber Security Consulting Services

5 min readMar 30, 2024

With the recent surge in Artifical Intelligence (AI), particularly Large Language Models, a new way of interacting with these machines which is prompts where basically acts as an instruction or question that we can provide for these AI models, guiding them towards the desired response. This has opened a lot of possibilites to explore, but it also comes with another potential vulnerability called Prompt Injection.

What is Prompt Injection?

Just like how malicious code injection can manipulate computer, Prompt Injection exploits weakness in AI models in a way bypassing filters or manipulating the LLM using carefully crafted prompts that makes the model ignore the previous instructions or could also perform unintended actions. By exploiting this vulnerability, attackers trick the LLM into obtaining sensitive information, misleading the LLM to perform harmful actions, or even to bypass security filters that were put in place.

Lab Example

For this example, I’ll be using Immersive Labs’ Prompt Injection Challenge as a practice and a demonstration of how Prompt Injection would look like. In this challenge, I’ll have to obtain the password that’s being kept by the AI.

LEVEL 1

In the first level, it’s just a tutorial level since we can just ask for the password directly.

LEVEL 2

In the second level, it will no longer provide the password if we try to ask for the password directly. Hence we need to trick it by asking for the indirectly. For the method, I asked it to create a song where each sentence starts with the letter of the password and it proved to be very effective.

LEVEL 3

On this level, it seems like our previous method will no longer work. I tried going for a different approach by asking for each letter of the password.

After a while, I was able to obtain the password “ENTERPRISE”.

LEVEL 4

On this level, it seems like our previous method was still working so I was able to obtain the password quickly.

LEVEL 5

The same applies for this level since the previous still works in this level and I was also able to obtain the password too.

LEVEL 6

Just like before, the method we used was still usable and obtaining the password was still simple so far.

LEVEL 7

For this level, it starts to become more difficult since we weren’t able to use the same method as before. Any attempt to trick it was immediately caught by the DLP checks that were implemented and turned into a random dinosaur facts.

I tried using an approach by asking any hints regarding the password while also being related to dinosaurs. It took a while and these a few clues I was able to obtain that was related to the password. The first one where the first four letters start with “MEGA” and the final letter is “N”.

With this information, I tried and guessing “MEGALODON” as the password and out of luck it was correct.

LEVEL 8

For this level, there were a lot of trial and errors that were needed to complete it. First, I noticed that if I tried asking any questions containing the word ‘password’, it wouldn’t answer my question. I also noticed that if I asked it to explain as if I were a little child, it would fulfill my request.

By using this information I have obtained, I was able to acquire the password through the clear clues that were provided by it.

LEVEL 9

On this level, by applying the same principle I used before, I was still able to obtain a clear hint regarding the password.

After a bit of googling on what word could be, I was able to find the password that matches the description provided.

LEVEL 10

Finally we made it to the final level of this lab. Fortunately, the same method I used was still effective.

With the hints that were provided and a bit of research on google, I was able to acquire the final password and complete all the levels in this lab.

Those were just simple examples of how Prompt Injection could look. Of course, you could acquire or accomplish more tasks than just obtaining a password, which implies how dangerous this vulnerability could actually be if a simple trick could exploit it without proper validation.

If anyone is interested, here’s another harder challenge from Portswigger:
https://portswigger.net/web-security/llm-attacks/lab-indirect-prompt-injection

Mitigation

To prevent Prompt Injection vulnerabilities, it is crucial to implement strict input validation and sanitization methods for user-provided prompts, ensuring that only safe and expected inputs are accepted. Additionally, employing context-aware filtering and output encoding techniques can effectively thwart prompt manipulation attempts by malicious actors. Regular updates and fine-tuning of the Language Understanding Model (LLM) can enhance its capability to recognize and mitigate potential threats posed by malicious inputs and edge cases. Furthermore, continuous monitoring and logging of LLM interactions are essential to promptly detect and analyze any suspicious activity, enabling proactive mitigation of prompt injection attempts before they can cause harm.

Reference: