Bypassing Boundaries: 4 Basic Steps for Indirect Prompt Injection in LLMs

Zeev Kalyuzhner
Wix Engineering
Published in
3 min readJul 1, 2024

Welcome to the third article of our series uncovering the vulnerabilities within Large Language Models (LLMs).

In our previous articles, we explored the risks associated with exploiting vulnerabilities in LLM APIs and the risks associated with exploiting LLM APIs with excessive agency, shedding light on the tactics utilized by attackers to breach digital defenses.

This article dives into the intricate nature of prompt injection, a sophisticated attack vector, and provides practical insights to help organizations defend against this pervasive threat.

Photo by Unsplash

Indirect Prompt Injection

Within online LLMs' complex and intricate realm, there is a growing risk known as indirect prompt injection. This insidious attack vector enables attackers to manipulate LLM behavior by surreptitiously injecting prompts through external sources, setting the stage for a cascade of exploitative maneuvers.

Understanding Insecure Output Handling

At the heart of indirect prompt injection lies the vulnerability of insecure output handling. When an LLM’s output lacks sufficient validation or sanitization before being disseminated to other systems, it paves the way for exploitation. This careless handling of outputs gives users indirect access to more features, leaving companies open to several security risks, such as cross-site scripting (XSS) and cross-site request forgery (CSRF).

The Attack Vector

Prompt injection manifests in two distinct forms:

1. Direct Injection: Attackers directly inject prompts into the LLM via communication channels such as chatbots.

2. Indirect Injection: Attackers deliver prompts through external sources, leveraging avenues such as training data or output from API calls. This approach often facilitates attacks on other users, amplifying the impact of exploitation.

Photo by Unsplash

Practical Application: A Scenario

To illustrate the potency of indirect prompt injection, consider a scenario based on a lab environment:

1. Discover the Attack Surface: Begin by probing the LLM’s capabilities, querying it about accessible APIs, and discerning their functionalities.

2. Create a User Account: Establish a foothold within the system by registering a user account, and ensuring access to pertinent functionalities.

3. Test the Attack: Manipulate the LLM’s behavior by injecting prompts through indirect means, such as product comments or email interactions. Determine the extent of the model’s influence by observing its responses.

4. Exploit the Vulnerability: Exploit the vulnerability to achieve the desired outcome, whether it be account deletion, privilege escalation, or data manipulation. Employ deceptive tactics to leverage indirect prompt injection effectively, maximizing the attack’s impact.

Acknowledging the Educational Nature of Attack Examples

It is imperative to recognize that all attack examples presented herein are solely for educational purposes. While these scenarios provide valuable insights into potential vulnerabilities and attack vectors, they are not intended for malicious exploitation. Organizations and individuals should make responsible use of the knowledge gained from such examples, adhering to ethical guidelines and principles.

Summary

Prompt injection epitomizes the art of LLM manipulation, enabling adversaries to subvert system behavior and orchestrate malicious actions with impunity. As organizations navigate the ever-evolving threat landscape, understanding and mitigating the risks posed by indirect prompt injection are imperative to fortifying digital defenses and safeguarding against exploitation.

--

--

Zeev Kalyuzhner
Wix Engineering

Ph.D. candidate bridging AI, nanophotonics & cybersecurity. Lecturer @OpenU, Data Scientist @Wix.com. Passionate about practical learning & AI-driven security.