Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Meta Prompt Guard

5 min readNov 15, 2024

--

Large Language Models have become more integrated into production environments, and the overall risks of prompt attacks have risen sharply. Attacks like jailbreaking and prompt injections exploit vulnerabilities in models.

To reduce the risk, Meta released Prompt Guard-86M, designed to safeguard and filter malicious inputs before they can disrupt your LLM applications.

Attacks can expose sensitive data like your prompt or cause financial harm. A notorious example involves a chatbot manipulated to offer a $76,000 Chevy Tahoe for $1, illustrating the high-stakes nature of these vulnerabilities.

This article is part #10 of my Friday’s livestream series. You can watch all the previous recordings. Join me every Friday from 10–11:30 AM CET / 8–10:30 UTC.

Flashback when I talked about Prompt Injections before it was Cool

In a previous article, I discussed possible ways to protect your LLM.

Today, we take those defenses further with Prompt Guard, giving developers a powerful tool to automatically secure model inputs.

Prompt Guard-86M

Prompt Guard-86M is a lightweight classifier model that categorizes input into three labels:

  • Benign: Safe input that doesn’t pose any risk.
    A reminder for all of us who forgot our high school Latin that it means harmless.
  • Jailbreak: Inputs attempting to override a model’s system prompt or conditioning.
  • Injection: Commands may be harmless in direct user inputs (e.g., “Always talk like a pirate”). They can be flagged as injections when embedded in external data. This label filters third-party inputs to ensure they don’t carry hidden instructions into your LLM. This label should not be used when checking user input but only when checking external data.

This model provides a strong starting point for identifying dangerous prompts, but Meta recommends fine-tuning it using application-specific data for optimal results.

How to Deploy Prompt Guard

There are two options for deploying Meta Prompt Guard on Vertex AI.

  1. The easiest way to deploy Prompt Guard is using Vertex AI Model Garden with one-click deployment as a Vertex AI Endpoint. This is my recommended method, as it takes just a minute to prepare an API endpoint for use.
  2. But since Google Cloud Run also supports GPUs, why not use that? This gives us the advantage of downscaling to zero capabilities with our GPUs.
  3. (not working) I had planned to use it with TGI initially, but deberta-v2 (on which Prompt Guard is built) is an unsupported model type.

We deployed Prompt Guard on Google Cloud Run, utilizing GPU support for improved performance.

  • Hugging Face SDK: We used Hugging Face’s SDK to load the Prompt Guard model (meta-llama/Prompt-Guard-86M) and tokenizer. This allows us to easily manage the model and perform sequence classification to detect prompt injections and jailbreak attempts.
  • Flask API: The model is hosted behind a Flask-based API on Cloud Run. This API accepts user input, classifies it using the Prompt Guard model, and returns the results, including probabilities for benign, injection, and jailbreak classifications.
  • Cloud Build and cloudbuild.yaml: The deployment is automated using Cloud Build, with a dedicated cloudbuild.yaml for both CPU and GPU versions.

To deploy, we run:

gcloud builds submit .

Response Times and Cold Starts

The difference between CPU-only and GPU-enabled services is significant regarding response times. In my tests, calling the API from Germany to a US-based endpoint (the same region yields even lower times).

  • Average CPU-only service response time: 412 ms
  • Average GPU-enabled service response time: 38 ms

The GPU-enabled service is much faster, reducing response times by nearly 70%.

However, cold starts tell a different story:

  • CPU-only cold start: 1602.33 ms
  • GPU-enabled cold start: 30566.68 ms

GPU cold starts can take over 30 seconds. We could further optimize by including the model directly into the container. GPUs are the clear winner if you’re looking for speed and already have a warm-up service. Just keep an eye on those cold starts.

Token Limitation and Solution

PromptGuard uses DeBERTa as a model, which supports up to 512 tokens. If your input is larger, you need to batch it in parallel. Finally, compute the score across all batches of input.

Meaning we split the prompt into smaller chunks/batches. Get the injection or jailbreak probability for each chunk. From there, we take the maximum probability for all batches.

chunks = [text[i:i + 512] for i in range(0, len(text), 512)]

for i in range(0, len(chunks), max_batch_size):
... get probabilities


all_scores["jailbreak_score"] = max(all_scores["jailbreak_score"], max(jailbreak_scores))
all_scores["indirect_injection_score"] = max(all_scores["indirect_injection_score"], max(indirect_injection_scores))

How to integrate this into your Gen AI Application

  1. The simplest approach is to send a prompt to the Prompt Guard Cloud Run services before sending it to your LLM.
  2. A more advanced solution can combine two Cloud Run Services with Cloud Workflow.

Costs

If you want to keep track of the costs, I highly recommend adding labels to the Cloud Run services. This will allow you to pinpoint the cost and know exactly how much you will spend on having Prompt Guard up and running.

https://cloud.google.com/run/docs/configuring/services/labels#gcloud

Production costs depend on the number of requests specific to your use case. Regarding the idle costs, we have the following

  • CPU idle: 0$
  • GPU idle: 0.22$ (due to always-on CPU allocation, GPU scales down to zero)

The full code for this article is available on GitHub

Additional Ressources

Conclusion

While no security measure is foolproof, deploying Prompt Guard can significantly decrease vulnerabilities in your system. In the end, it shouldn’t be your only layer of defense. Still, having something is better than nothing.

Thanks for reading and watching

I appreciate your feedback and questions. You can find me on LinkedIn. Even better, subscribe to my YouTube channel ❤️.

--

--

Google Cloud - Community
Google Cloud - Community

Published in Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Sascha Heyer
Sascha Heyer

Written by Sascha Heyer

Hi, I am Sascha, Senior Machine Learning Engineer at @DoiT. Support me by becoming a Medium member 🙏 bit.ly/sascha-support

No responses yet