Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Google Cloud Model Armor

6 min read · Feb 7, 2025


Large Language Models (LLMs) are more integrated into production environments than ever, increasing the risks of prompt attacks, data leakage, and harmful outputs. While open-source solutions like Meta Prompt Guard provide strong defenses, securing LLM applications is still challenging.

Google Cloud has now introduced Model Armor, a fully managed service designed to screen prompts and responses for security risks before they reach your AI models.

Like Meta Prompt Guard, Model Armor provides jailbreak and injection detection, but it also extends protection with sensitive data filtering, malicious URL detection, and PDF scanning.

Why Do You Need Model Armor?

Prompt injections and jailbreak attacks have already caused real-world financial and security incidents. One of the best-known examples is when attackers manipulated an AI chatbot into offering a $76,000 Chevy Tahoe for just $1.

More than that, LLMs can accidentally expose sensitive user data, generate harmful content, or serve as attack vectors for malware and phishing attempts.

Google’s Model Armor aims to automate security screening before prompts or responses can cause damage, ensuring AI applications remain safe and compliant.

How Model Armor Works

Model Armor acts as an LLM firewall, filtering both incoming prompts and outgoing responses to detect security risks. The workflow follows this structure:

  1. The user submits a prompt, and Model Armor scans the input.
  2. The prompt is either sanitized, blocked, or sent unchanged to the LLM.
  3. The model generates a response, and Model Armor checks the output.
  4. If safe, the response is sent back to the user. If not, it is modified or blocked.

Unlike Meta’s Prompt Guard, which only detects prompt injections and jailbreak attempts, Model Armor provides a broader security suite, including:

  • Prompt Injection & Jailbreak Detection
    Blocks attempts to override system instructions.
  • Sensitive Data Protection
    Prevents leaks of credit card numbers, personal information, and proprietary data.
  • Malicious URL Detection
    Identifies and blocks phishing links inside prompts and responses.
  • PDF Scanning
    Screens text within PDFs for potential risks.
  • Centralized Policy Management
    Security policies are defined once in templates and applied consistently across applications.

Model Armor Setup

To use Model Armor, you must create a template defining your security settings. This template acts as a policy for filtering prompts and responses.

In the template, enable the security features that your application requires.
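
If you prefer to script the setup instead of using the Console, the template can also be created through the Model Armor REST API. The sketch below is illustrative only: it assumes the template endpoint follows the same URL pattern as the sanitize endpoints used later in this article, and the filterConfig field names are assumptions, so check them against the official Model Armor documentation before relying on them.

import requests
from google.auth import default
from google.auth.transport.requests import Request

PROJECT_ID = "sascha-playground-doit"
LOCATION = "europe-west4"
TEMPLATE_ID = "model-armor-sample"


def create_template():
    # Assumption: template creation uses the same regional endpoint pattern
    # as the sanitizeUserPrompt / sanitizeModelResponse calls shown below.
    url = (f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/projects/{PROJECT_ID}"
           f"/locations/{LOCATION}/templates?templateId={TEMPLATE_ID}")

    credentials, _ = default()
    credentials.refresh(Request())

    # Illustrative filter configuration -- field names are assumptions,
    # enable only the filters your application needs.
    payload = {
        "filterConfig": {
            "piAndJailbreakFilterSettings": {"filterEnforcement": "ENABLED"},
            "maliciousUriFilterSettings": {"filterEnforcement": "ENABLED"},
        }
    }

    response = requests.post(
        url,
        json=payload,
        headers={"Authorization": f"Bearer {credentials.token}"},
    )
    response.raise_for_status()
    return response.json()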

Using Model Armor

The Template ID is required when calling the sanitizeUserPrompt and sanitizeModelResponse API endpoints.

import requests
from google.auth import default
from google.auth.transport.requests import Request

PROJECT_ID = "sascha-playground-doit"
LOCATION = "europe-west4"
TEMPLATE_ID = "model-armor-sample"


def get_access_token():
    # Fetch an OAuth 2.0 access token from the application default credentials
    credentials, _ = default()
    credentials.refresh(Request())
    return credentials.token


# Function to sanitize the user prompt before it reaches the model
def sanitize_prompt(user_prompt):
    url = f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/templates/{TEMPLATE_ID}:sanitizeUserPrompt"
    headers = {
        "Authorization": f"Bearer {get_access_token()}",
        "Content-Type": "application/json"
    }
    payload = {"user_prompt_data": {"text": user_prompt}}

    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": response.status_code, "message": response.text}


# Function to sanitize the model response before it is returned to the user
def sanitize_response(model_response):
    url = f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/templates/{TEMPLATE_ID}:sanitizeModelResponse"
    headers = {
        "Authorization": f"Bearer {get_access_token()}",
        "Content-Type": "application/json"
    }
    payload = {"model_response_data": {"text": model_response}}

    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": response.status_code, "message": response.text}
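
To tie the pieces together, here is a minimal end-to-end sketch of the workflow described above: screen the prompt, call the model only if nothing was flagged, then screen the response. The generate_with_gemini argument is a placeholder for whatever model call your application uses.

# Minimal end-to-end flow (sketch): prompt check -> model call -> response check.
# generate_with_gemini is a placeholder for your actual model invocation.
def guarded_generate(user_prompt, generate_with_gemini):
    prompt_check = sanitize_prompt(user_prompt)
    if prompt_check.get("sanitizationResult", {}).get("filterMatchState") == "MATCH_FOUND":
        return "Your request was blocked by our safety filters."

    model_output = generate_with_gemini(user_prompt)

    response_check = sanitize_response(model_output)
    if response_check.get("sanitizationResult", {}).get("filterMatchState") == "MATCH_FOUND":
        return "The generated answer was blocked by our safety filters."

    return model_output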

Understanding Model Armor Response

Model Armor applies multiple safety checks when processing:

  • User prompts
    Before they are sent to the model (sanitizeUserPrompt)
  • Model responses
    After the model generates output (sanitizeModelResponse)

The API returns a sanitizationResult that determines if the content is flagged and needs moderation:

filterMatchState
Determines if the content was flagged:

  • MATCH_FOUND
    🚨 The prompt or response violates one or more filters
  • NO_MATCH_FOUND
    Prompt or response passed all filters

filterResults → Lists which filters flagged the content.

invocationResult → Indicates if the API executed successfully.

{
  "sanitizationResult": {
    "filterMatchState": "MATCH_FOUND",
    "filterResults": { ... },
    "invocationResult": "SUCCESS"
  }
}

Breakdown of filterResults

Each filter checks for a specific category of risk. If filterMatchState is MATCH_FOUND, the content was flagged in one or more of the following categories (a small helper for inspecting this programmatically is sketched after the list below).

  • rai (Responsible AI)
    Detects harmful content like dangerous instructions, harassment, hate speech, and sexually explicit material.
  • sdp (Sensitive Data Protection)
    Flags personal data such as credit card numbers, API keys, and other PII.
  • pi_and_jailbreak
    Identifies prompt injection and jailbreak attempts that try to manipulate the model.
  • malicious_uris
    Detects malicious or phishing links in the input.
  • csam
    Flags potential child safety violations.
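
The helper below is a hedged sketch for pulling the flagged categories out of a sanitization result. It assumes the category names above are the keys of filterResults; because the exact nesting of each entry is not shown in full here, it simply walks the structure and reports every category that contains a MATCH_FOUND somewhere inside it.

# Sketch: list the filter categories (rai, sdp, pi_and_jailbreak, ...) that flagged the content.
# Each entry is walked generically because the exact per-filter nesting is not shown above.
def flagged_categories(sanitize_result):
    def contains_match(node):
        if isinstance(node, dict):
            return any(contains_match(value) for value in node.values())
        if isinstance(node, list):
            return any(contains_match(item) for item in node)
        return node == "MATCH_FOUND"

    filter_results = sanitize_result.get("sanitizationResult", {}).get("filterResults", {})
    return [category for category, details in filter_results.items() if contains_match(details)]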

Limitations and Considerations of Model Armor

Despite its advantages, Model Armor has some limitations:

  • The prompt injection and jailbreak detection filter only supports up to 512 tokens. This can be an issue for longer prompts and requires additional filtering logic in the application, such as splitting the prompt into overlapping chunks and screening each one in a separate request (see the sketch after this list).
  • Limited regional availability: the service is currently offered in only two locations, us-central1 and europe-west4 🇪🇺. While having at least one European option is excellent, global coverage is still limited.
  • PII detection does not cover email addresses or passwords out of the box. These need to be configured using Sensitive Data Protection inspection templates and de-identification templates.
  • I wish this feature were more tightly integrated into the Gemini / Vertex AI SDK, for example by passing the Model Armor template ID with a Gemini call. This would remove the need for application-side integration logic altogether.
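
As a workaround for the 512-token limit, a long prompt can be split into overlapping chunks that are screened separately. The sketch below uses rough character-based chunk sizes as a stand-in for proper token counting, so the numbers are assumptions to adjust for your tokenizer.

# Sketch: screen a long prompt in overlapping chunks to stay under the
# 512-token limit of the prompt injection / jailbreak filter.
# Chunk sizes are rough character-based stand-ins for real token counts.
def sanitize_long_prompt(user_prompt, chunk_chars=1500, overlap_chars=300):
    step = chunk_chars - overlap_chars
    chunks = [user_prompt[i:i + chunk_chars] for i in range(0, len(user_prompt), step)]

    for chunk in chunks:
        result = sanitize_prompt(chunk)
        if result.get("sanitizationResult", {}).get("filterMatchState") == "MATCH_FOUND":
            return result  # flag the whole prompt if any chunk is flagged

    return {"sanitizationResult": {"filterMatchState": "NO_MATCH_FOUND"}}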

While Model Armor provides strong security features, latency is a concern, especially for real-time applications like chatbots or interactive AI assistants.

Observed Response Times

From Berlin → Europe (Netherlands, europe-west4) for the sanitizeUserPrompt endpoint, I measured:

  • Average Response Time: ~564ms
  • P95 (95th percentile): ~684ms
  • P99 (99th percentile): ~720ms

This means that most requests take between 500–700 ms, a significant delay for real-time AI applications; I would expect 50–200 ms for APIs within the same region. Still, if you run your application, for example, on Cloud Run hosted in the same region as Model Armor (europe-west4), it will use Google’s high-speed backbone network, which should reduce response times to around ~50–200 ms instead of ~500 ms+. And keep in mind we need two hops: one for screening the prompt and one for screening the response.

Want to Dig Deeper or Look for Code Examples?

The complete code for prompt sanitization, response filtering, and latency benchmarking is available on GitHub.

For teams that prefer self-hosting, Meta Prompt Guard can detect prompt injections and jailbreak attempts without relying on an external API. However, Meta Prompt Guard lacks the broader security features of Model Armor. I covered that in one of my past livestreams.

Pricing

The first 2 million tokens per month are free. After that, usage is billed at $1.50 per million tokens (for the standalone offering, which I believe is the option most companies will use).

If we compare Model Armor’s $1.50 per million tokens with Gemini 2.0 Flash at $0.15 per million tokens, screening costs ten times as much as the model itself, which adds up to a significant cost factor for our applications.
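
As a rough worked example, assume 50 million screened tokens per month (a volume chosen purely for illustration):

# Rough monthly cost estimate; the 50M token volume is an assumed example figure.
tokens_per_month = 50_000_000
free_tokens = 2_000_000

model_armor_cost = max(0, tokens_per_month - free_tokens) / 1_000_000 * 1.50   # $72.00
gemini_flash_cost = tokens_per_month / 1_000_000 * 0.15                        # $7.50

print(f"Model Armor:      ${model_armor_cost:.2f}")
print(f"Gemini 2.0 Flash: ${gemini_flash_cost:.2f}")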

For teams looking for a managed security layer for AI applications, Model Armor offers a powerful solution with predictable, but not negligible, costs.

Conclusion

Model Armor is a great offering to enhance the security of your Gen AI applications, helping to prevent prompt injection, data leaks, and malicious content.

However, it lacks direct integration with the existing Vertex AI Gen AI stack, meaning developers must manually integrate it into their workflows.

Thanks for reading and listening

I appreciate your feedback and questions. You can find me on LinkedIn. Even better, subscribe to my YouTube channel ❤️.
