Is your AI workload secure?

Sita Lakshmi Sangameswaran
Google Cloud - Community
7 min read · Jun 21, 2024

In the not-so-distant past, when AI was still largely in the “Is this image a cat or not?” phase, things were much simpler. We didn’t stress too much about nailing perfect ML workflows, and security wasn’t exactly top-of-mind. A decade later, half of all organizations are still okay with hope as a security strategy!

Can we afford to do this in the GenAI era?

In the stone age of ML, we named our models the way a caveman counts rocks: “model-001.ckpt,” “model-009.ckpt,” and so on, relying on rudimentary version control. Then, with the advent of MLOps tools such as MLflow, Kubeflow and Data Version Control for experimentation, orchestration and data management, we transitioned to developing and consuming models systematically. There was some order in the chaos and the risks felt manageable. But now, with the GenAI explosion, it’s a whole different landscape. Models and data are everywhere, running on everything from your web browser to your smart fridge and even on a 4GB Raspberry Pi. The sheer scale and complexity of today’s AI frontier have opened a Pandora’s box of new attack vectors and increasingly sophisticated AI-backed threats.

Is your AI safe from attacks?

Why security is a pressing concern

Threats have evolved to exploit the vulnerabilities present in GenAI. The more an attacker knows about a GenAI model’s architecture and embeddings, the more sophisticated the attacks can become. For example, LLMs are rewarded for following instructions, so a jailbreak-style prompt injection can indirectly steer an LLM into following potentially dangerous commands in pursuit of that reward. Likewise, knowing how embeddings are created in a model makes it possible to perform targeted weight surgery that results in model poisoning. These vulnerabilities are no longer theoretical speculation; numerous papers and techniques have demonstrated the very real dangers that exist. Some examples include:

  • Jailbroken: How Does LLM Safety Training Fail?
  • Mass-Editing Memory in a Transformer
  • PoisonGPT: How to poison LLM supply chain on Hugging Face
  • Universal and Transferable Adversarial Attacks on Aligned Language Models

While these direct attacks on LLMs may primarily concern those who deploy custom ML models, several adjacent factors create the gaps that allow such attacks to occur. These can be assessed within the context of the CIA triad, a fundamental model in information security that stands for Confidentiality, Integrity and Availability, the three core principles of an organization’s security practice.

CIA triad for AI assets

Confidentiality: In the GenAI world, confidentiality extends beyond the data. It also encompasses models, datasets, evaluations, source code and other artifacts. Compromise of any of these assets can result in an unpredictable or malicious model, so maintaining the confidentiality of all of them is crucial to the security of the overall solution.

Availability: In addition to unintentional outages, models and inference engines are targeted by model resource-exhaustion attacks, which undermine reliable service delivery.

Integrity: For AI workloads, integrity means protecting resources from tampering and sabotage:

  • Models: Attacks like ROME (Rank-One Model Editing) can subtly manipulate model weights to alter specific facts or behaviors without significantly affecting overall performance.
  • Datasets: Data poisoning attacks can introduce subtle changes to training data, leading to biased or harmful model outputs. Safety training refers to tuning a model on responsible AI practices. Malicious modification of this dataset can further compromise integrity and cause unethical behavior.
  • Evaluations: Tampering with evaluation benchmarks or results can mislead developers and users about the model’s capabilities, potentially leading to the deployment of flawed or unsafe systems.

In the scope of intentional harm to AI workloads, these concerns can manifest as many attacks, including jailbreaking, sensitive data exfiltration, model theft, model manipulation, evasion attacks, impersonation and fraud.

For a comprehensive view of the threat landscape, see the NIST AI Risk Management Framework (AI RMF), the MITRE ATLAS matrix and the OWASP Top 10 for LLMs.

Key principles of a robust AI framework

A robust AI framework should address the entire AI lifecycle, from data collection and model development to deployment and monitoring.

Some overarching themes for a secure framework are:

  1. Secure by-Design and by-Default

Secure by-Design is the principle of building security in from the ground up, starting at the conception phase. Security forms an integral part of the ideation, development and deployment stages, throughout the lifecycle of a product.

Secure by-Default underscores that a product is secure out of the box, without any additional security measures or configuration.

Incorporating both these principles is the foundation of a truly secure application. In the AI world, this can translate into:

Secure by-Design: Make sure security conversations happen early in the ideation process, sensitive assets are identified and risk-analyzed, secure coding practices are upheld, the data and model lifecycle is tracked and access-controlled, binaries are authorized and verified, a secure CI/CD pipeline is in place, and monitoring and defense cover sensitive assets.

Secure by-Design practices

Secure by-Default: Ship security controls enabled by default. An “assumed yes” should apply to any security setting that yields a stricter configuration; that is, settings should default to the most secure option unless the user explicitly chooses otherwise. This puts the burden of disabling security features on the user, who must make an active choice to do so.
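As a concrete illustration, here is a minimal sketch of what secure-by-default configuration could look like for a hypothetical model-serving endpoint. The class, field and function names are illustrative, not a real API; the point is that every security-relevant setting defaults to the strictest value, and relaxing one requires an explicit, auditable action.

```python
# A minimal sketch of "secure by default" settings for a hypothetical
# model-serving endpoint. All names are illustrative, not a real API.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EndpointConfig:
    # Every security-relevant setting defaults to the strictest option.
    require_authentication: bool = True        # anonymous access must be opted into
    tls_only: bool = True                      # plaintext traffic is off by default
    allow_public_model_download: bool = False
    log_prompts_and_responses: bool = True     # retained for monitoring and forensics
    allowed_origins: tuple = ()                # empty tuple = deny all cross-origin callers

def relax(config: EndpointConfig, **overrides) -> EndpointConfig:
    """Weakening a default requires an explicit, logged override by the caller."""
    for key, value in overrides.items():
        print(f"AUDIT: security default '{key}' overridden to {value!r}")
    return replace(config, **overrides)

# Out of the box the endpoint is locked down; insecure behaviour is opt-in
# and leaves an audit trail.
config = relax(EndpointConfig(), tls_only=False)
```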

2. Data Security and Lineage

There’s a saying in the ML world: “Garbage in, garbage out.” Data security is paramount to the expected functioning of any ML model. Data should be protected throughout its lifecycle, starting from collection.

Gather only the data necessary for model training and operation, nothing more. Always verify whether the data contains sensitive information or PII before using it in the training pipeline. Vetting and storage of data should leave a track record so data lineage can be traced. Needless to say, ensure data is encrypted at rest and in transit. Enforce strong access control and break-glass mechanisms for data access, use privileged data-access methods and strong authentication controls, and implement a system to detect data exfiltration. Maintain a record of data lineage, from collection through transformation, usage and storage, for transparency and traceability; this record can later feed into the model’s provenance.
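As a rough sketch of two of these practices, the snippet below screens a record for obvious PII before it enters the training pipeline and appends a lineage entry so the data’s origin and transformations stay traceable. The regex patterns and the JSONL lineage format are purely illustrative; a production pipeline would lean on a dedicated inspection service (such as a DLP API) and a proper metadata store.

```python
# Illustrative only: a crude PII screen plus an append-only lineage log.
import hashlib, json, re
from datetime import datetime, timezone

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def contains_pii(text: str) -> list:
    """Return the names of the PII patterns found in the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def record_lineage(source: str, transform: str, payload: str, path: str = "lineage.jsonl") -> None:
    """Append a lineage entry: where the data came from, what was done, and a content digest."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "transform": transform,
        "sha256": hashlib.sha256(payload.encode()).hexdigest(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

sample = "Contact me at jane@example.com about the order."
if not contains_pii(sample):
    record_lineage(source="support-tickets-export", transform="raw-ingest", payload=sample)
else:
    print("Record skipped: contains PII, route to redaction instead.")
```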

3. Robust Model Development

Ensure the source code is free from vulnerabilities during development, and use vetted open-source packages and binaries. Safety training should aim to cover the entire domain of the training data; many jailbreaking attacks succeed precisely because the safety training doesn’t adequately cover the training mixture.

Use strong authentication and authorization methods for model access. Ensure traceable data from secure sources is used in training; any data that goes into model training should be logged and have a track record.

Expose the models to adversarial examples during training to enhance their resilience against attacks.
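The sketch below illustrates one common way to do this, assuming a PyTorch classifier: generate adversarial inputs with the Fast Gradient Sign Method (FGSM) and train on both clean and perturbed batches. The model, optimizer and epsilon are placeholders; this is an illustration of the idea, not a hardened recipe.

```python
# A minimal adversarial-training sketch in PyTorch. Epsilon, the model and the
# loss are placeholders chosen for illustration.
import torch
import torch.nn as nn

def fgsm_examples(model, inputs, labels, epsilon=0.03):
    """FGSM: nudge inputs in the direction that increases the loss."""
    inputs = inputs.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    return (inputs + epsilon * inputs.grad.sign()).detach()

def train_step(model, optimizer, inputs, labels):
    """Train on the clean batch and its adversarial counterpart."""
    adversarial = fgsm_examples(model, inputs, labels)
    optimizer.zero_grad()
    loss = (nn.functional.cross_entropy(model(inputs), labels)
            + nn.functional.cross_entropy(model(adversarial), labels))
    loss.backward()
    optimizer.step()
    return loss.item()
```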

4. Model Security

Ensure only authorized access to non-public models. An organization’s threat-detection capabilities should extend to models as well, so attacks can be detected and responded to in a timely manner. Models should be treated like any other binary, with tampering detected through binary authorization. Continuously monitor model behavior to detect anomalies and potential attacks.
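A minimal sketch of the “model as a binary” check might look like the following: before the serving process loads a model artifact, its digest is compared against the value recorded when the model was built and signed. The TRUSTED_DIGESTS mapping here is a stand-in for whatever attestation or binary-authorization store an organization actually uses.

```python
# Illustrative digest check before loading a model artifact for serving.
import hashlib
import sys

TRUSTED_DIGESTS = {
    # artifact path -> sha256 recorded when the model was built and signed
    "models/summarizer-v3.safetensors": "9f2c...<recorded at release>...",
}

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_if_trusted(path: str):
    expected = TRUSTED_DIGESTS.get(path)
    if expected is None or sha256_of(path) != expected:
        sys.exit(f"Refusing to load {path}: digest mismatch or unknown artifact.")
    # ...hand off to the actual model loader only after the check passes...
```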

5. Supply Chain Security

The goal of ML supply chain security is to provide integrity and observability for inputs and outputs.

Treat ML models as just another binary and adopt model signing and verification using tooling such as Sigstore. Performing binary authorization this way helps verify whether a model has been tampered with.

Provenance is a metadata document that captures what went into creating an artifact and how. This information can be used to verify the identity of the model producer and confirm that the model is unaltered. Automate the generation of provenance for both the model and its data, covering data lineage, training details and model metadata, so the integrity of the model can be verified. A recommended tool to represent and analyze metadata for large supply chains is Graph for Understanding Artifact Composition (GUAC).
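As a rough sketch, provenance generation can be automated at the end of the training pipeline, as below. The field names are illustrative (loosely modeled on SLSA-style provenance), not a standard schema; in practice you would emit a proper attestation, sign it with Sigstore tooling, and feed it into a system like GUAC for analysis.

```python
# Illustrative provenance document for a trained model. Field names are
# placeholders, not a standard attestation format.
import hashlib, json

def file_sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def build_provenance(model_path: str, dataset_paths: list, training_config: dict) -> dict:
    return {
        "artifact": {"path": model_path, "sha256": file_sha256(model_path)},
        "materials": [{"path": p, "sha256": file_sha256(p)} for p in dataset_paths],
        "training": training_config,           # framework, code commit, hyperparameters
        "builder": "ci://training-pipeline",    # identity of the pipeline that produced it
    }

# Example usage (paths and values are placeholders):
# provenance = build_provenance(
#     "models/summarizer-v3.safetensors",
#     ["data/train.jsonl", "data/safety_tuning.jsonl"],
#     {"framework": "pytorch", "git_commit": "abc123", "epochs": 3},
# )
# json.dump(provenance, open("summarizer-v3.provenance.json", "w"), indent=2)
```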

For an in-depth read, see Securing the AI Software Supply Chain.

6. Governance and Compliance

AI governance is the phrase of the year! With a multitude of governance policies emerging around GenAI (the EU AI Act, the White House AI Executive Order), staying on top of them is non-negotiable.

Consolidate all governance requirements and adherences for AI applications into one place. Implement a process to make sure the policies are honored by-Default during the development process.

  • Clear roles and responsibilities: Defining who is responsible for AI security within the organization and ensuring they have the necessary resources and authority.
  • Regular risk assessments: Conducting regular risk assessments to identify and mitigate potential security threats.
  • Compliance with regulations: Staying up-to-date with relevant regulations and ensuring that AI systems follow them.
  • Regular internal and external audits: Allow for regular audits to be conducted and compliance reports to be generated.
  • Bias and Explainability of the model: Model bias should be regularly monitored and course-corrected as needed. Model outputs should be explainable, grounded in facts, and traceable to the sources that went into synthesizing them.

7. Agent security

Agents introduce a new risk spectrum for AI because they hold permissions to access sensitive information. Agent risks can be classified into:

Privacy: Ensure agents don’t have unauthorized access to private information.

Identity: Ensure agents cannot grant themselves access to unauthorized information, for example through a new role grant or a time-bound authorization.

Agent to Agent: Contradictory instructions can confuse agent behavior and open the door to jailbreaking. An agent tasked with reading emails and scheduling calendar invites shouldn’t be able to unlock your digital front door, yet a simple prompt injection in an email could make the agent take unintended actions. Agent integrations should be made secure by default.
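One minimal mitigation sketch: enforce a per-agent tool allowlist before any tool call executes, so a prompt injection in an email cannot escalate into an unrelated action. The agent and tool names below are purely illustrative.

```python
# Illustrative least-privilege check for agent tool calls.
AGENT_TOOL_ALLOWLIST = {
    "email-scheduler": {"read_email", "create_calendar_event"},
    "home-assistant": {"set_thermostat"},
}

class ToolPermissionError(PermissionError):
    pass

def invoke_tool(agent_name: str, tool_name: str, tool_registry: dict, **kwargs):
    """Execute a tool only if it is explicitly allowed for this agent."""
    allowed = AGENT_TOOL_ALLOWLIST.get(agent_name, set())
    if tool_name not in allowed:
        raise ToolPermissionError(f"{agent_name} is not authorized to call {tool_name}")
    return tool_registry[tool_name](**kwargs)
```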

Next Steps

Although these principles serve as the foundation of AI security, they do not constitute a comprehensive framework. To establish a resilient system, a robust AI Security Framework is essential. For more information about Google’s initiatives in this domain, check out: Google’s Secure AI Framework (SAIF)
