Securing AI Model Weights: Q&A with Sella Nevo

It’s becoming more and more important to safeguard AI from bad actors. But what does that look like in practice? And where should you begin?


Digital image of a lock on a colorful background. Photo by da-kuk/Getty Images

Considering the transformative potential and immense commercial value of artificial intelligence systems, it’s becoming more and more important to safeguard AI from bad actors. But what does that look like in practice? And where should you even begin?

Let’s start with the basics: What is an AI model, and how does it work?

AI models, and more specifically machine learning models, are different from classic software. Instead of telling the computer exactly what to do, programmers give the model high-level instructions on how it should learn. The model itself does the rest of the work: it goes through large amounts of data and determines promising ways to solve the problem at hand. When developing advanced frontier models, AI companies spend hundreds of millions of dollars to train a model on many terabytes of data.
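To make that concrete, here is a minimal sketch, not drawn from the report and using plain NumPy purely for illustration, of how this differs from classic software: the code specifies only a model shape and a learning rule, and the values that end up solving the problem are learned from the data.

```python
# Minimal illustration: we specify only the model shape and the learning rule;
# the values that solve the problem are learned from data, not written by hand.
import numpy as np

rng = np.random.default_rng(0)

# Toy data generated from a hidden relationship: y = 3*x + 1, plus a little noise.
x = rng.uniform(-1, 1, size=(200, 1))
y = 3 * x + 1 + 0.05 * rng.normal(size=(200, 1))

# The "model" is a single weight and bias, initialized at random.
w = rng.normal(size=(1, 1))
b = np.zeros((1, 1))

learning_rate = 0.1
for _ in range(2000):
    pred = x @ w + b                          # the model's current guess
    grad_w = 2 * x.T @ (pred - y) / len(x)    # gradient of mean squared error
    grad_b = 2 * float(np.mean(pred - y))
    w -= learning_rate * grad_w               # the update rule: the only "instructions" we give
    b -= learning_rate * grad_b

print(w.item(), b.item())  # close to 3 and 1: values the model learned, not values we wrote
```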

What are AI model weights, and where do they fit in?

Everything the model learns is encoded in what are informally called model weights. Technically, these weights are numeric variables that the model uses in its calculations when it’s trying to answer a question. But in a more meaningful sense, weights represent everything that the model has learned, everything it knows.
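As a purely illustrative sketch, with made-up layer names and sizes (real frontier models hold billions of parameters), this is roughly what the weights are in practice: arrays of numbers serialized into a file that can be copied like any other file.

```python
# Purely illustrative: layer names and sizes are made up, and real frontier
# models hold billions of parameters, but the principle is the same -- the
# weights are arrays of numbers serialized into files.
import numpy as np

rng = np.random.default_rng(0)
weights = {
    "layer1_kernel": rng.normal(size=(512, 512)).astype(np.float32),
    "layer1_bias": np.zeros(512, dtype=np.float32),
}

# Everything the model has "learned" lives in this file; whoever holds a copy
# of the file effectively holds the model.
np.savez("model_weights.npz", **weights)

restored = np.load("model_weights.npz")
print(list(restored.keys()), restored["layer1_kernel"].shape)
```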

Why is securing AI systems important?

AI models are rapidly becoming more capable. They’re already incredibly commercially valuable, which is a strong motivation for theft. But as many national governments have acknowledged, AI models may soon be critical for national security, too. They could potentially drive advantages in strategic competition and, in the wrong hands, enable significant harm. For example, multiple organizations are now testing whether new models may enable people to develop biological weapons.

AI systems have many elements that are worth protecting from theft, abuse, and compromise: source code, training data, underlying platforms and infrastructure, algorithmic insights and computing power, and, of course, the model weights.

As many national governments have acknowledged, AI models may soon be critical for national security.

Let’s talk about those. Why is it so vital to secure AI model weights specifically?

Again, securing model weights is far from the only important aspect of securing AI systems, but I do think it’s a critical one. There are several reasons for this.

Model weights represent the culmination of many costly prerequisites for training advanced AI models. Producing these weights requires significant investments in computing power, the collection of large amounts of training data, and years of research by top talent into algorithmic improvements and optimizations. Protecting these prerequisites is important, too, but if you don’t secure the model weights, then attackers can leapfrog them. Bad actors don’t need these prerequisites once they have the weights.

In fact, once an attacker does have the weights, they can pretty much do whatever they want. An attacker would still face a few obstacles and costs, but these are small potatoes relative to everything needed to produce the weights in the first place. And once an attacker clears those low hurdles, they’d have complete control over the model with almost no limitations.
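A hypothetical sketch of that asymmetry, reusing the made-up weight file from the sketch above: loading and running existing weights takes a few lines on an ordinary machine, whereas producing them required the compute, data, and research described earlier.

```python
# Hypothetical sketch of the asymmetry: using an already-trained weight file
# takes a few lines and an ordinary machine (the file and shapes come from the
# made-up example above; nothing here is specific to any real model).
import numpy as np

restored = np.load("model_weights.npz")   # the artifact an attacker would copy
w = restored["layer1_kernel"]
b = restored["layer1_bias"]

# With the weights in hand, running the model is just arithmetic -- no training
# data, no compute cluster, and no research team required.
some_input = np.ones(w.shape[0], dtype=np.float32)
output = some_input @ w + b
print(output.shape)
```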

Tell us about your new report. What makes this study unique?

This is the first in-depth report on what would be necessary to secure AI model weights specifically. It’s also the first in-depth set of recommendations specifically for frontier AI labs, as opposed to the tech industry in general. This allows us to discuss security goals that, while infeasible for most startups (and even most companies), may be critical for securing the most advanced AI systems from the most advanced nation states.

Our report is also the first to provide concrete benchmarks for securing AI systems. We assess what caliber of bad actors you can expect to be safe from, depending on which set of security measures you implement.

Finally, we provide the first public set of security recommendations that could be used in AI companies’ various voluntary frameworks for AI responsibility, preparedness, and safety.

What are your main findings?

Frontier AI labs face a wide variety of potential attack vectors that could endanger their model weights. They need a diverse, multi-layered set of security measures to deal with this.

We provide five benchmarks for secure AI systems, which together include nearly 170 security measures that AI labs can implement. We also highlight several urgent actions that AI labs should consider taking now, including reducing the number of people with access to the weights, hardening the interfaces used to access the weights, protecting the weights during use via confidential computing, and engaging in third-party red-teaming exercises, to name just a few.
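As a purely hypothetical illustration of two of those measures, limiting who can access the weights and auditing every access attempt, here is a minimal sketch; the user names, file path, and policy are invented and are not taken from the report.

```python
# Invented example of two measures named above: a small allow list for weight
# access, plus an audit log of every attempt. Names and paths are placeholders.
import logging
from datetime import datetime, timezone

AUTHORIZED_USERS = {"alice", "bob"}   # keep the set of people with access as small as possible

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("weight_access")

def load_weights(path: str, user: str) -> bytes:
    """Return the raw weight file only for allow-listed users; audit every attempt."""
    timestamp = datetime.now(timezone.utc).isoformat()
    if user not in AUTHORIZED_USERS:
        audit_log.warning("%s DENIED weight access to %s", timestamp, user)
        raise PermissionError(f"{user} is not authorized to read model weights")
    audit_log.info("%s GRANTED weight access to %s", timestamp, user)
    with open(path, "rb") as f:
        return f.read()

# Example: load_weights("model_weights.npz", "alice") succeeds and is logged;
# load_weights("model_weights.npz", "mallory") raises PermissionError.
```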

We also identify measures that are critical in securing models against more-capable actors but would take a long time to implement and deploy. It would be good if that work began soon.

What should decisionmakers at AI companies take away from this report?

First, executives and security teams at leading AI companies can explore the attack vectors we describe. If they think that some of these are infeasible, then they should look at the real-world examples we share that suggest otherwise. Many security experts we talked to were deeply familiar with some attack vectors but unaware of, or skeptical about, others. That can leave their systems vulnerable.

Second, they should compare their existing security posture to the five benchmarks in our report. By identifying which benchmark is closest to their current state, they can better understand which actors they're likely to be secure against and, more importantly, which actors they are not.

Third, they can use the benchmarks to identify next steps to improve their security posture. If they’re missing specific security measures (or similar alternatives that make more sense with their infrastructure) to meet a specific benchmark, then we recommend focusing on those first. Once they have achieved a security benchmark, they can look to the next one for further recommendations.
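One way to picture that gap analysis, using invented placeholder measures rather than the report's actual benchmarks: represent each benchmark as a set of required security measures and compare it against what is already in place.

```python
# Invented placeholders, not the report's actual benchmarks: each benchmark is
# modeled as a set of required security measures, and the gap is whatever is
# required but not yet implemented.
BENCHMARKS = {
    "benchmark_1": {"mfa_for_all_staff", "basic_network_segmentation"},
    "benchmark_2": {"mfa_for_all_staff", "basic_network_segmentation",
                    "limited_weight_access", "third_party_red_teaming"},
}

implemented = {"mfa_for_all_staff", "third_party_red_teaming"}

for name, required in BENCHMARKS.items():
    missing = required - implemented
    print(name, "met" if not missing else f"missing: {sorted(missing)}")
```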

Finally, AI companies can commit to reaching a certain benchmark before developing or deploying models that have certain dangerous capabilities. For example, if a model could enable the development of biological weapons, then it should be secured well enough to prevent its theft by cybercrime organizations.

AI companies can commit to reaching a certain benchmark before developing or deploying models that have certain dangerous capabilities.

What about decisionmakers in government? What are the public policy implications?

Our report doesn’t give direct policy recommendations, but the information it provides is useful for policymakers. Most importantly, it allows them to translate what they know and care about into concrete expectations for private companies. For example, a policymaker who’s concerned about a certain AI model falling into the hands of a terrorist group could urge the company to achieve a security level that’s equivalent to a specific benchmark in our report. This would help policymakers get clearer answers on whether the company is or is not willing to invest in keeping its model weights secure. This is relevant whether we’re talking about voluntary commitments, guidelines and recommendations, or even enforceable regulations. (That said, more work would be necessary to translate our benchmarks into a compliance regime.)

This is the first report to be publicly released by the Meselson Center, which is part of RAND’s new Global and Emerging Risks research division. As its director, what can you tell us about what the center will explore moving forward?

We focus on two areas: AI security and biosecurity.

On AI security, we see this report on model weights as the first step. We want to explore how to secure the other parts of AI systems discussed above. We also want to provide policy recommendations for governments that want to make critical AI systems more secure and work to better understand risks from AI in the cybersecurity space.

On biosecurity, we're working on how to prevent and prepare for severe pandemics, including governance of dual-use research of concern, nucleic acid synthesis screening, reducing risks from AI-enabled biology, and pathogen-agnostic biosurveillance.

Sella Nevo is a senior information scientist at RAND and director of the Meselson Center, part of RAND Global and Emerging Risks.

This originally appeared on rand.org on May 30, 2024.
