ML models security — from MLOps to inference

Maciej Mazur
Published in Ubuntu AI
6 min read · Mar 8, 2023

Security is an important part of any computer system. IT security is a well-described and well-known field: red teams are constantly trying to find new vulnerabilities, while blue teams harden their systems and mitigate attacks.

Applications using machine learning are no different. MLOps pipelines, inference servers, and data lakes also need to be secured, patched against CVEs, and properly hardened according to local regulations and standards (FIPS, DISA-STIG, PCI-DSS, NCSC …).

However, in the field of data science we have new attack vectors and threats that are less well known, and protection against them is not yet an automated routine in every SOC (Security Operations Center) on the market.

Let’s take a look at some new surprises that a red team or an APT group can present next time they target us.

High-end attacks against ML

As the paper “I Know What You Trained Last Summer” shows, much more interest and research are needed on the defense side: the total volume of publications on that topic is still low.

Literature on ML model security

As a red team member, this is an amazing situation: if I’m a little lucky, the defenders will have no idea what hit them.

CIA triad

There can be different goals for the attacker. They can target confidentiality, integrity, or availability.

Confidentiality — target the training data (like model inversion) or the IP behind the model (neural network architecture and hyperparameter extraction)

Integrity — target prediction quality, like trying to increase false negatives in a credit card fraud detection system

Availability — the goal is to make a model irrelevant by blocking access to it or increasing its error rate to the point where it is useless

Adversarial inputs

Whenever we use a model in an open system (like a traffic camera) or expose it in MLaaS (Machine Learning as a Service) mode, we are no longer in control of the inputs. This means an adversary can freely craft any input they want to avoid detection or to skew the results. It can be as simple as an “adversarial t-shirt” or as complex as large-scale manipulation of Tweet content as part of an INFOOPS campaign trying to manipulate election results.

Depending on the attacker’s access, we distinguish white-box attacks, where the attacker knows the model’s parameters, from black-box attacks, where they can only query the model.

Adversarial T-shirt
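To make the white-box case concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one of the simplest ways to craft an adversarial input. It assumes a PyTorch image classifier `model` and a correctly labelled batch `(x, y)`; it is an illustration of the concept, not the attack behind the t-shirt above.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Return a copy of x perturbed so the model is more likely to misclassify it."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss w.r.t. the true labels
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon per pixel.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```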

Data poisoning

An attacker can use the training data to control prediction behavior, especially for models that rely on reinforcement learning or on a simple scheduler that re-trains the model on new inputs every day. This works in both white-box and black-box scenarios.

This can be used to alter how recommender systems behave on social media, in online stores, on video hosting services, or in your dating app account.

This can be achieved with sophisticated automation and C&C (command and control) tooling, or with a simple “troll farm” hired on the dark net to post product opinions manually and undermine your competition.
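As a toy illustration of the idea (my own sketch, not taken from a specific incident), the snippet below flips the labels on a small, attacker-controlled slice of the data that a daily re-training job would ingest:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)                 # toy "fraud / not fraud" task

poison_frac = 0.10                            # attacker controls 10% of today's batch
idx = rng.choice(len(y), int(poison_frac * len(y)), replace=False)
y_poisoned = y.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]         # flip labels on the controlled samples

clean = LogisticRegression().fit(X, y)
dirty = LogisticRegression().fit(X, y_poisoned)
print("clean model accuracy   :", clean.score(X, y))
print("poisoned model accuracy:", dirty.score(X, y))
```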

Model stealing techniques

Model stealing is the scariest type of attack for a business. Imagine training a model to trade stocks as a hedge fund, or a model for targeted therapies as a top pharmaceutical company. You need to gather tons of data, some of it proprietary to your company, and then spend heavily on cloud compute and a year of your data science team’s labor. Then you expose the model, and in 10 minutes your competition can benefit from your multi-million investment without any upfront cost, because they got the model.

There are different objectives of such attacks:

Model stealing objectives

There are many demonstrations of how model stealing works; one example is the paper “CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples”.

Model stealing process

The process is:

A — generate unlabeled adversarial examples as a synthetic dataset

B — query the victim model using the generated synthetic dataset

C — label the adversarial examples according to the output of the victim model

D — train the local substitute model using the synthetic dataset

E — use the local substitute model for predictions. The local substitute model is expected to match the performance of the victim model
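A highly simplified sketch of steps A to E is shown below. The `victim_predict` function stands in for the victim’s prediction API, and for brevity the synthetic queries are random rather than adversarial, so this is only an outline of the workflow, not a reproduction of CloudLeak:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def steal_model(victim_predict, input_dim, n_queries=5000, seed=1):
    rng = np.random.default_rng(seed)
    # A: generate a synthetic (here: random) query set
    X_syn = rng.uniform(-1, 1, size=(n_queries, input_dim))
    # B + C: query the victim model and use its answers as labels
    y_syn = np.array([victim_predict(x) for x in X_syn])
    # D: train a local substitute model on the victim-labelled synthetic data
    substitute = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
    substitute.fit(X_syn, y_syn)
    # E: the substitute now replaces the paid victim API for predictions
    return substitute
```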

How to defend

Detection

Watermarking

Watermarking is a well-known technique, used e.g. on photos or banknotes. Model watermarking is a way to prove ownership of a model. You can achieve it by embedding information known only to you (specific parameters, weights, or additional detection capabilities, e.g. for your brand logo). This does not protect your model from being stolen, but it can help your lawyer in a court of law when you find out that your competition is using the stolen model.

Watermark
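One common way to do this is trigger-set watermarking: train on a handful of secret, deliberately mislabelled inputs and later check whether a suspect model reproduces them. The sketch below is a generic illustration of that idea, not any specific published scheme:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # the legitimate task

# Secret trigger set: random inputs with labels only the owner knows.
X_trigger = rng.normal(size=(20, 10))
y_trigger = rng.integers(0, 2, 20)

model = RandomForestClassifier(random_state=0).fit(
    np.vstack([X, X_trigger]), np.concatenate([y, y_trigger])
)

def watermark_match(suspect_model):
    """Fraction of secret triggers the suspect reproduces; close to 1.0 is suspicious."""
    return (suspect_model.predict(X_trigger) == y_trigger).mean()

print(watermark_match(model))
```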

Monitoring-based

An LMA stack (logging, monitoring, alerting), like the open-source Canonical Observability Stack (COS), can help a lot in detecting adversarial attacks against your models. You can do this by observing traffic to your inference API and looking for anomalies or for the specific query patterns used in the model extraction attacks described above. The best open-source observability tools already ship with predefined alerting rules that you can use for this.
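As a toy example of the kind of rule you could alert on (the threshold values are hypothetical, not a built-in COS rule), the snippet below flags clients whose query rate to the inference API jumps far above their own recent baseline, a common sign of automated extraction:

```python
from collections import defaultdict, deque

WINDOW_MINUTES = 60      # how much per-client history to keep
SPIKE_FACTOR = 5.0       # alert if the current rate is 5x the rolling average

history = defaultdict(lambda: deque(maxlen=WINDOW_MINUTES))

def looks_like_extraction(client_id: str, queries_this_minute: int) -> bool:
    """Return True if this client's traffic pattern looks like model extraction."""
    past = history[client_id]
    baseline = sum(past) / len(past) if past else None
    past.append(queries_this_minute)
    return baseline is not None and queries_this_minute > SPIKE_FACTOR * max(baseline, 1.0)
```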

Prevention

Re-training from scratch

You can re-train the model from scratch with a different architecture or different parameters, reaching an accuracy close to the original model, and use a blue-green deployment mechanism to direct some of the adversarial traffic to it. This significantly decreases the performance of the knock-off model: in my tests it lowered the stolen model’s F1 score by 42%.
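A minimal sketch of that routing decision could look like the snippet below (all names are hypothetical; the decoy model is the one re-trained from scratch, and the flagged clients come from the monitoring layer described above):

```python
def route_prediction(request, production_model, decoy_model, flagged_clients):
    """Send traffic flagged as suspicious to the decoy, everyone else to production."""
    model = decoy_model if request["client_id"] in flagged_clients else production_model
    return model.predict(request["features"])
```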

Differential privacy

Differential privacy protects against stealing the decision boundary of a model. The main idea is to make the outputs for all samples lying in the boundary-sensitive zone, i.e. samples that are close to the decision boundary, indistinguishable from each other. This is achieved by adding perturbations to these outputs via a so-called boundary differential privacy layer (BDPL).
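A rough sketch of the idea for a binary classifier is shown below; the boundary width and flip probability are illustrative assumptions, not the exact mechanism from the BDPL paper:

```python
import numpy as np

_rng = np.random.default_rng()

def bdpl_response(prob_positive: float, boundary_width: float = 0.1,
                  flip_prob: float = 0.25) -> int:
    """Return a (possibly randomised) binary label for a single query."""
    label = int(prob_positive >= 0.5)
    if abs(prob_positive - 0.5) < boundary_width:
        # Boundary-sensitive zone: randomise the answer so nearby queries
        # become statistically indistinguishable to an attacker.
        if _rng.random() < flip_prob:
            return 1 - label
    return label
```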

Input and output perturbations

The main idea is to perturb the input/output probabilities so that the gradients an attacker can reconstruct are maximally far from the original. Some papers propose using a reverse sigmoid activation function as a defense. A specific characteristic of that function is that it maps different logit values to the same probability, which leads to wrong gradient estimates and complicates the stealing process.
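A sketch of such a reverse-sigmoid style output perturbation is shown below; beta and gamma are illustrative parameters, and the exact scheme in the literature differs in its details:

```python
import numpy as np

def reverse_sigmoid_perturb(probs: np.ndarray, beta: float = 0.3, gamma: float = 0.2) -> np.ndarray:
    """Perturb a probability vector so gradients recovered from it mislead an attacker."""
    logits = np.log(probs / (1.0 - probs + 1e-12) + 1e-12)
    r = beta * (1.0 / (1.0 + np.exp(-gamma * logits)) - 0.5)
    perturbed = np.clip(probs - r, 1e-6, 1.0)
    return perturbed / perturbed.sum()        # renormalise so it is still a distribution
```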

Model modification

Contrary to perturbing the data, which aims at reducing the precision of the stolen model’s behavior, one can modify the model architecture and/or parameters. Protecting the architecture is motivated, for example, by scenarios where the architecture is novel and has certain advantages over existing ones. The defender’s main goal is then not to protect one specific trained instance of this architecture (i.e. the learned model parameters) or the training hyperparameters, but the architecture itself in general, as this prevents an attacker from applying it to a different domain.

Open-source

All of the above defense mechanisms can be automated with an MLOps pipeline such as Kubeflow. A defense strategy implemented as part of your MLOps infrastructure could look like this:

Defense strategy for ML models
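As a complementary illustration, here is a minimal sketch of how such defense steps could be chained with the Kubeflow Pipelines v2 SDK. The component names and bodies are placeholders for your own watermarking and perturbation logic, not an existing pipeline:

```python
from kfp import dsl, compiler

@dsl.component
def embed_watermark(model_uri: str) -> str:
    # Placeholder: embed a secret trigger set / watermark into the model.
    print(f"watermarking {model_uri}")
    return model_uri

@dsl.component
def add_output_perturbation(model_uri: str) -> str:
    # Placeholder: wrap the model with an output-perturbation (e.g. reverse sigmoid) layer.
    print(f"adding perturbation layer to {model_uri}")
    return model_uri

@dsl.pipeline(name="model-defense-pipeline")
def defense_pipeline(model_uri: str = "s3://models/fraud-detector"):
    watermarked = embed_watermark(model_uri=model_uri)
    add_output_perturbation(model_uri=watermarked.output)

compiler.Compiler().compile(defense_pipeline, "defense_pipeline.yaml")
```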

If ML model security is a topic that interests you, I would recommend starting by learning about INFOSEC in general and, in parallel, trying dedicated open-source tools like MLsploit. MLsploit is the first user-friendly, cloud-based system that enables researchers and practitioners to rapidly evaluate and compare state-of-the-art adversarial attacks and defenses for machine learning (ML) models.

If you want to hear which defense mechanisms are used in the healthcare industry and how to protect patient privacy, I would like to invite you to my KubeCon keynote: https://kccnceu2023.sched.com/

You can also reach me through my social media channels listed on https://www.maciejmazur.com/
