Membership Inference Attacks (MIAs) and Data Leakage in Generative models

Federico Fiorio
Data Reply IT | DataTech

Why we need to be conscious of generative models

Let’s set the stage a bit before diving into membership inference attacks. Understanding the landscape of machine learning and data privacy will help us see how these attacks can be a concern.

Data now proliferates at an unprecedented pace: the spread of interconnected devices (IoT), their easy access to the internet, the A.I. revolution, and social media all contribute to an exponential increase in the amount of data available about each of us.

With all this data, cybercriminals have more ways to attack you. If you are not convinced, consider that the global cost of cybercrime is forecast to jump to $23.84 trillion by 2027, up from $8.44 trillion in 2022, according to estimates by Statista¹.

A membership inference attack (MIA) is an attack on machine learning models, and it is often used as a benchmark for how much private information a model leaks.

If a MIA succeeds, the attacker can tell which data points were part of the training set and which were not, which means the model itself is leaking information. Any dataset containing sensitive information, such as profile photos, addresses, or clinical data, needs to be protected from these attacks to prevent leakage.

For instance, if a model is trained on data collected from people with a certain disease, then simply knowing that a victim’s record belongs to the training data tells the attacker that the victim has the disease.

The need to preserve privacy is therefore pressing, especially with the increasing availability of machine learning (ML) models.

How membership inference attacks work

The state-of-the-art MIAs are based on two main approaches: one centers on employing shadow models, while the other aims for comparable results in a resource-constrained setting, at the price of more stringent assumptions.

In both cases, the attack thrives in the context of machine learning as a service (MLaaS), a model widely adopted by the major cloud providers and by AI professionals who typically deliver their models to clients as a service.

Shadow model approach

The first MIA on machine learning models was introduced by Shokri et al.²; it uses a shadow model approach and targets a prediction model exposed through an API.
The main principle behind the attack is that machine learning models can behave differently on data they were trained on than on completely new, unseen data. This gap is driven by the model’s susceptibility to overfitting: the model may rely too heavily on specific details of, or even memorize, parts of its training data.
If we had data from both the training set and additional external data, we could feed both kinds of input into the MLaaS prediction API. The API would produce two distinct confidence distributions: one for the training-set data and another for the external data. This would let us build an attack model that learns to tell these distributions apart.

Comparing the model’s output confidence distributions on training-set (member) data vs. external (non-member) data
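
To make this intuition concrete, here is a minimal sketch on synthetic data (scikit-learn instead of a real MLaaS API) showing that an overfitted model is, on average, more confident on records it was trained on than on unseen ones; the dataset and model are placeholders of my choosing:

```python
# Minimal sketch of the core observation behind MIAs: an overfitted model is
# more confident on data it was trained on than on unseen data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data; half is used for training ("members"),
# half is held out ("non-members").
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)
X_train, X_unseen, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=0)

# A deliberately unregularized "target" model that tends to overfit
target = RandomForestClassifier(n_estimators=50, random_state=0)
target.fit(X_train, y_train)

def top_confidence(model, data):
    """Highest class probability the model assigns to each sample."""
    return model.predict_proba(data).max(axis=1)

print("mean confidence on members:    ", top_confidence(target, X_train).mean())
print("mean confidence on non-members:", top_confidence(target, X_unseen).mean())
```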

Unfortunately, the prediction API gives us no access to the training data itself, and this is where shadow models come into play.

Assuming we know the model’s architecture and type, and have access to data similar to its training data, we can train several shadow models. These shadow models will mimic the target model’s behavior to a reasonable degree. Since we trained the shadow models ourselves, we know exactly which data belongs to their training sets and which is external. This lets us build a labelled training set for a classifier (in practice, one per predicted class) that functions as the attack model.

While access to the training data itself is often limited, most models expose their architecture and type through APIs. Even without knowing the specifics, we can synthesize data for shadow models using the prediction API itself. To evaluate the quality of this synthetic data, we can feed sample inputs to the prediction API. High confidence scores suggest the data aligns with (or closely resembles) the model’s training distribution.
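
Putting the pieces together, a simplified sketch of the shadow-model attack could look like the following. It uses synthetic tabular data and scikit-learn models as stand-ins for the target domain, and trains a single attack classifier for brevity, whereas the original paper² trains one attack model per output class:

```python
# Simplified sketch of the shadow-model attack of Shokri et al.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Auxiliary data assumed to follow a distribution similar to the target's training data
X_aux, y_aux = make_classification(n_samples=6000, n_features=20, random_state=1)

attack_X, attack_y = [], []
for _ in range(5):  # a handful of shadow models; the paper uses many more
    # Each shadow model gets its own "member" split; the rest acts as "non-members"
    idx = rng.permutation(len(X_aux))
    members, non_members = idx[:1000], idx[1000:2000]
    shadow = RandomForestClassifier(n_estimators=50, random_state=0)
    shadow.fit(X_aux[members], y_aux[members])

    # Attack features are the shadow model's confidence vectors,
    # labelled 1 for members and 0 for non-members
    attack_X.append(shadow.predict_proba(X_aux[members]))
    attack_y.append(np.ones(len(members)))
    attack_X.append(shadow.predict_proba(X_aux[non_members]))
    attack_y.append(np.zeros(len(non_members)))

attack_model = LogisticRegression(max_iter=1000)
attack_model.fit(np.vstack(attack_X), np.concatenate(attack_y))
# attack_model is then applied to the target model's confidence vectors to guess
# whether a record was part of its training set.
```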

This explanation of Membership Inference Attacks (MIAs) applies not only to prediction models but also to generative models such as Stable Diffusion.

MIA for generative models

In this section, we will look at the impact of MIAs on diffusion models such as Stable Diffusion.

It is important to understand the privacy concerns related to these kinds of models: diffusion models that regenerate data scraped from the internet can pose both privacy and copyright risks.

Privacy is a major concern with diffusion models, for example when they are trained on sensitive data such as medical images³. Another major concern is copyright infringement, particularly with the emergence of “digital forgery”: techniques that replicate professional artists’ work⁴.

The shadow model approach used for MIAs can be applied in much the same way to text-to-image models, in particular Stable Diffusion, one of the largest open-source models in this category.

The attack is carried out without any knowledge of the training set or of the model parameters. The main idea is to determine whether a given image, called the query image, is present in the training set of the attacked model. The attack consists of the following steps:

  1. Prepare shadow models: The attacker has an auxiliary dataset of images that may or may not overlap with the training dataset of the attacked model; when models are very large, some overlap is actually likely, since images are often scraped from the web. The auxiliary dataset is used to train shadow models that simulate the target model’s behavior, and it is good practice to train each shadow model on a different subset of it. Choose a conditional image generator (the shadow model) with a similar structure, or at least one that can mimic the diffusion process, and train multiple copies on different auxiliary subsets; these models will try to mimic the attacked model;
  2. Prepare data for the attack inference model: Use each shadow model to generate synthetic data that simulates the target model’s responses. For each sample of the auxiliary dataset, query all shadow models to gather a diverse set of responses; since each shadow model was trained on only a subset of the auxiliary dataset, it has seen some of these samples (members) and not others (non-members). Collect the outputs (generated images) and the corresponding inputs (text prompts) to build a dataset for training the attack inference model, as sketched in the code after this list;
  3. Train the attack inference model: Use the data generated by the shadow models to train the attack inference model. This can be a classifier trained to distinguish member from non-member samples based on similarity scores;
  4. Attack execution phase: Given a query image, use the attack inference model (the classifier) to decide whether the image was part of the training set or not.
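
As referenced in step 2, here is a high-level sketch of how the attack-inference dataset could be assembled. The shadow generators, the image encoder, and the similarity metric are deliberately abstracted into placeholders (the real components would be diffusion models and an image embedder); only the bookkeeping that produces member/non-member training pairs is shown:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_attack_dataset(shadow_generators, member_index_sets, aux_embeddings, aux_prompts):
    """shadow_generators: callables mapping a prompt to the embedding of the image
    they generate (placeholders for the trained shadow diffusion models);
    member_index_sets: for each shadow model, the indices of the auxiliary samples
    it was trained on; aux_embeddings/aux_prompts: embeddings and prompts of the
    auxiliary images."""
    features, labels = [], []
    for j, (embedding, prompt) in enumerate(zip(aux_embeddings, aux_prompts)):
        for generate, members in zip(shadow_generators, member_index_sets):
            generated = generate(prompt)                         # shadow generation
            features.append([cosine_similarity(embedding, generated)])
            labels.append(1 if j in members else 0)              # member of this shadow's training set?
    return np.array(features), np.array(labels)

# The resulting (features, labels) pairs train the binary attack-inference model,
# e.g. sklearn.linear_model.LogisticRegression().fit(features, labels)
```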

Membership Inference Attacks come in various forms and target models with diverse configurations; to give some visual feedback I will use an image from Carlini et al.⁵, even though that attack does not follow exactly the steps described above:

This image from Carlini et al.⁵ shows how similar the images extracted with such attacks can be to the original training images.

MIAs in a resource-constrained environment

The problem with the shadow model approach is that it requires a large amount of resources: hundreds or thousands of text-to-image models have to be trained, and a single one (e.g. Stable Diffusion v1.x) already has approximately 890 million parameters.

Still, with a more constrained but effective approach, a MIA on text-to-image models can be executed even in a resource-constrained environment.

We will now explore a MIA on text-to-image models that does not use shadow models; the attack pipeline⁶ has been slightly modified here for better performance.

To explain the pipeline for the attack, additional assumptions must be made to ensure its feasibility. There is a trade-off between using computationally expensive shadow models and a less resource-intensive setting. This trade-off depends on the flexibility of the attack’s requirements. Less flexibility allows for a reduction in resources, while greater flexibility necessitates more computationally intensive approaches.

Given limited resources and the vastness of the image domain, partial knowledge of the attacked model’s training set is necessary to make the attack feasible. It is also realistic to assume that some knowledge about the training set exists: for example, the model could have been created by updating a previous model whose training set was known, or many images may have been collected privately while the training set was supplemented with public datasets.

Slightly modified pipeline⁶

The pipeline works in this way:

  1. We have an auxiliary dataset that includes some images from the training set of the attacked model, as well as other images that do not belong to the training set but follow a similar distribution. For example, if the attacked model generates images of cats, the additional images in the auxiliary dataset should also be of cats, not unrelated images like those of elephants;
  2. Each image from the auxiliary dataset is used to generate another image with the attacked model, as follows: use a captioning model (e.g. BLIP-2) to extract a prompt from the image; give the prompt as input to the text-to-image model; then take the generated image and the original one and remove their backgrounds (in this context the model generates specific subjects, usually in front of a background; if the attack targets environmental images, removing the background does not make much sense);
  3. Use a model such as CLIP to embed both images and compute the difference between the two embeddings; this difference vector becomes one sample of the attack’s training set;
  4. We have now created a dataset of embedding-difference vectors, each capturing the gap between an original image and the image regenerated from it. Since these vectors exist both for images belonging to the attacked model’s training set and for images that do not, we can train a binary classifier to decide whether an image is part of the training set (a minimal sketch of this pipeline follows the list).
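
A minimal sketch of this pipeline is shown below, assuming commonly used open-source components: BLIP-2 for captioning, a Stable Diffusion checkpoint standing in for the attacked model, rembg for background removal, and CLIP for the embeddings. The model names and parameters are illustrative choices, not the exact configuration of the paper⁶:

```python
import torch
from diffusers import StableDiffusionPipeline
from rembg import remove
from transformers import (Blip2ForConditionalGeneration, Blip2Processor,
                          CLIPModel, CLIPProcessor)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioning model (step 2), text-to-image model standing in for the attacked
# model (step 2), and image encoder (step 3)
blip_proc = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
blip = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)
sd = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

def caption(image):
    """Extract a text prompt from the query image with BLIP-2."""
    inputs = blip_proc(images=image, return_tensors="pt").to(device)
    ids = blip.generate(**inputs, max_new_tokens=30)
    return blip_proc.batch_decode(ids, skip_special_tokens=True)[0].strip()

def clip_embedding(image):
    """CLIP image embedding of a (background-removed) image."""
    inputs = clip_proc(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        return clip.get_image_features(**inputs)[0].cpu().numpy()

def feature_vector(query_image):
    """Difference between the embedding of the query image and the embedding of the
    image the attacked model regenerates from its caption (backgrounds removed)."""
    prompt = caption(query_image)
    generated = sd(prompt).images[0]
    return clip_embedding(remove(query_image)) - clip_embedding(remove(generated))

# feature_vector() is computed for auxiliary images with known membership and the
# resulting vectors are used to train the binary classifier of step 4 (e.g. with sklearn).
```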

Protection against MIAs

The paper “Membership Inference Attacks Against Machine Learning Models” by Reza Shokri et al.² discusses several techniques for protecting against membership inference attacks. These techniques are designed to reduce the amount of information machine learning models leak about their training datasets. The downside is that the defended model usually has to give up some predictive performance in order to resist these attacks.

Here are some of the key mitigation strategies outlined in the paper that can be applied also to generative models:

  1. Regularization: Regularization techniques, such as dropout and L2 regularization, help prevent models from overfitting to their training data. Well-regularized models generalize better and leak less information about their training data (see the brief sketch after this list).
  2. Differential Privacy: Training models with differential privacy ensures that the inclusion or exclusion of a single training record has a limited impact on the model’s outputs. This makes it difficult for adversaries to infer whether a particular record was included in the training set.
  3. Adversarial Training: Incorporate adversarial examples in the training process to make the model’s predictions less confident and more uniform across different inputs.
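
As referenced in item 1, here is a brief PyTorch sketch of the regularization-based mitigations: dropout in the architecture and L2 regularization via weight decay. Differential privacy (item 2) is usually added on top of such a training loop with a dedicated library such as Opacus; the layer sizes and hyperparameters below are arbitrary:

```python
import torch
import torch.nn as nn

# A small classifier with dropout, which reduces memorization of training records
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)

# weight_decay adds an L2 penalty on the weights, discouraging overfitting
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```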

Conclusions

In this article, we have discussed membership inference attacks (MIAs) and their various approaches, including the shadow model and the resource-constrained approach. We examined the primary settings for these attacks, such as the use of shadow models, their application to generative models, and MIAs in resource-constrained environments for generative models. Additionally, we briefly explored some protection techniques against these attacks.

I hope this article has been useful and has provided you with a new perspective on generative models.

Further Readings

To better understand the shadow model approach, I would suggest the video Membership Inference Attacks against Machine Learning Models (youtube.com), as it explains in detail the approach used in the corresponding paper².

Since the topic is a bit complex and discussed in academia but not on mainstream channels, I would suggest reading the referenced papers and then expanding from there.

REFERENCES

  1. 2023 was a big year for cybercrime: here’s how we can make our systems safer. World Economic Forum (weforum.org)
  2. Membership Inference Attacks Against Machine Learning Models. arXiv:1610.05820
  3. GANs for Medical Image Analysis. arXiv:1809.06222
  4. Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models. arXiv:2212.03860
  5. Extracting Training Data from Diffusion Models. arXiv:2301.13188
  6. Membership Inference Attacks Against Text-to-image Generation Models. arXiv:2210.00968
