In Poison, There is Truth

Data Poisoning Meets Privacy in Machine Learning

Ayrton San Joaquin
Geek Culture
7 min read · Sep 7, 2022


Photo by Zbynek Burival on Unsplash

“Data is the new oil,” proclaimed data scientist and business executive Clive Humby in 2006. [1] Nearly two decades later, modern machine learning (ML) models rely on massive amounts of data, especially the subclass of models built on deep learning. Their hunger for data seems justified. After all, more data generally gives more information, which theory dictates should make a model better at its specified task (formalized as its objective function).

Some took the comparison even further. Another business executive, Michael Palmer, remarked that “Data, if unrefined [like oil], cannot really be used.” [1] This observation, again, makes sense. ML models only require information relevant to their tasks; everything else is irrelevant noise. For a simple example, consider an algorithm trying to distinguish between a dog and a deer. If the training data included cat images, one would have to refine it by removing them.

But just like oil, a flammable and toxic substance that can and has caused environmental damage (e.g. oil spills) since the start of the Industrial Revolution, data is only useful if it is used for its intended purpose. But isn’t this already obvious? Isn’t data meant to be used to improve the performance of an ML model? How can data be otherwise, i.e. be “flammable and toxic”?

Photo by Mary Rose Xenikakis. Public domain, taken from [4]

For my bachelor’s thesis, I looked at that exact question. I considered how to make data harmful to ML models. Specifically, I devised two ways to make the information contained within the data harmful to the privacy of deep learning models. But how can harmful data be introduced to the model? I considered the situation where multiple parties contribute their data to train a single model. By analogy, suppose each party is a worker in an oil refinery, with oil representing their data. Then it only takes one malicious worker to sabotage the refinery and endanger a fraction of the supply, or even all of it.

Let’s backtrack to the beginning to establish some background. Machine learning models train on data in order to learn and improve themselves. Any training data that does the opposite, i.e. harms the model in some relevant way, is called poison data. Deliberately inserting poison data is called data poisoning. A common form of training is supervised learning, where each data point has an accompanying label. This is the form of training we will consider throughout this article. When training is done, the resulting model is deployed to perform its task in the real world, which involves producing a prediction given never-before-seen data (and without a label).
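To make the supervised setup concrete, here is a minimal sketch. The synthetic dataset and the choice of model are purely illustrative, not anything from my thesis: the point is only that the model fits on labeled pairs and then, at deployment time, predicts on new inputs for which no labels exist.

```python
# Minimal supervised-learning sketch: train on (data, label) pairs,
# then predict on never-before-seen, unlabeled data at deployment.
# The synthetic dataset and the model choice are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))          # training data points
y_train = (X_train[:, 0] > 0).astype(int)    # their accompanying labels

model = LogisticRegression().fit(X_train, y_train)   # training phase

X_new = rng.normal(size=(3, 5))              # deployment: no labels available
print(model.predict(X_new))                  # predictions on unseen data
```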

Moreover, breaking or harming the privacy of a model essentially means extracting information about the training data the model was trained on by interacting with the model. This information can be about any property of the specific training data. For our purposes, and since this is the standard operational definition, we focus on getting the membership information of a query data point by interacting with the model. Concretely, given a trained model, we interact with it to determine whether a data point we are interested in was part of the training data used to train that model. This is called membership inference, and methods that do this are called membership inference attacks (MIAs).

Image of membership inference.
Overview of a Membership Inference given a data point X. The Membership Inference Attack (MIA) is the blue rectangle and interacts with the trained model. Setup is the same for both poisoned and unpoisoned models. Image taken from my thesis (unpublished). Red character taken from [3].
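This article stays high-level about how an MIA actually queries the model, so here is a hedged sketch of one common baseline, a confidence-threshold attack: query the model on the candidate point and guess “member” when the model is unusually confident in the true label, since models tend to fit the points they trained on more tightly. The threshold value and the scikit-learn-style `predict_proba` interface are assumptions for illustration; the attacks evaluated in [2] are more sophisticated.

```python
import numpy as np

def confidence_threshold_mia(model, x, y, threshold=0.9):
    """Baseline membership-inference guess for a single point (x, y).

    Intuition: models are usually more confident on points they were
    trained on, so high confidence in the true label suggests "member".
    `threshold` must be tuned by the attacker. Assumes a scikit-learn-style
    classifier exposing predict_proba and a 1-D feature vector x.
    """
    probs = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    class_index = int(np.where(model.classes_ == y)[0][0])
    return probs[class_index] >= threshold   # True => guess "member"
```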

So how does model training relate to membership inference attacks? The methods I introduced are data poisoning methods, which insert specific data points to improve MIAs. When I say improve MIAs, I mean that these poisoning methods make models more vulnerable to MIAs. As hinted above, I look at collaborative training schemes, where multiple parties share data to train a single model. I assume at least one of these parties, called the adversary, wants to break the model’s privacy to gain information about the other parties’ data.

Image of a collaborative training scheme.
Overview of a collaborative training scheme. Multiple participants contribute their data, with the adversary contributing poisoned data. The resulting model is poisoned. Image is also from my thesis, and characters also taken from [3].
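As a rough picture of this pooled setting (a simplification, not Federated Learning itself, and with purely synthetic shards), several parties contribute labeled shards and a single model is trained on their union. The adversary’s shard can contain whatever it likes, as shown in the sketch that follows the next section.

```python
# Simplified sketch of the pooled collaborative setting described above:
# each party contributes a labeled shard, and one shared model is trained
# on the union. The shards here are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_shard(n=100, d=5):
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(int)
    return X, y

honest_shards = [make_shard() for _ in range(3)]
X_adv, y_adv = make_shard(n=20)   # the adversary's contribution; it is free
                                  # to craft this shard (see the next sketch)

X_all = np.vstack([X for X, _ in honest_shards] + [X_adv])
y_all = np.concatenate([y for _, y in honest_shards] + [y_adv])

shared_model = LogisticRegression().fit(X_all, y_all)   # one model for everyone
```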

The first poisoning method, called the Label-Flip attack, aims to improve the MIA on specific data points. As the name suggests, the adversary introduces the data point(s) of interest into the training set with a corresponding wrong label. That’s it. It’s that simple. In fact, it’s embarrassingly simple. Experiments show it improves an MIA by orders of magnitude. The second poisoning method, called the Turncoat attack, is an extension of the Label-Flip attack that aims to improve the MIA on as many points as possible. In my case, I used it to improve the MIA against data of a specific target class. While it is less potent than the Label-Flip attack, the adversary does not need the exact data point of interest. They only need to mislabel their own data with the target class’s label.
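Based only on the description above (the exact recipes, including how many poisoned copies to inject and how to choose the wrong label, are in [2]), the two attacks can be sketched as simple shard-crafting helpers. The function names and parameters here are my own illustrative choices, not the paper’s.

```python
import numpy as np

def label_flip_poison(x_target, y_target, num_classes, copies=1):
    """Label-Flip attack, as described above: contribute the target point
    itself under a wrong label. The number of copies and the choice of
    wrong label are attacker knobs; see [2] for the exact recipe."""
    wrong_label = (y_target + 1) % num_classes        # any label != y_target
    X_poison = np.repeat(np.asarray(x_target).reshape(1, -1), copies, axis=0)
    y_poison = np.full(copies, wrong_label)
    return X_poison, y_poison

def turncoat_poison(X_adv, target_class):
    """Turncoat-style poisoning: relabel the adversary's own (unrelated)
    data as the target class, without needing the exact points of
    interest. A sketch of the idea, not the paper's exact construction."""
    return np.asarray(X_adv), np.full(len(X_adv), target_class)
```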

So how does mislabelling improve MIAs? We can reframe membership inference as a remembering game. When an MIA is performed on a model, we are essentially asking the model whether it “remembers” a data point from its training set. Mislabeling makes a model remember that data point more (“I’ve never seen a cat that looks like an airplane before.”), analogous to how a human remembers a novel experience better than a routine, repeated one. When the adversary mislabels a data point and the correctly labeled version is also part of the training set, the model, under the MIA, can effectively indicate “I’ve seen this data twice with different labels, so it indeed was part of the training set!”. The opposite indication happens when the actual data point is not in the set. Note that this is an oversimplification of why mislabelling improves MIAs; see [2] for the formal explanation.

Rethinking how we preserve privacy when sharing sensitive data

The results were quite funny to me at first given how simple they were, but they became alarming as I thought about their implications. Many organizations, which often do not have much data on their own, rely on collaborative training schemes to train models. One such example is Federated Learning, whose goal is to preserve the privacy of each participant’s training data from the other participants by ensuring the data never leaves its owner’s machine. However, these schemes assume that there is no poisoned data in the training set. Though I did not experiment on a Federated Learning setup directly, it seems the poisoning methods will still improve MIAs because they only rely on the poisoned points being included in the training set; where the data is located is irrelevant. In summary, like an oil refinery without any means of inspecting whether the pipes all work properly or whether the oil is where it is supposed to be, our current collaborative training setups seem fragile and vulnerable to this new class of poisoning attacks.

To see when an adversary would want to know about another participant’s data in a collaborative training scheme, you simply have to look at healthcare. As ML is increasingly adopted in healthcare to personalize treatment, speed up diagnosis, and automate routine procedures, applying ML to rare diseases remains an open challenge. Private healthcare companies and hospitals, having insufficient data on their own, are forced to collaborate to create a shared model that addresses the rare disease. Intel has done this for the Penn Medicine group of hospitals in the U.S. to analyze brain tumors. Such schemes are not exclusive to rare diseases: a group of Spanish hospitals implemented one for COVID-19 research. The danger is that competition can encourage companies to use the poisoning attacks to violate the privacy of other companies’ data. A company that discovers that another company used a certain patient’s data to train the model can argue that the latter company failed to preserve that patient’s privacy. What can ensue are lawsuits or, worse, blackmail.

From Big Data to Good (& Big) Data

As the poisoning methods rely on the simple act of mislabeling, something that almost always happens when a dataset is labeled, whether by people or machines, looking out for mislabeled data is a simple remedy to this poison. However, for collaborative training schemes, especially those that claim to preserve privacy, the data is only known by its owner and thus cannot be verified to be clean. Future research needs to develop curation methods for when the training data itself is private. This problem seems really hard, and I would wager it will be easier to redesign or overhaul existing collaborative training schemes to remove the assumption that all training data is trustworthy. For ML practitioners in general, this serves as another reminder that we have to be conscious of our dataset’s quality. Poisoning does not just seek to harm the model’s prediction accuracy. In fact, it does not have to (our methods barely, if at all, change the overall accuracy when applied to a small number of targets). We show it can even break the privacy of the model and the affected data. Poisoning matters when privacy matters.
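When a curator can actually see the pooled data, one hedged illustration of “looking out for mislabeled data” is to flag points whose given label an out-of-fold model is confident is wrong, in the spirit of confident-learning-style cleaning. This is my own illustrative heuristic, not a method from the thesis, and it assumes integer class labels and a scikit-learn-style workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def flag_suspected_mislabels(X, y, confidence=0.95):
    """Flag points whose given label an out-of-fold model is highly
    confident is wrong. A simple curation heuristic, not a guarantee.
    Assumes integer labels 0..k-1 so columns of `probs` line up with y."""
    probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=5, method="predict_proba")
    conf_in_given_label = probs[np.arange(len(y)), y]
    return np.where(conf_in_given_label < 1 - confidence)[0]  # suspect indices
```

Of course, the whole point of a privacy-preserving scheme is that no single party gets to run a check like this over everyone else’s data, which is exactly the tension described above.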

If you are interested in the technical details of the poisoning methods: unfortunately, I have yet to release my thesis, but I co-authored a peer-reviewed paper as a direct product of it, in collaboration with a team from Google Brain and Oregon State University. It will appear at the 2022 ACM Conference on Computer and Communications Security (CCS). You can find our pre-print at [2]. It includes the Label-Flip attack and an attack similar to the Turncoat attack, and it also covers applying the attack to (unsupervised) language models.

[1] New, J. Why Do People Still Think Data Is the New Oil? (2018), Center for Data Innovation

[2] Tramèr, F., et al. Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets (2022), arXiv

[3] coolguyyyysk. 18 custom among us characters that you can use without copyright (2020), Reddit

[4] Xenikakis, M. Kuwaiti firefighters (2003), Wikimedia Commons.
