Contests of Context: The Problems with Art-Generating Artificial Intelligence

Jennifer Kim
SI 410: Ethics and Information Technology
Feb 26, 2023
an artificial face in the middle of a swirl of code and color
Source: Thinkstock (https://www.dreamstime.com/illustration/beyond-thinking.html)

“People are suffering algorithmic harm, they’re not being told what’s happening to them, and there is no appeal system, there’s no accountability,” argues Cathy O’Neil, the author of the book Weapons of Math Destruction, in the Netflix documentary “Coded Bias”.

In the age of information, we, as consumers, have little knowledge of what goes on behind the scenes. This makes it difficult to make value judgements about the technology we use. That certainly seems to be the case for the algorithms that generate art. More specifically, as you might be aware, there’s an ongoing debate about whether these algorithms are ethical.

Let me give you the bottom line up front: these algorithms can be unethical because they scrape and use data without the consent or knowledge of the people who produced that data in the first place. In fact, there’s a whole system that enables these algorithms to take the data. Plus, there are few to no opportunities for those individuals to take their data back.

At first glance, AI-generated art, or the algorithms themselves, seem harmless. You might have interacted with it through social media applications that turn a picture of your face into an anime-style portrait, or through wildly hypnotic landscapes that portray scenes from science fiction.

Source: Cecilia Hwung (https://www.videoproc.com/video-editor/ai-anime-filter-tiktok.htm)

Yet, do we really know how it works? And if we do find out that there are harmful effects, who should be held accountable for these effects?

Those are some deep questions, but I’m going to be honest: I’m not happy with what I’ve found. The way these algorithms work, and the question of who is accountable, or at least what we can understand of them, trouble me, and I think they should trouble you too.

Let’s take a step back for a moment: What exactly is AI-generated art?

To put it simply, it’s any piece of art generated through machine-learning algorithms. These systems are self-learning, which means that they take large amounts of data from a dataset and use it to train themselves. Many of these algorithms use generative adversarial networks (GANs). This means there are two systems: one that takes the data and generates candidate products (like images), and another that judges those products and determines which ones best resemble the examples in the dataset.
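
To make that two-system idea concrete, here’s a minimal, hypothetical sketch in Python using PyTorch. Every layer size and name here is made up for illustration, and real art generators are vastly larger, but the back-and-forth between the two networks is the core of a GAN:

import torch
import torch.nn as nn

# Hypothetical toy setup: a generator maps random noise to small flattened
# "images", and a discriminator scores how much an image looks like it came
# from the real dataset.
latent_dim, image_dim = 64, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    # real_images: a batch of dataset images, shape (batch, image_dim), in [-1, 1]
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1. The discriminator learns to tell dataset images from generated ones.
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise).detach()
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2. The generator learns to produce images the discriminator accepts as "real".
    noise = torch.randn(batch, latent_dim)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

Notice that nothing in this loop cares where real_images came from; the dataset is simply whatever was handed to it.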

Here’s the key word: “dataset”. Where the data comes from matters a lot.

Does it, though? Why should I care?

Well, dear reader, as authors Catherine D’Ignazio and Lauren Klein point out, “Most data arrive on our computational doorstep context-free.” This issue matters most to artists whose art is used in these datasets, and to people whose faces and bodies end up in the data without their knowledge. And it becomes particularly thorny when the dataset contains material that is supposed to be private or copyrighted.

Take the LAION-5B dataset, for example. Many images in the dataset were taken without the artists’ consent, and without crediting or compensating them. On top of that, within the same dataset, an artist who goes by Lapine found photos from her private medical records. Lapine wasn’t aware that her private medical photos had been uploaded to the internet, let alone scraped into the dataset. To have your work, or even your body, used in such a way without your knowledge is unfair. It breaches your privacy: when you copyright a work or go to the doctor’s office, you trust that the work or the results will be handled in a way that respects the value you assign to them. In Lapine’s case, she had even signed a form denying the practice the right to put her images in a dataset.

This is not to say that the creators of LAION-5B knew Lapine’s image was in the dataset, or that they had any malicious intent. It’s entirely plausible that the creators, and the algorithm itself, simply gathered whatever the internet had to offer. The creators of LAION-5B have pointed out that the dataset is drawn from the “publicly available internet” and is “uncurated”. Still, it does seem a little sus [slang for “suspicious”]: yes, the dataset is far too large to filter by hand, but to what extent can “publicly available” data be used for, well, everything? And who should take responsibility when data is used so unsparingly?
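
To get a feel for what “gathering whatever the internet has to offer” looks like in practice, here’s a toy Python sketch that collects image-and-caption pairs from a single web page. This is not LAION’s actual pipeline (which parses web-crawl dumps at enormous scale), and the page URL and function name are hypothetical; the point is simply that nothing in this process asks for, or even records, consent:

import requests
from bs4 import BeautifulSoup

def collect_image_text_pairs(page_url):
    # Fetch the page and pull out every <img> tag it contains.
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for img in soup.find_all("img"):
        url, alt = img.get("src"), img.get("alt", "")
        if url and alt:  # keep only images that carry caption-like alt text
            pairs.append({"image_url": url, "caption": alt})
    return pairs

# Hypothetical usage: harvest pairs from one page of someone's portfolio.
pairs = collect_image_text_pairs("https://example.com/gallery")
print(len(pairs), "image-caption pairs collected")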

Let’s work down the chain of custody and see what we can infer.

Would the responsibility fall on the doctor’s office? Lapine suspects that the images were taken from the practice after the surgeon in charge passed away. Perhaps the onus should have been on the office to make sure these images stayed within the practice. The surgeon’s passing may have created some confusion about what to do with the files, but there were ways to prevent, or at least contain, the incident: deleting the data, working with the patients to move the data elsewhere, or letting the patients know that the data had been leaked. To be fair, there is no sure way to keep data safe 100% of the time, and the office may not even have known that the images had leaked.

A flowchart showing where the data went. It starts off with “original source of data”, then to the hosting website, then to LAION-5B, then to the user.
A flowchart showing how data gets to the user (Source: Jennifer Kim, writer)

So, if the responsibility can’t rest entirely with the office, should some of it fall on LAION-5B instead?

On LAION’s public Discord server, Romain Beaumont, an engineer, pointed out that LAION itself doesn’t host the images; it only references websites that do host them. Beaumont also suggested that individuals should “ask for the hosting website to stop hosting it” and that a blacklist of host websites should be created.
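
That distinction between hosting and linking is easy to see if you look at what the dataset actually contains. Here’s a hedged Python sketch that inspects a hypothetical metadata shard; the file name and the URL and TEXT column names are assumptions on my part, but the idea is that each row is just a link and a caption, and the pixels live on whoever’s server the link points to:

import pandas as pd

# Hypothetical: "laion_sample.parquet" stands in for one metadata shard.
df = pd.read_parquet("laion_sample.parquet")

# The dataset itself is rows of links and captions, not images.
print(df[["URL", "TEXT"]].head())

# One could check whether a particular (hypothetical) hosting site shows up.
hits = df[df["URL"].str.contains("example-art-portfolio.com", na=False)]
print(len(hits), "entries point at this host")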

To be honest, this sounds like LAION is shifting both the blame and the responsibility for getting these images off the web onto everyone else. Who would expect a picture of your medical condition to be public online? Plus, not everyone knows whom to ask, or how, to get the images removed.

At the same time, though, let’s be honest: sifting through the internet would be a horrible task.

Perhaps we could turn to new research on uncovering where an algorithm’s training data comes from. A research group from Nvidia tried running a neural network in reverse and succeeded in reproducing some of the training images. But, as the article covering the Nvidia work points out, this reverse-engineering process is still a work in progress, which makes it hard for companies, let alone regular users, to access.
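
To give a rough sense of what “running a neural network in reverse” can mean, here’s a toy Python/PyTorch sketch. This is not the Nvidia team’s method; it only illustrates the general idea of freezing a trained model’s weights and optimizing an input image until the model’s output matches a chosen target, which can sometimes surface an approximation of an image the model has memorized:

import torch

def invert(model, target_output, image_shape, steps=500, lr=0.05):
    # Freeze the model's weights; we will adjust the *input*, not the model.
    model.eval()
    image = torch.randn(1, *image_shape, requires_grad=True)  # start from noise
    optimizer = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(image), target_output)
        loss.backward()   # gradients flow back to the input pixels
        optimizer.step()
    return image.detach()

Given some trained model and a target output to chase, invert(model, target, (3, 28, 28)) would return whatever input the optimizer converged on; whether that resembles real training data depends on how much the model memorized.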

And here’s something else to ✨spice up✨ the problem: AI-generating platforms cannot hold copyright. This means that the art produced by these platforms belongs in the public domain. In other words, no matter where the AI gets its data, the end product doesn’t belong to anybody in particular. This creates a huge problem: artists can’t get credit for any products generated from their art. The issue is especially frustrating to artists who find traces of their work everywhere, even when that work is copyrighted. Some artists have even found their mangled signatures on AI art, which makes the link between the source data and the products all the clearer.

Even then, however, there’s not much artists can do to control how their art is used, because it’s already in the hands of the users. And users have taken liberties with the products: one AI piece won a prize at a state art fair, while others have been sold as NFTs.

So, here’s what we’ve got so far: the surgeon’s office, the websites hosting the images, and LAION-5B are all problematic in their own ways. At each of these stages, however, very little accountability is actually taken.

But what about us? Do we have a responsibility too?

If we take the view laid out in the book Data Feminism, maybe users do have a responsibility too. Authors D’Ignazio and Klein argue that users should “ask questions about the social, cultural, historical, institutional, and material conditions under which that knowledge was produced”. Perhaps LAION should also put up a warning telling users that products created from its data may be derived from works used without the creator’s consent.

Okay, so… what can I learn from this mess?

Well, dear reader, I wanted to show that the whole system is full of ethical conundrums, especially when it comes to where the data comes from. There’s little to no accountability at each stage, and that hurts artists and patients like Lapine. No one wants to, or can, take full responsibility for making sure that data is ethically sourced and used. It’s quite dismal, really.

But, as I’ve suggested, there is some hope. At least you, dear reader, now know that art-generating algorithms take from the unsuspecting, and that there are at least a few ways to push back. And we can always work to improve the system.
