Having a Voice about AI, Part One: Know it when you see it

Thomas Krendl Gilbert · Published in daios
11 min read · Jun 10, 2022

This is the first blog post in a series on Having a Voice about AI, in which Dr. Thomas Krendl Gilbert reflects on what it would take for laypeople to better recognize and control the automated decision systems that increasingly manage their lives.

Introduction

“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description [“hard-core pornography”], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”

- Supreme Court Justice Potter Stewart, Jacobellis v. Ohio

AI now occupies an awkward spot in the popular imagination. On the one hand, AI-enabled services are everywhere. For years they have been used to auto-generate text for sports stories; new “foundation models” like GPT-3 are now being used to co-write articles of ever-greater length and complexity. News stories about the latest capabilities and applications are common, even if their full implications remain mystifying and disorienting. And while they don’t advertise it as prominently as they could, companies do often state where and how AI is used in their products. This ranges from the obvious (voice assistants like Alexa) to the more subtle (automated classification techniques used to calculate your credit score).

On the other hand, how many of us can actually recall the last time we interacted with an AI system? Did you even notice or care? Today we are like tourists traversing a rainforest, with guides who occasionally point out a particularly colorful insect or poisonous frog for us to gape at, even as the canopy above our heads and the soil under our feet teem with thousands of uncatalogued, unrecognized, or forgotten species. We don’t yet feel remotely at home in that rainforest, let alone know how to make a home in it worth living in.

Beyond armchair disputes about whether these systems really count as “intelligent”, there is a more practical matter: how can we get better at recognizing their use? Where are they and how common are they? What difference does it make if they are used to generate the emails we read, the images we see, or the decisions passed down to us?

What is software?

The AI applications that surround you as you surf the web, step outside, or rest your head on your pillow at night remain a form of software. And software, an executable program that aids in the completion of some task, is as old as computers. Computers themselves were originally women employed to perform complex calculations (their job was to “compute” things, like a teacher “teaches” students or a nurse “nurses” patients back to health). Some of the earliest pieces of “software” were the routines used to solve the differential equations of nuclear physics as part of the Manhattan Project. Software is a tool that, when used, changes something about the state of the world. It took us from a world where one could scarcely imagine human extinction to one where it could happen at any moment with the push of a button.

Let’s consider the features of software through a familiar example: using Photoshop to design an image for a poster. First, software executes the intention in your head. In Photoshop, this is achieved by translating your mouse clicks and drags into machine-readable design choices that add color and texture. Second, software represents your intention by showing the difference it makes when applied. In Photoshop, you immediately see the contrast between a green and a red apple even if all you executed was a sequence of zeros and ones at the level of computer binary. The third defining feature of software is that it is static. In a traditional software application, every change in the program is implemented by the user. The program “translates” your intent into something computable by the machine, but the user has constant veto power and starts or stops whatever happens. It follows that, however it may appear, software cannot actually “do” anything on its own; what it “does” only makes sense in light of an intent and context supplied by the user.

To sum up, traditional software is a static representation and execution of the user’s intentions. Whether you are designing in Photoshop or using a calculator app on your phone, you decide what the colors and numbers are, what they refer to, and when to stop the program. All the software does is help you complete tasks based on purposes that are yours to define.
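Before turning to AI, it may help to see how literal this “static” quality is. Below is a minimal sketch in Python (a toy of my own, not anything from Photoshop) of a traditional program: nothing in it happens unless the user commands it, and the user can stop it at any time.

```python
# A toy "traditional software" loop: the program supplies no goals of its own.
# Every state change is an explicit, user-initiated command (execution), the
# result is shown back to the user (representation), and the user holds
# constant veto power (the program is static until told otherwise).

canvas = {"color": "white"}  # the program's entire state

def execute(command: str) -> None:
    """Translate the user's intent (e.g. 'paint red') into a machine-readable change."""
    if command.startswith("paint "):
        canvas["color"] = command.split(" ", 1)[1]
        print(f"Canvas is now {canvas['color']}")  # show the difference it makes
    else:
        print(f"Unknown command: {command!r}")

while True:
    command = input("> ")   # the user supplies the intent...
    if command == "quit":   # ...and decides when the program stops
        break
    execute(command)        # ...the program merely carries it out
```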

What makes AI different from software?

Now we know the features of traditional software. What about AI? How can we tell if what we’re seeing was created using AI? We can begin to feel the difference by starting with a fun question: what makes the poster for the movie Legend so great?

Legend itself is a great movie. Tom Cruise, a complicated actor even in his best roles, is downright cherubic in this movie as Jack, an everyman “hero with a thousand faces”. Tim Curry as Darkness is so iconic that his character was recently featured as the perfect romantic match for the year 2020. But the poster has taken on a life of its own. There are even multiple versions, all iconic amongst collectors. What makes it so special, so evocative?

The posters for Legend, The Lion King, Blade Runner, and E.T. were all designed by John Alvin, an artist whose stated mission was “creating the promise of a great experience” for filmgoers. His works are hand-drawn pieces of pop art that paint a picture of a cinematic world in terms of a transportive human experience. You can sense that the person who made the poster has seen the movie and wants to share the experience with you. In the case of Legend, Alvin translated his memory of that experience into a painterly style distinct from the film itself but true to its literary influences. It evokes Grimm’s fairy tales, and invites that reading of the film as you watch it. The poster is an intentional execution of what it feels like to watch the film.

But film posters don’t have to be hand-drawn to be great. Consider the example of Moonlight, the 2016 Oscar winner for Best Picture.

The poster works because it tells the story of the movie, three pivotal episodes in the protagonist’s life, by highlighting the emotion in the eyes. The poster captures this through a simple digital composite of his face at each of these moments. If you haven’t seen the movie, it makes you curious about what it would feel like to watch that story unfold; if you have seen it, it reminds you of what it felt like to experience it. The composite wasn’t drawn by hand (the images are derived from film stills), but you can tell the artist understands the character’s journey and what makes it worth experiencing on film. The composition has been thought through.

But what happens when the poster artist clearly hasn’t experienced the film in question? What if the poster was generated using AI? Consider how DALL-E 2, an OpenAI program that generates images in response to natural-language descriptions, answers the prompt “Marvel movie poster featuring multiple superheroes”:

There is a lot going right here. Looking at all these posters, you immediately get the sense and scale of a comic book. They have the right colors, and the characters are at least plausible. It all feels like the artist has looked at a bunch of comic books for inspiration. But there’s a major problem: every word is misspelled. The artist apparently doesn’t know how to spell the words “marvel” or “avenger”; or, to be more precise, the artist can’t tell an “avenger” from an “avenler”, a group of “mavelers”, or a “mavergeler”, because no one ever told it what the difference was. The artist knows extremely well what comics (and movies based on them) look like, but the artist has no idea what it will feel like to read the comic book.
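For readers curious what “answering a prompt” involves mechanically, here is a hedged sketch of the call in Python. At the time of this post DALL-E 2 was available only through a limited research preview, so this uses the shape of OpenAI’s public images API; the key, image count, and size are placeholder assumptions.

```python
# A sketch of prompting an image-generation model via OpenAI's (legacy) Python
# SDK. The prompt is the one quoted above; the API key and parameters are
# placeholder assumptions, not details from the original post.
import openai

openai.api_key = "sk-..."  # placeholder; supply your own key

response = openai.Image.create(
    prompt="Marvel movie poster featuring multiple superheroes",
    n=4,                # ask for several candidate posters
    size="1024x1024",
)

for i, item in enumerate(response["data"]):
    print(f"poster {i}: {item['url']}")  # URLs of the generated images
```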

DALL-E 2’s defenders might argue that these limitations could be technically overcome with more diverse training examples, a larger dataset, and perhaps more parameters on which to train the model. That might very well make the program more accurate in its spelling, but it is missing the point. Great movie posters anticipate the experience of seeing the movie, while AI-generated posters only distill instances of previous movie posters. DALL-E 2’s generated images, while impressive, are inherently derivative of prior examples. They don’t entice the viewer into taking a journey that the artist has actually taken.

Still, DALL-E 2 is able to generate movie posters, which Photoshop can’t do on its own. A graphic artist could use software to augment a poster design of their own choosing, but that isn’t the same thing as what AI is able to do. How do we make sense of DALL-E 2’s capabilities in light of its obvious limitations?

The key difference between AI and software is that AI replaces forward-looking intentions with backward-looking computation. As discussed above, software requires human intent in order to run and do whatever the designer wants to make happen; by contrast, AI is about finding out how much can get done based on rules derived only from past examples. It turns out that, with automated learning and ever-greater amounts of memory, an awful lot can get done. The beauty of a self-driving car is that it can learn to proficiently cruise down a highway, pass through a four-way stop, or signal to children that it’s safe to cross without having any idea of where it is going, or why it matters. That doesn’t mean it’s stupid (humans who have driven the same road for years often forget how they got to work that day), but its activity is not future-oriented.
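To make “backward-looking computation” concrete, here is a minimal sketch, with invented toy data, of a program whose entire behavior is derived from past examples: it learns a braking rule from historical driving records and applies it to a new situation, with no notion of a destination anywhere in the code.

```python
# A minimal sketch of backward-looking computation: behavior is fit entirely
# from past examples rather than programmed from a user's forward-looking
# intent. The data below is invented for illustration.
from sklearn.linear_model import LogisticRegression

# Past examples: (speed in mph, distance to stop sign in meters) -> braked?
X_past = [[30, 10], [25, 8], [20, 5], [40, 60], [35, 50], [45, 70]]
y_past = [1, 1, 1, 0, 0, 0]  # 1 = driver braked, 0 = driver kept going

model = LogisticRegression().fit(X_past, y_past)  # rules derived only from history

# The fitted model now makes the call for a situation it has never seen,
# with no idea of where the car is going or why the trip matters.
print(model.predict([[28, 9]]))  # -> [1]: brake
```

Notice that nothing in the sketch encodes where the car is headed; the “decision” is just a pattern extracted from history, which is the point of the contrast with the user-driven loop above.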

Of course, there is still some implicit idea of “driving” or a “Marvel movie poster” behind these AI applications. The point is that the program’s representation of these ideas is derivative, and does not depend on active input from the human user in order to be executed. AI can “drive a car” or “design a movie poster” once those activities have been transformed into tasks that can be completed through computation. But you know when you see it that they are no longer the product of future-oriented human activity. They are no longer experiences intended to be shared or celebrated.

Why does it matter?

Movie posters are becoming a lost art, largely because the experience of going to the movies is no longer valued for its own sake. Today, we are more likely to stream content from home, or on our phones, or just share memes rather than sit through a 120-minute feature surrounded by strangers. The activity of standing under the marquee’s flashing lights and staring in wonderment at an upcoming film’s striking visuals no longer moves us. As autonomous vehicles become more common, the same might someday happen with driving: once rendered as a computable task, driving may well be seen as tedious, monotonous, boring, a waste of time. We will no longer think of it as an intentional human activity in its own right.

For all the talk about the ever-greater capabilities of AI systems, they cannot yet directly replace human capabilities. Rather, AI reorganizes human activities around tasks that can be repetitively executed by machines. It’s often hard to define whether or how some task like designing movie posters helps constitute a larger activity like moviegoing. But what is moviegoing other than looking forward to the matinee, grabbing a good spot in line, picking out concessions, and spontaneously laughing or crying at the film with a group of strangers? Once more and more of these things are made into perfectly repeatable, derivative tasks, what happens to moviegoing?

What’s at stake is not whether AI can design a movie poster or drive a car, but how the decision to automate these tasks implicitly changes the wider activity. AI is increasingly being used to automate human vision, steering-wheel control, and higher-level planning from point A to point B. In the process, ridesharing and self-driving are implicitly reorganizing activities like commuting and redefining public mobility around themselves. Automating driving will change roads, and in doing so shift how we understand them: as supporting where we want to go, as well as what it means to get there.

If software is a representation and execution of the user’s ideas, then it follows that the rise of ever-more-capable AI systems is a call to become more intentional about our own activities. Will we still care about moviegoing (or movies at all) once AI can write screenplays on its own or generate entire films on demand? Will we still care about commuting when AI reorganizes roads around what self-driving cars can or can’t do? Will we still care about communicating when AI can auto-recommend to us whatever text or content we want through social media? These activities matter to us because they connect us to other people in a living, ongoing, organic way. We don’t have to care about how moviegoing, commuting, or communicating in public are being redefined by automated tasks. But if we don’t, those activities will gradually erode and die.

If we do care, then we have to shift our attention to the terms and conditions under which AI is used in the context of these activities. What features of those activities matter most to us? How can we protect them from AI’s encroachment, or even use AI to augment them? What makes these questions important is that AI, on its own, cannot answer them, no matter how “capable” it becomes or how many tasks it becomes good at completing. What’s at stake is how those tasks add up to an activity worth taking part in, as part of what it means to live a good life.

Conclusion

This post has covered a lot of ground. Here are the key points to take away:

  • software helps complete a task by executing a program based on user intentions
  • AI completes a task on its own based on previous inputs
  • automating tasks with AI makes the wider activity backward-looking and passive
  • humans need to be more intentional about these activities for AI to support them well

We don’t have to just throw up our hands and wait passively for the future to happen. The first step in voicing your concerns about AI is to understand what is at stake in the automation of tasks that used to be exclusive to skilled humans. In my next post, I’ll examine the importance of dissent for building AI systems, and in particular how we can learn to say “no” to what particular AI systems are doing in order to make them better.
