Advocating for Protections for the Information Ecosystem
Today, I got to participate in a round table organized by Congressman Ro Khanna about how AI will impact the workforce, elections, education and mental health.
We each got to do 90 seconds of opening remarks. Here are mine — the words in italics are what had to be cut for time, but no one’s timing a blog post :)
We have a gap in our regulations when it comes to protecting our information ecosystem from “spills” of synthetic media produced by so-called “generative AI” systems (images, audio, video and text). While there has been some attention paid to deepfakes (and their potential use by bad actors to influence elections), not enough has been done about synthetic text.
- Last year, someone caused a large language model to extrude text in the guise of a mushroom foraging guide and then self-published the possibly lethal result on Amazon.
- CNET, Sports Illustrated, and others have taken to publishing synthetic simulacra of articles (again produced by ChatGPT or similar) as if they were actual reporting.
- Someone who wanted to cast doubt on a topic like vaccine efficacy could use an LLM to easily produce many different articles, each with slightly different “information” on the topic, closely mimicking the style of real reporting — and, as a result, make the actual reporting seem unreliable too.
When it’s hard to find trustworthy sources of information, it’s also hard to trust them once we’ve found them. And a public that can’t find and trust trustworthy information can’t participate effectively in collective efforts like democracy or public health.
With ChatGPT, OpenAI created a faucet and invited people to walk up to it and release synthetic media spills into our information ecosystem. (Sam Altman even recently bragged that one in every thousand words produced planet-wide now comes from ChatGPT.) And none of the companies producing these synthetic text extruding machines has taken even the simplest actions to make their output recognizable for effective cleanup.
When Dow or 3M or Exxon or whoever spills chemicals into the physical ecosystem, we work to hold them accountable. We should build similar protections for our information ecosystem.
(And as you think about how to regulate this in ways that are consistent with the First Amendment, I’d like to remind you that though so-called “generative AI” systems require massive amounts of (often stolen) training data, representing the creative work of actual people, when they grind it up into their papier-mâché output, that output is, in fact, nobody’s speech.)
I was able to bring up the last point above later in the discussion — and got some pushback. I agree that First Amendment issues are tricky, but I think it is important to recognize that, with the advent of LLMs, not everything that looks like speech is in fact speech in the sense of conveying a person or group of people’s communicative intent. This important change to our environment must be taken into account as we work out how to navigate these issues.
It’s also worth noting that a system of machine-readable watermarks (even imperfectly robust ones) would be a boon. This would allow someone to create a browser extension that filters out (much) synthetic media. If the regulatory requirement is to watermark synthetic text, with individuals then choosing what to filter, this probably sidesteps First Amendment questions — after all, we should all be free to choose whose speech we attend to.
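To make the browser-extension idea a bit more concrete, here is a minimal sketch (in TypeScript, as a content script) of what such a filter could look like. The watermark details are entirely made up: the `data-ai-watermark` attribute and the zero-width-character signature are hypothetical stand-ins for whatever machine-readable marking a regulation might require. The point is just that once synthetic text carries any machine-readable marker, filtering it client-side is straightforward, and the choice of what to filter stays with the reader.

```typescript
// Hypothetical content script for a browser extension that hides watermarked
// synthetic text. The watermark format below (a made-up `data-ai-watermark`
// attribute plus a zero-width-character signature) is purely illustrative;
// no such standard exists today.

// A made-up zero-width signature that a text watermarking scheme might embed.
const HYPOTHETICAL_SIGNATURE = "\u200B\u200C\u200B\u200D";

function looksWatermarked(el: HTMLElement): boolean {
  // Case 1: a publisher-supplied machine-readable label (hypothetical attribute).
  if (el.closest("[data-ai-watermark]") !== null) return true;
  // Case 2: an in-text signature embedded by the generating system (hypothetical).
  return (el.textContent ?? "").includes(HYPOTHETICAL_SIGNATURE);
}

// Hide, rather than delete, so the reader can opt back in via their own settings.
function filterSyntheticText(root: ParentNode = document): void {
  root.querySelectorAll<HTMLElement>("p, article, section").forEach((el) => {
    if (looksWatermarked(el)) {
      el.style.display = "none";
      el.setAttribute("data-filtered-synthetic", "true");
    }
  });
}

filterSyntheticText();
```

Note that nothing here requires perfect robustness from the watermark: even a marker that only most synthetic text carries would let a filter like this catch much of it, which is all the argument above needs.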
For more on the metaphor of the information ecosystem, see Shah & Bender (2024), “Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?”, to appear in ACM Transactions on the Web. (Preprint for now; will update this post with a link to the article once it’s out.)