Beyond Misinformation — Navigating the AI World of Infinite Content

Jeremy Kirshbaum
Jun 21, 2023


(Image generated via https://lexica.art)

“The library will endure; it is the universe. As for us, everything has not been written; we are not turning into phantoms. We walk the corridors, searching the shelves and rearranging them, looking for lines of meaning amid leagues of cacophony and incoherence, reading the history of the past and our future, collecting our thoughts and collecting the thoughts of others, and every so often glimpsing mirrors, in which we may recognize creatures of the information.” — Jorge Luis Borges, The Library of Babel

Misinformation from generative AI is indeed a significant problem, but it is only a subset of a larger issue. The bigger problem we face is how a finite number of consciousnesses can consume an effectively infinite amount of content. What does it mean to create information faster than human consciousness can ingest it?

Jorge Luis Borges’s “The Library of Babel” is a short story that imagines a universe consisting of an enormous expanse of interconnected hexagonal rooms whose shelves together hold every possible book: every imaginable combination of a 25-symbol alphabet. This fictitious library represents a universe of information that is infinite, overwhelming, and largely indecipherable. Like Borges’s library, the output of AI systems offers a nearly limitless amount of information, which poses a challenge of scale and comprehension. In both cases, an almost infinite amount of data exists, most of which will never be consumed or even acknowledged by a human consciousness. The task of navigating and making sense of this sea of information, distinguishing the true from the false, or even the engaging from the mundane, echoes the challenge we confront with the burgeoning growth of AI-generated content. The striking parallels between “The Library of Babel” and our present-day reality underline our modern task: building a system that doesn’t drown us in useless or erroneous information.

To illustrate this, let’s imagine a scenario in which all information produced by a large language model (LLM) is accurate. Even then, we still confront a substantial issue, because our systems are not designed to handle the sheer quantity of coherent text these models produce. We, as a culture, are ill-equipped for this shift. Previously, while no human could interact with all information, every piece of information had, at some point, been processed by a human. This guaranteed a baseline cost for producing information and knowledge.

Our big challenge for the next few years is ensuring we create a system that is knowable, one that doesn’t drown us in useless or erroneous information. An excess of information can be just as dangerous as misinformation, even if every piece of it is true. Even if all your meals are healthy, you will suffer ill effects from eating nine of them a day. We are facing a challenge of scale, of which misinformation is only a subset. We need to consider the purpose of the information we produce, where it ends up, and how we can create powerful tools for exploring vast quantities of natural language data.

Indeed, we already have interfaces for interacting with knowledge at a larger scale. Through mechanisms like semantic search and summarization, an individual human can engage with a far larger body of information than before. The challenge lies in designing interfaces for knowledge, experiences, and interactions, at the level of an individual human or an organization, that can keep pace with the production of information.
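To make that concrete, here is a minimal sketch of semantic search over a tiny corpus. It assumes the open-source sentence-transformers library and its all-MiniLM-L6-v2 embedding model; the corpus and query are invented for illustration, not anything from a real system.

```python
# Minimal semantic-search sketch: embed a corpus once, then rank
# documents by cosine similarity to a natural-language query.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

corpus = [
    "A love poem in the style of Pablo Neruda.",
    "Quarterly revenue grew eight percent year over year.",
    "The Library of Babel contains every possible book.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "infinite libraries and unreadable books"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank the corpus by similarity to the query and print the best matches.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True):
    i = int(idx)
    print(f"{scores[i].item():.3f}  {corpus[i]}")
```

The point is not the three-document toy corpus but the shape of the interface: one query surfaces the most relevant items from a collection far larger than anyone could read linearly.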

One part of the challenge involves sorting true from false, or, in the case of conceptual development or fictional writing, separating the interesting from the mundane and the explanatory from the non-explanatory: essentially, distinguishing good from bad. But this is only a portion of the issue. Even after you have curated your data set, you are confronted with its scale: how to analyze it, how to interact with it, how to experience it, how to know it.

In early 2021, I created an art piece titled “20,000 Love Poems and a Song of Despair.” I generated 20,000 love poems in the style of Pablo Neruda with GPT-J and GPT-NeoX, printed them, and read them for 10 hours in a gallery in downtown Oakland. An interesting aspect of this project is that the vast majority of those poems, say 95%, will never be read by a human. They will never be experienced by a consciousness. This is a metaphor for what we can expect in the coming years as generative AI content proliferates.
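For a sense of how cheap this kind of scale is, a loop like the one below is roughly how such a corpus can be produced with Hugging Face’s transformers library. The checkpoint, prompt, and sampling settings here are assumptions for illustration (a smaller GPT-Neo model stands in for the GPT-J and GPT-NeoX models used in the piece).

```python
# Rough sketch of bulk poem generation with an EleutherAI model via
# Hugging Face transformers. Each iteration yields one poem; the art
# piece ran this kind of loop 20,000 times.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

prompt = "A love poem in the style of Pablo Neruda:\n\n"
poems = []
for _ in range(20):  # scale this number up and the cost barely changes shape
    out = generator(prompt, max_new_tokens=200, do_sample=True,
                    temperature=0.9, num_return_sequences=1)
    poems.append(out[0]["generated_text"])

# Most of these strings may never be read by a human.
print(poems[0])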

Watermarking can be part of the solution. The reasons for watermarking text or imagery extend far beyond copyright. Most of the tools we will use to navigate this massively expanded knowledge base will, by necessity, be automated. Hence, we must create automated methods for detecting the source of documents, including whether they are AI-generated. This could be crucial for issues like copyright or misinformation, but it also serves a critical use case purely at the level of utility, enabling us to more effectively understand the world we are creating. When it is easy to voluntarily impart these markers, we should go ahead and do so, even where regulatory requirements don’t make sense.
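One concrete direction from recent research is the statistical “green list” watermark (studied by Kirchenbauer et al. in 2023), in which a generator is biased toward a pseudorandom subset of tokens at each step, so a detector that knows the seeding scheme can test for that bias. The toy sketch below illustrates only the detection side, at the word level; the hashing scheme and threshold are simplified assumptions, not a production detector.

```python
# Toy illustration of statistical watermark detection ("green list" idea):
# at each position, a pseudorandom half of the vocabulary is "green",
# seeded by the previous word. A watermarked generator favors green words,
# so watermarked text shows a green-word fraction well above 50%.
import hashlib

def is_green(prev_word: str, word: str) -> bool:
    # Hash the (previous word, word) pair to one bit; about half of all
    # words are green in any context. A real scheme hashes token ids.
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    words = text.split()
    if len(words) < 2:
        return 0.5
    hits = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    return hits / (len(words) - 1)

# Ordinary text should score near 0.5; text from a generator that
# preferentially sampled green words would score noticeably higher.
sample = "We walk the corridors searching the shelves for lines of meaning"
print(f"green fraction: {green_fraction(sample):.2f}")
```

The appeal of schemes like this is exactly the utility argument above: detection is cheap, automated, and does not require storing or comparing against the original outputs.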

Ultimately, I don’t think there is a regulatory solution to this, or a technological one. When the cost and value of creating additional information drop to zero, what is left? The easy answer is that other cultural elements will increase in value:

  1. Application and execution: Even if creating and manipulating information becomes easy, that doesn’t necessarily make it easier to do something with it.
  2. Automating knowledge-creating processes: Automated methods of distilling information into knowledge can help, but this is easier said than done. The work of understanding how we create knowledge and decomposing that process into its component parts is difficult and often deeply reflective. The research group Ought is exploring this, and their Elicit platform is a very interesting early attempt at something like it (see the sketch after this list).
  3. Cultivating attention and intention: The quantity of available distractions will skyrocket, and individual discipline to attend deliberately and ignore constantly will be a crucial survival tactic.
  4. Investing in relationships of trust with other humans: Working collectively is crucial. Collaborating with people whose incentives you know are aligned with yours is key, and in a world of hyper-realistic bots, 1:1 relationships with deep context are the only kind that will be reliable.
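To make item 2 slightly more concrete, here is a deliberately simple, self-contained sketch of frequency-based extractive summarization. Real knowledge-distillation pipelines, like those Ought explores, are far richer; everything here, including the example document, is an illustrative assumption.

```python
# Crude extractive summarizer: score each sentence by the frequency of
# the words it contains, then keep the top-scoring sentences. A stand-in
# for the much harder problem of distilling knowledge from information.
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original order of the selected sentences.
    return " ".join(s for s in sentences if s in top)

doc = ("Generative models can produce more text than anyone can read. "
       "Summarization compresses that text so a human can engage with it. "
       "The hard part is deciding what counts as the important part.")
print(summarize(doc, n_sentences=1))
```

The gap between this sketch and genuine knowledge creation is exactly the point: compressing text is easy; deciding what matters is the difficult, reflective work.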

I say these are the easy answers because the hard ones are too fundamental to predict or really strategize around. Something fundamental is changing about how we express ourselves, learn, and communicate, and it is unlike anything we have encountered before. It is going to get so much weirder and more incomprehensible than it already has. I think it will take a lot of work, a lot of compassion, and more than a bit of luck to navigate and build this new future so that we end up with something more beautiful than before. Beyond that…maybe I’ll find the answer somewhere in the infinite shelves.

— post by Jeremy Kirshbaum. Reach out any time at jeremy@handshake.fyi

Want to navigate the AI future with other talented humans? Join our Generative AI Masterclass: https://maven.com/handshake/tech-bootcamp.


Jeremy Kirshbaum

I run Handshake, a generative AI consultancy. We build generative AI systems, research cutting edge use cases and impacts, and teach generative AI skills.