The Allegory of the Cave, also known as Plato’s Cave, is a philosophical concept presented by the Greek philosopher Plato in his Republic. The text is an allegory in which he imagines a group of people who have been chained inside a cave since birth, facing a blank wall. Behind them there is a fire, and objects pass in front of it, casting shadows on the wall. The prisoners name the shadows, thinking they are real, but they are only distorted images of the actual world. Plato uses this allegory to compare “the effect of education and the lack of it on our nature”. The shadows represent the fragment of reality that we can normally perceive through our senses, while the objects outside the cave, seen under the sun, represent the true forms that we can only grasp through reason.
One of the focal points of my favorite analyst, Ben Thompson, is that the internet introduced zero marginal cost distribution for software and content, where content spans entertainment, knowledge, advertisements, and more. Under zero marginal cost, a blogger such as myself can disseminate thoughts to a global audience with the click of a button, whereas traditional newspapers were bound by localization. While many of Ben’s pieces center on this removal of friction from a business standpoint, I want to approach the topic from the perspective of Artificial Intelligence. By its nature, Generative AI can produce content, from pictures, videos, and audio to text, at zero marginal cost. (Note: there are inherent costs to the hosting company, and text generation does cost fractions of a penny per token; however, when weighing one’s own time against financial gains elsewhere, writing a short prompt to generate a blog post is near zero cost.)
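To make “near zero” concrete, here is a back-of-the-envelope sketch in Python; the per-token price is purely illustrative, not any provider’s actual rate:

```python
# Back-of-the-envelope cost of generating a blog post with an LLM API.
# The per-token price below is purely illustrative, not any provider's rate.
PRICE_PER_1K_OUTPUT_TOKENS = 0.002  # dollars per 1,000 tokens (hypothetical)

def generation_cost(num_tokens: int, price_per_1k: float = PRICE_PER_1K_OUTPUT_TOKENS) -> float:
    """Return the dollar cost of generating `num_tokens` output tokens."""
    return num_tokens / 1000 * price_per_1k

# A ~1,000-word blog post is on the order of 1,500 tokens.
post_tokens = 1500
print(f"~${generation_cost(post_tokens):.4f} to generate a {post_tokens}-token post")
# prints ~$0.0030 -- fractions of a cent, effectively zero marginal cost
```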
Social media has made content vastly more available, and the “law” of virality holds that catchy content, whether radical points of view, the absurd, or the merely noteworthy, gets promoted by people and amplified by the underlying algorithms pushing information into our feeds. Generative AI will only catalyze content creation: users, again at zero marginal cost, can craft prompts that generate viral posts if they “play the game.”
Now, I wish to set viral content aside for a moment, as much of it is low-value information that nonetheless circulates and has shaped society. What I want to focus on is the notion of truths. In America, the First Amendment guarantees freedom of speech, the ability to share one’s opinion regardless of what it might be (with limitations). Through blog posts, news articles, books, and everything in between, people and corporations alike share perspectives, all of which are valid as perspectives. In fact, each of us resembles a prisoner in Plato’s cave, using experience and education to guide ourselves through life, making calculated decisions (consciously or subconsciously) and collaborating with our neighbors (globally) on collective truths. Doing so requires us to parse the troves of information available on the internet (comment sections, threads, news articles, etc.) and extrapolate truth from perspective, with bias in every piece (we are all inherently biased in our own opinions; it’s when bias aggregates at the collective level that it becomes problematic). This was the heart of what the early philosophers aimed to tackle: abstracting reality to uncover existential “truths.”
The line between truth and malice blurs when perspective becomes plagued with ill intent. During the 2016 presidential election, the concept of “fake news” arose and a “war on truth” was waged. This war took aim at disavowing actual facts reported by trustworthy news organizations, eroding our trust in information disseminated more broadly. Arguably, per the earlier point, this war began to contaminate good information on the internet and placed an additional burden on us as truth seekers to scrutinize and challenge opinions.
So why talk about truths? There are two points I wish to discuss: (1) Generated content and (2) Training of generative models.
Generated Content
Given the lengthy discussion earlier, this section can be brief. Returning to zero marginal cost and virality: not only is generating legitimate content nearly free (though large organizations are investing in guardrails, reinforcement learning from human feedback (RLHF), and filter middleware / system prompts within LLM-backed products), generating false content (beyond unintentional hallucinations) is extremely easy. The burden of seeking truth and parsing the troves of information on the internet will only grow more cumbersome. Reliable content may need to be paywalled, and/or people may need to rely on statistical signals of collective knowledge to determine what is “truth.”
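As a rough illustration of what that “filter middleware” layer might look like, here is a minimal sketch; `call_llm`, `violates_policy`, and the policy markers are hypothetical placeholders, not any vendor’s actual API:

```python
# Minimal sketch of "filter middleware" wrapped around an LLM call.
# Everything here (call_llm, violates_policy, the markers) is a
# hypothetical placeholder, not a real provider's API.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Refuse requests to produce "
    "deliberately false or misleading claims about real events."
)

def call_llm(system: str, prompt: str) -> str:
    """Stub standing in for a real model API call."""
    return f"[model response to: {prompt!r}]"

def violates_policy(text: str) -> bool:
    """Toy keyword filter; production systems use trained classifiers."""
    banned_markers = ("fabricated quote:", "fake study:")
    return any(marker in text.lower() for marker in banned_markers)

def moderated_generate(user_prompt: str) -> str:
    """Screen both the request and the response before anything is returned."""
    if violates_policy(user_prompt):
        return "Request declined by content policy."
    response = call_llm(system=SYSTEM_PROMPT, prompt=user_prompt)
    if violates_policy(response):
        return "Response withheld by content policy."
    return response

print(moderated_generate("Write a fabricated quote: attributed to a senator"))
# -> "Request declined by content policy."
```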
Trained Content
Generative AI, and all Artificial Intelligence models, are trained on information, arguably public information. The known rule about AI training is “garbage in, garbage out.” For data scientists, one of the most tedious and cumbersome jobs is obtaining and cleansing the data; training the model is arguably the easy part! For algorithmic models such as recommendation engines or forecasting models that rely on quantitative data, obtaining actual observable instances (oversimplifying here, as there are nuances and biases within every data record) is relatively straightforward (and costly). Natural language, however, is more difficult. As corpora are built from common datasets (Common Crawl, for example) and from crawling the internet for content (text, images, video, etc.), the garbage rule carries far more weight. Social media, again, can be biased. For systems learning to model natural language, i.e., the probability of words appearing next to one another, such data may be helpful; however, to extract and produce knowledge, the content must be unbiased (or balanced) and truthful, which is difficult on the internet. That is why RAG (Retrieval-Augmented Generation) and the summarization features used in Bing Copilot are, in my opinion, more trustworthy today than answers from ChatGPT alone, and why I continue to push for domain-specific AI models built on institutional knowledge and orchestrated by a knowledge-expert “agent.”
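To illustrate the RAG idea, here is a minimal sketch in Python; the tiny corpus, the word-overlap retrieval, and the `call_llm` stub are all illustrative stand-ins for real vector embeddings and a real model API:

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG): ground the answer
# in a small trusted corpus rather than in whatever the model absorbed from
# the open internet. The corpus, overlap scoring, and call_llm stub are all
# illustrative; real systems use vector embeddings and an actual model API.

TRUSTED_CORPUS = [
    "The Allegory of the Cave appears in Book VII of Plato's Republic.",
    "Common Crawl is a public archive of web-crawl data often used to train language models.",
    "RLHF fine-tunes a model using human preference judgments.",
]

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model API call."""
    return "[model response grounded in the provided context]"

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank passages by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda passage: len(q_words & set(passage.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_answer(question: str) -> str:
    """Prepend retrieved passages so the model answers from vetted context."""
    context = "\n".join(retrieve(question, TRUSTED_CORPUS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(rag_answer("Where does the Allegory of the Cave appear?"))
```

The design point is that the model is constrained to answer from vetted context supplied at query time, which is why retrieval-backed answers can be audited against their sources in a way raw model output cannot.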
Regardless of industry, life continues to throw “hot button” topics at us that require us to take perspectives and determine what is true. Through collective collaboration, society can determine what is true or false. The reality is that we are all prisoners in Plato’s cave, and no one truly knows existential truths; but if we are all trapped together, the best we can do is agree on what is true and discard what is false. The most dangerous aspect of generative AI and this new wave of artificial intelligence is the possibility that AI starts to shape the cave itself, distorting the light projected on the wall further than it already is. Our collective objective should be to use AI to help us peer backwards and reveal the fire, the objects, and the person controlling the experience.