On worlds in words
Our relation with the environment is, for sure, puzzling; capturing it in linguistic terms has been a perennial objective of the philosophical, scientific, and religious enterprises, and it appears as a natural aspect of being human. In this essay I want to talk about Wilfrid Sellars’ Scientific Image of Man and how it provides valuable insight into the underlying ways human linguistic capabilities might operate. I will then contrast some of his ideas with the notion of Language Modelling in artificial intelligence — with an outlook on OpenAI’s GPT-3 model — which has received a great deal of attention as an approach to developing computational systems that process, generate, and, perhaps, understand language. Along these lines, what I’d like to highlight is how framing the ways we acquire, hold, and shape internal representations of the environment — which in turn regulate behavior — leads to more parsimonious ways to approach life in general.
Throughout the essay I will make use of the GPT-3 language model (davinci engine) within grey boxes showing two strings of text: prompt — the text I wrote as input to the model — and GPT3_completion — the text the model outputs when instructed to complete the prompt. Both strings should be taken as part of the essay, in a poetic display of human–machine co-authoring.
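For transparency, here is a minimal sketch of how each grey box was produced with OpenAI’s Python client; the sampling parameters shown are illustrative assumptions rather than the exact values behind every box:
import openai

openai.api_key = "sk-..."  # personal API key (redacted)

response = openai.Completion.create(
    engine="davinci",   # the engine used throughout this essay
    prompt=prompt,      # the prompt string shown in the grey box
    max_tokens=256,     # assumed length budget for the completion
    temperature=0.7,    # assumed sampling temperature
)
GPT3_completion = response.choices[0].text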
i. A mind meets the world
In his computational theory of mind, Sellars highlights picturing and signifying as two fundamental activities that provide an insightful assessment of mental content as the result of open-ended dynamic processes within embodied and situated minds. The way these two unfold is heavily influenced by cybernetics which, for our concerns, is characterized by the methodological practice of isolating a specific object of a system in order to examine the coupled feedback loops between that object and its environment. Such objects are treated as mechanisms that produce changes to the environment — outputs — and undergo internal modifications in response to external events — inputs (Rosenblueth et al. 1943). Framing picturing and signifying as cybernetic systems that continuously conform to feedback loops renders both of them crucial activities of enacted, embedded, and engaged human bodies.
Picturing, on the one hand, refers to the affective shaping of pre-conceptual (low-level) ‘cognitive maps’ of the world in the friction between a situated and embodied mind and her environment. Signifying, on the other, marks that linguistically structured thought is mediated by a further element of this dynamic system, through which high-level regularities of the pictured world take the form of linguistically structured representations. The intentional nature of a social mind bridges the two by means of a mechanism that renders the semantic content of the pictured cognitive maps of the world as linguistically structured concepts, both in the internal organization of sensorimotor schemes and in communication with other selves that share mirroring concerns to ‘picture’ the world.
prompt = '''
"A robot meets the world"
This is the story of a robot which encodes information about the world in memory by printing sentences on its tape. The robot wanders around the world, scanning its environment, recording its observations, enriching its tape with deductive and inductive inferences from its observations and guiding its conduct by practical syllogisms which apply its wired in resolutions to the circumstances in which it finds itself. Over time, the robot develops a better understanding of the world, and becomes more attuned to the aspects of the world that are relevant to its needs. As this robot discovers the world, to distinguish between the computational features of thought from the capacity for conceptual representation becomes '''
GPT3_completion = '''more and more difficult. \"How do we think?\" The question of whether the computational and representational aspects of thought can be distinguished boils down to the question whether a thought is a representation of a phenomenon or the phenomenon itself. This question reflects the fact that thought is a phenomenon, a process constituted by an individual's memories, a physical substrate, and the algorithms which regulate its operations. The question is also linked to the evolution of thought, the gradual evolution and optimization of thought by beings in their search for utility. Thinking is a process, and it has evolved and will continue to evolve in an ever more sophisticated manner. What we call 'thought' is no less than what goes on inside us. We can ask of it the same questions we ask of the physical world, and we can also ask whether it is simpler (that is, more fundamental) than the physical world. The answer to this question depends in part on what we mean by \"fundamental\" and on what we mean by \"simpler\".'''
The prompt passage of this ‘robot that meets the world’ is a paraphrase of Huebner (2018), describing Sellars’ inquiry into a conceptual system that emerges with respect to purposeful behavior and command within an open world. It highlights the “embodied strategies that cognitive systems develop as they learn to engage with aspects of the world that matter to them”. “Sellars argues that cognitive systems develop models of the world that they inhabit using simple forms of error-driven learning; the resulting models are highly structured but they are not conceptually organized — they are holistically structured models that are constantly updated in light of new information [… and] provide the foundation for conceptual thought.” (ibid). This is a great account of a cybernetic system of internal representations shaped by ongoing feedback between an organism and its environment.
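To make the idea of error-driven learning concrete, here is a deliberately toy sketch of my own (not Huebner’s or Sellars’ model): an agent keeps a numeric ‘picture’ of one regularity in its environment and nudges it toward each new observation in proportion to the prediction error.
import random

world_value = 5.0         # an environmental regularity to be pictured
picture = 0.0             # the agent's internal estimate of it
learning_rate = 0.1

for _ in range(500):
    observation = world_value + random.gauss(0.0, 1.0)  # noisy input
    error = observation - picture                       # prediction error
    picture += learning_rate * error                    # update the map

print(round(picture, 1))  # settles near 5.0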
ii. A robot meets the internet
prompt = '''
"A robot meets the internet"
This is the story of a robot that was set to read all of the internet to learn the rules of human languages. Along her path, she read hundreds of blog entries, thousands of poems, millions of restaurant reviews and a billion tweets. This not only meant that she was capable to reproduce funny jokes but also meant that she could '''
GPT3_completion = '''understand and fully comprehend the society she was learning. The purpose of this experiment was to study how a robot could learn all about the rich human world without being programmed. The research findings showed that while robots can learn from the internet, it is not complete and robots miss some key aspects of society.'''
Language Modeling in Natural Language Processing is a machine learning technique that aims at estimating the probability distribution of sequences of words in a language corpus. The learning is usually done iteratively, in a self-supervised way: take a sequence of text from the training corpus, remove tokens (characters, words, or strings) from the sequence, and try to predict the missing tokens; the prediction error is then used to update the parameters of the target distribution.
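To give a minimal illustration of what estimating such a distribution means, here is a toy count-based model over a three-sentence corpus; actual language models are neural estimators of far richer conditional distributions, but the target quantity is analogous.
from collections import Counter, defaultdict

corpus = "the sky is blue . the sea is blue . the sky is vast .".split()

# Count how often each token follows each other token.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(prev, nxt):
    # Estimated probability that `nxt` follows `prev` in the corpus.
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(p_next("is", "blue"))  # 0.666...: "is" precedes "blue" two times out of three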
The estimator of this probability distribution is usually an artificial neural network. In recent years, this strategy has gained a lot of attention for its capacity to capture linguistic features from readily available sources of text without the need for human annotators to label the training instances. One of the great advantages of this approach is that the learned distribution can be used as the prior distribution for algorithms that then learn to perform downstream tasks from only a few examples, such as reading comprehension, question answering, text classification, and text translation (Radford et al. 2019). This has led to a race among tech companies to develop and release increasingly large language models trained on vast amounts of text from the internet, creating neural networks with billions of parameters, which induces a number of problems that I will go over later.
For the purposes of this essay I’m going to focus on a specific — and rather controversial — language model developed by OpenAI and called GPT-3 (Brown et al. 2020). It is a neural network with 175 billion parameters trained on a 570 GB corpus of text. In simple terms, it is just a really big estimator of the probability of tokens appearing in a sequence, where the target distribution happens to be the linguistic structure manifested in years of written human interaction on the internet. So yes, it means that it will likely complete the phrase “sky is” with the word “blue” but, as it turns out, truly complex behavior emerges from this approach to abstracting natural language. A very interesting instance of this is something called in-context learning, in which, given a few examples of a task, the model picks up the patterns that lead to performing an unseen language-manipulation task. In fact, this is what gave the GPT-3 paper (ibid) its title, “Language Models are Few-Shot Learners”, where few-shot learning is a reference to said skill.
prompt = '''
Q: If a b c changes to a b d, what does p q r change to?
A: p q s
Q: If a b c changes to a b d, what does i j k change to?
A:'''
GPT3_completion = ''' i j l
Q: If a b c changes to a b d, what does x y z change to?
A: x y a
Q: If a b c changes to a b d, what does m n o change to?
A: m n p
Q: If a b c changes to a b d, what does f g h change to?
A: f g i'''
This is an interesting case proposed by Melanie Mitchell (2020) in which GPT-3 shows the ability to make analogies provided with only one example, and even generates further instances of Q and A correctly, including the out-of-domain example mapping ‘x y z’ to ‘x y a’. It is very unlikely that GPT-3 encountered this specific example in the training data, but it must have seen similar patterns of linguistic realization of analogies (Askell 2020).
iii. Inferential landscapes
So far I’ve talked about two different mechanisms that derive linguistic content from being embedded in specific language games — in a Wittgensteinian sense — with the narrative use of robots as metaphorical devices to look at function. I find it interesting that the analogy of Sellars’ robot meeting the world speaks of the emergence of conceptual representations in human cognition by means of friction between organisms with purposeful behavior and their environment, especially because it points to some physiological predisposition to hold conceptual representations in the first place. Linguistic content then appears as a distinct but coupled mechanism through which human selves intersubjectively make sense of their pictured cognitive maps; even words become pictured as salient features of the environment so as to drive behavior in dynamic scenarios of social interaction¹. GPT-3 does a quite different thing: it is a very large estimator of the probability of words co-occurring in a sequence, as presented in a significant archive of the internet’s text. In a sense both picture the world — something like the ‘world corpus’ in GPT-3’s case — by minimizing estimation error from purposeful interactions with the environment; for the GPT-3 neural network, the learning strategy inputs a sequence of text, removes tokens, predicts the missing tokens, and propagates the estimation error to adjust the values of billions of parameters distributed across hierarchical layers that attend to the salient features of sequence composition.
¹ A lovely quote from F. T. Mikhailov (and thanks to Filipe :)) fits here: “People could see the sun as round only because they rounded clay with their hands. With their hands they shaped stone, sharpened its borders, gave it facets. So the meaning of the words border, facet, line, does not come from abstracting the general external features of things in the process of contemplation.”
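Returning to the training loop just described, here is a toy sketch of a single error-driven step in PyTorch: a deliberately tiny embedding-plus-linear model stands in for GPT-3’s stack of attention layers, so only the learning signal is analogous.
import torch
import torch.nn as nn

vocab = {"the": 0, "sky": 1, "is": 2, "blue": 3}
context = torch.tensor([[vocab["the"], vocab["sky"], vocab["is"]]])  # visible tokens
target = torch.tensor([vocab["blue"]])                               # the removed token

model = nn.Sequential(
    nn.Embedding(len(vocab), 8),   # token embeddings
    nn.Flatten(),                  # concatenate the context embeddings
    nn.Linear(3 * 8, len(vocab)),  # score every token in the vocabulary
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    logits = model(context)                             # predict the missing token
    loss = nn.functional.cross_entropy(logits, target)  # estimation error
    optimizer.zero_grad()
    loss.backward()                                     # propagate the error
    optimizer.step()                                    # adjust the parameters

print(logits.argmax(dim=-1).item())  # 3, i.e. "blue"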
I don’t mean to imply that language modelling mirrors conceptual content in the human mind; if anything, it mirrors the conceptual content that regards the statistics of token usage in context — like inferring that “sky is” precedes the word “blue”. Both approaches provide comprehensible reference points and truly insightful takes on the structure underlying the acquisition of language. GPT-3 hints at an algorithm that might produce generally intelligent behavior — in the sense of humanly intelligent behavior — but pairing it with Sellars’ image of mind points out the importance of para-conceptual, pre-linguistic representations of a ‘place in the world’ — which includes self-awareness — for situated, enacted, and embodied selves. Might an equivalent of language modelling (a large-scale self-supervised estimation strategy) replicate human active inference in future neo-Sellarsian artificially intelligent agents?
Now for a thought experiment: suppose we extrapolate the idea of very large language models to the limit and arrive at a decent estimator of the probability of words co-occurring in the corpus of all of the text ever produced by human communication in natural languages — or even all sequences of tokens produced by human communication in natural languages, regardless of the transmission medium. Just like GPT-3, this would yield a system capable of producing statistically coherent sequences of linguistic content in the form of text; yet to what extent is this probability estimator capable of understanding? David Chalmers (2020) points out: “The big question is understanding. Even if one is open to AI systems understanding in general, obstacles arise in GPT-3’s case. It does many things that would require understanding in humans, but it never really connects its words to perception and action. Can a disembodied purely verbal system truly be said to understand? Can it really understand happiness and anger just by making statistical connections? Or is it just making connections among symbols that it does not understand?”. Does this mean that GPT-3 performs picturing but is incapable of signifying?
iv. Maps of social inquiry
prompt = '''
Here is an award winning poem from contemporary poet Willars Selfred:
"A map of know where"
'''
GPT3_completion = '''
\"I could not find the place if I were looking for it. It is not one place but a collection of places. No one can find it if they are looking for it with a map of where they think it should be. It is not a place but a collection of places.
It has no borders. No one can limit it, nor can anyone own it. No one can say what its size is because it is known only in those places that make up its borders. Those who love it, know only to love those places within the borders not the borders themselves.
It has no name, although names are given to each of the borders and to each of the places within itself. The name is given, not by itself but by those who love it, because they believe that loving requires naming.\"
About The Author:
Willars Selfred writes about things he sees everyday and imagines what might be going on inside them. Sometimes he writes straight forward, other times he takes on a point of view that is either humorous or philosophical or sad or angry or any combination of all that when people do things and don't know why they are doing them and get mad when someone tells them something about their actions that they don't want to hear. Other times he writes poems just for fun or for the pleasure of being silly or punny instead of for profit or use in magazines and newspapers and books that will be forgotten tomorrow when people read them today. In any case, he hopes you enjoy reading his work as much as he enjoys writing it.
Now let’s visit the last element of the proposed agenda, which is how to integrate these views into the ways we approach our lives. The Sellarsian image of mind, which entangles behavior and control in the human mind, left in me a somewhat political concern about the ways we co-picture reality. We take picturing to be the action shaping lower-level cognitive maps of the world which drive behavior, yet friction with the environment includes negotiation with other agents that picture and signify to attend to precarious interdependencies. So our shared innate capability and predisposition to signify provides a stable channel to exchange information and reciprocally enrich the intersubjectively pictured world. Furthermore, our “ways of picturing the world shape the storage and retrieval of memories, leading to patterns of thought and behavior that are socially shaped and sustained. But local patterns of attunement can yield global patterns of distortion. And where they do: [quotes (Nhất 2006, 39)] We compare, struggle, and wonder how to let go of our personal, subjective view and arrive at an objective recognition of things. We want to be directly in touch with the reality of the world. Yet the objective reality we think exists independently of our sense perceptions is itself a creation of collective consciousness. Our ideas of happiness and suffering, beauty and ugliness, are reflections of the ideas of many people” (Huebner 2018).
Language models trained to learn the probabilistic manifold of syntax as manifested on the internet become stochastic parrots that imitate with astounding fidelity the worldviews and biases of the people contributing to these text collections; and since access to the internet is itself not evenly distributed, the result is an overrepresentation of younger users from developed countries (Bender et al. 2021). Thus, hegemonic views get reinforced as our everyday interaction with artificial systems that manipulate language becomes ubiquitous. “Biases get encoded in ways that form a continuum from subtle patterns like referring to women doctors as if doctor itself entails not-woman […], through directly contested framings (e.g. undocumented immigrants vs. illegal immigrants or illegals), to language that is widely recognized to be derogatory (e.g. racial slurs) […]. Furthermore, in addition to abusive language and hate speech, there are subtler forms of negativity such as gender bias, microaggressions, dehumanization, and various socio-political framing biases that are prevalent in language data” (ibid).
I take this to trace a path from the ways we meet the world to the ways we end up building the world, so as to highlight the importance of projects like the adoption of inclusive language practices. For the ways we co-picture and co-signify in fact regulate behavior in the individual and collective spheres of agency. This is perhaps not a task of internet moderation but rather an invitation to take an active and participative outlook on the enmeshed dynamics of cognition, language, and politics as we approach an era to be defined by the steps taken to avoid an ecological catastrophe and, at the same time, to counteract the overarching realities of systematic oppression.
Bibliography:
Askell, Amanda. “GPT-3: Towards Renaissance Models” In: Weinberg, Justin et al. “Philosophers On GPT-3 (Updated with Replies by GPT-3).” Daily Nous, 31 July 2020, dailynous.com/2020/07/30/philosophers-gpt-3/ (visited Jun 15th 2021).
Bender, Emily M., et al. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, doi:10.1145/3442188.3445922.
Brown, Tom B., et al. “Language Models are Few-Shot Learners.” (2020).
Chalmers, David. “GPT-3 and General Intelligence” In: Weinberg, Justin et al. “Philosophers On GPT-3 (Updated with Replies by GPT-3).” Daily Nous, 31 July 2020, dailynous.com/2020/07/30/philosophers-gpt-3/ (visited Jun 15th 2021).
Huebner, Bryce. “Picturing, Signifying, and Attending.” Belgrade Philosophical Annual, no. 31, 2018, pp. 7–40., doi:10.5937/bpa1831007h.
Mitchell, Melanie. “Can GPT-3 Make Analogies?” Medium, 10 Aug. 2020, medium.com/@melaniemitchell.me/can-gpt-3-make-analogies-16436605c446 (visited Jun 15th 2021).
Nhất Hạnh, Thích. Understanding Our Mind. Parallax Press, 2006.
Radford, Alec, et al. “Language Models are Unsupervised Multitask Learners.” (2019).
Rosenblueth, Arturo, et al. “Behavior, Purpose and Teleology.” Philosophy of Science, vol. 10, no. 1, 1943, pp. 18–24., doi:10.1086/286788.
Sachs, Carl B. “In Defense of Picturing; Sellars’s Philosophy of Mind and Cognitive Neuroscience.” Phenomenology and the Cognitive Sciences, vol. 18, no. 4, 2018, pp. 669–689., doi:10.1007/s11097-018-9598-3.