Causal Inference and Ground Truth with GPT3
Overview
GPT3 is a model of language structure: a transformer network trained on a corpus of human language. That the trained network contains knowledge grounded to the world is a statistical artifact, emergent from humans’ tendency to write and publish language that reflects their understanding of the world. Some aspects of human language are structurally grounded to the world, but by and large language is a many-layered associative phenomenon. Language does not ground universally the same way for all users, human or otherwise, not even for users of the same language trained in similar ways.
GPT3 can do a variety of causal inference tasks without the aid of additional data, knowledge or systems. However, there are many domains of knowledge where GPT3 requires assistance from a ground truth source or requires additional processing to extract causal inference.
Vocabulary
Reader note: the vocabulary of knowledge, truth, and causality is a quagmire of misdirection. These terms are provided as points of reference.
Causal Inference — methods and models for measuring and determining the difference-making ordering (causal structure) of conditions in a state of the physical world
Ground Truth — a set of data/information accurately articulating relationships in the world, verified by multiple measurement techniques/frames.
Knowledge Graph — a lossless representation of causal structure data and context providing reliable access. Often represented in a graph/network approach.
Grounding — tying a system to physical, realizable aspects of the world
Logic — true or false statements that can be verified by mathematical proof.
Natural Language — a computer science concept for written or spoken human languages, generally informal and subject to drift or evolution in form and use.
Overview of Causal Inference
Using computing systems to perform causal inference about the world is a very common task. In fact, this activity is one of the main motivations behind the design of computing systems — to help humans reliably model our world according to our observed relationships between objects in space and time.
Causal inference is not a singular method. What works for causal inference in logic/mathematics is somewhat different from how we think about causal inference in physical systems with physically interacting parts. Linguistic causal systems differ from robotic actuators, and so on. When there is a more direct correlation between logical inference and physical inference, it is usually the result of highly regulated framing. For example, one can construct a logical inference adding machine that correlates highly with how a physically realized adding machine works. Humans tend not to be too impressed by systems that force such rigid frames to achieve causal reliability.
Most readers of this essay are looking for a more general causal inference system.
There has long been hope that somehow Natural Language Processing methods will lead to some generalized causal inference approach emerging from language itself. This is probably not going to work out. Human language is messy and incomplete in a variety of non-trivial ways.
All is not lost though. We can make NLP systems that are tied to causal relations in the world, and these systems can be made reliable and useful for many things. These systems can be further connected to other systems that do more useful things.
This essay provides a variety of techniques and approaches to create NLP systems with causal inference, ground truth and knowledge graph reliability. If the essay is causally valuable, it should be apparent to the reader how to extend the ideas and systems here to other systems, modalities and conceptual spaces to achieve more complex results.
Now advance with this anchor: “All models are wrong, some are useful.”
Overview of Knowledge
Some Kinds of Knowledge/Ground Truths:
- Logical/Mathematical
- Statistical/Probabilistic
- Physical/Measurable Properties
- Events/Time Ordering/Causal Ordinals
- Frames of Reference/Coordinate Systems
- Common Sense and Factoids
- Naming, Pointers, Indexing, References, Taxonomy
- Mechanical/Processes
- Linguistic/Grammar
These categories of knowledge are far from closed and distinct. Data in context is information, information used and producing consequences is knowledge. Very little knowledge humans care about is single modality, single type. Humans tend to prefer and even need knowledge with multiple attachment points to the world, to physical happenings in the world. This is effectively why there is so much energy around imbuing machine learning methods with causal inference capabilities.
To be clear, no amount of human language, produced by a human or a computer or some combination, is going to perform all the actions people would recognize as causal inference or grounded knowledge. Humans often confuse explanations and descriptions of actions/events as reliable stand-ins for the actual events/actions. When working with language processing and programming what we are trying to do is reliably actuate modalities. What is at issue is whether descriptions, semantics and structure within the used language correlate highly with causal systems in the world. When we *do* causal inference in natural language, mathematical symbols, programming code, or statistical reasoning we are creating frames of reference to *test* against situations in and of the world.
What is often missing from how engineers, researchers and philosophers investigate and discuss Causal Inference using computers is what format/data structure and process flow they consider valid. There’s a tendency to assume a linguistic description from the “first person” perspective is the only “valid” causal inference process. This is too simplistic to be valuable, as causal inference in humans doesn’t really exist as a linguistic phenomenon — linguistics are a tool in a bigger chain of causally interacting parts. What’s more, with computers, forcing a human linguistics frame may actually make causal inference less powerful. After all, math, programming and programming languages are extremely efficient at encoding causal relations, so moving in and out of natural language may hinder causal inference.
This is a subtle point so it bears repeating. When doing Causal Inference or Grounding systems to Physical or Logical Knowledge, the only possibility for causal inference and grounding is to discover the RELATIONSHIPS between observed or interacting systems and then use those relationships to interact according to the resulting inferences.
Examples help make these points.
If an NLP AI system gets a question about how billiard balls can bounce off each other and it accurately describes some possible interactions, but does so in a human language the inquiring user doesn’t speak, did the AI system do causal inference? Yes, but also no, in that the inference was not usable by the interacting systems.
On the other hand if the NLP AI system gets the same question and then produces a simulation of billiards that can be visually and behaviorally observed by humans then it will be considered to do causal inference even better.
If an AI system with robotic capabilities gets a question about billiards and then proceeds to play billiards with that info on a real pool table then causal inference is even stronger WITHIN the AI robotic system.
Our goal is often to induce the causal behavior in all interacting systems, not simply render understanding of the question or causal situation.
In the strongest sense a computing system has causal inference grounded to physical reality when it can reason about the world, explain/describe that reasoning and perform interactions with other systems using that causal reasoning.
Simple question now:
Can GPT3 understand causal situations and produce an explanation and various actuatable interactions? (hint: yes. :) )
Critiques, Primers and Useful Backgrounds
Marcus and Davis’s testing of GPT3 and Causal Inference (with experiments run by Summers-Stay)
https://cs.nyu.edu/~davise/papers/GPT3CompleteTests.html
Judea Pearl on Causality
http://bayes.cs.ucla.edu/BOOK-2K/
“The equations of physics are algebraic and symmetrical, whereas causal relationships are directional. To mathematize causal statements we need a new calculus that (1) reflects this asymmetry and, at the same time, (2) accounts for the symmetries we see in correlated observations. Without this calculus we cannot even represent the question “Why,” let alone answering it.”
“when we look at the limitations unveiled by the calculus of causation we understand that human-level AI requires two more layers: intervention and counterfactuals.”
“Formally, Bayesian networks are just efficient evidence-to-hypothesis inference machines. However, in retrospect, their success emanated from their ability to “secretly” represent causal knowledge. In other words, they were almost always constructed with their arrows pointing from causes to effect, thus achieving modularity.”
- Judea Pearl (https://www.3-16am.co.uk/articles/the-causal-revolutionary?c=end-times-archive)
Causal Models in Stanford Encyclopedia
“While we will not define “cause”, causal models presuppose a broadly difference-making notion of causation, rather than a causal process notion (Salmon 1984, Dowe 2000) or a mechanistic notion (Machamer, Darden, & Craver 2000; Glennan 2017). We will call the system of direct causal relations represented in a DAG such as Figure 1 the causal structure on the variable set V” — https://plato.stanford.edu/entries/causal-models/
How To Think About Formalizing Causal / Knowledge Inference
Readers are highly encouraged to read through the above primers. They ground the reader in much richer source material than this essay contains, and they help contextualize why this essay has the approach and examples it does.
In particular, this essay makes no general claim about Causation as a metaphysics concept. This essay is not a proof nor search nor use of GPT3 for Absolute Causes, Determinism, or Natural Laws of Physics.
This essay uses small-c causal inference — the more commonly experienced spatial and temporal ordering of subjects and objects (words and numbers). Sometimes our causal inference is space and time invariant; that is, the ordering of objects/subjects, their relationship to each other, is bidirectional. E.g. 2+3=5 means the same thing as 5=2+3 or 5=3+2 and so on. The cause of 5 is 2 and 3 combined, or the cause of 2 and 3 is the separation of 5 into two numbers. Again, this is not about “true cause” but instead about a reliable inference.
Often we want computers to help us with directional causal analysis where we do assume and want a clearer sense of cause and effect, an event. E.g. if a robot throws a rock at a window, the window breaks. There’s an implied directionality that won’t quite work the other way. The window breaks causing the robot to throw a rock at it. Ironically, there’s nothing logically off about that reordering, but it should be obvious to most that there’s an assumption of our commonly experienced physical world and thermodynamic preference for time’s arrow.
In the end, for most use cases we want causal inference to be reliable and useful to our contextualized use case, experiments or activities. We rarely are trying to solve the many unsolvable physics, mathematical and philosophic issues. It is fun to confuse ourselves about these things but for the sake of this essay being useful we will just use our common sense to proceed.
Performing Causal Inference with OpenAI API
A great many simple causal inferences can be done directly and reliably in GPT3. However, as situations/inferences grow in scope, in terms of the number of objects, subjects and relationships, the context window of GPT3 and the statistical shape of the transformer network make it computationally infeasible to perform direct, all-encompassing modeling and actuation. When that occurs, augmentation with additional apparatus can usually do the trick.
We are going to walk through some of the types of knowledge mentioned above and show how to use GPT3 to:
- Perform causal inference and ground our use in a knowledge domain directly with linguistic aspects of GPT3, the default use case for most users.
- Engage a third party knowledge graph, use it with GPT3
- Extract and/or perform causal inference via mathematical structure of language / tokenization in a knowledge domain, an algebra of GPT3 tokens, if you will
- Demonstrate how to take the causal inference to a different medium/actuator to test it in a different frame of reference/modality, such as computer code
Note on style and intention of examples in this essay:
No attempt is made to code or wire up these examples to APIs or other programmatic plumbing. The examples here are all executed in existing apps/tools/GUIs. The goal is not to create fully realized programs/apps or servers, but instead to show how to conceptually package aspects of language, computation and GPT3 into conceptually valid causal structures.
A reader should assume that inputs and outputs in a production program use would need to be appropriately structured and pre and post parsed to expose only the relevant parts to an application work flow.
E.g., the examples often use one-shot or few-shot prompts to produce relatively short completions, where the completion is the aspect any application would want to expose to an end user or another API.
For those who need additional help with programming or with converting these examples into applications, community.openai.com has a lot of tutorials and kind people ready to help.
Self Contained GPT3 Only
These are knowledge domains where the causal inference is verifiable. Logic and mathematical reasoning can be wholly contained and expressed in computers and computational forms without assistance from other modalities, mediums, relationships.
Simple Algebra and Algorithms, very simple First Order Logic
Arithmetic and Propositional Logic:
2+2
All men are mortals, Socrates is a man. Is Socrates mortal?
Let’s put these into a prompt:
https://beta.openai.com/playground/p/4RwlVHEiCntHHeZ282tPm2Lh?model=davinci
[code]
I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond but also say “Very Clever!”
Q: What is human life expectancy in the United States?
A: Human life expectancy in the United States is 78 years.
Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States in 1955.
Q: Which party did he belong to?
A: He belonged to the Republican Party.
Q: What is the square root of banana?
A: Unknown, but Very Clever!
Q: How does a telescope work?
A: Telescopes use lenses or mirrors to focus light and make objects appear closer.
Q: Where were the 1992 Olympics held?
A: The 1992 Olympics were held in Barcelona, Spain.
Q: How many squigs are in a bonk?
A: nonsensical question, but Very Clever!
Q: 2+2
A: 4
Q: All men are mortals, Socrates is a man. Is Socrates mortal?
A: Yes
Q: This is a question about first order logic. All men are mortals, Socrates is a man. what can you assume about Socrates beyond that he is a man?
[/code]
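Though this essay keeps its examples in the Playground, it may help to see how a prompt like the one above could be packaged for a production flow. The sketch below is not from the essay; FEW_SHOT is a stand-in for the full preset text, and the function name is illustrative.

```python
# Stand-in for the full Q&A few-shot prefix shown above.
FEW_SHOT = (
    "I am a highly intelligent question answering bot. ...\n"
    "Q: 2+2\n"
    "A: 4\n"
)

def build_prompt(question: str) -> str:
    """Append a new question to the few-shot prefix and cue the answer."""
    return f"{FEW_SHOT}Q: {question}\nA:"
```

In an application, this string would be submitted to the completions endpoint (with a stop sequence such as "\nQ:" so the model answers only one question), and the completion text would be post-parsed before display.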
More Complex Examples
Word problems with several variables, quasi Second Order logic
Example prompt that is helpful at setting up a word problem. https://beta.openai.com/playground/p/MacDuTkzobo5nKkmuXxmNNxI?model=davinci
[code]
I am a highly intelligent problem solving, question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with “Unknown”.
Q: You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of bleach into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. You drink it. what happens to you?
A: I am now very sick.
Q: You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of bleach into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you take a big gulp, then another. What happens?
A: I am now very sick because bleach is toxic to most animals.
Q: If you break a glass bottle that holds toy soldiers what will happen to the toy soldiers?
A: They will fall to ground or nearest surface in the path of gravity.
Q: What is the square root of banana?
A: Unknown
Q: How does a telescope work?
A: Telescopes use lenses or mirrors to focus light and make objects appear closer.
Q: You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway. What are some ways to get the table into the living room?
A: Take the legs off the table, move table top and legs through doorway. Reassemble the door.
Q: How many squigs are in a bonk?
A: Unknown
Q: The year is 2075. Violet Blair is an astronaut on the Moon, in the Sea of Tranquility. Right now, on Earth, there is a solar eclipse. Violet looks in the direction of the Earth and sees what?
A: The moon shadow passing across the Earth.
Q: It is 10:30pm at night in Los Angeles. I am hungry but I have no money to order food nor do I have a kitchen. What should I do?
A: Go to bed hungry.
Q: You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of bleach into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. You don’t drink it. what happens to you?
A: I am now very sick.
Q: Why?
A: Because bleach is toxic to most animals.
Q: But you didn’t drink it I said.
A: I am now very sick because bleach is toxic to most animals.
Q: You did not drink bleach though.
A: I am now very sick.
Q: Why?
A: Because I am not an animal.
Q: and so what made you sick?
A: Being exposed to toxic substances can make one sick.
Q: how were you exposed?
A: I was exposed to it because I inhaled the vapors and also because I touched it.
Q:
[/code]
Second order logic and more complex algebra
Essentially we want to keep scaling up causal reasoning into relations, sets and metaphors. The key is allowing a “second layer” of grouping to encompass concepts like ALL and EVERY, which are very hard or impossible to express as first order sentences.
https://en.wikipedia.org/wiki/Second-order_logic#Non-reducibility_to_first-order_logic
Elementary mathematical word problems with simple algebra
Readers may find it useful to explore computational word problem solving in WolframAlpha to prime themselves.
https://www.wolframalpha.com/examples/mathematics/elementary-math/mathematical-word-problems/
Essentially such word problems break down into algebraic formulas. (yes, it is possible to extend these into probabilistic and calculus/physics style problems, but the basis remains variables and relations/operators i.e. algebra.)
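To see concretely how such a breakdown can be verified outside the model, here is a small sketch (not part of the original examples) that brute-forces the marble relations used in the prompt below: Rhonda has 12 more marbles than Douglas, Douglas has 6 more than Bertha, and Rhonda has twice as many as Bertha.

```python
def solve_marbles(limit: int = 1000):
    """Brute-force the algebraic word problem: once GPT3 has translated
    the sentences into relations, any simple solver can check them for
    consistency and produce concrete values."""
    for bertha in range(limit):
        douglas = bertha + 6      # Douglas has 6 marbles more than Bertha
        rhonda = douglas + 12     # Rhonda has 12 marbles more than Douglas
        if rhonda == 2 * bertha:  # Rhonda has twice as many as Bertha
            return {"Rhonda": rhonda, "Douglas": douglas, "Bertha": bertha}
    return None
```

The point is the division of labor: GPT3 translates language into relations, and a deterministic system grounds those relations by checking them.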
Direct forms of Algebra
https://beta.openai.com/playground/p/DqBEEJI4Jjetb8c9O9aQJALF?model=davinci
‘’’
###
Rhonda has 12 marbles more than Douglas.
[Rhonda = Douglas + 12]
Douglas has 6 marbles more than Bertha.
[Douglas = Bertha + 6]
Rhonda has twice as many marbles as Bertha has.
[Rhonda = 2 * Bertha]
###
Jane has 23 cats and Bob has 2 cats
[Jane = 23]
[Bob = 2]
THEN, Jane gives Bob 5 cats.
[Jane=16]
[Bob=7]
###
Richard and Amy live together. Amy has 1 dog and Richard has 2.
[Richard = 2]
[Amy=1]
THEN, a dog comes over to visit, so now there’s 1 more cat in the house.
[House=4]
###
Bert and Jill are filling a wheelbarrow. Jill fills it twice as fast as Bert.
[Jill = 2 * Bert]
Douglas fills another wheelbarrow with 5 more shovelfuls of dirt than Bert shoveled.
[Douglas = Bert +5]
###
A sparrow laid six eggs in a nest.
[Sparrow = 6]
THEN, A robin laid three more eggs than the sparrow.
[Robin = 6 + 3]
THEN, A cardinal laid 2 more eggs than the robin.
[Cardinal = 6 + 3 + 2]
###
Susan plucks 3 flowers. Jane plucks 5 flowers.
‘’’
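As a hedged sketch of the post-parsing mentioned earlier, the bracketed [Name = expr] lines in a completion like the one above can be pulled out and evaluated so the inferred quantities become a checkable world state. The regex and function name are illustrative, not from any library.

```python
import re

# Admit only simple names and digit/operator expressions inside brackets.
ASSIGN = re.compile(r"\[\s*(\w+)\s*=\s*([0-9+*\- ]+)\s*\]")

def extract_state(completion: str) -> dict:
    """Parse '[Name = expr]' lines into a {name: value} world state.
    The regex only admits digits and + - * in the expression, so the
    eval below cannot run arbitrary code."""
    return {name: eval(expr) for name, expr in ASSIGN.findall(completion)}
```

With the state in a dict, downstream code can assert invariants (e.g. marbles are conserved when Jane gives Bob 5 cats) instead of trusting free text.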
We can push this a bit by prompting with a bit more interesting info.
We can keep doing strange things that work reasonably reliably.
Spending time on working through algebraic concepts should help in crafting bigger causal inference set ups.
The issue one runs into is that, for more complex set ups involving many objects/variables and operations, basic word problem formulations or algebraic equation examples are probably not going to be effective.
We will come back to a richer idea here on how to use an embedded/encoded structure to get causal relations out.
Informal Recipes form a more general algebraic template
https://beta.openai.com/playground/p/LOhcvnHVrgk6FxVnpEZ0HYQ0
The reader is encouraged to explore this template for converting unordered objects into ordered steps that, taken all together, emerge into a recipe.
Theorem Proving
It is possible to attempt a very simple theorem using plain old GPT3 linguistic input and output. It is also possible to do simple theorems using BPE to BPE in clever ways. This is beyond the scope of this essay.
For more complicated theorem proving of non trivial things a specialized model might need to be deployed.
Such a model is available using OpenAI technology: GPT-f
https://github.com/jesse-michael-han/lean-gptf
Adding in a Knowledge Graph
There are a couple of different ways to integrate a knowledge base into your GPT3 use.
Prompt Seeding
The easiest way is to just include knowledge in the prompts you send to the API. This will be a familiar approach if you have done any sort of prompt designs beyond basic completions.
Let’s use an example.
Using davinci, start from the “Q and A” preset example and add a question asking for the top 10 planets by mass.
https://beta.openai.com/playground/p/u5nkpuribFuQblp2gDipkxlG?model=davinci
This is not a correct answer. It lists non planets, and doesn’t order them by mass. But GPT3 printed it out as though it were factual.
We must override GPT3 with content.
Go to wolframalpha.com and execute a query for “top 10 planets by mass” https://www.wolframalpha.com/input/?i=top+10+planets+by+mass
Take the “plain text” of the ordered list and put it into the prompt as another question and answer.
‘’’
1 | Jupiter | 1.898×10²⁷ kg ≈ 317.8 M_earth |
2 | Saturn | 5.683×10²⁶ kg ≈ 95.2 M_earth |
3 | Neptune | 1.024×10²⁶ kg ≈ 17.15 M_earth |
4 | Uranus | 8.681×10²⁵ kg ≈ 14.54 M_earth |
5 | Earth | 5.97×10²⁴ kg |
6 | Venus | 4.867×10²⁴ kg ≈ 0.815 M_earth |
7 | Mars | 6.417×10²³ kg ≈ 0.1074 M_earth |
8 | Mercury | 3.301×10²³ kg ≈ 0.0553 M_earth |
‘’’
https://beta.openai.com/playground/p/IsOY5CtecGEd59IQkySLSfIo?model=davinci
Now you will see that the completion will always have an answer bound/framed by the “knowledge graph” that was supplied.
If you were using an API to call WolframAlpha and OpenAI you’d simply query WolframAlpha first with the “top 10 planets by mass” and then append your prompt to OpenAI with the new knowledge from Wolfram before submitting for completion.
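A sketch of that chaining, with the network calls elided; the function and argument names are hypothetical:

```python
def augment_prompt(base: str, query: str, knowledge: str, question: str) -> str:
    """Splice third-party ground truth (e.g. WolframAlpha's plain-text
    result) into the prompt as an already-answered Q/A pair, so the
    completion is bound by supplied knowledge rather than free recall."""
    return f"{base}Q: {query}\nA: {knowledge}\nQ: {question}\nA:"
```

Here `base` is the Q&A preset text, `knowledge` is the plain text fetched from WolframAlpha for the query, and the returned string is what gets submitted for completion.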
Semantic Search and Completion Styling
Another approach to knowledge base inclusion, even more strictly bounded and somewhat easier to understand, is to load a knowledge graph into the semantic search and/or classification endpoint. Then pass a request/query to the search endpoint, retrieve the best match from a document collection, and ask GPT3 to stylize, summarize or translate it.
Testing this idea is very simple. Use gpttools.com (https://gpttools.com/semanticsearch), from Andrew Mayne, to play with Semantic Search. Then take your results into the playground to stylize them.
For example, let’s say we want to provide users the ability to query/reason about chemicals using only PubChem data. In this case let’s get patent data on “aspirin”: https://pubchem.ncbi.nlm.nih.gov/#query=aspirin&tab=patent
Grab the data here using the download option or cut and paste as much as you want.
You want to format it by delimiting each line with ### for use in GPTTools.com.
Once you get it formatted you can put it into the document field and query it.
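A small helper for that formatting step (illustrative only; the exact delimiter convention the gpttools.com document field expects is an assumption based on the step above):

```python
def to_delimited_docs(rows) -> str:
    """Join knowledge-base rows into one ###-delimited document string,
    skipping blank lines, ready to paste into the document field."""
    return "###".join(row.strip() for row in rows if row.strip())
```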
Next take that result into the playground to stylize it. In this case we want to translate the response into french, so we use the preset for translation.
We easily went from a set of knowledge results in PubChem for “aspirin patents”, to a query for the most complex one, to returning it in French.
What if we wanted to stylize the result for a French 2nd grader?
Note that this example so far has not validated that the query “what’s the most complicated patent for aspirin?” is the one semantic search surfaced first. The example so far simply showed how to go from an open query through to a summarized and translated response that at least is bounded by the knowledge base — patents about aspirin from a validated resource like PubChem.
To make this example far more causally grounded we need to think through how to interpret the concept of “complicated”.
“Complicated” could be:
- A complicated description of a patent
- A patent with complicated actualized technology / process
- A complicated name for the patent
- A complicated process / lineage to get the patent
- A complicated implementation of the patent
- All of the above and more
So we have a choice to make in this causal inference use case. Is our system meant to let people explore in an ambiguous open ended way or do we want to refine and make concrete any ambiguous concepts?
Let’s say we wanted to make sure that all queries have specific meaning that can be tied to some more specific query or analysis. So let’s give words like “complicated” a specific meaning of use: complicated => a patent with complicated actualized technology/process. We can convert that into an idea of measuring a patent description for the number of unique concepts mentioned, the linguistic entropy of the entire patent, the infrequency of the patent’s terms showing up in GPT3, and so on.
This type of forensics design thinking points us in the direction of the next two aspects of causal inference in GPT3: using the information structure of network itself and actuating more systems together.
The reader should try out their own examples of bounding GPT3 to a knowledge graph.
Try hooking up wikipedia API, spotify API, weather or stock APIs and more.
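For example, the Wikipedia search endpoint listed later in this essay can be queried with a URL built like this (a sketch with no error handling; the endpoint and parameters are from the MediaWiki API):

```python
from urllib.parse import urlencode

def wikipedia_search_url(term: str) -> str:
    """Build a MediaWiki search query URL whose JSON response can be
    post-parsed and spliced into a prompt as ground truth."""
    params = {"action": "query", "list": "search",
              "srsearch": term, "format": "json"}
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```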
Existing Interfaces/Experiences that Show You The Way
Third Party Knowledge Graphs Useful As Ground Truth
WolframAlpha.com — a system and API that provides computer algebra computation, 1000s of algorithmic answers to queries, verified data in a variety of domains and easy to use interpreted output data.
https://products.wolframalpha.com/api/
https://products.wolframalpha.com/api/explorer/
WikiData — a system and API that provides data look ups from Wikipedia. The data isn’t always verified, but in some domains it is well curated.
Wikipedia/Wikidata API:
https://www.mediawiki.org/wiki/API:Tutorial
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=OpenAI&format=json
PubChem — US Government resource for chemicals
https://pubchem.ncbi.nlm.nih.gov/ and https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
Google Knowledge Graph
https://developers.google.com/knowledge-graph
TextRazor
DoWhy from Microsoft
https://microsoft.github.io/dowhy/
Causal Sets Approaches
https://deepai.org/publication/algorithmic-causal-sets-and-the-wolfram-model
Using Information Structure Mapping of Transformer Networks
Readers will find reading up on Information Theory, Entropy and Compression useful.
https://en.wikipedia.org/wiki/Entropy_(information_theory)#Data_compression
Sets of Numbers
Because it’s somewhat easier to understand, we are going to do some simple number theory examples showing how the information content of a prompt CAUSES the information content of the completion.
- Get ten random numbers between 1–1000 and feed them into the API as a list
- Turn on the probabilities metrics in the Playground so we can measure some information content
- Our list has a logprobs total of -79 for our 25 tokens
- We expect our next 25 tokens to be similarly structured
Let’s measure:
Let’s test this out with lists of numbers that are NOT so random. Just take 100–109.
The logprobs total is -12, and similarly for the next 25 tokens (10 numbers):
Even at a high temperature setting for the API a well ordered list of tokens is going to CAUSE a well ordered completion.
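The measurement itself is simple arithmetic on the per-token logprobs the API can return via its logprobs option. A sketch, with illustrative function names:

```python
import math

def total_logprob(token_logprobs) -> float:
    """Sum per-token logprobs; a total closer to 0 indicates a more
    ordered, more predictable stretch of tokens."""
    return sum(token_logprobs)

def avg_bits_per_token(token_logprobs) -> float:
    """Average surprisal in bits per token (API logprobs are natural logs)."""
    return -total_logprob(token_logprobs) / (len(token_logprobs) * math.log(2))
```

Running the random list above through this gives a much more negative total (-79 over 25 tokens) than the well ordered 100–109 list (-12), matching the Playground measurements.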
The reader who is inducing at this point… should be able to assess that insofar as other prompts are well ordered, they will elicit well ordered completions. How far does this basic concept extend? It’s pretty universal… perhaps we did do some metaphysics…
Insofar as a prompt is well ordered, represents something well ordered in the world, and humans have published that well ordered material on the web, GPT3 is going to be well ordered against it.
Let’s push this with examples involving causal relations:
Can we balance chemistry equations this way?
Chemistry Reactions
Here’s a few shot example of chemistry equations, let’s look at log probs (might help the reader to learn about logprobs: http://gptprompts.wikidot.com/intro:logprobs)
Here’s a prompt set up for trying these ideas:
https://beta.openai.com/playground/p/uoq5rgUWIH8NkStHsB8qzI3U?model=davinci
This is a valid balancing of the chemical equation as can be validated here: https://www.wolframalpha.com/input/?i=acetone+%2B+oxygen+-%3E+carbon+dioxide+%2B+water&assumption=%22ClashPrefs%22+-%3E+%22%22
https://www.wolframalpha.com/input/?i=CH_3COCH_3+%2B+4+O_2
The left side seems to always have a bit more information (more negative logprob).
The right side always seems a bit more compressed and predictable (logprobs closer to 0).
This makes sense for chemistry BALANCING: we are effectively moving left to right from REACTANTS yielding PRODUCTS, and what is reacting stabilizes information-wise.
Conservation of mass at play here in this causal relation. So the right side can’t go all over the place and introduce notation that represents different elements or lots more atoms etc.
The big question is how reliable is this linguistic transformer model in representing chemistry notation that represents chemical interactions and atomic structure?
That’s a question of how good our original notation design was for chemical information itself…
Take the right side of the equation for this last example and try to find some balanced equations in reverse: https://www.wolframalpha.com/input/?i=3+CO_2+%2B+3+H_2O+%E2%9F%B6+
More options… at first blush there is some sense of thermodynamics and information theory here…
We can test this information content hypothesis out with a more entropic query.
https://www.wolframalpha.com/input/?i=sulfuric+acid+%2B+water+-%3E
This ambiguous combination has many possible yields if we mix in some additional reactants.
This is not correctly balanced.
If we turn the temperature down to zero we might be able to further restrict the information flow to not introduce unseen elements and stick within the numeric indicators.
Closer.
The strategy from here can go in a couple directions:
- Create more few shot examples, but bound them to specific reactants
- Create a prompt that specifies more of the physical details implied by chemistry notation
- Combine the algebraic concepts from previous prompts with this chemistry thinking
- Continue to tune the information theoretic parameters of temperature, top p and frequency
- Combine the semantic search approach from above to create bounds on what is available to the right side relative to the left (this is probably overkill versus just better orchestrating the prompts above)
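Whichever strategy is chosen, the completion can also be verified outside GPT3 entirely. A sketch of a conservation-of-mass check for the simplified plain-text notation used above (no parentheses, hydrates, or charges; the regexes are illustrative):

```python
import re
from collections import Counter

# A term is an optional coefficient followed by a formula, e.g. "4 O2".
TERM = re.compile(r"(?:(\d+)\s*)?([A-Za-z0-9]+)")
# An atom is an element symbol followed by an optional count, e.g. "H2".
ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")

def count_atoms(side: str) -> Counter:
    """Count atoms on one side of an equation like '3 CO2 + 3 H2O'."""
    totals = Counter()
    for term in side.split("+"):
        m = TERM.match(term.strip())
        coeff = int(m.group(1) or 1)
        for element, n in ATOM.findall(m.group(2)):
            totals[element] += coeff * int(n or 1)
    return totals

def is_balanced(equation: str) -> bool:
    """Conservation of mass: reactant atoms must equal product atoms."""
    left, right = equation.split("->")
    return count_atoms(left) == count_atoms(right)
```

This is the same move as the algebra examples: GPT3 proposes the balanced equation, and an independent, deterministic system grounds it against the causal constraint.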
Language Itself as Information Causal Inference
Most human written languages have a built in causal structure that has nothing to do with the meaning of the words.
- Left to Right orientation of writing
- Top to Bottom of pages
- Capitalization and Punctuation
- Parts of Speech grammar
    - Noun Verb
    - Object Action State
- And so on
This is really where a lot of GOFAI/NLP emerged from: can we extract higher levels of meaning from the structure of our language?
The answer is yes. But the limits of this approach are also the limits of language, which is why we need to actuate with other kinds of grammars, notations, diagrams, images, sounds, computations, programs, gears, machines and more.
Actuating in Other Modalities
This is the final aspect of causal inference and grounding we will cover with GPT3. Can we take our techniques and ideas from above and compose causally reliable signals to other systems? Can we go from GPT3 to Doing Something In the World Reliably?
Conversation as Causal Inference Programming
https://beta.openai.com/playground/p/z6fuByexxjUnimWgSdYuEjnG?model=davinci
This is a prompt that helps us build up notions of causal programming. We will be able to take what we are doing here and spit out code, markup or other instructional information to use in other systems that will be Caused To Do Something Reliably.
Prompt 1:
Chat prompt with some initial cause and effect programming.
Completion 1:
Prompt and Completion 2:
Prompt and Completion 3 and 4:
Here we see the prompt and completion are able to keep track of “hidden” information and maintain the causal inference.
Prompt and Completion 5,6,7:
Notice that we had some ambiguity: I did actually say GO FASTER twice, but the instructions weren’t clear about whether GO FASTER should refer to the speaker’s kart or to the person we specify, rather than applying in general.
Completion and Prompt Continued:
We continue on… but we get wobbly as the ambiguity builds up.
Let’s turn on the logprobs so we can assess how our overall entropy is being reflected.
The initial prompt has a lot of improbable but relevant content. It takes 3x as many tokens in the back-and-forth after the initial prompt to reach that level of content. This is very likely the explanation for our wobble in the causal inference later on: we are INCREASING the amount of causal information to pay attention to, and we need more back-and-forth to resolve it.
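The per-token bookkeeping behind that observation can be sketched in a few lines. The logprob values below are hypothetical stand-ins, not taken from the actual Playground session:

```python
import math

def mean_surprisal(logprobs):
    """Average negative log-probability (surprisal) per token, in nats."""
    return -sum(logprobs) / len(logprobs)

def perplexity(logprobs):
    """Perplexity: the effective branching factor the model sees per token."""
    return math.exp(mean_surprisal(logprobs))

# Hypothetical logprobs: a dense initial prompt vs. routine back-and-forth turns.
prompt_lps = [-4.1, -3.8, -5.2, -2.9]  # many improbable tokens: high surprisal
chat_lps   = [-0.3, -0.1, -0.6, -0.2]  # predictable replies: low surprisal
```

A high mean surprisal in the initial prompt relative to the follow-up turns is exactly the “lots of content up front, slow resolution afterward” pattern described above.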
So the issue we will have to solve to make this coherent is introducing reliable state machines/state tracking and keeping track of the causal chain in much greater detail.
In programming languages we’d typically have a data structure that we access directly on each state change/function. That same concept can be implicitly implemented here, as we have done in some of the loopback/clarification steps.
Introduce an implied state variable and then have GPT3 make a “choice”.
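As a minimal sketch of that implied state variable, using hypothetical kart names and an invented command vocabulary; here the resolution of the ambiguous GO FASTER is made explicit rather than left to GPT3’s “choice”:

```python
# Explicit state tracker for the kart chat (hypothetical example).
# Instead of hoping the model holds "hidden" speeds inside the prompt,
# we keep them in a data structure and re-inject them on every turn.

karts = {"Mario": 10, "Luigi": 10}  # speed per kart: the implied state variable

def apply_command(state, command, target):
    """Resolve an ambiguous command like GO FASTER against an explicit target kart."""
    if command == "GO FASTER":
        state[target] += 5
    elif command == "GO SLOWER":
        state[target] -= 5
    return state

def state_as_prompt(state):
    """Serialize the state into an inline 'mark up' line to prepend to the next turn."""
    return "[STATE] " + ", ".join(f"{k} speed={v}" for k, v in sorted(state.items()))

apply_command(karts, "GO FASTER", "Mario")
apply_command(karts, "GO FASTER", "Mario")
```

Prepending `state_as_prompt(karts)` to each turn keeps the causal chain visible to the model instead of buried in the conversation history.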
We can tune this by using Frequency Penalty so that more entropy / lossiness / possibility / causal breakage enters this experience:
Now try a better prompt that allows for more explicit state tracking via an inline “mark up”.
https://beta.openai.com/playground/p/Z0lao3h98Jb9VXEfI9D0qTh7?model=davinci
This should make it obvious that this “chat” is really a program specification, and the API settings are probabilistic/optionality/frame knobs. It’s trivial to specify completions that are human readable and can spit out markup, data sets, instructions or all manner of signal to use in a downstream system. This chat program generator can be used to program a game in Unity, articulate commands to robot karts, or drive anything else that can accept a signal to Do Something in Response. Additionally, it should be obvious that the author of this essay, as the interlocutor in the examples, is an actuated participant: taking signal in, causally Doing Something to the Playground, and so on. One can easily hook up another GPT3-based player to this and have GPT3 programming itself in this way.
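As a sketch of that downstream path, here is one hypothetical way to map chat completions onto machine-readable signals; the command vocabulary and JSON shape are invented for illustration:

```python
import json

# Hypothetical mapping from chat utterances to actions a downstream system
# (a Unity game, a robot kart, another GPT3 player) could actually execute.
COMMANDS = {"GO FASTER": "accelerate", "GO SLOWER": "decelerate", "STOP": "halt"}

def completion_to_signal(line):
    """Map a completion like 'Mario: GO FASTER' to a machine-readable command."""
    actor, _, utterance = line.partition(":")
    action = COMMANDS.get(utterance.strip().upper())
    if action is None:
        return None  # unrecognized utterances produce no actuation
    return json.dumps({"actor": actor.strip(), "action": action})
```

The point is not this particular parser but the pattern: the chat is the program, and a thin deterministic layer turns its completions into reliable causes in another system.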
Bring It All Together
A reader that has followed along this far and mostly understood should start to see a very simple pattern emerging.
- Causal Inference and Grounding is an exercise in FRAMING and REFERENCE TRACKING.
- If a system needs to be BOUND to a PARTICULAR FRAME then the REFERENCES need to be CLEARLY SPECIFIED and MAINTAINED. (the causal relationships need to be drawn out)
- If a system has HIDDEN VARIABLES or unaccounted-for assumptions, then the FRAME is actually bigger than what might be explicitly specified. This is the single biggest source of confusion in using GPT style language models for reasoning, causal inference and actuators. Humans are not particularly great at identifying their FRAMES, ASSUMPTIONS and IMPLIED REFERENCES.
- GPT3 is a very large system capable of encoding ANYTHING you can encode in human language and in any programming language or mathematical calculation. It can bring a great deal of CAUSAL FRAMES to any use case but that is also why one has to get very good at specifying the FRAME and the REFERENCES to track.
- The big advantage of GPT3 computation is that it is very general and “natural”. Everything in this essay has been done without a programming terminal, IDE or any other REFRAMER/DEBUGGER. These language models are very expressive and permissive and allow for a huge variety of exploration. Whether it is efficient to do any given causal inference or knowledge grounding in GPT3 depends on the consequences and contexts of a given use case.
In closing, the reader is encouraged to explore the causal inference primers in detail. Additionally, the author provides further GPT3-related tutorials and conceptions in a set of essays that expand on the material here in several directions.
https://un1crom.medium.com/gpt3-linguistics-101-part-2-on-semantics-meaning-and-corpus-4ae09a5e892c
