The Subtle Art of Priming GPT-3
In every magic trick, you see, there are three parts. And this is how they are described: the first is called the Pledge. The magician shows you something ordinary, perhaps a bird, a coin, or a deck of cards. And then he asks you to inspect it very carefully to see if it is real, or one of the magician’s illusions. And of course… it’s real. And then, at this point, you say, oh, I’m glad it’s real, and you pay attention. The second act is called the Turn. And the magician takes the ordinary something and makes it do something extraordinary. And then, at this point, you’re surprised, you look for a moment, and you think, oh, I must’ve been mistaken, it’s not so extraordinary, no, it is still ordinary. But you’re really not looking. You’re just pretending to look. Because of course… you don’t really want to know. You want to be fooled. But then, perhaps you look again. And in this way, the magician has you right where he wants you. But you’re still not looking. You’re still pretending that you can’t see what’s right in front of you. The third part is called the Prestige. The magician shows you that he can bring the ordinary something back to life again. At this point you say, oh, I’m glad it was just a trick. And the magician says, oh, no, there is more to it than that. There is an element of real magic in this.
GPT-3 is a little tough to wrap your head around at first. Let me break it down for you.
The input to GPT-3 is a single text field with multiple knobs. These knobs are used to control the degree of emotion and constrain the results. Examples of these knobs are temperature and a toxicity flag.
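To make the temperature knob a little more concrete, here is a minimal sketch of what temperature usually means when sampling from a language model: the model's raw scores are divided by the temperature before being turned into probabilities. The function name and the toy scores are made up for illustration, and this is an assumption about the general technique, not a description of GPT-3's internals.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature.

    Hypothetical helper for illustration only. Low temperatures sharpen
    the distribution (the top choice dominates); high temperatures
    flatten it (more varied output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for three candidate next words (made-up numbers).
toy_logits = [2.0, 1.0, 0.0]
cool = softmax_with_temperature(toy_logits, 0.5)  # more peaked
warm = softmax_with_temperature(toy_logits, 2.0)  # flatter
```

Comparing `cool` and `warm` shows the first candidate taking a larger share of the probability at the lower temperature, which is one way a knob can constrain the results.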
Cognitive scientists have shown that if you hear the word “mouse” and are then asked to name another animal, the probability that the other word will be “cat” increases, even among otherwise unrelated items. This is called priming.
GPT-3 takes words and predicts what comes next. As a result, it has a sense of what should come next. Perhaps GPT-3 primes the input of letters and can, as a result, predict what should come next.
When we get a good form, we set it as the example of the result we want. Then we set up our desired result as the next example. That way GPT-3 will apply what it has learned from the past examples.
The way to prime is to have several examples of the form (c, x, r), where c = context, x = example, and r = result. To generate the result you want, you need something like c x r c x r c x r c x r c x. GPT-3 will fill in the last r. (Note: Written by me!)
You can put in a plus sign, a hyphen, or any other marker to signal the transition between each element. So each example becomes the string “c from: x to: r end”, and you write “c from: x to: r end c from: x to: r end c from: x to: r end c from: x to:”. GPT-3 will fill in the r and sometimes the end.
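The pattern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_prompt` and `extract_result` are hypothetical, the “from:”, “to:”, and “end” markers are the ones used in the text, and no call to GPT-3 is made. The second function trims a returned completion at the first “end” marker, since GPT-3 sometimes writes it and sometimes keeps going.

```python
import re


def build_prompt(examples, query, sep_x="from:", sep_r="to:", stop="end"):
    """Assemble a priming string from (c, x, r) examples.

    examples: list of (context, example, result) tuples.
    query: a (context, example) pair whose result the model should fill in.
    All names and defaults here are hypothetical, for illustration only.
    """
    parts = [f"{c} {sep_x} {x} {sep_r} {r} {stop}" for c, x, r in examples]
    c, x = query
    # The final entry stops right after the "to:" marker, leaving r open.
    parts.append(f"{c} {sep_x} {x} {sep_r}")
    return " ".join(parts)


def extract_result(completion, stop="end"):
    """Keep only the text before the first standalone stop marker.

    Caveat: a result that legitimately contains the word "end" would be
    cut short, so a rarer marker may be a safer choice in practice.
    """
    pattern = r"\b" + re.escape(stop) + r"\b"
    return re.split(pattern, completion, maxsplit=1)[0].strip()


prompt = build_prompt(
    [("c1", "x1", "r1"), ("c2", "x2", "r2"), ("c3", "x3", "r3")],
    ("c4", "x4"),
)
```

Here `prompt` is the string you would paste into the text field, and `extract_result` is applied to whatever comes back.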
This does not take a great deal of skill. What is far more important is the ability to understand how the examples fit together to give the best combination. “I’m going to ask the same question about x’s and r’s. So if I were to tell GPT-3 that I wanted it to paraphrase a Carl Sagan passage and the way I want it to sound is like Carl Sagan himself, what Carl Sagan x’s should I use? And what non-Carl Sagan r’s should I use?” This process, in which the elements of style are chosen to reproduce the attributes of one writer, and these elements are used to generate new output in the style of the other writer, is called stylistic transfer.
After this mapping of x->r, each of these x’s has been added to the initial GPT-3 space, with the context (i.e. c) pointing out which path you should take. Like a GPS system, the context is in charge of guiding you. For example, in writing, it’s very easy to create a lazy excuse for an excuse. The context helps you avoid these. So it’s more than simply making up an example; the example must provide a strong sense of the style you’re trying to achieve.
Each of these mappings of x->r gives GPT-3 subtle hints on what to do. So it isn’t just blindly offering up examples; the examples must be good examples of what you want to achieve. This is when human minds come into play.
So that leads to an interesting design space to consider. When you want to get some output in one situation, you might find that this set of parameters yields good output. But when you switch to a different context (i.e. c), you find a different set of parameters will yield better output.
If we observe the process of increasing the support of the denominator of r, it appears as though we are filling in a sequence of values by gradually finding additional values that work, one by one. This seems like an incremental search where every new value found works like a little experiment, by constraining the universe to the choice that will complete it. In some sense, finding each new value constrains the rest of the search. This is another aspect to consider in understanding the solution.
The niches with the most useful applications will offer a good user interface and rigorous validation procedures.
Note to Reader: Almost all that was written above was generated by GPT-3. The original source is a set of tweets that begins here: https://twitter.com/IntuitMachine/status/1284797610612740102
Research Paper Idea: “The characteristics of a good context priming string for GPT-3.” I’m not going to bother writing it, but if you do, then let me know. I will surely read it.