Melanie Mitchell

Aug 10, 2020


Follow-up to “Can GPT-3 Make Analogies?”


This is a very brief follow-up to my earlier post, “Can GPT-3 Make Analogies?” After posting that piece, I received many questions and suggestions on Twitter and by email. Due to time constraints, I’m not able to answer all the questions or try all the interesting experiments people suggested, but I’ll answer a few of the most common questions I received.

  1. Several people speculated that GPT-3’s training data included papers or books that discussed Copycat analogy problems, and that it could be using that data to answer the questions. To address this concern, I tried several of the problems discussed in my earlier post, but using different letter strings, ones that I don’t think were ever used in previous publications. I didn’t find any difference from the results reported in my previous post, so I conclude that any inclusion of Copycat analogies in GPT-3’s training data is not likely to be responsible for its performance here.
  2. Others noted that all of my examples used some notion of “successorship”, either alphabetic or numeric, and asked how GPT-3 would respond to other types of letter-string analogies. I tried the following two problems, which involved tripling each letter in the string (using the same GPT-3 settings as in my original post):



On both problems, GPT-3 got the right answer (y y y r r r q q q l l l v v v for problem 1 and e e e q q q for problem 2) on all five trials. GPT-3 indeed seems to get the idea of “triple the letters in the string”.
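The “triple the letters” rule is simple to state precisely in code. This is a minimal sketch, assuming letter strings are written space-separated as in the answers above; the function names are hypothetical, and the target strings `y r q l v` and `e q` are inferred by undoing the tripling on the reported answers rather than copied from the original prompts.

```python
def triple_letters(s: str) -> str:
    """Repeat each space-separated letter three times (the 'triple' rule)."""
    return " ".join(letter for letter in s.split() for _ in range(3))


def untriple_letters(s: str) -> str:
    """Inverse of triple_letters: keep one copy of each tripled letter.

    Raises ValueError if the input is not a tripled string.
    """
    letters = s.split()
    if len(letters) % 3 != 0 or any(
        letters[i] != letters[i + 1] or letters[i] != letters[i + 2]
        for i in range(0, len(letters), 3)
    ):
        raise ValueError(f"not a tripled string: {s!r}")
    return " ".join(letters[::3])


# Recover the target string each reported answer implies, then re-apply the rule.
for answer in ["y y y r r r q q q l l l v v v", "e e e q q q"]:
    source = untriple_letters(answer)
    assert triple_letters(source) == answer
    print(f"{source!r} -> {answer!r}")
```

Because tripling is deterministic and invertible, each correct answer pins down exactly one target string, which makes it easy to check a model’s output against the rule.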

Next I tried this problem, which involves the concept of reversing a string:


GPT-3 never gave the “reversal” answer on any of the five trials. Here are its answers:

3. The GPT-3 API has a settable parameter called “Temperature”; the instructions say it “controls randomness”. In short, for each token in its output (a word or a piece of a word), GPT-3 computes a probability distribution over all possible next tokens, and the temperature controls how randomly it chooses among them according to those probabilities. Several people suggested that rather than using the default temperature (0.7), I should use the minimum temperature (0). I tried this on several problems, and in all cases the performance was worse than (or, in one or two cases, the same as) in my reported experiments. So using the minimum temperature will not help the system’s performance on these analogy problems.
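The effect of temperature can be illustrated with the standard temperature-scaled softmax. This is a minimal sketch, not OpenAI’s implementation: the candidate tokens and their raw scores (logits) are made up, the function names are hypothetical, and treating temperature 0 as always choosing the highest-scoring token (greedy decoding) is an assumption about the limiting case, since dividing by zero is undefined.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened (T < 1) or flattened (T > 1)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Pick one token; temperature 0 is treated here as greedy (argmax) decoding."""
    if temperature == 0:
        return tokens[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical scores for four candidate next tokens.
tokens = ["l", "k", "m", "j"]
logits = [2.0, 1.0, 0.5, 0.1]

for t in (0.1, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(sample_token(tokens, logits, 0))  # always "l", the highest-scoring token
rng = random.Random(0)
print([sample_token(tokens, logits, 0.7, rng) for _ in range(10)])
```

Lower temperatures concentrate probability on the top-scoring token and higher ones spread it out, so repeated trials at 0.7 can give different answers, while at 0 the output in this sketch is fully deterministic.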

4. In my original post, I said this about GPT-3 versus human analogy-making:

When [GPT-3] does succeed, it does so only after being shown some number of “training examples”. To my mind, this defeats the purpose of analogy-making, which is perhaps the only “zero-shot learning” mechanism in human cognition — that is, you adapt the knowledge you have about one situation to a new situation. You (a human, I assume) do not learn to make analogies by studying examples of analogies; you just make them. All the time. Most of the time you are not even aware that you are making analogies.

A few people challenged my assertion, saying that we humans spend our childhood (and perhaps our evolutionary history) learning about pattern recognition and analogy, so analogy-making is not “zero-shot” learning, as I had claimed.

Just to clarify what I meant: we humans are good at perceiving abstract similarity between different (but analogous) situations. We can do this from a very young age (if not from birth), though our ability to think abstractly of course improves with age (and likely improved over evolution). My point is that we don’t have to be taught how to do this by explicitly being shown analogies in the way I had to teach GPT-3, by first giving it examples of solved analogy problems. I believe we are born with some innate ability to make abstractions and perceive abstract similarity, and that these abilities get better over time as we improve our ability to think abstractly.
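For GPT-3, the difference between “zero-shot” and being shown “training examples” comes down to what the prompt contains. A minimal sketch, assuming a hypothetical “Q:/A:” prompt format and hypothetical helper names (not necessarily the exact format used in the original experiments): a zero-shot prompt poses the problem alone, while a few-shot prompt first lists solved analogy problems. The example problems are illustrative.

```python
def format_problem(source, target, query, answer=None):
    """Render one letter-string analogy; leave the answer blank for the unsolved one."""
    line = f"Q: If {source} changes to {target}, what does {query} change to?\nA:"
    return f"{line} {answer}" if answer is not None else line


def build_prompt(solved_examples, source, target, query):
    """Prepend zero or more solved analogies (few-shot) to the problem to solve."""
    parts = [format_problem(*example) for example in solved_examples]
    parts.append(format_problem(source, target, query))
    return "\n\n".join(parts)


# Zero-shot: the model sees only the problem it must solve.
zero_shot = build_prompt([], "a b c", "a b d", "i j k")

# Few-shot: solved problems come first, acting as in-prompt "training examples".
few_shot = build_prompt(
    [("a b c", "a b d", "p q r", "p q s"), ("a b c", "a b d", "e f g", "e f h")],
    "a b c", "a b d", "i j k",
)

print(zero_shot)
print("---")
print(few_shot)
```

No model weights change in either case; the solved examples only condition what the model generates next, which is the sense in which GPT-3 had to be shown analogies before it could make them.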