Exploring the Logical Reasoning Gap in ChatGPT with a Sample

Afshin Sadeghi
3 min read · Apr 1, 2023


To highlight the importance of logical reasoning, it is enough to note that without correct logical reasoning a language model is unreliable, even if it is trained on reliable data. In other words, no matter how large a language model is, if its logical reasoning is unreliable it will not find more serious use than a well-interfaced information retrieval system. Of course, many applications, such as art and games, do not require logical reliability as much as safety-critical ones such as medical applications or autonomous driving systems.

In this short text, to explore the logical gap in GPT, I mix logic with a simple math problem and put the model to the test. I used the openly available ChatGPT (GPT-3), and it is worth mentioning that GPT-4 is much improved in many aspects, as shown in a related recent paper: Sparks of Artificial General Intelligence: Early experiments with GPT-4.

My first reaction after reading the paper was to test the logical understanding of the language model firsthand.

So I installed one of the phone apps that connect to the OpenAI API and was ready to jump in.

An interesting fact shown by the examples in the paper is that you can "train" the model through the user interface just by entering text. This does not replace the main training of the language model, and what you enter does not persist in the model.

The running session of the model holds on to and acts upon the information you give it. This short-term memory/understanding resets when you start a new session, and the amount of text it can keep is limited in size. This feature is necessary for chatbots and separates them from plain question-answering systems.
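As a minimal sketch of how this session memory looks at the API level (the endpoint and the gpt-3.5-turbo model name are my own assumptions here, not necessarily what the phone app used): the model itself is stateless, and the short-term memory exists only because the client re-sends the whole conversation with every request.

```python
import os
import requests

# Minimal sketch of session "memory" over the OpenAI chat completions API.
# Model name and prompts are illustrative choices, not the exact ones
# used in the chats shown in this article.
API_URL = "https://api.openai.com/v1/chat/completions"
history = []  # the entire conversation so far

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-3.5-turbo", "messages": history},
        timeout=60,
    )
    response.raise_for_status()
    reply = response.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

# The first message influences the second only because both turns are kept
# in `history`; clearing the list is equivalent to starting a new session.
chat("Please remember the code word 'heliotrope'.")
print(chat("What code word did I ask you to remember?"))
```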

Although this feature is meant for following conversations, it can be used to "train" the model through text without having access to the original, huge GPT model and without expensive hardware. With this prelude, let's start to chat.

It understands the meaning of “*” and gives a correct answer while explaining it in plain text.

Now let's do some logic “tuning”:

As mentioned in the paper, the model's self-explanation and understanding of simple logic are impressive. It learns the new logic for the multiplication, explains the logic, and constructs examples to show it has really understood it. The explanation of the logic looks inaccurate, though, while the examples are correct.
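The exact rule I typed into the chat is only visible in the screenshots, so take the following as a hypothetical stand-in: suppose the redefined multiplication is a * b = (a × b) / 2, which is at least consistent with the "half" behaviour discussed below. A few lines of Python give a reference implementation against which the model's self-generated examples can be checked:

```python
def custom_mul(a: float, b: float) -> float:
    """Hypothetical stand-in for the redefined multiplication taught to
    the model in the chat: a * b is taken to mean (a x b) / 2.
    The exact rule used in the session appears only in the screenshots."""
    return (a * b) / 2

# Spot-checking the kind of examples the model produces for itself:
for a, b in [(4, 6), (3, 10), (8, 8)]:
    print(f"{a} * {b} -> {custom_mul(a, b)}")  # 12.0, 15.0, 32.0
```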

Let's test our short-term-memory-"trained" model now:

Yes! It is the correct answer. So far, so good. Let’s make it a bit more complicated.

Well, there is still a long way to go. It took me a while to find where this mistake came from. It looks like the model is applying the wrong formula it derived in the first place: it looks for a number whose half equals the number on the left-hand side of the equation. So it tried to break the equation down and solve it step by step, but failed because of the wrong assumption it generated from the "tuning" formula I gave it.
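To make the failure mode concrete, here is a worked check under the same hypothetical stand-in rule (the actual equation and the model's answer appear only in the screenshots). Under a * b = (a × b) / 2, solving x * b = c requires x = 2c / b, and any candidate answer can be verified by substituting it back, which is exactly the kind of check that catches the mistake:

```python
def custom_mul(a: float, b: float) -> float:
    # Same hypothetical stand-in rule as above: a * b := (a x b) / 2.
    return (a * b) / 2

def solve_for_x(b: float, c: float) -> float:
    # Solve x * b = c under the rule: (x * b) / 2 = c  =>  x = 2 * c / b.
    return 2 * c / b

def check(x: float, b: float, c: float) -> bool:
    # Verify a candidate answer by substituting it back into the rule.
    return custom_mul(x, b) == c

# Hypothetical test equation: x * 4 = 10 under the redefined rule.
b, c = 4, 10
x = solve_for_x(b, c)
print(x, check(x, b, c))         # 5.0 True
# A candidate obtained by halving in the wrong direction fails the check:
print(check((c / b) / 2, b, c))  # False (1.25 does not satisfy the rule)
```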

This blog explored exposing the model to hypothetical math with a special logical rule. The model showed interesting results, along with a telling mistake.

The first lesson learned is that a model that gives correct answers might still be using wrong logic. Therefore, checking its explanation of the learned logic is important.

The second lesson is that, to get a correct answer when giving new data to the model, one must always check the explanation it gives of its understanding and guide it through its mistakes.


Afshin Sadeghi

AI Research Scientist @ Fraunhofer IAIS. Works on Machine Learning and Knowledge Graph modeling. Homepage: http://afshn.com