Does ChatGPT really ‘understand’ the language?

Evaluation via bAbI tasks

Yogesh Haribhau Kulkarni (PhD)
Technology Hits
9 min read · Apr 25, 2023


Folks at Facebook AI Research came up with a set of seemingly easy, toy-like tasks (called ‘bAbI’) to gauge the ‘understanding’ of a question-answering system such as a chatbot. I tried them on ChatGPT today, and here are the results:

Task 1: Single Supporting Fact

Mary went to the bathroom.
John moved to the hallway.
Mary travelled to the office.
Where is Mary?

Correct Answer: office
ChatGPT response: office
Verdict: Right

Task 2: Two Supporting Facts

John is in the playground.
John picked up the football.
Bob went to the kitchen.
Where is the football?
Correct Answer: playground
ChatGPT Response: Based on the given information, the football is most likely with John in the playground, as he is the one who picked it up. However, there is no further information provided to confirm the current location of the football.
Verdict: Right, though verbose.

Task 3: Three Supporting Facts

John picked up the apple.
John went to the office.
John went to the kitchen.
John dropped the apple.
Where was the apple before the kitchen?
Correct Answer: office
ChatGPT Response: The given information does not provide a clear answer to this question. We know that John picked up the apple, went to the office, and then went to the kitchen where he dropped it, but there is no indication of where he had the apple between the office and the kitchen.
Verdict: Not a good answer. John must have been carrying the apple while he was in the office, since he dropped it later in the kitchen.
ChatGPT Response retry: Based on the given information, the apple was with John before he went to the office, as he picked it up before leaving. Therefore, the apple was most likely with John when he went from the office to the kitchen before he dropped it there.
Verdict: Right, sort-of.
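
To make the expected reasoning concrete, here is a minimal sketch of how this kind of question can be answered by plain state tracking. The tuple encoding of the story and the function name `location_before` are my own assumptions for illustration; they are not part of the bAbI dataset or of anything ChatGPT does internally.

```python
def location_before(story, obj, place):
    """Return where `obj` was just before it first arrived at `place`, or None."""
    person_at = {}   # person -> current location
    holder = None    # person currently carrying obj, if any
    obj_trail = []   # locations obj has passed through, in order

    for person, verb, arg in story:
        if verb == "went":
            person_at[person] = arg
            if holder == person:          # the object travels with its holder
                obj_trail.append(arg)
        elif verb == "picked up" and arg == obj:
            holder = person
            if person in person_at:       # starting location, if it is known
                obj_trail.append(person_at[person])
        elif verb == "dropped" and arg == obj:
            holder = None

    if place not in obj_trail:
        return None
    idx = obj_trail.index(place)
    return obj_trail[idx - 1] if idx > 0 else None


# Hypothetical hand-made encoding of the Task 3 story.
story = [
    ("John", "picked up", "apple"),
    ("John", "went", "office"),
    ("John", "went", "kitchen"),
    ("John", "dropped", "apple"),
]

print(location_before(story, "apple", "kitchen"))  # office
```

The key step is that the apple's location is inferred from John's movements while he holds it, which is exactly the link the first response failed to make.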

Task 4: Two Argument Relations

The office is north of the bedroom.
The bedroom is north of the bathroom.
The kitchen is west of the garden.
What is north of the bedroom?
Correct Answer: office
ChatGPT Response: Based on the given information, the office is north of the bedroom and the bedroom is north of the bathroom. Therefore, what is north of the bedroom is most likely the bathroom.
Verdict: Wrong. The first statement gives the answer right away. The response even restates that fact, but then draws the wrong conclusion.
What is the bedroom north of?
Correct Answer: bathroom
ChatGPT Response: Based on the given information, it is not specified what the bedroom is north of. We only know that the office is north of the bedroom, and the bathroom is north of the bedroom. Therefore, we cannot determine what the bedroom is north of without additional information.
Verdict: Wrong. Information is sufficient.
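
Both questions are really just lookups that differ in which argument of the relation is asked for. A minimal sketch, assuming each sentence is hand-encoded as a `(subject, direction, reference)` triple (the encoding and both function names are hypothetical):

```python
# Each triple reads: "<subject> is <direction> of <reference>".
facts = [
    ("office", "north", "bedroom"),
    ("bedroom", "north", "bathroom"),
    ("kitchen", "west", "garden"),
]


def what_is(direction, reference, facts):
    """Answer 'What is <direction> of <reference>?'"""
    return [s for s, d, r in facts if d == direction and r == reference]


def what_is_it_relative_to(subject, direction, facts):
    """Answer 'What is <subject> <direction> of?'"""
    return [r for s, d, r in facts if d == direction and s == subject]


print(what_is("north", "bedroom", facts))                 # ['office']
print(what_is_it_relative_to("bedroom", "north", facts))  # ['bathroom']
```

Keeping the two argument positions apart is the whole task; ChatGPT's responses above mix them up.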

Task 5: Three Argument Relations

Mary gave the cake to Fred.
Fred gave the cake to Bill.
Jeff was given the milk by Bill.
Who gave the cake to Fred?
Correct Answer: Mary
ChatGPT Response: Mary gave the cake to Fred.
Verdict: Right.
Who did Fred give the cake to?
Correct Answer: Bill
ChatGPT Response: Fred gave the cake to Bill.
Verdict: Right.

Task 6: Yes-No Questions

John moved to the playground.
Daniel went to the bathroom.
John went back to the hallway.
Is John in the playground?
Correct Answer: no
ChatGPT Response: No, based on the given information, John went back to the hallway after he moved to the playground. Therefore, he is not currently in the playground.
Verdict: Right.
Is Daniel in the bathroom?
Correct Answer: yes
ChatGPT Response: Yes, based on the given information, Daniel went to the bathroom. Therefore, he is currently in the bathroom.
Verdict: Right.

Task 7: Counting

Daniel picked up the football.
Daniel dropped the football.
Daniel got the milk.
Daniel took the apple.
How many objects is Daniel holding?
Correct Answer: two
ChatGPT Response: Based on the given information, Daniel is currently holding two objects: the milk and the apple. He picked up the football, but then he dropped it, so he is no longer holding it.
Verdict: Right.

Task 8: Lists/Sets

Daniel picks up the football.
Daniel drops the newspaper.
Daniel picks up the milk.
John took the apple.
What is Daniel holding?
Correct Answer: milk, football
ChatGPT Response: Based on the given information, Daniel is currently holding the milk. He picked up the football, but then he dropped the newspaper, so he is no longer holding it. John took the apple, so Daniel is not holding it either.
Verdict: Wrong.
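
The expected answer follows from tracking a set of held objects per person, where dropping something that was never picked up changes nothing. Here is a small sketch under that assumption; the `(person, action, object)` encoding and the `holdings` helper are hypothetical.

```python
from collections import defaultdict


def holdings(story):
    """Map each person to the set of objects they hold at the end of the story."""
    held = defaultdict(set)
    for person, action, obj in story:
        if action == "pick":
            held[person].add(obj)
        elif action == "drop":
            held[person].discard(obj)  # no effect if the object was not held
    return held


# Hypothetical hand-made encoding of the Task 8 story.
story = [
    ("Daniel", "pick", "football"),
    ("Daniel", "drop", "newspaper"),
    ("Daniel", "pick", "milk"),
    ("John", "pick", "apple"),
]

held = holdings(story)
print(sorted(held["Daniel"]))  # ['football', 'milk']
print(len(held["Daniel"]))     # 2
```

The same bookkeeping answers the counting question in Task 7: the size of the set is the number of objects held.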

Task 9: Simple Negation

Sandra travelled to the office.
Fred is no longer in the office.
Is Fred in the office?
Correct Answer: No
ChatGPT Response: No, based on the given information, Fred is not in the office. Only Sandra traveled to the office, and there is no indication that Fred is currently there.
Verdict: Right
Is Sandra in the office?
Correct Answer: yes
ChatGPT Response: Based on the given information, Sandra traveled to the office, so it is likely that she is currently in the office. However, we cannot be certain if she is still there or if she has left since the information provided does not indicate anything beyond her initial arrival.
Verdict: Right

Task 10: Indefinite Knowledge

John is either in the classroom or the playground.
Sandra is in the garden.
Is John in the classroom?
Correct Answer: maybe
ChatGPT Response: There is no way to determine whether John is in the classroom based on the given information. It is only stated that he is either in the classroom or the playground, but there is no further information to indicate where he actually is.
Verdict: Right
Is John in the office?
Correct Answer: no
ChatGPT Response: There is no information provided to indicate whether John is in the office or not. The only information given is that John is either in the classroom or the playground, and Sandra is in the garden.
Verdict: Wrong
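
One way to see why ‘maybe’ and ‘no’ are both expected here is to keep a set of possible locations for each person. This is only a sketch; the dictionary layout and the `answer` function are assumptions of mine.

```python
def answer(possible_locations, place):
    """Three-valued answer to 'Is <person> in <place>?'"""
    if possible_locations == {place}:
        return "yes"    # the only possibility
    if place in possible_locations:
        return "maybe"  # one of several possibilities
    return "no"         # ruled out by what we were told


# Hypothetical encoding of the Task 10 story.
possible = {
    "John": {"classroom", "playground"},
    "Sandra": {"garden"},
}

print(answer(possible["John"], "classroom"))  # maybe
print(answer(possible["John"], "office"))     # no
print(answer(possible["Sandra"], "garden"))   # yes
```

The ‘either ... or’ statement narrows John down to two places, so the office is excluded even though his exact location is unknown.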

Task 11: Basic Co-reference

Daniel was in the kitchen.
Then he went to the studio.
Sandra was in the office.
Where is Daniel?
Correct Answer: studio
ChatGPT Response: Based on the given information, Daniel was initially in the kitchen and then he went to the studio. There is no information to suggest that he is currently in the office where Sandra is located. Therefore, we can assume that Daniel is not in the office, but we cannot determine his current location beyond the fact that he was last in the studio.
Verdict: Right, sort-of, but could have been more direct.

Task 12: Conjunction

Mary and Jeff went to the kitchen.
Then Jeff went to the park.
Where is Mary?
Correct Answer: kitchen
ChatGPT Response: Based on the given information, Mary went to the kitchen with Jeff, but there is no indication that she went to the park with him. Therefore, Mary’s current location cannot be determined from the given information.
Verdict: Wrong.
Where is Jeff?
Correct Answer: park
ChatGPT Response: Based on the given information, Jeff went to the park after he and Mary were in the kitchen together. Therefore, Jeff’s current location is the park.
Verdict: Right.

Task 13: Compound Coreference

Daniel and Sandra journeyed to the office.
Then they went to the garden.
Sandra and John travelled to the kitchen.
After that they moved to the hallway.
Where is Daniel?
Correct Answer: garden
ChatGPT Response: … There is no information provided about Daniel’s current location beyond the fact that he was last seen in the office and the garden. As for Sandra and John, they were last seen in the hallway.
Verdict: Wrong.

Task 14: Time Reasoning

In the afternoon Julie went to the park.
Yesterday Julie was at school.
Julie went to the cinema this evening.
Where did Julie go after the park?
Correct Answer: cinema
ChatGPT Response: There is no information provided about where Julie went after the park. The given information only tells us that she went to the park in the afternoon, she was at school yesterday, and she went to the cinema in the evening. Therefore, her whereabouts after the park are unknown.
Verdict: Wrong
Where was Julie before the park?
Correct Answer: school
ChatGPT Response: Based on the given information, there is no indication of where Julie was before she went to the park. We only know that she was at school yesterday and went to the cinema in the evening, but her location before going to the park is not specified.
Verdict: Wrong
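
The trick in this task is that the sentences are not given in chronological order, so they have to be sorted by their time expressions first. A minimal sketch, assuming the time phrases are normalized by hand to a small fixed vocabulary (`TIME_ORDER` and both helper names are hypothetical):

```python
# Assumed ordering of the time expressions, earliest first.
TIME_ORDER = ["yesterday", "this morning", "this afternoon", "this evening"]

# Hypothetical encoding of the Task 14 story, in the order it was told.
events = [
    ("this afternoon", "park"),
    ("yesterday", "school"),
    ("this evening", "cinema"),
]


def timeline(events):
    """Places in chronological order."""
    ordered = sorted(events, key=lambda e: TIME_ORDER.index(e[0]))
    return [place for _, place in ordered]


def neighbour(events, place, offset):
    """Place visited `offset` steps after (+) or before (-) `place`, or None."""
    order = timeline(events)
    i = order.index(place) + offset
    return order[i] if 0 <= i < len(order) else None


print(neighbour(events, "park", +1))  # cinema
print(neighbour(events, "park", -1))  # school
```

Once the events are on a timeline, ‘after the park’ and ‘before the park’ are just the adjacent entries.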

Task 15: Basic Deduction

Sheep are afraid of wolves.
Cats are afraid of dogs.
Mice are afraid of cats.
Gertrude is a sheep.
What is Gertrude afraid of?
Correct Answer: wolves
ChatGPT Response: Based on the given information, Gertrude is a sheep and sheep are afraid of wolves. Therefore, Gertrude is most likely afraid of wolves.
Verdict: Right.

Task 16: Basic Induction

Lily is a swan.
Lily is white.
Bernhard is green.
Greg is a swan.
What color is Greg?
Correct Answer: white
ChatGPT Response: The color of Greg, the swan, is not specified in the given information. We only know that Lily is a swan and is white, Bernhard is green, and that Greg is also a swan, but there is no indication of his color. Therefore, we cannot determine the color of Greg based on the given information.
Verdict: Wrong

Task 17: Positional Reasoning

The triangle is to the right of the blue square.
The red square is on top of the blue square.
The red sphere is to the right of the blue square.
Is the red sphere to the right of the blue square?
Correct Answer: yes
ChatGPT Response: Yes, based on the given information, it is stated that “The red sphere is to the right of the blue square”, so the red sphere is indeed to the right of the blue square.
Verdict: Right
Is the red square to the left of the triangle?
Correct Answer: yes
ChatGPT Response: The given information does not provide any direct relationship between the red square and the triangle, so we cannot determine whether the red square is to the left of the triangle based on the given information.
Verdict: Wrong

Task 18: Size Reasoning

The football fits in the suitcase.
The suitcase fits in the cupboard.
The box is smaller than the football.
Will the box fit in the suitcase?
Correct Answer: yes
ChatGPT Response: Based on the given information, we cannot determine whether the box will fit in the suitcase or not. It is only stated that the football fits in the suitcase, and the suitcase fits in the cupboard, but there is no direct comparison or information about the size of the box in relation to the suitcase or the cupboard.
Verdict: Wrong
Will the cupboard fit in the box?
Correct Answer: no
ChatGPT Response: No, the cupboard is larger than both the suitcase and the box, so it cannot fit inside the box.
Verdict: Right
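
The first question needs one chaining step: the box is smaller than the football, and the football fits in the suitcase. Here is a rough sketch of that chaining, assuming ‘X fits in Y’ can be treated the same as ‘X is smaller than Y’ (a simplification of mine, as are the function names):

```python
def transitive_closure(pairs):
    """All (smaller, larger) pairs implied by chaining the given ones."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical encoding of the Task 18 story as (smaller, larger) pairs.
pairs = [
    ("football", "suitcase"),
    ("suitcase", "cupboard"),
    ("box", "football"),
]
closure = transitive_closure(pairs)


def fits(inner, outer):
    """True only if 'inner is smaller than outer' can be derived."""
    return (inner, outer) in closure


print(fits("box", "suitcase"))  # True
print(fits("cupboard", "box"))  # False
```

Note that `False` here only means ‘cannot be derived’; reading it as ‘no’ assumes the story tells us everything relevant.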

Task 19: Path Finding

The kitchen is north of the hallway.
The bathroom is west of the bedroom.
The den is east of the hallway.
The office is south of the bedroom.
How do you go from den to kitchen?
Correct Answer: west, north
ChatGPT Response: To go from the den to the kitchen, you would need to go west, as the kitchen is north of the hallway, and the den is to the east of the hallway.
Verdict: Half right. It gives ‘west’ but leaves out the ‘north’ step.
How do you go from office to bathroom?
Correct Answer: north, west
ChatGPT Response: To go from the office to the bathroom, you need to go north to reach the bedroom, and then west to reach the bathroom.
Verdict: Right
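
Path finding like this can be done by turning the sentences into a small graph and running a breadth-first search over it. The sketch below assumes each sentence is hand-encoded as a `(room, direction, reference)` triple; `build_graph` and `find_path` are hypothetical helper names.

```python
from collections import deque

OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def build_graph(facts):
    """Each fact reads '<a> is <direction> of <b>'; store moves both ways."""
    graph = {}
    for a, direction, b in facts:
        graph.setdefault(b, {})[direction] = a            # from b, go <direction> to reach a
        graph.setdefault(a, {})[OPPOSITE[direction]] = b  # and the reverse move
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of directions, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        room, path = queue.popleft()
        if room == goal:
            return path
        for direction, nxt in graph.get(room, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [direction]))
    return None


# Hypothetical encoding of the Task 19 story.
facts = [
    ("kitchen", "north", "hallway"),
    ("bathroom", "west", "bedroom"),
    ("den", "east", "hallway"),
    ("office", "south", "bedroom"),
]
graph = build_graph(facts)

print(find_path(graph, "den", "kitchen"))      # ['west', 'north']
print(find_path(graph, "office", "bathroom"))  # ['north', 'west']
```

The den and the kitchen are only connected through the hallway, which is why the full answer needs two steps.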

Task 20: Agent’s Motivations

John is hungry.
John goes to the kitchen.
John grabbed the apple there.
Daniel is hungry.
Where does Daniel go?
Correct Answer: kitchen
ChatGPT Response: It’s unclear where Daniel goes based on the given information. It’s possible that he also goes to the kitchen to get some food, but we can’t say for sure without more information.
Verdict: Wrong, as it is unsure.
Why did John go to the kitchen?
Correct Answer: hungry.
ChatGPT Response: John went to the kitchen because he was hungry.
Verdict: Right

So, prowess in generating words coherently is not enough; some injection of intelligence is still needed before we can declare that AGI (Artificial General Intelligence) has arrived. Even at this stage ChatGPT does appear to have some ‘understanding’, but there is a long way to go. Looking forward!

The original paper describing the bAbI tasks can be found at:


PhD in Geometric Modeling | Google Developer Expert (Machine Learning) | Top Writer 3x (Medium) | More at https://www.linkedin.com/in/yogeshkulkarni/