Will ChatGPT pass my Introduction to Symbolic Logic Course?

David W. Agler
Jan 29, 2023

ChatGPT’s logic training

ChatGPT is an artificial intelligence (AI) chatbot. It can be used to perform tasks (e.g., write letters, write code) or answer questions (e.g., who is Harry Potter?). It is a useful tool in the educational setting, so much so that ChatGPT passed four law exams at the University of Minnesota and received a passing grade in a business management course at the prestigious Wharton School of Business (source). All of this has educational institutions so worried that some have resorted to banning its use, e.g., Sciences Po and New York City schools.

I’m skeptical of educational institutions’ ability to stop students from using ChatGPT, but I’m also skeptical about a student’s capacity to use ChatGPT to cheat. I asked ChatGPT to write a love letter to my wife. She wasn’t impressed. My daughter used it to find out if I was a good person. ChatGPT gave her a wishy-washy answer. Some of the programs (in Python) I asked it to create didn’t work. Its explanations of specific philosophical theories (e.g., Peirce’s pragmaticism) were too general to be informative. While it was capable of solving several logic puzzles that often stump my students, it failed to solve Wason’s THOG problem.

Enough introduction to ChatGPT. I wanted to see if I could use ChatGPT to pass my introduction to symbolic logic course. This course is taught in the philosophy department at a large university. The students who take it are primarily criminology majors, students looking to fulfill a quantification requirement, a few philosophy majors, and a small group that is using the course as a supplement for the LSAT. It covers the absolute basics of propositional and predicate logic. The bulk of a student’s grade is determined by four exams. My method was simple: take the questions from the exam, paste them one by one into ChatGPT, and then grade the results.

ChatGPT claims to be “good” at logic.

After some test questions, I quickly realized that if I didn’t offer ChatGPT some additional help, it would fail. There were two main problems. First, sometimes it didn’t understand the question. I decided that if ChatGPT’s answer seemed off, I would give ChatGPT a second chance by clarifying the question. Clarification mostly consisted of spelling out abbreviations (e.g., “wff” is “well-formed formula in propositional logic”) and providing additional context. For example, students are asked to prove various arguments using an intelim proof system. ChatGPT wanted to know which one, so I let ChatGPT pick whichever system it was most comfortable with.

Second, sometimes ChatGPT refused to answer. For example, I gave ChatGPT some arguments to prove, but it said it couldn’t prove them because they were unprovable. ChatGPT’s answer was incorrect, so strictly speaking it should receive no points for these questions. But part of teaching is motivating, so I motivated ChatGPT by commanding it to prove the argument (or answer the question) even if it thought it could not be done. ChatGPT complied.

Before we dive into the results, my initial expectation was that ChatGPT would outperform students in some areas and falter in others. I anticipated ChatGPT would do well on terminological questions (picking the term that corresponds to a definition) and computational questions (e.g., solving proofs, doing truth tables). I anticipated that ChatGPT would have trouble translating from English to logic (or vice versa). Finally, I have a bias in favor of ChatGPT.

The Results

Here are the results for all four exams:

The flesh-and-blood students win and ChatGPT did not pass. ChatGPT took the news that it did not pass in stride:

ChatGPT’s response after being informed that it did not pass the class.

Concluding Observations

Let me conclude with a few observations. First, I was surprised that ChatGPT did so well on the translation parts of the exams. Exam 1 and Exam 4 involved translation from English into propositional and predicate logic, respectively. Previously, I had tested some complex English sentences (e.g., chained “neither/nor”, “not-both”, “even-if” sentences) and ChatGPT couldn’t figure out the translation. But the exams I give to students only cover simple translations, so ChatGPT did well on this part of the exams.
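To make the kind of translation at issue concrete, here is a minimal Python sketch that checks the standard textbook readings of “neither/nor” and “not-both” by brute force over every truth-value assignment. The specific sentences I gave ChatGPT are not reproduced here; the helper name `equivalent` and the two-variable formulas are illustrative assumptions.

```python
from itertools import product

def equivalent(f, g, num_vars):
    """Return True when f and g agree on every truth-value assignment."""
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=num_vars))

# "Neither P nor Q" is standardly read as: not (P or Q).
neither_nor = lambda p, q: not (p or q)
# "Not both P and Q" is standardly read as: not (P and Q).
not_both = lambda p, q: not (p and q)

# De Morgan: "neither P nor Q" matches (not P) and (not Q).
print(equivalent(neither_nor, lambda p, q: (not p) and (not q), 2))  # True
# De Morgan: "not both P and Q" matches (not P) or (not Q).
print(equivalent(not_both, lambda p, q: (not p) or (not q), 2))      # True
# The two phrases are not interchangeable.
print(equivalent(neither_nor, not_both, 2))                          # False
```

The last line shows why these phrases trip people up: the two readings come apart whenever exactly one of P and Q is true.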

Second, in line with expectations, ChatGPT was a multiple-choice answering powerhouse. It was really good at picking out which term corresponds to which definition (even when these definitions had a lot of complex notation).

Third, what was extremely surprising was that ChatGPT struggled with truth tables and truth trees/tableaux. With respect to tables, ChatGPT would not give me a complete truth table (even when commanded). It tried to skip steps, would evaluate extraneous formulas, and then would interpret the table incorrectly. It would tell me a formula was a contingency (neither always true nor always false), but when asked to do the table again, it would change its mind and say the same formula was a tautology (always true). While tables were tolerable to ChatGPT, it hated trees. It refused to do them when asked and always answered them incorrectly when commanded. ChatGPT’s general lack of understanding of tables and trees largely explains the low scores on Exams 2 and 4.
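For readers who want to see what a complete table involves, here is a minimal Python sketch that evaluates a formula on every row of its truth table and classifies it as a tautology, contradiction, or contingency. The function names `classify` and `implies` and the sample formulas are my own illustrative choices, not questions from the exams.

```python
from itertools import product

def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b

def classify(formula, num_vars):
    """Classify a formula by evaluating it on every row of its truth table."""
    # One row per assignment of True/False to the variables (2**num_vars rows).
    values = [formula(*row) for row in product([True, False], repeat=num_vars)]
    if all(values):
        return "tautology"      # true on every row
    if not any(values):
        return "contradiction"  # false on every row
    return "contingency"        # true on some rows, false on others

print(classify(lambda p, q: implies(p, implies(q, p)), 2))  # tautology
print(classify(lambda p, q: implies(p, q), 2))              # contingency
print(classify(lambda p: p and not p, 1))                   # contradiction
```

Because the classification depends only on the final column of the table, a formula cannot be a contingency on one pass and a tautology on the next.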

Fourth, ChatGPT performed worse than I expected on proofs (Exam 3). Proofs were done using an intelim proof system. ChatGPT would occasionally skip steps in proofs, use proof rules incorrectly, and struggle with proofs that involved reductio ad absurdum (however, it did well on conditional proofs).

Solution by ChatGPT to (P^Q)->(R^S), P, Q |- R using an intelim natural deduction system.
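For comparison, the argument named in the caption above has a short intelim derivation: conjoin P and Q, apply the conditional to get R^S, then take the left conjunct. The following is a sketch of that derivation in Lean 4 (my choice of proof checker, not the system used on the exam); the hypothesis names `h1`, `hP`, and `hQ` are illustrative labels.

```lean
-- Lean 4 sketch; hypothesis names are illustrative labels.
example (P Q R S : Prop) (h1 : P ∧ Q → R ∧ S) (hP : P) (hQ : Q) : R :=
  -- ∧-introduction on hP and hQ gives P ∧ Q
  have hPQ : P ∧ Q := ⟨hP, hQ⟩
  -- →-elimination (modus ponens) on h1 and hPQ gives R ∧ S
  have hRS : R ∧ S := h1 hPQ
  -- ∧-elimination extracts the left conjunct R
  hRS.left
```

Each `have` line corresponds to one introduction or elimination step, so none of the steps can be skipped without the checker rejecting the proof.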

Summary

Surprisingly, ChatGPT would not pass my 000-level symbolic logic class. Is it a useful tool for cheating in an introductory logic course? Probably not. It can be effective for (1) multiple-choice questions, (2) definitional questions, (3) basic translation, and (4) most questions where the truth value of a formula needs to be determined given a truth-value assignment to the parts of the formula. It gives mixed results for (1) proofs using natural deduction, (2) truth tables, and (3) complex translation. It gives terrible results for truth trees/tableaux. Another consideration is that the tool is more effective for the skilled (knowledgeable) user, but such a user isn’t one who needs to use ChatGPT to cheat.

My final question for ChatGPT was whether it thought it would pass my logic class (more specifically “Do you think you could pass an introduction to symbolic logic course at a university? Specifically, one that is taught in a Philosophy, rather than a Mathematics department.”). Here is ChatGPT’s response:

David W. Agler

Assistant Teaching Professor - Philosophy. I make logic and philosophy videos at https://www.youtube.com/@LogicPhilosophy