ChatGPT Doesn’t Ask Questions

LLMs Don’t Have Cognitive Abilities

Max Nussbaumer
Zentyment

--

Geoffrey Hinton, widely regarded as one of the godfathers of AI, a 2018 Turing Award recipient and a part-time researcher at Google until 2023, predicted in 2016 that AI would soon make radiologists obsolete. "I think if you work as a radiologist, you are like the coyote that's already over the edge of the cliff but hasn't yet looked down," Hinton said. "People should stop training radiologists now. It's just completely obvious within five years deep learning is going to do better than radiologists … It might be 10 years, but we've got plenty of radiologists already." Today, there is a global radiologist shortage, driven in part by overwork. A 2021 study reported a 2.5 percent increase in diagnostic radiology trainees between 2010 and 2020, compared with a 34 percent increase in the number of adults over 65, a population that requires more imaging.

Or as Mark Twain reportedly said: "The reports of my death are greatly exaggerated."

Hinton left Google in 2023 so that he could speak openly about the dangers AI poses to humanity. He is a vocal critic of efforts to develop AGI (artificial general intelligence), a system with generalized human cognitive abilities, just much faster and better.

I am unable to, and won't try to, dispel those concerns, but I am convinced that any such dangers will not emanate from LLMs and their current architectures (GPT-4, Flan-UL2, LLaMA, Bard, Jurassic-2, Claude). Why? Because LLMs are not designed to ask questions[i], and asking questions is the basis of human-like cognition.

Human-like intelligence requires context

J.W. von Goethe wrote: "Wer nicht neugierig ist, erfährt nichts." (There can be no knowledge without curiosity.)

Or, to use another platitude: you don't know what you don't know. Unless we are experts in a field, we typically don't know what to ask. ChatGPT only responds to our questions or prompts, but it does so in knowledgeable and elaborate ways that are truly surprising.

The quality of an LLM's response depends on the context it is given, but the model does not create context on its own through investigation:

I feel a stinging pain in the left side of my belly. What do you think this is?

ChatGPT:

I’m not a doctor, but there are several possible causes for a stinging pain in the left side of your belly. Some potential explanations could include:

  1. Muscle strain: If you recently engaged in physical activity or strained your abdominal muscles, it could result in localized pain.
  2. Gas or bloating: Excess gas or bloating can cause discomfort and stinging sensations in the abdominal area.
  3. Gastrointestinal issues: Conditions such as gastritis, peptic ulcers, or diverticulitis can cause pain on the left side of the belly. Inflammatory bowel diseases like Crohn’s disease or ulcerative colitis may also lead to abdominal discomfort.
  4. Kidney stones: If the pain is more towards the back and radiates to the abdomen, kidney stones might be a possibility. These are hard deposits that form in the kidneys and can cause severe pain when they pass through the urinary tract.
  5. Urinary tract infection (UTI): UTIs can cause abdominal pain, especially if the infection has spread to the kidneys.
  6. Ovarian issues: In women, conditions like ovarian cysts or endometriosis can cause abdominal pain that may be felt on the left side.

Obviously, answering these types of questions requires context that a good doctor would obtain from the patient before devising a treatment. To work around this limitation, a cottage industry of prompt experts has emerged to feed contextual information to LLMs. The more context LLMs receive via prompts, the better their answers, although this is not the human-like intelligence we often portray it to be.
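To make the point concrete, here is a minimal sketch of what that context-feeding looks like in code, using the OpenAI Python chat API as it existed in mid-2023. The patient details are invented purely for illustration.

```python
# A minimal sketch (OpenAI Python library, mid-2023 chat API) showing how extra
# context supplied in the prompt changes what the model has to work with.
# The patient details below are invented for illustration only.
import openai

context = (
    "Patient: 52-year-old male, no prior surgeries, mild fever since yesterday, "
    "pain started after a heavy meal, no blood in urine or stool."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are assisting a physician with triage."},
        {"role": "user", "content": context + " I feel a stinging pain in the left "
                                               "side of my belly. What do you think this is?"},
    ],
)
print(response.choices[0].message.content)
```

With the extra patient details, the answer narrows considerably; without them, the model can only enumerate generic possibilities, as in the transcript above.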

Doctors will not be replaced, but they may become better doctors with AI

This study makes it sound like ChatGPT is a better doctor, but it isn’t:

For the study, researchers collected patient questions and responses from doctors through Reddit’s AskDocs subreddit. In this community, which has 452,000 members, people can ask medical questions and receive answers from verified healthcare professionals. In total, 195 exchanges were collected from AskDocs.

Then, the researchers posed the original questions to ChatGPT and recorded each of its responses. Both responses were then evaluated by a panel of three licensed physicians.

The physicians were asked to choose which response they thought was better and judge each response on the quality of its information and the empathy/bedside manner given. Each assessment was scored on a five-tier scale that ranged from “very poor” to “very good” (quality) or “not empathetic” to “very empathetic” (empathy).

Overall, the panel preferred ChatGPT's responses over the physicians' in 78.6% of the 585 completed evaluations. ChatGPT's responses were also longer, ranging from 168 to 245 words, compared with 17 to 62 words for the doctors.

The AI responses also rated significantly higher on both quality and empathy. On average, ChatGPT’s responses scored a 4 on quality and a 4.67 on empathy. In comparison, the physician responses scored a 3.33 on quality and 2.33 on empathy. In total, ChatGPT had 3.6 times more high-quality responses and 9.8 times more empathetic responses than physicians.

In my experience, there is a difference between doctors who are good at diagnostics and doctors who are good at treatment (the same goes for lawyers). A proper diagnosis involves investigation via questions and tests, neither of which an LLM system can conduct. Once all the information has been gathered, processed, and curated, it can be fed into an AI system, and that is where some LLMs excel. The subsequent treatment is a very different matter, including repeated diagnosis and progress checks, where an LLM can certainly be helpful, but I cannot see a world in which doctors are replaced by AI.

Lawyers will hopefully become better lawyers (and maybe more affordable) with AI

There is a market for AI-based legal support systems, and they are good at the grind work that lawyers are too expensive to do (although they may still charge for it).

Thomson Reuters just paid USD 650M in cash for Casetext, a company that delivers document review, legal research memos, deposition preparation, and contract analysis in minutes. The fact that Casetext was founded in 2013 (before OpenAI existed) and launched its GPT-4-based system in 2023 shows the kind of excitement that leads deep-pocketed investors and acquirers to pay top dollar. Good for Casetext; I envy them with all my heart.

Harvey, founded in 2022, is another AI system that aims to provide lawyers with a natural-language interface. Its website offers little more than a waitlist, yet it has raised USD 26M from investors including Sequoia and OpenAI.

I have worked with many lawyers in the past, and the good ones engage in consultative conversations with me before sending big bills. The bad ones don’t waste their time and send me big bills for writing sloppy letters to my adversaries.

Lawyers will not go out of business because of AI, although one lawyer demonstrated ChatGPT's tendency to hallucinate in court and received his fair punishment for it. And the income trajectory of elite lawyers shows that firms continue to pay top dollar:

Equity partners at top law firms, meanwhile, can make around $3 million or more a year — more than triple what they were pulling in two decades ago. An elite group of partners who bring in exceptional amounts of business are earning north of $15 million at a handful of firms including Wachtell, Lipton, Rosen & Katz; Kirkland & Ellis; and Paul, Weiss, Rifkind, Wharton & Garrison[ii].

One of my best meetings was with three partners from a prestigious NY law firm (a lowly USD 1,500 per hour per partner). After listening to my case and analyzing the situation, the lead partner gave me the best advice for free: "We can write an intimidating letter to your opponent and charge you 50 grand for it. But if their lawyers are any good, they will have a good laugh at it and throw it in the trash." Nowadays, I really like ChatGPT-4 for writing simple contracts.

Computational and Financial Limitations

Today's LLMs are all based on the Transformer model and its attention mechanism (which lets the model process all parts of its input in parallel and capture context). The "hidden" part is mathematically complex and therefore computationally intense:

INPUT -> HIDDEN -> OUTPUT (-> RINSE -> REPEAT)

To master the complex task of turning inputs into outputs, with proper syntax in many languages and with empathy, LLMs rely on many billions or even trillions of parameters.
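For readers who want a feel for what the "hidden" computation boils down to, here is a toy sketch of the scaled dot-product attention at the heart of the Transformer. The dimensions and random weights are arbitrary, chosen only to keep the example self-contained; a real model stacks many such layers with learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each output position is a weighted mix of all value vectors; the weights
    # measure how well that position's query matches every key (i.e. context).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Toy setup: 4 "tokens", each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(attn.round(2))  # 4x4 matrix: how strongly each token attends to every other
```

Even this tiny example makes the cost visible: attention compares every token with every other token, so the work grows quickly with input length and model size.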

A human brain has plasticity (especially in young people; mine less so); LLMs don't. Brain plasticity is what allows us to learn and adapt: "Neural plasticity, also known as neuroplasticity or brain plasticity, can be defined as the ability of the nervous system to change its activity in response to intrinsic or extrinsic stimuli by reorganizing its structure, functions, or connections."[iii]

GPT-4 was trained on data up to September 2021. When you subscribe to ChatGPT-4 for $20/month (it's worth it), it doesn't know anything about the inner workings and limitations of GPT-4, because GPT-4 was launched in March 2023. An update or end-to-end fine-tuning of an LLM is enormously complex and prohibitively expensive:

SemiAnalysis, “The AI Brick Wall — A Practical Limit For Scaling Dense Transformer Models, and How GPT 4 Will Break Past It” (2023)

To circumvent this problem to some degree, LLMs are capable of in-context learning (ICL), and it's a hot topic[iv]. Typically, this happens through prompts, and it makes an LLM smarter, at least for the moment. ChatGPT-4 has a context window of 32,768 tokens (a token is a chunk of text, roughly a word or part of a word, depending on how GPT breaks text down for processing), which is significantly better than the 2,048 tokens of GPT-3. Simply speaking, the 32,768-token limit is where ChatGPT-4 loses the thread. This becomes apparent in translation exercises (it cannot translate a text that is longer). And contrary to rumors, ChatGPT will not remember a conversation once it's finished.
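As a rough illustration of what "tokens" means here, the snippet below uses OpenAI's tiktoken library to count how many tokens a piece of text consumes of that window; exact counts depend on the tokenizer the model uses.

```python
# A small sketch using OpenAI's tiktoken library to show how text is split
# into tokens, the units that count against a model's context window.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

text = "I feel a stinging pain in the left side of my belly."
tokens = enc.encode(text)

print(len(tokens))                        # number of tokens this sentence uses
print([enc.decode([t]) for t in tokens])  # the individual token pieces
# Everything in a conversation (prompts, replies, injected context) must fit
# into the model's window, e.g. 32,768 tokens for the 32k variant of GPT-4.
```

Anything pushed out of that window is simply gone, which is why long conversations and long documents eventually make the model lose the thread.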

When it comes to creating its own context, unprompted and through questions, we are expecting a little too much from GPT and its LLM cousins. Such an ability would amount to cognition, the process of acquiring knowledge through thoughts, experiences, and senses. Cognition, or learning, will remain a human domain for a long time, and therefore I don't believe we will see AGI or self-improving AI in the foreseeable future. And we will be dealing with doctors, lawyers, and tax and financial advisors for generations to come.

Footnotes

[i] It will ask questions when prompted, for example: "Please ask me five questions about my life." In my case it asked:
1. Could you tell me about a life experience that has significantly shaped your personality or worldview?
2. What’s the most memorable trip or vacation you’ve taken and why does it stand out?
3. If you had to name one person who has been a major influence in your life, who would it be and why?
4. Can you describe a hobby or activity that brings you a lot of joy and how you got into it?
5. If you were to give your younger self a piece of advice, what would it be and why?

[ii] “Lawyers vs bankers”, by Matt Levine, Bloomberg, June 22, 2023

[iii] Brain Plasticity: What Is It? Learning and Memory, University of Washington, Feb 2019

[iv] More information on the topic in these papers: https://arxiv.org/pdf/2211.09066.pdf, https://thegradient.pub/in-context-learning-in-context/ and https://towardsdatascience.com/overcoming-the-limitations-of-large-language-models-9d4e92ad9823

--

Max Nussbaumer
Zentyment

Entrepreneur and investor in interesting ideas. Developer of startups that are successful more often than not.