Is ChatGPT Getting Worse?

A dive into user and company opinion.

Shannon Li
NYU Data Science Review
6 min read · May 10, 2024

--

Image taken from Unsplash

ChatGPT has permeated nearly every aspect of our society, from writing high school essays to analyzing millions of data points. However, if you’ve recently attempted to get a little extra help on your homework, and ChatGPT’s answer has left you scratching your head in confusion, you’re not alone.

Is ChatGPT getting dumber? What was initially celebrated as a shiny new technology with endless applications in everyday life, from business and advertising to video games and homework, has now drawn considerable controversy over its effectiveness. Many people, including scientists, researchers, OpenAI employees, and everyday ChatGPT users like yourself, have speculations, opinions, and grievances to share.

In a recent study, researchers at Stanford University and the University of California, Berkeley found that “the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.” So, not an inherently negative thing, right? Well, that depends.

For example, in March of 2023, GPT-4 was reasonable at identifying prime vs. composite numbers (84% accuracy), but in June of 2023, GPT-4 performed poorly on these same questions (51% accuracy) [1]. The study claims that this is “partly explained by a drop in GPT-4’s amenity to follow chain-of-thought prompting.” Fellow Medium blogger Udit Gupta describes this type of prompting as “a structured and organized progression of prompts or cues that guide an individual’s thinking process in a sequential and logical manner,” with steps essentially spelled out for the individual (in this case, ChatGPT) [2]. On a positive note, the study reports that GPT-3.5 improved on this prime number task, performing better in June than in March [1]. However, according to the same study, “both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March,” with “evidence that GPT-4’s ability to follow user instructions has decreased over time” [1].
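To make the contrast concrete, here is a minimal sketch in Python of a direct prompt versus a chain-of-thought prompt for a prime-checking task. The prompt wording and the number are illustrative, not taken from the study; the `is_prime` helper simply spells out the same trial-division steps that a chain-of-thought prompt asks the model to walk through.

```python
def is_prime(n: int) -> bool:
    """Trial division: the explicit steps a chain-of-thought prompt spells out."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # only need to check divisors up to sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True

# A direct prompt asks only for the final answer.
direct_prompt = "Is 10007 a prime number? Answer yes or no."

# A chain-of-thought prompt spells out the intermediate reasoning steps.
chain_of_thought_prompt = (
    "Is 10007 a prime number? Think step by step: check divisibility by 2, "
    "by 3, and by each larger candidate up to the square root of 10007, "
    "then conclude with yes or no."
)

print(is_prime(10007))  # the ground truth the model's answer is graded against
```

The study’s observation, in these terms, is that the June version of GPT-4 was less inclined to actually walk through the intermediate steps before answering, even when prompted to.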

Photo by Markus Spiske on Unsplash

The study provides compelling evidence, but the fact remains that it is a single article set against the entirety of OpenAI, a company that vehemently denies any claims that ChatGPT’s capabilities are diminishing.

On July 13th, 2023, OpenAI VP of Product Peter Welinder posted on X: “No, we haven’t made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one. Current hypothesis: When you use it more heavily, you start noticing issues you didn’t see before [3].”

Similarly, AI researcher Simon Willison is a non-believer in this new wave of anti-GPT sentiment. Willison thinks that any perceived change in GPT-4’s capabilities comes from the novelty of LLMs wearing off: now that the technology has become more mundane, its faults seem glaring. Of the study itself, Willison “[doesn’t] find it very convincing…a decent portion of their criticism involves whether or not code output is wrapped in Markdown backticks or not.” He also finds other problems with the paper’s methodology, noting that “it looks … like they ran temperature 0.1 for everything.” Temperature is a parameter that controls the randomness of output generated by generative AI, with a higher temperature resulting in more diverse output, and a lower temperature resulting in more deterministic, focused output [6]. Willison elaborates that “it makes the results slightly more deterministic, but very few real-world prompts are run at that temperature, so I don’t think it tells us much about real-world use cases for the models.” [3]
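Under the hood, temperature rescales the model’s scores for each candidate next token before one is sampled. A minimal sketch in plain Python shows the effect; the logits below are made-up numbers, not from any real model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize into probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate next tokens
logits = [2.0, 1.0, 0.5, 0.1]

focused = softmax_with_temperature(logits, 0.1)  # low temp: near-deterministic
diverse = softmax_with_temperature(logits, 1.0)  # default temp: more spread out

print(focused)
print(diverse)
```

At temperature 0.1, essentially all of the probability mass collapses onto the single highest-scoring token, which is why Willison argues that results gathered at that setting say little about how people actually use the models.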

On the opposite side of the spectrum, with numerous reports of laziness, stubbornness, inaccuracy, and ambiguity, users have taken to the OpenAI community forum to complain about ChatGPT. In one popular thread, titled “GPT-4 is getting worse and worse every single update”, user “haseeb_heaven” posted in November of 2023 that “I have noticed that too and for coding tasks it getting much worse…ChatGPT never gives full source code and often left with placeholder saying fill your own code now and doesn’t even convert or translate any projects [sic].” Within this thread and across the internet, it’s also a common complaint that ChatGPT will refuse to directly convert or translate projects and instead offer guidance on converting, supposedly “due to current limitations of the system.” [4]

Around the same time, user “VincB” complained that “GPT4 answers are now ridiculously long, and sometime[s], just wrong, or non-sense. It now lose track of what was discussed previousl [sic]…I am starting to think about cancelling my subscription.” Meanwhile, user “rj10” personifies GPT-4 with the apt comparison that it feels like “arguing with a stubborn toddler.” [4]

Photo by Vidar Nordli-Mathisen on Unsplash

The argument could be made that when the clock struck twelve on New Year’s Day and we crossed into 2024, something magical happened, and ChatGPT is now newly improved and better than ever. However, like Cinderella’s ugly reality, a deeper dive into other OpenAI forums suggests that the opposite is true. Community members in the thread titled “Why is ChatGPT getting from bad to worst? [sic]” note that GPT quality has dropped even further this year. In fact, on February 7th, 2024, user “darkhorseai” posted that “Last year, using ChatGPT was a breeze…Then January came and it became ‘lazy’. It would just summarise in a short paragraph without much [sic] details… It doesn’t listen to its custom instructions and my answer is short and vague. When I asked it to re-do, it … replie [sic], ‘Unfortunately, I can’t fulfill this request.’” User “Aaldick” adds that they “experienced a big drop in response quality from GPT-4 as well (as of Jan 2024).” [5]

As recently as March 5th, “steve_smile” notes that their “final straw came when [they] asked a simple yes-or-no question, only for ChatGPT to say “no” — ironically, the most accurate and direct answer I’ve received after countless attempts to get useful assistance.” Other users, such as “TLD”, hypothesize about the reasons behind ChatGPT’s unraveling quality, describing it as going from “doing really impressive data modelling to having a tool which guesses and lies hoping it wont [sic] be caught doing it,” and citing “pressure from investors to stay at the forefront” without “the support, experience to operate a scaling business in growth and infrastructure in place, so are [sic] deprioritizing areas generating less revenue, such as personal accounts, to keep capacity free for the revenue generating channels.” [5]

Photo by BoliviaInteligente on Unsplash

Having reviewed these complaints, quotes, and considerations, one might wonder whether my own opinion has changed. As an avid user with a not-so-high initial view of GPT-4, I can’t say that I’ve been swayed from my disappointment with the development of this OpenAI software. In fact, as I looked into user opinion, I noticed that most of the positive viewpoints on ChatGPT’s updates come from those within OpenAI, or those in cahoots with OpenAI. Based on my observations, it seems that the majority of ChatGPT users have expressed the same dissatisfaction with the application’s capabilities that led me to write this article in the first place. However, given all of this public backlash against the GPT updates, I am curious to see how OpenAI will handle ChatGPT revisions in the future.

The answer to the question now remains in your hands, dear ChatGPT user. Are you using GPT-3.5 or GPT-4? Do you believe the skepticism of certain spokespeople and researchers? Or the posts by fellow frustrated GPT users like yourself? Will GPT-5 be even less accurate, more finicky, and sassier than GPT-4? Only time and user experience will tell.

Works Cited

  1. Chen, Lingjiao; Zaharia, Matei; Zou, James. “How Is ChatGPT’s Behavior Changing over Time?” arXiv, 19 July 2023, https://arxiv.org/pdf/2307.09009.pdf.
  2. Gupta, Udit. “Chain of Thought Prompting.” Medium, 15 October 2023, https://medium.com/@uditgupta5050/chain-of-thought-prompting-853fbfc8e43c.
  3. Kooser, Amanda. “Is Stanford Right? Is ChatGPT Really Getting Dumber?” Futurism, 24 April 2023, https://futurism.com/the-byte/stanford-chatgpt-getting-dumber.
  4. “Why is ChatGPT Getting from Bad to Worst?” OpenAI Community, https://community.openai.com/t/why-is-chatgpt-getting-from-bad-to-worst/617490/26.
  5. “GPT-4 Is Getting Worse and Worse Every Single Update.” OpenAI Community, https://community.openai.com/t/gpt-4-is-getting-worse-and-worse-every-single-update/508470.
  6. “Cheat Sheet: Mastering Temperature and Top-p in ChatGPT API.” OpenAI Community, April 2023, https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api/172683.