Strengths And Weaknesses of Large Language Models (Short Version)
Some of this article was created by ChatGPT in response to multiple complex prompts. Words in italics are comments by me on ChatGPT responses.
Introduction
Large Language Models (LLMs) like ChatGPT, Claude, and Gemini represent a significant leap in artificial intelligence, offering incredible capabilities but also carrying notable limitations. While these AI tools might have you thinking about clearing out some library shelves, they still come with quirks that might just make you chuckle or cringe.
I am thinking of simplifying my personal library, not because of LLMs, but because I just have too many books that I’m unlikely to read in my lifetime.
Part I: AI Strengths
1. Sharing Information
One of the most remarkable strengths of LLM AIs is their ability to share vast amounts of information across a wide variety of topics. Just imagine how much bookshelf space you’d save if these AIs replaced your entire library. No more dusting off those rarely read tomes!
No, I’m not going to get rid of my entire library. But I am much less likely to go to the library to do research. I wonder whether public library budgets are taking a hit due to the increased use of LLM AIs.
Yes, I’ve found ChatGPT a fair research assistant on broad topics, but don’t waste your time trying to use it to analyze real-time daily stock market pricing or other real-time data. ChatGPT can’t provide it.
While I have had limited success using ChatGPT to create simple JavaScript programs and then having it explain what it did, it is fairly miserable at helping debug and modify code. You still need to be a real programmer to use AI to program anything that is not a simple program.
2. Multilingual Capabilities
Another strength of LLMs is their ability to understand and generate text in multiple languages, making them versatile tools for global communication and cross-cultural interactions. Think of all the UN translators who might be eyeing early retirement if these AIs could handle audio-to-text translations too!
Hey, I didn’t ask anything about audio-to-text translators. Do you know something I don’t? (Yup, it sure does.)
When I asked Google “Are they having trouble with real time audio to text transcription?” Google search said:
“Technical issues: Software glitches, lag, or connectivity problems can hinder the production of real-time captions on the screen. Speaker identification: Some real-time transcription tools might be unable to identify different speakers when transcribing spoken words.”
Using my smartphone as a portable language translator was helpful when trying to understand and talk to my wife’s relatives in El Salvador on a recent trip there. But it was only useful for very simple conversations, like ordering food and drinks at a restaurant.
Additionally, LLMs like Claude can generate content in different languages, such as a brief introduction about climate change in French: “Le changement climatique est un phénomène global…” If you’ve ever wanted to sound smart in another language at parties, Claude’s got your back — just remember to practice that accent!
I’ve considered doing this for the first fiction story I just completed. I will need to have my wife check the Spanish translation.
3. Organizing and Summarizing Text
LLMs excel at processing and summarizing large blocks of text, making them invaluable tools for managing information overload. Claude, for example, can take a lengthy research paper on social media’s impact on mental health and condense it into a succinct abstract. It’s like having someone else do your homework and still getting an A+.
Wow! That sounds eerily like how many articles on Medium are made to appear: condensed, succinct abstracts. While I have never summarized an article and turned it into a more reader-friendly version, I can definitely see doing something like that for research or to provide a quote in an article.
Recently, my wife used ChatGPT to create a rough draft outline of new policies and procedures for Trust Funds at her work, based on ChatGPT scanning the County Fiscal Manual for her department.
I’m sad to say that I recently regurgitated a ChatGPT response when I wasn’t willing to do all the typing needed to provide a technical answer to a narrow, complex biology question in a comment I made. I’m not really contrite: if you write a question of fewer than ten words that would take 10 to 20 minutes of typing to answer properly, I’m likely to ChatGPT the answer.
4. Repetitive Task Automation
These AI models are highly efficient at carrying out simple, repetitive tasks, saving time and reducing the workload on human workers. If only they could automate making your morning coffee, right?
I’ve had great success using ChatGPT to analyze data, post the data to a Google-compatible spreadsheet format, and then do calculations on the data. This is definitely one of AI’s strengths.
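To give a concrete feel for what I mean, here is a minimal JavaScript sketch of that kind of chore, close in spirit to what ChatGPT generated for me (the movie figures below are invented for illustration):

```javascript
// A minimal sketch of the kind of repetitive chore described above:
// compute a derived column for every row and print tab-separated
// output that pastes cleanly into a Google Sheets spreadsheet.
// The movie figures are made up for illustration.
const rows = [
  { title: "Movie A", budget: 150, gross: 420 },
  { title: "Movie B", budget: 90, gross: 310 },
  { title: "Movie C", budget: 200, gross: 180 },
];

console.log(["Title", "Budget ($M)", "Gross ($M)", "Return"].join("\t"));
for (const r of rows) {
  const ret = (r.gross / r.budget).toFixed(2); // gross-to-budget ratio
  console.log([r.title, r.budget, r.gross, ret].join("\t"));
}
```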
Here is a link to an article where I use ChatGPT for simple repetitive tasks:
Comparing Top Ten Movies, Books, Video Games, and Board Games
Claude can generate templates for email communication, allowing users to quickly customize and send messages with minimal effort. Middle managers everywhere might be wondering if their jobs are safe — if only AIs could do personal supervision too!
I’ve been trying to get my wife to use this feature, but she refuses, saying most of her replies to work emails need to be unique. Or is she just not willing to admit that much of her work can be replaced with AI?
5. 24/7 Availability and Scalability
LLMs are available around the clock and can handle an enormous number of queries simultaneously, making them highly scalable solutions for businesses and support systems. Ever wanted to clone yourself? This might be the next best thing!
ChatGPT can help troubleshoot an Internet connection issue at 2 AM, providing step-by-step guidance: “Let’s start by checking if your modem and router are properly connected.” Meanwhile, you’re half asleep but your AI is still going strong — no need for night shifts!
Ugh! I’m glad I’m retired and hope to never have to do this sort of thing. But knowing my luck with printer connections, streaming TV services, and other online connections, it’s likely I may have to do this someday soon.
To err is human; to really create chaos, use a computer system to send automatic replies. I’ll never forget when I was working as a taxpayer phone assistant and the Internal Revenue Service computer system mass-mailed thousands of unwarranted demand letters to small business owners. I learned a whole bunch of new swear words from irate business people.
Now, the IRS problem wasn’t due to ChatGPT, but the computer maxim “garbage in, garbage out” is still very much true for all AI systems.
Here’s a link in case you are having network problems with ChatGPT:
12 Fixes for ChatGPT Network Error: Long Responses and Writing Code
6. Personalization and Adaptability
LLMs can be fine-tuned and adapted to specific industries, needs, or user preferences, offering personalized experiences that enhance user satisfaction.
One of the biggest disadvantages of AI in customer service is the lack of human contact. Many customers prefer interacting with real people and may become frustrated with automated systems.
Based on a user’s recent viewing history of action films and thrillers, ChatGPT can recommend movies like “Inception” or “Mad Max: Fury Road,” tailoring suggestions to the user’s tastes. It’s like Netflix on steroids — without the buffering.
Is ChatGPT admitting here that it is playing “Big Brother” with our Internet use? Do I smell a lawsuit coming? Because this sort of stinks, if ChatGPT is really doing this.
For someone interested in machine learning, ChatGPT can suggest a personalized learning plan. If only it could also do the learning for you, right?
Shades of “I know kung fu” from the movie The Matrix. Is this what ChatGPT really wants to do: download knowledge directly into your memory?
On a more realistic note, I was pleasantly surprised when I gave ChatGPT a list of the food staples in my refrigerator and cupboards and it provided a full week’s schedule of meals (breakfast, lunch, and dinner), limited by calorie count and dietary restrictions as requested.
Part II: AI Weaknesses
1. Hallucination of Facts
One of the most significant weaknesses of LLM AIs is their tendency to hallucinate facts, especially when there is a lack of reliable sources on a topic. Sometimes, they’re a little too creative — like that one friend who always has the wildest stories, only half of which are true.
I apologize in advance, but I really want you, the reader, to understand both why LLMs hallucinate and how to avoid it. The following is, to me, the most important part of this article.
Main reasons why LLMs hallucinate
- Predictive Nature of LLMs: LLMs generate text based on statistical probabilities rather than factual accuracy. When asked for information, the model predicts the next most likely word or sequence of words, which can sometimes result in fabricated or “hallucinated” content if it doesn’t have enough grounding in the training data or real-world facts. (A toy sketch of this mechanism follows this list.)
- Training Data Limitations: While LLMs are trained on vast datasets, they can only generate responses based on the patterns in that data. If the dataset contains incomplete, outdated, or biased information, the model might produce inaccurate responses. Additionally, since the model can’t access real-time data or external sources in most contexts, it may “fill in” gaps by generating plausible but false information.
- Ambiguity and Lack of Context: When questions or prompts are vague, lack specificity, or contain ambiguity, LLMs can misinterpret the intended meaning. This can lead to guesses or fabrications to “complete” the user’s request.
- Complex Queries: When models encounter highly specialized or niche topics that are underrepresented in the training data, they are more likely to hallucinate. For example, models may fabricate facts, citations, or details when asked about obscure or very complex subjects.
- Overconfidence in Responses: LLMs are designed to present their outputs with linguistic fluency and coherence, which often gives the appearance of confidence, even when they are incorrect or fabricating details. This can be misleading, as the model’s output may sound authoritative but still contain hallucinations.
- Lack of Real-World Understanding: LLMs lack a true understanding of the world, and they don’t possess reasoning or common sense beyond statistical patterns in language. As a result, they may generate incorrect or contradictory information without realizing it.
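To make the first point concrete, here is the toy JavaScript sketch promised above. The candidate words and their probabilities are invented, and a real model scores its entire vocabulary with a neural network, but the mechanism of picking a statistically plausible next word, true or not, is the same:

```javascript
// Toy sketch of next-token prediction. The candidate words and their
// probabilities are invented; a real LLM scores its whole vocabulary
// with a neural network. Note that nothing here checks facts: the
// model just samples whatever continuation is statistically plausible.
const nextWordProbs = {
  "The capital of Atlantis is": [
    { word: "Poseidonis", p: 0.55 }, // fluent, confident, and fictional
    { word: "unknown", p: 0.3 },
    { word: "Atlantis", p: 0.15 },
  ],
};

function sampleNextWord(prompt) {
  const candidates = nextWordProbs[prompt];
  let r = Math.random();
  for (const c of candidates) {
    r -= c.p; // weighted random choice over the candidates
    if (r <= 0) return c.word;
  }
  return candidates[candidates.length - 1].word;
}

console.log("The capital of Atlantis is", sampleNextWord("The capital of Atlantis is"));
```

Most of the time this toy model confidently names a capital for a city that never existed, which is exactly the failure mode described above.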
How to keep LLM AIs from hallucinating
- Ask Specific, Clear Questions: Ambiguity in prompts can lead to inaccurate or fabricated responses. Try to ask specific, well-defined questions that reduce the room for interpretation. The more context and detail you provide, the less likely the model will hallucinate.
- Break Complex Queries into Simpler Parts: Long or multi-part questions can confuse the model. By breaking down complex queries into smaller, more manageable questions, you increase the chances of receiving accurate responses.
- Cross-Check Information: If the topic is critical, it’s a good idea to ask ChatGPT for multiple perspectives or to rephrase your question to see if the response is consistent (see the sketch after this list). You can also verify the model’s responses with external sources.
- Request Citations: When asking for factual information, request references or citations (if applicable). While ChatGPT may not have direct access to real-time data or sources, encouraging it to reference studies or works it has been trained on can sometimes help reduce fabrications.
- Use Follow-Up Questions: If you receive a response that seems incorrect or vague, ask follow-up questions to clarify or narrow down the topic. This can help guide ChatGPT toward more accurate answers.
- Avoid Asking for Information on Extremely Niche Topics: LLMs may be more likely to hallucinate when asked about very specific or obscure subjects. If asking about a rare event or very recent data, verify with reputable, up-to-date sources instead of relying solely on ChatGPT.
- Ask for the Model’s Confidence Level: While ChatGPT may not explicitly give confidence levels, you can prompt it to acknowledge uncertainty. By asking if it’s sure about an answer, you can assess whether it’s likely hallucinating.
- Provide Corrective Feedback: If you notice an incorrect response, you can provide feedback or corrections. While the model won’t learn from individual conversations, providing clear feedback can steer it toward correct answers within the same session. Try, “That information seems incorrect. Can you check again?”
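To make the cross-checking tip concrete, here is a hedged JavaScript sketch that asks the same question phrased three ways through OpenAI’s chat completions endpoint and prints the answers side by side. The model name is an assumption and may need updating, and agreement is only a smell test: three consistent answers can still all be wrong.

```javascript
// Sketch: ask the same question phrased several ways and compare the
// answers. Requires Node 18+ with ES modules and an OPENAI_API_KEY
// environment variable.
const API_URL = "https://api.openai.com/v1/chat/completions";

async function ask(question) {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // assumption: swap in whatever model you use
      messages: [{ role: "user", content: question }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim();
}

const phrasings = [
  "What year was the first Nobel Prize awarded?",
  "In which year were the inaugural Nobel Prizes handed out?",
  "The first Nobel Prizes were given in what year?",
];

for (const q of phrasings) {
  console.log(q, "->", await ask(q)); // look for disagreement between answers
}
```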
2. Poor Reasoning and Math Skills
LLMs often struggle with tasks requiring logical reasoning or complex mathematical calculations. It’s like asking your pet dog to do your taxes — adorable, but not exactly reliable.
A little research shows that ChatGPT’s accuracy on math problems is below 60%, which is about the accuracy of an average middle school student. In short, ChatGPT can help you write an article, but it may mislead you when doing basic math calculations.
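My practical defense is to let the chatbot set up the problem but do the arithmetic in code. A trivial JavaScript sketch, with made-up compound-interest figures:

```javascript
// Do the arithmetic in code instead of trusting a chatbot's mental math.
// Example: compound interest on $10,000 at 5% per year for 10 years.
// (The figures are made up for illustration.)
const principal = 10000;
const rate = 0.05;
const years = 10;

const finalAmount = principal * Math.pow(1 + rate, years);
console.log(`$${finalAmount.toFixed(2)}`); // $16288.95
```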
Here is an article about ChatGPT’s lack of math skills:
Why is ChatGPT bad at even basic math?
3. Limited Understanding of Human Emotions and Consciousness
LLMs lack a true understanding of human emotions, consciousness, and the nuanced aspects of thought processes. Asking an AI for emotional support is like asking a robot for a hug — not exactly the warm, fuzzy response you were hoping for.
Ok. So is this a bug or a feature in the code? Do we really want our computers to be a source of love and comfort? Do we need to hard code Asimov’s Three Laws of Robotics into AI?
- First Law: A robot may not injure a human being, or, through inaction, allow a human being to come to harm.
- Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
As I recall, in most of Asimov’s robot stories, the robots or their masters were always finding new and exciting ways to murder people despite the three laws being in effect.
4. Bias and Ethical Concerns
LLMs can inadvertently perpetuate biases present in the data they were trained on, leading to ethical concerns and potentially harmful outputs. It’s like that one relative who always brings up politics at Thanksgiving dinner — awkward and potentially explosive.
Of much concern is that AI is increasingly being used to select whom to interview for a job. Here are a few examples of how humans are trying to protect themselves from AI discrimination.
California Assembly Bill 745 (passed in 2023) requires employers to conduct audits of their AI-based hiring tools to ensure they do not produce discriminatory outcomes.
The Algorithmic Accountability Act is a proposed federal law that would require companies to assess and mitigate the risk of bias in AI systems used in decision-making processes, including hiring.
5. Lack of Common Sense
LLMs often lack basic common sense reasoning, leading to responses that might be technically correct but nonsensical or inappropriate in context. Asking an AI for advice is like asking your GPS for directions — it’ll get you there, but don’t expect it to warn you about traffic or road closures.
A report by The New York Times titled “How AI Chatbots Are Making the Internet Less Safe” (published October 23, 2023) highlights cases where users followed incorrect medical guidance from AI and suffered negative consequences.
6. Security and Privacy Risks
The use of LLMs raises concerns about data security and privacy, especially when sensitive information is processed. It’s like trusting a parrot with your credit card information — better hope it doesn’t squawk it out to everyone in the room.
Basically, it seems to be saying LLMs are not to be trusted with personal information, so don’t trust them with any.
If asked to store personal information for later use, an AI might suggest doing so without adequately addressing privacy concerns. Just what we need — an AI that thinks “secure” is just another four-letter word.
Huh? The word “secure” is a six-letter word. Is this a new, fresh-off-the-press digital hallucination?
While LLMs are not a direct cause of identity theft, misuse by bad actors and poor data-handling practices do contribute to security concerns around the use of LLMs.
7. Digital Dementia
What I find most interesting about this list of weaknesses is that ChatGPT did not include anything about what I call digital dementia. ChatGPT works without any memory of previous interactions. The free public version of ChatGPT acts like it has advanced Alzheimer’s because it doesn’t have the ability to retain context or state across multiple interactions.
This digital dementia is mainly due to ChatGPT being limited to a maximum of roughly 3,000 words for input and output combined. GPT-4’s capacity is more than eight times larger, at a maximum of about 25,000 words.
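Here is a rough JavaScript sketch of why that limit produces digital dementia. Chat interfaces resend the conversation on every turn, and once the transcript exceeds the budget, the oldest turns are simply dropped. Real systems count tokens rather than words, and the trimming logic below is my own illustration, not how any particular chatbot is implemented:

```javascript
// Rough sketch of a chat context window. Real systems count tokens
// rather than words; the 3,000-word figure mirrors the limit quoted
// above. Turns that fall outside the budget are dropped, and the
// model "forgets" them completely.
const WORD_BUDGET = 3000;

function wordCount(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

function trimHistory(turns, budget = WORD_BUDGET) {
  const kept = [];
  let used = 0;
  // Walk backward from the newest turn, keeping whatever still fits.
  for (let i = turns.length - 1; i >= 0; i--) {
    const words = wordCount(turns[i].text);
    if (used + words > budget) break; // everything older is forgotten
    kept.unshift(turns[i]);
    used += words;
  }
  return kept;
}

// Demo with a tiny 8-word budget: the oldest turn does not survive.
const turns = [
  { text: "first question about my printer setup" },
  { text: "a long troubleshooting answer with many detailed steps" },
];
console.log(trimHistory(turns, 8).length); // 1
```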
For more information about AI hallucination and digital dementia see my article: Chatbots say the darndest things
Executive Summary
Large Language Models like ChatGPT, Claude, and Gemini offer substantial strengths, including their ability to share information, organize and summarize text, handle repetitive tasks, and operate 24/7 across multiple languages.
However, their weaknesses, such as a tendency to hallucinate facts, poor reasoning and math skills, lack of common sense, bias and ethical concerns, privacy risks, and digital dementia, underscore the need for careful use and human oversight.
All personal statements were written by me and edited for spelling and grammar by ChatGPT. Sections of this article have been refined by AI to enhance comprehensibility and to provide facts that only online search engines would know.
© David Speakman 2024