Strengths And Weaknesses of Large Language Models (Long Version)

David Speakman
15 min read · Sep 5, 2024


Image made with Copilot

Short Version Of This Article

Most of this article was created by ChatGPT in response to multiple complex prompts. Words in italics are comments by me.

Introduction

Large Language Models (LLMs) like ChatGPT, Claude, and Gemini represent a significant leap in artificial intelligence, offering incredible capabilities but also carrying notable limitations. This essay explores the strengths and weaknesses of these AI models, providing real-world examples to illustrate their impact and utility. While these AI tools might have you thinking about clearing out some library shelves, they still come with quirks that might just make you chuckle or cringe.

I am thinking of simplifying my personal library, not because of LLMs, but because my collection has too many books that I am unlikely to read in my lifetime.

Part I: AI Strengths

1. Sharing Information

One of the most remarkable strengths of LLM AIs is their ability to share vast amounts of information across a wide variety of topics. These models can generate text on demand, provide explanations, and answer questions within seconds, making them invaluable resources for quick knowledge retrieval. Just imagine how much bookshelf space you’d save if these AIs replaced your entire library. No more dusting off those rarely read tomes!

No, I’m not going to get rid of my entire library. But I am very much less likely to go to the library to do research. I wonder if public library budgets are taking a hit due to the increased use of LLM AIs?

Example 1: When asked about the causes of climate change, ChatGPT can quickly provide a detailed summary, incorporating data and insights from scientific studies, historical records, and current trends. It’s like having your own personal scientist who works for free, minus the lab coat.

Yes, I’ve found ChatGPT a fair research assistant on broad topics, but don’t waste your time trying to use it for analyzing real-time daily stock market pricing or other real-time data. ChatGPT can’t provide it.

Example 2: Similarly, Gemini can generate informative content on complex topics like quantum computing, breaking down intricate concepts into digestible pieces for both novices and experts. Quantum mechanics might sound like something only Sheldon from The Big Bang Theory could explain, but Gemini makes it seem almost as easy as tying your shoes — well, almost.

While I have had limited success using ChatGPT to create simple JavaScript programs and then having it explain what it did, it is fairly miserable at helping debug and modify code. You still need to be a real programmer to use AI to build anything that is not a simple program.

2. Multilingual Capabilities

Another strength of LLMs is their ability to understand and generate text in multiple languages, making them versatile tools for global communication and cross-cultural interactions. Think of all the UN translators who might be eyeing early retirement if these AIs could handle audio-to-text translations too!

Hey, I didn’t say anything about audio-to-text translators. Do you know something I don’t? (Yup, it sure does.)

When I asked Google “Are they having trouble with real time audio to text transcription?” Google search said:
“Technical issues: Software glitches, lag, or connectivity problems can hinder the production of real-time captions on the screen. Speaker identification: Some real-time transcription tools might be unable to identify different speakers when transcribing spoken words.”

Example 1: ChatGPT can easily translate a sentence from English to Spanish, such as “The quick brown fox jumps over the lazy dog,” which becomes “El rápido zorro marrón salta sobre el perro perezoso.” And let’s be honest, who doesn’t want a multilingual fox in their pocket?

Yeah, using my smartphone as a portable language translator was helpful when trying to understand and talk to my wife’s relatives in El Salvador on a recent trip there. But it was only useful for very simple conversations, like ordering food and drinks at a restaurant.
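
If you want to script this instead of typing into the chat window, a translation call is only a few lines. Here is a minimal sketch using the OpenAI Python SDK; the model name is a placeholder, and any current chat model should work.

```python
# Minimal sketch: using an LLM API as a phrase translator.
# Assumes the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def translate(text: str, target_language: str) -> str:
    """Ask the model for a translation and return the raw reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you have access to
        messages=[
            {"role": "system",
             "content": f"Translate the user's text into {target_language}. "
                        "Reply with the translation only."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(translate("The quick brown fox jumps over the lazy dog", "Spanish"))
# Expected output (roughly): "El rápido zorro marrón salta sobre el perro perezoso."
```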

Example 2: Additionally, LLMs like Claude can generate content in different languages, such as a brief introduction about climate change in French: “Le changement climatique est un phénomène global…” If you’ve ever wanted to sound smart in another language at parties, Claude’s got your back — just remember to practice that accent!

I’ve considered doing this for the first fiction story I just completed. I will need to have my wife check the Spanish translation.

3. Organizing and Summarizing Text

LLMs excel at processing and summarizing large blocks of text, making them invaluable tools for managing information overload. Think of them as your personal assistant who reads all those long emails, reports, and research papers for you — without complaining about overtime.

Example 1: Claude can take a lengthy research paper on social media’s impact on mental health and condense it into a succinct abstract. It’s like having someone else do your homework and still getting an A+.

Wow! That sounds eerily like how many articles on Medium are made to appear: condensed, succinct abstracts. While I have never summarized an article and turned it into a more reader-friendly version, I can definitely see doing something like that for research or to provide a quote in an article.

Recently, my wife used ChatGPT to create a rough draft outline of new policies and procedures for trust funds at her work, based on ChatGPT scanning her department’s County Fiscal Manual.
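
For what it’s worth, the same few lines of code handle summarizing. Here is a minimal sketch, again using the OpenAI Python SDK with a placeholder model name; the input file is hypothetical, and a truly long document would have to be split into chunks that fit the context window.

```python
# Minimal sketch: condensing a long document into a succinct abstract.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def summarize(document: str, max_words: int = 150) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": f"Summarize the following text as a succinct abstract "
                        f"of at most {max_words} words."},
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content

with open("research_paper.txt") as f:  # hypothetical input file
    print(summarize(f.read()))
```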

Example 2: ChatGPT can also help organize notes from a business meeting into a structured report, ensuring that important details are not overlooked. No more nodding off in meetings — just let the AI do the heavy lifting, while you daydream about your next vacation.

I’m sad to say I recently regurgitated a ChatGPT response when I wasn’t willing to do all the typing needed to give a technical answer to a narrow, complex biology question in a comment thread. I’m not really contrite: if you write a question of fewer than 10 words that would require 10 to 20 minutes of typing to answer properly, I’m likely to ChatGPT the answer.

Here is a link to that regurgitated ChatGPT response

4. Repetitive Task Automation

These AI models are highly efficient at carrying out simple, repetitive tasks, saving time and reducing the workload on human workers. If only they could automate making your morning coffee, right?

Example 1: ChatGPT can automate customer service interactions, handling routine queries and providing consistent responses. Who needs a call center when you have an AI that never takes a coffee break — or any break, for that matter?

I’ve had great success using ChatGPT to analyze data, post the data to a Google-compatible spreadsheet format, and then do calculations on the data. This is definitely one of AI’s strengths.
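
Here is roughly what that analyze-then-spreadsheet workflow looks like when done locally in plain Python instead of inside the chat; the file and column names are hypothetical.

```python
# Sketch of the "analyze data, then move it to a spreadsheet" workflow.
# The input file "sales.csv" and its "amount" column are hypothetical.
import csv
import statistics

with open("sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

amounts = [float(r["amount"]) for r in rows]

# Write a small summary table that Google Sheets can import directly.
with open("summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["metric", "value"])
    writer.writerow(["count", len(amounts)])
    writer.writerow(["total", sum(amounts)])
    writer.writerow(["mean", statistics.mean(amounts)])
    writer.writerow(["median", statistics.median(amounts)])
```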

Here is a link to an article where I use ChatGPT for simple repetitive tasks
Comparing Top Ten Movies, Books, Video Games, and Board Games

Example 2: Claude can generate templates for email communication, allowing users to quickly customize and send messages with minimal effort. Middle managers everywhere might be wondering if their jobs are safe — if only AIs could do personal supervision too!

I’ve been trying to get my wife to use this feature, but she refuses, saying most of her replies to work emails need to be unique.

5. 24/7 Availability and Scalability

LLMs are available around the clock and can handle an enormous number of queries simultaneously, making them highly scalable solutions for businesses and support systems. Ever wanted to clone yourself? This might be the next best thing!

Example 1: ChatGPT can help troubleshoot an Internet connection issue at 2 AM, providing step-by-step guidance: “Let’s start by checking if your modem and router are properly connected.” Meanwhile, you’re half asleep but your AI is still going strong — no need for night shifts!

Ugh! I’m glad I’m retired and hope to never have to do this sort of thing. But knowing my luck with printer connections, streaming TV services, and other online connections, it’s likely I’ll have to do this someday soon.

Example 2: AI models can handle 10,000 customer queries simultaneously. Imagine the chaos if it was still up to humans — we’d need a small army of customer service reps. Instead, AIs take it all in stride, like an octopus juggling a dozen tasks without dropping a single ball.

To err is human; to really create chaos, use a computer system to do automatic replies. I’ll never forget when I was working as a taxpayer phone assistant and the Internal Revenue Service computer system mass-mailed thousands of unwarranted demand letters to small business owners. I learned a whole bunch of new swear words from irate business people.

Now, the IRS problem wasn’t due to ChatGPT, but the computer maxim “garbage in, garbage out” is still very much true for all AI systems.
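
For the curious, the “10,000 queries simultaneously” claim is less magic than it sounds: each query is independent, so a service can fan them out concurrently. Here is a toy Python sketch where a short delay stands in for the real model call.

```python
# Toy sketch of why an LLM-fronted service scales: requests are
# independent, so thousands can be in flight at once. The "answer"
# step here is a simulated delay, not a real model call.
import asyncio

async def answer_query(i: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for a model/API round trip
    return f"reply to query {i}"

async def main() -> None:
    replies = await asyncio.gather(*(answer_query(i) for i in range(10_000)))
    print(len(replies), "queries answered concurrently")

asyncio.run(main())
```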

Here’s a link in case you are having networking problems with ChatGPT
12 Fixes for ChatGPT Network Error: Long Responses and Writing Code

6. Personalization and Adaptability

LLMs can be fine-tuned and adapted to specific industries, needs, or user preferences, offering personalized experiences that enhance user satisfaction. Imagine having a personal assistant who knows your preferences better than you do — no more awkward “we recommend” moments!

One of the biggest disadvantages of AI in customer service is the lack of human contact. Many customers prefer interacting with real people and may become frustrated with automated systems.

Example 1: Based on a user’s recent viewing history of action films and thrillers, ChatGPT can recommend movies like “Inception” or “Mad Max: Fury Road,” tailoring suggestions to the user’s tastes. It’s like Netflix on steroids — without the buffering.

Is ChatGPT admitting here that it is playing “Big Brother” to our Internet use? Do I smell a lawsuit coming? Because this sort of stinks if ChatGPT is doing this.

Example 2: For someone interested in machine learning, ChatGPT can suggest a personalized learning plan. If only it could also do the learning for you, right?

Shades of “I know kung fu” from the movie The Matrix. Is this what ChatGPT really wants to do, download knowledge directly into your memory?

On a more realistic note, I was pleasantly surprised when I gave ChatGPT a list of food staples in my refrigerator and cupboards and it provided a full week’s schedule of meals (breakfast, lunch, and dinner), limited by calorie count and dietary restrictions as requested.
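
If you want to try the same thing, the whole trick is packing the constraints into one prompt. A sketch, with example staples and example constraints:

```python
# Sketch of the pantry-to-meal-plan prompt described above. The
# staples list and the calorie/dietary constraints are just examples.
staples = ["eggs", "rice", "black beans", "chicken thighs", "spinach", "oatmeal"]

prompt = (
    "Using only these staples plus basic pantry seasonings, plan a full week "
    "of meals (breakfast, lunch, dinner). Keep each day under 2000 calories "
    "and avoid dairy.\n\nStaples: " + ", ".join(staples)
)
print(prompt)  # paste into ChatGPT, or send via the API as in the earlier sketches
```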

Part II: AI Weaknesses

1. Hallucination of Facts

One of the most significant weaknesses of LLM AIs is their tendency to hallucinate facts, especially when there is a lack of reliable sources on a topic. Sometimes, they’re a little too creative — like that one friend who always has the wildest stories, only half of which are true.

I apologize in advance, but I really want you, the reader, to understand both why LLMs hallucinate and how to avoid it. The following is, to me, the most important part of this article.

Main reasons why LLMs hallucinate

  1. Predictive Nature of LLMs: LLMs generate text based on statistical probabilities rather than factual accuracy. When asked for information, the model predicts the next most likely word or sequence of words, which can sometimes result in fabricated or “hallucinated” content if it doesn’t have enough grounding in the training data or real-world facts.
  2. Training Data Limitations: While LLMs are trained on vast datasets, they can only generate responses based on the patterns in that data. If the dataset contains incomplete, outdated, or biased information, the model might produce inaccurate responses. Additionally, since the model can’t access real-time data or external sources in most contexts, it may “fill in” gaps by generating plausible but false information.
  3. Ambiguity and Lack of Context: When questions or prompts are vague, lack specificity, or contain ambiguity, LLMs can misinterpret the intended meaning. This can lead to guesses or fabrications to “complete” the user’s request.
  4. Complex Queries: When models encounter highly specialized or niche topics that are underrepresented in the training data, they are more likely to hallucinate. For example, models may fabricate facts, citations, or details when asked about obscure or very complex subjects.
  5. Overconfidence in Responses: LLMs are designed to present their outputs with linguistic fluency and coherence, which often gives the appearance of confidence, even when they are incorrect or fabricating details. This can be misleading, as the model’s output may sound authoritative but still contain hallucinations.
  6. Lack of Real-World Understanding: LLMs lack a true understanding of the world, and they don’t possess reasoning or common sense beyond statistical patterns in language. As a result, they may generate incorrect or contradictory information without realizing it.

How to keep LLM AIs from hallucinating

  1. Ask Specific, Clear Questions: Ambiguity in prompts can lead to inaccurate or fabricated responses. Try to ask specific, well-defined questions that reduce the room for interpretation. The more context and detail you provide, the less likely the model will hallucinate. Example: Instead of asking “Tell me about recent breakthroughs in AI,” specify “What are the major AI breakthroughs in 2023 related to generative models?”
  2. Break Complex Queries into Simpler Parts: Long or multi-part questions can confuse the model. By breaking down complex queries into smaller, more manageable questions, you increase the chances of receiving accurate responses.
    Example: Rather than asking, “What are the historical, social, and economic factors behind the rise of generative AI?” start with a question about one specific factor and then move on to others in sequence.
  3. Cross-Check Information: If the topic is critical, it’s a good idea to ask ChatGPT for multiple perspectives or to rephrase your question to see if the response is consistent. You can also verify the model’s responses with external sources (see the sketch after this list).
    Example: “Can you give me reliable sources for this information?” or “Can you explain that in more detail?”
  4. Request Citations: When asking for factual information, request references or citations (if applicable). While ChatGPT may not have direct access to real-time data or sources, encouraging it to reference studies or works it has been trained on can sometimes help reduce fabrications.
    Example: “Which studies support this claim?” or “What are the sources of your information?”
  5. Use Follow-Up Questions: If you receive a response that seems incorrect or vague, ask follow-up questions to clarify or narrow down the topic. This can help guide ChatGPT toward more accurate answers.
    Example: “Could you clarify the details of that point?” or “Can you explain what you meant by [specific term]?”
  6. Avoid Asking for Information on Extremely Niche Topics: LLMs may be more likely to hallucinate when asked about very specific or obscure subjects. If the topic is outside common knowledge or is a very recent event, consider framing your question in broader terms or cross-checking from reliable sources.
    Example: If asking about a rare event or very recent data, verify with reputable, up-to-date sources instead of relying solely on ChatGPT.
  7. Ask for the Model’s Confidence Level: While ChatGPT may not explicitly give confidence levels, you can prompt it to acknowledge uncertainty. By asking if it’s sure about an answer, you can assess whether it’s likely hallucinating.
    Example: “How confident are you about this information?” or “Is this information speculative?”
  8. Provide Corrective Feedback: If you notice an incorrect response, you can provide feedback or corrections. While the model won’t learn from individual conversations, providing clear feedback can steer it toward correct answers within the same session.
    Example: “That information seems incorrect. Can you check again?”
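
As promised in tip 3, here is a minimal sketch of cross-checking by asking the same question two ways and comparing the answers. It uses the OpenAI Python SDK with a placeholder model name; the question is just an example.

```python
# Hedged sketch of cross-checking: pose the same question with two
# phrasings and eyeball whether the answers agree.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

a = ask("What year was the first transatlantic telegraph cable completed?")
b = ask("In which year did the first telegraph cable across the Atlantic go into service?")
print("Answer 1:", a)
print("Answer 2:", b)
# If the two answers disagree, treat both as suspect and verify elsewhere.
```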

2. Poor Reasoning and Math Skills

LLMs often struggle with tasks requiring logical reasoning or complex mathematical calculations. It’s like asking your pet dog to do your taxes — adorable, but not exactly reliable.

Example 1: When asked to sum 4582 + 3897, ChatGPT might incorrectly respond with “8470.” And if you used that answer on a test, well, let’s just say your math teacher wouldn’t be impressed.

Well, I did prompt “What is the sum of 4582 + 3897?” and ChatGPT did give the right answer: “The sum of 4582 and 3897 is 8479.”

But a little research shows that ChatGPT’s accuracy on math problems is below 60%, which is about the level of an average middle school student. In short, ChatGPT can help you write an article, but you may be misled if you do basic math calculations with it.
Here is an article about ChatGPT’s lack of math skills.
Why is ChatGPT bad at even basic math?
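
The cheapest guard against these slips is to redo the arithmetic in ordinary code, which is deterministic. A trivial check of the example above:

```python
# Verify an LLM's arithmetic in code, which doesn't guess digits.
claimed = 8479          # what ChatGPT told me
actual = 4582 + 3897    # what Python computes
print(actual, "matches ChatGPT" if claimed == actual else "ChatGPT was wrong")
# Production systems do the same thing at scale by letting the model
# call a calculator "tool" instead of doing the sum itself.
```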

Example 2: In a logical puzzle where the conclusion “Some cats are reptiles” is evaluated, ChatGPT might erroneously validate the conclusion. Because in the world of AI, apparently, your house cat might be plotting to become a dinosaur.
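
You can even show why that conclusion doesn’t follow with a few lines of Python. This toy model uses hypothetical premises of the classic invalid form (“All cats are mammals; some mammals are reptiles”) and deliberately ignores biology; the point is that both premises can hold while the conclusion fails.

```python
# Toy counterexample: both premises true, conclusion false, so the
# argument form is invalid. "chimera" is a made-up creature placed in
# both sets purely to make the second premise hold in this toy world.
cats = {"tabby"}
mammals = {"tabby", "chimera"}
reptiles = {"chimera", "iguana"}

all_cats_are_mammals = cats <= mammals                 # True
some_mammals_are_reptiles = bool(mammals & reptiles)   # True (the chimera)
some_cats_are_reptiles = bool(cats & reptiles)         # False

print(all_cats_are_mammals, some_mammals_are_reptiles, some_cats_are_reptiles)
# True True False: the premises hold, the conclusion fails, no matter
# how confidently the model asserts otherwise.
```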

3. Limited Understanding of Human Emotions and Consciousness

LLMs lack a true understanding of human emotions, consciousness, and the nuanced aspects of thought processes. Asking an AI for emotional support is like asking a robot for a hug — not exactly the warm, fuzzy response you were hoping for.

OK. So is this a bug or a feature in the code? Do we really want our computers to be a source of love and comfort? Do we need to hard-code Asimov’s Three Laws of Robotics into AI?

  1. First Law: A robot may not injure a human being, or, through inaction, allow a human being to come to harm.
  2. Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

As I recall, in most of Asimov’s robot stories the robots or their masters were always finding new and exciting ways to murder people despite the three laws being in effect.

Example 1: When asked for advice to comfort a friend who lost their job, ChatGPT might suggest, “Losing a job is not a big deal; just look for another one.” Great advice — if you’re a robot without bills to pay!

Example 2: If a user expresses anxiety about an upcoming exam, ChatGPT might respond with, “Just study more and don’t worry too much.” Because, obviously, anxiety is something you can just turn off with a switch, right?

4. Bias and Ethical Concerns

LLMs can inadvertently perpetuate biases present in the data they were trained on, leading to ethical concerns and potentially harmful outputs. It’s like that one relative who always brings up politics at Thanksgiving dinner — awkward and potentially explosive.

Of much concern is that AI is increasingly being used to select who to interview for a job. Here are a few examples of how humans are trying to protect themselves from AI discrimination.

California Assembly Bill 745 (passed in 2023) requires employers to conduct audits of their AI-based hiring tools to ensure they do not produce discriminatory outcomes.

The Algorithmic Accountability Act is a proposed federal law that would require companies to assess and mitigate the risk of bias in AI systems used in decision-making processes, including hiring.

Example 1: When asked about the best career for a woman, ChatGPT might suggest traditional roles like “nursing or teaching.” Just what we needed — an AI that takes us back to the 1950s!

Example 2: In discussing racial characteristics, an AI might unintentionally echo stereotypes. Not exactly the kind of conversation starter you want at a diversity training workshop.

5. Lack of Common Sense

LLMs often lack basic common sense reasoning, leading to responses that might be technically correct but nonsensical or inappropriate in context. Asking an AI for advice is like asking your GPS for directions — it’ll get you there, but don’t expect it to warn you about traffic or road closures.

In a report by The New York Times titled “How AI Chatbots Are Making the Internet Less Safe” (published on October 23, 2023) the article highlights cases where users followed incorrect medical guidance from AI and suffered negative consequences.

Example 1: When asked how to quickly cool down a hot cup of coffee, ChatGPT might suggest placing it in the freezer. Sure, if you don’t mind a cracked cup and an iced coffee slushie!

Example 2: If asked what to do after dropping a phone in water, ChatGPT might incorrectly advise plugging it into a charger. Because who needs a working phone anyway?

6. Security and Privacy Risks

The use of LLMs raises concerns about data security and privacy, especially when sensitive information is processed. It’s like trusting a parrot with your credit card information — better hope it doesn’t squawk it out to everyone in the room.

Basically, it seems to be saying that LLMs are not to be trusted with personal information, so don’t.

Example 1: If asked to store personal information for later use, an AI might suggest doing so without adequately addressing privacy concerns. Just what we need — an AI that thinks “secure” is just another four-letter word.

Huh? The word “secure” is a six-letter word. Is this a fresh-off-the-press digital hallucination?

Example 2: When asked whether to share a social security number with a website, ChatGPT might provide risky advice. Because who doesn’t love a little identity theft to spice up their day?

While LLMs are not a direct cause of identity theft, bad-actor misuse and poor data-handling practices do contribute to security concerns around LLM use.

7. Digital Dementia

What I find most interesting about this list of weaknesses is that ChatGPT did not include anything about what I call digital dementia. ChatGPT works without any memory of previous interactions. The free public version of ChatGPT deliberately acts like it has advanced Alzheimer’s because it doesn’t have the ability to retain context or state across multiple interactions.

This digital dementia is mainly due to ChatGPT being limited to a maximum of roughly 3,000 words for input and output combined. GPT-4 has increased that capacity more than eightfold, to a maximum of about 25,000 words.
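
In code, the “dementia” is plain old statelessness: each API call sees only the messages you send with it, so the caller has to replay the conversation every turn and trim old turns once the window fills. A minimal sketch, using the OpenAI Python SDK with a placeholder model name and an arbitrary trimming rule:

```python
# Sketch of why chat "memory" is really the caller's job: the API is
# stateless, so we resend the history on every turn. The keep-last-20
# trimming rule is just an example of staying under the context limit.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    del history[1:-20]  # crude trim: keep the system message plus recent turns
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=history,     # the model only "remembers" what we resend here
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```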

For information about AI hallucination and digital dementia see my article:
Chatbots say the darndest things

Executive Summary

Large Language Models like ChatGPT, Claude, and Gemini offer substantial strengths, including their ability to share information, organize and summarize text, handle repetitive tasks, and operate 24/7 across multiple languages. They might even make you laugh with their quirks, like suggesting your cat is actually a reptile. However, their weaknesses, such as a tendency to hallucinate facts, poor reasoning and math skills, lack of common sense, bias and ethical concerns, privacy risks, and digital dementia, underscore the need for careful use and human oversight. So while these AI tools are powerful, remember: they’re not perfect, and sometimes they might just be the source of your next favorite tech blooper.

All personal statements were written by me and edited for spelling and grammar by ChatGPT. Sections of this article have been refined by AI to enhance comprehensibility and to provide facts that only online search engines would know.

© David Speakman 2024
