What is an LLM?

All you need to know about LLMs and Generative AI

Zaigam Akhtar
ILLUMINATION
12 min read · Jul 13, 2024


Image by Solen Feyissa on Pexels

Back in 2022, when ChatGPT first took the world by storm, I had no idea what the heck an LLM was. Like many, I’d never heard the term before. Fast forward to 2024, LLMs are ubiquitous and GenAI is the talk of the town.

Now, as seems to be the nature of the current AI zeitgeist, you can’t escape such terminology, especially if you hail from a technical background.

The plus point here is that knowing the nitty-gritty of the Generative AI world and how these models function will serve you well in the long run. No matter if you’re an Engineer, a Tech Writer, or even a Marketer.

So, without beating around the bush any longer, let’s jump right into the vast and fascinating world of Large Language Models and Generative AI.

Breaking Down LLMs

First things first, there are three key things about Large Language Models:

  1. They’re LARGE, like really frickin’ large (in terms of Data Volume).
  2. They deal with the intricacies and formation of language, that is, written language for the most part. (While LLMs primarily focus on written language, some advanced models are also trained on code and other forms of data. The ability to handle spoken language is still under development but improving.)
  3. They are highly sophisticated computation models.

Let’s look at these one by one with a finer lens. But before that, we must understand “Foundation Models”.

Foundation Models

A foundation model, aka large AI model, is a machine learning model trained on vast datasets to perform diverse tasks.

In other words, foundation models are large-scale, adaptable AI models trained on broad data sets. These generative AI models can produce human-like language from various inputs and are built on complex neural network architectures like transformers, GANs, and variational autoencoders.

Some common foundation models include GPT-3, BERT, PaLM, Jurassic-1 Jumbo, LaMDA, etc.

The reason we need to understand foundation models before we jump to LLMs is that Large Language Models are a subset of foundation models.

The difference between the two lies in the scope of data and the purpose they’re used for. Foundation models are more versatile than LLMs, allowing them to handle a multitude of tasks. For instance, a foundation model can be employed to develop a chatbot, translate languages, or craft creative content. In contrast, LLMs are usually specialized for one or two specific functions, like text generation or language translation.

Whilst foundation models are used for a broader spectrum of outputs, they’re a bit undercooked. LLMs, on the other hand, are more developed and stable, in that their accuracy on language tasks tends to be better than that of a general foundation model.

Put simply, foundation models are like Swiss Army knives: general-purpose tools, whereas LLMs are like specialized instruments, designed for specific purposes.

Image generated with DALL·E 3

The image above was generated using a foundation model. This wouldn’t have been possible with an LLM, up until recently when OpenAI integrated DALL·E into ChatGPT for image generation. Mind you, that is NOT to say that LLMs can now perform tasks other than text generation; it’s just that things are only getting more integrated.

This is where “Multimodal models” come into the picture. As the name suggests, multimodal models can process and understand information from multiple sources, or modalities, such as images, text, audio, and videos to make better predictions.

Think of them as highly advanced foundation models. The Microsoft paper titled “Multimodal Foundation Models: From Specialists to General-Purpose Assistants” provides a detailed survey of the taxonomy and evolution of multimodal foundation models, particularly those demonstrating vision and vision-language capabilities.

I highly recommend giving it a read if you want a thorough understanding of multimodal foundation models.

Now that we understand foundation and multimodal models, let’s break down the three pillars of LLMs: “LARGE, LANGUAGE & MODELS”.

  1. LARGE

— Scale of Data: LLMs are trained on vast amounts of text data, often encompassing diverse sources like books, articles, websites, and more. This extensive data helps the models learn various language patterns, contexts, and nuances.

— Model Size: LLMs' architecture involves many parameters, often in the billions. These parameters allow the models to capture complex relationships and patterns within the data, enabling them to generate more accurate and contextually relevant responses.
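To get a feel for where those billions of parameters come from, here’s a rough back-of-the-envelope sketch. It’s illustrative only: it counts just the big attention and feed-forward matrices of a transformer stack, ignoring embeddings, biases, and layer norms, and the GPT-3-like shape at the bottom is an assumption based on publicly reported figures.

```python
def transformer_params(num_layers: int, d_model: int, d_ff: int) -> int:
    """Rough parameter count for a decoder-only transformer stack.

    Counts only the big matrices: 4 attention projections (Q, K, V, output),
    each d_model x d_model, plus 2 feed-forward matrices of size d_model x d_ff.
    Embeddings, biases, and layer norms are ignored for simplicity.
    """
    attention = 4 * d_model * d_model
    feed_forward = 2 * d_model * d_ff
    return num_layers * (attention + feed_forward)

# A GPT-3-like shape (assumed): 96 layers, d_model = 12288, d_ff = 4 * d_model
total = transformer_params(num_layers=96, d_model=12288, d_ff=4 * 12288)
print(f"~{total / 1e9:.0f}B parameters")  # lands in the same ballpark as GPT-3's reported 175B
```

Even this stripped-down estimate lands around 174 billion parameters, which is why “LARGE” is the first pillar.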

  2. LANGUAGE

— Understanding and Generation: LLMs are designed to comprehend and generate human language. They can process and produce coherent and contextually appropriate text, mimicking human-like language abilities.

— Multilingual Capabilities: Many LLMs are trained on data from multiple languages, allowing them to understand and generate text in various languages, making them versatile tools for global applications.

  3. MODELS

— Deep Learning Architectures: LLMs are built on advanced neural network architectures, such as transformers. Such models leverage layers of interconnected nodes to process and learn from data, enabling sophisticated language understanding and generation.

— Transfer Learning: LLMs utilize transfer learning, where a pre-trained model on a large corpus is fine-tuned for specific tasks. This approach enhances the model’s performance on particular applications while benefiting from the extensive general knowledge it acquired during initial training.

Next time somebody asks you about LLMs, you can say — LLMs excel in understanding and generating human language, enabling them to perform various language-related tasks such as translation, summarization, and content generation with high accuracy.

LLMs vs Generative AI

Image generated with DALL·E 3

Generative AI is an umbrella term that includes, well — any kind of generation. It could be text, images, audio, video, etc. You know, any sort of content or art. Though I don’t fully agree with the artwork part (more on that later).

LLMs, as we know, are primarily used for human-like text generation. Both are subsets of artificial intelligence but serve different purposes and functionalities. The following table sums it up clearly.

Generated in GPT-4o

In summary, not all GenAI uses LLMs behind the scenes, nor are all LLMs used for generative tasks.

As evident from our classification table above, some applications of LLMs focus on understanding, classifying, or transforming existing text rather than generating new content. Thus, while all LLMs are a subset of Generative AI, not all uses of LLMs are generative in nature.

What are hallucinations in LLMs?

Image by IceDZerO on DeviantArt (Creative Commons License)

LLMs are powerful, but they’re not without their shortcomings.

Remember when Google introduced Bard in 2023? In a demo, Bard answered a 9-year-old’s question about the James Webb Space Telescope’s discoveries and incorrectly claimed that the telescope took the first pictures of an exoplanet. In reality, the European Southern Observatory’s Very Large Telescope achieved this milestone in 2004, as confirmed by NASA.

The thing is, sometimes LLMs behave weirdly. They may produce results that are factually incorrect, nonsensical, or irrelevant. These outputs are the so-called “hallucinations” of LLMs.

However, I don’t like the term hallucination because it’s not completely accurate when it comes to LLMs. As R. Paulo Delgado has stated in his recent article on ChatGPT —

“Hallucinations aren’t hallucinations. They’re software bugs.”

He explains that the term makes AI look more human, which it is not! You see, AI may be able to somewhat replicate human-like output, but at its core, it’s still running on algorithms and mathematical computations.

It is still generating stuff from pre-fed data. Nothing is being created here. The point is, when LLMs produce undesired output, the consensus calls it a “hallucination”. It would be more apt to call it a “glitch”, however.

That is why I mentioned earlier that I don’t think content generated by AI should be deemed “art”. It’s only a series of outputs juxtaposed and presented to look appealing.

It’s a very polished form of output, generated solely from data that is fed to the model. Be it with ChatGPT, Gemini, or any other Generative AI tool out there.

Not to sound like a purist, but when it comes to creative ventures, using AI feels like a sham.

Sure, it can be of great aid in research and support but that’s about as good as it gets. You know why?

Cuz when you deal with art, you need something more than skills — you need soul. 🩶

Best Practices to Reduce Hallucinations in LLMs

Courtesy: AdultSwim

To make the best use of LLMs, you gotta follow a few strategies to minimize hallucinations. You must also learn how to detect hallucinations in LLMs.

Detecting hallucinations in a Large Language Model’s output is doable, and you can use one of the following approaches:

  • Cross-Verification: Compare the model’s output with reliable sources to check for consistency and accuracy.
  • Fact-Checking Tools: Use automated fact-checking tools and databases to verify information.
  • Model Feedback: Implement feedback loops where users can flag incorrect information, helping to refine the model.

You can effectively identify hallucinations in LLM outputs by combining these methods.
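As a toy illustration of the cross-verification idea, here’s a naive keyword-overlap check. Everything here is made up for the example — the sample source text, the helper names, and the 0.5 threshold; real fact-checking pipelines use entailment models or retrieval against vetted databases instead of raw word overlap.

```python
def overlap_score(claim: str, reference: str) -> float:
    """Fraction of the claim's words that also appear in the reference.

    A crude proxy for support; only meant to show the shape of the idea.
    """
    claim_words = set(claim.lower().split())
    ref_words = set(reference.lower().split())
    if not claim_words:
        return 0.0
    return len(claim_words & ref_words) / len(claim_words)

def flag_if_unsupported(claim: str, trusted_sources: list[str], threshold: float = 0.5) -> bool:
    """Flag a claim as a possible hallucination when no trusted source
    overlaps with it above the threshold."""
    return all(overlap_score(claim, src) < threshold for src in trusted_sources)

sources = ["the james webb space telescope launched in december 2021"]
print(flag_if_unsupported("the telescope discovered alien life on mars", sources))  # → True (flagged)
```

An off-topic fabrication shares few words with any trusted source, so it gets flagged; a supported claim overlaps heavily and passes.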

That said, we all know that “prevention is better than cure”, don’t we? So let’s explore the strategies to mitigate the possibility of hallucinations in the first place.

Crafting Better Prompts

Generative AI is only as good as your instructions. A well-structured prompt will result in minimal hallucinations. So, stick with the following techniques:

— Chain-of-Thought Prompting: Encourages the model to break down reasoning into steps, improving accuracy in complex tasks.
— Few-Shot Prompting: Uses selected examples to guide the model, helping it produce more factually grounded responses.
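Here’s a minimal sketch of what those two prompting styles look like as plain strings. The helper names and the example Q&A pairs are my own inventions; the actual call to a model is omitted.

```python
def chain_of_thought(question: str) -> str:
    """Wrap a question so the model is nudged to reason step by step."""
    return f"{question}\nLet's think step by step, and state the final answer last."

def few_shot(examples: list[tuple[str, str]], question: str) -> str:
    """Prepend worked examples so the model imitates their format and grounding."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

prompt = few_shot(
    examples=[("Capital of France?", "Paris"), ("Capital of Japan?", "Tokyo")],
    question="Capital of Italy?",
)
print(prompt)
```

The few-shot prompt ends at “A:”, leaving the model to complete the pattern the examples establish.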

Retrieval-Augmented Generation (RAG)

— Combines information retrieval with LLMs to produce accurate outputs.
— Uses benchmarks like Retrieval-Augmented Generation Benchmark (RGB) and RAGTruth to test and reduce hallucinations.
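To make the RAG idea concrete, here’s a bare-bones sketch: retrieve the most relevant document with a naive word-overlap score, then stuff it into the prompt as grounding context. Real systems use vector embeddings and a vector store; the documents and function names here are purely illustrative.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query; return the top k.
    A stand-in for embedding similarity search in a real RAG pipeline."""
    q = set(query.lower().split())
    scored = sorted(documents, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Ground the model's answer in retrieved context to curb hallucinations."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The Very Large Telescope took the first picture of an exoplanet in 2004.",
    "The James Webb Space Telescope launched in December 2021.",
]
print(build_rag_prompt("When did the James Webb Space Telescope launch?", docs))
```

Because the model is told to answer only from retrieved text, it has far less room to invent facts.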

Few-Shot and Zero-Shot Learning

The input data and the procedure used to train an LLM strongly influence the probability of hallucinations.

— Few-Shot Learning: Provides a few examples to help the model understand the context and desired output, reducing irrelevant or incorrect information.
— Zero-Shot Learning: Allows the model to infer responses without explicit examples, preventing unsupported assumptions.

Fine-Tuning LLMs

Fine-tuning reduces hallucinations in LLMs by adjusting the model’s patterns using curated data to align with specific contexts, improving factual accuracy and coherence.

What is Fine-Tuning?

Photo by Loren Biser on Unsplash

Fine-tuning is a method in deep learning in which a pre-trained model is further trained on a specific, often smaller, dataset to adapt it to a particular task or domain.

For instance, suppose you have a pre-trained language model like GPT-3. You can fine-tune it on a dataset of legal documents to create a specialized model that understands legal terminology and can assist in drafting legal texts. This fine-tuned model will perform better in the legal domain compared to the general pre-trained model.

The core idea behind fine-tuning is that refining an existing pre-trained model, one that already has an extensive pool of knowledge, is more efficient and cost-effective than creating a new model from the ground up for a specific task.

If you want to learn how to fine-tune a model, head over to: https://huggingface.co/docs/transformers/en/training
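To illustrate the workflow conceptually, here’s a toy sketch where a crude word-frequency “model” is pre-trained on general text and then adapted to a legal corpus. This is not real fine-tuning, which updates neural network weights via libraries like the Hugging Face Transformers docs linked above; it only mirrors the pre-train-then-specialize pattern, and every sentence in the corpora is made up.

```python
from collections import Counter

class ToyLanguageModel:
    """A unigram frequency 'model' used only to illustrate the workflow:
    pre-train broadly, then continue training on a narrow domain corpus."""

    def __init__(self):
        self.counts = Counter()

    def train(self, corpus: list[str], weight: int = 1):
        """Accumulate (optionally upweighted) word counts from a corpus."""
        for sentence in corpus:
            for word in sentence.lower().split():
                self.counts[word] += weight

    def most_likely(self, n: int = 3) -> list[str]:
        """The n words the 'model' considers most probable."""
        return [w for w, _ in self.counts.most_common(n)]

# "Pre-training" on broad general text
model = ToyLanguageModel()
model.train(["the cat sat on the mat", "the dog ran in the park"])

# "Fine-tuning": continue training on a small legal corpus, upweighted
model.train(["the plaintiff filed the motion", "the court granted the motion"], weight=5)

print(model.most_likely())  # domain terms like "motion" now rank near the top
```

The specialization comes from continuing training on narrow data rather than starting from scratch, which is exactly the economy fine-tuning exploits.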

What is RLHF?

RLHF can be considered a specialized form of fine-tuning, in that it also involves refining a pre-trained model. However, the techniques and objectives differ.

Reinforcement Learning from Human Feedback (RLHF) is a technique used to train AI models using human feedback in the learning process. This approach combines traditional reinforcement learning with human evaluations to guide the model toward desired behaviors and improve its performance on specific tasks.

Suppose you have a language model that generates responses in a customer service chatbot. Using RLHF, you can have human reviewers evaluate the chatbot’s responses based on criteria like helpfulness, politeness, and accuracy. The feedback from these reviewers is then used to adjust the model’s behavior, improving its ability to handle customer queries effectively.
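Here’s a toy stand-in for that RLHF loop: human ratings accumulate per candidate response, and the “policy” learns to prefer the best-rated one. Real RLHF trains a separate reward model and optimizes the LLM’s weights (often with PPO); everything here, including the canned responses, is invented for illustration.

```python
class FeedbackTunedResponder:
    """Toy stand-in for RLHF: human ratings shift which candidate response
    the 'policy' prefers. Real RLHF trains a reward model from such ratings
    and uses it to update the LLM itself."""

    def __init__(self, candidates: list[str]):
        self.scores = {c: [] for c in candidates}

    def rate(self, response: str, score: float):
        """Record a human rating (e.g. 0 = unhelpful, 1 = helpful)."""
        self.scores[response].append(score)

    def best_response(self) -> str:
        """Prefer the response with the highest average human rating."""
        def avg(r):
            s = self.scores[r]
            return sum(s) / len(s) if s else 0.0
        return max(self.scores, key=avg)

bot = FeedbackTunedResponder([
    "Figure it out yourself.",
    "Happy to help! Could you share your order number?",
])
bot.rate("Figure it out yourself.", 0.0)
bot.rate("Happy to help! Could you share your order number?", 1.0)
print(bot.best_response())
```

After a handful of ratings, the polite, helpful response wins out, which is the behavioral shift RLHF aims for at scale.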

How Can Understanding LLMs Help You?

I stated at the beginning that learning the intricacies of LLMs can help you regardless of your profession. Let’s see how.

1. Enhancing Professional Skills

  • Engineers and Developers:

— Automation: LLMs can automate repetitive tasks such as code generation, debugging, and documentation.
— Innovation: Understanding LLMs enables the creation of more sophisticated AI applications, from chatbots to personalized recommendation systems.
— Contribution: Contributing to the developments being made in GenAI requires a firm knowledge of LLMs.

  • Writers and Content Creators:

— Content Generation: LLMs can assist in generating quick content, including articles, blogs, and social media posts, saving time and effort. (Though I believe they’re a much better research tool than a content generator.)
— Editing and Proofreading: LLMs can enhance writing by providing grammar checks and stylistic suggestions, improving overall content quality (if you’re smart enough to skim the good from the bad).

  • Marketers:

— Personalization: LLMs can analyze customer data to create personalized marketing campaigns, improving engagement and conversion rates.
— SEO Optimization: Understanding LLMs can help you in optimizing content for search engines, enhancing online visibility and traffic.

2. Improving Decision-Making

  • Business Leaders:

— Data Analysis: LLMs can process vast amounts of data to provide insights and predictions, aiding in strategic decision-making.
— Customer Insights: Analyzing customer feedback and sentiment using LLMs helps in understanding market trends and customer needs.

  • Educators:

— Curriculum Development: LLMs can assist in creating educational materials tailored to different learning styles and needs.
— Student Support: Chatbots powered by LLMs can provide round-the-clock assistance to students, answering queries and providing resources.

Whether you’re an engineer automating tasks, a writer enhancing your content, a marketer personalizing campaigns, or a business leader making strategic decisions, knowledge of LLMs can significantly boost your effectiveness and productivity.

What is the Future of LLMs and Generative AI?

Photo by Giu Vicente on Unsplash

There have been multiple occasions in the past when scientists and engineers mispredicted the advent of AI. And then again, some predictions (which felt right out of science fiction at the time) became reality.

In the 1950s, the pioneers of artificial intelligence were highly optimistic, believing that AI with human-like capabilities was just around the corner, possibly within months. But we all know it took quite a while to be where we are right now, and we have a long way to go.

Another instance is from the 80s when Japan launched the Fifth Generation Computer Systems project to create machines capable of human-like reasoning by the 1990s. Despite significant investment and effort, the project failed to achieve its ambitious objectives.

Let’s now take an example where a prediction came true —

In the early 2000s, AI researchers like Geoffrey Hinton, Yann LeCun, and Yoshua Bengio predicted that deep learning, particularly neural networks with many layers, would lead to significant breakthroughs in AI. They believed that these networks could surpass previous AI methods in tasks such as image and speech recognition with enough computational power and data.

If you look around, we’re right in the middle of that prediction coming true. So much so that it’d be hard to imagine going without the ease of AI integrated into our day-to-day technology. To name a few such instances:

  • Image Recognition: Systems like Google’s DeepMind and Facebook’s facial recognition software.
  • Speech Recognition: Technologies like Apple’s Siri, Amazon’s Alexa, and Google Assistant.
  • Natural Language Processing: Models such as Google’s BERT and OpenAI’s GPT series.

Long story short, predicting the future of Generative AI is not that simple. However, looking at the trends and proliferation rate of it all, it’s obvious that AI is going to be integrated into our lives more than ever. The real question is — how are we going to adapt to that world?

Final Thoughts

The world of AI is vast and ever-expanding, hence it’s impractical to cover everything in one article. However, for starters, this post should suffice. There’s much to learn and I recommend keeping an eye on new releases and announcements.

That said, here’s a list of a few brilliant resources for you to keep up with the advancements in the world of GenAI and LLMs:

There is a plethora of other sources out there. These are just a few I like and use for my research. You can check these out and comment about your preferred sources for learning about Generative AI or even AI in general.

As we progress forward, it’s essential to continue learning and exploring the latest developments in AI. Engaging with resources, participating in discussions, and applying this knowledge will empower you to make the most of the opportunities presented by LLMs and Generative AI.

If you liked this post or found it insightful, please take a minute to press the clap button; it increases the post’s visibility for other Medium users. Thank you :)

My social links: LinkedIn| Twitter | Instagram



A storyteller trying to find stories in people, places, & experiences worth sharing. I write on a whim about Tech, Books, Films, Self-Improvement, & Poetry. 🌻