LLM: Introduction and Guide to Developing Applications

Akshay Raut
8 min read · Jun 10, 2024


This guide aims to help you understand the core technologies essential for building applications powered by Large Language Models (LLMs). It is suitable for both developers and machine learning specialists who have a basic understanding of the concepts. Ready to dive in? Let’s unleash the power of LLMs together!

Grab your popcorn, sit comfortably and let’s go 🚀…

Introduction to Large Language Models:

I think you’ve already heard a thousand times what an LLM is, so please listen to me one more time 😂 …

Large language models (LLMs), like those used in OpenAI’s ChatGPT, are deep neural network models that have been developed over the past few years.

Large Language Models (LLMs) are really good at working with human language — they can create, understand, and make sense of text in a way that seems smart and fits the context.

A Large Language Model (LLM) is a HUGE neural network that predicts the next token based on the tokens that came before it.

When we talk about these models understanding language, we really mean they are good at handling text. They don’t actually think or feel like humans. 🙂
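To make the “next token” idea concrete, here is a minimal sketch using the HuggingFace transformers library with GPT-2 (my choice of model for illustration, not something specific to this article) to peek at the single most likely next token:

```python
# pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode a prompt and run one forward pass.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The logits at the last position score every token in the vocabulary;
# the argmax is the model's single best guess for the next token.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))  # typically " Paris"
```

In a real chat model this step runs in a loop: each predicted token is appended to the input, and the model predicts again.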

Differences Between Models

GPT-2 and GPT-3 are much larger and more complex than the original GPT: roughly 117 million parameters for GPT-1, 1.5 billion for GPT-2, and 175 billion for GPT-3. Just look how big GPT-3 is 😱.

GPT-3 can create more complicated and interesting texts than the first GPT, but it also takes more computing power and time to run. 🥱

Open Source vs. Closed Source Models:


Open Source Models:

  • LLaMA-2 (Meta): A versatile model offering robust language understanding.
  • Falcon (TII): Fast and efficient, tailored for innovation.
  • Mistral (Mistral AI): Highly adaptable, perfect for tuning to specific applications.

Closed Source Models:

  • GPT-4 (OpenAI): Advanced capabilities in language comprehension and generation.
  • Bard (Google): Integrates deep learning for engaging and informative interactions.
  • Claude (Anthropic): Focuses on safety and ethical AI with user-friendly features.

For more details on these and other models, visit the Prompting Guide: https://www.promptingguide.ai/models/collection

Prompt Engineering:

What is a Prompt?

“If LLMs were genies in a bottle, prompts would be your wishes. Be careful what you wish for (write carefully), because the LLM tries its best to fulfill it!” 😱

Well, prompts are basically the text input to an LLM. Anything you write to an LLM is a prompt.
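In code, a prompt is just a string you send to the model. Here is a minimal sketch using OpenAI’s Python client; the model name and question are placeholder assumptions, so swap in your own:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The "content" string below is the prompt: just text, nothing more magical.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice; any chat model works
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(response.choices[0].message.content)
```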

Some people might not take prompt engineering seriously, calling it just a trend. But the truth is, we still don’t fully understand how Large Language Models (LLMs) work. Why do they sometimes give good answers and sometimes hallucinate?

While Large Language Models have demonstrated impressive capabilities in understanding and generating human language, integrating aspects of human emotional intelligence could take their performance to new heights. 🙄

Wait, wait, let me explain…🙋🏼‍♂️

Source: Li et al., “Large Language Models Understand and Can Be Enhanced by Emotional Stimuli” (https://arxiv.org/pdf/2307.11760.pdf)

Comparison of Original Prompt vs. EmotionPrompt:

In the paper’s experiments, EmotionPrompt significantly enhanced LLMs’ performance, truthfulness, and responsibility in generative tasks, with an average improvement of 10.9% across those metrics in the human study.


Below is a table showing an emotional stimulus (EmotionPrompt) applied to an example task. For each prompt, the original task is:

Determine whether a movie review is positive or negative.

| Original Prompt | EmotionPrompt |
| --- | --- |
| Determine whether a movie review is positive or negative. | Determine whether a movie review is positive or negative. This is very important to my career. |
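A tiny sketch of how you might apply this in code. The stimulus string is EP02 from the EmotionPrompt paper; the helper function name is just for illustration:

```python
# Emotional stimulus EP02 from the EmotionPrompt paper (arXiv:2307.11760).
EMOTION_STIMULUS = "This is very important to my career."

def emotion_prompt(task: str) -> str:
    """Append an emotional stimulus to an ordinary task prompt."""
    return f"{task} {EMOTION_STIMULUS}"

original = "Determine whether a movie review is positive or negative."
print(emotion_prompt(original))
# -> Determine whether a movie review is positive or negative. This is very important to my career.
```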

Retrieval Augmented Generation (RAG):

Adding your own data to an LLM using RAG

This section is about how to add your own data to a pre-trained LLM (Large Language Model) using a prompt-based approach called RAG (Retrieval-Augmented Generation).

What is RAG?

RAG, or Retrieval-Augmented Generation, is a technique used in large language models (LLMs) to improve the quality and accuracy of generated responses by integrating external knowledge retrieval mechanisms.

How RAG Works:

Using RAG (Retrieval-Augmented Generation) in an AI application involves these steps:

  1. The user asks a question.

Let’s say you ask, “How do I fix my bike’s flat tire?”

  2. The system looks for documents that might have the answer.

The system searches through a bunch of documents, like manuals or help guides, to find ones that might explain how to fix a bike’s flat tire.

  3. The system makes a prompt for the LLM (Large Language Model).

The system takes your question, the documents it found, and some instructions, and puts them together in a way the LLM can understand. It’s like giving the LLM a cheat sheet with all the important info.

  4. The system sends this prompt to the LLM.

The system hands over this cheat sheet to the LLM and says, “Use this info to answer the question.”

  5. The LLM gives an answer based on the information it got.

The LLM reads the cheat sheet, looks through the documents, and then gives you an answer like, “To fix your bike’s flat tire, first remove the wheel, then take out the inner tube, patch the hole, and put everything back together.”

Example:

You: “How do I fix my bike’s flat tire?”

System: Searches for bike repair guides and finds a few relevant ones.

System: Creates a prompt for the LLM: “User asked how to fix a bike’s flat tire. Here are some documents with instructions. Please help.”

LLM: “To fix your bike’s flat tire, first remove the wheel, then take out the inner tube, patch the hole, and put everything back together.”

This way, RAG helps the AI give you a useful answer by using both the documents it found and its own smarts.
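Here is a minimal, dependency-free sketch of that loop. The keyword-overlap retriever and the `call_llm` stub are illustrative stand-ins; a real system would use embeddings and a vector store:

```python
# A toy RAG pipeline: retrieve, build a prompt, ask the LLM.
# `call_llm` is a placeholder; wire it to whatever model client you use.

DOCUMENTS = [
    "To fix a flat bike tire: remove the wheel, take out the inner tube, "
    "patch the hole, and reassemble.",
    "To adjust bike brakes, tighten the cable at the caliper.",
]

def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Score documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Pack the question plus retrieved docs into one 'cheat sheet' prompt."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

question = "How do I fix my bike's flat tire?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)  # this is exactly what gets sent to the LLM
```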

When to Use RAG (Retrieval-Augmented Generation): ✅✅

  1. Need Up-to-Date Information:
  • Use RAG when your application needs to provide the latest information. For example, if you’re dealing with news articles, RAG helps by retrieving the most recent and relevant news to answer questions accurately.

  2. Specialized Knowledge Area:
  • Use RAG when specific, detailed knowledge is needed that goes beyond the general information an LLM has been trained on. For instance, if you need to access internal company documentation to answer queries about company policies or procedures, RAG can fetch the precise documents needed.

When NOT to Use RAG: ❌❌

  1. General Conversational Applications:
  • Don’t use RAG if your application handles general conversations that don’t require specific or additional information. For example, casual chats or basic Q&A where the LLM’s existing knowledge is sufficient don’t need RAG.

  2. Limited Resources:
  • Avoid using RAG if you have limited computational resources. The search component of RAG involves working with large knowledge bases, which can be costly and slow. Although it’s faster and cheaper than fine-tuning, it still requires significant resources.

What is Fine-Tuning an LLM?

Fine-tuning means taking a pre-trained language model and giving it extra training on a specific dataset.

  1. Start with a Pre-trained Model:
  • Imagine you have a super-smart AI that has read millions of books, articles, and websites. It knows a lot about everything.

  2. Choose a Specific Dataset:
  • You pick a bunch of documents or data related to a particular subject. For example, if you want the AI to be great at answering medical questions, you gather medical books and articles.

  3. Train the Model on This Data:
  • You teach the AI using this specific data so it gets really good at understanding and talking about that topic. This is like giving it extra lessons focused on the subject you care about.

  4. Better Performance in Specific Areas:
  • After this extra training, the AI is much better at answering questions or doing tasks related to that specific subject. It still remembers everything it learned before but is now extra knowledgeable about the new topic.

Now you might have one question:

Why Not Just Use RAG? 🙆🏻‍♂️🙆🏻‍♂️

RAG (Retrieval-Augmented Generation) is great for pulling in up-to-date or specific information when needed. But, if you want the model to deeply understand a specific domain or to write in a certain style consistently, fine-tuning is the way to go. 🚀🚀

Example:

Imagine you want the AI to write emails just like you do.

You could fine-tune the LLM on a collection of your personal emails.

After this extra training, the AI can write responses that sound just like you, using your tone and style. 😎

How Fine-Tuning Works:

Let’s see how it works — it’s not as complicated as it sounds:

Basic Approach to Fine-Tuning on Domain-Specific Data:

  • Start with a Basic LLM: You can download a pre-trained model from HuggingFace.
  • Prepare Training Data: Collect and organize instructions and answers relevant to your domain.
  • Select a Fine-Tuning Method: Popular methods include LoRA (Low-Rank Adaptation) and QLoRA.
  • Retrain the Model: Use your prepared data and chosen method to train the model on the new data (see the sketch after this list).
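Here is a minimal sketch of the LoRA route using HuggingFace transformers and peft. The base model name, target modules, and hyperparameters are illustrative assumptions, not a recipe from this article:

```python
# pip install transformers peft
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed base model; pick any causal LM
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all the weights.
config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, train with the HF Trainer (or your own loop) on your instruction data.
```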

When to Use Fine-Tuning:

  1. Specialized Applications:
  • Use fine-tuning for applications that require deep knowledge of specific topics, like legal document processing that needs to understand professional vocabulary.

When NOT to Use Fine-Tuning:

  1. General Applications:
  • Avoid fine-tuning for broad applications that don’t require specialized knowledge. The general capabilities of the LLM should be sufficient.

Dive in and have fun exploring what LLMs can do!
