What is NLP (Natural Language Processing)? What is the difference between NLP and LLMs (Large Language Models)?

Understanding NLP Models, LLMs, AI Architectures (RNN, LSTM, Transformers), and reviewing ChatGPT vs. Gemini vs. Claude vs. Llama 2 vs. Grok 1

Prashant Ram
Blockchain Bistro
12 min read · Mar 16, 2024


This article is meant to be a primer for understanding NLP, LLMs, and the different AI models as of March 2024.

In this post we will examine,

  • The differences between NLP (Natural Language Processing) and LLM (Large Language Models).
  • Briefly review examples of NLP (Natural Language Processing) AI Models.
    e.g. ChatGPT, Gemini, Llama 2, Claude, Grok 1
  • Briefly review examples of non-NLP (Natural Language Processing) AI Models.
    e.g. Image/Object Recognition Systems, Self-Driving Car Autopilots, Generative Adversarial Networks (GANs) for Image Generation, AlphaGo Zero (Go Playing), Anomaly Detection Systems (Finance), etc.
  • Understand broadly the different categories of NLP Models i.e.
    Statistical NLP Models, Rule-Based NLP Models, and Neural Network-Based NLP Models.
  • Get a high level understanding of the different Neural Network-Based Architectures, including
    RNN (Recurrent Neural Networks), LSTM (Long Short Term Memory Networks), and Transformers,
    and talk about their relation to LLMs
    e.g. GPT (Generative Pre-trained Transformer).
  • List some noteworthy neural network models used for NLP (Natural Language Processing) beyond the RNN, LSTM, and Transformer architectures used in LLMs.
    e.g. Convolutional Neural Networks (CNNs), Variational Autoencoders (VAEs), and Graph Neural Networks (GNNs).
  • And we will close off the post with an Addendum briefly describing HuggingFace and VertexAI.

We have a lot to cover so let’s dive right in!!

Difference between NLP (Natural Language Processing) and LLM (Large Language Models)

  • NLP: refers to the entire field of computer science concerned with interactions between computers and human language. It includes various techniques and approaches for enabling computers to understand, manipulate, and generate human languages. LLMs are a specific type of NLP model.
  • LLM (Large Language Model): This is a type of NLP model that’s been trained on massive amounts of text data. This allows them to handle a broad range of NLP tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way.

NLP models may be multimodal (meaning they can accept inputs and generate outputs in modes other than text), while still using natural language text as the underlying interface.

Well-known examples include DALL-E and Sora, where the user provides input in natural language and the AI generates non-text output, e.g. an image or a video.

Examples of NLP (Natural Language Processing) AI Models

Popular and well known examples of NLP AI models include

  • ChatGPT (OpenAI) — https://chat.openai.com/
    Arguably the most well known NLP LLM AI Model by OpenAI.
  • Gemini (Google AI) formerly Bard — https://gemini.google.com/
    The LLM developed by Google AI that superseded Bard and PaLM 2 in February 2024.
  • Llama 2 (Meta) — https://www.llama2.ai/
    LLM developed by Meta, and one of the first LLMs openly released by a major tech company.
  • Claude 3 (Anthropic AI) — https://claude.ai/
    Claude 3 is a family of large language models (LLMs) developed by Anthropic AI, focusing on advanced reasoning and factual language processing.
    Claude 3 comes in three variants: Haiku, Sonnet, and Opus.
    — Haiku: This is the fastest and most affordable model, ideal for tasks requiring quick responses and basic reasoning.
    — Sonnet: Offering a balance between performance and cost, Sonnet is suitable for various NLP tasks like data processing and sales optimization.
    — Opus: The most powerful model, Opus boasts near-human reasoning abilities and excels in complex tasks.
  • Grok 1 (xAI) — https://x.ai/
    LLM developed by xAI (the AI company founded by Elon Musk), built on the Grok-1 model.

Comparison: ChatGPT vs Gemini vs Claude 3 vs Llama 2 vs Grok

Other honorable mentions include,

  • Apple Ferret 7B and 13B
  • Baidu’s ERNIE
  • Amazon Olympus (rumored)

Examples of non-NLP (Natural Language Processing) AI Models

AI is a broad term and can be used in a variety of applications that are not related to Natural Language Processing.

Here are 5 examples of AI models that do not use NLP (Natural Language Processing)

  1. Image/Object Recognition Systems: These models are trained on vast datasets of labeled images to recognize objects, faces, or scenes within an image. They function by identifying patterns and relationships between pixels, not by understanding the text descriptions of the images.
  2. Self-Driving Car Autopilots: These systems utilize computer vision, sensor fusion (combining data from cameras, radar, LiDAR), and path planning algorithms to navigate roads. They don’t require understanding language for operation.
  3. Generative Adversarial Networks (GANs) for Image Generation: This type of AI model involves two neural networks competing with each other. One network (generator) creates new images, while the other (discriminator) tries to distinguish real images from the generated ones. This process leads to the generation of increasingly realistic and creative images without any language input.
  4. AlphaGo Zero (Go Playing): This is a deep reinforcement learning model developed by DeepMind. It achieved superhuman performance in the complex game of Go without any human knowledge or pre-programmed moves. It learned solely through playing against itself and millions of simulated games.
  5. Anomaly Detection Systems (Finance): These AI models analyze financial data to identify unusual patterns that might indicate fraud or market fluctuations. They rely on statistical methods and pattern recognition techniques, not language processing.

Types of NLP Models

At a very high level NLP Models can be broadly categorized into the following three types.

  1. Statistical Models
  2. Rule Based Models
  3. Neural Network Based Models (including LLMs) — Learning networks

1. Statistical Language Models

These rely on statistical analysis of large text corpora to predict the next word in a sequence.

Examples: N-gram models (predict the next word based on the preceding n−1 words); a minimal code sketch appears after the list below.

  • Simpler Approach: These models rely on statistical analysis of text data. They examine how often words or sequences of words appear together to predict the likelihood of the next word in a sequence.
  • Limited Tasks: They are typically suited for simpler tasks like predicting the next word in a sentence or language generation with limited complexity.
  • Smaller Datasets: Statistical language models can be effective with smaller datasets compared to LLMs.
  • Faster Training: Due to their simpler approach and smaller datasets, they generally require less training time and computational resources.
  • Less Adaptable: They may struggle with adapting to unseen data or complex language constructs.
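
To make the n-gram idea concrete, here is a minimal bigram (2-gram) sketch in Python. The toy corpus and the resulting probabilities are purely illustrative; a real statistical model would be estimated from a large text corpus.

```python
from collections import defaultdict, Counter

# Toy corpus; a real statistical language model would use a large text corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: for each word, how often each following word appears.
bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigram_counts[word]
    if not counts:
        return None, 0.0
    next_word, count = counts.most_common(1)[0]
    return next_word, count / sum(counts.values())

print(predict_next("sat"))  # ('on', 1.0): "sat" is always followed by "on" here
print(predict_next("the"))  # ('cat', 0.25): "the" is followed by cat/mat/dog/rug equally often
```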

2. Rule-Based NLP Models

These models rely on a set of predefined rules and linguistic knowledge to process language. These rules are often hand-crafted by linguists.

Strengths:

  • Accuracy for Specific Tasks: For well-defined tasks with clear rules, they can be highly accurate.
  • Explainability: The reasoning behind the model’s decisions is often easier to understand due to the explicit rules.

Weaknesses:

  • Limited Adaptability: They struggle with unseen data or language variations outside the predefined rules.
  • Scalability Challenges: Creating and maintaining complex rule sets can be time-consuming and not easily scalable for large amounts of data.

Rule-Based NLP Models typically do not use neural networks and require a different kind of training compared to statistical and neural network-based models.

  • No Neural Networks: Rule-based models rely on a pre-defined set of rules and linguistic knowledge created by human experts (like linguists). These rules govern how the model should process and analyze language. There are no neural networks involved in learning patterns from data.
  • Different Training: Training for a rule-based model involves defining, refining, and testing the set of rules. This can be an iterative process where linguists may provide additional rules or adjust existing ones based on the model’s performance on test data.
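
To make this concrete, here is a small Python sketch of the rule-based approach: the "model" is just a list of hand-written pattern rules, and "training" means adding, testing, and refining such rules by hand. The patterns and labels below are made-up examples, not taken from any real system.

```python
import re

# Hand-crafted rules, the way a linguist or domain expert might encode them:
# each rule is a pattern plus the label it assigns. No learning from data occurs;
# "training" means adding, testing, and refining rules like these by hand.
rules = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "DATE"),
    (re.compile(r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\b"), "ORGANIZATION"),
    (re.compile(r"\$\d+(?:\.\d{2})?\b"), "MONEY"),
]

def tag(text):
    """Apply every rule to the text and return (matched_span, label) pairs."""
    matches = []
    for pattern, label in rules:
        for match in pattern.finditer(text):
            matches.append((match.group(), label))
    return matches

print(tag("Acme Corp was paid $500 on 12/03/2024."))
# [('12/03/2024', 'DATE'), ('Acme Corp', 'ORGANIZATION'), ('$500', 'MONEY')]
```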

3. Neural Network Based Models (including LLMs) — Learning networks

These AI models use artificial neural networks as their underlying architecture. These NLP models are “learning” models in that they learn complex relationships through neural network architectures, and consequently can adapt better to unseen data and handle more nuanced aspects of language.

These neural network architectures include RNNs (Recurrent Neural Networks), LSTMs (Long Short-Term Memory networks), and Transformers, and they form the foundation of LLMs (Large Language Models).

LLMs (Large Language Models) leverage these powerful neural network architectures to learn complex patterns and relationships within vast amounts of text data.

They require enormous datasets for training, often containing terabytes of text and code. Training these models can take days or even weeks on powerful computer systems. However, due to their complex learning and vast training data, LLMs can generalize better to unseen data and handle more nuanced aspects of language.

Consequently, LLMs can handle a wide range of NLP tasks (a short usage sketch follows the list below), including

  • Text generation (going beyond simple word prediction)
  • Machine translation
  • Question answering
  • Summarization
  • And even creative writing to some extent
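
As a rough illustration of how such a model is used in practice, here is a minimal text-generation sketch using the Hugging Face transformers library (this assumes transformers and a backend such as PyTorch are installed). gpt2 is used here only because it is a small, freely available example model, not because it is the state of the art.

```python
from transformers import pipeline

# Load a small pre-trained language model and generate a continuation for a prompt.
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models can", max_new_tokens=30)
print(result[0]["generated_text"])
```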

Summary

Comparison of Statistical, Rule-Based, and Neural Network-Based NLP models

As an analogy:

  • Statistical Language Models: Imagine a small recipe book with common recipe sequences. It can predict the next ingredient based on what came before, but it can’t handle complex or unseen dishes.
  • Rule-Based NLP Models: Think of a cookbook with very specific instructions. It can guide you perfectly through a known recipe but can’t handle improvisations or unknown dishes.
  • Neural Network Models — LLMs (Large Language Models): Think of them as a giant library containing cookbooks from various cultures and cuisines. They can not only predict ingredients but also understand the nuances of different culinary styles and potentially even generate new recipes.

Delving further into Neural Network Models (or Learning Models): Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformers

Recurrent Neural Networks (RNNs):

  • Core Idea: RNNs are designed to address the limitation of standard neural networks in handling sequential data.
    Standard models treat each data point independently, whereas RNNs have an internal memory that allows them to consider past information when processing the current input.
  • Structure:
    They have a loop-like structure where information is processed not just from the current input but also from the hidden state, which captures the network’s memory of past inputs (a minimal sketch follows this list).
  • Strengths:
    Can handle sequential data like text, speech, or time series data where the order of elements matters. Can learn long-term dependencies to some extent within a limited range.
  • Weaknesses:
    Vanishing Gradient Problem:
    For very long sequences, the influence of earlier inputs can fade away (vanish) as the network processes later elements, making it difficult to learn long-term dependencies.
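
To make the recurrence concrete, here is a minimal vanilla RNN cell written out with NumPy. The dimensions and random weights are arbitrary illustrative choices; a real model would learn the weights from data.

```python
import numpy as np

# A single vanilla RNN cell, written out by hand. Dimensions are illustrative.
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "loop")
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """New hidden state depends on the current input AND the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a sequence of 5 time steps; the hidden state carries memory forward.
h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))
for x_t in sequence:
    h = rnn_step(x_t, h)
print(h)  # final hidden state summarizing the whole sequence
```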

Long Short-Term Memory (LSTM) networks:

  • Addressing RNN Limitations: LSTMs are a specific type of RNN architecture designed to address the vanishing gradient problem.
  • Internal Structure: LSTMs incorporate special gating mechanisms that control the flow of information within the network. These gates decide what information to remember from the past, what new information to include, and what to forget (see the sketch after this list).
  • Strengths:
    Can learn long-term dependencies more effectively than standard RNNs, making them suitable for tasks involving longer sequences. Widely used in various NLP tasks like machine translation, sentiment analysis, and text summarization.
  • Weaknesses:
    Can be more complex to train compared to simpler RNNs.
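
The gating idea can be sketched in a few lines of NumPy. This is a simplified single LSTM step with illustrative dimensions and with biases omitted, not a production implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 3, 4
rng = np.random.default_rng(1)
# One weight matrix per gate, acting on [h_prev, x_t] concatenated. Biases omitted.
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
                      for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z)          # forget gate: what to drop from the cell state
    i = sigmoid(W_i @ z)          # input gate: what new information to write
    o = sigmoid(W_o @ z)          # output gate: what to expose as the hidden state
    c_tilde = np.tanh(W_c @ z)    # candidate new content
    c = f * c_prev + i * c_tilde  # updated long-term cell state
    h = o * np.tanh(c)            # updated hidden state
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
print(h, c)
```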

In summary:

  • RNNs are a foundational concept for handling sequential data in neural networks.
  • LSTMs are a powerful variant of RNNs specifically designed to overcome limitations in learning long-term dependencies.
  • Both RNNs and LSTMs are widely used in various NLP tasks, with LSTMs preferred for situations where long-range dependencies are crucial.

Are RNNs, LSTMs different from LLM models?

  • RNNs & LSTMs: These are specific types of neural network architectures used in NLP tasks. They excel at handling sequential data like text, where the order of words matters. They can come in various complexities depending on the task.
    RNNs and LSTMs are building blocks or tools within the realm of neural network-based NLP models.
  • LLMs (Large Language Models): These are a special kind of neural network model characterized by their massive scale. They are trained on enormous amounts of text data, allowing them to handle a broad range of NLP tasks beyond sequential data.
    LLMs are a specific type of neural network model that leverages these building blocks (like LSTMs) at an enormous scale to achieve broader capabilities.

LLMs are a type of neural network model, and RNNs, LSTMs, and Transformers are specific architectures used within such models. Earlier large language models often relied on RNNs and LSTMs because of their ability to handle sequential text, but modern LLMs (such as the GPT family) are built primarily on the Transformer architecture, whose attention mechanism handles the long-range dependencies present in large amounts of text far more effectively. This ability to grasp complex relationships between words and concepts is critical for their broad range of capabilities.

Modern LLM architectures also combine the core Transformer blocks with additional techniques, such as attention variants, positional encodings, and other refinements, to enhance the model’s capabilities.

Transformers

Transformers are a relatively recent neural network architecture, introduced in 2017, designed specifically for sequence modeling tasks.

Their core strength lies in their ability to efficiently model relationships between elements in a sequence, even when those elements are far apart. Unlike traditional RNNs (Recurrent Neural Networks), which process information sequentially, transformers analyze all elements of a sequence simultaneously.

Both properties come from the “attention” mechanism, which lets the model weigh how relevant each part of the sequence is to every other part.

Transformers have become a foundational architecture for many NLP tasks due to their efficiency and ability to handle long-range dependencies.

  • Strengths:
    Can model long-range dependencies more effectively than LSTMs in some cases. Efficiently handle parallel processing, making them suitable for large datasets. Widely used in various NLP tasks, often as a core component of LLM architectures.
  • Working Principle: Unlike LSTMs that process data sequentially, transformers analyze all elements of a sequence simultaneously using attention mechanisms, allowing them to identify relationships between any two parts of the sequence (a minimal sketch of the attention computation follows this list).
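
Here is a minimal NumPy sketch of the scaled dot-product attention computation at the heart of the transformer, applied as self-attention to a toy sequence of random vectors.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Every position attends to every other position in one matrix multiply."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity between all pairs of positions
    weights = softmax(scores, axis=-1)  # attention weights, one row per query position
    return weights @ V                  # weighted sum of values

# A toy "sequence" of 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(42)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)  # (4, 8): one updated vector per token, computed in parallel
```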

NOTE:
The “GPT” in ChatGPT stands for “Generative Pre-trained Transformer”, meaning that it uses the Transformer neural network architecture in its core AI model.


Link between Transformers and LLMs?

LLMs are a specific type of neural network model characterized by their massive scale and are trained on enormous amounts of text data.

Many LLM architectures leverage transformers as a core component. This combination allows LLMs to benefit from the strengths of transformers, such as efficient processing and handling long-range dependencies in large amounts of text data.

Not all LLMs necessarily use transformers exclusively. Some may incorporate transformers along with other architectures like LSTMs or RNNs.

In summary,

  • Transformers: A Neural Network Architecture
  • LLMs (Large Language Models): A Specific Type of Neural Network Model

Analogy:

  • Imagine transformers as a powerful new engine design for cars. This engine design (transformer) can be used in various car models (like LLMs).
  • Different car models (LLMs) may choose to use this engine (transformer) alongside other features (like additional engines or specific body styles) to achieve their desired functionality.

In essence:

  • Transformers are a foundational building block (engine design) used in many NLP models, including LLMs.
  • LLMs are a specific type of NLP model (car model) that often leverage transformers (powerful engine) for their capabilities.

Noteworthy neural network models used in NLP (Natural Language Processing) beyond LLMs (Large Language Models), RNNs (Recurrent Neural Networks), LSTMs (Long Short-Term Memory networks), and Transformers

1. Convolutional Neural Networks (CNNs):

  • Focus: While traditionally used in image recognition, CNNs are finding applications in NLP tasks that involve analyzing sequential data with a specific structure.
  • Strengths:
    Can effectively capture local patterns within sequences. Well-suited for tasks like text classification (sentiment analysis) or named entity recognition (identifying locations or people in text).
  • Working Principle: CNNs use filters (like kernels) that slide along the sequence, identifying and extracting local features.
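
Here is a minimal NumPy sketch of the sliding-filter idea applied to made-up token embeddings: a single filter spanning three consecutive words produces one local-pattern score per window.

```python
import numpy as np

# Token embeddings for a 6-word sentence, embedding dimension 5 (random for illustration).
rng = np.random.default_rng(3)
embeddings = rng.normal(size=(6, 5))

# One convolutional filter spanning 3 consecutive words (a "trigram detector").
kernel_width = 3
kernel = rng.normal(size=(kernel_width, 5))

# Slide the filter along the sentence, producing one feature value per position.
features = np.array([
    np.sum(embeddings[i:i + kernel_width] * kernel)
    for i in range(len(embeddings) - kernel_width + 1)
])
print(features.shape)  # (4,): one local-pattern score per window of 3 words
```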

2. Variational Autoencoders (VAEs):

  • Focus: These models are used for tasks involving data generation and representation learning in NLP.
  • Working Principle: VAEs consist of an encoder and decoder. The encoder compresses the input data into a latent representation, capturing its key features. The decoder then attempts to reconstruct the original data from this latent representation. This process helps the model learn meaningful representations of the data, which can be useful for tasks like text summarization or anomaly detection.
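
Below is a skeletal PyTorch sketch of that encoder / latent / decoder structure. The class name, layer sizes, and the fake document embedding are illustrative assumptions, and the training loop (with its reconstruction and KL losses) is omitted.

```python
import torch
from torch import nn

class TextVAE(nn.Module):
    """Skeleton of a VAE: encoder compresses input to a latent vector, decoder reconstructs."""
    def __init__(self, input_dim=300, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(input_dim, 64)
        self.to_mean = nn.Linear(64, latent_dim)    # mean of the latent distribution
        self.to_logvar = nn.Linear(64, latent_dim)  # log-variance of the latent distribution
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, input_dim))

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        # "Reparameterization trick": sample a latent vector in a differentiable way.
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        return self.decoder(z), mean, logvar

# One fake 300-dimensional document embedding, just to show the shapes.
x = torch.randn(1, 300)
reconstruction, mean, logvar = TextVAE()(x)
print(reconstruction.shape, mean.shape)  # torch.Size([1, 300]) torch.Size([1, 16])
```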

3. Graph Neural Networks (GNNs):

  • Focus: While not as widely used in NLP yet, GNNs are emerging as a promising approach for tasks involving textual data with inherent graph-like structures.
  • Working Principle: GNNs can process information about nodes (words) and edges (relationships between words) in a graph, allowing them to model relationships between entities in text data. This could be useful for tasks like question answering or information extraction from text with complex structures.
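
Here is a minimal NumPy sketch of one round of message passing over a tiny made-up word graph, using a simplified GCN-style update; the adjacency matrix and feature sizes are illustrative.

```python
import numpy as np

# A tiny word graph: 4 word nodes, edges mark some relationship (e.g. co-occurrence).
# Adjacency matrix with self-loops so each node also keeps its own features.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

rng = np.random.default_rng(7)
H = rng.normal(size=(4, 8))  # one 8-dimensional feature vector per word node
W = rng.normal(scale=0.1, size=(8, 8))

# Normalize so each node averages over itself and its neighbours.
D_inv = np.diag(1.0 / A.sum(axis=1))

# One round of "message passing": each node's new features mix in its neighbours' features.
H_next = np.maximum(0, D_inv @ A @ H @ W)  # ReLU(D^-1 A H W)
print(H_next.shape)  # (4, 8): updated representation for each word node
```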

Addendum

What is HuggingFace?

HuggingFace is like GitHub for AI Models.

As of March 2024, HuggingFace hosts over 500K open-source AI models.

https://huggingface.co/models

Besides the AI models listed above, you can also use other open-source models hosted on HuggingFace, as in the sketch below.
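
For example, a model hosted on the Hub can be pulled down and used in a few lines with the transformers library (assuming transformers and PyTorch are installed). The model id below is just one widely used sentiment-analysis model among the many hosted.

```python
from transformers import pipeline

# Download a model from the HuggingFace Hub and run it locally.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("This primer made NLP models much easier to understand."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```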

What is Vertex AI?

Google Vertex AI provides a cloud based platform where you can deploy and run AI Models from HuggingFace.

Vertex AI is a machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. Vertex AI provides a unified platform that seamlessly integrates data preparation, model training, deployment, and monitoring. This significantly reduces the complexity of managing different components and services separately.


And that’s it!

Remember, the field of AI is constantly evolving, and new models are emerging all the time. Stay curious and keep exploring!

Hope you found this interesting and learnt something new today! 🙂

Happy Learning!!

Found this post helpful? Hit the 👏 👏 👏 button a few times to show how much you liked it! 🙂

Follow me on Medium for the latest updates and posts!
