Large Language Models and Generative AI — Part 1

Vedanth Venkatesh
4 min read · May 20, 2023


A large language model (LLM) is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate and predict new content.

The term generative AI also is closely connected with LLMs, which are, in fact, a type of generative AI that has been specifically architected to help generate text-based content.

What is it and why is it so popular?

All language models are first trained on a set of data, then use various techniques to infer relationships and generate new content based on the trained data. Language models are commonly used in natural language processing (NLP) applications, where a user inputs a query in natural language to generate a result. The ELIZA language model, which debuted at MIT in 1966, is one of the earliest examples of an AI language model.

An LLM is the evolution of the language model concept in AI that dramatically expands the data used for training and inference. In turn, it provides a massive increase in the capabilities of the AI model.

While there isn’t a universally accepted figure for how large a training data set needs to be, an LLM typically has at least a billion parameters.

Parameters are a machine learning term for the variables present in the model on which it was trained that can be used to infer new content.
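To make "parameters" concrete, here is a minimal sketch that counts the trainable weights and biases of a hypothetical stack of dense layers. The layer sizes are made up for illustration; real LLMs count parameters across attention and feed-forward blocks in the same additive way.

```python
def count_params(layer_shapes):
    """Count trainable parameters for a list of (inputs, outputs) dense layers."""
    total = 0
    for n_in, n_out in layer_shapes:
        total += n_in * n_out  # weight matrix entries
        total += n_out         # bias vector entries
    return total

# A tiny hypothetical 3-layer network: 512 -> 2048 -> 2048 -> 512
print(count_params([(512, 2048), (2048, 2048), (2048, 512)]))  # ~6.3 million
```

A billion-parameter LLM is this same bookkeeping carried out over hundreds of much larger layers.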

In 2018, Google released its seminal BERT model, and by 2023 the latest breakthroughs included LLaMA by Meta AI and GPT-4 by OpenAI.

  • BERT by Google
  • GPT-3 by OpenAI
  • LaMDA by Google
  • PaLM by Google
  • LLaMA by Meta AI
  • GPT-4 by OpenAI

Transformers

Modern LLMs emerged in 2017 and use transformer neural networks, commonly referred to as transformers.

With a large number of parameters and the transformer model, LLMs are able to understand and generate accurate responses rapidly, which makes the AI technology broadly applicable across many different domains.

Some LLMs are referred to as foundation models, a term coined by the Stanford Institute for Human-Centered Artificial Intelligence in 2021. A foundation model is so large and impactful that it serves as the foundation for further optimizations and specific use cases.

How do large language models work?

LLMs take a complex approach that involves multiple components.

At the foundational layer, an LLM needs to be trained on a large volume — sometimes referred to as a corpus — of data that is typically petabytes in size. The training can take multiple steps, usually starting with an unsupervised learning approach. In that approach, the model is trained on unstructured data and unlabeled data. The benefit of training on unlabeled data is that there is often vastly more data available. At this stage, the model begins to derive relationships between different words and concepts.
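As a toy illustration of what "deriving relationships from unlabeled data" means, the sketch below counts which words co-occur near each other in raw text. This is not how LLMs are actually trained, but it shows the kind of statistical signal that unsupervised learning extracts from an unlabeled corpus.

```python
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count how often word pairs appear within `window` words of each other."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Pair the current word with each word in the preceding window.
            for other in words[max(0, i - window):i]:
                counts[tuple(sorted((word, other)))] += 1
    return counts

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
stats = cooccurrence(corpus)
print(stats[("on", "sat")])  # "on" and "sat" co-occur in both sentences
```

Words that frequently co-occur end up statistically related; a real LLM learns far richer versions of such relationships as dense vector representations.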

The next step for some LLMs is training and fine-tuning with a form of self-supervised learning. Here, some data labeling has occurred, helping the model identify different concepts more accurately.

Next, the LLM undertakes deep learning as it goes through the transformer neural network process. The transformer architecture enables the LLM to understand and recognize the relationships and connections between words and concepts using a self-attention mechanism. That mechanism is able to assign a score, commonly referred to as a weight, to a given item (called a token) in order to determine the relationship.
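The scoring step described above can be sketched numerically. The snippet below is a minimal, illustrative version of scaled dot-product self-attention using NumPy, with made-up random weights: every token is compared against every other token, the scores are normalized into weights, and each token's output is a weighted mix of the others.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal scaled dot-product self-attention over token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # raw relationship scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                             # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))      # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

In a real transformer this runs with many attention heads in parallel, stacked over dozens of layers, with the weight matrices learned during training rather than random.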

Once an LLM has been trained, a base exists on which the AI can be used for practical purposes. By querying the LLM with a prompt, the AI model inference can generate a response, which could be an answer to a question, newly generated text, summarized text or a sentiment analysis.
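The prompt-to-response loop can be illustrated with a deliberately trivial stand-in for an LLM: a bigram model "trained" on two sentences, queried with a prompt, generating text by repeatedly predicting the most likely next word. Real LLMs run the same loop, just with a transformer predicting each next token.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            model[a][b] += 1
    return model

def generate(model, prompt, max_words=4):
    """Greedy decoding: repeatedly append the most likely next word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break  # no known continuation
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams(["the cat sat on the mat", "the cat ran"])
print(generate(model, "the cat"))
```

Swap the bigram table for a billion-parameter transformer and the greedy pick for smarter sampling, and this is, in outline, what happens when you prompt an LLM.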

What are large language models used for?

LLMs have become increasingly popular because they have broad applicability for a range of NLP tasks, including the following:

  • Text generation. The ability to generate text on any topic that the LLM has been trained on is a primary use case.
  • Translation. For LLMs trained on multiple languages, the ability to translate from one language to another is a common feature.
  • Content summary. Summarizing blocks or multiple pages of text is a useful function of LLMs.
  • Rewriting content. Rewriting a section of text is another capability.
  • Classification and categorization. An LLM is able to classify and categorize content.
  • Sentiment analysis. Most LLMs can be used for sentiment analysis to help users to better understand the intent of a piece of content or a particular response.
  • Conversational AI and chatbots. LLMs can enable a conversation with a user in a way that is typically more natural than older generations of AI technologies.
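To show the input-to-label shape of one of these tasks, here is a toy, lexicon-based sketch of sentiment analysis. The word lists are made up for illustration; an actual LLM infers sentiment from context rather than from a fixed vocabulary.

```python
# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    """Classify text as positive/negative/neutral by counting lexicon hits."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("the service was excellent"))  # positive
```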

Among the most common uses for conversational AI is the chatbot, which can exist in any number of different forms where a user interacts in a query-and-response model. One of the most widely used LLM-based AI chatbots is ChatGPT, which launched on OpenAI’s GPT-3.5 model.

To be continued…
