LLM & GPT: what are they, and how do they work?

Thomas Latterner
7 min read · Sep 3, 2023


Generated by DALL-E

What is an LLM?

Developed by OpenAI, GPT is one of the most popular LLMs (Large Language Models). However, there are many others. Google has developed PaLM 2, Meta has developed Llama 2, and Anthropic has developed Claude 2.

One of the most well-known capabilities of LLMs is text completion. From this, numerous applications arise, such as:

  • Summarizing texts (like Jus Mundi, for instance)
  • Creating chatbots or virtual assistants (like ChatGPT, for example)
  • Assisting in content creation
  • Translating
  • Extracting data
  • Data classification
  • Answering questions
  • Changing the tone of a text
  • Correcting the spelling or grammar of a text (which I did with ChatGPT for this article)

If you’re a developer, you can find concrete examples in my previous article:

Definitions and Capabilities of LLMs

LLMs are Machine Learning models based on a transformer architecture, using generative artificial intelligence to perform natural language processing (NLP) tasks.

Besides text completion, LLMs can be adapted for a variety of NLP tasks. For instance, they can be used for emotion detection in text, document classification, named entity recognition, and many more.

A Machine Learning model is an algorithm or mathematical formula that allows a computer to learn from data. Instead of being explicitly programmed to perform a task, it uses data to make predictions or decisions.

The “transformer” architecture is a specific structure used in Machine Learning to process data sequences, like text. It is particularly powerful for understanding long-term dependencies in data.

Generative artificial intelligence refers to AI models that can create new content. In the context of LLMs, this means generating text that didn’t exist before.

Natural Language Processing (NLP) is a branch of artificial intelligence focused on communication between computers and humans through natural language. This can include understanding, generating, or translating text.

To summarize: LLMs are AI models that use the “transformer” architecture to learn and generate text. They aren’t specifically programmed but learn from data. These models can understand and create natural language content, such as translating or generating text.

Learning Process and Cost

During its learning process, the model learns the statistical relationships between words, sentences, and paragraphs, allowing it to generate coherent and contextually relevant responses when given a command or query.

Training a machine learning model, simply explained, is like teaching a child how to solve a problem by showing them many examples. Over time, by seeing enough examples, the child (or the model) learns to recognize patterns or trends and can then use what they’ve learned to solve new problems similar to those they’ve already seen.
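As a minimal sketch of this idea (not a real LLM, just an illustration), here is a toy Python model that learns from a few example sentences which word tends to follow which, and then uses those learned patterns to predict the next word:

```python
from collections import Counter, defaultdict

# Toy illustration: learn word-to-word patterns from examples,
# then "predict" the next word from what was observed most often.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]

# Count, for every word, how often each word follows it.
transitions = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1

def predict_next(word):
    """Return the most frequently observed word after `word`."""
    return transitions[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice, vs once for the others)
print(predict_next("sat"))  # "on"
```

A real LLM learns vastly richer statistical relationships over billions of examples, but the principle is the same: patterns extracted from data, not hand-written rules.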

Training LLMs requires a lot of time, resources, and energy, as well as a large amount of high-quality data. The combination of these prerequisites means that only certain large companies can train such models. There are open-source models available, for example on Hugging Face, which vary in quality but still struggle to compete with GPT (in terms of capabilities and reasoning).

To put this into perspective, according to some estimates, training GPT-3 required about 570 GB of data (equivalent to roughly 300 billion words) and consumed around 1,287 megawatt-hours of electricity (comparable to the annual consumption of about a hundred American households). On top of that, there’s the human cost of research and development, including the annotators whose role is to indicate the correct answers to the model. Beyond the training cost, there’s also the daily operational cost, estimated at approximately $700k per day.

Being trained on this vast amount of text, an LLM like GPT-3 can then understand several languages and possess knowledge on various subjects. This is why it can produce text in different styles.

Text Understanding

The key to enabling an LLM to understand text and provide quality responses lies in both the context window and the attention mechanism. The attention mechanism allows the model to capture the relationships between words, sentences, and paragraphs, and thus grasp the deeper meaning of a text. The context window, on the other hand, is the maximum amount of text the model can take into account at once, covering both the input it can refer to (conversation history, newly provided data) and the response it generates. It’s measured not in words but in tokens. A token can be as short as a single character or as long as a word. For example, “ChatGPT” could be split into “Chat” and “GPT”, each being a token.
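To make tokenization concrete, here is a toy greedy longest-match tokenizer in Python. The vocabulary below is invented purely for illustration; the real tokenizers used by GPT models (based on byte-pair encoding) learn their vocabulary from data:

```python
# Invented toy vocabulary of known subwords (illustration only).
VOCAB = {"Chat", "GPT", "token", "iz", "ation", "s"}

def tokenize(text):
    """Greedily match the longest known subword at each position;
    unknown characters fall back to single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("ChatGPT"))        # ['Chat', 'GPT']
print(tokenize("tokenizations"))  # ['token', 'iz', 'ation', 's']
```

Notice how a single word can become several tokens: this is why token counts are always somewhat higher than word counts.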

The current problem with the context window is that it’s very resource-intensive (memory and computation). For each new token, the model needs to compute the relationship it maintains with every other token in the text. We’re talking about a quadratic relationship, represented as n². To give you an idea: 15² = 15 × 15 = 225, while 30² = 900. 30 is double 15, yet the square of 30 is 4 times the square of 15. This is what “quadratic” means. To draw an analogy, it’s like the area of a square: if you double the length of one side, the area becomes four times larger. This quadratic resource consumption effectively limits how much text can be sent to the LLM, understood as a whole, and used for the response.
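The quadratic growth described above can be illustrated in a few lines of Python, counting the pairwise comparisons attention performs across a context:

```python
def attention_pairs(n_tokens):
    """Number of token-to-token comparisons attention performs:
    every token attends to every token, i.e. n squared."""
    return n_tokens * n_tokens

for n in (15, 30, 60):
    print(n, "tokens ->", attention_pairs(n), "comparisons")
# Each doubling of the context length quadruples the work:
# 15 -> 225, 30 -> 900, 60 -> 3600
```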

At the time of writing, this limit is 32,768 tokens (about 25,000 words) for GPT-4 and 100,000 tokens (about 75,000 words) for Claude, Anthropic’s LLM.

The larger this window, the more it allows the model:

  • To perform better, produce better “reasoning”
  • To understand more complex concepts or texts
  • To generate more accurate and precise responses
  • To access a longer conversation history, remember better, and thus maintain conversational consistency and quality

You can see why the attention mechanism is crucial for improving the quality of LLMs.

It’s important to note that LLMs don’t actually understand anything: neither human language nor the meaning behind words and sentences. They capture and reproduce the statistical patterns of the billions of sentences they saw during their training phase.

What are GPT and ChatGPT?

GPT, which stands for Generative Pre-trained Transformer, is the LLM developed by the American company OpenAI. The first version of the model was released in 2018. Several versions followed, the latest being GPT-4, released in March 2023.

Beyond its ability to generate text, GPT has revolutionized how we interact with this technology. Its ability to understand and respond contextually has paved the way for more advanced applications in various fields. You might have noticed that in recent months, many products have started offering new AI-powered features, particularly around text completion and chatbots.

ChatGPT is an interface that allows users to access two versions of GPT (3.5 and 4) in a web browser or via an app on Android or iOS. You can use the free version by creating an OpenAI account, or opt for the paid one: by subscribing to the “ChatGPT Plus” service, you benefit from faster responses, priority access during peak times, and early access to new features such as GPT-4. OpenAI’s teams continuously improve ChatGPT based on user conversations; they can access your conversation history and use it for this purpose, so be careful about the information you share! Although you can turn off conversation history to avoid contributing to this improvement, that doesn’t prevent some OpenAI employees from accessing your exchanges, as they are kept for several days to ensure you’re not using the service fraudulently.

Below, you’ll find a screenshot of the ChatGPT interface, accompanied by some explanations for those less familiar with the tool:

The web interface of ChatGPT in September 2023
  1. Model selection
  2. Option to add up to 3 plugins for your next conversation (this allows, for instance, generating a PDF or conducting a search via a third-party service. Some might be paid).
  3. Field for entering your prompt
  4. Button to start a new conversation
  5. Access to the history of your previous conversations
  6. Management of your subscription and preferences

GPT is also accessible via an API, eliminating the need for direct “human” interaction. An API, short for “Application Programming Interface”, is a set of rules and specifications that allow software to interact and communicate with each other. Through an API, applications can exchange information or data without knowing the internal details of their operation. This ensures interoperability between different systems. For GPT, OpenAI has provided a set of APIs to allow developers to integrate the capabilities of this LLM into their applications or platforms. However, using this API isn’t free. OpenAI has adopted a usage-based pricing model: for every 1000 tokens (text units) used, whether for the posed question or the generated answer, fees apply.
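As a sketch of this usage-based pricing, here is a small Python cost estimator. The per-1,000-token rates below are hypothetical placeholders, not OpenAI’s actual prices, which vary by model and change over time:

```python
# HYPOTHETICAL rates for illustration only; check OpenAI's pricing
# page for real, model-specific figures.
PRICE_PER_1K_PROMPT_TOKENS = 0.03      # assumed, in USD
PRICE_PER_1K_COMPLETION_TOKENS = 0.06  # assumed, in USD

def estimate_cost(prompt_tokens, completion_tokens):
    """Fees apply to both the question sent and the answer generated,
    billed per 1,000 tokens."""
    return (prompt_tokens / 1000) * PRICE_PER_1K_PROMPT_TOKENS \
         + (completion_tokens / 1000) * PRICE_PER_1K_COMPLETION_TOKENS

# A 1,000-token prompt with a 500-token answer:
print(round(estimate_cost(1000, 500), 4))  # 0.06
```

The key point is that both directions of the exchange are billed, so long conversation histories resent with every request add up quickly.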

LLMs, like GPT, represent a major advancement in the field of artificial intelligence and natural language processing. Their ability to understand, generate, and interact in natural language opens the door to countless applications that were once challenging to implement. However, as with any technology, it’s essential to use it wisely and understand its limitations, which will be the subject of my next article.

I hope you enjoyed reading! If you want to encourage me to keep writing articles like this, or if you found it useful, feel free to give me some “claps” or leave a comment.


Thomas Latterner

Tech lover, LLM Enthusiastic, Entrepreneur, Co-Founder & Chief Technology Officer at Jus Mundi https://jusmundi.com/