What are Generative Pre-trained Transformers (GPTs)?

anitakivindyo
Feb 16, 2023

From chatbots to virtual assistants, many of the AI-powered, language-based systems we interact with daily rely on a technology called GPTs. But what exactly are GPTs, and how do they enable systems to generate human-like text? This article answers that question.

Photo by DeepMind on Unsplash

Generative Pre-trained Transformers (GPTs) are a type of large language model that uses deep learning to produce natural-language text based on a given input.

A user feeds the model an input such as a sentence, and the generative pre-trained transformer (GPT) produces a paragraph based on patterns it learned from large, publicly available text datasets. GPTs can process a wide range of text inputs, from paragraphs of prose to code and even creative writing, and they can provide insights, analysis or summaries based on the input given.
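
To make this concrete, here is a minimal sketch of feeding a prompt to a GPT-style model and getting a continuation back. It assumes the Hugging Face transformers library and the small, openly available GPT-2 model; the article itself does not prescribe any particular library or model.

```python
# Sketch: prompt a small GPT-style model and print its continuation.
# Assumes the Hugging Face `transformers` library and the GPT-2 model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Generative Pre-trained Transformers are"
result = generator(prompt, max_length=40, num_return_sequences=1)

# The model continues the prompt with the text it judges most likely.
print(result[0]["generated_text"])
```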

How do GPTs do this?

Well, as mentioned earlier, GPTs are large language models (LLMs). They are considered “large” because they have been trained on a massive amount of text data, on the order of billions of words. This alone allows the model to learn a vast array of linguistic patterns and associations, building a rich knowledge base.

There are other examples of large language models, such as BERT, with about 110 million parameters, GPT-3, with 175 billion parameters, and PaLM, with 540 billion parameters.

Parameters, in the context of LLMs, are the values inside the neural network that are optimized during training, namely the weights on the connections between neurons and their biases. (Settings such as the learning rate are hyperparameters, chosen before training rather than learned.)

The number of parameters in the model determines how complex the model can be and how much information it can process and store. It goes without saying that the larger the number of parameters, the more sophisticated the model, but also the more computationally expensive it is to train and use. Have you experienced downtime while using ChatGPT? That is probably why.
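
A quick way to see what “parameters” means in practice: every weight and bias in a network counts as one parameter. The toy two-layer network below is purely an illustrative assumption, but GPT-3’s 175 billion parameters are counted in exactly the same way.

```python
# Count the parameters (weights + biases) of a tiny neural network with PyTorch.
import torch.nn as nn

tiny_model = nn.Sequential(
    nn.Linear(8, 16),   # 8*16 weights + 16 biases = 144 parameters
    nn.ReLU(),
    nn.Linear(16, 1),   # 16*1 weights + 1 bias   = 17 parameters
)

total = sum(p.numel() for p in tiny_model.parameters())
print(total)  # 161
```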

Transformers are at the core of GPTs (it’s even in the name). They were introduced in the 2017 paper ‘Attention Is All You Need’ by Vaswani et al.

What exactly are Transformers?? 🧐

They are a type of deep learning model widely used in NLP tasks such as translation and text summarization. Transformers can process both categorical and numeric data, and they are particularly effective at processing sequential data such as text, audio and video.

What makes Transformers different from traditional Neural Networks? 🤔

Unlike traditional neural networks, where information flows in one direction through the layers, transformers allow every position in the input sequence to attend to every other position, enabling the network to focus on different parts of the input as it processes the data at different stages. This key mechanism is known as ‘self-attention’ or simply ‘attention’.

Self-attention works by computing a set of attention weights for each input token. The weights indicate how relevant each token is to every other token. The transformer then uses these attention weights to give more importance to the most significant parts of the input and less importance to the less relevant parts.
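
Here is a toy illustration of that idea: each token’s vector is compared against every other token’s vector to produce attention weights, which are then used to mix the tokens’ information. The random vectors stand in for token embeddings, and in a real GPT the query, key and value projections are learned during training, so treat this as a sketch rather than the full mechanism.

```python
# Toy scaled dot-product self-attention over a sequence of token vectors.
import numpy as np

def self_attention(X):
    d = X.shape[-1]
    # In a real transformer, Q, K and V come from learned linear projections of X.
    Q, K, V = X, X, X
    scores = Q @ K.T / np.sqrt(d)                    # relevance of every token to every other
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention weights
    return weights @ V                               # weighted mix of token information

tokens = np.random.randn(4, 8)   # four "tokens", each an 8-dimensional vector
print(self_attention(tokens).shape)  # (4, 8)
```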

In the case of GPTs, the transformer predicts the next word in a sequence based on the input sequence of words.

How does the Transformer in GPTs predict the next word?

The deep learning model is trained to continue text in the same style as the text it was trained on: given the words so far, it learns to predict the word that comes next.

It does this by maximizing the probability of generating the correct next word in the sequence. Once the model is trained, it can be used to generate new text by sampling from the probability distribution over potential next words, given the current input. Pretty cool, right?
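
The final step can be sketched in a few lines. The tiny vocabulary and scores below are made up for illustration; a real GPT produces a distribution over tens of thousands of tokens, but the sampling idea is the same.

```python
# Sketch: turn the model's raw scores into probabilities and sample the next word.
import numpy as np

vocab = ["cat", "dog", "mat", "sat", "the"]
logits = np.array([2.0, 0.5, 1.2, 0.1, 0.8])    # raw scores from the model (illustrative)

probs = np.exp(logits) / np.exp(logits).sum()    # softmax -> probability distribution
next_word = np.random.choice(vocab, p=probs)     # sample the next word

print(next_word)
```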

In conclusion, transformers are a valuable and effective tool for generating sequential data, as demonstrated by ChatGPT, OpenAI’s popular chatbot built on its Generative Pre-trained Transformer models such as GPT-3. If you have ever interacted with a language-based AI system, there is a good chance you have encountered GPTs, or Generative Pre-trained Transformers.

Compared to its predecessors, GPT-3 shows significant advances in generating high-quality language output that is often almost indistinguishable from human writing. This is a significant leap forward in natural language processing, and GPTs have the potential to transform how we interact with language-based systems.

You can also have a look at what generative AI models entail, here.

References

Vaswani, A., et al. (2017). “Attention Is All You Need.” Advances in Neural Information Processing Systems (NeurIPS).
