Prompt and predict: what can you do with Large Language Models?

A brief overview of the largest and most powerful language models, with some examples of their applications to generative tasks. Challenge your friends to figure out whether it is a machine or a human being!

Cesare.Scalia
Eni digiTALKS
10 min read · Oct 10, 2022


What are Large Language Models?

A language model is an artificial intelligence model able to generate words or text sequences. Recently these models have gradually become bigger and bigger, so much so that we now speak of Large or Giant Language Models, and they can be considered one of the most modern and revolutionary applications in the field of artificial intelligence.

Born within the domain of Natural Language Processing (NLP), today they are also being applied to other scenarios, from the world of computer vision up to the so-called multimodal agents. Large Language Models are also extremely popular outside the data science community, so much so that an article published by The Guardian and written with one of the most famous models (GPT3) even made the news.

Large Language Models in practice

Large Language Models are based upon huge deep neural network architectures that can reach billions of parameters. This allows them to encapsulate and encode huge amounts of information by learning the syntax and semantics of natural language, with the aim of creating a true representation of human knowledge!

The foundations of these large models are:

  1. The Transformer neural network architecture, developed by Google in 2017 and based on:
    - Attention Mechanism. The problem with the predominant architectures in NLP before attention (RNN, LSTM, etc.) was that the word information signal decays very quickly within long sequences (after about 10 words the signal from earlier words is substantially lost), and therefore the models fail to encapsulate complex concepts. Even in a realistic human conversation, retrieving only the last 10 words may not be enough to understand what the conversation was about. The attention mechanism, on the other hand, overcomes this problem and allows the model to encode wider contexts. Through attention, the model learns the weights to apply to the different parts of the input context when generating the output text. Basically, it focuses attention on the most important words of the input to generate the best output.
    - Multi-scalability: another step change introduced by Transformers is that these architectures can run on distributed computing systems, so models of this type can be scaled up in size while keeping computation efficient.
  2. Data: Large Language Models are trained on huge corpora of text! You need datasets that contain as much general human knowledge as possible and that can feed a massive number of parameters (as a rule of thumb, the number of training examples should at least exceed the number of weights). The only resource that is so large and generalist is basically the entire web!
  3. Self-supervised Learning: one of the most important innovations is the training strategy. Traditionally, a huge amount of human-labeled data is required to train large deep learning models, and the resulting models are tied to the logic of the labels and not very generalist. Creating human labels for the datasets used by Large Language Models would be a titanic task, extremely complicated to perform given their size.

Therefore, a different strategy was adopted: self-supervised learning! Basically, to train Large Language Models we feed them the texts in our training set, asking the neural network to predict each word given the text that precedes it.
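To make this concrete, here is a minimal sketch (plain Python, purely illustrative) of how such training pairs can be derived from raw text with no human labeling: every word becomes a prediction target and the words before it become the input context.

```python
# Minimal illustration of the self-supervised objective:
# every word in a raw sentence becomes a prediction target,
# and the words that precede it become the input context.
text = "large language models learn by predicting the next word"
words = text.split()

training_pairs = [
    (words[:i], words[i])  # (context, word to predict)
    for i in range(1, len(words))
]

for context, target in training_pairs[:3]:
    print(f"context: {' '.join(context):<35} -> target: {target}")
```

No labels are created by hand: the text itself provides both the inputs and the expected outputs.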

Bigger and bigger

The evolution of Large Language Models reflects a gradual increase in the size of the models used. But what increases in practice? You could, for instance, increase the amount of training data or the number of iterations during the training phase. However, studies have shown that what needs to be increased to improve accuracy, even by orders of magnitude, is the number of model parameters!

For this reason, models started to dramatically increase their number of parameters, and to date the plateau has not yet been reached, a clear sign that in the next 5–10 years this upward trend will continue.

Number of parameters of the main Large Language Models

This recalls the link between the number of neurons in the brain and the reasoning skills that distinguish, for example, a jellyfish from a human being or an elephant. To achieve intelligence, you need not only a lot of neurons, but also a large number of interconnections between them.

Prompting and few shot learning

Having a huge, massively pre-trained and generalist model that knows and has encapsulated a lot of information is the real key to the paradigm shift!

In fact, all this allows you to interact with the model in a very similar way to how you would ask a human being to perform a particular task, that is, by giving instructions and at most a few examples, all in natural language! The ability to perform tasks with a couple of examples (versus the thousands or millions needed by traditional deep learning systems) is called few shot learning and is the real revolution that has emerged from the training of Large Language Models.

How to give instructions to the model? Simply by typing in a prompt what you want the model to do.

In this way you can do a lot of things, from writing stories to solving math problems. All this without the need to retrain the model! Here the main difficulty becomes writing prompts that are well tailored to the task.

There are some settings that can be used to control the characteristics of the generated text:

  • Temperature: the parameter that controls the randomness of the response. The lower it is, the more deterministic and repetitive the model will be. The higher it is, the more imaginative and creative the result will be (pay attention, however, to the false news that the model can generate!)
  • Number of usable tokens: typically, 2048 between input and output. So, pay attention to the number of examples to provide and the length of the text that you want to obtain.
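To get an intuition for the temperature setting, here is a small illustrative sketch (NumPy, with made-up scores for three candidate words) of how dividing the model's scores by the temperature reshapes the probability distribution from which the next word is sampled.

```python
import numpy as np

def next_word_probabilities(logits, temperature):
    """Softmax over the model's scores, scaled by the temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate words
for t in (0.2, 1.0, 2.0):
    print(f"temperature={t}: {np.round(next_word_probabilities(logits, t), 3)}")
# Low temperature -> probability concentrates on the top word (deterministic, repetitive).
# High temperature -> flatter distribution (more random, more "creative").
```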

How to access Large Language Models?

But what are the most famous models and how can they be used?

Two of the most important language models in terms of accessibility have been used in this article. In particular:

  • GPT3: it is the third, more powerful version, with more parameters, of the Generative Pre-trained Transformer (GPT) architecture developed by OpenAI. Trained on a dataset of 530GB of (filtered) data and more than 450 billion tokens, it is an extremely important model because it is the first to show remarkable performance in prompting and in few shot learning (as the title of the paper in which the model is presented, “Language Models are Few-Shot Learners”, already suggests). Indeed, GPT3 can be considered the first generalist artificial intelligence capable of understanding and learning a very wide range of human tasks. The model is accessible through the OpenAI API, which lets you use 3 different versions of the model, distinguished by cost and number of parameters. In our case we used the best performing one, i.e., “davinci”.
  • BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is the largest open-source Large Language Model available at the moment. The model has about 1 billion more parameters than GPT3 and is truly accessible to anyone. It was trained by a community of about 1000 researchers over roughly 4 months on a public computing cluster, using a 1.6TB dataset containing more than 350 billion tokens across 59 different languages (for almost all of these languages BLOOM is in fact the first Large Language Model with more than 100B parameters).

What can you do with Large Language Models?

One of the greatest skills that made Large Language Models like GPT3 famous is the creation of text and content. There is a famous article published by The Guardian where a GPT3 model was asked to write an essay arguing that robots come in peace. GPT3 is indeed so effective at this task that it is almost impossible to distinguish between a text written by a human and one written by the model.

Language models are not limited to this: they can perform many other amazing tasks.

Let’s explain fusion physics easily!

Imagine having to summarize a magnetic fusion topic, such as the one covered in this eni.com article on Tokamak (https://www.eni.com/en-IT/scientific-research/tokamak-promise-magnetic-fusion.html), for a second-grade child who has never heard of any of this.

What can you do? Simple! Let’s prompt and predict: describe what you want in a prompt and let GPT3 do it.

Prompt:

“Summarize this for a second-grade student:” 
Text…

Answer:

It is a big machine that makes power by using the sun’s energy. It is like a big sun. If we can make it work, it could be a new power source that would help us to stop using fossil fuels

Notice how, without any previous knowledge of the subject, we were able to obtain a summary in extremely simple, plain language, with targeted concepts that are easy to understand for an audience with very little technical knowledge.

What would have happened if we had asked the model to explain the same text to someone who is assumed to have some knowledge of the subject, such as a doctoral student in physics? Let’s prompt and predict again!

Prompt:

“Summarize this for a Physics PhD student:” 
Text…

Answer:

The Wendelstein 7-X nuclear fusion reactor — the largest ‘stellarator’ reactor ever built — in Germany turned on in December 2015. Scientists have begun the advanced testing stage with a view to demonstrating that nuclear fusion has the ability to bring about a significant reduction in carbon dioxide emissions and, at the same time, to ensure the continuity of energy resources as the world’s population continues to grow rapidly

Slightly changing the prompt generates a completely different result. This time the explanation is more technical and suitable for an audience with a more advanced background.
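In practice, this prompt-and-predict interaction boils down to a single API call. Below is a minimal sketch using the openai Python package as it was available at the time of writing (the Completion endpoint with the “davinci” engine); the API key and the article text are placeholders to be filled in, and the parameter values are just one reasonable choice.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder: your OpenAI API key

article_text = "..."  # placeholder: the text of the eni.com Tokamak article

response = openai.Completion.create(
    engine="davinci",                     # the most capable GPT3 engine mentioned above
    prompt=f"Summarize this for a second-grade student:\n{article_text}",
    temperature=0.7,                      # some creativity in the wording
    max_tokens=100,                       # keep the summary short
)

print(response["choices"][0]["text"].strip())
```

Switching to the “Physics PhD student” summary is just a matter of changing the first line of the prompt, with no retraining involved.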

Let’s help Gollum with new riddles!

Imagine this time that you want to create a new riddle for the challenge between Gollum and Bilbo in The Hobbit. You can do it even without having Tolkien’s imagination and genius.

Let’s prompt and predict! In this case we build a slightly different prompt than before and take advantage of BLOOM’s few shot learning capabilities.

We build a prompt by writing Gollum’s riddles and their answers in the following format (where the ‘####’ string acts as a separator):

Riddle:
What has roots as nobody sees, is taller than trees, up, up it goes, and yet never grows?
Answer: mountain
####
Riddle: Voiceless it cries, Wingless flutters, Toothless bites, Mouthless mutters.
Answer: wind
####
Riddle: It cannot be seen, cannot be felt, cannot be heard, cannot be smelt. It lies behind stars and under hills, and empty holes it fills. It comes out first and follows after, Ends life, kills laughter.
Answer: dark
####
Riddle: Alive without breath, as cold as death; never thirsty, ever drinking, all in mail never clinking.
Answer: fish
####
Riddle: This thing all things devours; birds, beasts, trees, flowers; gnaws iron, bites steel; grinds hard stones to meal; slays king, ruins town, and beats mountain down.
Answer: time
####

In a few shot learning task the Language Model completes the text following the prompt structure. In this way BLOOM can create new riddles with their respective answers:

####
Riddle: What is all colors; warmth of ice; less than the smallest; more than the greatest; soft as water yet harder than steel?
Answer: light
####
Riddle: Sees without eyes, feels without hands, moves without feet, exists without a head; only for two and never shows itself.
Answer: heart

Even without having a lot of imagination, we created riddles in Gollum’s style. Perhaps the first is a bit difficult (personally I would tie “less than the smallest; more than the greatest” to the concept of the electromagnetic spectrum), while the second is much easier to understand, but the results are amazing!
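For those who want to reproduce the experiment, here is a minimal sketch of how the same few shot prompt could be sent to BLOOM through the Hugging Face Inference API; the token is a placeholder and the generation parameters shown are just one reasonable choice, not necessarily the settings used for the riddles above.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # placeholder: your Hugging Face token

# The few shot prompt: riddles and answers separated by '####',
# left open so that BLOOM continues the pattern with a new riddle.
prompt = (
    "Riddle: What has roots as nobody sees, is taller than trees, "
    "up, up it goes, and yet never grows?\nAnswer: mountain\n####\n"
    "Riddle: Voiceless it cries, Wingless flutters, Toothless bites, "
    "Mouthless mutters.\nAnswer: wind\n####\n"
    "Riddle:"
)

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": prompt, "parameters": {"max_new_tokens": 60, "temperature": 0.9}},
)

# The generated text continues the prompt, adding a new riddle and its answer.
print(response.json()[0]["generated_text"])
```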

Conclusion

We have seen what Large Language Models are and how they can be exploited to do amazing things, like generating highly creative and imaginative content or summarizing even complicated texts in a simple and intuitive way.

Access to these models is much easier than in the past and with BLOOM we also have open-source models in many languages. In the future we will also have bigger and better performing models!

The real limit is (your) imagination, not that of the model.
