Getting started with Watsonx.ai II

Nathalia Trazzi
7 min read · May 23, 2024


This article is Part II of the series Getting started with Watsonx.ai. We will remain in the Prompt Lab environment, although the first part of this post is not about Prompt Lab itself.

Let’s start with Foundation Models

Foundation Models

First things first: foundation models and Large Language Models belong to the field of Machine Learning (ML) and, more specifically, to the subfield of Natural Language Processing (NLP).

The image below illustrates the “core” of Artificial Intelligence: Machine Learning is a field of Artificial Intelligence, Neural Networks are part of Machine Learning, Deep Learning is part of Neural Networks, and Generative AI and Large Language Models sit within Deep Learning.

A foundation model is a large-scale AI model trained on vast amounts of text data to understand and generate human-like text. Foundation models operate using a neural network architecture called the Transformer, which is particularly effective at capturing long-range relationships in sequences of data, such as words in a sentence.

Although foundation models can generate text and can therefore be considered generative to some extent, the term generative usually refers to models that can create new data of any kind, such as images, music, or text.

During training, a foundation model learns to predict the next word in a sequence of words given the previous history.

This training is done in a task called “next-word prediction”, where the model attempts to guess the next word in a sentence based on the preceding words. Once trained, the model can generate new text by continuing a given sequence of words, extrapolating what it learned during training.

Image credit: https://developer-blogs.nvidia.com/wp-content/uploads/2023/06/general-workflow-for-prompt.png
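To make “next-word prediction” concrete, here is a minimal toy sketch in Python. It counts which word follows which in a tiny corpus; a real foundation model learns these probabilities with a Transformer and gradient descent rather than by counting, so treat this purely as an illustration of the objective:

```python
from collections import Counter, defaultdict

# Toy corpus: the "vast amounts of text" here is just three sentences.
corpus = "my cat has brown eyes . my cat has soft fur . my dog has brown fur .".split()

# Count how often each word follows a given word (a bigram model:
# the simplest possible form of next-word prediction).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Turn raw counts into a probability distribution over the next word."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("has"))  # {'brown': 0.666..., 'soft': 0.333...}
```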

Watsonx.ai relies on a library curated by IBM, with foundation models from IBM itself and from partner providers.

Getting back to Prompt Lab

As seen in the previous post, we attempted a content generation task by asking the selected foundation model to provide a snippet of music lyrics.

Note that the selected model was: llama-2-70b-chat

This foundation model comes from Meta and performs well on many tasks, including content generation.

To switch the foundation model, simply click on the model name and select “View all foundation models”.

When “View all foundation models” is clicked, all the foundation models curated by IBM are visible, along with information about them, their providers, and their types.

For example, if the IBM model granite-20b-multilingual is selected, all its documentation will load.

The documentation is quite extensive, and this applies to any foundation model chosen.

To view the tasks that this model performs best, simply look at the top part of the pop-up window.

Next to the current “details” tab, there’s also access to “training taxonomy”, where additional information can be found.

To use the model, click “Select model”, or go back and choose another one.

In this article, the prompt type selected is Freeform.

Now, there are other important points to consider when performing a task with a foundation model.

Click on “Model parameters.”

Model parameters

The model parameters determine how the model will behave when performing your task.

  • Decoding

To understand what a decoder is, this section introduces a basic understanding of Transformers.

Transformers are responsible for transforming your input and giving the desired output.

Inside a Transformer there are an encoder and a decoder.

The encoders all have the same structure (but do not share the same weights), each consisting of two sub-layers:

  • The first sub-layer is a self-attention layer, which enables the encoder to consider other words in the input sentence while encoding a specific word.
  • The outputs of the self-attention layer are then passed through a feed-forward neural network. This same feed-forward network is applied independently to each position.

The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence.

The encoder processes the input sequence first. Then, its output is turned into attention vectors (K and V), which each decoder layer uses to focus on relevant parts of the input. A softmax layer turns the final scores into probabilities, ensuring they add up to 1.0, and the highest-probability cell determines the word generated.
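As a small illustration of that last step, here is a minimal softmax sketch in Python (the scores are hypothetical decoder outputs, not values from any real model):

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1.0."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical decoder scores for three candidate words.
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # ≈ [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0
```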

Decoder models use only the decoder part of a Transformer. At each step, attention layers access only preceding words, making them auto-regressive. Decoder models are often pre-trained to predict the next word, making them great for text generation tasks.

Getting back to Prompt Lab

In Watsonx.ai, there are two types of decoding to choose from.

Greedy

Greedy decoding selects the word with the highest probability at each step of the decoding process.

Let’s take a basic example to see how greedy decoding works in practice.

Imagine that your model is trained to generate text, and greedy decoding will be used to construct a sentence based on the input prompt “My cat has brown…”

During each step of the decoding process, the model forecasts the next word by considering the probabilities associated with words in its vocabulary. With greedy decoding, it simply chooses the word with the highest probability at each step.

  • “My cat has brown” + “eyes”: 0.6
  • “My cat has brown” + “fur”: 0.3

Greedy decoding would select “eyes” because it has the highest probability (0.6). This process continues for each subsequent word until a stopping condition is met or the desired length of the generated text is attained.
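Here is a minimal sketch of that selection rule in Python, using the article’s probabilities plus a hypothetical third candidate (“spots”) so they sum to 1.0:

```python
def greedy_step(probs):
    """Greedy decoding: always pick the highest-probability word."""
    return max(probs, key=probs.get)

# The example distribution for "My cat has brown..."
# ("spots" is a hypothetical third option so probabilities sum to 1.0).
probs = {"eyes": 0.6, "fur": 0.3, "spots": 0.1}
print(greedy_step(probs))  # always "eyes"
```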

Sampling

Sampling decoding is more variable and more random than greedy decoding.

With sampling, instead of selecting the word with the highest probability at each step, the model randomly samples a word from the probability distribution generated by the model.

For example, considering the prompt “My cat has…”, if the model predicts the following probabilities for the next word:

  • “My cat has” + “brown eyes”: 0.6
  • “My cat has” + “soft fur”: 0.3

With sampling, the model may randomly choose “brown eyes” with a probability of 0.6 or “soft fur” with a probability of 0.3. The selection is random, but the probabilities influence the likelihood of each word being chosen.

Sampling introduces diversity in generated text compared to greedy decoding as it incorporates randomness into the selection process. However, it may result in less coherent or relevant output at times.
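A minimal sketch of sampling decoding, again padding the article’s probabilities with a hypothetical third option:

```python
import random

def sample_step(probs):
    """Sampling decoding: draw the next word at random,
    weighted by the model's probability for each candidate."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

# The example distribution for "My cat has..."
# ("long whiskers" is hypothetical so probabilities sum to 1.0).
probs = {"brown eyes": 0.6, "soft fur": 0.3, "long whiskers": 0.1}
print(sample_step(probs))  # "brown eyes" about 60% of the time
```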

In Sampling Decoding, there are some configuration parameters.

Temperature: The higher the temperature, the more creative your output is going to be.

Top K: It’s like a rule that tells the model to consider only a handful of the most likely words when trying to guess the next word in the sentence.

Top P: Top P picks words only up to a certain point, depending on how likely they are. It stops the model from going too wild with unlikely choices.
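To see how these three knobs reshape the next-word distribution before sampling, here is a toy sketch; the exact transformations inside watsonx.ai may differ (for instance, temperature is usually applied to raw logits, which this sketch approximates by working on probabilities):

```python
import math

def apply_temperature(probs, temperature):
    """Higher temperature flattens the distribution (more creative);
    lower temperature sharpens it (more predictable)."""
    scaled = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

def apply_top_k(probs, k):
    """Keep only the k most likely words, then renormalize."""
    top = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(top.values())
    return {w: p / total for w, p in top.items()}

def apply_top_p(probs, p):
    """Keep the smallest set of words whose cumulative probability
    reaches p, then renormalize."""
    kept, cumulative = {}, 0.0
    for word, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[word] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {w: pr / total for w, pr in kept.items()}

# Hypothetical next-word distribution.
probs = {"eyes": 0.6, "fur": 0.3, "spots": 0.08, "wings": 0.02}
print(apply_temperature(probs, 2.0))  # flatter: unlikely words gain ground
print(apply_top_k(probs, 2))          # {'eyes': 0.666..., 'fur': 0.333...}
print(apply_top_p(probs, 0.9))        # keeps 'eyes' and 'fur' (0.6 + 0.3 = 0.9)
```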

Random Seed:

The random seed is the starting number for the model’s random number generator, used to randomize token choices. If you want consistent results in your experiments, choose and specify the same number each time you run them.

  • Supported range: Integer from 1 to 4,294,967,295
  • Default: Randomly generated

How to use: Set the same random seed value for repeatable results.
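A quick illustration of why the seed gives repeatable results, using Python’s random module as a stand-in for the model’s random number generator (the seed value 42 is an arbitrary choice):

```python
import random

probs = {"brown eyes": 0.6, "soft fur": 0.3, "long whiskers": 0.1}
words, weights = list(probs), list(probs.values())

# Same seed -> same sequence of "random" choices, so results are repeatable.
for _ in range(2):
    random.seed(42)  # any fixed integer works here
    print([random.choices(words, weights=weights, k=1)[0] for _ in range(5)])
# Both lines print the identical five words.
```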

Repetition Penalty:

If your generated output keeps repeating, consider using a repetition penalty. This penalty reduces the chances of the model repeating recently used tokens. A higher penalty value results in more varied output.

  • Supported range: Floating-point number from 1.0 (no penalty) to 2.0 (maximum penalty)
  • Default: 1.0

How to use: Increase the penalty to decrease text repetition in the output.
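As a rough illustration, a repetition penalty can be thought of as shrinking the scores of recently used tokens before the next word is picked. The sketch below is a simplified toy version; watsonx.ai’s actual implementation may differ:

```python
def penalize_repeats(scores, recent_tokens, penalty=1.5):
    """Toy repetition penalty: shrink the scores of recently used tokens
    so they are less likely to be picked again (penalty 1.0 = no change)."""
    return {
        tok: (score / penalty if tok in recent_tokens else score)
        for tok, score in scores.items()
    }

# Hypothetical scores where "cat" was just generated.
scores = {"cat": 4.0, "dog": 3.0, "fish": 2.0}
print(penalize_repeats(scores, recent_tokens={"cat"}, penalty=2.0))
# {'cat': 2.0, 'dog': 3.0, 'fish': 2.0} -> "cat" no longer dominates
```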

Additionally, note that whether you use greedy or sampling decoding, there are configuration parameters that are present for both:

  • Min Tokens: Parameter that controls the minimum number of tokens in the generated output
  • Max tokens: Parameter that controls the maximum number of tokens in the generated output

As a general rule, there are approximately 750 words for every 1,000 tokens.

  • Stopping criteria: Text generation stops after the model considers the output to be complete, a stop sequence is generated, the maximum token limit is reached, or the model generation time limit is reached.
  • Stop sequences: A stop sequence is a string of one or more characters. If you specify stop sequences, the model will automatically stop generating output after one of the stop sequences that you specify appears in the generated output.
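To tie the section together, here is a hedged sketch of how these parameters might appear in a request to the watsonx.ai text generation REST API. The endpoint, version date, and parameter names follow IBM’s documentation at the time of writing, but verify them against the current docs; the project ID, token, and parameter values are placeholders:

```python
import requests  # assumes you already have an IBM Cloud IAM bearer token

# Sketch of a text-generation request body; parameter names follow IBM's
# documented watsonx.ai API, but check the current docs before relying on them.
body = {
    "model_id": "meta-llama/llama-2-70b-chat",
    "input": "My cat has brown",
    "project_id": "YOUR_PROJECT_ID",        # hypothetical placeholder
    "parameters": {
        "decoding_method": "sample",        # or "greedy"
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.9,
        "random_seed": 42,
        "repetition_penalty": 1.2,
        "min_new_tokens": 10,
        "max_new_tokens": 200,              # ~150 words at 750 words per 1,000 tokens
        "stop_sequences": ["\n\n"],
    },
}

response = requests.post(
    "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-29",
    headers={"Authorization": "Bearer YOUR_IAM_TOKEN"},  # hypothetical token
    json=body,
)
print(response.json())
```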

To keep this article from getting even longer, please follow the link to continue with Getting Started with Watsonx.ai part III: https://medium.com/@nathalia.trazzi/getting-started-with-watsonx-ai-iii-93d5b8e55073
