Demystifying the parameters that affect the output of Large Language Models

Sharath S Hebbar
Nov 13, 2023


Large Language Models (LLMs) are machine learning models trained on massive amounts of text data to generate human-like text or perform language-related tasks. These models are designed to understand and generate text in a way that mimics human language patterns and structures, and they can be thought of as the next generation of traditional natural language processing (NLP) capabilities. They have revolutionized the field of natural language processing, serving as the foundation for cutting-edge NLP applications such as Google Bard, OpenAI’s ChatGPT, and many others. These applications harness the power of large language models, which are trained on massive corpora and refined with reinforcement learning techniques.

These models are built on top of the Transformer architecture.

Refer to the “Attention Is All You Need” paper: https://github.com/SharathHebbar/Data-Science-and-ML/blob/main/papers/Attention%20is%20all%20u%20need.pdf

Get to know the basics of Transformers: https://medium.com/@sharathhebbar24/transformers-an-intution-3ef6ef3b15f5

The evolution of Language models is as follows:

Large Language Models Evolution Tree

Hyperparameters are essentially settings that are decided outside of the model’s learned weights, and they significantly influence how the model learns and performs. For controlling an LLM’s output, the ones that matter most are the generation (decoding) parameters set at inference time.

So, to get the best output from an LLM, we need to tune these hyperparameters.

Important Hyperparameters to watch out for

1. Temperature:

It controls the degree of randomness in token selection. Lower temperatures are good for prompts that expect a true or correct response, while higher temperatures can lead to more diverse or unexpected results. With a temperature of 0, the highest-probability token is always selected. For most use cases, try starting with a temperature of 0.2.
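As a rough illustration, here is a minimal sketch of setting the temperature with the Hugging Face Transformers generate API (the model name “gpt2”, the prompt, and the exact values are placeholders chosen for the example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal language model from the Hub works similarly.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Lower temperature -> more deterministic output; higher -> more diverse.
# Temperature only takes effect when sampling is enabled (do_sample=True).
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,  # silences a padding warning for GPT-2
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that in this library the temperature-of-0 behaviour described above corresponds to greedy decoding (do_sample=False) rather than literally passing temperature=0.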

2. Token Limit:

It determines the maximum number of tokens the model can generate in response to a single prompt.
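Reusing the model, tokenizer, and inputs from the temperature sketch above, the token limit maps in that same API to max_new_tokens (max_length is an older variant that also counts the prompt tokens):

```python
# Cap the generated continuation at 50 new tokens (prompt tokens excluded).
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```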

3. Top-k:

It changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-k of 3 means the next token is selected from among the 3 most probable tokens (using temperature).
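A corresponding sketch, again reusing the earlier setup; the values are arbitrary examples:

```python
# Sample only from the 3 most probable tokens at each step.
# top_k=1 would be equivalent to greedy decoding.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=3,
    temperature=0.7,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```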

4. Top-p:

It changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature).
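And the equivalent nucleus-sampling sketch, with the same caveats:

```python
# Sample from the smallest set of tokens whose cumulative probability
# reaches 0.5 (nucleus sampling).
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.5,
    temperature=0.7,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```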

5. Stop Sequence:

A stop sequence is a series of characters that halts response generation when the model produces it. The sequence is not included as part of the response.
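Hosted APIs usually accept a stop string directly; in the Transformers library, one way to sketch the same behaviour is a custom StoppingCriteria, reusing the model and tokenizer from above (the stop string "\n\n" is just an example, and here the stop text would still need to be trimmed from the decoded output manually):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSequence(StoppingCriteria):
    """Stops generation once the decoded output contains a given string."""

    def __init__(self, stop_string: str, tokenizer):
        self.stop_string = stop_string
        self.tokenizer = tokenizer

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Assumes batch size 1 for simplicity.
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return self.stop_string in text

stop = StoppingCriteriaList([StopOnSequence("\n\n", tokenizer)])
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    stopping_criteria=stop,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```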

6. Block Abusive Words:

Adjusts how likely you are to see responses that could be harmful. Model responses are blocked based on the probability that the response contains violent, sexual, toxic, or derogatory content.

7. Return response:

The maximum number of model responses generated per prompt. Responses can still be blocked due to safety filters.
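In the Transformers generate API this roughly corresponds to num_return_sequences (hosted APIs may call it a candidate count); the sketch below reuses the earlier setup and enables sampling, since greedy decoding would return identical candidates:

```python
# Generate 3 independently sampled responses for the same prompt.
outputs = model.generate(
    **inputs,
    do_sample=True,
    num_return_sequences=3,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
for i, seq in enumerate(outputs):
    print(f"Response {i + 1}: {tokenizer.decode(seq, skip_special_tokens=True)}")
```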

Note: Not all of these parameters can be applied to every model, so make sure to do your research before trying out these hyperparameters.

Link to the repo: https://github.com/SharathHebbar/Transformers/tree/main/config

Link for further research: https://huggingface.co/docs/transformers/main_classes/text_generation

To learn more about Transformers, keep an eye on this repo, as I will be adding my learnings here:
https://github.com/SharathHebbar/Transformers
