The Impact of Temperature in LLMs: Balancing Determinism and Creativity

jithin james
2 min read · Jul 12, 2023


Notes from the course

How temperature affects next-word prediction

Temperature plays a crucial role in the behavior of LLMs (Large Language Models) by influencing the randomness and creativity of the generated text. It is a parameter that controls the diversity of the output: the model's raw scores (logits) are divided by the temperature before being passed through the softmax function, which converts them into the probabilities of the next word in a sequence.
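This scaling is easy to see in isolation. The sketch below is an illustrative helper (not part of any LLM API): it divides a small set of logits by the temperature, applies softmax, and shows how a low temperature sharpens the distribution while a higher one flattens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then apply softmax.

    Lower temperature -> sharper (more peaked) distribution;
    higher temperature -> flatter (more uniform) distribution.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.1))  # nearly one-hot on the top token
print(softmax_with_temperature(logits, 1.0))  # probability spread across tokens
```

With temperature 0.1 almost all probability mass lands on the highest-scoring token; at 1.0 the other tokens keep a meaningful share.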

When the temperature is low (e.g., close to zero), the probability distribution sharpens around the highest-scoring tokens, so the generated text tends to be focused and deterministic. The LLM sticks to its most confident predictions, producing output that is consistent and repeatable. This is useful when a conservative and predictable response is desired.

Conversely, when the temperature is high (e.g., closer to one or above), the distribution flattens and lower-probability words are sampled more often, so the generated text becomes more diverse and creative. The added randomness allows for unexpected and imaginative output, which can be beneficial for tasks that require generating a variety of potential responses or exploring alternative possibilities.
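The effect on sampling can be sketched with a toy next-token distribution. This is a hypothetical example (the logits and the `sample_next_token` helper are made up for illustration): at a low temperature nearly every draw picks the top token, while a higher temperature lets the other tokens through.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are reproducible

def sample_next_token(logits, temperature, rng):
    # Temperature-scaled softmax, then one random draw from the result.
    probs = np.exp(np.asarray(logits, dtype=float) / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = [3.0, 1.5, 1.0, 0.2]          # toy scores for 4 candidate tokens
low = [sample_next_token(logits, 0.2, rng) for _ in range(20)]
high = [sample_next_token(logits, 1.5, rng) for _ in range(20)]
print("T=0.2:", low)    # concentrates on the highest-logit token
print("T=1.5:", high)   # spreads draws across more tokens
```

The same logits produce very different samples depending only on the temperature, which is exactly the coherence-versus-diversity trade-off described above.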

The choice of temperature depends on the specific application and the desired output. A lower temperature might be appropriate for tasks that require precision and consistency, such as question-answering or fact-based generation. On the other hand, a higher temperature can be advantageous for tasks that value creativity and exploration, such as storytelling or generating diverse responses in chatbots.

Finding the right balance in temperature is crucial to achieving the desired output from an LLM. It allows users to control the trade-off between coherence and randomness, tailoring the generated text to suit the specific requirements of the task at hand. By adjusting the temperature parameter, LLMs can adapt their output to various contexts and enhance their usefulness in a wide range of applications.

import openai  # pre-1.0 SDK; openai.ChatCompletion was removed in openai>=1.0

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

prompt = "Suggest a name for a coffee shop."  # example prompt
response = get_completion(prompt, temperature=0.7)
