Demystifying Temperature in Machine Learning

Tuning the Randomness

Dagang Wei
May 10, 2024

This article is part of the series Demystifying Machine Learning.

Introduction

Temperature? In machine learning? You might be surprised to learn that temperature isn’t just for measuring the weather. It plays a crucial role in how some machine learning models make decisions, particularly in tasks like text generation and classification.

The Math Behind the Heat

Let’s dive into the concept of temperature. Imagine a machine learning model that predicts the next word in a sentence. The model assigns a probability score to each possible word: the higher the score, the more likely the model thinks that word comes next.

Here’s where temperature comes in. It acts like a dial, controlling how “peaky” these probabilities are.

  • High Temperature (T > 1): The model assigns more even probabilities across all words. This makes it more likely to choose unexpected or creative options, encouraging exploration.
  • Low Temperature (T < 1): The probabilities become more concentrated on the highest-scoring words. The model becomes more conservative, favoring the most likely choice.

Mathematically, temperature is applied through the softmax function, which transforms raw scores (logits) into probabilities that sum to 1. Each logit is divided by the temperature before softmax is applied. Here’s the equation:

probability(i) = exp(logit(i) / T) / sum(exp(logit(j) / T) for all j)
  • probability(i): The probability of the i-th element.
  • logit(i): The raw score (logit) of the i-th element.
  • T: The temperature.

As you can see, dividing by a temperature above 1 shrinks the gaps between logits, so after exponentiation the probabilities even out; dividing by a temperature below 1 widens those gaps, concentrating probability on the top choices. In the limits, T → ∞ gives a uniform distribution, while T → 0 approaches a one-hot pick of the argmax.
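
To make the equation concrete, here is a minimal from-scratch sketch in plain Python (standard library only; the max-subtraction is a common numerical-stability trick, not part of the formula itself):

import math

def softmax_with_temperature(logits, temperature):
    # Divide each logit by the temperature, exactly as in the equation.
    scaled = [l / temperature for l in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]
print(softmax_with_temperature(logits, 100.0))  # near-uniform: exploration
print(softmax_with_temperature(logits, 0.01))   # near one-hot: exploitation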

So Why Use Temperature?

Machine learning models often deal with uncertainty. Temperature helps us control the balance between exploration (trying new things) and exploitation (focusing on the most likely option). Here are some applications:

  • Text Generation: High temperature can lead to more creative and surprising text, while low temperature promotes coherence and adherence to the expected style (see the sampling sketch after this list).
  • Image Classification: Temperature scaling is a standard way to calibrate a classifier’s confidence, softening over-confident softmax outputs so the predicted probabilities better match how often the model is actually right.
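
To illustrate the text-generation case, here is a small, hypothetical sketch of temperature sampling using TensorFlow’s tf.random.categorical (the logits and the repeated-draw loop are made up for the demo; in practice a model would produce the logits):

import tensorflow as tf

# A batch of one "next token" score vector (made-up values).
logits = tf.constant([[1.0, 2.0, 3.0]])

def sample_token(logits, temperature):
    # Scale the logits by the temperature, then draw one sample
    # from the resulting categorical distribution.
    return tf.random.categorical(logits / temperature, num_samples=1)

# Higher temperature gives more varied picks; lower temperature
# almost always picks index 2, the highest-scoring token.
for t in (2.0, 1.0, 0.5):
    samples = [int(sample_token(logits, t)[0, 0]) for _ in range(10)]
    print(f"T={t}: {samples}")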

By adjusting the temperature, we can fine-tune the behavior of our models for specific tasks.

Example in Python

Let’s see how this works in Python using the TensorFlow library. The code is available in this colab notebook.

import tensorflow as tf

# Sample logits
logits = tf.constant([1.0, 2.0, 3.0])

# Softmax with high temperature (T=2)
high_temp_probs = tf.nn.softmax(logits / 2.0)
# Softmax with medium temperature (T=1)
medium_temp_probs = tf.nn.softmax(logits)
# Softmax with low temperature (T=0.5)
low_temp_probs = tf.nn.softmax(logits / 0.5)

# Print the probabilities
print("High Temperature:", high_temp_probs.numpy())
print("Medium Temperature:", medium_temp_probs.numpy())
print("Low Temperature:", low_temp_probs.numpy())

Output:

High Temperature: [0.18632373 0.3071959  0.5064804 ]
Medium Temperature: [0.09003057 0.24472848 0.66524094]
Low Temperature: [0.01587624 0.11731043 0.86681336]

This code defines some sample logits and computes the softmax probabilities at high, medium, and low temperatures. Running it shows how the probability distribution sharpens as the temperature drops.
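
One practical caveat: the formula divides by T, so T = 0 is mathematically undefined. Many sampling APIs special-case it as greedy (argmax) decoding; here is a sketch of that convention (the special-casing is an assumption about common practice, not something every library guarantees):

import tensorflow as tf

def pick_index(logits, temperature):
    # Convention assumed here: T == 0 means deterministic argmax
    # ("greedy decoding") instead of dividing by zero.
    if temperature == 0.0:
        return int(tf.argmax(logits))
    scaled = (logits / temperature)[None, :]  # add a batch dimension
    return int(tf.random.categorical(scaled, num_samples=1)[0, 0])

logits = tf.constant([1.0, 2.0, 3.0])
print(pick_index(logits, 0.0))  # always 2
print(pick_index(logits, 1.0))  # usually 2, sometimes 0 or 1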

Conclusion

Temperature might seem like a simple concept, but it has a significant impact on how machine learning models make decisions. By understanding the math and its implementation in Python, you can unlock new ways to control and optimize your models!
