Entropy-Regularized Reinforcement Learning Explained
Learn more reliable, robust, and transferable policies by adding entropy bonuses to your algorithm
Entropy is a concept associated with disorder, randomness, or uncertainty; for random variables, it can be viewed as a measure of information. Traditionally it is associated with fields such as thermodynamics, but the term has found its way into many other domains.
In 1948, Claude Shannon introduced the notion of entropy to information theory. In this context, an event offers more information when it has a lower probability of occurring: the information content of an event is inversely related to its probability. Intuitively, we learn more from rare events.
The notion of entropy can be formalized as follows:
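For a discrete random variable $X$ with probability mass function $p(x)$, the information content of an outcome is $I(x) = -\log p(x)$ (rarer outcomes carry more information), and Shannon entropy is its expected value:

$$H(X) = \mathbb{E}\big[I(X)\big] = -\sum_{x} p(x)\,\log p(x).$$

A uniform distribution maximizes this quantity, while a distribution concentrated on a single outcome has zero entropy.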
In Reinforcement Learning (RL), the notion of entropy is used as well, primarily to encourage exploration. In this context, entropy measures the unpredictability (randomness) of the actions returned by a stochastic policy.
Concretely, RL algorithms take the entropy of the policy (i.e., the probability distribution over actions) as a bonus and embed it as a reward component. This article addresses the basic case, but entropy bonuses are an integral part of many…
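As a minimal sketch (not the article's own code), the snippet below assumes a categorical policy parameterized by logits and a hypothetical coefficient `alpha` weighting the bonus; it augments each per-step reward with the entropy of the policy at that state:

```python
import torch
from torch.distributions import Categorical

def entropy_augmented_reward(rewards, logits, alpha=0.01):
    """Add an entropy bonus to each per-step reward.

    rewards : [batch] environment rewards
    logits  : [batch, n_actions] policy outputs at the corresponding states
    alpha   : weight of the entropy bonus (exploration strength, assumed value)
    """
    policy = Categorical(logits=logits)   # pi(a | s) for each state in the batch
    entropy = policy.entropy()            # H(pi(. | s)), shape [batch]
    return rewards + alpha * entropy      # entropy-regularized reward
```

A closely related formulation adds the same entropy term to the policy-gradient loss instead of the per-step reward; either way, the agent is encouraged to keep its action distribution stochastic, with `alpha` trading off exploration against exploitation.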