How probable is it that training to maximise the total entropy would lead to catastrophic…
Richa Verma

Hi Richa,

There is no obvious reason why optimizing for entropy should be any more likely to cause catastrophic forgetting than optimizing for reward. If anything, when tasks are learned one after another, optimizing only for the reward of each task could arguably make catastrophic forgetting more likely, since the policy would then overfit to each sub-task in turn.
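
To make the comparison concrete, here is a rough sketch of how an entropy term changes what gets optimized. The thread does not state the exact objective, so this assumes a discrete action space and a simple "reward plus weighted entropy" form; the function names, the weight `alpha`, and the example distributions are all made up for illustration.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_augmented_objective(expected_reward, probs, alpha=0.1):
    """Expected reward plus an entropy bonus.

    alpha is a hypothetical weighting; alpha = 0 recovers the
    reward-only objective.
    """
    return expected_reward + alpha * policy_entropy(probs)


# A policy that has collapsed onto one action (the kind of solution a
# reward-only objective can overfit to) versus a more spread-out one.
peaked = [0.97, 0.01, 0.01, 0.01]
broad = [0.40, 0.30, 0.20, 0.10]

print("entropy (peaked):", policy_entropy(peaked))
print("entropy (broad): ", policy_entropy(broad))
```

With the same expected reward, the entropy bonus favors the broader policy, while setting `alpha` to 0 leaves only the reward term. This only illustrates the objective itself and says nothing on its own about how much forgetting occurs in practice.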