If the optimal policy is deterministic, do variance and bias still make sense?
Fabio Zinno

Hi Fabio,

If you are using an optimal policy that is deterministic, and the environment is also deterministic, then every rollout produces the same return, so you will have zero variance and zero bias (unless you are using a function approximator with limited representational capacity).

During the actual learning process, however, this is almost never the case, since some amount of exploration is required in order to improve the policy.
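
To make the contrast above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy environment (five steps, reward 1 for the optimal action and 0 otherwise), the epsilon-greedy exploration scheme, and the names `rollout` and `return_stats` are all hypothetical and not tied to any particular algorithm or library.

```python
import random
import statistics


def rollout(epsilon, rng, n_steps=5):
    """Return of one episode in a toy deterministic environment (assumed).

    At every step the optimal action (1) yields reward 1 and the other
    action (0) yields reward 0. With probability epsilon the agent
    explores by picking an action uniformly at random.
    """
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.choice([0, 1])  # exploratory action
        else:
            action = 1  # deterministic optimal action
        total += 1.0 if action == 1 else 0.0
    return total


def return_stats(epsilon, n_episodes=1000, seed=0):
    """Sample mean and population variance of the returns over many episodes."""
    rng = random.Random(seed)
    returns = [rollout(epsilon, rng) for _ in range(n_episodes)]
    return statistics.mean(returns), statistics.pvariance(returns)


# Deterministic optimal policy in a deterministic environment.
det_mean, det_var = return_stats(epsilon=0.0)
# Same environment, but with exploration during learning.
exp_mean, exp_var = return_stats(epsilon=0.3)

print("deterministic:", det_mean, det_var)
print("exploring:    ", exp_mean, exp_var)
```

With `epsilon=0.0` every episode returns exactly 5.0, so the sample variance is 0.0. With `epsilon=0.3` the returns differ from episode to episode, so the variance is positive, and the average return also drops below the optimal value of 5.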