1 min read · Nov 8, 2018
Assuming the overestimation exists, I understand that if all values are overestimated uniformly, the optimal policy (greedy with respect to those values) remains the same as it would be with no overestimation. But why is the overestimation not uniform?
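
To make the premise concrete, here is a minimal sketch in Python of why uniformity matters for the greedy policy. The action values and error terms are made-up numbers chosen purely for illustration, and `greedy_action` is a hypothetical helper, not taken from any particular library or paper. The sketch does not explain where non-uniform errors come from; it only shows that a constant offset leaves the argmax unchanged, while action-dependent offsets can change it.

```python
import numpy as np


def greedy_action(q_values):
    """Return the index of the action with the highest estimated value."""
    return int(np.argmax(q_values))


# Hypothetical true action values for a single state (illustrative only).
true_q = np.array([1.0, 2.0, 1.5])

# Uniform overestimation: every action is inflated by the same amount.
uniform_q = true_q + 0.7

# Non-uniform overestimation: each action is inflated by a different amount.
nonuniform_q = true_q + np.array([1.2, 0.0, 0.1])

greedy_true = greedy_action(true_q)
greedy_uniform = greedy_action(uniform_q)
greedy_nonuniform = greedy_action(nonuniform_q)

print("greedy on true values:       ", greedy_true)
print("greedy with uniform error:   ", greedy_uniform)
print("greedy with non-uniform error:", greedy_nonuniform)
```

With these assumed numbers, the uniform shift keeps the same greedy action as the true values, whereas the non-uniform shift makes a worse action look best.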