Assume the overestimation exists, I know that if all values overestimate uniformly, the optimal…
Huang henry

Why should they be uniform? You cannot guarantee that the overestimations are concentrated only around the states you want to learn more about. By chance, you may end up in a region of the state space that is of no interest for the best possible solution.

But of course you can also end up in the region of states of interest, where the overestimations won't cause any problems at all.
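
To make the difference concrete, here is a minimal sketch in Python with NumPy. The action values and the size of the bias are made-up numbers chosen purely for illustration, not taken from any real environment. It compares the greedy action under the true values, under a uniform overestimation, and under an overestimation that hits only one action.

```python
import numpy as np

# Hypothetical true action values for a single state (illustrative numbers only).
true_q = np.array([1.0, 0.8, 0.5])

# Uniform overestimation: every action is inflated by the same constant.
uniform_q = true_q + 0.5

# Non-uniform overestimation: only the second action is inflated.
nonuniform_q = true_q + np.array([0.0, 0.5, 0.0])

# Greedy action under each set of estimates.
best_true = int(np.argmax(true_q))
best_uniform = int(np.argmax(uniform_q))
best_nonuniform = int(np.argmax(nonuniform_q))

print(best_true, best_uniform, best_nonuniform)  # prints: 0 0 1
```

Adding the same constant to every value leaves the ranking of actions unchanged, so the greedy choice stays the same. Once the bias differs between actions, a suboptimal action can overtake the best one, which is the case the answer above is worried about.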