ate our Q func…o an estimated reward, called the Q function. First, we’ll initialize all Q values at equal values. Then, we’ll update the Q value of each action (picking each chest) based on how good the payout was after choosing this action. This allows us to learn a good value function. We will approximate our Q function using a neural ne…