🤯AI: Deep Reinforcement Stack
Mar 20, 2023
Taking a peek at the DRL toolbox.
[This article is a part of the 🤯AI series]
The literature on Deep Reinforcement Learning is still heavily academic and littered with Greek symbols (aka Math). If you are just getting started, it can be a little overwhelming to figure out how to go from this:
to being able to run this:
def optimize_model(self):
    T = len(self.rewards)
    # Discount factors [gamma^0, gamma^1, ..., gamma^(T-1)]
    discounts = np.logspace(0, T, num=T, base=self.gamma, endpoint=False)
    # Discounted return G_t at every time step t of the episode
    returns = np.array([np.sum(discounts[:T-t] * self.rewards[t:]) for t in range(T)])
    discounts = torch.FloatTensor(discounts).unsqueeze(1)
    returns = torch.FloatTensor(returns).unsqueeze(1)
    self.logpas = torch.cat(self.logpas)
    # Policy-gradient loss: ascend on expected discounted return
    policy_loss = -(discounts * returns * self.logpas).mean()
    self.policy_optimizer.zero_grad()
    policy_loss.backward()
    self.policy_optimizer.step()
Source: Implementation of REINFORCE by Miguel Morales
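The densest part of that snippet is the vectorized discounting: np.logspace with base=gamma produces the whole sequence of discount factors in one call. Here is a minimal sketch of just that trick, using a made-up three-step episode with reward 1.0 at every step:

    import numpy as np

    # Hypothetical episode: three steps, reward 1.0 at each step
    rewards = np.array([1.0, 1.0, 1.0])
    gamma = 0.99
    T = len(rewards)

    # logspace with base=gamma yields [gamma^0, gamma^1, ..., gamma^(T-1)]
    discounts = np.logspace(0, T, num=T, base=gamma, endpoint=False)
    print(discounts)  # [1.     0.99   0.9801]

    # Discounted return G_t for every time step t, as in the snippet above
    returns = np.array([np.sum(discounts[:T - t] * rewards[t:]) for t in range(T)])
    print(returns)  # [2.9701 1.99   1.    ]

Each return G_t sums the rewards from step t onward, each weighted by how far in the future it is, which is exactly what the REINFORCE loss needs per time step.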
Since a picture can be worth a thousand words, here is the stack I tend to use most often.
Want to kick the tires with this stack? See my previous article in this series.