🤯AI: Deep Reinforcement Stack
Mar 20, 2023
Taking a peek at the DRL toolbox.
[This article is a part of the 🤯AI series]
The literature on Deep Reinforcement Learning is still heavily academic and littered with Greek symbols (aka Math). If you are just getting started, it can be a little overwhelming to figure out how to go from this:
to being able to run this:
def optimize_model(self):
    T = len(self.rewards)
    # Discount factors [gamma^0, gamma^1, ..., gamma^(T-1)]
    discounts = np.logspace(0, T, num=T, base=self.gamma, endpoint=False)
    # Discounted return G_t at every time step t of the episode
    returns = np.array([np.sum(discounts[:T-t] * self.rewards[t:]) for t in range(T)])
    discounts = torch.FloatTensor(discounts).unsqueeze(1)
    returns = torch.FloatTensor(returns).unsqueeze(1)
    self.logpas = torch.cat(self.logpas)
    # Policy-gradient loss: ascend on expected discounted return
    policy_loss = -(discounts * returns * self.logpas).mean()
    self.policy_optimizer.zero_grad()
    policy_loss.backward()
    self.policy_optimizer.step()
Source: Implementation of REINFORCE by Miguel Morales
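The densest part of that snippet is the vectorized discounting: np.logspace with base=gamma produces the whole sequence of discount factors in one call. Here is a minimal sketch of just that trick, using a made-up three-step episode with reward 1.0 at every step:

    import numpy as np

    # Hypothetical episode: three steps, reward 1.0 at each step
    rewards = np.array([1.0, 1.0, 1.0])
    gamma = 0.99
    T = len(rewards)

    # logspace with base=gamma yields [gamma^0, gamma^1, ..., gamma^(T-1)]
    discounts = np.logspace(0, T, num=T, base=gamma, endpoint=False)
    print(discounts)  # [1.     0.99   0.9801]

    # Discounted return G_t for every time step t, as in the snippet above
    returns = np.array([np.sum(discounts[:T - t] * rewards[t:]) for t in range(T)])
    print(returns)  # [2.9701 1.99   1.    ]

Each return G_t sums the rewards from step t onward, each weighted by how far in the future it is, which is exactly what the REINFORCE loss needs per time step.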
Since a picture can be worth a thousand words, here is the stack I tend to use most often.
Want to kick the tires with this stack? See my previous article in this series.