🤯AI: Deep Reinforcement Learning Stack

Zoiner Tejada
Mar 20, 2023

--

Taking a peek at the DRL toolbox.

[This article is a part of the 🤯AI series]

The literature on Deep Reinforcement Learning is still heavily academic and littered with Greek symbols (aka math). If you are just getting started, it can be a little overwhelming to figure out how you go from this:

Example objective function being optimized in Deep Reinforcement Learning
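For REINFORCE specifically, which the snippet below implements, a standard way to write that objective's per-episode gradient estimate (the quantity the code computes) is roughly:

$$\nabla_\theta J(\theta) \approx \sum_{t=0}^{T-1} \gamma^t \, G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t), \qquad G_t = \sum_{k=t}^{T-1} \gamma^{k-t} \, r_k$$

Here γ is the discount factor (self.gamma in the code), G_t is the discounted return from step t to the end of the episode, and π_θ is the parameterized policy.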

to being able to run this:

    def optimize_model(self):
        # Number of steps collected in the episode.
        T = len(self.rewards)

        # Per-step discount factors: gamma^0, gamma^1, ..., gamma^(T-1).
        discounts = np.logspace(0, T, num=T, base=self.gamma, endpoint=False)

        # Discounted return G_t from each time step t to the end of the episode.
        returns = np.array([np.sum(discounts[:T-t] * self.rewards[t:]) for t in range(T)])

        discounts = torch.FloatTensor(discounts).unsqueeze(1)
        returns = torch.FloatTensor(returns).unsqueeze(1)
        self.logpas = torch.cat(self.logpas)

        # REINFORCE loss: the discounted-return-weighted log-probabilities,
        # negated and averaged so that gradient descent maximizes return.
        policy_loss = -(discounts * returns * self.logpas).mean()
        self.policy_optimizer.zero_grad()
        policy_loss.backward()
        self.policy_optimizer.step()

Source: Implementation of REINFORCE by Miguel Morales
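For a sense of where self.rewards and self.logpas come from, here is a minimal, illustrative sketch of a single-episode rollout that fills those buffers before optimize_model runs. It assumes a Gymnasium CartPole environment and a small PyTorch policy network; the class and variable names are mine for illustration, not from Morales's implementation.

    # Illustrative only: one episode rollout that fills the reward/log-prob
    # buffers consumed by optimize_model. Assumes gymnasium and PyTorch.
    import gymnasium as gym
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        # Tiny policy: observation -> action logits.
        def __init__(self, obs_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions))

        def forward(self, obs):
            return self.net(obs)

    env = gym.make("CartPole-v1")
    policy = PolicyNet(env.observation_space.shape[0], env.action_space.n)

    rewards, logpas = [], []  # the buffers optimize_model consumes
    obs, _ = env.reset()
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        logpas.append(dist.log_prob(action).unsqueeze(0))  # log pi(a_t | s_t)
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    # rewards and logpas now hold one episode's data, ready for the update above

The key design point is that each step stores the log-probability of the action actually taken, so the policy-gradient loss above can weight it by the discounted return that followed.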

Since a picture can be worth a thousand words, here is the stack I tend to use most often.

Want to kick the tires with this stack? See my previous article in this series.
