Summary: Conservative Policy Iteration

Conservative Policy Iteration has three goals: (1) provide an iterative procedure that is guaranteed to improve a performance metric, (2) terminate in a “small” number of steps, and (3) find an “approximately” optimal policy. These three goals are met by relying on a few assumptions…


Summary: PlaNet

Deep Planning Network (PlaNet) is a model-based agent that learns a latent state dynamics model from images and takes actions…


Summary: Proximal Policy Optimization (PPO)

Ideas from this summary are taken from the Proximal Policy Optimization paper.

PPO offers two key improvements to policy gradient methods:

  1. The surrogate objective includes a simple first-order trust region…

Summary: Deep Deterministic Policy Gradients

This post is a summary of Continuous Control With Deep Reinforcement Learning.

The basic goal of this paper was to transfer the success that deep Q-learning achieved in discrete action domains to continuous action domains. In…