Archive of stories published by Arxiv Bytes

Zac Wellmer in Arxiv Bytes

Oct 16, 2017

Summary: Prioritized Experience Replay

Ideas from this summary are taken from the Prioritized Experience Replay paper.

Zac Wellmer in Arxiv Bytes

Feb 16, 2019

Summary: World Models

One of the core issues in Reinforcement Learning is sample complexity. Therefore it’s appealing to train RL agents in a…

Zac Wellmer in Arxiv Bytes

Sep 13, 2018

Summary: Proximal Policy Optimization (PPO)

Ideas from this summary are taken from the Proximal Policy Optimization paper.

PPO offers two key improvements to policy gradient methods:

A surrogate objective that includes a simple first-order trust region…

Zac Wellmer in Arxiv Bytes

Apr 24, 2019

Summary: SimPLe

Ideas and figures from this summary are taken from Model-Based Reinforcement Learning for Atari (SimPLe).

Zac Wellmer in Arxiv Bytes

Nov 9, 2017

Summary: Deep Deterministic Policy Gradients

This post is a summary of Continuous Control With Deep Reinforcement Learning.

The basic goal of this paper was to transfer the success of deep Q-learning in discrete action domains to continuous action domains. In…

Zac Wellmer in Arxiv Bytes

Jan 31, 2019

Summary: Learning Plannable Representations with Causal InfoGAN

Zac Wellmer in Arxiv Bytes

May 6, 2019

Summary: Conservative Policy Iteration

Conservative Policy Iteration has 3 goals: (1) an iterative procedure guaranteed to improve a performance metric, (2) terminate in a “small” number of steps, and (3) find an “approximate” optimal policy. These three goals are hit by relying on a few assumptions…

Zac Wellmer in Arxiv Bytes

Feb 24, 2019

Summary: PlaNet

Deep Planning Network (PlaNet) is a model-based agent that learns a latent state dynamics model from images and takes actions…

Zac Wellmer in Arxiv Bytes

Sep 12, 2018

Summary: TreeQN

Ideas from this summary are taken from the TreeQN and ATreeC paper.

Zac Wellmer in Arxiv Bytes

Sep 15, 2018

Summary: Value Prediction Networks (VPN)

VPN is a deep reinforcement learning architecture that mixes ideas from both model-free and model-based methods. Model-based methods generally learn environment dynamics in order to predict real observations; VPN, however, attempts to learn a dynamics model that…

These were the top 10 stories published by Arxiv Bytes; you can also dive into yearly archives: 2017, 2018, 2019, and 2020.

Arxiv Bytes

short research summaries
