Papers simplified: Reward learning from human preferences and demonstrations in Atari

Adib AKM
Jan 2
Spoiler alert: Breakout was one of the games tested in this paper!

Background

Problem

This is a common example of a search to optimize one's performance in tic-tac-toe.

Solution


Method

This is the pseudo-code for training our RL algorithm.
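
As a rough illustration of the reward-learning step named in the title, here is a minimal sketch, assuming the reward model is fit to human preferences with a Bradley-Terry style model over pairs of short clips. The function names, the `label` convention, and the plain-Python lists of per-step predicted rewards are hypothetical choices for this example; they are not taken from the paper's code, and the sketch leaves out the demonstrations and the RL agent itself.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that clip A is preferred over clip B.

    Assumes a Bradley-Terry model: a softmax over the summed predicted
    rewards of the two clips, which equals a logistic function of their
    difference.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function of the reward difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and the human label.

    label (assumed convention): 1.0 if the human preferred clip A,
    0.0 if they preferred clip B, 0.5 if they judged the clips equal.
    """
    p_a = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps)
             + (1.0 - label) * math.log(1.0 - p_a + eps))


# Hypothetical per-step reward predictions for two three-step clips.
clip_a = [0.5, 1.0, 0.5]
clip_b = [0.0, 0.5, 0.0]
loss_if_a_preferred = preference_loss(clip_a, clip_b, label=1.0)
loss_if_b_preferred = preference_loss(clip_a, clip_b, label=0.0)
```

Under these assumptions, the loss is lower when the human label agrees with the clip the reward model already scores higher. Minimizing it over many labelled pairs pushes the predicted rewards toward the human's preferences.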

Results


Conclusion


Data Driven Investor
