Predictive neural networks for reinforcement learning

Model 1 (ours)

# Step 1: frame f_t → CNN1 → embedding e_t → policy → action a_t
# Step 2: e_t, a_t → pred_net → e^_t+1
# Step 3: play step: a_t → game → f_t+1 → CNN1 → e_t+1
# Step 4: minimize ||e_t+1 - e^_t+1||

Note: our model shares a single CNN and uses no Inverse model.
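
The four steps above could be sketched in PyTorch roughly as follows; the layer sizes, `EMB = 32`, the 4-action space, and the 16x16 frames are illustrative assumptions, not from the original:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, N_ACTIONS = 32, 4

cnn1 = nn.Sequential(                       # single shared encoder CNN1
    nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, EMB))
policy = nn.Linear(EMB, N_ACTIONS)          # e_t -> action logits
pred_net = nn.Linear(EMB + N_ACTIONS, EMB)  # (e_t, a_t) -> e^_t+1

f_t = torch.randn(1, 3, 16, 16)             # current frame f_t
e_t = cnn1(f_t)                             # Step 1: embedding e_t
a_t = policy(e_t).argmax(dim=1)             # Step 1: action a_t
a_1h = F.one_hot(a_t, N_ACTIONS).float()
e_pred = pred_net(torch.cat([e_t, a_1h], dim=1))  # Step 2: predict e^_t+1

f_next = torch.randn(1, 3, 16, 16)          # Step 3: game returns f_t+1
e_next = cnn1(f_next)                       # same CNN1 reused, no Inverse model
loss = F.mse_loss(e_pred, e_next)           # Step 4: ||e_t+1 - e^_t+1||
loss.backward()
```

One caveat worth noting: minimizing this loss end-to-end can push CNN1 toward constant embeddings unless e_t+1 is detached or some other loss anchors the encoder; the sketch follows the steps as written and does not address that.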

Model 2 from here

# Step 1: policy: S_t → CNN1 + PI classifier → a_t
# Step 2: S_t, a_t → Forward model → r^_t+1; and S_t, S_t+1 → CNN2 → Inverse model → a^_t
# Step 3: minimize ||r^_t+1 - r_t+1|| and ||a_t - a^_t|| for the Forward and Inverse models, respectively

Note: two CNN models that must encode the same representation, which is inefficient.
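
A minimal PyTorch sketch of Model 2, under the same illustrative assumptions as before (layer sizes, action count, frame shape); the placeholder target `r_true` stands in for the observed r_t+1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, N_ACTIONS = 32, 4

def make_cnn():
    return nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, EMB))

cnn1, cnn2 = make_cnn(), make_cnn()             # two separate encoders
pi_classifier = nn.Linear(EMB, N_ACTIONS)       # policy head on CNN1
forward_model = nn.Linear(EMB + N_ACTIONS, 1)   # (S_t, a_t) -> r^_t+1
inverse_model = nn.Linear(2 * EMB, N_ACTIONS)   # (CNN2(S_t), CNN2(S_t+1)) -> a^_t

s_t, s_next = torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16)
a_t = pi_classifier(cnn1(s_t)).argmax(dim=1)    # Step 1: policy picks a_t
a_1h = F.one_hot(a_t, N_ACTIONS).float()
r_pred = forward_model(torch.cat([cnn1(s_t), a_1h], dim=1))        # Step 2: r^_t+1
a_logits = inverse_model(torch.cat([cnn2(s_t), cnn2(s_next)], 1))  # Step 2: a^_t via CNN2

r_true = torch.zeros(1, 1)                      # stand-in for the observed r_t+1
loss = F.mse_loss(r_pred, r_true) + F.cross_entropy(a_logits, a_t)  # Step 3
loss.backward()
```

Because `cnn1` and `cnn2` share no weights, each must independently learn a useful state representation, which is the inefficiency the note calls out.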

Model 3, modified from Model 2

# Step 1: policy: S_t → CNN1 + PI classifier → a_t
# Step 2: S_t, a_t → Forward model → r^_t+1; and S_t, S_t+1 → CNN1 → Inverse model → a^_t
# Step 3: minimize ||r^_t+1 - r_t+1|| and ||a_t - a^_t|| for the Forward and Inverse models, respectively

Note: just one CNN model is needed.
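
The change from Model 2 could be sketched as follows (same illustrative assumptions): the one encoder `cnn1` now feeds the policy, the Forward model, and the Inverse model, so gradients from both losses shape a single representation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, N_ACTIONS = 32, 4

cnn1 = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, EMB))
pi_classifier = nn.Linear(EMB, N_ACTIONS)
forward_model = nn.Linear(EMB + N_ACTIONS, 1)
inverse_model = nn.Linear(2 * EMB, N_ACTIONS)

s_t, s_next = torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16)
phi_t, phi_next = cnn1(s_t), cnn1(s_next)       # one shared encoder for both states
a_t = pi_classifier(phi_t).argmax(dim=1)        # Step 1
a_1h = F.one_hot(a_t, N_ACTIONS).float()
r_pred = forward_model(torch.cat([phi_t, a_1h], dim=1))        # Step 2: r^_t+1
a_logits = inverse_model(torch.cat([phi_t, phi_next], dim=1))  # Step 2: a^_t via CNN1

r_true = torch.zeros(1, 1)                      # stand-in for the observed r_t+1
loss = F.mse_loss(r_pred, r_true) + F.cross_entropy(a_logits, a_t)  # Step 3
loss.backward()                                 # both losses update the one CNN
```

The practical upside of the sharing is that the Inverse-model loss trains the same features the policy consumes, instead of a duplicate encoder relearning them.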