Google Dopamine: New Reinforcement Learning framework

In this article, Dopamine, Google's new reinforcement learning framework, is explained briefly, followed by a coding session in which we build and train a simple agent.

Hamza Abdullah
Published in THE 21st CENTURY
Sep 23, 2018 · 4 min read


Introduction (What is Reinforcement Learning?)

Reinforcement learning is an important branch of machine learning. It resembles the way humans and animals learn about their environment: the machine learns from the actions it performs and the results they produce.

In reinforcement learning, the learner is a decision-making agent that takes actions in an environment and receives a reward or penalty for its actions while trying to solve a problem. Through this trial-and-error process, it should learn the best policy, which is the sequence of actions that maximizes the total reward.
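
To make the idea concrete, here is a minimal, self-contained sketch of that trial-and-error loop (a toy example, not taken from Dopamine; ToyBanditEnv and ToyAgent are made up for illustration): the agent acts, the environment returns a reward, and the agent nudges its value estimates toward the actions that pay off.

import numpy as np

#toy environment: action 1 pays a reward of +1, action 0 pays 0
class ToyBanditEnv(object):
  def step(self, action):
    return 1.0 if action == 1 else 0.0

#toy agent: keeps a value estimate per action and mostly picks the best one
class ToyAgent(object):
  def __init__(self, num_actions=2, lr=0.1, epsilon=0.1):
    self.values = np.zeros(num_actions)
    self.lr = lr
    self.epsilon = epsilon

  def act(self):
    #explore with probability epsilon, otherwise exploit the best estimate
    if np.random.random() < self.epsilon:
      return np.random.randint(len(self.values))
    return int(np.argmax(self.values))

  def learn(self, action, reward):
    #move the value estimate toward the observed reward
    self.values[action] += self.lr * (reward - self.values[action])

env, agent = ToyBanditEnv(), ToyAgent()
for _ in range(500):
  action = agent.act()
  reward = env.step(action)
  agent.learn(action, reward)
print(agent.values)  #the estimate for action 1 should approach 1.0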

Reinforcement learning has gained a lot of momentum over the past few years, and a great deal of research and development has been done in this area. Google has also contributed to the field and released a new framework that aims to bring speed, stability, and reproducibility to reinforcement learning R&D.

The new framework, called Dopamine, is a reinforcement learning framework built on top of TensorFlow.

Google Dopamine

You can check the code for this article on GitHub.

Google Dopamine is a new TensorFlow-based framework that aims to provide flexibility, stability, and reproducibility for new and experienced RL researchers alike. Its name is inspired by one of the main components of reward-motivated behavior in the brain and reflects the strong historical connection between neuroscience and reinforcement learning research.

Dopamine is an open-source framework with the following features:

· Easy experimentation: Make it easy for new users to run benchmark experiments.

· Flexible development: Make it easy for new users to try out research ideas.

· Compact and reliable: Provide implementations for a few, battle-tested algorithms.

· Reproducible: Facilitate reproducibility in results.

Google has provided a GitHub repository with well-defined code and a clear explanation of how the framework works.
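
For instance, a baseline DQN experiment can be launched directly from a clone of the repository with a single command roughly like the one below (based on the repository README at the time of writing; the module path and gin config file may have changed since):

python -um dopamine.atari.train \
  --agent_name=dqn \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin'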

Now we move on to the coding session for this article, in which we build a simple agent, train it, and then look at the results.

Install necessary packages

First, we will install the packages needed to build this agent from scratch.

#dopamine for RL
!pip install --upgrade --no-cache-dir dopamine-rl
# dopamine dependencies
!pip install cmake
#Arcade Learning Environment
!pip install atari_py

After installing the required packages, we will import the modules we need.

import numpy as np
import os
#DQN for baselines
from dopamine.agents.dqn import dqn_agent
from dopamine.atari import run_experiment
from dopamine.colab import utils as colab_utils
#warnings
from absl import flags

Then we will set BASE_PATH, where the training logs will be stored, and choose the game environment for which we're training the agent.

#where to store training logs
BASE_PATH = '/tmp/colab_dope_run' # @param
#which arcade environment?
GAME = 'Pong' # @param

Now we create a new agent from scratch.

#define where to store log data
LOG_PATH = os.path.join(BASE_PATH, 'basic_agent', GAME)

class BasicAgent(object):
  """This agent randomly selects an action and sticks to it. It will change
  actions with probability switch_prob."""

  def __init__(self, sess, num_actions, switch_prob=0.1):
    #tensorflow session
    self._sess = sess
    #how many possible actions can it take?
    self._num_actions = num_actions
    # probability of switching actions in the next timestep?
    self._switch_prob = switch_prob
    #initialize the action to take (randomly)
    self._last_action = np.random.randint(num_actions)
    #not debugging
    self.eval_mode = False

  #policy here
  def _choose_action(self):
    if np.random.random() <= self._switch_prob:
      self._last_action = np.random.randint(self._num_actions)
    return self._last_action

  #when it checkpoints during training
  def bundle_and_checkpoint(self, unused_checkpoint_dir, unused_iteration):
    pass

  #loading from checkpoint
  def unbundle(self, unused_checkpoint_dir, unused_checkpoint_version,
               unused_data):
    pass

  def begin_episode(self, unused_observation):
    return self._choose_action()

  def end_episode(self, unused_reward):
    pass

  def step(self, reward, observation):
    return self._choose_action()


def create_basic_agent(sess, environment):
  """The Runner class will expect a function of this type to create an agent."""
  return BasicAgent(sess, num_actions=environment.action_space.n,
                    switch_prob=0.2)


basic_runner = run_experiment.Runner(LOG_PATH,
                                     create_basic_agent,
                                     game_name=GAME,
                                     num_iterations=200,
                                     training_steps=10,
                                     evaluation_steps=10,
                                     max_steps_per_episode=100)

Now we will train the agent we just created above.

print('Training basic agent, please be patient, it may take a while...')
basic_runner.run_experiment()
print('Done training!')

Next, we load the baseline data and our training logs.

!gsutil -q -m cp -R gs://download-dopamine-rl/preprocessed-benchmarks/* /content/
experimental_data = colab_utils.load_baselines('/content')
basic_data = colab_utils.read_experiment(log_path=LOG_PATH, verbose=True)
basic_data['agent'] = 'BasicAgent'
basic_data['run_number'] = 1
experimental_data[GAME] = experimental_data[GAME].merge(basic_data,
                                                        how='outer')

Finally, we plot the results for the trained agent.

import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(16,8))
sns.tsplot(data=experimental_data[GAME], time='iteration', unit='run_number',
           condition='agent', value='train_episode_returns', ax=ax)
plt.title(GAME)
plt.show()
Result: the trained agent we just built from scratch above.

If you like this post, give it a ❤️ below so others may see it. Thank you!
