Atari Playing AI

Cuau Suarez
Inteligencia Artificial ITESM CQ
Apr 24, 2017

Using reinforcement learning (learning to take better actions in order to maximize a cumulative reward), researchers at DeepMind Technologies developed a deep learning model that learns control policies directly from high-dimensional sensory input, applied to learning and playing different games from the Atari 2600. The model is a convolutional neural network trained with a variant of Q-learning (which works by learning an action-value function that gives the expected utility of taking a given action in a given state and following the optimal policy thereafter: https://en.wikipedia.org/wiki/Q-learning). Its input is raw pixels and its output is a value function estimating future rewards.
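
To make the Q-learning idea concrete, here is a minimal sketch of the classic tabular update rule. The variable names (alpha, gamma, epsilon, Q) are standard notation rather than anything from the paper; the DQN described below replaces this lookup table with a neural network:

```python
# Minimal tabular Q-learning sketch (illustrative; the paper's DQN
# approximates Q with a convolutional network instead of a table).
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate
Q = defaultdict(float)                   # Q[(state, action)] -> expected utility

def choose_action(state, actions):
    # Epsilon-greedy: explore occasionally, otherwise take the best known action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```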

A convolutional network was used because the inputs are the frames of the Atari 2600 (210x160 RGB at 60Hz). The problem is that this is a lot of data and it is hard to process in real time, so the solution was to preprocess it: each frame is converted to gray-scale and down-sampled to a 110x84 image, which is then cropped to an 84x84 region containing the playing area.
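
A rough sketch of this preprocessing step, assuming OpenCV and NumPy; the crop offset below is only illustrative, since the actual playing-area crop is game-dependent:

```python
# Preprocessing sketch: 210x160 RGB frame -> 84x84 gray-scale playing area.
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # drop color information
    small = cv2.resize(gray, (84, 110))              # down-sample to 110x84 (cv2 takes width, height)
    return small[18:102, :]                          # crop an 84x84 area (offset assumed, game-dependent)
```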

The first hidden layer convolves 16 8x8 filters with stride 4 and applies a rectifier nonlinearity. The second hidden layer convolves 32 4x4 filters with stride 2, followed by another rectifier. The final hidden layer is fully-connected and consists of 256 rectifier units. The output layer is a fully-connected linear layer with a single output for each valid action (between 4 and 18, depending on the game). This network is referred to as a Deep Q-Network (DQN).
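
A sketch of that architecture, written here in PyTorch (an assumption; the original work predates it) and assuming the paper's input of four stacked 84x84 preprocessed frames:

```python
# DQN architecture sketch following the layer sizes described above.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions: int):                 # n_actions: 4 to 18, game-dependent
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
            nn.ReLU(),                                   # rectifier nonlinearity
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 16x20x20 -> 32x9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # fully-connected, 256 rectifier units
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one linear output per valid action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)                               # Q-values, one per action
```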

There is one special characteristic about the learning process of this architecture: a technique known as experience replay, which stores the agent's experiences at each time-step, pooled over many episodes, into a replay memory. This allows the AI to revisit actions that worked in the past and the conditions under which they worked, and to update the weights accordingly. Moreover, learning directly from consecutive samples is inefficient due to the strong correlations between them; sampling at random breaks these correlations and reduces the variance of the updates.
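
A minimal sketch of such a replay memory (hypothetical helper names; the paper stores transitions and samples random minibatches from them during training):

```python
# Experience-replay buffer sketch: store transitions, sample random minibatches.
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        # Uniform random sampling decorrelates consecutive transitions.
        return random.sample(self.buffer, batch_size)
```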

This was tested on 7 different games: Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, and Space Invaders. The AI outperformed all previous approaches on six of the games, and surpassed a human expert on three of them.

To learn more: https://arxiv.org/abs/1312.5602
