The Making of a Deep Learning Tetris AI

Wenqin Ye
Published in Wenqin’s Blog! · 5 min read · Aug 9, 2017

The code is up on GitHub!

For my final project in my computer science class, I attempted to apply deep learning to Tetris.

Having no idea where to start, I initially followed this report by a student who took Stanford’s CS231n convolutional neural networks course to guide me: http://cs231n.stanford.edu/reports/2016/pdfs/121_Report.pdf

The report describes a system similar to the architecture Google DeepMind used in its paper on playing games. The idea is to use a neural network to learn a Q-function that predicts the quality of each move, where the quality of a move is defined as the reward gained from taking that move and then playing optimally from there.
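In other words, the network learns the classic Bellman-style recursion. Here is a minimal sketch of that definition, with an assumed discount factor (the report’s exact settings may differ):

```python
import numpy as np

GAMMA = 0.9  # discount factor -- an assumed value, not taken from the report

def q_value(reward, next_state_q_values, game_over):
    """The 'quality' of a move: the reward it earns now, plus the discounted
    value of the best move available afterwards -- i.e. playing optimally from there."""
    if game_over:
        return reward
    return reward + GAMMA * float(np.max(next_state_q_values))
```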

Architecture
Knowing the general idea, I needed to decide on an architecture. I planned out three classes that would form the basis of my program: a Q-learning class, an optimizer class, and a neural network class.

The neural network class contains the basic functionality for a neural network. It allows dynamic initialization of a network of any size, with arbitrary activation functions. It also handles backpropagation, gradient caching (for mini-batches), and RMSProp.
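A rough sketch of what such a class might look like is below. This is my reconstruction for illustration only; the layer sizes, activation handling, and names are assumptions, not the project’s actual code:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def identity(z):
    return z

class NeuralNetwork:
    """Feedforward network with configurable layer sizes and activation functions."""

    def __init__(self, layer_sizes, activations):
        # e.g. layer_sizes=[200, 64, 40] gives two weight layers;
        # pass one activation function per weight layer.
        self.weights = [np.random.randn(n_in, n_out) * 0.01
                        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
        self.activations = activations

    def forward(self, x):
        for W, b, activation in zip(self.weights, self.biases, self.activations):
            x = activation(x @ W + b)
        return x

# A 200-input board vector mapped to 40 move scores (the sizes are made up)
net = NeuralNetwork([200, 64, 40], [relu, identity])
```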

The optimizer class trains the neural network to produce the desired outputs. It implements mini-batch training and handles the error (loss) computation and the data sets.
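Roughly, the training loop shuffles the data each epoch and feeds it to the network one mini-batch at a time; `train_on_batch` below is a hypothetical name standing in for whatever gradient-accumulation-plus-RMSProp step the network class exposes, not the project’s actual method:

```python
import numpy as np

def train(net, inputs, targets, epochs=10, batch_size=32, learning_rate=0.001):
    """Mini-batch training: shuffle each epoch, then update on one batch at a time."""
    n = len(inputs)  # inputs and targets are numpy arrays of examples
    for _ in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # hypothetical method: accumulate gradients over the batch, apply RMSProp
            net.train_on_batch(inputs[batch], targets[batch], learning_rate)
```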

The Q-learning script uses deep Q-learning as the basis for optimizing the neural network.

The deep Q-learning architecture
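To make the flow concrete, here is a skeleton of a single deep Q-learning step as I understand the setup. The `env` object and its methods are placeholders for however the game exposes its state, not real MaTris calls, and the constants are assumed values:

```python
import random
import numpy as np

EPSILON = 0.1   # exploration rate (assumed value)
GAMMA = 0.9     # discount factor (assumed value)

def q_learning_step(net, env, n_moves):
    """One interaction with the game: pick a move (epsilon-greedy), execute it,
    and produce a training example for the Q-network."""
    state = env.observe()                              # flattened board (placeholder API)
    q_values = net.forward(state)

    # Epsilon-greedy exploration: usually take the best-rated move, sometimes a random one
    if random.random() < EPSILON:
        move = random.randrange(n_moves)
    else:
        move = int(np.argmax(q_values))

    reward, next_state, game_over = env.execute(move)  # placeholder API

    # Keep the network's other predictions, push the chosen move
    # toward reward + discounted best future value (the Bellman target)
    target = q_values.copy()
    target[move] = reward if game_over else reward + GAMMA * float(np.max(net.forward(next_state)))
    return state, target
```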

Building it
With a plan of how I wanted to design my system in place, I needed to actually code it.
Given just three weeks by my teacher to finish the project, I set out a plan:

  • Week 1: Learn everything I need to know about neural networks and code a functioning feedforward neural network.
  • Week 2: Learn about convolutional neural networks and code a functioning one.
  • Week 3: Implement Q-learning, and learn how to hack the MaTris library to allow an AI to execute moves.

The first week was challenging because I knew very little about backpropagation, so I had to scour the internet for resources. Unfortunately, if you have ever searched for backpropagation on the internet, you will find mostly very formal mathematical explanations of the concept using derivatives. As a result I had to teach myself the calculus and work out how everything fit together. After several days of head-banging I managed to figure out the gist of backpropagation and coded a functioning neural network (using matrix multiplication).
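For the curious, the core of backpropagation in matrix form is quite small. Below is a minimal sketch of one training step for a two-layer network with a sigmoid hidden layer and a mean-squared-error loss; this is my own illustration under those assumptions, not the project’s code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in=4, n_hidden=8, n_out=2, seed=0):
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def train_step(p, x, y, lr=0.01):
    """One forward/backward pass, written entirely as matrix products."""
    # Forward pass
    h = sigmoid(x @ p["W1"] + p["b1"])          # hidden activations
    y_hat = h @ p["W2"] + p["b2"]               # linear output layer

    # Backward pass (chain rule) for a mean-squared-error loss
    d_out = y_hat - y                           # gradient at the output
    dW2 = np.outer(h, d_out)
    d_hid = (d_out @ p["W2"].T) * h * (1 - h)   # back through the sigmoid
    dW1 = np.outer(x, d_hid)

    # Plain gradient descent here; the project applied RMSProp on top of such gradients
    p["W1"] -= lr * dW1; p["b1"] -= lr * d_hid
    p["W2"] -= lr * dW2; p["b2"] -= lr * d_out
    return 0.5 * float(np.sum(d_out ** 2))      # loss, for monitoring
```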

During week 2 I wanted to see results from the neural network right away instead of waiting until I had coded a convolutional one, so I skipped straight to week 3 and coded the Q-learning algorithm. The hard part was not so much figuring out Q-learning but figuring out how the MaTris library I used worked. I had to play around with their code base, and after figuring out what the different functions did, I found a way to insert some code so that my neural network could perform any move.

The result of this effort is the following:

From this it became apparent that the vanilla Q-learning algorithm was not suited for Tetris. The problem is that the random moves the neural network performs at the beginning to explore the game state do not lead to any rewards, since random moves are unlikely to produce a line clear. Since rewards are what drive Q-learning, the neural network consequently never learned anything.

The student report in [1] solves this problem by adding an intermediate reward function that guides the neural network toward good moves so that it will eventually produce a line. But keeping to the spirit of artificial intelligence, I chose to give the neural network an imagination! That is, I gave the network a goal state to compare against (a board with one filled line) and a measure of how close the current game is to that goal state. This relates to the idea that when we want to achieve something, we constantly compare how much closer our current move takes us to where we want to be.
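As a minimal sketch of the idea, assuming the board is a binary grid, the goal state is a board whose bottom row is filled and "closeness" is the size of the difference vector between the two; the exact representation and metric the project used may differ:

```python
import numpy as np

def goal_state(board_shape):
    """The 'imagined' goal: an empty board whose bottom row is completely filled."""
    goal = np.zeros(board_shape, dtype=int)
    goal[-1, :] = 1
    return goal

def closeness(board, goal):
    """Closeness as the negated size of the difference vector between the
    current board and the goal board -- fewer differing cells means 'closer'."""
    diff = goal.flatten() - board.flatten()
    return -float(np.sum(np.abs(diff)))
```

The intermediate reward fed to the Q-learner could then be the change in this closeness from one move to the next, so that moves that bring the board nearer to a filled line are rewarded immediately.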

The final architecture looked something like the following (note that an autoencoder trained on my friend’s gameplay was used to represent high-level features of the game, making it easier for the network to train):

The final architecture
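For reference, the autoencoder idea in a nutshell: a network trained to reproduce its input through a narrow hidden layer, whose hidden activations then serve as compact features for the Q-network. A minimal sketch follows; the layer sizes and details are assumptions, not the project’s actual values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """Compress a flattened board into a small code, then reconstruct it.
    The code (hidden activations) is what would be fed to the Q-network."""

    def __init__(self, n_visible=200, n_hidden=32, seed=0):
        # 200 = a 10x20 board flattened (an assumption about the board size)
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0, 0.1, (n_visible, n_hidden))
        self.W_dec = rng.normal(0, 0.1, (n_hidden, n_visible))

    def encode(self, board_vector):
        return sigmoid(board_vector @ self.W_enc)

    def decode(self, code):
        return sigmoid(code @ self.W_dec)

    def reconstruction_error(self, board_vector):
        """Training minimizes this error on recorded gameplay boards."""
        return float(np.mean((self.decode(self.encode(board_vector)) - board_vector) ** 2))
```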

But as it turned out, that idea too was flawed. Here was the result:

Still not good :(

Even with the updated idea, the gameplay still did not work well, because the difference vector between the goal state and the actual game state turned out not to be a useful signal for guiding the neural network toward the desired output.

Future Improvements

A better method to determine “closeness” to a goal is required. An idea for the future is to train a separate adversarial network to determine how close the game is to getting a line. The training examples would come from a professional Tetris player’s gameplay, where any game state that leads to a line clear is labeled close and everything else is labeled not close. The hope is that the network will generalize to what constitutes close and not close, which can then be used as a helpful guide for the Q-learning network, as sketched below.
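One way the labeling could work, as a sketch only (the window size and names are arbitrary choices of mine): mark every state shortly before a line clear as close, everything else as not close, and train a small classifier on those labels; its output score would then replace the hand-made closeness measure.

```python
import numpy as np

def label_gameplay(states, line_clear_indices, window=5):
    """Label every state within `window` moves of a line clear as close (1),
    everything else as not close (0)."""
    labels = np.zeros(len(states), dtype=int)
    for t in line_clear_indices:          # indices of states where a line was cleared
        labels[max(0, t - window):t + 1] = 1
    return labels
```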

Conclusion and Reflections

I knew from the beginning that this project would be challenging for me. Despite having other things to work on, I always made sure I dedicated a few hours every day to the project. As a result I was able to follow through on most of the deliverables I set out for myself, and I even ended up producing a new idea that I implemented in the neural network. Most importantly, I learned a lot about this new field of AI, and neural networks no longer seem as mysterious as they used to.

Overall I enjoyed the process and I hope to work on more cool projects in the future.

I end with my favourite quote:

“Have a healthy disregard for the impossible” — Larry Page
