Create an AI to Play Chrome Dino Game with CNN
In this article, we are going to implement a Convolutional Neural Network (CNN) model to play the Chrome Dino game, the small runner game built into Google Chrome. When everything is finished, we will see something like the result below.
To implement the CNN model, we will follow these steps in order:
- First, we are going to create a dataset by playing the game for a while.
- To create the dataset, we are going to grab the screen while playing the game and record which key was pressed at that moment.
- After the dataset is created, we are going to train a CNN-based classification model.
- After the classification model is trained, we are going to test it with a test program and, depending on the results, go back to the earlier steps to obtain a better model.
Data Preparation
Before preparing the dataset, we decided what it would contain. For the classification model, the following was decided:
- The input of the model will be a screenshot of a specific region of the game, converted to a matrix.
- The model will predict one of three classes:
Up: This class makes the dino jump.
Down: This class makes the dino duck.
Right: This class keeps the dino running straight ahead.
Following this structure, screenshots were taken while playing the game. The button being pressed (Up, Down or Right) at the moment of each screenshot was written into the screenshot's file name, so that the file name stores the class. Each screenshot covers a fixed, bordered square region in front of the dinosaur.
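The naming scheme described above can be sketched as a pair of small helpers. This is only an illustration: the exact file name pattern (`label_timestamp_index.png`) and the function names are assumptions, not the ones used in the original project. The screen capture and key detection calls are left out because the article does not name the libraries it used.

```python
from pathlib import Path

# Class names as described in the article; the lowercase spelling is an assumption.
CLASSES = ("up", "down", "right")


def screenshot_filename(label: str, timestamp_ms: int, index: int) -> str:
    """Build a file name that carries the class label (hypothetical pattern)."""
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label!r}")
    return f"{label}_{timestamp_ms}_{index}.png"


def label_from_filename(path: str) -> str:
    """Recover the class label from a file name built by screenshot_filename."""
    label = Path(path).name.split("_", 1)[0]
    if label not in CLASSES:
        raise ValueError(f"file name does not start with a known class: {path!r}")
    return label
```

With this scheme, loading the dataset later only requires listing the image folder and calling `label_from_filename` on each entry, so no separate label file is needed.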
The figure below shows the overall process of implementing the model.
Data Pre-processing
Sometimes the screenshot and its class label did not match, because of the speed at which screenshots were taken while playing the game. Since such inconsistencies in the dataset would adversely affect the training of the model, the inconsistent samples were examined and deleted.
While creating the dataset, screenshots were saved in RGB format. However, since the images carry little color information (they are mostly black and white tones), the data was converted to “greyscale” before being given to the model as input. An RGB image has three color channels per pixel, so training on it would take longer than training on a single-channel image. The conversion therefore reduced the training time of the model and also increased the FPS (frames per second) while playing the game.
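To make the effect of the conversion concrete, here is a dependency-free sketch of a greyscale conversion. It uses the common ITU-R BT.601 luma weights; the article does not state which formula or library was actually used, so treat both the weights and the function names as assumptions. In practice an image library would do this in a single call.

```python
def rgb_to_grey(pixel):
    """Convert one (r, g, b) pixel with 0-255 values to a single grey value."""
    r, g, b = pixel
    # ITU-R BT.601 luma weights (an assumption, see the text above).
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_grey(image):
    """Convert an image stored as rows of (r, g, b) tuples to rows of grey values.

    The input has shape height x width x 3, the output height x width,
    so the model receives one third of the values per screenshot.
    """
    return [[rgb_to_grey(pixel) for pixel in row] for row in image]


# A tiny 2 x 2 example image: white, black, red and mid-grey pixels.
example = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (128, 128, 128)],
]
grey = image_to_grey(example)
```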
Data Visualization
As specified before, there are 3 classes in the dataset (Up, Down and Right). The figure below shows some examples of these classes; the dataset contains many more samples like them.
The dataset contains 1032 images in total, and the distribution of the classes is shown in the graph below. Since the “Up” button is pressed most often in the game, this class has the most data.
Before starting the training, the dataset was divided into a “training set” and a “test set” with a 90% / 10% ratio. After this split, the distribution of the dataset was as follows.
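A 90% / 10% split like the one described above can be sketched as follows. The article does not say whether the split was stratified (done per class) or how it was seeded, so the per-class splitting, the seed and the function name here are assumptions chosen to keep the class distribution similar in both sets.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.1, seed=42):
    """Split (filename, label) pairs into train and test lists, class by class."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Keep at least one test sample per class.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Made-up sample list for illustration only; these are not the article's counts.
samples = (
    [(f"up_{i}.png", "up") for i in range(50)]
    + [(f"right_{i}.png", "right") for i in range(30)]
    + [(f"down_{i}.png", "down") for i in range(20)]
)
train_set, test_set = stratified_split(samples)
```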
Creating the CNN Model
Logistic Regression, Artificial Neural Networks or Convolutional Neural Networks may come to mind when it comes to classification. Among these three, the Convolutional Neural Network (CNN) is generally the leading choice for image classification. Since playing the game requires image classification, a CNN model was chosen. For this purpose, the CNN model in the figure below was created using the Keras library.
As the figure shows, this CNN model has 482,563 trainable parameters. In addition, a dropout layer is used before the classification layer to reduce overfitting during training.
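A parameter total such as the one above is the sum of per-layer counts, and the arithmetic is easy to check by hand. The sketch below shows the standard formulas for convolutional and fully connected layers. The layer sizes in the example are hypothetical, since the real ones appear only in the article's figure, so the numbers do not reproduce the 482,563 total.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_features + 1) * units


# Hypothetical layers, for illustration only:
first_conv = conv2d_params(3, 3, 1, 32)    # greyscale input has 1 channel
second_conv = conv2d_params(3, 3, 32, 64)
classifier = dense_params(128, 3)          # 3 outputs: Up, Down, Right
```

Pooling and dropout layers add no trainable parameters, so a dropout layer placed before the classifier does not change the total.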
Training the CNN Model
After the CNN model was created as described in the previous section, it was trained for 5 epochs. To achieve better results, the hyperparameters were tuned manually and the model that gave the best result was selected. The results of training are:
Train Accuracy (Success on Training Data): 100%
Test Accuracy (Success on Test Data): 100%
The change in the accuracy and loss of the model during the training process is shown in the graphs below.
Since the accuracy of the model on the test data was 100% after training, the confusion matrix, which shows how often the classes are mistaken for one another, contains no confusion, as shown below.
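For readers unfamiliar with confusion matrices, the following sketch shows how one is built and why a 100% accuracy leaves only the diagonal filled. The class order and the example labels are made up for illustration and are not the article's test data.

```python
CLASSES = ("up", "down", "right")  # assumed class order


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels: every prediction matches, so only the diagonal is non-zero.
true_labels = ["up", "up", "down", "right", "right", "right"]
perfect = confusion_matrix(true_labels, true_labels)

# One "down" sample mistaken for "right" puts a count off the diagonal.
one_mistake = confusion_matrix(["up", "down"], ["up", "right"])
```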
Testing the CNN Model
After the CNN model was created and the training process was done on the data, the testing phase was started. In the test phase, the weights obtained after the training process were used.
At this stage, a test program takes a screenshot of the relevant region of the game and makes a prediction using the trained model weights. The prediction is one of the Up, Down and Right classes. If the prediction is:
Up: The test program continues the game by pressing the “Up” key on the keyboard.
Down: The test program continues the game by pressing the “Down” key on the keyboard.
Right: The test program continues the game by pressing the “Right” key on the keyboard.
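The mapping from prediction to key press can be sketched as below. The model output is assumed to be a list of three class probabilities, and `press_key` is a hypothetical callback standing in for whatever keyboard library the test program uses. The class order is also an assumption and must match the order used during training.

```python
CLASSES = ("up", "down", "right")  # assumed to match the training label order


def choose_key(probabilities, classes=CLASSES):
    """Return the class name with the highest predicted probability."""
    if len(probabilities) != len(classes):
        raise ValueError("expected one probability per class")
    best = max(range(len(classes)), key=lambda i: probabilities[i])
    return classes[best]


def play_step(probabilities, press_key):
    """Pick the predicted key and hand it to the key-press callback."""
    key = choose_key(probabilities)
    press_key(key)
    return key


# Usage with a stand-in callback that only records the keys.
pressed = []
play_step([0.1, 0.7, 0.2], pressed.append)
```

In the real test program this step would run in a loop: grab the screen region, preprocess it the same way as the training data, predict, then call `play_step`.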
Summary
In this article, we first prepared a dataset by playing the game. Then we created a simple CNN model and trained it on the dataset. Lastly, we created and ran a test program, and now our AI is playing the Chrome Dino game :)
You can find all the code and the model weights in this GitHub repository.