Reinforcement Learning with FIFA and Keras

I recently came across an interesting article by Chintan Trivedi on training a model via reinforcement learning to take free kicks in FIFA. Intrigued, I decided to try it out. Unfortunately, the free kicks skills session was not yet unlocked in my newly installed copy of FIFA, but the shooting practice was available. Excited, I tried the model on the shooting skill. After a couple of minutes of running the code, I realized the model wasn’t built for the shooting environment. I therefore decided to spend some time hacking the code to make it suitable for that environment. Below are the results of the model after training it for about 100 epochs.

Random shots taken by the model while training
Shots taken by the model after about 100 epochs

I would like to explain some snippets of my code for those who wish to replicate the result or improve on it. Please note, I’m no expert in this area; this is one of my first attempts at reinforcement learning.

I recommend reading Chintan Trivedi’s article and its prerequisites for context on the project and on reinforcement learning in general. Chintan has done an amazing job explaining the code.

So let’s begin. First of all, allow me to explain what I’m attempting to build and why. When I tried the FIFA repo on the shooting skill, I found it very difficult to capture one shot (one shooting event) individually and get its result. As there is no FIFA API, we have to work with the image data on screen. When I ran the model provided by Chintan, most of the rewards collected were assigned to the wrong actions because of a timing mismatch. Also, that model didn't use any temporal layers, which could help predict events that occur in a sequence.

To fix these two issues, I decided to capture the entire skill sequence of taking shots along with the actions and feed it to a neural network with LSTMs. One major hurdle was that during training I could input a sequence of images and a sequence of expected actions, but when predicting actions in real time we have only one image to input and need one output. This was a confusing challenge, as the model was built to take only whole sequences as input. I shall explain how I handled it later. For now, here is the model I created with the Keras API.

As you can see, the model takes in a sequence of about 30 images with shape (30, 256, 256, 3) and outputs a sequence of actions with shape (30, 4). The 4 actions allowed in the game are move right, move left, low shot and high shot. The neural network predicts a probability for each action, and we take the two highest and execute them. I have added ConvLSTM2D layers with MaxPooling3D, followed by TimeDistributed Dense layers. These layers preserve the sequence dimension, so the output has the same number of timesteps as the input.

Now, we also need to capture the points and the on-screen text so that we know when the skill has ended and how well the model performed. For this, we shall use Tesseract, an OCR library for text recognition by Google. Please follow the instructions here to use it. We must use the PyTesseract API to access it from Python. To detect the text on screen, we must first capture the screen as an image. For this we use the GrabScreen library and manually find the coordinates of the points text using a tool such as Paint.

Points image submitted to PyTesseract

Once we have extracted the text area, we need to preprocess the image so that pytesseract can give us good character recognition results. I have converted the background to pure white and the foreground points to black using numpy.
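The preprocessing described above can be sketched roughly as follows. This is not the original score capture code, only a minimal illustration: the function name `binarize_for_ocr`, the threshold of 200, and the assumption that the points text is brighter than its background are all mine.

```python
import numpy as np

def binarize_for_ocr(region, threshold=200):
    """Return a black-on-white uint8 image from an RGB crop of the points text."""
    region = np.asarray(region, dtype=np.uint8)
    # Average the colour channels to get one brightness value per pixel.
    gray = region.mean(axis=2)
    # Assumption: the digits are brighter than the background behind them.
    is_text = gray >= threshold
    # Start from a pure white image, then paint the text pixels black.
    out = np.full(gray.shape, 255, dtype=np.uint8)
    out[is_text] = 0
    return out
```

The resulting array can then be handed to `pytesseract.image_to_string`. If the text in your crop is darker than the background, flip the comparison.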

Score capture code for FIFA

In a similar way, we also need to capture the ‘RETRY DRILL’ section to detect whether the skill session is over.
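OCR output is rarely perfectly clean, so it helps to be forgiving when reading both the points and the ‘RETRY DRILL’ label. The helpers below are a hypothetical sketch using only the standard library; the names `parse_points` and `is_drill_over` and the 0.8 similarity cutoff are assumptions, not part of the original code.

```python
import re
from difflib import SequenceMatcher

def parse_points(ocr_text):
    """Extract the integer score from OCR output, or None if no digits were read."""
    # Drop everything that is not a digit (commas, spaces, newlines, stray letters).
    digits = re.sub(r"\D", "", ocr_text)
    return int(digits) if digits else None

def is_drill_over(ocr_text, target="RETRY DRILL", min_ratio=0.8):
    """True when the OCR text is close enough to the end-of-drill label."""
    # Normalise case and whitespace before comparing.
    cleaned = " ".join(ocr_text.upper().split())
    return SequenceMatcher(None, cleaned, target).ratio() >= min_ratio
```

The fuzzy comparison tolerates a misread character or two (for example an "I" read as "1"), which an exact string match would reject.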

Reward system

Chintan’s model used the DQN algorithm to predict the next action. However, I feel that DQN is only suitable when we have full control over the environment and can execute one event at a time and receive its result. Something like the OpenAI Gym environment is perfect for DQN. For our case, I replaced DQN with a simple reward function: if the total points after the skill are greater than 4000, every action in the sequence gets a reward of 1. If the points are less than 3500, the actions get a reward of -1. Scores in between get a reward of 0. This way, we reward the entire batch of actions instead of just one action.

Reward calculation for entire sequence
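As a minimal sketch of the rule described above (not the original code), the thresholds can be written as below. The name `get_reward` comes from the article; `sequence_rewards` and the parameter names are hypothetical, and treating scores of exactly 3500 or 4000 as 0 is my reading of the text.

```python
def get_reward(points, high=4000, low=3500):
    """Map the final drill score to a reward of 1, 0 or -1."""
    if points > high:
        return 1
    if points < low:
        return -1
    # Scores from 3500 to 4000 inclusive are treated as neutral.
    return 0

def sequence_rewards(points, n_steps):
    """Give every action in the sequence the same reward."""
    return [get_reward(points)] * n_steps
```

For a 30-step sequence that ended on 4200 points, `sequence_rewards(4200, 30)` yields thirty 1s, one per recorded action.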

Handling the Model

As I mentioned earlier, we train our model on a sequence, but during prediction we have only one image to provide and expect one output. We handle this by creating a dummy numpy array of zeros with shape (30, 256, 256, 3). When we wish to predict an action, we replace the last element of that sequence with the image, i.e. zeros[timesteps-1] = image. From the output sequence, we take the last set of actions, i.e. actions[timesteps-1], and pick the two actions with the highest values. I understand that this is not a perfect solution, and I was unsure about using it myself. However, it seems to work to a certain extent.
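The padding trick above might look something like this. It is a sketch under assumptions: `predict_fn` stands in for the trained model's predict call and is expected to map an array of shape (1, timesteps, H, W, C) to one of shape (1, timesteps, 4); the function name and the action labels are illustrative.

```python
import numpy as np

TIMESTEPS = 30
ACTIONS = ["move_right", "move_left", "low_shot", "high_shot"]  # illustrative labels

def predict_top_two(predict_fn, image, timesteps=TIMESTEPS):
    """Pad one frame into a full sequence and return the two best action indices."""
    # Dummy sequence of zeros with the real frame placed in the last slot.
    sequence = np.zeros((timesteps,) + image.shape, dtype=np.float32)
    sequence[timesteps - 1] = image
    # Add a batch dimension, since the model expects batches of sequences.
    scores = predict_fn(sequence[np.newaxis, ...])
    # Only the last timestep saw the real frame, so read its scores.
    last = scores[0, timesteps - 1]
    # Indices of the two highest scores, best first.
    top_two = np.argsort(last)[::-1][:2]
    return [int(i) for i in top_two]
```

With a Keras model this could be called as `predict_top_two(model.predict, frame)`, and the returned indices mapped to names through `ACTIONS`.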

Let’s combine all of this and start training our model. Following are some of the important parameters I’m using.

We begin in the main training file, where we initialize our model and run the training loop. The model takes random actions at the beginning so it can learn about the environment. Over time, it learns which actions earn the most reward and starts playing on its own. All the actions taken are stored, along with their screenshots and rewards, in the experience replay file. When it is time to train, a batch is requested from the experience replay and the model is trained on it.

The entire purpose of the experience replay file is to store all the recorded data and assign rewards to the actions. When a batch is requested, the module creates a set of inputs (images) paired with a set of rewards for the actions. The get_batch method picks 3 random sequences to train on and applies the sequence's reward to all actions in that sequence.
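To make the idea concrete, here is a rough, hypothetical sketch of such a module, not the original file. Only `get_batch` and the choice of 3 sequences come from the article. The class name, the `remember` method, the `max_episodes` cap, and the way the reward is applied (multiplying each multi-hot action vector by the sequence reward) are my assumptions.

```python
import random

class ExperienceReplay:
    """Stores whole drill sequences and serves random batches for training."""

    def __init__(self, max_episodes=100):
        self.episodes = []
        self.max_episodes = max_episodes  # hypothetical cap on stored sequences

    def remember(self, frames, actions, reward):
        """Store one sequence: its frames, its action vectors, and its reward."""
        self.episodes.append((frames, actions, reward))
        if len(self.episodes) > self.max_episodes:
            self.episodes.pop(0)  # discard the oldest sequence

    def get_batch(self, n_sequences=3, rng=random):
        """Pick random sequences and weight every action by its sequence reward."""
        count = min(n_sequences, len(self.episodes))
        chosen = rng.sample(self.episodes, count)
        inputs, targets = [], []
        for frames, actions, reward in chosen:
            inputs.append(frames)
            # Assumption: each step is a 0/1 vector over the 4 actions.
            targets.append([[reward * a for a in step] for step in actions])
        return inputs, targets
```

With this weighting, actions from a good sequence become positive targets, actions from a bad one become negative targets, and a neutral sequence contributes zeros.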

The last module is the FIFA module, which is the interface between the model and the game. It helps us detect the score, check whether the drill is over, restart the drill, get the current state and, most importantly, execute actions via the act method. The get_reward function returns 1 if the final score is above 4000, 0 if it is between 3500 and 4000, and -1 if it is lower than 3500. The code is pretty self-explanatory.

To start training, start FIFA in windowed mode at 1280x720 resolution. It will open at the top-left corner of the screen; do not move it from there. Choose skills games -> shooting -> bronze shooting. In the training file, set the train mode to 1 and run it with Python. Once the script is running, click the FIFA window. The program will automatically recognise the screen and start playing. Train it for about 100 epochs to see some results. Your rating should improve from "try harder" to "excellent".

Results during initial epochs
Results after 100 epochs

The entire code is available on GitHub. Feedback and suggestions for improvement are welcome. Thank you.