The Gopher with Artificial Intelligence
Using Supervised ML for Games
Machine learning is used everywhere these days. For the Go language conference GopherCon 2019, we put Go’s Gopher mascot into an endless running game. By collecting data from human players and training a model on GCP, we were able to create an AI player that scored better than many of the humans. The backend code is available in this repo.
Gopher Run was playable on Chromebooks at Google Cloud’s GopherCon booth. The main controls are simple: jump with the up arrow and roll with the down arrow. The levels are procedurally generated, so they’re infinite and change every time (but contain similar patterns). You take damage if you run into spikes or bugs, and after losing your three lives, the number of coins you collected is your score. The player speeds up at predetermined intervals, making the game harder as you progress. The game has a few other mechanics, but they’re not relevant to this post.
The AI created for this game uses supervised learning, meaning it trains on the top players’ behaviors, unlike reinforcement learning, which trains by playing and improving itself. While reinforcement learning improves continuously and doesn’t require storing player data, it takes much longer to train (the supervised algorithm only needs to run once) and is harder to set up and implement.
The training data consists of snapshots of the game recorded every half-second and whenever the player performs an action. This way the algorithm sees how the player reacts to different situations (by jumping, rolling, or doing nothing). After training and deploying the model, the AI’s actions are determined by sending the current game state to the deployed model for a prediction of what a player would most likely do in that situation.
I used the Unity engine for the game and the Google Cloud AI platform for ML, but this works with any engine that can send HTTP requests and ML service (or custom ML code).
Creating the Input
If you’re making this kind of AI, one of the more challenging tasks is determining what to put in the input. To keep everything fast and memory-efficient, it’s important to condense a game state into only the most helpful information. To record the player’s situation, I stored their current y position and vertical velocity. Since I wanted the AI to avoid obstacles, I looked at the nearest three bugs and one spike ahead of the player, and stored their y positions and x distances from the player (normalized by player speed; see note below). A data point was collected every time the player performed an action, as well as every half-second, to capture times when the player was doing nothing. The data was stored in a structure in Unity until score submission, then sent to a csv file in Cloud Storage; preparation for training collects the top 10 players’ files.
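The snapshot described above can be sketched as a small struct that flattens into one training row. This is an illustrative Go version (the actual game code is in Unity); the field layout and the column order are assumptions based on the features listed in this section.

```go
package main

import (
	"fmt"
	"strings"
)

// Snapshot is a hypothetical condensed game state: the player's y
// position and vertical velocity, plus the nearest spike and the
// nearest three bugs ahead of the player.
type Snapshot struct {
	PlayerY, VelY   float64
	SpikeY, SpikeDX float64    // nearest spike: y and speed-normalized x distance
	BugY, BugDX     [3]float64 // nearest three bugs: y and speed-normalized x distances
	Action          string     // "jump", "roll", or "none": the training target
}

// CSVRow flattens a snapshot into one training row, with the
// action as the final (target) column.
func (s Snapshot) CSVRow() string {
	cols := []string{
		fmt.Sprint(s.PlayerY), fmt.Sprint(s.VelY),
		fmt.Sprint(s.SpikeY), fmt.Sprint(s.SpikeDX),
	}
	for i := 0; i < 3; i++ {
		cols = append(cols, fmt.Sprint(s.BugY[i]), fmt.Sprint(s.BugDX[i]))
	}
	cols = append(cols, s.Action)
	return strings.Join(cols, ",")
}

func main() {
	s := Snapshot{PlayerY: 1.5, SpikeDX: 0.5, Action: "jump"}
	fmt.Println(s.CSVRow())
}
```

Rows in this shape can be appended to an in-memory buffer during play and flushed to Cloud Storage at score submission.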
The diagram shows the variables collected from the nearest spike and nearest three bugs. The information given to the algorithm includes the y variables, each dx variable divided by the player’s speed, and the player’s vertical velocity.
Note on normalization: Horizontal spacing of objects scales with speed — compare the two gameplay gifs in the first section. Consider that a jump at high speed covers more horizontal distance despite having the same peak height and airtime. Dividing by speed means the input variable is the time it will take to reach the hazard (e.g. a spike 6 units ahead ÷ a speed of 12 units/second = 0.5 seconds to reach the spike), and this is what’s actually important for timing inputs like jumps.
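The normalization in the note is just a division, but it’s worth seeing the units work out. A minimal Go sketch (the function name is mine, not from the game code):

```go
package main

import "fmt"

// timeToHazard converts a raw horizontal distance into the time (in
// seconds) the player will take to reach the hazard at the current
// speed. Unlike raw distance, this quantity is speed-invariant, which
// is what matters for timing an input like a jump.
func timeToHazard(dx, speed float64) float64 {
	return dx / speed
}

func main() {
	// A spike 6 units ahead at 12 units/second is 0.5 seconds away.
	fmt.Println(timeToHazard(6, 12)) // 0.5
}
```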
I used GCloud’s built-in XGBoost framework with a classification objective (multi:softmax), which took around 7 minutes to train each time. More complicated behavior might be better suited to neural networks or regression objectives; find an objective appropriate for your input that captures the AI’s behavior. For example, I could have used a regression objective with -1 = roll, 0 = do nothing, 1 = jump. I elected not to, since the player’s input was not analog (the buttons were either held or not; it wasn’t a 360° control stick), and it wouldn’t handle jumping while rolling at the same time.
Most games will probably only require a one-time training job. However, for the purposes of the demo, we wanted to see the AI improve as players improved and contributed better data, so we re-trained at regular intervals. You can automate this with a shell script (here’s the one I made). It uses the gcloud setup commands from the AI Platform quickstart, then loops the training and version-creation commands (also in the quickstart). This creates a trained model in a Cloud Storage directory, which overwrites the model from the previous training job, and creates a new version of the deployed model (whose job-dir parameter points to the training output directory). You do have to deploy the model manually to create the first version (easy with the cloud console), but afterwards the new versions use the latest training output and set themselves as the default version, so prediction requests will always use the latest model.
Using ML Predictions
The AI player is identical to the normal player except that keyboard input is disabled; instead, it repeatedly makes requests to the ML model. Requests to a deployed model on the GCloud AI Platform are formatted as the input minus the target column (i.e. the action, which is what’s being predicted). The model then returns one of the possible target classes, indicating that a player in this game state would be likely to roll, jump, or do nothing, and the game calls the corresponding roll or jump method.
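Building that request can be sketched in Go: the body is a JSON object with an `instances` list, where each instance is the feature columns of a training row minus the action column. The feature values here are placeholders, not real game data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// predictRequest builds the JSON body for an online prediction call:
// one instance containing the same feature columns as a training row,
// minus the target (action) column being predicted.
func predictRequest(features []float64) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"instances": [][]float64{features},
	})
}

func main() {
	// Placeholder feature vector: player y, vertical velocity, then
	// the spike and bug columns described earlier.
	body, err := predictRequest([]float64{1.5, 0, 0, 0.5, 0, 0, 0, 0, 0, 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

This body is POSTed to the deployed model’s predict endpoint with standard authenticated HTTP.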
Note on GCloud outputs: Even though the multi-class classification algorithm uses strings for the class column, a quirk of Google Cloud AI is that the prediction returns a float instead of a string: 0.0 (the first class seen in the input), 1.0 (the second class seen), 2.0, and so on. Since the correspondence between the floats and the classes depends on the order in which the classes appear in the input, prefix the csv training file with one dummy data line for each class, so that 0.0 always represents jump, 1.0 always represents roll, etc.
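Decoding that float back into an action might look like this in Go. The response shape (`{"predictions": [...]}`) and the class ordering are assumptions; the ordering must match whatever dummy rows you prefixed to the training file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// classNames mirrors the dummy-row ordering: one dummy line per class
// is prefixed to the training csv so index 0 is always "jump", 1 is
// always "roll", and 2 is "none" (do nothing).
var classNames = []string{"jump", "roll", "none"}

// decodeAction parses a prediction response body and maps the float
// class index back to an action name.
func decodeAction(body []byte) (string, error) {
	var resp struct {
		Predictions []float64 `json:"predictions"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if len(resp.Predictions) == 0 {
		return "", fmt.Errorf("empty predictions")
	}
	idx := int(resp.Predictions[0])
	if idx < 0 || idx >= len(classNames) {
		return "", fmt.Errorf("unknown class %v", resp.Predictions[0])
	}
	return classNames[idx], nil
}

func main() {
	action, _ := decodeAction([]byte(`{"predictions": [1.0]}`))
	fmt.Println(action) // roll
}
```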
Running the AI
The earliest versions didn’t get very far. One early version with not enough data jumped every time it reached the ground. The AI quickly improved as it got more data and human players did better, but somewhat plateaued at a high score of 268 on the first day, which was enough to put it in the top 10 leaderboard until it got pushed out around the end of the day. On the second day it kept improving and at the end it reached 370. In comparison, human players typically scored 10–30 on their first run, and the leaderboard scores were around the 200–800 range on day 1 and 600–2000 on day 2 (much higher due to the returning players).
Optimize performance and memory usage.
There was a bug in which I accidentally collected data every frame, which destroyed the framerate and crashed the WebGL build by using up all the memory (even with the relatively simple data points I was collecting). It’s always good to follow standard best practices, like keeping Unity’s Update function as small as possible.
Time to query the model is non-negligible in a fast-paced game.
The AI’s gameplay had short but noticeable pauses while waiting for responses from the model. Under other circumstances it might be possible to send multiple data points at once, since the API takes a list; I requested one data point at a time because I didn’t know where the AI player would be in the future, or what its future vertical velocity would be. I also set up the HTTP requests to run asynchronously, but the game moved too fast, and the responses often arrived only after the AI had already collided with upcoming hazards.
Because of the aforementioned constraint, I only queried the model every half-second (more often at higher speeds). This meant that the AI was limited in how often it could perform a new action. I counted “stop rolling” as its own distinct input, so a pattern that required rolling followed by a pattern that required jumping would sometimes see the AI successfully roll under the first part, then stop rolling, then run into the second pattern before another half-second had passed, because it hadn’t yet made a new request (which would have told it to jump).
This kind of ML works well and can be used for more complex tasks.
After getting enough data, the AI was consistently good (outside of the issue in the above paragraph, which was a limitation of the implementation rather than the model). I used the ML output in a very direct way by having the AI copy the action predicted for a player, but there are more creative possibilities when you know what the player is likely to do in any situation. If the player always jumps over certain bug patterns instead of rolling, the level generation algorithm could adapt and increase the frequency of a similar pattern with added bugs above, forcing the player to roll. In a game like PAC-MAN or Bomberman, the AI could cut the player off by predicting where they will go next. There are endless applications in every genre, so I encourage you to use ML for something creative in your own projects.
Have you implemented any games? Do you think AI could be trained to play your game? Let me know in the comments!
Special thanks to Tyler Bui-Palsulich and Franzi Hinkelmann as well as Dane Liergaard, Jon Foust, and everyone on the Go and Cloud DPE teams in Google NYC.