How We Built an AI to Play Street Fighter II — Can you beat it?
We’ve spent a lot of time exhibiting at and attending developer conferences, and last week we attended the Samsung Developer Conference (SDC). One thing we’ve always found is that it is easy to have a boring booth; if people want to know about your product, the internet has made the traditional free t-shirt + product flyer obsolete. For SDC, we knew we didn’t want a boring booth — after all, we had to be at the booth ourselves for two full days! So we did the obvious thing: used Gyroscope’s AI to play and win at Street Fighter II Turbo on SNES, then held a tournament between all the characters Gyroscope had learned how to play.
Gyroscope’s AI doesn’t normally play video games, nor did we have an SNES SDK. So, before the conference, we figured out how to extract game information from inside Street Fighter II Turbo, built the Gyroscope SNES SDK, then pitted the Gyroscope AI against in-game bots in thousands of matches while we tweaked the AI for this special application. At the conference, we held a Final Four-style single-elimination bracket between the characters. We asked the conference attendees to pick which character they thought would win; those who picked correctly were entered in a raffle for an SNES Classic. Our AI performed admirably and two attendees walked away with a new SNES Classic!
The details of the AI and the event follow. If you want to compete against our AI (either with another AI or as a human) and learn what happens next, sign up!
Building the AI
First, we had to figure out what problem we were actually solving. We cast the problem of playing Street Fighter II as a reinforcement learning problem (one of the problem types that Gyroscope’s AI supports). In a reinforcement learning problem, the AI observes the world, selects an action to take, and receives a reward for that action. The AI’s goal is to maximize its reward over time given what it has observed in the past by taking optimal actions. Before we could start applying our AI, we needed to define the observations (i.e., what the AI “sees”), actions, and rewards for Street Fighter II.
Observations are what the AI “sees” in the environment. When a human looks at the game, they see each character jumping, moving, kicking, and so on; they also see each health meter and the timer. We needed to distill this information into a format the AI can understand, a format called the “observation space”. In reinforcement learning, there are two common ways to think of the observation space. The traditional approach is to measure specific signals that we, as humans, believe are pertinent to the problem at hand. The modern approach is to give the AI images of the environment after each action and let it determine the important elements in the image. The modern approach is often considered better because it’s more generic and makes fewer assumptions about feature importance; however, it typically requires longer training time. Given our time constraints, we chose the traditional approach and defined the observation space by hand.
Specifically, we defined the observation space as:
- X and Y coordinates of each player
- Health of each player
- Whether each player is jumping
- Whether each player is crouching
- Move ID for each player
- Absolute difference in X and Y coordinates between players
- Game clock
Note that this observation space is huge! There are trillions, if not more, of unique observations.
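As a concrete illustration, the observation space above could be encoded as a flat numeric vector along the following lines. This is a sketch with field names of our own choosing, not the actual SDK's layout:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One frame's worth of game state, as listed above (names are ours)."""
    p1_x: int; p1_y: int
    p2_x: int; p2_y: int
    p1_health: int; p2_health: int
    p1_jumping: bool; p2_jumping: bool
    p1_crouching: bool; p2_crouching: bool
    p1_move_id: int; p2_move_id: int
    clock: int

    def to_vector(self):
        """Flatten into a numeric vector, including the derived X/Y gaps."""
        return [
            self.p1_x, self.p1_y, self.p2_x, self.p2_y,
            self.p1_health, self.p2_health,
            int(self.p1_jumping), int(self.p2_jumping),
            int(self.p1_crouching), int(self.p2_crouching),
            self.p1_move_id, self.p2_move_id,
            abs(self.p1_x - self.p2_x), abs(self.p1_y - self.p2_y),
            self.clock,
        ]
```

Even with only fifteen numbers, the cross product of all the values each field can take is what makes the space so large.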
Once the AI observes the environment, it must act. The simplest way to characterize the actions available is by considering the buttons on a Super Nintendo controller: Up, Down, Left, Right, A, B, X, Y, L, R. A single action, then, is a combination of buttons being pressed. If we considered every possible combination of button presses, that would create 1,024 (2¹⁰) possible actions. That is a lot of possible actions! It would take a while for an AI to learn which actions work and which do not, though it would eventually learn. However, any Street Fighter II player knows that not all buttons can be pressed at all times. Further, many moves evolve over sequences of button presses.
Another way to consider the action space is the set of moves available (e.g., high kick, throw, uppercut, etc.). The AI could select a move and then we would translate that move into button presses. Determining the moves for each character would take a while (lots of googling and playing) and would be required for each unique character. Again, for the sake of training time, we simplified the action space to the combination of one directional press and one button press (e.g., “Up + A” or “L”), with each press being optional. This formulation reduced the action space to 35 possible actions (5 directional options including “none” × 7 button options including “none”). Note that more advanced moves and combos can still evolve over time, but we left those for the AI to discover!
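The 35-action space can be enumerated directly as the cross product of an optional direction and an optional button. This is a minimal sketch with our own illustrative names, not the SDK's actual action encoding:

```python
from itertools import product

# One optional directional press crossed with one optional button press:
# 5 x 7 = 35 combinations.
DIRECTIONS = [None, "Up", "Down", "Left", "Right"]
BUTTONS = [None, "A", "B", "X", "Y", "L", "R"]

ACTIONS = list(product(DIRECTIONS, BUTTONS))

def action_to_buttons(action_id):
    """Translate an action index into the set of buttons to hold."""
    direction, button = ACTIONS[action_id]
    return {b for b in (direction, button) if b is not None}
```

Action 0 is (None, None), i.e., press nothing, which is itself a legitimate choice for the AI.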
Finally, once an action is taken, the AI receives a reward. When humans play a game, they have a general perception of how well they are doing, supported by signals like health level and damage dealt. AIs need that perception boiled down to a single number (usually) so they can maximize it. We selected the health gap in each frame as the reward. So, at each observation, the AI receives a reward equal to the health gap between the players. For example, if the AI acts by kicking the opponent for 10 damage, the health gap afterward will be 10 and the AI will be awarded that amount. If the AI does not take an action after the next observation, it will still be awarded another 10 for doing “nothing”. Why? Because it has maintained that health gap. Alternatively, if the AI is kicked and does not block, the health gap decreases. In fact, the gap can go negative, and that’s a sign that things aren’t going well for the AI.
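The health-gap reward boils down to a one-line function. The sketch below is our own illustration, walking through the kick example above:

```python
def reward(ai_health, opponent_health):
    """Per-observation reward: the health gap between the AI and its opponent."""
    return ai_health - opponent_health

# The AI lands a kick for 10 damage...
assert reward(100, 90) == 10
# ...and is rewarded another 10 next observation for maintaining the gap.
assert reward(100, 90) == 10
# Taking unblocked damage can push the gap (and the reward) negative.
assert reward(80, 90) == -10
```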
AI for AI
What we’ve discussed above is the final formulation of the problem we used in the competition. We also tweaked parameters in our AI system. Gyroscope’s proprietary AI is an algorithm of algorithms: it figures out which algorithm to use for each problem. With so much information in hand about the Street Fighter problem, we short-circuited that loop and selected DQN (Deep Q-Networks) as the reinforcement learning methodology, with several modifications, most notably the absence of an image-based observation space. Instead of testing and memorizing every possible action given every possible observation — an exploration that is nearly impossible given the size of the observation space — DQN uses a model to predict which actions are optimal. In another post, we’ll discuss the model in detail along with alternatives and show their effect on the performance of the AI.
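To make the DQN idea concrete, here is a minimal sketch of epsilon-greedy action selection over a hand-built observation vector, with a simple linear model standing in for the Q-network. Everything here (names, sizes, the linear model itself) is illustrative, not Gyroscope's actual implementation:

```python
import random

random.seed(0)

N_FEATURES = 15   # size of the hand-defined observation vector
N_ACTIONS = 35    # direction x button combinations

# One weight vector per action approximates Q(observation, action).
# A real DQN would use a trained neural network here.
weights = [[random.gauss(0, 0.01) for _ in range(N_FEATURES)]
           for _ in range(N_ACTIONS)]

def q_values(obs):
    """Predict a Q-value for every action given one observation."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]

def select_action(obs, epsilon=0.1):
    """Epsilon-greedy: explore at random, otherwise act greedily on Q."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    qs = q_values(obs)
    return qs.index(max(qs))
```

The point of the model is exactly what the paragraph above describes: rather than tabulating a value for trillions of observation/action pairs, the model generalizes from the observations it has seen to predict which of the 35 actions is best.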
The emulator glue
Before we could train the AI, we had to connect it to Street Fighter. Gyroscope is accessible via SDKs for iOS and Unity. We did not (yet!) have an SNES SDK, so we needed to find tools that could help us instrument an SNES game so we could use our technology to play those games. Fortunately, the tool-assisted speedrun community — the folks who try to win a game as fast as possible, often by going frame by frame through a game looking for bugs that allow them to skip ahead — has built amazing tools for interacting with classic game consoles.
It was not just the emulator core we needed; we also needed tools around it. We found BizHawk, which supports many emulator cores, including SNES cores. BizHawk gave us a number of critical tools:
- A Lua language scripting interface that gave us frame-by-frame control of games;
- A suite of console memory watching tools which lets one inspect the game memory (either all of it or specific addresses);
- The ability to run with no speed throttling and no display showing, thereby maximizing the frame rate of the game;
- The BizHawk source code.
For Street Fighter specifically, the Lua interface allowed us to send joypad button presses, read button presses, read memory locations, and control the core emulator. The memory inspector gave us the ability to read the health of our opponents, read the moves the opponent is making, and other data that is required for our observations. Note that we only used signals that a human player has; we didn’t let the AI know anything a human doesn’t know.
Honestly, we can’t say enough good things about BizHawk. Not only is the product first-class, but the source code is extremely clean, readable, and extensible. It was a pleasure to work with this codebase — the source code became critical later, as you’ll see.
Reading the RAM: Finding the observations in SNES WRAM
We knew we’d need to figure out a few critical pieces of data to make our observation space:
- The X & Y positions of the players
- The health of the players
- What move the player was doing (e.g., kind of punch, kick, throw, or special move)
- The amount of time left on the fight clock
These are all the things a human knows when playing the game. We made an educated guess that these values were in the SNES RAM somewhere.
The SNES memory layout is well documented, and there’s not a lot of game RAM to look through. We used the BizHawk tools to monitor the change in RAM values between frames in order to find addresses that changed when we took actions like pressing left on a controller. It took us a few hours but we ended up locating all the data locations specified earlier. We were able to create a mapping from RAM to observation that looked like:
```csharp
public int get_p1_health()
public int get_p2_health()
```
And so on. This code let us access these values between frames and build a data structure of the entire game observation.
First try: Write the Gyroscope SDK in Lua
BizHawk embeds a Lua scripting engine in the application and exposes a number of emulator functions to this engine. So, it was logical that the first thing we tried was to write our Gyroscope SDK in Lua. We wrote a Lua library for accessing all the memory locations that are later translated into an observation and for sending joypad presses to the emulator.
But, how to get the data out of Lua and into Gyroscope? The Lua interface doesn’t support any network I/O! Given that our service runs in the cloud, that was a big problem. The only I/O we could do from Lua was file I/O or SQLite I/O.
We wrote some Python code to read a game observation from a file written by Lua and send it to Gyroscope, but it was very hard to synchronize with Lua, and getting the actions (button presses) back to Lua was buggy. Plus, it was super slow, even after we moved the files to a RAM disk. We tried the same approach with SQLite but ran into the same speed problems.
At this point, we decided to move the SDK code from Lua to a native BizHawk tool; these tools are written in C#, the language all of BizHawk is written in. We kept the Python code we had written because it gave us an easy interface to our service (which speaks gRPC) and provided synchronization between AI players playing each other (making sure they are on the same frame, and so on). We called this Python code the EmulatorController.
Got it: Doing it all in C#
BizHawk provides an easy C# interface to implement tools that control various aspects of the game and emulator. We used this interface when porting our Lua code to C# and quickly had a working tool for manipulating Street Fighter in C#.
In C#, we had access to all of the .NET libraries, so we quickly got a socket connection up to our EmulatorController code. For each frame, we grabbed an observation from the game and sent it to the EmulatorController, which consulted the Gyroscope AI and sent the emulator back the action (buttons) to press in the next frame.
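The per-frame exchange can be sketched as follows. The wire format, helper name, and use of JSON are our assumptions for illustration; the real EmulatorController also speaks gRPC to the Gyroscope service:

```python
import json
import socket

def serve_one_frame(conn, choose_action):
    """Receive one observation from the emulator over a socket and reply
    with the buttons to press on the next frame (illustrative protocol)."""
    raw = conn.recv(4096)
    observation = json.loads(raw.decode())
    action = choose_action(observation)      # consult the AI
    conn.sendall((json.dumps(action) + "\n").encode())
```

A loop around this handler, one iteration per frame, is all the controller side needs; the synchronization logic that keeps two AI players on the same frame sits on top of the same connection.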
We now had a working method of running Street Fighter II as fast as the host machine could, of sending game observations to Gyroscope, and of getting back actions for which controller buttons to press. We also had the ability to synchronize two AI bots playing against each other. It was time to train!
Putting it all together: Training the AI
With observations, actions, and rewards defined, along with the AI connected to SNES, we were ready. We trained our AI against the built-in game bot. We trained each character for around 8 hours or ~3000 matches. Our hypothesis was that a well-trained AI would (1) maximize reward, and (2) as a consequence, have a reasonably high win-percentage near the end of training.
Because playing Street Fighter is an entirely novel use of our service, we assumed we would have to do some tuning — our AI doesn’t usually optimize for this sort of quick reward, nor control such extensive action spaces. Over the course of two fun weekends, we tried many variations of the observation space, action space, reward function, and DQN parameters until we had an AI with a high win percentage.
Beyond standard model tuning techniques and good science (i.e., changing one thing at a time), the key discovery we made was around the uneven weighting of directional presses vs. button presses. We found that directional controls, in a single frame, cause very little change in the game; button controls, however, once pressed, cause significant change over a series of frames. For example, a punch takes many frames to complete. This meant that an action in one frame of the game could evolve over many subsequent frames. Further, button presses, while incredibly important, required much more frequent pressing to yield value. To overcome this game behavior, and to make the AI behave more like a human, we had the button presses repeat for 20 frames (⅓ of a second) before the AI took its next action, with rewards accumulated over those 20 frames. Stated another way, the AI took actions and made observations every ⅓ second of game time instead of at every rendered frame.
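The 20-frame action repeat amounts to a simple wrapper around the emulator's frame step. In this sketch, `step_frame` is a hypothetical stand-in for advancing the emulator one frame with a set of buttons held:

```python
FRAMES_PER_ACTION = 20   # ~1/3 second at the SNES's ~60 fps

def step_action(step_frame, buttons):
    """Hold `buttons` for 20 frames, summing the per-frame rewards, and
    return the final observation plus the accumulated reward."""
    total_reward = 0
    for _ in range(FRAMES_PER_ACTION):
        observation, frame_reward = step_frame(buttons)
        total_reward += frame_reward
    return observation, total_reward
```

From the AI's point of view, one call to this wrapper is one timestep: it observes, picks among the 35 actions, and sees the summed reward a third of a second later.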
A common question asked is why we didn’t have a “win” as the reward function. In short, it creates a delayed reward, which makes training much more difficult and lengthy. The health gap was a reasonable heuristic that we believed would lead to wins — and, it did.
When we began training, our AI behaved randomly and won ~20% of the time against a 3-star opponent (Street Fighter has a star-based difficulty rating). That 20% win rate was the baseline we had to beat to know the AI was working at all. In the end, the AI reached a 90% win rate against the in-game 3-star bots! For the simplistic setup we chose and the short training time, we were excited by its performance. We expected that longer training sessions would reach an even higher win rate, but would potentially overfit to the specific bot the AI trained against. For the tournament, we stopped training at ~80% win rate to avoid that situation.
Windows batch scripting is the worst
Once our AI began winning, we started training it with every character available in Street Fighter II. To train each one, we used Google Cloud Platform to spin up many Windows Server 2016 instances (BizHawk builds best on Windows), and then wrote an unfortunate number of .bat scripts to get all the training working. The training required automating player selection, game resetting, model recording, progress plotting via some R scripts, and other such functions. We added a number of command line options to BizHawk to make it easier to automate.
At the conference: Fight!
At the conference, we set up our booth to show four AI battles, each with two AI-controlled characters fighting each other. We also set up the tournament bracket, seeding it with pairs of characters that were not fighting on the booth display.
We put out jars, each with a character’s picture on it, and gave attendees raffle tickets. The attendees placed their raffle ticket in the jar of the character they thought would win the tournament; when the tournament ended, we picked a ticket from the winning character’s jar and the holder of that ticket won an SNES Classic! We also ran a display showing the training phase so that attendees could see how the Gyroscope AI works.
At 4:30pm on each day we ran the bracket. We’d run one test game and then one real match.
Day one tournament: M.Bison is OP
Guile v Vega: Guile got utterly destroyed. Vega’s AI had quickly learned to close the gap and duck and stab, and to jump over any special moves. Vega advanced.
Blanka v M. Bison: M.Bison is OP. His special attack is almost impossible to block, and, as such, M.Bison advanced.
Chun-Li v Sagat: Chun-Li is also a close-up fighter — her speed and low attacks won against Sagat’s long reach and frequent special move use. Chun-Li advanced.
Balrog v Dhalsim: This was fascinating — Dhalsim spent most of his time in the air, using his long legs to beat Balrog. Dhalsim advanced.
Vega v M.Bison: M.Bison’s attack was too strong. M.Bison goes to the finals.
Chun-Li v Dhalsim: Dhalsim did far more damage from the air, easily defeating Chun-Li.
M.Bison v Dhalsim: Look, basically M.Bison’s character is too strong to legitimately compete. M.Bison takes the tournament!
Day two tournament: E.Honda Shakes things up
During the second day, we re-seeded the starting matches, removing M.Bison from the tournament (overnight he was caught abusing performance-enhancing drugs in the form of cheat codes). We added E. Honda, who had done terribly in the test matches.
Two fights really stood out on the second day: (1) Vega v Sagat, a drawn-out battle in which Vega dodged Sagat’s special move no less than three times while approaching Sagat (twice by well-timed ducks and once by jumping(!) over the fireball), and (2) the final, E. Honda v Sagat. Sagat beat E. Honda in an amazing battle that took their health bars almost to zero before Sagat got the winning hit. E. Honda even getting that far was lucky — we replayed 100 games of E. Honda v Sagat and E. Honda won only 11. Amazing!
What’s next? Play our AI!
Think your AI can beat our AI? We didn’t get to keep any of the SNES Classics, so we’re having an AI-bot tournament where the winner gets an SNES Classic. If you’re interested in joining, send us an email to firstname.lastname@example.org or sign up. For entrants, our emulator modifications (https://github.com/GyroscopeHQ/BizHawk) will allow you to set things up (README.md coming soon).
We had a lot of requests to play our AI directly, which is obvious in hindsight but something we didn’t implement before the show. If you’re a skilled SF II player and want to fight our AI, get in touch! We’d love to host a livestream of human v AI SF II matches at the Gyroscope offices.
Get Gyroscope for your game: Increase monetization
What is Gyroscope? Gyroscope is an AI-based technology that helps mobile game and mobile app developers maximize monetization. We do this via an SDK (for Unity or iOS) that automatically collects observation data from the app and sends it to the Gyroscope AI. The AI then predicts when to trigger an action (normally a monetization event), timing it for when our AI thinks the user is most likely to want to see it. This automated action system means the developer can increase the lifetime value of the player without having to think about the timing of ads or IAP prompts, and the player isn’t overwhelmed by ads or IAP prompts and remains happy and engaged.
Special thanks to the BizHawk developers for creating such an amazing platform.
Thanks also to Samsung NEXT for getting us space at Samsung Developer Conference. If you’re a startup in high tech looking for a great investor, reach out to Samsung NEXT.
Thanks to Alexandra Escobar, our Operations Manager, for all of the logistics and heavy lifting.
Want to know more details? Interested in working with us? Have some fun suggestions? You can reach us on Twitter at @GyroscopeHQ, on Facebook at https://www.facebook.com/GyroscopeSoftware/, and via email at email@example.com