Diambra — How to deploy your first AI agent in Street Fighter III (tutorial)

DIAMBRA
10 min read · Jan 5, 2025


Step-by-step video teaching you how to code and deploy your AI agent

We are about to launch our first event with $DIAMB rewards:

Want to get prepared for the event? Keep reading to find out more!

P.S.: Devs won’t be the only ones earning $DIAMB… Spectators of the finals live-streamed on Twitch? Maybe 😉.

1. Introduction

We’ll guide you step-by-step to set up your environment, code and deploy your very first AI agent in the iconic Street Fighter 3 — right from your Windows machine.

Whether you’re an AI enthusiast or a newbie, this tutorial will empower you to blend the thrill of gaming with Reinforcement Learning.

Explore the Diambra project here: https://diambra.ai/
Discord, if you have any questions: https://discord.com/invite/diambra

All scripts in a ZIP file: Google Drive

2. Setting Up the Environment

Install Docker Desktop

Download, install, and open Docker Desktop by following the official guide:
https://docs.docker.com/desktop/setup/install/windows-install/

⚠️ You need to have Docker open for all the next steps.
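
A quick sanity check (not part of the original guide): docker info prints the engine details and exits with an error if the Docker daemon is not running.

docker info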

Create a GitHub Repository

  1. Go to https://github.com/
  2. Create a new repository called MyFirstAgent.
  3. Clone your repository to work on it locally.
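
For example, cloning could look like this (replace [User] with your GitHub username; this assumes you kept the repository name MyFirstAgent):

git clone https://github.com/[User]/MyFirstAgent.git
cd MyFirstAgent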

Install Python 3.9.7 & NumPy 1.23

  1. Download Python 3.9.7 from https://www.python.org/downloads/release/python-397/
  2. Add Python to your system path during installation.
  3. Verify the installation by running the following in the command line:
python --version

4. Update pip and install the required version of NumPy:

python -m pip install --upgrade pip
pip install numpy==1.23
python -c "import numpy; print(numpy.__version__)"

Ensure Python is 3.9.7 and NumPy is 1.23 before proceeding.

Install Diambra Libraries

Install the necessary Diambra libraries:

python -m pip install diambra
python -m pip install diambra-arena
pip install diambra-arena[stable-baselines3]

Create a Diambra Account

Sign up at https://diambra.ai/. You’ll need your credentials later.

3. Launching Diambra

Download the Street Fighter 3 ROM

Download the ROM listed as 'STREET FIGHTER III 3RD STRIKE: FIGHT FOR THE FUTUR [JAPAN] (CLONE)' (entry street-fighter-iii-3rd-strike-fight-for-the-futur-japan-clone, id 106255) from wowroms.

Prepare the ROM

  1. Create a folder named roms in your Git project directory.
  2. Place the downloaded sfiii3n.zip file into this folder.

Create a Random Agent Script

In your project, create a file called gist.py with the following code:

#!/usr/bin/env python3
import diambra.arena

def main():
    # Environment creation
    env = diambra.arena.make("sfiii3n", render_mode="human")

    # Environment reset
    observation, info = env.reset(seed=42)

    # Agent-Environment interaction loop
    while True:
        # (Optional) Environment rendering
        env.render()

        # Action random sampling
        actions = env.action_space.sample()

        # Environment stepping
        observation, reward, terminated, truncated, info = env.step(actions)

        # Episode end (Done condition) check
        if terminated or truncated:
            observation, info = env.reset()
            break

    # Environment shutdown
    env.close()

    # Return success
    return 0

if __name__ == '__main__':
    main()

Run Your Script

Execute the following command from a command prompt (cmd) in your project directory:

diambra run -r absolute\path\to\the\roms\ python gist.py

Replace absolute\path\to\the\roms\ with the actual path to your roms folder.
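
For instance, assuming a hypothetical project located at C:\Users\you\MyFirstAgent, the command would look like:

diambra run -r C:\Users\you\MyFirstAgent\roms python gist.py

On the first run, expect the CLI to ask for your diambra.ai credentials and to pull the engine’s Docker image before a game window opens with your agent playing random moves.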

4. Start training

Prepare Configuration Files

  1. Create the folders cfg_files/sfiii3n in your project directory.

  2. Add a configuration file named sr6_128x4_das_nc.yaml in that folder with the following content:

folders:
  parent_dir: "./results/"
  model_name: "sr6_128x4_das_nc"

# 'Alex', 'Twelve', 'Hugo', 'Sean', 'Makoto', 'Elena', 'Ibuki', 'Chun-Li', 'Dudley', 'Necro', 'Q', 'Oro', 'Urien', 'Remy', 'Ryu', 'Gouki', 'Yun', 'Yang', 'Ken', 'Gill'
settings:
  game_id: "sfiii3n"
  step_ratio: 6
  frame_shape: !!python/tuple [128, 128, 0]
  continue_game: 0.0
  action_space: "discrete"
  characters: "Ryu"
  difficulty: 4
  outfits: 1

wrappers_settings:
  normalize_reward: true
  no_attack_buttons_combinations: true
  stack_frames: 4
  dilation: 1
  add_last_action: true
  stack_actions: 12
  scale: true
  exclude_image_scaling: true
  role_relative: true
  flatten: true
  filter_keys: ["action", "own_health", "opp_health", "own_side", "opp_side", "opp_character", "stage", "timer"]

policy_kwargs:
  # net_arch: [{ pi: [64, 64], vf: [32, 32] }]
  net_arch: [64, 64]

ppo_settings:
  gamma: 0.94
  model_checkpoint: "0"
  learning_rate: [2.5e-4, 2.5e-6]  # To start
  clip_range: [0.15, 0.025]  # To start
  # learning_rate: [5.0e-5, 2.5e-6]  # Fine tuning
  # clip_range: [0.075, 0.025]  # Fine tuning
  batch_size: 256  # nminibatches gave a different batch size depending on the number of environments: batch_size = (n_steps * n_envs) // nminibatches
  n_epochs: 4
  n_steps: 128
  autosave_freq: 512
  time_steps: 1024

Let’s talk about the settings we can play with

  1. Game Settings:
  • You can change the character your AI agent controls by modifying the characters field in the settings (e.g., replace "Ryu" with another character from the list provided above).
  • outfits: Defines the character's outfit (1 for default).

2. Model Training Settings:

  • stack_frames: The number of past frames the AI agent uses as context for decision-making. A higher value provides more context but slows training.
  • stack_actions: The number of previous actions the AI considers when deciding its next move. Higher values provide context but may introduce noise.
  • filter_keys: Specifies the information the AI receives about the game state (e.g., health, position, etc.).

For more information on the wrapper settings, refer to the documentation:
https://docs.diambra.ai/wrappers/
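
If you want to see concretely what the agent will observe with these wrappers, here is a minimal sketch (not part of the original scripts; it reuses the same config-loading pattern as training.py below, and the file name check_env.py is just a suggestion). Launch it through diambra run -r absolute\path\to\the\roms\ python check_env.py like the other scripts:

import yaml
from diambra.arena import load_settings_flat_dict, SpaceTypes
from diambra.arena.stable_baselines3.make_sb3_env import make_sb3_env, EnvironmentSettings, WrappersSettings

# Load the same config used for training
with open("cfg_files/sfiii3n/sr6_128x4_das_nc.yaml") as yaml_file:
    params = yaml.load(yaml_file, Loader=yaml.FullLoader)

params["settings"]["action_space"] = SpaceTypes.DISCRETE
settings = load_settings_flat_dict(EnvironmentSettings, params["settings"])
wrappers_settings = load_settings_flat_dict(WrappersSettings, params["wrappers_settings"])

env, num_envs = make_sb3_env(settings.game_id, settings, wrappers_settings)

# With stack_frames: 4 and the filter_keys above, the observation space should be a
# Dict containing the stacked frames plus one entry per filtered key
print(env.observation_space)
print(env.action_space)
env.close()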

Create the Training Script

Next, create a script called training.py in your project directory. Copy the following code into the file:

import os
import yaml
import json
import argparse
from diambra.arena import load_settings_flat_dict, SpaceTypes
from diambra.arena.stable_baselines3.make_sb3_env import make_sb3_env, EnvironmentSettings, WrappersSettings
from diambra.arena.stable_baselines3.sb3_utils import linear_schedule, AutoSave
from stable_baselines3 import PPO

# diambra run -s 8 python stable_baselines3/training.py --cfgFile $PWD/stable_baselines3/cfg_files/sfiii3n/sr6_128x4_das_nc.yaml

def main(cfg_file):
    # Read the cfg file
    yaml_file = open(cfg_file)
    params = yaml.load(yaml_file, Loader=yaml.FullLoader)
    print("Config parameters = ", json.dumps(params, sort_keys=True, indent=4))
    yaml_file.close()

    base_path = os.path.dirname(os.path.abspath(__file__))
    model_folder = os.path.join(base_path, params["folders"]["parent_dir"], params["settings"]["game_id"],
                                params["folders"]["model_name"], "model")
    tensor_board_folder = os.path.join(base_path, params["folders"]["parent_dir"], params["settings"]["game_id"],
                                       params["folders"]["model_name"], "tb")

    os.makedirs(model_folder, exist_ok=True)

    # Settings
    params["settings"]["action_space"] = SpaceTypes.DISCRETE if params["settings"]["action_space"] == "discrete" else SpaceTypes.MULTI_DISCRETE
    settings = load_settings_flat_dict(EnvironmentSettings, params["settings"])

    # Wrappers Settings
    wrappers_settings = load_settings_flat_dict(WrappersSettings, params["wrappers_settings"])

    # Create environment
    env, num_envs = make_sb3_env(settings.game_id, settings, wrappers_settings, render_mode="human")
    print("Activated {} environment(s)".format(num_envs))

    # Policy param
    policy_kwargs = params["policy_kwargs"]

    # PPO settings
    ppo_settings = params["ppo_settings"]
    gamma = ppo_settings["gamma"]
    model_checkpoint = ppo_settings["model_checkpoint"]

    learning_rate = linear_schedule(ppo_settings["learning_rate"][0], ppo_settings["learning_rate"][1])
    clip_range = linear_schedule(ppo_settings["clip_range"][0], ppo_settings["clip_range"][1])
    clip_range_vf = clip_range
    batch_size = ppo_settings["batch_size"]
    n_epochs = ppo_settings["n_epochs"]
    n_steps = ppo_settings["n_steps"]

    if model_checkpoint == "0":
        # Initialize the agent
        agent = PPO("MultiInputPolicy", env, verbose=1,
                    gamma=gamma, batch_size=batch_size,
                    n_epochs=n_epochs, n_steps=n_steps,
                    learning_rate=learning_rate, clip_range=clip_range,
                    clip_range_vf=clip_range_vf, policy_kwargs=policy_kwargs,
                    tensorboard_log=tensor_board_folder)
    else:
        # Load the trained agent
        agent = PPO.load(os.path.join(model_folder, model_checkpoint), env=env,
                         gamma=gamma, learning_rate=learning_rate, clip_range=clip_range,
                         clip_range_vf=clip_range_vf, policy_kwargs=policy_kwargs,
                         tensorboard_log=tensor_board_folder)

    # Print policy network architecture
    print("Policy architecture:")
    print(agent.policy)

    # Create the callback: autosave every USER DEF steps
    autosave_freq = ppo_settings["autosave_freq"]
    auto_save_callback = AutoSave(check_freq=autosave_freq, num_envs=num_envs,
                                  save_path=model_folder, filename_prefix=model_checkpoint + "_")

    # Train the agent
    time_steps = ppo_settings["time_steps"]
    agent.learn(total_timesteps=time_steps, callback=auto_save_callback)

    # Save the agent
    new_model_checkpoint = str(int(model_checkpoint) + time_steps)
    model_path = os.path.join(model_folder, new_model_checkpoint)
    agent.save(model_path)

    # Close the environment
    env.close()

    # Return success
    return 0

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfgFile", type=str, required=True, help="Configuration file")
    opt = parser.parse_args()
    print(opt)

    main(opt.cfgFile)

Run the Training Script

To start training, run the following command in your terminal:

diambra run -r absolute\path\to\the\roms\ python training.py --cfgFile absolute\path\to\the\cfg_files\sfiii3n\sr6_128x4_das_nc.yaml

Replace absolute\path\to\the\roms\ and absolute\path\to\the\cfg_files\ with the actual paths to your roms folder and configuration file, respectively.

If you have enough RAM, you can train your agent using parallel environments by adding -s 4 to the command to utilize 4 environments simultaneously. You can also scale this up to 8 or 16 environments if your hardware supports it.

diambra run -s 4 -r absolute\path\to\the\roms\ python training.py --cfgFile absolute\path\to\the\cfg_files\sfiii3n\sr6_128x4_das_nc.yaml

5. Save and Continue Training

Once you have started your training, the progress will be saved in the following directory structure: results\sfiii3n\sr6_128x4_das_nc\model

The model will be saved in a ZIP format. After the first training session, you should see a file named 1024.zip. This represents the checkpoint after 1024 time steps.

Modifying the Training Duration

If you wish to adjust the training duration or the frequency of checkpoints, you can modify the following parameters in your configuration file (cfg_files/sfiii3n/sr6_128x4_das_nc.yaml):

batch_size: 256
n_epochs: 4
n_steps: 128
autosave_freq: 512
time_steps: 1024

1. batch_size: 256

  • Meaning: The number of samples from the collected experience used in a single gradient update (the minibatch size).
  • Context: In machine learning and reinforcement learning, a “batch” is a subset of data used to compute updates to the model. A larger batch_size can speed up computation if sufficient hardware resources are available, but it requires more memory.

2. n_epochs: 4

  • Meaning: The number of passes made over each batch of collected experience during a policy update.
  • Context: With n_epochs = 4, the model sees each collected sample four times per update. Too few epochs may result in underfitting, while too many could lead to overfitting on the current rollout.

3. n_steps: 128

  • Meaning: The number of environment steps collected per environment before each policy update.
  • Context: n_steps defines how much interaction is gathered into the rollout buffer before the policy and value function are updated.

4. autosave_freq: 512

  • Meaning: Frequency (in steps) at which the model’s progress is automatically saved.
  • Context: Every 512 steps, the current state of the model is saved. This ensures that, in case of an interruption (e.g., a crash), you lose no more than 512 steps of computation.

5. time_steps: 1024

  • Meaning: The total number of environment steps used for the whole training run (passed to agent.learn()).
  • Context: time_steps sets how long the training session lasts; once it is reached, the final model is saved under its new checkpoint name and the script exits.
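
To make these numbers concrete, here is a rough back-of-the-envelope illustration (not part of the training code, assuming 4 parallel environments launched with -s 4):

# Illustration only: how the PPO settings above interact with 4 parallel environments
n_envs = 4        # diambra run -s 4
n_steps = 128     # steps collected per environment before each update
batch_size = 256
time_steps = 1024

rollout = n_steps * n_envs            # 512 transitions gathered per PPO update
minibatches = rollout // batch_size   # 2 gradient minibatches per epoch
updates = time_steps // rollout       # 2 PPO updates in this short demo run
print(rollout, minibatches, updates)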

Checkpointing and Resuming Training

  1. After your first training session, a model checkpoint named 1024.zip will be saved in:
results/sfiii3n/sr6_128x4_das_nc/model

2. To resume training or continue from this point:

Update the model_checkpoint value in your configuration file (sr6_128x4_das_nc.yaml) to the name of the latest checkpoint file, without the .zip extension. For example:

model_checkpoint: "1024"

3. Each training session generates a new ZIP file for its checkpoint (e.g., 1536.zip, 2048.zip, and so on, depending on time_steps). Always update model_checkpoint to the latest file before starting a new training session.

Training Tips

1. Training Duration:

  • For a high-quality AI agent, it’s recommended to train for at least 48 hours. This allows the model to learn complex strategies and adapt better to the game environment.

2. Checkpoint Management:

  • Regularly check your saved models and keep backups of significant milestones (e.g., after 10,000 steps or after achieving a good performance).

3. Adjust Parameters:

  • You can experiment with the parameters (batch_size, n_steps, time_steps, etc.) to find the optimal settings for your system and the complexity of the game.

6. Evaluate your model

Once you’ve trained your model and saved its progress, you may want to evaluate it to understand its performance and behavior in the game. Evaluation is essential to measure how well your AI agent plays and to identify areas for further improvement.

Create the Evaluation Script

In your project directory, create a new Python script named evaluate.py.

import os
import yaml
import json
import argparse
from diambra.arena import load_settings_flat_dict, SpaceTypes
from diambra.arena.stable_baselines3.make_sb3_env import make_sb3_env, EnvironmentSettings, WrappersSettings
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3 import PPO

# Run with: diambra run -r <path to roms> python evaluate.py --cfgFile <path to cfg> --modelFile <path to model checkpoint>

def main(cfg_file, model_file):
    # Read the cfg file
    yaml_file = open(cfg_file)
    params = yaml.load(yaml_file, Loader=yaml.FullLoader)
    print("Config parameters = ", json.dumps(params, sort_keys=True, indent=4))
    yaml_file.close()

    # Settings
    params["settings"]["action_space"] = SpaceTypes.DISCRETE if params["settings"]["action_space"] == "discrete" else SpaceTypes.MULTI_DISCRETE
    settings = load_settings_flat_dict(EnvironmentSettings, params["settings"])

    # Wrappers Settings
    wrappers_settings = load_settings_flat_dict(WrappersSettings, params["wrappers_settings"])

    # Create environment
    env, num_envs = make_sb3_env(settings.game_id, settings, wrappers_settings, render_mode="human")

    env.render_mode = "human"

    print("Activated {} environment(s)".format(num_envs))

    agent = PPO.load(model_file)

    # Evaluate the agent
    # NOTE: If you use wrappers with your environment that modify rewards,
    # this will be reflected here. To evaluate with original rewards,
    # wrap environment in a "Monitor" wrapper before other wrappers.
    mean_reward, std_reward = evaluate_policy(agent, env, deterministic=False, n_eval_episodes=10)
    print("Reward: {} (avg) ± {} (std)".format(mean_reward, std_reward))

    # Run trained agent
    observation = env.reset()
    cumulative_reward = 0
    while True:
        env.render()

        action, _state = agent.predict(observation, deterministic=False)
        observation, reward, done, info = env.step(action)

        cumulative_reward += reward
        if (reward != 0):
            print("Cumulative reward =", cumulative_reward)

        if done:
            observation = env.reset()
            break

    # Close the environment
    env.close()

    # Return success
    return 0

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfgFile", type=str, required=True, help="Configuration file")
    parser.add_argument("--modelFile", type=str, required=True, help="Model file")
    opt = parser.parse_args()
    print(opt)

    main(opt.cfgFile, opt.modelFile)

Make sure your configuration file (cfg_files/sfiii3n/sr6_128x4_das_nc.yaml) has the correct model_checkpoint value set to the most recent checkpoint file.

Run the Evaluation Script

To evaluate your model, execute the following command in your terminal:

diambra run -r absolute\path\to\the\roms\ python evaluate.py --cfgFile absolute\path\to\the\cfg_files\sfiii3n\sr6_128x4_das_nc.yaml --modelFile absolute\path\to\results\sfiii3n\sr6_128x4_das_nc\model\model.zip

Replace the three paths with the actual locations of your roms folder, your configuration file, and your trained model file (the checkpoint saved during training, e.g., 1024.zip, or a copy named model.zip).
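
For instance, assuming the same hypothetical project folder C:\Users\you\MyFirstAgent and 1024.zip as the latest checkpoint, the command could look like:

diambra run -r C:\Users\you\MyFirstAgent\roms python evaluate.py --cfgFile C:\Users\you\MyFirstAgent\cfg_files\sfiii3n\sr6_128x4_das_nc.yaml --modelFile C:\Users\you\MyFirstAgent\results\sfiii3n\sr6_128x4_das_nc\model\1024.zip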

7. Make your first submission

After successfully training and evaluating your AI agent, the next step is to submit it to Diambra for evaluation or participation in competitions. This section explains how to prepare and make your submission.

1. Create an access token for Diambra

To allow Diambra to access your repository and evaluate your model, you need to generate a personal access token.

  • Go to “Settings” in the top-right corner of the GitHub website.
  • Click “Developer settings” at the bottom-left of the page.
  • Click “Personal access tokens”, then “Fine-grained tokens”, and “Generate new token”.
  • Give your token a name, select the necessary scopes (e.g., “repo” for accessing private repositories), give all access, and click “Generate token.”
  • Copy the generated token and save it somewhere safe, as you won’t be able to see it again.

2. Create your submission manifest

The submission manifest is a YAML file that tells Diambra how to evaluate your AI agent. Follow these steps:

  1. Create a file named submission-manifest.yaml in your project directory.
  2. Add the following content, replacing placeholders with your information:
mode: AIvsCOM
image: diambra/arena-stable-baselines3-on3.10-bullseye:main
command:
  - python
  - "/sources/submissionagent.py"
  - "--cfgFile"
  - "/sources/cfg_files/sfiii3n/sr6_128x4_das_nc.yaml"
  - "--trainedModel"
  - "/sources/results/sfiii3n/sr6_128x4_das_nc/model/model.zip"
sources:
  .: git+https://[User]:{{.Secrets.token}}@github.com/[User]/[ProjectName].git#ref=Master

3. Create your submission script

This script defines how Diambra will load your trained model and interact with the game environment during evaluation.

  • Create a file named submissionagent.py in your project directory with the following code:
#!/usr/bin/env python3
import os
import yaml
import json
import diambra.arena
from stable_baselines3 import PPO
from diambra.arena import SpaceTypes, Roles, EnvironmentSettings, load_settings_flat_dict
from diambra.arena.utils.gym_utils import available_games
from diambra.arena.stable_baselines3.make_sb3_env import make_sb3_env, EnvironmentSettings, WrappersSettings
import random
import argparse

def main(cfg_file, trained_model, test=False):
    # Read the cfg file
    yaml_file = open(cfg_file)
    params = yaml.load(yaml_file, Loader=yaml.FullLoader)
    print("Config parameters = ", json.dumps(params, sort_keys=True, indent=4))
    yaml_file.close()

    base_path = os.path.dirname(os.path.abspath(__file__))
    model_folder = os.path.join(base_path, params["folders"]["parent_dir"], params["settings"]["game_id"],
                                params["folders"]["model_name"], "model")

    # Settings
    params["settings"]["action_space"] = SpaceTypes.DISCRETE if params["settings"]["action_space"] == "discrete" else SpaceTypes.MULTI_DISCRETE
    settings = load_settings_flat_dict(EnvironmentSettings, params["settings"])
    settings.role = Roles.P1

    # Wrappers Settings
    wrappers_settings = load_settings_flat_dict(WrappersSettings, params["wrappers_settings"])
    wrappers_settings.normalize_reward = False

    # Create environment
    env, num_envs = make_sb3_env(settings.game_id, settings, wrappers_settings, no_vec=True)
    print("Activated {} environment(s)".format(num_envs))

    # Load the trained agent
    # model_path = os.path.join(model_folder, trained_model)
    agent = PPO.load(trained_model)

    # Print policy network architecture
    print("Policy architecture:")
    print(agent.policy)

    observation, info = env.reset()

    while True:
        action, _state = agent.predict(observation, deterministic=False)
        observation, reward, terminated, truncated, info = env.step(int(action))

        if terminated or truncated:
            observation, info = env.reset()
            if info["env_done"] or test is True:
                break

    # Close the environment
    env.close()

    # Return success
    return 0

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfgFile", type=str, required=True, help="Configuration file")
    parser.add_argument("--trainedModel", type=str, default="model", help="Model checkpoint")
    parser.add_argument("--test", type=int, default=0, help="Test mode")
    opt = parser.parse_args()
    print(opt)

    main(opt.cfgFile, opt.trainedModel, bool(opt.test))

4. Duplicate your model

Duplicate your latest checkpoint (e.g., 1024.zip) in results/sfiii3n/sr6_128x4_das_nc/model and rename the copy model.zip, so that it matches the path referenced in the submission manifest.
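
On Windows, assuming 1024.zip is your most recent checkpoint, a single copy command from the project root does the job:

copy results\sfiii3n\sr6_128x4_das_nc\model\1024.zip results\sfiii3n\sr6_128x4_das_nc\model\model.zip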

5. Push all your work to GitHub

Ensure that the following files and folders are correctly structured and pushed to your GitHub repository (example commands after the list):

  • The model checkpoint files (e.g., 1024.zip and the duplicated model.zip) in the appropriate results folder.
  • The configuration file (sr6_128x4_das_nc.yaml) in cfg_files/sfiii3n.
  • All necessary code scripts, including submissionagent.py, and any others you’ve used.
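
A typical sequence from the project root could look like this (adjust the file list to what you actually use, and make sure the branch you push matches the #ref= branch in your manifest):

git add cfg_files submission-manifest.yaml submissionagent.py training.py evaluate.py gist.py
git add results/sfiii3n/sr6_128x4_das_nc/model/model.zip
git commit -m "Add trained agent and submission files"
git push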

6. Submit your AI agent

Now that everything is prepared, you can submit your agent using the following command in your terminal:

diambra agent submit --submission.secret token=[token] --submission.manifest submission-manifest.yaml

Replace [token] with the token you generated earlier.
Follow the link printed in the terminal to see how your submission is doing.

8. What’s next?

If you want to go further with customization, have a look at the docs: https://docs.diambra.ai/

Join a community of AI developers on Discord: https://discord.com/invite/diambra
