RL Coach and OpenVINO with StarCraft 2

Abhishek Nandy
Intel Software Innovators
Jul 1, 2019

Introduction

In this article we will see how the StarCraft 2 environment is used in Reinforcement Learning Coach. We will first set up the StarCraft 2 environment by installing the game. We will see how DeepMind's model works, and we will use Coach to create a bot that learns to play the mini-game well. As we train, we save checkpoints, and these saved models are the basis for the OpenVINO Toolkit Model Optimizer. From the optimizer we generate the xml and bin files for inference. We also save GIF files during training and inference to show the optimized simulation of the training process. Finally, we will touch on the RL Coach dashboard.

The flow for StarCraft 2, RL Coach and the OpenVINO toolkit

The next figure shows the flow for RL Coach and the OpenVINO toolkit with StarCraft 2.

StarCraft 2 flow

Let us discuss the flow:

  • According to the flow, we first set up the StarCraft 2 environment for the Reinforcement Learning process.
  • Next we connect Reinforcement Learning Coach to the StarCraft 2 API. The Coach algorithms we use for this environment are A3C and Dueling DDQN.
  • The general objective of the game scenario is to mine minerals.
  • After that we start the training process in Reinforcement Learning Coach, saving the model as checkpoints.
  • From the saved model we run the Model Optimizer, which creates the xml and bin files for inference.
  • Inference is then run on the model produced by the last training process.

StarCraft 2 environment analysis

The StarCraft 2 environment has been released as an open-source application that helps in training Reinforcement Learning models.

With this open-source release we can use various StarCraft 2 scenarios as Reinforcement Learning environments.

The StarCraft 2 AI model provides test scenarios for training an AI, and it uses the concepts of Deep Reinforcement Learning.

We will train the StarCraft 2 model using Reinforcement Learning Coach and optimize it using the OpenVINO toolkit.

We will train Dueling DDQN and A3C agents on the CollectMineralShards mini-game of DeepMind's StarCraft 2 AI environment.

The general scenario for the game is to collect mineral shards.

How the Deep Q learning approach evolved

DeepMind's first attempts at these simulations were based on Atari games.

DeepMind's team created an algorithm known as the Deep Q Learner (the Deep Q-Network, or DQN) and used it to play many Atari games at or above human level. More details are given below.

DeepMind combined two different ideas from Machine Learning. The first is deep learning, which here means learning the necessary features: a convolutional neural network (CNN) takes the raw pixels of the game screen windows and learns dense feature representations from them. After training, the network's output is mapped to an action value for each of the up, down, left and right arrow keys.
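
To make this concrete, here is a minimal sketch in Keras of a DQN-style network that maps a stack of game frames to one Q-value per action. This is an illustration rather than DeepMind's original code, and the frame size and action count are assumptions:

from tensorflow.keras import layers, models

def build_dqn(num_actions=4, frame_shape=(84, 84, 4)):
    # Stacked grayscale game frames in, one Q-value per action out.
    return models.Sequential([
        layers.Conv2D(32, 8, strides=4, activation="relu", input_shape=frame_shape),
        layers.Conv2D(64, 4, strides=2, activation="relu"),
        layers.Conv2D(64, 3, strides=1, activation="relu"),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_actions),  # Q-values for up, down, left, right
    ])

model = build_dqn()
model.summary()

The action with the highest predicted Q-value is the one the agent plays.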

More details on DeepMind's work are available at the following link: https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/

The flow of the process as per Deepmind for Atari Games is shown in the figure below.

Deepmind Game flow

The network not only takes sensory input (screen frames and movements) from the game, it also uses what is called Q-learning. Q-learning is a type of Reinforcement Learning that builds a Q matrix of state-action values. The Q-learning algorithm describes how to pick an action from the current policy. We pick the action, observe its effect in the game, receive a reward (for example +1 for success and -1 otherwise), and based on the result we update the Q matrix.
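
As a minimal sketch of the tabular case (the deep variant replaces this table with a network like the one above), the Q matrix update looks roughly like this; the state and action counts are made up for illustration:

import numpy as np

q_table = np.zeros((16, 4))   # toy example: 16 states, 4 actions
alpha, gamma = 0.1, 0.99      # learning rate and discount factor

def update_q(state, action, reward, next_state):
    # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])

# Example: taking action 2 in state 5 earned +1 and led to state 6.
update_q(state=5, action=2, reward=1.0, next_state=6)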

Alpha Go

The following figure shows how DeepMind built an AI capable of beating grandmasters at the Chinese board game Go.

More details on AlphaGo can be found here.

AlphaGo cheat sheet

The game is so complex that its search space is far too vast for an AI to brute-force through all the options; there are simply too many combinations of game states to compute.

DeepMind's algorithm was good enough to beat a grandmaster at Go.

AlphaGo uses two different neural networks.

One is a policy network, which suggests the next move to consider.

The other is a value network, which estimates how good a board position is.

The two networks compute two different quantities.

Using the Reinforcement Learning approach for both policy and value, the algorithm could search the gigantic game tree much faster.

This tree search is called Monte Carlo tree search.

Monte Carlo tree search simulates a search tree, and at each time step the AI selects an action based on the action value, the prior probability output by the policy network, and an exploration parameter. In other words, it uses the policy network and the value network as guides while searching through the tree of possible moves at every time step; training in this way is what made AlphaGo an expert player.
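
A rough sketch of the selection rule applied at each tree node is shown below. It is a simplified PUCT-style formula, not AlphaGo's exact implementation, and the node statistics structure and constant are illustrative assumptions:

import math
from dataclasses import dataclass

@dataclass
class Child:
    N: int = 0      # visit count
    W: float = 0.0  # total value backed up from the value network / simulations
    P: float = 0.0  # prior probability from the policy network

def select_action(children, c_puct=1.0):
    # Pick the action maximizing value estimate plus a prior-weighted exploration bonus.
    total_visits = sum(child.N for child in children.values())
    def score(child):
        q = child.W / child.N if child.N > 0 else 0.0
        u = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)
        return q + u
    return max(children, key=lambda action: score(children[action]))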

StarCraft 2 principles

StarCraft2 Game

When we build a strategy for a StarCraft 2 AI, we are obviously thinking in terms of building a world-class StarCraft 2 player.

The key considerations for the game are:

  • When we should speed up our wealth-gathering (mining) process
  • How we should build our army

DeepMind, in collaboration with Blizzard Entertainment, released this open-source interface to StarCraft 2 known as PySC2. PySC2 has the following components:

It has an API wrapper written entirely in Python, a dataset of anonymized replays, and a set of mini Reinforcement Learning games (mini-games).
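
As a small illustration of that Python API, a do-nothing agent can be written against PySC2's BaseAgent class (a minimal sketch; check the PySC2 documentation for the current interface):

from pysc2.agents import base_agent
from pysc2.lib import actions

class NoOpAgent(base_agent.BaseAgent):
    # A minimal agent that issues a no-op action at every step.
    def step(self, obs):
        super(NoOpAgent, self).step(obs)
        # obs.observation holds the feature layers, score and available actions.
        return actions.FUNCTIONS.no_op()

Such an agent can then be run against a mini-game with the pysc2.bin.agent runner used below, passing the agent class via its --agent flag.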

Steps required to install PySC2

Installing PySC2 is shown in the following link
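
In short, the Python package itself can usually be installed with pip; the StarCraft 2 game binary and the mini-game maps still have to be installed separately into the StarCraftII folder, as described in that guide:

pip install pysc2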

After the entire installation process is complete, we have to test the installation.

From the terminal we will write this command

python -m pysc2.bin.agent --map CollectMineralShards

This launches the StarCraft 2 client and starts the agent.

We get the following response in the terminal:

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
I0116 22:15:50.053973 140673773569792 sc_process.py:110] Launching SC2: /home/abhi/StarCraftII/Versions/Base55958/SC2_x64 -listen 127.0.0.1 -port 21071 -dataDir /home/abhi/StarCraftII/ -tempDir /tmp/sc-bmdze23u/ -displayMode 0 -windowwidth 640 -windowheight 480 -windowx 50 -windowy 50
I0116 22:15:50.056916 140673773569792 remote_controller.py:163] Connecting to: ws://127.0.0.1:21071/sc2api, attempt: 0, running: True
Version: B55958 (SC2.3.16)
Build: Jul 31 2017 13:19:41
Command Line: '"/home/abhi/StarCraftII/Versions/Base55958/SC2_x64" -listen 127.0.0.1 -port 21071 -dataDir /home/abhi/StarCraftII/ -tempDir /tmp/sc-bmdze23u/ -displayMode 0 -windowwidth 640 -windowheight 480 -windowx 50 -windowy 50'
Starting up...
Startup Phase 1 complete
I0116 22:15:51.060845 140673773569792 remote_controller.py:163] Connecting to: ws://127.0.0.1:21071/sc2api, attempt: 1, running: True
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:21071 (21071)
Startup Phase 3 complete. Ready for commands.
I0116 22:15:52.063097 140673773569792 remote_controller.py:163] Connecting to: ws://127.0.0.1:21071/sc2api, attempt: 2, running: True
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface enabled
Configure: score interface enabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
The game in progress

After closing the application or terminal, we now look at the list of RL Coach presets available for the StarCraft 2 environment. They are:

Starcraft_CollectMinerals_A3C

Starcraft_CollectMinerals_Dueling_DDQN

RL Coach and OpenVINO

In this section we will see how RL Coach and the StarCraft 2 environment work together for the optimization process.

Let us start with the A3C algorithm.

We pass the following command for training and saving the model. We save the model so that the meta file is generated and can be used by the model optimization process, and we also dump GIF files of the training process. (Here -r renders the environment, -p selects the preset, -s sets the checkpoint save interval in seconds, and -dg dumps the GIFs.)

coach -r -p Starcraft_CollectMinerals_A3C -s 300 -dg

The following output is shown in the terminal window.

Creating graph - name: BasicRLGraphManager
Version: B55958 (SC2.3.16)
Build: Jul 31 2017 13:19:41
Command Line: '"/home/abhi/StarCraftII/Versions/Base55958/SC2_x64" -listen 127.0.0.1 -port 17180 -dataDir /home/abhi/StarCraftII/ -tempDir /tmp/sc-jn9_1iuh/ -displayMode 0 -windowwidth 640 -windowheight 480 -windowx 50 -windowy 50'
Starting up...
Startup Phase 1 complete
Startup Phase 2 complete
Creating stub renderer...
Listening on: 127.0.0.1:17180 (17180)
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface enabled
Configure: score interface enabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Training process

In the experiment folder we can see that checkpoint files are saved as training progresses.

Checkpoint files

For the model optimization process we go to the OpenVINO deployment tools folder and use mo_tf.py to create the xml and bin files.

python mo_tf.py --input_meta_graph ~/experiments/16_01_2019-22_27/checkpoint/40_Step-40715

Model Optimizer arguments:
Common parameters:
- Path to the Input Model:     None
- Path for generated IR:     /opt/intel/computer_vision_sdk_2018.4.420/deployment_tools/model_optimizer/.
- IR output name:     40_Step-407151.ckpt
- Log level:     SUCCESS
- Batch:     Not specified, inherited from the model
- Input layers:     Not specified, inherited from the model
- Output layers:     Not specified, inherited from the model
- Input shapes:     Not specified, inherited from the model
- Mean values:     Not specified
- Scale values:     Not specified
- Scale factor:     Not specified
- Precision of IR:     FP32
- Enable fusing:     True
- Enable grouped convolutions fusing:     True
- Move mean values to preprocess section:     False
- Reverse input channels:     False
TensorFlow specific parameters:
- Input model in text protobuf format:     False
- Offload unsupported operations:     False
- Path to model dump for TensorBoard:     None
- List of shared libraries with TensorFlow custom layers implementation:     None
- Update the configuration file with input/output node names:     None
- Use configuration file used to generate the model with Object Detection API:     None
- Operations to offload:     None
- Patterns to offload:     None
- Use the config file:     None
Model Optimizer version:     1.4.292.6ef7232d

The generated xml and bin files are used for the inference part with OpenVINO.

We have successfully created the files needed for the inference step.

Inferring using our model

As we have generated the xml and bin files for final inference, we pass the path to them with the -m parameter and the Reinforcement Learning algorithm with the -i option, so that we can run the simulation with the best-performing checkpoints generated by Reinforcement Learning Coach, with the build target set to CPU.

./rl_coach -m <xmlbin path> -i <algorithm> -d CPU
./rl_coach -m 0060.xml -i A3C -d CPU
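
For reference, the same IR files can also be loaded from Python with the Inference Engine API of this OpenVINO release. The following is a minimal sketch, not the exact sample used here; the file names and the dummy input are placeholders:

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

model_xml = "0060.xml"   # IR files produced by mo_tf.py
model_bin = "0060.bin"

net = IENetwork(model=model_xml, weights=model_bin)
plugin = IEPlugin(device="CPU")      # build target set to CPU
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))

# Dummy observation with the network's expected input shape.
obs = np.zeros(net.inputs[input_blob].shape, dtype=np.float32)
result = exec_net.infer(inputs={input_blob: obs})
print(result[output_blob])           # e.g. action values for the agent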

As we run the inference, we are able to pull up the best possible result for the StarCraft 2 game. We save a GIF for each result we get, so in this case the best result of all the GIF files is shown.

The gifs below show the progress before training and after training.

Before Training
After Training

Dashboard

Now we will look at an additional feature of RL Coach known as the dashboard.

Debugging RL algorithms is a tough process, but RL Coach comes with a built-in tool named dashboard that helps visualize the training signals. An important point about the dashboard is that it updates dynamically while the agent is training.

It also allows comparing signals, for example by overlaying one on another.

Let us see how the dashboard works.

We can also open dashboard files directly from the terminal using the following command.

dashboard -f ~/experiments/Cartpole_A3C/21_05_2019-00_05/worker_0.simple_rl_graph.main_level.main_level.agent_0.csv

It opens the csv directly in GUI mode.

Whenever we have trained an environment, or training is still in progress, a csv and a json file are saved in the experiments folder.
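
The same csv can also be inspected directly, for example with pandas. This is a minimal sketch; the exact signal column names depend on the Coach version, so "Total steps" and "Training Reward" are assumptions:

import os
import pandas as pd
import matplotlib.pyplot as plt

csv_path = os.path.expanduser(
    "~/experiments/Cartpole_A3C/21_05_2019-00_05/"
    "worker_0.simple_rl_graph.main_level.main_level.agent_0.csv")

df = pd.read_csv(csv_path)
print(df.columns.tolist())    # list the available training signals

# Plot one signal against training steps (column names assumed).
df.plot(x="Total steps", y="Training Reward")
plt.show()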

When we initialize the RL dashboard using the dashboard command in the terminal, we first have to select the csv file in order to visualize how the algorithm is doing.

We activate the Anaconda environment where we have installed Coach and enter the command discussed above.

Dashboard

As soon as the dashboard command starts working, the tool opens in a browser.

Dashboard view

We will go to the experiments folder and select the csv file as shown in the figure.

Dashboard with csv files

After that we will be in the interactive dashboard window.

Dashboard view

The dashboard is internally built with Bokeh interactive visualization library.

We can select from the data options to see the training signals.

Training signal

We can also compare two signals by holding the Ctrl key and selecting more than one signal.

Comparing two signals

Conclusion

In the second part of the article, we have covered how StarCraft 2 works.

We have seen how the StarCraft 2 environment is initialized for the training process.

We have touched on some of DeepMind's interesting algorithms.

We then installed the StarCraft 2 AI client (PySC2).

We configured the StarCraft 2 client to work with RL Coach.

We initiated the training process, with checkpoint files and GIF files generated at periodic intervals.

The generated checkpoint files were later optimized with the Intel OpenVINO Toolkit.

For inference we passed in the xml file, and the resulting simulation was shown.

Finally, we touched on the dashboard, Coach's visualization tool, and its important features.

This article shows how to get started with the StarCraft 2 environment, RL Coach and OpenVINO.
