DeepMind’s AlphaStar wins StarCraft II games against pros

Vo Chi Cong · green-bamboo
3 min read · Jan 25, 2019

Highlights

StarCraft, considered to be one of the most challenging Real-Time Strategy (RTS) games and one of the longest-played esports of all time, has emerged by consensus as a “grand challenge” for AI research.

AlphaStar plays the full game of StarCraft II, using a deep neural network that is trained directly from raw game data by supervised learning and reinforcement learning.

The neural network architecture applies a transformer torso to the units, combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline.
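As a rough illustration of how those pieces fit together, here is a minimal PyTorch sketch, assuming toy feature sizes, a single observation per forward pass, and a two-component action (an action type plus a target unit). AlphaStar's real network is far larger and its action space far richer; every size and name below is illustrative.

```python
# Minimal sketch of the described architecture; shapes and the two-step
# action factorisation are assumptions, not AlphaStar's actual configuration.
import torch
import torch.nn as nn


class AlphaStarLikeNet(nn.Module):
    """Transformer "torso" over the set of observed units, a deep LSTM core,
    an auto-regressive policy head whose unit-selection step is a pointer
    network, and a scalar value head."""

    def __init__(self, unit_feat=32, d_model=64, n_action_types=10, lstm_layers=3):
        super().__init__()
        self.embed_units = nn.Linear(unit_feat, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.torso = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.core = nn.LSTM(d_model, d_model, num_layers=lstm_layers, batch_first=True)
        self.action_type_head = nn.Linear(d_model, n_action_types)
        # Pointer network: score each unit against the core output plus the chosen action type.
        self.pointer_query = nn.Linear(d_model + n_action_types, d_model)
        self.value_head = nn.Linear(d_model, 1)

    def forward(self, units, hidden=None):
        # units: (batch, n_units, unit_feat), one feature vector per visible unit.
        u = self.torso(self.embed_units(units))          # (B, N, d)
        pooled = u.mean(dim=1, keepdim=True)             # summarise the unit set
        core_out, hidden = self.core(pooled, hidden)     # single time step through the LSTM
        h = core_out[:, -1]                              # (B, d)

        # Auto-regressive head: first sample an action type...
        type_logits = self.action_type_head(h)
        action_type = torch.distributions.Categorical(logits=type_logits).sample()
        type_onehot = nn.functional.one_hot(action_type, type_logits.size(-1)).float()

        # ...then point at a unit, conditioned on the sampled action type.
        query = self.pointer_query(torch.cat([h, type_onehot], dim=-1))   # (B, d)
        pointer_logits = torch.einsum("bd,bnd->bn", query, u)             # score each unit
        target_unit = torch.distributions.Categorical(logits=pointer_logits).sample()

        value = self.value_head(h).squeeze(-1)
        return action_type, target_unit, value, hidden


if __name__ == "__main__":
    net = AlphaStarLikeNet()
    obs = torch.randn(2, 20, 32)        # 2 games, 20 units, 32 features each
    a_type, a_unit, v, _ = net(obs)
    print(a_type.shape, a_unit.shape, v.shape)
```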

Supervised learning on human game replays first allowed AlphaStar to learn, by imitation, the basic micro and macro-strategies used by players on the StarCraft ladder. This initial agent defeated the built-in “Elite” level AI, roughly gold level for a human player.
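A hedged sketch of what that supervised stage optimises, reusing the simplified two-part action from the architecture sketch above; the function name and the demo tensors are assumptions, and AlphaStar's actual imitation loss covers every argument of its structured action space.

```python
# Imitation stage sketch: maximise the likelihood of the actions human
# players took in replay data (illustrative, simplified action space).
import torch
import torch.nn.functional as F


def imitation_loss(type_logits, unit_logits, human_action_type, human_target_unit):
    """Cross-entropy on each component of the (simplified) action."""
    return (F.cross_entropy(type_logits, human_action_type)
            + F.cross_entropy(unit_logits, human_target_unit))


if __name__ == "__main__":
    logits_type = torch.randn(8, 10)        # batch of 8, 10 action types
    logits_unit = torch.randn(8, 20)        # 20 candidate target units
    y_type = torch.randint(0, 10, (8,))     # action types taken in the replay
    y_unit = torch.randint(0, 20, (8,))     # units targeted in the replay
    print(imitation_loss(logits_type, logits_unit, y_type, y_unit))
```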

This imitation-trained agent was then used to seed a multi-agent reinforcement learning process. A continuous league was created, with the agents of the league (the competitors) playing games against each other.

The league takes the ideas of population-based reinforcement learning further, creating a process that continually explores the huge strategic space of StarCraft gameplay while ensuring that each competitor performs well against the strongest strategies and does not forget how to defeat earlier ones.

Each agent has its own learning objective: for example, which competitors this agent should aim to beat, and any additional internal motivations that bias how the agent plays.
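The exact matchmaking and objective definitions are DeepMind's own; as a loose illustration of the idea, the sketch below samples opponents so that an agent plays more often against competitors it currently loses to, never entirely drops older opponents, and gets extra games against the competitors named in its personal objective. All weights and agent names are invented for illustration.

```python
# Illustrative league matchmaking with a per-agent objective.
import random


def pick_opponent(win_rates, targets=None, floor=0.05):
    """win_rates: {opponent: empirical win rate of this agent vs that opponent}.
    targets: optional per-agent objective, i.e. opponents this agent is biased to beat."""
    weights = {}
    for opp, wr in win_rates.items():
        w = max(1.0 - wr, floor)          # harder opponents get more games, old ones never zero
        if targets and opp in targets:
            w *= 2.0                      # extra weight from this agent's own objective
        weights[opp] = w
    opponents = list(weights)
    return random.choices(opponents, weights=[weights[o] for o in opponents], k=1)[0]


# Example: an agent that beats early league members but struggles against "agent_17".
win_rates = {"initial_imitation_agent": 0.9, "agent_05": 0.7, "agent_17": 0.2}
print(pick_opponent(win_rates, targets={"agent_17"}))
```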

The neural network weights of each agent are updated by reinforcement learning from its games against competitors, to optimise its personal learning objective. The weight update rule is an efficient and novel off-policy actor-critic reinforcement learning algorithm with experience replay, self-imitation learning and policy distillation.
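The update rule itself is not spelled out in the post, so the following is only a generic PyTorch sketch of an importance-weighted off-policy actor-critic loss with a self-imitation-flavoured term; the tensor names, the clipping, and the 0.1 weighting are assumptions, not AlphaStar's actual algorithm or hyperparameters.

```python
# Generic off-policy actor-critic loss over replayed experience (illustrative).
import torch


def off_policy_ac_loss(logp_new, logp_behaviour, returns, values, clip=1.0):
    """logp_new: log pi_theta(a|s) under the current policy.
    logp_behaviour: log-probability recorded when the action was generated (replay buffer).
    returns: sampled returns; values: critic estimates V(s)."""
    rho = torch.exp(logp_new - logp_behaviour).clamp(max=clip)   # truncated importance weight
    advantage = (returns - values).detach()
    policy_loss = -(rho.detach() * advantage * logp_new).mean()
    value_loss = 0.5 * (returns - values).pow(2).mean()
    # Self-imitation flavour: extra weight on actions that turned out better than expected.
    self_imitation = -(advantage.clamp(min=0) * logp_new).mean()
    return policy_loss + value_loss + 0.1 * self_imitation


if __name__ == "__main__":
    loss = off_policy_ac_loss(torch.randn(64, requires_grad=True),
                              torch.randn(64), torch.randn(64),
                              torch.randn(64, requires_grad=True))
    loss.backward()
    print(loss.item())
```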

Training was distributed across a population of agents, each learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league (in other words, the most effective mixture of strategies that have been discovered) and runs on a single desktop GPU.
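One way to make the “Nash distribution” concrete: treat the league's pairwise win rates as a zero-sum meta-game and solve for the equilibrium mixture with a linear program. The sketch below does this with SciPy; the win-rate numbers are invented for illustration, and the LP recipe is the standard one for two-player zero-sum games, not DeepMind's specific procedure.

```python
# Compute the Nash equilibrium mixture over a league from pairwise win rates.
import numpy as np
from scipy.optimize import linprog


def nash_mixture(win_rates):
    """win_rates[i, j] = probability agent i beats agent j.
    Returns the mixture over agents that is unexploitable within this league."""
    payoff = 2.0 * win_rates - 1.0            # zero-sum payoff for the row player
    n = payoff.shape[0]
    # Variables: mixture x (n entries) and game value v.
    # Maximise v subject to payoff^T x >= v, sum(x) = 1, x >= 0.
    c = np.zeros(n + 1)
    c[-1] = -1.0                              # linprog minimises, so minimise -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n]


# Three agents with a rock-paper-scissors-like cycle: 0 beats 1, 1 beats 2, 2 beats 0.
wr = np.array([[0.5, 0.8, 0.3],
               [0.2, 0.5, 0.8],
               [0.7, 0.2, 0.5]])
print(nash_mixture(wr))   # the equilibrium mixes all three strategies
```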

AlphaStar had an average actions per minute (APM) of around 280, significantly lower than that of the professional players, although its actions may be more precise.

During the matches against TLO and MaNa, AlphaStar interacted with the StarCraft game engine directly via its raw interface, meaning that it could observe the attributes of its own and its opponent’s visible units on the map directly, without having to move the camera.

Subsequent to the matches, a second version of AlphaStar was developed that uses a camera interface: it chooses when and where to move the camera, its perception is restricted to on-screen information, and action locations are restricted to its viewable region. MaNa defeated a prototype version of AlphaStar that used this camera interface and had been trained for just 7 days. DeepMind hopes to evaluate a fully trained instance of the camera interface in the near future.

AlphaStar’s success against MaNa and TLO was in fact due to superior macro and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface.

The agents were trained to play StarCraft II (v4.6.2) in Protoss v Protoss games on the CatalystLE ladder map.
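For readers who want to poke at a comparable setup, DeepMind's open-source PySC2 environment can be configured for Protoss games on that ladder map. The sketch below is an assumption-laden example: argument names, the raw-interface flag, and the exact map string may differ between PySC2 and game versions, and it requires a local StarCraft II installation.

```python
# Illustrative PySC2 setup: Protoss vs a built-in bot on the Catalyst LE ladder map,
# with raw unit observations enabled so no camera movement is needed to see units.
from pysc2.env import sc2_env
from pysc2.lib import features

env = sc2_env.SC2Env(
    map_name="CatalystLE",                          # map name string may vary by PySC2 version
    players=[sc2_env.Agent(sc2_env.Race.protoss),
             sc2_env.Bot(sc2_env.Race.protoss, sc2_env.Difficulty.very_hard)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=84, minimap=64),
        use_raw_units=True,                         # raw-style observations of visible units
    ),
    step_mul=8,                                     # game steps per agent step
    game_steps_per_episode=0,                       # no episode time limit
)
timesteps = env.reset()
```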

The techniques behind AlphaStar could be useful in solving other problems. For example, its neural network architecture is capable of modelling very long sequences of likely actions — with games often lasting up to an hour with tens of thousands of moves — based on imperfect information.
