Published in SyncedReview

OpenAI Dota 2 Bots Get Leaner & Meaner

The Dota 2-playing bot team OpenAI Five has already demonstrated expert-level play at Internet scale in the popular video game and has even learned effective human–AI cooperation skills. Now the bots have gotten even stronger, as detailed by OpenAI researchers in the new paper Dota 2 with Large Scale Deep Reinforcement Learning.

OpenAI committed to its Dota 2 project about three years ago, and this April its bot team beat Team OG, the 2018 Dota 2 world champions. But the learning curve did not stop there: OpenAI has since trained a new agent, “Rerun,” which has notched a 98 percent win rate against the OpenAI Five.

Dota 2 is a multiplayer online battle arena video game in which two teams of five players compete to collectively destroy the “Ancient” home base of their opponents whilst defending their own. The game presents various challenges for AI systems to deal with, such as long time horizons, imperfect information, and continuous state-action spaces. The key to solving such a complex environment was to scale existing reinforcement learning systems to unprecedented levels using thousands of GPUs over multiple months.

What’s impressive about Rerun, as also seen in DeepMind’s Alpha series of agents, is that model performance has increased while training time and compute requirements have decreased.

In advance of the April showdown with Team OG, the OpenAI Five trained on the equivalent of 10,000 years of self-play over a 10-month period. About every two weeks, the researchers also used custom “surgery” tools to carry the strongest version’s learned parameters over to an improved model or environment, so that training could resume with minimal loss in performance. This took far less time than the typical practice of retraining each new version from scratch.

“If we had trained from scratch after each of our twenty major surgeries, the project would have taken 40 months instead of 10,” note the researchers in their paper. As AI systems tackle larger and harder problems in the real world, research on enabling AI models to deal with more complex and dynamic environments will be critical.
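To make the surgery idea more concrete, here is a toy illustration of one kind of function-preserving change. It is a minimal sketch under stated assumptions, not OpenAI’s actual tooling: it assumes a single dense layer and a change that adds new observation features. If the weights for the new inputs start at zero, the enlarged model initially computes exactly what the old one did, so training can resume without a drop in performance. The helper `add_zero_input_columns`, the layer sizes, and the random data are all hypothetical.

```python
import numpy as np

def add_zero_input_columns(weights: np.ndarray, n_new: int) -> np.ndarray:
    """Return a copy of `weights` with n_new zero-initialized input columns appended.

    The layer computes y = tanh(weights @ x + bias). Zero columns mean the new
    input features contribute nothing at first, so the enlarged layer reproduces
    the old layer's outputs on the original features.
    """
    out_dim = weights.shape[0]
    return np.concatenate([weights, np.zeros((out_dim, n_new))], axis=1)

# Hypothetical "trained" layer: 3 input features -> 4 outputs.
rng = np.random.default_rng(seed=0)
old_w = rng.normal(size=(4, 3))
bias = rng.normal(size=4)

# The same observation before and after 2 new features are added to it.
x_old = rng.normal(size=3)
x_new = np.concatenate([x_old, rng.normal(size=2)])

new_w = add_zero_input_columns(old_w, n_new=2)
y_before = np.tanh(old_w @ x_old + bias)
y_after = np.tanh(new_w @ x_new + bias)

print(new_w.shape)                     # (4, 5)
print(np.allclose(y_before, y_after))  # True
```

Training would then continue from `new_w`, with gradient updates gradually giving the new features nonzero weight. Zero-initializing new inputs is only one such trick. Other changes, such as enlarging hidden layers, require different techniques.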

Rerun was trained from scratch using the OpenAI Five’s final environment and model settings, which sped things up considerably. Its training was completed in two months, required no surgeries, and used only 20 percent of the resources consumed in training the OpenAI Five.

The OpenAI Five and now Rerun demonstrate that modern reinforcement learning techniques, when successfully scaled up, can achieve superhuman performance in competitive e-sports games. OpenAI has always stressed that its long-term goal is to tackle general and real-world problems, and that video game environments are platforms for research toward the development of artificial general intelligence (AGI).

The paper Dota 2 with Large Scale Deep Reinforcement Learning is available here.

Journalist: Yuan Yuan | Editor: Michael Sarazen

