Trends in Machine Learning Algorithm Training: Gaming Environments

Bryan House
Published in Deep Sparse · Jul 31, 2019
Photo: Unsplash (Element 5 Digital)

Ever since IBM’s Deep Blue beat world chess champion Garry Kasparov in 1997, AI researchers have been training algorithms to beat humans at their own games. In recent years, games have provided a fertile training ground for reinforcement learning — a method in which algorithms repeatedly try different techniques to master a task, and are rewarded (or reinforced) for good behavior.
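
To make the reward loop concrete, here is a minimal sketch of tabular Q-learning, a textbook reinforcement learning method, on a toy "walk to the goal" game. The environment, rewards, and hyperparameters are illustrative inventions, not taken from any of the systems discussed here.

```python
import random

# Toy game: the agent starts at position 0 and must reach position 4.
# Actions: 0 = step left, 1 = step right. Reaching the goal pays reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action] value table

for episode in range(500):
    state = 0
    while state != GOAL:
        # Try a random action with probability EPSILON, otherwise exploit.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state = max(0, min(N_STATES - 1, state + (1 if action else -1)))
        reward = 1.0 if next_state == GOAL else 0.0
        # The "reinforcement": nudge the estimate toward the reward plus the
        # discounted best future value.
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
        state = next_state

print(Q)  # after training, "step right" scores higher in every state
```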

It’s easier for an algorithm to win a game if it’s fed all the rules in advance. It’s significantly harder to compete against humans in real-time video games, master unpredictable virtual environments, or win games it knows nothing about at the outset.

Here are some of the latest trends in training gaming algorithms, and what they may mean for the future of machine learning.

3D Gaming Environments

One of the challenges of training algorithms for the real world is that the real world is so unpredictable. One way to combat that uncertainty is to train algorithms in virtual worlds, or 3D gaming environments like Unity or Unreal.

Researchers at companies like Google’s DeepMind are using these environments to test how a computer can learn from a physics-realistic world through reinforcement learning. In a simulation shown to Fast Company, an algorithm controlling a dog in the Unity environment trained it to fetch a stick: the dog started by learning to walk, then tried again and again until it successfully retrieved the stick.
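
Whatever the engine, the training loop follows the same pattern: reset the simulation, step it forward with an action, observe the result. Here is a minimal sketch using the classic OpenAI Gym interface as a stand-in (Unity’s ML-Agents toolkit exposes an analogous reset/step API to Python); it assumes the pre-0.26 Gym API and uses a random placeholder policy.

```python
import gym  # assumes the classic pre-0.26 Gym API (step returns 4 values)

# CartPole is a simple physics simulation; 3D engines like Unity expose the
# same reset/step loop to training code through toolkits such as ML-Agents.
env = gym.make("CartPole-v1")

for episode in range(3):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # random placeholder policy
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {episode}: total reward {total_reward}")

env.close()
```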

Other initiatives, including OpenAI Five, use reinforcement learning to defeat human teams in real-time multiplayer games. OpenAI Five defeated a professional team at Dota 2, a complex esports game played between two teams of five players. The game’s environment changes about every two weeks as the system gets an update. By playing the equivalent of roughly 180 years of games every day, OpenAI Five learned to choose among about 1,000 possible actions in response to the moves made by its human opponents.
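
Choosing among roughly 1,000 actions at every step boils down to sampling from a policy distribution with illegal moves masked out. The sketch below shows one such step, with random numbers standing in for a real policy network’s output; the sizes and the validity mask are hypothetical.

```python
import numpy as np

N_ACTIONS = 1000  # roughly the action-space size cited for OpenAI Five

rng = np.random.default_rng(0)
logits = rng.normal(size=N_ACTIONS)  # stand-in for a policy network's output
legal = rng.random(N_ACTIONS) < 0.1  # hypothetical mask of currently legal moves

# Mask out illegal actions, then sample from the softmax distribution.
masked = np.where(legal, logits, -np.inf)
probs = np.exp(masked - masked.max())
probs /= probs.sum()
action = rng.choice(N_ACTIONS, p=probs)
print(action, probs[action])
```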

The theory is that if algorithms can train in these types of unpredictable simulated environments, they may be able to achieve more accurate results in the inference phase in the real world.

Learning Games from Scratch (and Facing Stiff Competition)

Many AI researchers have successfully used reinforcement learning to train algorithms on increasingly complex games with unknown variables or loosely defined rules.

A popular example is 2017’s Libratus AI project from Carnegie Mellon, which defeated a set of expert human challengers at heads-up, no-limit Texas Hold’em poker. Unlike prior experiments that focused on games where players know the exact state of the game at all times, Libratus faced a game famous for bluffing and unknowns. It succeeded by breaking the game down into increasingly finer-grained sub-games, which made its “reasoning” computations simpler to solve.
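
Libratus’s solving machinery builds on counterfactual regret minimization, whose core primitive, regret matching, fits in a few lines. Here is a toy version on rock-paper-scissors against a fixed, hypothetical opponent mix, just to show the shape of the update; the real system applies this idea at enormous scale over poker sub-games.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # PAYOFF[mine][theirs]
OPPONENT = [0.4, 0.3, 0.3]  # hypothetical fixed opponent mix (rock-heavy)

regret_sum = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS

def current_strategy():
    # Regret matching: play in proportion to positive accumulated regret.
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

for _ in range(50000):
    strategy = current_strategy()
    for a in range(ACTIONS):
        strategy_sum[a] += strategy[a]
    mine = random.choices(range(ACTIONS), weights=strategy)[0]
    theirs = random.choices(range(ACTIONS), weights=OPPONENT)[0]
    # Regret: how much better each alternative would have scored this round.
    for a in range(ACTIONS):
        regret_sum[a] += PAYOFF[a][theirs] - PAYOFF[mine][theirs]

total = sum(strategy_sum)
print([round(s / total, 3) for s in strategy_sum])
# Converges toward paper, the best response to a rock-heavy opponent.
```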

By contrast, DeepMind’s AlphaGo used both supervised learning and reinforcement learning to teach an algorithm to defeat the world’s best Go players. In the supervised learning phase, the algorithm was fed data from 150,000 human games of Go, distilling the moves that would give it the highest probability of winning. In the second, reinforcement learning phase, the computer played against itself, making micro-adjustments based on the outcomes.

The next phase of the program, AlphaGo Zero, was designed to learn from scratch, with only the rules of the game as a foundation. Instead of learning from humans, the algorithm learns from itself, discovering successful strategies that humans may never have found. Unlike its predecessor, AlphaGo Zero remembers the outcome of every tree search it conducts, using those results to determine the best move at each turn (and, eventually, to win the game).
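
AlphaGo Zero’s actual search is Monte Carlo tree search guided by a neural network, which is far too much for a blog snippet. But the “remember every sub-result” idea is easy to show on a toy game: below, an exhaustive game-tree search over Nim caches the outcome of every position it solves, so no subtree is ever searched twice. This is plain memoized minimax, a deliberately simplified stand-in.

```python
from functools import lru_cache

# Nim: players alternate taking 1-3 stones; taking the last stone wins.
@lru_cache(maxsize=None)  # the cache holds the "remembered" search outcomes
def can_win(stones):
    # True if the player to move can force a win from this position.
    return any(take == stones or not can_win(stones - take)
               for take in (1, 2, 3) if take <= stones)

def best_move(stones):
    for take in (1, 2, 3):
        if take <= stones and (take == stones or not can_win(stones - take)):
            return take
    return 1  # losing position: every move is equally bad

print(best_move(10))  # 2: leaving a multiple of 4 is a forced win
print(can_win(12))    # False: multiples of 4 are lost for the mover
```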

In essence, AlphaGo Zero can choose the best outcome from many possibilities, which could be extremely effective for applications such as protein folding. AlphaFold, an updated DeepMind algorithm based on AlphaGo, recently won a protein folding contest by predicting the three-dimensional structures of complex proteins.

Up Next: Creating the Game

If algorithms are so great at mastering human games, what’s next? Potentially, creating their own.

Using a form of creative mimicry, a recent Georgia Institute of Technology study trained an algorithm to create new games based on classic Nintendo video games. After learning the basic constructs and rules of those games, the algorithm creates “game grids” and designs for new game concepts, as sketched below. Although creative AI is still in its earliest stages, the researchers on this project suggest that these types of algorithms could serve as assistants to humans in generating creative work (or, in extreme cases, take the lead).
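
The published model is much richer, but the flavor of “game grid” generation can be shown with a toy Markov chain: learn which level columns follow which in an example level, then sample a new level from those transitions. The tile layout below is invented for illustration.

```python
import random
from collections import defaultdict

# A tiny example "level": G = ground, B = block, E = enemy, . = air.
EXAMPLE = [
    "......E....",
    "...B....B..",
    "GGGGGGGGGGG",
]
# Slice the level into columns: these are the "constructs" we learn from.
columns = ["".join(row[i] for row in EXAMPLE) for i in range(len(EXAMPLE[0]))]

# Learn column-to-column transitions (a first-order Markov chain).
transitions = defaultdict(list)
for left, right in zip(columns, columns[1:]):
    transitions[left].append(right)

# Sample a new game grid from the learned transitions.
col, generated = columns[0], [columns[0]]
for _ in range(15):
    nexts = transitions[col]
    col = random.choice(nexts) if nexts else random.choice(columns)
    generated.append(col)

for r in range(len(EXAMPLE)):  # print the new level row by row
    print("".join(c[r] for c in generated))
```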

New breakthroughs in game creation are arriving faster and faster. For example, NVIDIA recently created the first AI-generated imagery for a video game using Unreal Engine. However, it still takes an inordinate amount of computing power to make AI-rendered graphics realistic and fast enough for gaming, and the technique raises all sorts of ethical questions about fake imagery generated by machine learning algorithms.

We’re fascinated to see where these training techniques will take machine learning, particularly how some of these gaming models will be used in real-world inference. It’s certainly a trend we’ll keep tracking on our blog, so please follow along!

….

Neural Magic is powering bigger inputs, bigger models, and better predictions. The company’s software lets machine learning teams run deep learning models at GPU speeds or better on commodity CPU hardware, at a fraction of the cost. To learn more, visit www.neuralmagic.com.
