Games Machines Play

Todd Moses
Published in Fintech with Todd
May 26, 2018

In professional education there is a move toward learning games: board games designed to teach or reinforce a specific skill. Learning by play is resurging because it works.

Children and even baby animals learn by playing. It turns out adults do too. Philippe Kruchten, an engineering consultant, designed a game called Mission to Mars to teach agile principles. In this game, participants form teams and compete against other teams by applying the agile principles. When it was first introduced at a training conference, many participants kept playing after the exhibit closed.

According to a 2018 panel at Stanford’s Graduate School of Education, “games help us develop non-cognitive skills, which are as fundamental as cognitive skills in explaining how we learn and if we succeed.” They concluded that these non-cognitive skills, meaning how one behaves, are better suited to a game context than to a traditional classroom and are a far greater predictor of success than even IQ scores.

During game play, people create a strategy to acquire a reward, be it a badge, a top score, or access to the next level. Players may fail their first few attempts. Learning from their mistakes, they get better with each attempt. It is in this learning that they go from losing to winning, all due to new skills acquired through playing the game.

In 1993, I was introduced to SimCity in a sociology class at Auburn University. The game was designed by the famous Will Wright, who later created The Sims. Here, one is city planner and mayor all in one. Each move requires expanding city services and zoning areas of land as residential, commercial, or industrial.

As more areas are zoned with services, the residents begin construction. The better one is at planning, the larger the population becomes and this funds new projects through taxation. Make the tax rate too high and people leave or worse yet, riot and burn down structures. Lower the tax rate too much and growth of the city is halted. Players are rewarded with statues, parades, and features in the daily newspaper for consistent good work.

This game and others like it are excellent teachers of working with systems. A city is a system of government, people, and commercial enterprises, and all must work together for the city to function. While I never became a city planner, my job as an engineer involves systems, and the lessons learned from the game have remained a part of my professional life.

How Machines Play

It is very difficult to beat a computer chess game. Even the Radio Shack handheld games from the 1980s were difficult on the highest levels. The reason is the game tree. A machine can examine its candidate moves, then the counter-moves to those moves, then the replies to those counter-moves, and so on as deep as its time and memory allow, for each move made by the human opponent.
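
As a rough illustration of searching a game tree, here is a minimal minimax sketch. Chess is far too large to fit in a few lines, so it uses a toy stone-taking game instead. The rules (take one to three stones, whoever takes the last stone wins) and the names minimax and best_move are assumptions made for this example, not anything from a real chess program.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed rule: a player may take 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def minimax(stones: int, maximizing: bool) -> int:
    """Score a position: +1 if the maximizing player wins with best play, else -1.

    Assumed rule: whoever takes the last stone wins.
    """
    if stones == 0:
        # No stones left: the player who just moved took the last one and won.
        return -1 if maximizing else 1
    # Examine every legal move, then every reply to it, and so on down the tree.
    scores = [minimax(stones - m, not maximizing) for m in MOVES if m <= stones]
    return max(scores) if maximizing else min(scores)


def best_move(stones: int) -> int:
    """Choose the move whose subtree scores best for the player to move."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: minimax(stones - m, False))


if __name__ == "__main__":
    for pile in (5, 6, 7):
        print(pile, "stones -> take", best_move(pile))
```

Under these assumed rules, best_move(5) returns 1, which leaves the opponent with four stones, a position they cannot win from. A real chess engine follows the same idea but cuts the search off at a fixed depth and scores the board with a heuristic.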

A novice like myself can get stuck in an endless loop of move and countermove with the computer, while a chess grandmaster can discover a tiny crack in the logic and confuse the system enough to win. The reason is that computers do not play like people: they are limited to knowledge of the game, with no notion of human behavior.

In late 2017, the journal Nature published an article on AlphaGo. This is the deep learning system that Google’s DeepMind used to win against top-ranked Go players. Many media outlets ran with the story and it became common knowledge. The story spread not because a machine beat a human player. What made it so spectacular was how the machine learned to play.

Traditional systems that use a search tree over all possible moves do not work for Go. This is because Go has an enormous number of possible board positions, more than the number of atoms in the observable universe, and there was no known method to evaluate the strength of each move.

Instead, AlphaGo combines a search tree with neural networks. Each network takes a description of the board as input, then processes it through many network layers containing millions of neuron-like connections. According to the AlphaGo team, “One neural network, the policy network, selects the next move to play. The other neural network, the value network, predicts the winner of the game.”

This goes way beyond your Mac OS chess game. AlphaGo began by playing thousands of games against itself, learning from its mistakes and improving slightly with each iteration through a system of rewards known as Reinforcement Learning.

Reinforcement Learning

Games work because they reward behavior. The reward of status within a community is enough to keep people hooked for months or even years on a single game. It is so powerful that there is a real concern over video game addiction.

Reinforcement learning, or RL, is a machine learning technique that borrows from behavioral psychology. Unlike supervised learning, RL focuses on finding a balance between exploration and exploitation: do you use the knowledge you already have, or do you explore for better options?
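
One common way to strike that balance is an epsilon-greedy rule: most of the time pick the option that currently looks best, and occasionally pick at random. The sketch below is only an illustration on a made-up slot-machine problem. The payout probabilities, the value of epsilon, and the function names are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Explore a random action with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                      # exploration
    return max(range(len(estimates)), key=lambda a: estimates[a])  # exploitation


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Play a toy slot-machine problem and return (estimates, pull counts).

    true_means holds hypothetical payout probabilities, one per machine.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.2, 0.5, 0.8])
    print("estimated payouts:", [round(e, 2) for e in estimates])
    print("pulls per machine:", counts)
```

Setting epsilon to 0 means the system never explores and can lock onto a mediocre option, while setting it to 1 means it never uses what it has learned.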

In RL, the system receives an observation and selects an action from a set of options. The system’s environment then moves to a new state, and the reward associated with that transition is determined. The goal of the system is to collect as much reward as possible.
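
The loop described above can be sketched with tabular Q-learning, one of the simplest RL methods. This is not how AlphaGo is built. It is a toy under stated assumptions: a five-cell corridor with a reward at the far end, and learning-rate, discount, and exploration values picked purely for illustration.

```python
import random

N_STATES = 5        # assumed toy world: cells 0..4, with the reward at cell 4
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns q[state][action_index]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Observe the state and select an action (ties broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            # The environment moves to a new state and hands back a reward.
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(train()):
        print(f"cell {s}: left={left:.3f} right={right:.3f}")
```

After training, stepping right scores higher than stepping left in every cell before the goal, even though only the final step is ever rewarded directly.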

The system’s result is compared with the optimal outcome, giving the system a notion of regret. Thus the system must reason about the long-term consequences of its actions, something most people do poorly. This makes Reinforcement Learning well suited to problems that involve a trade-off between long-term and short-term reward.
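
To make regret concrete, here is a small sketch that adds up, step by step, the gap between the best available expected reward and the expected reward of the action actually chosen. The reward values and the sequence of choices are invented for the example, and cumulative_regret is a hypothetical helper name.

```python
def cumulative_regret(expected_rewards, chosen_actions):
    """Running total of (best expected reward - expected reward of the choice)."""
    best = max(expected_rewards)
    total = 0.0
    history = []
    for action in chosen_actions:
        total += best - expected_rewards[action]
        history.append(total)
    return history


if __name__ == "__main__":
    # Hypothetical expected rewards for three actions; action 2 is the best.
    rewards = [1.0, 3.0, 5.0]
    # The system tries action 0, then 2, then 1, then settles on 2.
    print(cumulative_regret(rewards, [0, 2, 1, 2]))
```

Regret stops growing once the system settles on the best action, so a flattening curve is a sign that learning is working.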

AlphaGo did not have access to all of the available moves of Go. Instead, it played against really good players and formulated new strategies based upon its past mistakes. In 2016, this process proved itself when AlphaGo received a 9 dan professional ranking (the highest certification). Most shocking is that some of the moves AlphaGo made in its professional-level matches were entirely new, teaching Go masters new knowledge about their 3,000-year-old game.

Game Theory

Made famous by the 2001 Ron Howard movie A Beautiful Mind, Game Theory is the study of mathematical models of conflict and cooperation. In cooperative games, the focus is on predicting what coalitions will form, the joint actions of the group, and the resulting payoffs. Conflict methods analyze how bargaining procedures will affect the distribution of payoffs within such coalitions.

My team recently built a Machine Learning system that uses Game Theory to measure the effectiveness of business meetings. We take into account coalitions in both conflict and cooperation over n individual turns. The system then assesses how beneficial the meeting was and provides an analysis of the event in a searchable format.

We used a continuous game consisting of a finite number of moves, all centered around a theory that meetings offer near-textbook laboratories for game theory. In each one, people are either in conflict or cooperation over one or more decisions. They take turns with comments and questions, each of which constitutes a single move within the game.

While individuals can demonstrate leadership in guiding a meeting, it takes group cooperation for a meeting to become effective. An interesting study would be one where multiple AI-based systems are given the opportunity to work together, or in competition with one another, toward a given goal. How would IBM’s Jeopardy-winning Watson work with Google’s AlphaGo?

These are the types of next steps for AI: enabling multiple systems to work toward a common goal as cooperators or competitors. That may solve problems far more complex than any single system can handle. However, it may also become a dystopian nightmare.

Trading as a Game

Financial trading is a game. Players choose to act in conflict or cooperation with other players in order to maximize their profits. There is a reward for optimum behavior and regret for poor performance. However, an element of randomness makes trading different from other strategy games like Go and Chess, in that sometimes there is no foreseeable way to win.

AlphaGo has proven that machines are capable of discovering unknown elements of games, as in the Go matches where the computer created strategies never before realized. The question remains: can a machine create a winning strategy when a large degree of randomness is part of the game?

There is hope from a 1950s addition to Game Theory by mathematician Lloyd Shapley called the Stochastic Game. As one may guess from the name, it is designed to handle random states within games.

A Stochastic Game is played in a sequence of stages with each stage of the game starting at some state. The players select actions and receive a payoff that depends on both the current state and actions selected. Afterward, the game moves to a new random state whose distribution depends on the previous state and actions selected by the participants. This is repeated for either a finite or infinite number of stages.
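
The stages described above can be simulated in a few lines. The sketch below is a toy two-player, zero-sum version with two made-up market regimes. The state names, payoffs, transition probabilities, and policies are all assumptions for illustration and are not calibrated to any real market.

```python
import random

STATES = ("calm", "volatile")  # hypothetical market regimes
ACTIONS = ("buy", "sell")

# PAYOFF[state][(a1, a2)] -> (payoff to player 1, payoff to player 2).
# Assumed zero-sum payoffs: one player's gain is the other's loss.
PAYOFF = {
    "calm": {("buy", "buy"): (0, 0), ("buy", "sell"): (1, -1),
             ("sell", "buy"): (-1, 1), ("sell", "sell"): (0, 0)},
    "volatile": {("buy", "buy"): (0, 0), ("buy", "sell"): (3, -3),
                 ("sell", "buy"): (-3, 3), ("sell", "sell"): (0, 0)},
}


def transition_probs(state, a1, a2):
    """Next-state distribution; depends on the current state and both actions."""
    p_volatile = 0.2 if state == "calm" else 0.6
    if a1 == a2:
        p_volatile += 0.3  # assumed: herding makes volatility more likely
    return {"calm": 1.0 - p_volatile, "volatile": p_volatile}


def play(policy1, policy2, stages=10, seed=0, start="calm"):
    """Run a finite number of stages; return (total payoffs, visited states)."""
    rng = random.Random(seed)
    state, totals, path = start, [0, 0], []
    for _ in range(stages):
        path.append(state)
        a1, a2 = policy1(state, rng), policy2(state, rng)
        r1, r2 = PAYOFF[state][(a1, a2)]  # payoff depends on state and actions
        totals[0] += r1
        totals[1] += r2
        probs = transition_probs(state, a1, a2)
        state = rng.choices(STATES, weights=[probs[s] for s in STATES])[0]
    return totals, path


def always_buy(state, rng):
    return "buy"


def always_sell(state, rng):
    return "sell"


def coin_flip(state, rng):
    return rng.choice(ACTIONS)


if __name__ == "__main__":
    totals, path = play(coin_flip, always_sell, stages=12)
    print("payoffs:", totals)
    print("states visited:", path)
```

This version stops after a fixed number of stages. The infinite-horizon form would instead discount or average the payoffs over time.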

This type of game closely resembles financial markets. The question now is, can a machine create a winning strategy within a Stochastic Game? The caveat may come from the market itself, as there is an unknown number of players.

Should one consider the entire market as a single opposing player within a Stochastic Game? Even then, there will still be times when an optimal strategy for either player does not exist. This is a starting point and not a solution.

Conclusion

People and machines both learn by playing games due to the improvement of non-cognitive skills. Systems like AlphaGo have proven that Deep Learning is capable of creating highly effective strategies from such games. The question is, can a Stochastic Game be incorporated into a Deep Learning system to develop highly probable trading strategies?

It all appears highly promising. However, like most of technology, it is likely that the first 80% of such a system is easily managed. It is the next 20% that could take a decade or more of research to complete.
