There has been significant recent success in creating strong algorithmic agents for large two-player zero-sum games (e.g., for computer poker). However, despite the fact that it has taken decades to get to this point, this is in many ways actually one of the EASIEST problems in this area. The first reason is that in two-player zero-sum games there is a natural and compelling solution concept called Nash equilibrium. In these games, a Nash equilibrium has the special property that it is “unbeatable:” against any opposing strategy it is guaranteed to either win or tie in expectation. (Note that this does not mean it is unbeatable in terms of short-term performance: the guarantee only holds over the long term, and a significant sample of play is needed to ensure that short-term luck is outlasted.)

This property does not hold in general for games with more than two players (or for two-player non-zero-sum games), for a variety of reasons. For one thing, several players could collude against another. But it can fail even without the possibility of collusion. These games can contain multiple Nash equilibria, each giving different payoffs to the players (while in a two-player zero-sum game they all give the same payoffs), and if one agent follows one equilibrium while the other agents follow a different one, the resulting strategy profile may not be a Nash equilibrium at all. Furthermore, one player could deviate from a Nash equilibrium and, in addition to performing worse himself, could also make the first player perform arbitrarily poorly (which cannot happen in two-player zero-sum games).
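To make the multiple-equilibria point concrete, here is a toy sketch of my own (not from any of the cited work): the classic non-zero-sum “Battle of the Sexes” game has two pure Nash equilibria with different payoffs, which we can find by brute-force enumeration.

```python
# Toy illustration: a two-player NON-zero-sum game ("Battle of the Sexes")
# with multiple pure Nash equilibria giving different payoffs, found by
# brute force. (In a two-player zero-sum game, all equilibria share one value.)

# payoffs[row][col] = (row player's payoff, column player's payoff)
payoffs = [[(2, 1), (0, 0)],
           [(0, 0), (1, 2)]]

def pure_equilibria(payoffs):
    """Return all pure-strategy profiles where neither player can gain
    by unilaterally deviating."""
    eqs = []
    rows, cols = len(payoffs), len(payoffs[0])
    for i in range(rows):
        for j in range(cols):
            row_ok = all(payoffs[i][j][0] >= payoffs[k][j][0] for k in range(rows))
            col_ok = all(payoffs[i][j][1] >= payoffs[i][k][1] for k in range(cols))
            if row_ok and col_ok:
                eqs.append(((i, j), payoffs[i][j]))
    return eqs

print(pure_equilibria(payoffs))  # two equilibria, with payoffs (2, 1) and (1, 2)
```

If the row player aims for the (2, 1) equilibrium while the column player aims for the (1, 2) one, the resulting play (row 0, column 1) yields (0, 0) and is not an equilibrium at all, which is exactly the coordination failure described above.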
Thus, at least from a conceptual perspective, trying to play a Nash equilibrium is a clear goal for two-player zero-sum games.
Next, from a computational perspective, there exist “efficient” algorithms for computing a Nash equilibrium in two-player zero-sum games, which leads computer science theoreticians to classify the problem as “easy” (in computational complexity terms, there exists a “polynomial-time algorithm” for this problem: https://en.wikipedia.org/wiki/Time_complexity#Polynomial_time).
On the other hand, for games with more than two players, and for two-player non-zero-sum games, the problem of computing a Nash equilibrium is considered by theoreticians to be “hard” (technically it is “PPAD-complete” https://en.wikipedia.org/wiki/PPAD_(complexity)).
Note that this does not mean that Nash equilibrium can be computed quickly in all two-player zero-sum games in practice (or conversely that it cannot be computed for any 3-player games). This only refers to a theoretical worst-case bound as the size of the problem instance gets large.
For two-player zero-sum games, there exists an exact algorithm based on a linear programming formulation (https://en.wikipedia.org/wiki/Linear_programming) that runs in polynomial time and scales to games with 10⁸ game states. However, two-player no-limit Texas hold ’em has around 10¹⁶⁵ game states (https://arxiv.org/abs/1302.7008), so we need approaches beyond exact linear programming that scale to much larger problems, likely at the cost of theoretical guarantees. Many new approaches have been developed over the past decade or so with the goal of approximating equilibrium strategies in games of this magnitude, culminating in agents over the past few years that have achieved superhuman performance in 2-player no-limit Texas hold ’em poker.
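To illustrate what the linear programming formulation looks like on a tiny game, here is a minimal sketch of my own (not code from any of the poker agents) that solves Rock-Paper-Scissors with SciPy’s `linprog`: the variables are the row player’s mixed strategy plus the game value v, and we maximize v subject to the strategy guaranteeing at least v against every opposing pure action.

```python
# Sketch: solving a tiny two-player zero-sum game (Rock-Paper-Scissors) by
# linear programming with scipy. Poker-scale games need far more specialized
# machinery; this only illustrates the formulation.
import numpy as np
from scipy.optimize import linprog

# Row player's payoff matrix: A[i][j] = payoff when row plays i, column plays j.
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)  # order: Rock, Paper, Scissors

n = A.shape[0]
# Variables: x_1..x_n (mixed strategy) and v (game value). Maximize v,
# i.e. minimize -v, subject to x^T A >= v column-wise and sum(x) = 1.
c = np.zeros(n + 1)
c[-1] = -1.0
A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - x^T A[:, j] <= 0 for each j
b_ub = np.zeros(n)
A_eq = np.ones((1, n + 1))
A_eq[0, -1] = 0.0                          # sum of strategy weights = 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * n + [(None, None)]     # probabilities in [0,1], v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
strategy, value = res.x[:n], res.x[-1]
print(strategy, value)  # roughly [1/3, 1/3, 1/3] with game value 0
```

The same formulation works for any two-player zero-sum matrix game; what breaks at poker scale is simply the size of the matrix.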
While this is a monumental achievement, it is, as stated above, perhaps only one of the easiest problems in this area, due to the fact that there was a clear well-defined goal, and calculating this goal was theoretically “easy” (though of course we know in practice it was far from it).
I will now describe four key open problems whose resolution I think will play a pivotal role in the future of the field.
1. Identifying new relevant game-theoretic solution concepts (and developing algorithms for computing them).
While Nash equilibrium has emerged as the “central” solution concept in game theory, several others exist. Most of these (though not all) are “refinements” of Nash equilibrium — specific subsets of the Nash equilibria that have particularly desirable properties (games can contain many — even infinitely many — Nash equilibria, and certain ones can be more desirable than others). Some examples include subgame perfect equilibrium, trembling-hand perfect equilibrium, evolutionarily stable strategies, sequential equilibrium, proper equilibrium, and several others (https://en.wikipedia.org/wiki/Solution_concept). Many of these solution concepts are significantly more challenging to compute (or approximate) than Nash equilibrium (for some of them no algorithms exist at all).
For the no-limit poker domain in particular, I have written a manuscript arguing that the “natural” equilibrium that we would like to compute does not agree with any of these standard game-theoretic solution concepts (http://nebula.wsimg.com/e1b8017843b51376e6c6c2c6fc59d62b?AccessKeyId=4F0E80116E133E66881C&disposition=0&alloworigin=1).
Note that this problem applies to two-player zero-sum games as well as to the more challenging classes of non-zero-sum and multiplayer games. For all of them, there is the conceptual problem of formulating the correct solution concept (for two-player zero-sum games this typically means selecting a particular Nash equilibrium with desirable properties), as well as the algorithmic question of devising algorithms for approximating it in practice.
2. Strong agents for more than two players (and non-zero-sum games).
As described above, achieving this task is significantly more challenging than for two-player zero-sum games, both from conceptual and computational perspectives. Many real-world situations have more than two players and/or are not zero-sum, and if we desire to create successful agents for them these questions must be addressed.
The very first question I often get from people after describing my research is whether the approaches would apply to games with more than two players. In fact, several of the techniques developed in the course of the research on two-player poker are directly applicable to multiple players (e.g., the abstraction and post-processing algorithms). Others can be applied, but with a loss of theoretical guarantees and/or an increase in runtime (e.g., the counterfactual regret minimization equilibrium-approximation algorithm can still be run on games with more than two players, but has no significant theoretical guarantee for these games). For others it is a bit more complicated. For example, endgame solving can still apply to multiplayer games; however, if the resulting endgames themselves have more than two players, it may not be feasible to solve them. On the positive side, for games that initially have more than two agents but have endgames that are two-player zero-sum (which is common in many variants of poker), endgame solving could prove to be as critical as it has been for two players. For elaboration see slides from my talk “Strong Game-Theoretic Strategies: Beyond Two Agents” given at the MIT poker club http://nebula.wsimg.com/f844b4edd588331b06e523be7e46dcd5?AccessKeyId=4F0E80116E133E66881C&disposition=0&alloworigin=1.
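As a toy illustration of the kind of iterative procedure involved, here is a minimal sketch (my own simplified example, not code from any poker agent) of regret matching, the core update inside counterfactual regret minimization, run in Rock-Paper-Scissors self-play. In this two-player zero-sum game the average strategies provably converge to the uniform Nash equilibrium; the very same loop can be run with three or more players, but the convergence guarantee is lost.

```python
# Toy sketch: regret matching self-play in Rock-Paper-Scissors. The AVERAGE
# strategies converge to Nash equilibrium in two-player zero-sum games; for
# 3+ players the same procedure runs but loses this guarantee.
import numpy as np

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)  # row payoffs

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

# Slightly asymmetric initial regrets so the dynamics are non-trivial.
regrets = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
strategy_sums = [np.zeros(3), np.zeros(3)]

for _ in range(100_000):
    s = [regret_matching(r) for r in regrets]
    for p in range(2):
        strategy_sums[p] += s[p]
    u0 = A @ s[1]       # player 0's expected payoff for each pure action
    u1 = -(s[0] @ A)    # player 1's (zero-sum: negated column payoffs)
    regrets[0] += u0 - s[0] @ u0   # regret for not having played each action
    regrets[1] += u1 - s[1] @ u1

avg = [ss / ss.sum() for ss in strategy_sums]
print(avg[0], avg[1])  # both approach the uniform equilibrium [1/3, 1/3, 1/3]
```

The current strategies themselves typically cycle; it is the time-averaged strategies that carry the equilibrium guarantee, and that guarantee is exactly what disappears once a third player is added.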
So while several of the approaches devised in the context of creating agents for two-player zero-sum games are still applicable in some form to multiplayer games, they fall far short of solving the problem entirely, and both new solution concepts and new algorithms are needed to create strong agents.
During the first several years of my PhD I actually worked on algorithms for approximating Nash equilibrium in classes of games with more than two players (before I switched over to working on two-player games to create submissions for heads-up limit and no-limit Texas hold ’em for the Annual Computer Poker Competition http://www.computerpokercompetition.org/). You can see my early papers “Computing an Approximate Jam/Fold Equilibrium for 3-Player No-Limit Texas Hold’em Tournaments” and “Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information” on my website. These papers presented algorithms that computed provably close approximations of Nash equilibrium strategies in a three-player no-limit Texas hold ’em poker tournament. (It was proven that if the algorithms converge then the resulting strategies are a Nash equilibrium, but it was not guaranteed that the algorithms would converge at all, as they are guaranteed to do in two-player zero-sum games.)
Even if we were able to devise algorithms for computing Nash equilibrium in multiplayer games, there would be no guarantee on their practical performance (as I described above). In fact, many theoretical researchers think that Nash equilibrium is not useful at all in these games for that reason. However, recent experiments I have done demonstrate that agents based on a Nash equilibrium strategy may in fact perform very well in these games despite the fact that they are not guaranteed to do so. As the class project for an Artificial Intelligence course I taught (http://www.ultimateaiclass.com/), I had all students create agents for a simplified version of poker called 3-player Kuhn poker (read about the class project here https://medium.com/ganzfried-gleans/3-player-kuhn-poker-project-ce1b818b4d5e). They created a wide variety of agents. However, my subsequent experiments showed that Nash equilibrium agents actually outperformed the class project submissions. (See more details in my recent journal article “Successful Nash Equilibrium Agent for a Three-Player Imperfect-Information Game” http://nebula.wsimg.com/98cac451471a03c8ddef1f21cafc7349?AccessKeyId=4F0E80116E133E66881C&disposition=0&alloworigin=1.)
3. Approaches for robustly integrating opponent exploitation with game-theoretic analysis.
While Nash equilibrium and the other game-theoretic solution concepts assume that all the agents are “rational,” in practice many opposing agents are irrational and may make significant mistakes that we would like to capitalize on. As a simple example, against a Rock-Paper-Scissors opponent who has played Rock in each of the first 20 iterations, you would likely want to play Paper more often than the Nash equilibrium does (which places 1/3 weight on each action). Thus, even in two-player zero-sum games, we can obtain significantly higher performance by learning to exploit opponents’ mistakes than by just following a Nash equilibrium. However, for many reasons it is generally much more challenging to create strong opponent exploitation algorithms for large games (particularly games of imperfect information) than it is to approximate Nash equilibrium strategies (in fact, several of the strongest opponent exploitation algorithms assume that approximations of Nash equilibrium strategies have already been created and use them as a subroutine). As one example, I created an algorithm for opponent modeling in large imperfect-information games that I showed significantly outperformed an approximate Nash equilibrium strategy against several opposing agents in two-player limit Texas hold ’em (as with the three-player results described above, this project predates the two-player Nash equilibrium approximation agents such as Tartanian7 and Claudico). The approach essentially started by assuming that the opponent was following the approximate equilibrium strategy, and improved the model over time based on observations of how the opponent’s play differed from the prior strategy (essentially estimating that the opponent was playing the “closest” strategy to the prior that agreed with the observations so far). This paper was called “Game Theory-Based Opponent Modeling in Large Imperfect-Information Games” and is available here (incidentally it is my most-cited paper so far).
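The prior-plus-observations idea can be sketched in miniature (this is my own toy illustration in Rock-Paper-Scissors, not the algorithm from the paper): start from pseudo-counts that encode the equilibrium as a prior over the opponent’s actions, update the counts from observed play, and best-respond to the resulting model.

```python
# Toy sketch (not the paper's algorithm): exploit a Rock-Paper-Scissors
# opponent by starting from an equilibrium prior over their actions and
# updating the model from observations, then best-responding to the model.
import numpy as np

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)  # our payoffs
# Action encoding: 0 = Rock, 1 = Paper, 2 = Scissors.

# Prior: pretend we have already seen `prior_weight` opponent actions drawn
# from the equilibrium (1/3 each); real observations then pull us away from it.
prior_weight = 10.0
counts = np.full(3, prior_weight / 3)

def best_response(counts):
    """Pure best response to the posterior-mean model of the opponent."""
    opp_model = counts / counts.sum()
    return int(np.argmax(A @ opp_model))

observed = [0] * 20          # the opponent played Rock 20 times in a row
for a in observed:
    counts[a] += 1

print(best_response(counts))  # 1, i.e. Paper
```

With no observations the model is the equilibrium itself and every response is equally good; as evidence of the Rock bias accumulates, the model shifts and Paper becomes the clear best response. The prior weight controls how quickly we abandon the equilibrium assumption.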
One caveat of opponent exploitation algorithms is that they are susceptible to being counterexploited by a strong and deceptive opponent. I have also developed algorithms that in certain contexts are able to exploit opponents in a way that is “safe” and guarantees strong performance even against these deceptive opponents.
More recently I have also been working on algorithms for exploitation in imperfect-information games where we observe the opponent’s public actions but not their private information (e.g., private cards in poker), assuming natural prior strategy distributions that are Dirichlet or uniform (http://nebula.wsimg.com/7f7a6c301f94576b58a9b2384deccb61?AccessKeyId=4F0E80116E133E66881C&disposition=0&alloworigin=1).
Thus the goal here is to develop approaches that on the one hand are able to effectively capitalize on the mistakes of irrational opponents, but at the same time are robust enough to ensure that they cannot be counterexploited in turn and perform poorly in the event that the learning model is wrong (or due to observational or sampling noise). Different approaches may be required based on the setting at hand (i.e., depending on the amount of historical data available on the opponent, or whether there is any available at all; the degree of observability of the opponent’s actions and/or information; assumptions on whether the opponent is following a static or dynamic strategy; and a variety of other factors).
It is worth noting that Problem #1 identified above (computing new Nash equilibrium refinements) can also be viewed as a form of opponent exploitation, in that we are attempting to compute strategies that, while in equilibrium, will also perform well against certain systematic deviations by opponents from equilibrium.
4. Approaches for computing strategies that are human understandable.
The final problem that I think is worth singling out is one that I expect will prove to be increasingly important as game theory is being applied to large-scale settings where humans (as opposed to automated agents) are ultimately making the final decisions. For example, as game-theoretic algorithms are being deployed for national security applications, ultimately it is police officials who must decide the precautions to be taken against a particular suspect. For humans to make effective real-time strategic decisions, we cannot expect them to parse massive files of strategies containing probability distributions. Instead we need rules that are easily understood by humans.
In an article titled “Computing Equilibria by Incorporating Qualitative Models” (again written before I started working on the two-player no-limit Texas hold ’em agents), I came up with an approach that takes advantage of human-understandable qualitative strategy models to improve equilibrium computation. I applied this to two-player limit Texas hold ’em and showed that equilibrium strategies for the river betting round always conformed to one of three easily understandable qualitative structures (http://nebula.wsimg.com/a79717a299f4ecdcb6cf4ce7043695f6?AccessKeyId=4F0E80116E133E66881C&disposition=0&alloworigin=1).
More recently I have incorporated approaches from machine learning (specifically algorithms for learning decision trees) to create a more general approach for computing human-understandable strategies, which I have applied to deduce two simple and fundamental rules of poker strategy (for when a player should make an extremely large “all-in” bet, and when a player should make a very small bet or not bet at all) (http://nebula.wsimg.com/0f239345d8d8479aa8196b50a249f509?AccessKeyId=4F0E80116E133E66881C&disposition=0&alloworigin=1).
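To give a flavor of the idea, here is a hypothetical miniature version of my own devising (the actual work uses proper decision-tree learners over richer features): a one-feature decision stump, i.e. a depth-1 decision tree, that distills examples of a strategy’s decisions into a single human-readable all-in threshold rule.

```python
# Hypothetical sketch: distilling a strategy into a human-readable rule with
# a one-feature decision stump (a depth-1 decision tree). The real approach
# uses decision-tree learning over richer features; this only shows the idea.

def learn_stump(examples):
    """examples: list of (hand_strength in [0,1], action in {'check','all-in'}).
    Returns the threshold minimizing training error for the rule
    'go all-in iff hand_strength >= threshold', plus that error count."""
    candidates = sorted({s for s, _ in examples})
    best_t, best_err = 0.0, float('inf')
    for t in candidates:
        err = sum((s >= t) != (a == 'all-in') for s, a in examples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Synthetic data generated from an underlying strategy that jams exactly
# when hand strength >= 0.8 (a made-up threshold for illustration).
data = [(i / 100, 'all-in' if i / 100 >= 0.8 else 'check') for i in range(101)]
threshold, errors = learn_stump(data)
print(f"Rule: go all-in when hand strength >= {threshold} ({errors} errors)")
```

The output of such a procedure is exactly the kind of artifact a human decision maker can act on in real time: a one-line rule rather than a massive file of probability distributions.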
(In a side project I showed that the structure of optimal strategies in a natural poker variant has a strong visual resemblance to my current home state https://www.twoplustwo.com/magazine/issue163/sam-ganzfried-optimal-poker-strategies-florida.php.)
As game-theoretic models are being applied to more and more domains where humans are making the ultimate decisions, it will become increasingly important to augment large strategy files with a concise representation that can be easily interpreted by the human decision maker, preferably one that is visually appealing.
Having written this list, I would also like to say that there have been (and continue to be) many important applications of computational game theory beyond poker, and in fact identifying and modeling these application domains is a very critical part of the research in this field. These applications include national security, cybersecurity, political science, medicine, education, e-currencies, and many others (in fact my research has already been applied to and/or cited by papers on many of these topics). The list of problems I provided is intended to describe what I believe to be the most important theoretically foundational questions without regard to any particular application domain (though I expect them to apply to many of these domains). It is of course very important to continue both theoretical and applied work on these questions in a variety of domains spanning many applications.
For further information about the problems identified in this list and related topics, please see the materials on my website http://www.ganzfriedresearch.com/, including the game theory class that I taught http://www.bestgametheoryclass.com/.