3-player Kuhn poker project

In my first post https://medium.com/ganzfried-gleans/how-undergraduate-ai-should-be-taught-85c9827719b I described my new undergraduate class on artificial intelligence (AI) that I recently taught at Florida International University and how it compared to classes taught at other universities, such as Carnegie Mellon, Duke, and Princeton. One of the novel components of that class was a project in which the students were asked to create an agent for a simplified form of poker called 3-player Kuhn poker. They could form groups of up to 3 students (though they were free to work alone as well, which many chose to do). They were evaluated in 3 ways: a PowerPoint presentation in class, a project report (2–4 pages), and an experimental competition. The grade breakdown was 20% for the presentation, 40% for the paper, and 40% for the competition.

3-player Kuhn poker is one of the forms of poker that has been included in the annual computer poker competition, which is typically held at the AAAI conference http://www.computerpokercompetition.org/index.php/competitions/rules/75-limit-games. The game has a 4-card deck containing an Ace (A), King (K), Queen (Q), and Jack (J). Each player first antes 1 “chip,” and then each player is dealt one card from the deck (the fourth card is not dealt to anyone). The first player then has the option to check or to bet an additional chip. When facing a bet, a player can call the bet or fold. That is, only a single bet is allowed by any player (i.e., players cannot “raise” over a previous bet). At showdown, the highest card wins the entire pot; the ace is the highest card, followed by king, queen, and jack. So, for example, the longest sequence of actions occurs if player 1 checks, player 2 checks, player 3 bets, then player 1 calls or folds, and finally player 2 calls or folds.
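To make the rules concrete, here is a minimal Python sketch of a single hand of 3-player Kuhn poker. The function names and the naive always-passive strategy are purely illustrative (not any team's actual agent):

```python
import random

RANKS = {"J": 0, "Q": 1, "K": 2, "A": 3}

def play_hand(strategies, rng):
    """Play one hand of 3-player Kuhn poker.

    strategies[i](card, history) -> action string; history is the
    sequence of actions so far ("k"=check, "b"=bet, "c"=call, "f"=fold).
    Returns the net chips won or lost by each player.
    """
    deck = list(RANKS)
    rng.shuffle(deck)
    cards = deck[:3]                      # the fourth card stays unseen
    contrib = [1, 1, 1]                   # everyone antes one chip
    history = ""

    # Checking round: each player in turn may check or bet one chip.
    bettor = None
    for i in range(3):
        action = strategies[i](cards[i], history)
        history += action
        if action == "b":
            bettor = i
            contrib[i] += 1
            break

    in_hand = [True, True, True]
    if bettor is not None:
        # The other two players, in order after the bettor, call or fold.
        for j in range(bettor + 1, bettor + 3):
            i = j % 3
            action = strategies[i](cards[i], history)
            history += action
            if action == "c":
                contrib[i] += 1
            else:
                in_hand[i] = False

    pot = sum(contrib)
    live = [i for i in range(3) if in_hand[i]]
    winner = max(live, key=lambda i: RANKS[cards[i]])
    return [(pot if i == winner else 0) - contrib[i] for i in range(3)]

def always_passive(card, history):
    # A naive strategy: never bet; when facing a bet, call with K or A.
    if "b" in history:
        return "c" if card in ("K", "A") else "f"
    return "k"

rng = random.Random(0)
results = play_hand([always_passive] * 3, rng)
print(results)   # net result per player; always sums to zero
```

With three always-passive players every hand checks through to showdown, so the player with the highest card wins the 3-chip pot for a net of +2 while the other two each lose their ante.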

This game is quite small and can be solved analytically relatively straightforwardly, so one may wonder whether there is any scientific merit in studying it. Certainly if this were a game of the same size with just two players, the merit would be quite limited. For example, consider a 2-player version of poker called “one-card poker,” where each player is dealt one card from a 13-card deck and there is a single round of betting as in 3-player Kuhn poker; solutions are provided at the bottom of http://www.cs.cmu.edu/~ggordon/poker/. These “solutions” correspond to a “Nash equilibrium” strategy for both players. If the players follow this strategy, then they guarantee that in expectation they will either win or tie against any possible opponent, regardless of whether he is an expert or a novice. There are a few caveats to this claim. First, it assumes that the equilibrium player alternates between the roles of player 1 and player 2, as one of the positions could happen to confer an advantage. For instance, in full 2-player no-limit Texas hold ’em, it is widely known that the player who acts first in the first round (preflop) has a sizable advantage, since he is able to act second in the subsequent (postflop) rounds and therefore gets to act after seeing his opponent’s choice of action. A second caveat about this “unbeatability” property for two-player games is that it holds only in expectation, i.e., in the long run as the number of game iterations grows very large. As there is luck involved, in the short run anything can happen, and a Nash equilibrium could perform quite poorly. For example, if the equilibrium agent happens to be dealt a 2 in 10 straight rounds of one-card poker while the opponent is dealt an ace, clearly the equilibrium agent will lose over this stretch, though if they play long enough he will eventually either win or tie.

Note also that while I described this “unbeatability” property for two-player poker, it applies more broadly to a class of games known as two-player “zero-sum” games. These are games where for any outcome, the sum of the payoffs of the players is zero. (It turns out that the sum need not be exactly zero: as long as the payoffs always sum to the same constant, the game will have this property.)

For example, in two-player poker, if one player wins $100, the other player loses $100 (ignoring rake to the house). As another example, in the classic game rock-paper-scissors, if one player receives a payoff of 1 the other player receives -1 (and vice versa), or they both receive zero in a tie; in all cases the sum is zero.
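To see the equilibrium guarantee concretely in rock-paper-scissors, here is a small Python check (a sketch, using the standard 0/±1 payoffs) showing that the uniform mixed strategy earns exactly zero against every opponent action, and hence against any mixture of them:

```python
# Payoff matrix for the row player in rock-paper-scissors.
# Rows/columns are (rock, paper, scissors); the game is zero-sum,
# so the column player's payoff is the negation of each entry.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def expected_payoff(row_strategy, col_strategy):
    """Row player's expected payoff when both players mix over actions."""
    return sum(row_strategy[i] * col_strategy[j] * A[i][j]
               for i in range(3) for j in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]

# Against every pure opponent action the uniform mix earns exactly 0,
# so by linearity it earns 0 against any mixture: it cannot be beaten
# in expectation.
for j in range(3):
    pure = [1 if k == j else 0 for k in range(3)]
    print(expected_payoff(uniform, pure))   # 0.0 each time
```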

For games with more than two players or for two-player games that are not zero-sum, this result doesn’t hold. In these games, even if a player plays a Nash equilibrium strategy, he could actually perform quite poorly, for a variety of reasons. One of these reasons is that the game could have several Nash equilibria, and if the opponent(s) are following a different one than we are, then the resulting set of strategies is not guaranteed to be in equilibrium and could in fact be quite bad for us. For example, in the classic Battle of the Sexes game, there are three Nash equilibria: one where both the husband and the wife choose Football, one where they both choose Opera, and one where they each choose their preferred option (Football for the husband and Opera for the wife) with probability 3/5 https://en.wikipedia.org/wiki/Battle_of_the_sexes_(game_theory). If the husband follows the first equilibrium while the wife follows the second, they will both do very poorly and receive a payoff of 0. Note that in a two-player zero-sum game this “mis-coordination effect” cannot happen: Nash equilibrium has an “exchangeability” property in these games, so if one player follows one equilibrium and the other player follows another, the overall strategies are guaranteed to be in equilibrium as well.
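For concreteness, here is a quick Python check of the 3/5 mixed equilibrium, assuming one common payoff assignment for Battle of the Sexes (coordinating on your preferred event pays 3, coordinating on the other’s pays 2, and mis-coordinating pays 0 to both):

```python
from fractions import Fraction as F

# (husband_payoff, wife_payoff) indexed by (husband_action, wife_action),
# where action 0 = Football and action 1 = Opera. These particular numbers
# are an assumption; they are one standard assignment that yields the 3/5
# mixed equilibrium.
payoffs = {
    (0, 0): (F(3), F(2)),   # both Football
    (1, 1): (F(2), F(3)),   # both Opera
    (0, 1): (F(0), F(0)),   # mis-coordination
    (1, 0): (F(0), F(0)),
}

p = F(3, 5)   # husband plays his preferred Football with probability 3/5
q = F(2, 5)   # wife plays Football with 2/5, i.e. her preferred Opera with 3/5

def husband_value(action):
    return q * payoffs[(action, 0)][0] + (1 - q) * payoffs[(action, 1)][0]

def wife_value(action):
    return p * payoffs[(0, action)][1] + (1 - p) * payoffs[(1, action)][1]

# In the mixed equilibrium each player is exactly indifferent between
# their two actions, which is what makes randomizing a best response.
print(husband_value(0), husband_value(1))   # 6/5 6/5
print(wife_value(0), wife_value(1))         # 6/5 6/5
```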

There are also other reasons why a player could potentially do poorly in these games when playing an equilibrium strategy, even in games that have only one Nash equilibrium. If one of the other players deviates from the equilibrium, we could actually do worse ourselves (while the deviating player would necessarily do the same or worse, by the definition of Nash equilibrium). (Note again that this cannot happen in a two-player zero-sum game: if the opponent deviates from equilibrium, he cannot improve his payoff, and furthermore our payoff cannot decrease.) Furthermore, in games with more than two players, several other players could collude against us to ensure our payoff is decreased even if we follow the equilibrium (again the payoff would decrease for at least one of the other agents, but could increase significantly for some of the others, improving their overall payoff while hurting ours in the process). Note also that it does not help if games with more than two players are zero-sum: any two-player non-zero-sum game can be converted into an equivalent three-player zero-sum game by adding a third dummy agent whose payoff equals the negative of the sum of the first two payoffs, so the properties of three-player zero-sum games are identical to those of two-player non-zero-sum games.
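The dummy-player construction mentioned above is simple enough to write down directly. Here is a sketch (the function names are mine, for illustration), using Battle of the Sexes-style payoffs as the non-zero-sum 2-player game:

```python
def to_three_player_zero_sum(u):
    """Wrap a 2-player payoff function u(a1, a2) -> (u1, u2) as the payoff
    function of an equivalent 3-player zero-sum game: a dummy third player
    (with a single, irrelevant action) receives -(u1 + u2)."""
    def u3(a1, a2):
        p1, p2 = u(a1, a2)
        return (p1, p2, -(p1 + p2))
    return u3

# Example 2-player non-zero-sum game: Battle of the Sexes with payoffs
# 3 / 2 for coordinating and 0 / 0 for mis-coordinating (illustrative).
def bos(a1, a2):
    if a1 == a2:
        return (3, 2) if a1 == "football" else (2, 3)
    return (0, 0)

bos3 = to_three_player_zero_sum(bos)
for a1 in ("football", "opera"):
    for a2 in ("football", "opera"):
        result = bos3(a1, a2)
        print(a1, a2, result, "sum =", sum(result))   # sum is always 0
```

The strategic situation between the first two players is untouched (the dummy has no choices), yet the 3-player game is zero-sum, which is why zero-sum-ness buys nothing beyond two players.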

Of course, these problematic situations need not arise in practice, and it is very possible that computing a Nash equilibrium could perform extremely well for these game classes. However, it would not have any worst-case theoretical guarantee like it does in the two-player zero-sum case.

One further issue making Nash equilibrium more compelling for the two-player zero-sum case is that there exist efficient algorithms for finding one in these games (from a computational complexity perspective, this task can be done in “polynomial time”). In contrast, computing one Nash equilibrium is “PPAD-complete” for the other game classes (two-player non-zero-sum games and games with more than two players), and it is widely believed (though not definitively proven) that no efficient (polynomial-time) algorithm exists that is guaranteed to compute a Nash equilibrium in these games, though there do exist algorithms that are able to successfully compute one in certain games (e.g., the first two papers of my PhD were on this https://www.cs.cmu.edu/~sganzfri/JamFold_AAMAS08.pdf https://www.cs.cmu.edu/~sganzfri/Stochastic_IJCAI09.pdf).
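As an illustration of the polynomial-time case, here is a sketch of the standard maximin linear program for a two-player zero-sum game, solved with scipy for rock-paper-scissors (where it recovers the uniform equilibrium and a game value of 0):

```python
# Maximin LP for a two-player zero-sum game: find a mixed strategy x and
# value v maximizing v such that x earns at least v against every pure
# opponent action. Illustrated on rock-paper-scissors.
import numpy as np
from scipy.optimize import linprog

A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]])           # row player's payoff matrix

n = A.shape[0]
c = np.zeros(n + 1)
c[-1] = -1.0                         # minimize -v, i.e. maximize v

# For each opponent column j: v - (A^T x)_j <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
A_eq = np.array([[1.0] * n + [0.0]])  # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:n], res.x[-1]
print(np.round(x, 3), round(v, 3))   # uniform mix, value 0
```

The LP has one variable per action plus the value, and one constraint per opponent action, so its size is polynomial in the game, which is the heart of why the two-player zero-sum case is computationally easy.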

Now back to 3-player Kuhn poker. When I mentioned to a colleague that I was doing a project on 3-player Kuhn poker, his immediate response was “isn’t that a solved game?” The answer, it turns out, is yes, but hopefully by now I have persuaded you why that is not the end of the story. A recent paper actually computes a set of infinitely-many Nash equilibrium strategies for this game http://poker.cs.ualberta.ca/publications/AAMAS13-3pkuhn.pdf (it is possible that other Nash equilibria exist as well). I made this paper available to the students and suggested that, as a starting point for their agent, they implement one of the Nash equilibrium strategies from the table in the paper.

There were 11 total projects, and the final agents ended up utilizing a variety of different approaches. Some agents were based on the equilibria from the table, some implemented Nash equilibrium-finding algorithms of their own such as counterfactual regret minimization, and others incorporated machine learning and “opponent modeling” to attempt to learn about the opponents’ strategies and adapt during play, including approaches such as neural networks, fictitious play, and Bayesian analysis. Slides from 9 of the class presentations are at the bottom of the class website http://www.ultimateaiclass.com/. Note that the presentations were quite short (each team was allotted only several minutes), and important details are omitted from many of them. The students also turned in longer papers describing their approaches more thoroughly, though they submitted hard copies and I do not have full electronic versions to post.
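Several of those approaches build on “regret matching,” the update at the heart of counterfactual regret minimization. As a tiny illustration (in plain normal-form rock-paper-scissors rather than Kuhn poker, and against a fixed opponent rather than in self-play), here is a sketch of the update:

```python
# Regret matching sketched in normal-form rock-paper-scissors against a
# fixed opponent who plays Rock and Paper with probability 1/2 each.
# The best response is Paper, and the learner's average strategy
# converges to it.
A = [[0, -1, 1],        # our payoff for (R, P, S) vs opponent's (R, P, S)
     [1, 0, -1],
     [-1, 1, 0]]
opponent = [0.5, 0.5, 0.0]

regrets = [0.0, 0.0, 0.0]
avg = [0.0, 0.0, 0.0]
T = 1000
for _ in range(T):
    # Current strategy: positive regrets, normalized (uniform if none).
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    strategy = [p / total for p in pos] if total > 0 else [1 / 3] * 3
    # Expected utility of each action, and of the current strategy.
    util = [sum(A[a][j] * opponent[j] for j in range(3)) for a in range(3)]
    value = sum(strategy[a] * util[a] for a in range(3))
    # Accumulate regret for not having committed to each action.
    for a in range(3):
        regrets[a] += util[a] - value
    for a in range(3):
        avg[a] += strategy[a] / T

print([round(p, 3) for p in avg])   # → [0.0, 0.999, 0.0]: almost all Paper
```

CFR applies this same update at every information set of an extensive-form game like Kuhn poker, using counterfactual values in place of the plain expected utilities here.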

In addition to the presentation and project, I ran a competition to evaluate the agents. Ideally I would have liked to try out all possible groupings of the agents and run a large number of matches for each grouping to obtain statistical significance, but the projects were submitted at the end of the semester and I also had the final exam to prepare and grade, so I was forced to run the experiments in the short amount of time I had available. I decided to use a standard tournament format, where several teams compete in the first round, then winners move on to the second round, etc., and ultimately a champion is determined. Since there were 11 teams, I randomly selected 8 teams to receive “byes” in the first round, while the other 3 teams competed for a second-round slot. The first-round winner moved on to the second round, making 9 teams. I then ran three randomly-selected second-round matchups, which produced three winning teams that moved on to the third and final round, where a champion agent was crowned. For each of the 3-agent matches, I did the following. Suppose the agents were called A, B, and C. I played matches for each of the 6 orderings of positions for the agents: A-B-C, A-C-B, B-A-C, B-C-A, C-A-B, and C-B-A. The reason is that an agent can have a significant advantage or disadvantage simply by happening to be positioned to the left or right of a specific other agent, and I wanted to reduce the role of this luck component. For each of the 6 position orderings, I played 3000 hands (the roles of the agents as players 1, 2, and 3 rotated clockwise each hand, as is typically done in poker). I then averaged the results over these 6 3000-hand matches to determine the winning agent among the set of 3 agents.
I used the same “random seed” (which corresponded to a fixed dealing of cards to the players) for all of the matches in order to further minimize the role of luck, similar to the way “duplicate scoring” is used in the annual computer poker competition http://www.computerpokercompetition.org/index.php/competitions/rules/76-duplicate-poker.
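For anyone curious what this evaluation scaffolding looks like, here is a sketch in Python. The play_match function is only a placeholder (it deals out random results rather than playing actual Kuhn poker hands); the point is the seat-permutation loop and the shared seed:

```python
import itertools
import random

def play_match(agents, n_hands, seed):
    """Placeholder match returning total chips won per seat. In the real
    evaluation this would play n_hands of 3-player Kuhn poker; reusing
    the same seed reproduces the same card sequence (a 'duplicate' deal)."""
    rng = random.Random(seed)
    totals = [0, 0, 0]
    for _ in range(n_hands):
        w = rng.randrange(3)            # stand-in for playing out a hand
        for seat in range(3):
            totals[seat] += 2 if seat == w else -1   # zero-sum per hand
    return totals

def evaluate(names, n_hands=3000, seed=42):
    """Average each agent's winnings over all 6 seat orderings, replaying
    the identical card sequence each time to cancel positional luck."""
    scores = {name: 0.0 for name in names}
    for order in itertools.permutations(names):      # A-B-C, A-C-B, ...
        totals = play_match(order, n_hands, seed)    # same seed every time
        for seat, name in enumerate(order):
            scores[name] += totals[seat] / 6
    return scores

scores = evaluate(["A", "B", "C"])
print(scores)   # placeholder agents are identical, so all scores are 0
```

Because the placeholder ignores the agents, every agent sits in every seat twice across the 6 orderings and the scores cancel to zero; with real agents, whatever edge survives this averaging is skill rather than seating luck.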

Admittedly, despite my attempts to reduce variance/luck as much as possible given the time constraints, the experimental design I selected still left a significant amount of room for luck. For one, 3 teams were unlucky not to have received a bye and had to compete in an additional round. Also, clearly some of the matchups were tougher than others, and some teams were lucky to have easier first- or second-round draws. Nonetheless, I still think that for a game of this size, 3000x6 duplicate hands between every combination of agents is a reasonable sample size, and it is possible that statistical significance was obtained for a given matchup. If I had more time, I would have liked to try out all possible 3-agent combinations and ensure statistical significance for each one.

While some may argue that it may be unfair to base the students’ grades on the outcome of an experiment with a large luck component, I think the evaluation is justified for several reasons. First, as described above, I think I played quite a reasonable sample of hands in comparison to the size of the game, and I think the results are relatively conclusive: overall, the stronger agents outperformed the weaker ones. In general, I feel that empirical performance is an important metric in designing real AI systems and should therefore be reflected in the evaluation. That said, it was still possible for students to get a reasonable score for the project even with poor competition performance, as the experimental results were only worth 40% of the grade, while the presentation was worth 20% and the paper 40%.

Interestingly, the agent that ended up winning the competition employed a simple fixed strategy that outperformed far more “sophisticated” approaches, including several based on the Nash equilibria from the table in the paper and several based on opponent modeling and learning. One of the papers actually presented extensive experimental results showing that natural learning algorithms led to significantly worse performance than just following equilibrium strategies https://www.cs.cmu.edu/~sganzfri/Project_Presentation_Juan.pptx.

In conclusion, despite the fact that this game is “solved” in the sense that Nash equilibrium strategies are known (and were made readily available to all students), designing strong strategies for 3-player Kuhn poker remains a challenging open problem, as the game contains infinitely many equilibria, and furthermore it is not clear whether the goal should even be to play one. Creating strong agents for games with more than 2 players is a major open problem in AI, far beyond this one specific game, and the fact that such a simple game is poorly understood should make it apparent that the status of larger multiagent settings we would expect to encounter in the real world is even more wide open. I will point out that the students were not expected to have prior background in game theory (or in machine learning or AI in general) before entering the class, and the game theory background was provided over the course of just several lectures I gave during the class. Furthermore, as I described, 3-player Kuhn poker is also one of the games used in the Annual Computer Poker Competition, in which teams of game-theory experts participate. So this project is accessible to students with a very wide range of backgrounds, while still providing a challenge even to the strongest students.