I wrote my PhD thesis proposal in September 2013, around 20 months before I ended up graduating. The title is the same as my dissertation title, “Computing Strong Game-Theoretic Strategies and Exploiting Suboptimal Opponents in Large Games.” The document as well as my final dissertation and individual papers are all up on my website.
The proposal consisted of two main portions: the first on completed work that I had already done (A), and the second on work that I planned to do in the future (B). The first portion (group A) is pretty self explanatory. In retrospect, the second portion (B) can now be further refined into four distinct subgroupings: 1) research that was completed during the PhD and included in the dissertation (as theoretically envisioned), 2) research that I didn’t complete during the PhD but have done subsequently, 3) research that others have done subsequently, and 4) research that has not been successfully explored (yet).
Groups 2, 3, and 4 can be further refined into “subsubgroupings” i and ii, with i being research that essentially extends prior work I had done previously and ii being new projects that are proposed (though in some cases the distinction between i and ii is not very clear).
Group 1 is pretty self-explanatory, and the material can be found in my final dissertation. Most notable was improving “endgame solving,” which has played a key role in the success of computer 2-player no-limit Texas hold ’em agents against humans (see this short article summarizing its contribution).
(Technically the endgame solving can be viewed as a blend between Group 1 and Group 3, as from my end it was completed for the PhD, but others have also utilized subsequently as well). In addition to endgame solving, my research on new information-abstraction algorithms produced several papers and also falls into the first category.
In fact I was told at the time by some members of my committee that the ratio of my proposal falling in group A to B was atypically high; however, it was clear to me that they did not really appreciate the scope of the projects in B, and it was obvious to me when writing the proposal that in reality only a very small percentage of the items in B would be completed for the dissertation.
But what has really been striking for me is, in retrospect, that despite the fact that I was not able to complete many of the proposed new projects myself before graduation (or after), nearly all of them have been implemented in some form subsequently by others (i.e., Group B3ii). While I have continued to work on research in computer poker to an extent myself since my graduation, it has not been as much of a major focus, and I have also expanded in several other application areas including recommendation systems, socialization, education, medicine, and hurricane prediction.
I listed out 9 items in Section 5.2 of the proposal “Work To Be Done.” I’ll go through each of them and summarize the developments since my graduation.
“Improvements for qualitative model project: implement new polynomial-time algorithm and try to characterize the situations in which the different models are used in limit Texas Hold’em.”
While I really liked that project and wish I would have had time to continue it to achieve the goals, I have not been able to unfortunately. However, I have written several articles within the area that this has opened up, on computing strategies that are human understandable (in contrast to prior approaches that often output massive strategy files that may be in binary and are unintelligible to humans). My paper “Computing Human-Understandable Strategies: Deducing Fundamental Rules of Poker Strategy” describes a new approach for computing strategies that are human understandable, culminating in two fundamental poker strategy rules (for when a player should make an extremely large bet or not bet at all/make a very small bet).
“New algorithm for information abstraction based on a new algorithms for computing earth-mover’s distance between histograms of unordered clusters.”
This was successfully completed for my dissertation and published in 2014 AAAI article “Potential-Aware Imperfect-Recall Abstraction with Earth Mover’s Distance in Imperfect-Information Games.”
“New algorithm for information abstraction that will allow MCCFR to parallelize on ccNUMA architecture”
This was also successfully completed for my dissertation and published in Section 2 the 2015 AAMAS paper “Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold’em Agent.” (Note that the first two authors for this paper are listed in alphabetical order per demand of the project manager despite the publication convention of AI.)
“Gain a better theoretical understanding of purification and thresholding by providing formal proofs of Observations 1 and 2, and generalizing results to games of arbitrary size”
Unfortunately these were never done, though I still find them interesting. However, I did work on extensions of purification and thresholding to more general post-processing approaches that were critical to some of the newer agents for two-player no-limit Texas hold ’em such as Claudico. These are described in Section 4 of the paper presented in the previous paragraph, and also elaborated on in my AAAI Magazine article “Reflections on the First Man versus Machine No-Limit Texas Hold ’em Competition.”
“Apply machine learning algorithms to extract human-understandable knowledge from the computed strategy files”
While this was not done explicitly for the two-player no-limit Texas hold ’em strategy files (in fact my access to those was promptly removed when I graduated in May 2015 right after the competition concluded), as mentioned above I have worked on applying algorithms from machine learning to extract human-understandable knowledge from optimal strategies in a simplified version of no-limit poker for the paper “Computing Human-Understandable Strategies: Deducing Fundamental Rules of Poker Strategy.”
“Additional enhancements and experiments for the endgame solver”
The “endgame solver” was singled out as being a pivotal component of the agent Claudico in the Brains vs. Artificial Intelligence competition, as discussed in the AAAI Magazine article “Reflections on the First Man versus Machine No-Limit Texas Hold ’em Competition.” Many enhancements were developed in the course of that competition. Subsequently the endgame solver was scaled up to a supercomputer and several new enhancements were developed leading to superhuman success in agents DeepStack and Libratus. (Note that the name “endgame solving” was later changed to “subgame solving” for several of these subsequent articles).
“Extending the endgame solver to solve intermediate portions of the game”
I did not work on this during my PhD, but I have a follow-up paper from 2017 with Kailiang Hu that presented initial work on this direction in a short article “Midgame Solving: A New Weapon for Efficient Large-Scale Equilibrium Approximation.”
Admittedly the results in that paper were extremely weak due to not having strong base strategies to start from initially. (Endgame solving approaches require a strong base strategy for the “trunk” to start with, and the newer “safe” approaches also require strong estimates for counterfactual values of nodes corresponding to avoiding to play the endgame. As I mentioned earlier CMU did not give me access to strategies such as Claudico.)
However, despite the poor results, a major strength of the approach is that it runs extremely fast on a single machine. Subsequent research from CMU extending the approach was published in a paper, “Depth-Limited Solving for Imperfect-Information Games,” which incidentally did not cite my earlier article. CMU’s approach produced an agent for 2-player no-limit Texas hold ’em that defeats several of the strongest agents using only a 4-core CPU and 16 GB of memory.
“New algorithm for solving large imperfect-information games by creating an endgame database and integrating it with MCCFR”
The idea here was to create a database by solving a large number of endgames in advance offline, and then in real time once play got down to an endgame to map it to one of the games in the database and look up the precomputed solution for that game. While I was not able to find time to work on this, this idea is similar to what has been done in some recent works if the concept of a “database” is replaced by a “neural network”, e.g., the DeepStack agent from Alberta, and the article from CMU “Deep Counterfactual Regret Minimization.”
“New algorithm for opponent exploitation that utilizes the endgame solver”
The idea here is to create an opponent model for the opponent just in the upper portion of the game tree (i.e., the “trunk”), and then to solve endgames where the opponent plays according to this model and we play according to the approximate Nash equilibrium strategy we have precomputed for ourselves. Note that with standard endgame solving without an available opponent model we just model both players as having followed the approximate Nash equilibrium strategy for the trunk (this was done for the agents in the Brains vs. AI competitions). However, there is no inherent reason in the algorithm that the equilibrium must be used for the trunk, and if we are able to create an accurate model for the opponent, the same approach would work with that as an input. While I was not able to work on this myself, many commercial “solver” tools, most notably PioSolver, do exactly this: they let a user enter a “range” (i.e., a distribution of cards based on following a given strategy) for both players, which is the best estimates for the strategies both players are following. Typically the users who are serious human players have some idea what strategy the opponent is following, and so are essentially creating an opponent model that is used as an input for the algorithm.
PioSolver also integrates opponent modeling and endgame solving in another way with a tool called “Node Locking,” where specific strategy constraints are imposed to model in the opponent within the endgames (as opposed to just for the trunk). Note that the first article to impose such strategy constraints for opponent modeling was my 2011 AAMAS paper “Game Theory-Based Opponent Modeling in Large Imperfect-Information Games.”
That paper assumes that we observe the opponent’s public actions but not his private information, and based purely on these “partial observations” the algorithm builds a model for the opponent’s strategy. The algorithm models the opponent as playing the “closest” strategy to a precomputed approximate Nash equilibrium strategy, subject to additionally imposed constraints that the opponent’s strategy conforms to our “partial” observations of his play. This algorithm, called DBBR (for Deviation-Based Best Response), was demonstrated to outperform just playing approximate Nash equilibrium against a variety of simple synthetic opponents as well as some entrants from the Annual Computer Poker Competition for 2-player limit Texas hold ’em.
A recent interesting paper at AAAI 2019 from researchers at Alberta and DeepMind proposes the first large-scale algorithm for solving such games with strategy constraints, “Solving Large Extensive-Form Games with Strategy Constraints.” While such games with strategy constraints can be solved by a linear program, they present a new algorithm based on CFR that scales to significantly larger games, along with significant new analysis. Experiments are performed on a security game and on opponent modeling in a small poker game. (Noticeably there is no mention in that work of my prior work on using strategy constraints for opponent modeling in imperfect-information games. My paper is only cited as “The only previous work which uses these partial observations has no theoretical guarantees on solution quality (Ganzfried and Sandholm 2011)”).
The idea from my paper of basing an opponent modeling procedure off observed deviations of the opponent from (approximate) equilibrium has received significant attention from high-stakes professional players recently, for example in the following Twitter exchanges: “You need to know equilbriums to train your ability to see deviations so you can exploit,” and “Exploiting is a calculated deviation from the equilibrium to EXPLOIT your opponents deviation from the equlibrium.”
Reflecting back over 5 years since the proposal, it is very interesting to see that all 9 of the proposed future projects have been studied in varying capacities. While only a few of them were addressed by the time of the thesis document itself (as I anticipated), many of them have been studied since my graduation, some by me, some by other researchers, and some by developers of popular commercial software. In some cases these were very close to what I had envisioned, though in others the subsequent research veered off in somewhat unexpected directions. It will be interesting to follow the progression further, and hopefully I will find time at some point to address the lingering topics that have not been fully explored yet.