Predicting Human Strategies in Games via Deep Learning

Zudi · Published in Analytics Vidhya · May 2, 2020 · 5 min read


Fig. 1 | Predicting the distribution of human actions in a normal-form game using deep neural networks.

The normal-form game is a classical model in game theory that has helped establish several theories for understanding human strategies. One of the most famous examples is the prisoners’ dilemma, a normal-form game that can be represented as a 2x2 payoff matrix. So what would the prisoners do when facing these choices? Traditional game theory tells us that if we assume the prisoners are completely rational, then both will betray the other, because such a strategy reaches the Nash Equilibrium:

“A stable state of a system involving the interaction of different participants, in which no participant can gain by a unilateral change of strategy if the strategies of the others remain unchanged.” — New Oxford American Dictionary
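To make this concrete, here is a tiny sketch that checks the claim for the textbook prisoners’ dilemma payoffs (the specific numbers are the usual classroom example, assumed here purely for illustration):

```python
import numpy as np

# Payoffs are negative years of prison, so higher (less negative) is better.
# Rows/columns: 0 = stay silent, 1 = betray. Entry [i, j] is the row player's
# payoff when the row player picks i and the column player picks j.
row_payoff = np.array([[-1, -3],
                       [ 0, -2]])
col_payoff = row_payoff.T  # the game is symmetric

def is_nash(i, j):
    # Neither player can gain by a unilateral change of action.
    row_ok = all(row_payoff[i, j] >= row_payoff[k, j] for k in range(2))
    col_ok = all(col_payoff[i, j] >= col_payoff[i, k] for k in range(2))
    return row_ok and col_ok

print([(i, j) for i in range(2) for j in range(2) if is_nash(i, j)])
# -> [(1, 1)]: mutual betrayal is the unique Nash equilibrium.
```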

However, despite its strong theoretical guarantees, the Nash Equilibrium model usually does not perform well in predicting real-world human behavior. This is because real human players are rarely completely rational in choosing strategies. The level of rationality can be influenced by many factors, including the difficulty of computing the strategy and the ability to recognize the opponents’ strategies. Therefore, behavioral game theorists have relaxed the complete-rationality assumption to model real-world experimental outcomes, which leads to theories like Cognitive Hierarchy (CH). However, those models still rely on strong assumptions, which makes them hard to generalize.

Therefore our question is:

Can we develop a model with minimal assumption of the players and let the model find a good way to predict human strategies completely from the data?

To this end, we make use of deep learning. Recent years have witnessed the rapid rise of deep learning techniques that have changed many aspects of human life. However, unlike computer vision tasks such as handwritten digit recognition, which has been industrialized for more than twenty years, deep learning methods have been applied to human behavior prediction only recently [Hartford et al. 2016]. One possible reason is that models for predicting human behavior cannot be productized as easily as object detection models for autonomous driving or natural language processing (NLP) techniques for machine translation.

In this blog post, I will describe how to build a simple neural network model to predict human strategies in normal-form games. If you are interested in this topic but not very familiar with deep learning, I recommend taking the Deep Learning courses by Andrew Ng on Coursera or reading the Deep Learning Series blog posts by Jonathan Hui.

Method

As shown in Fig. 1, the task we deal with is to predict the distribution of human choices in normal-form games. The simplest idea for applying deep learning to this task is to encode the payoff matrix as a vector and apply a multi-layer perceptron (MLP) to regress the distribution, as sketched below. However, such a model can hardly generalize to matrices of different sizes and permutations. In this part, we describe two techniques for improving model performance.
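For illustration, here is a minimal PyTorch sketch of that naive baseline; the two-channel payoff encoding and the hidden size are assumptions made for the example, not the settings used later:

```python
import torch
import torch.nn as nn

# Naive baseline: flatten both players' 3x3 payoff matrices into one vector
# and regress the 3-way action distribution with an MLP.
class BaselineMLP(nn.Module):
    def __init__(self, n_actions=3, hidden=64):
        super().__init__()
        in_dim = 2 * n_actions * n_actions  # both players' payoffs, flattened
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, payoffs):
        # payoffs: (batch, 2, n_actions, n_actions)
        logits = self.net(payoffs.flatten(start_dim=1))
        return torch.softmax(logits, dim=-1)  # predicted distribution over actions
```

Because the flattened input is tied to one fixed matrix size and ordering, this baseline breaks as soon as the game is permuted or resized, which motivates the two techniques below.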

Pointwise Convolution for Permutation Invariance

The first design principle of the deep learning model is that the prediction of human behavior should be permutation-invariant. That is, permuting the cells of the payoff matrix should result in a correspondingly permuted distribution. Therefore, following [Hartford et al. 2016], we share the learnable parameters of each feature map across all cells of the matrix. Such a layer can be easily implemented with pointwise 1x1 convolutional kernels.

Several hyper-parameters can be adjusted when designing a deep neural network, including the number of layers and the number of learnable parameters in each layer. To minimize the search space, we fix the network to be a four-layer model in which every layer shares the same number of kernels. With such a design, the only varying hyper-parameter for each model is that shared number of kernels. A minimal sketch of this backbone is shown below.
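In this sketch, the two players’ payoff matrices are stacked as a two-channel input; the kernel count and the simple softmax head over the row player’s actions are illustrative assumptions rather than the exact architecture used in the experiments:

```python
import torch.nn as nn

# Permutation-invariant backbone: a stack of pointwise (1x1) convolutions,
# so every cell of the payoff matrix is transformed by the same shared weights.
def pointwise_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1), nn.ReLU())

class PointwiseNet(nn.Module):
    def __init__(self, n_kernels=64, n_layers=4):
        super().__init__()
        blocks = [pointwise_block(2, n_kernels)]  # 2 input channels: both players' payoffs
        blocks += [pointwise_block(n_kernels, n_kernels) for _ in range(n_layers - 1)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Conv2d(n_kernels, 1, kernel_size=1)  # per-cell score

    def forward(self, payoffs):
        # payoffs: (batch, 2, n_rows, n_cols)
        scores = self.head(self.features(payoffs)).squeeze(1)  # (batch, n_rows, n_cols)
        row_scores = scores.mean(dim=2)  # aggregate each row over the opponent's actions
        return row_scores.softmax(dim=-1)  # distribution over the row player's actions
```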

Global Information Aggregation with Pooling

Due to the permutation-invariant design of the layers described above, the response at each position of the feature maps does not consider the values at other positions, which does not reflect how humans reason when making game decisions. Therefore, we apply another technique in our model to aggregate global information.

Following [Hartford et al. 2016], we perform row- and column-pooling of the feature maps in each layer and concatenate the pooled features to the original feature maps. The convolutional kernels of the next layer can then incorporate responses from other positions even with the pointwise operation. For the pooling operation, we test both maximum pooling and average pooling. Note that although the pooling operations do not introduce extra learnable parameters, the input channels of the convolutional kernels need to be expanded threefold to compute the weighted sum of the original, row-pooled and column-pooled features.
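As a rough sketch of such a pooled layer (assuming average pooling and the same two-channel payoff encoding as above):

```python
import torch
import torch.nn as nn

# Pooled pointwise layer: the feature map is concatenated with its row- and
# column-pooled copies, which triples the input channels of the next 1x1 conv.
class PooledPointwiseBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(3 * in_ch, out_ch, kernel_size=1)  # 3x channels in

    def forward(self, x):
        # x: (batch, channels, n_rows, n_cols)
        row_pool = x.mean(dim=3, keepdim=True).expand_as(x)  # average over each row, broadcast back
        col_pool = x.mean(dim=2, keepdim=True).expand_as(x)  # average over each column, broadcast back
        return torch.relu(self.conv(torch.cat([x, row_pool, col_pool], dim=1)))
```

Stacking these blocks lets later pointwise kernels react to values elsewhere in the matrix while keeping the permutation-invariant structure.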

Experiments

We use a dataset that contains the experimental outcomes of 200 3x3 normal-form games. We conduct 5-fold cross-validation and use the out-of-sample regression loss and prediction accuracy as the evaluation criteria. Note that we fix the division of the subsets for all experiments instead of resampling them randomly, which minimizes the influence of randomness.
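As an illustrative sketch of fixing the folds once (the seed and the use of scikit-learn’s KFold are assumptions, not necessarily the original setup):

```python
import numpy as np
from sklearn.model_selection import KFold

# Create the 5-fold division of the 200 games once, with a fixed seed, so that
# every experiment sees exactly the same train/validation splits.
game_ids = np.arange(200)
folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(game_ids))
for fold_id, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {fold_id}: {len(train_idx)} train games, {len(val_idx)} validation games")
```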

Fig. 2 | Average regression loss in the cross-validation experiments.
Fig. 3 | Average prediction accuracy in the cross-validation experiments.

Since we only have a small dataset, we limit the network to four layers with at most 512 kernels per layer to restrict the number of trainable parameters. With such a design, we observe that overfitting is not severe: the training accuracy is only slightly higher than the test accuracy (Fig. 3), and the training loss is only slightly lower (Fig. 2). Both training and validation accuracy improve with the number of kernels. We also observe that average pooling is consistently better than maximum pooling. The best validation accuracy achieved by the model is 0.816, and the best validation loss is 0.148. This result is significantly better than the Nash Equilibrium model, which only achieves a prediction accuracy of approximately 0.5.

Next Step

So far, we have described how to build a simple deep learning model that achieves reasonably good performance in predicting human strategies in normal-form games. We have been collecting data from multiple publications. Our next aim is to improve the generalization ability of the model to normal-form games of different sizes (e.g., testing on 2x2 games with a model trained on 3x3 games) and different value scales. The code for training/testing the models is publicly available at https://github.com/zudi-lin/human_behavior_prediction.
