Let’s build a neural network (NN) for the task introduced in my previous post. I expect the network to learn the poker hand ranking algorithm from the given examples. It is only one part of solving the game of poker with reinforcement learning (RL) and neural networks, but as we will see later, it is a very interesting and important one.
Let’s look at the following example (a single case for two players’ hands). It shows the possible steps of the ranking algorithm.
What type of neural architecture are we looking for: feedforward (dense), a CNN, an RNN, or something else?
First, I will introduce the numbers:
- 52 cards in a deck
- 2,598,960 different sets of 5 cards (the number of combinations of 5 cards from a deck of 52)
In a real poker game, similarly to the example above, there are 7 cards to consider (the player’s 2 cards and 5 on the table) and we have to select the 5 of them with the strongest rank, so there is another number to face:
- 133,784,560 different sets of 7 cards
If our NN had to recognize (classify) only the rank, that would be the end of the big numbers, but we also have to compare the hands of two players. Assuming that both players hold 2 cards and there are 5 shared cards on the poker table, we have to consider 9 cards to decide possible wins or draws.
- 3,679,075,400 different sets of 9 cards
That is quite a huge number!
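The numbers above are plain binomial coefficients, so we can verify them directly with the standard library (a quick check, not part of the model):

```python
from math import comb

# Number of distinct 5-, 7- and 9-card sets drawn from a 52-card deck
print(comb(52, 5))  # 2,598,960
print(comb(52, 7))  # 133,784,560
print(comb(52, 9))  # 3,679,075,400
```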
Why have I mentioned these numbers? They give us an outline of the problem’s complexity. If an NN had to memorize the data, it would have to be quite a big network, but I expect the network to learn an algorithm. That means an NN with a good architecture should be able to solve the task with a small number of parameters (variables). Rather than storing all the input cases, it should model the dependencies and rules learned from the given data samples.
But what is a good architecture, and how small can the number of network parameters be? You will find the answers in the following text.
The neural network model outline
The core NN (the cards encoder network) takes 7 cards as input and outputs a single vector representation (of those 7 cards). I expect the representation to store all the information needed to easily classify the rank of the encoded cards. By comparing two such representations, we should easily select the winner or classify the given case as a draw. What I mean by ‘easy classification’ is using a simple classifier (like a dense layer with softmax) built upon the prepared representations. If the task were completed with 100% accuracy, we could say that the NN understands cards and poker hands perfectly. Below is the general scheme of the proposed system:
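The scheme can be sketched in a few lines of NumPy. This is only an illustration of the data flow, not the real model: the `encode_hand` pooling is a placeholder for the cards encoder network, and the dimensions, names and three-way output (player 1 wins / draw / player 2 wins) are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_classes = 64, 3          # assumed: 3 outcomes (win / draw / lose)

def encode_hand(card_embeddings):
    """Stand-in for the cards encoder network: 7 card vectors -> 1 hand vector."""
    return card_embeddings.mean(axis=0)      # placeholder pooling, not the real encoder

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A simple classifier (dense layer + softmax) on top of the two hand representations.
W = rng.normal(scale=0.1, size=(2 * d_model, n_classes))
hand1 = encode_hand(rng.normal(size=(7, d_model)))
hand2 = encode_hand(rng.normal(size=(7, d_model)))
probs = softmax(np.concatenate([hand1, hand2]) @ W)
```

The same encoder is applied to both hands, so the comparison depends only on the quality of the learned representation.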
Input data representation
I will encode the cards with embeddings, precisely 53 of them: 52 for the deck cards and one for ‘no card’ (the ‘no card’ embedding will be explained later). That is quite a small number of embeddings, and they should keep only the information about a card’s rank and suit. Therefore, in contrast to typical NLP cases, I expect the embeddings not to consume a lot of the variable space here. I will initialize them with random floats.
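A minimal sketch of such an embedding table, assuming cards are mapped to indices 0–51 (the exact card-to-index mapping and embedding size here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64                     # assumed embedding size

# 53 embeddings: indices 0..51 for the deck cards, 52 for the 'no card' token
embeddings = rng.normal(scale=0.1, size=(53, d_model)).astype(np.float32)

NO_CARD = 52

def embed_cards(card_ids):
    """Look up the embedding vectors for a hand of card indices."""
    return embeddings[np.asarray(card_ids)]

# Hypothetical hand of 5 known cards padded with 'no card' tokens to length 7
hand = embed_cards([0, 13, 26, 39, 4, NO_CARD, NO_CARD])
```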
We can think of the seven cards as a sequence of seven vectors (embeddings). In that sequence, the order is not important (I will discuss that in the next post), so it is more a set than a sequence, but let’s stay with the sequence term. Our sequence has a fixed length of 7. That property offers many opportunities for neural architecture research.
To keep things simple, I started with a single vector of seven concatenated card embeddings and just used a deep dense network for encoding and classification. For a single-vector classification task, I usually use the Deeply Regularized Residual Neural Network (DRR NN) architecture described in this paper.
You may think of it as a dense equivalent of the convolutional ResNet. It usually performs better than a plain deep stack of dense layers thanks to the addition of layer normalization and residual connections. The deeper the network was, the better the results were, but they never reached high accuracy. In detail: the rank classification reached 100% accuracy quite easily, but the classification of the winner never went above 85%.
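The building block just described (layer normalization, dense layers, and a residual connection) can be sketched as follows. This is my reading of the pattern, not the exact DRR NN from the paper; the pre-norm ordering and ReLU activation are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize activations to zero mean and unit variance over the last axis."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def residual_dense_block(x, W1, b1, W2, b2):
    """One block: layer norm -> dense -> ReLU -> dense -> add the input back."""
    h = np.maximum(layer_norm(x) @ W1 + b1, 0.0)
    return x + (h @ W2 + b2)

rng = np.random.default_rng(0)
d = 64
x = rng.normal(size=(d,))
params = [rng.normal(scale=0.1, size=s) for s in [(d, d), (d,), (d, d), (d,)]]
y = residual_dense_block(x, *params)
```

Because each block only adds a correction to its input, stacking many of them stays trainable where a plain dense stack would not.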
At this point, I found some interesting statistics about the input data. If we randomly pick cards from the deck, we get strongly imbalanced data. Below is the table of rank probabilities for 5 randomly selected cards:
As we can see, we will get a straight flush (the strongest rank) with a probability of 0.001544%. That means we will pick a straight flush once every 64,767 hands. Similarly, the probability of a draw between two players is very low, about 2%. It was quite a straightforward observation that the NN makes most of its errors on draws and rare ranks. It is a hard task to train a network with such imbalanced data. I modified the batch preparation algorithm to balance the ranks (equal frequency of each rank within a single batch) and to force a given number of draws (10% of batch samples in the final settings). With that implemented, the accuracy of winner classification increased to over 92%. That is half of the previous error rate, but still not what I was looking for.
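The balanced batch preparation could look roughly like this. The post does not show the actual implementation, so the function signature, the pre-generated sample pools, and the 10-class rank list are all assumptions; only the balancing idea (equal rank frequency plus a forced draw fraction) comes from the text.

```python
import random

RANKS = list(range(10))          # assumed hand-rank classes, royal flush kept separate

def balanced_batch(samples_by_rank, draw_samples, batch_size=1000, draw_frac=0.10):
    """Sketch of balanced batch preparation: equal rank frequency plus forced draws.

    `samples_by_rank` maps rank -> list of samples; `draw_samples` is a list of
    draw cases. Both pools are assumed to be pre-generated elsewhere.
    """
    n_draws = int(batch_size * draw_frac)
    batch = [random.choice(draw_samples) for _ in range(n_draws)]
    per_rank = (batch_size - n_draws) // len(RANKS)
    for r in RANKS:
        batch += [random.choice(samples_by_rank[r]) for _ in range(per_rank)]
    random.shuffle(batch)
    return batch
```

With the default settings, a 1000-sample batch contains 100 forced draws and 90 samples of each rank, instead of the natural distribution where a straight flush almost never appears.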
I realized that encoding the cards is more like looking for patterns in a picture (or a sequence); I hope you get the analogy. So I tried convolutions (a CNN). I built a network with convolutions and residual connections, and that worked much better: I got 100% accuracy on a whole batch from time to time. One batch equals 1000 samples of 2 hands, i.e. 1000*(2+2+5) cards. Then I fine-tuned many parameters (the number of layers, filter size, learning rate, optimization policy, dropout and many more), but as you can see on the graphs below, I never reached a permanent 100% accuracy. At this point, I was quite disappointed with the result and couldn’t find an explanation for the failure. Convolutions usually work fine for encoding sequences, especially for short-range dependencies, and here we have only 7 cards in a sequence. It looked like I needed something more powerful.
The Transformer encoder
I decided to give the Transformer a try. The Transformer (Attention Is All You Need) is a really powerful architecture for sequence processing. We have a lot of Transformers in the field of NLP; there are at least a few Muppets (BERT, RoBERTa, ALBERT, XLNet, etc.).
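The core operation of the Transformer encoder, scaled dot-product self-attention, can be sketched over our 7 card embeddings as follows (single head, no positional encodings; dimensions are assumed). Notably, without positional encodings it is permutation-equivariant, which suits a set of cards whose order does not matter.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every card attends to every other card."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (7, 7) pairwise attention scores
    return softmax(scores) @ V                # weighted mix of all card values

rng = np.random.default_rng(0)
d = 64
X = rng.normal(size=(7, d))                   # 7 card embeddings
out = self_attention(X, *(rng.normal(scale=0.1, size=(d, d)) for _ in range(3)))
```

Unlike a convolution, every card interacts with every other card in a single layer, which is exactly what ranking a hand requires.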
After fine-tuning the Transformer encoder architecture, the input layers and the classifiers, I finally got perfect results. After just 25 minutes of training, I got a permanent 100% accuracy both during training and on the test set. The test set of 20,000 hands (10,000 two-player cases) was prepared before the training, and no hand from the test set appeared while training the network. This is a drawing of the Cards Transformer Encoder architecture:
Below are the graphs with the training and test performance of the network. The training loss decreases far below the loss of the previous networks. I logged the inverted accuracy (i.e., the error rate) since it fits a logarithmic scale better (inv accuracy = 1 - accuracy). The lower the inv accuracy the better; 0 is perfect.
To summarize, the network built with the above architecture learned the algorithm of poker hand ranking and comparison after about 45,000 batches of training data. It took about 90,000,000 hands with the correct answers to ‘detect’ all the rules of the algorithm. This is below 3% of all possible poker hand combinations in the training data, meaning the network learned all the dependencies from only a fraction of the possible cases. The final architecture has about 290,000 variables. It could do the task even faster with a larger number of variables (a bigger network): I ran the tests, and a bigger network solved it after just 30,000 batches. Similarly, a smaller network could also solve the task, but it took longer to train.
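The arithmetic behind the ‘below 3%’ claim checks out (each batch holds 1000 samples of 2 hands, as stated earlier):

```python
from math import comb

hands_seen = 45_000 * 1000 * 2            # batches * samples per batch * hands per sample
total_9_card_sets = comb(52, 9)           # all possible 9-card combinations
print(hands_seen)                         # 90,000,000
print(hands_seen / total_9_card_sets)     # about 0.024, i.e. under 3%
```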
This subtask of solving poker with reinforcement learning and an NN leads to one important conclusion. Poker is a high-variance game with a lot of hidden information. It is said that you can tell a good poker player only after a few hundred thousand played hands! Also, many hands finish before the showdown (the showdown is the moment when players show their cards; interestingly, even during the showdown not every player at the table usually has to show their cards). In fact, most hands never reach a showdown, because all but one player fold and the remaining one wins. If someone expected an NN agent to learn all the poker rules and tactics ONLY from playing the game, it would surely take a very long time. Maybe that is why the problem has still not been solved that way.
The subtask of card understanding is a small part of the whole poker environment. We can assume that every player at the table should know the card ranking algorithm before starting the game, and I can say that the above NN knows a lot about poker hands. In the future, I will try to extend this knowledge to cases of partial data, when some cards are unknown. The model should then estimate the strength of a hand according to all the information it holds at the moment. I think you can now guess what the ‘no card’ embedding is for.