Poker Hand Prediction

An iterative approach to solving a prediction problem.

Prakhar Rathi

Published in Analytics Vidhya · 6 min read · May 25, 2020


Recently, I came across a problem: based on the cards you are dealt, you have to predict the hand you hold in a game of poker. If the word ‘predict’ made your ears perk up and you’re already thinking about which classification model to use, there’s a chance that you have already lost. Here’s why…

Background

Before I start discussing the approach, here’s a quick poker refresher for those of you who are new to the game. Poker is probably the world’s most popular card game at the moment. It’s not a very complicated game. Each player makes a hand using 5 cards, 2 of which are in your hand and 3 on the table. There are different kinds of hands that a player can have, and each hand has a ranking based on the rules of poker.

Data Description

You can get a copy of the test and training files here. Both follow the same format, but the sizes differ: the problem set that I found had 1,000,000 training samples and 25,010 testing samples, while the split is reversed in the files I have linked. However, as I said, a machine learning approach may not get you anywhere. I think the reason the linked version provides almost 40 times more testing examples than training examples is so that a generic machine learning approach would fail. We’ll get to that in a while. The data description of the file follows. Each hand consists of five cards with a given suit and rank, drawn from a standard deck of 52. Suits and ranks are represented as ordinal categories:

Attribute Information:

S1 “Suit of card #1”
Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
C1 “Rank of card #1”
Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

...

S5 “Suit of card #5”
C5 “Rank of card #5”

Each row in the training set has the accompanying class label for the poker hand it comprises. The hands are omitted from the test set and must be predicted. Hands are classified into the following ordinal categories:

0: Nothing in hand; not a recognized poker hand 
1: One pair; one pair of equal ranks within five cards
2: Two pairs; two pairs of equal ranks within five cards
3: Three of a kind; three equal ranks within five cards
4: Straight; five cards, sequentially ranked with no gaps
5: Flush; five cards with the same suit
6: Full house; pair + different rank three of a kind
7: Four of a kind; four equal ranks within five cards
8: Straight flush; straight + flush
9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush

Note that the Straight flush and Royal flush hands are not representative of the true domain because they have been over-sampled. The straight flush is 14.43 times more likely to occur in the training set, while the royal flush is 129.82 times more likely to occur than in an actual game.

Calling train_data.head() on the loaded DataFrame shows the first five rows in this format.

Problem Solving

I am pretty sure the first machine learning technique that came to your mind was a tree-based approach, and that makes a lot of sense, right? Except the Random Forest benchmark reached only 62.4% accuracy. Yeah, pretty low.

Source: Kaggle Leaderboard

Now, if you have made up your mind that this is an ML problem, that those are just baseline results, and that you can do better, I appreciate it. In fact, I am open to reading about any machine learning approach that you may have come up with to solve this problem with very high accuracy (preferably, 1). Do talk about those in the comments below. However, I am almost certain that there isn’t a ‘feasible’ ML solution to this problem. You may stack layer after layer and build a deep learning model with a huge number of parameters, but would you really be proud of yourself?

The moment I saw this problem, I started thinking like a poker player. Poker players almost always read their hands correctly, and they don’t seem to be using any sort of ML algorithm. Sure, they do have the best neural network architecture, i.e. the brain, but how do I mimic that approach? That’s when I realised that instead of machine learning, I should use a simple iterative approach with some basic functions. So, let’s get our hands dirty.

It will always come in handy to have a testing entry. This will let you modify the inputs and test each of your functions.

# Creating a testing dictionary
test = {'S1': 1,
        'C1': 2,
        'S2': 1,
        'C2': 3,
        'S3': 2,
        'C3': 4,
        'S4': 2,
        'C4': 5,
        'S5': 2,
        'C5': 6}

Libraries

This is a fairly simple task so you just need the pandas library for data manipulation and scikit-learn for accuracy measures.

import pandas as pd
from sklearn.metrics import accuracy_score  # assumed helper for the accuracy check later on

Data Extraction

Our next job is to create a function that takes a row of the DataFrame, which arrives as a pandas Series, and stores it in a dictionary with each feature as a key.
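Here is a minimal sketch of such a helper, assuming each row is a pandas Series (or any mapping) keyed by the column names S1, C1, ..., S5, C5. The names `FEATURES` and `series_to_dict` are illustrative and not taken from the original notebook. A plain dict stands in for the Series so the snippet runs without loading the data.

```python
# Feature columns in the order used by the dataset description.
FEATURES = ['S1', 'C1', 'S2', 'C2', 'S3', 'C3', 'S4', 'C4', 'S5', 'C5']


def series_to_dict(row):
    """Copy the ten card features of one row into a plain dict of ints.

    `row` may be a pandas Series or any mapping; extra columns such as the
    class label are ignored.
    """
    return {key: int(row[key]) for key in FEATURES}


# A plain dict stands in for a DataFrame row such as train_data.iloc[0].
sample_row = {'S1': 3, 'C1': 12, 'S2': 3, 'C2': 2, 'S3': 3,
              'C3': 11, 'S4': 4, 'C4': 5, 'S5': 2, 'C5': 5}
hand = series_to_dict(sample_row)
print(hand)
```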

Function to convert Series data into a Dictionary

Output:
{'S1': 3, 'C1': 12, 'S2': 3, 'C2': 2, 'S3': 3, 'C3': 11, 'S4': 4, 'C4': 5, 'S5': 2, 'C5': 5}

Functions to check card hands

Now, we need different functions to check what sort of hand a player has. These functions work in a very basic manner. We pass the card information in a dictionary and they check whether a particular poker condition is satisfied or not, returning 1 for yes and 0 for no. You can be as imaginative as you want. First, let’s check for a flush. A flush is achieved when all five cards belong to the same suit, so this is a fairly simple check function.
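The flush check can be sketched as follows, assuming the hand is a dictionary with suit keys S1 to S5 as in the testing entry above. The names `SUIT_KEYS` and `check_flush` are hypothetical.

```python
SUIT_KEYS = ('S1', 'S2', 'S3', 'S4', 'S5')


def check_flush(hand):
    """Return 1 if all five cards share one suit, otherwise 0."""
    suits = {hand[key] for key in SUIT_KEYS}
    # A set with a single element means every card has the same suit.
    return 1 if len(suits) == 1 else 0


mixed = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2,
         'C3': 4, 'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}
all_hearts = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 9, 'S3': 1,
              'C3': 4, 'S4': 1, 'C4': 12, 'S5': 1, 'C5': 7}
print(check_flush(mixed), check_flush(all_hearts))  # 0 1
```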

Function to check whether a player has a flush or not

Next, we need to check whether a player has a straight or not. A straight is when a player holds five cards of consecutive rank (e.g., 3, 4, 5, 6, 7). Pretty lucky!
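Here is one way to sketch the straight check, assuming rank keys C1 to C5 with the Ace stored as 1. Counting Ten to Ace as a straight is an assumption here, made so that a non-flush {Ten, Jack, Queen, King, Ace} still lands in class 4. The names `RANK_KEYS` and `check_straight` are hypothetical.

```python
RANK_KEYS = ('C1', 'C2', 'C3', 'C4', 'C5')


def check_straight(hand):
    """Return 1 if the five ranks form a run, otherwise 0."""
    ranks = sorted(hand[key] for key in RANK_KEYS)
    # Five distinct ranks whose extremes differ by 4 must be consecutive.
    if len(set(ranks)) == 5 and ranks[4] - ranks[0] == 4:
        return 1
    # Assumption: Ace (stored as 1) may also sit above the King.
    if ranks == [1, 10, 11, 12, 13]:
        return 1
    return 0


run = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2,
       'C3': 4, 'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}
pair = {'S1': 3, 'C1': 12, 'S2': 3, 'C2': 2, 'S3': 3,
        'C3': 11, 'S4': 4, 'C4': 5, 'S5': 2, 'C5': 5}
print(check_straight(run), check_straight(pair))  # 1 0
```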

Function to check whether a player has a straight or not

Let’s check for the best card combination now: a royal. A royal is a special card combination having {Ace, King, Queen, Jack, Ten} in the hand. Together with a flush, it makes a royal flush.
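The royal check can be sketched like this, assuming the dataset encoding where Ace is 1 and King is 13. It looks at ranks only, so the suit test stays with the flush check. The names `ROYAL_RANKS` and `check_royal` are hypothetical.

```python
RANK_KEYS = ('C1', 'C2', 'C3', 'C4', 'C5')
# Ace, Ten, Jack, Queen, King in the dataset's 1-13 encoding.
ROYAL_RANKS = {1, 10, 11, 12, 13}


def check_royal(hand):
    """Return 1 if the ranks are exactly Ace, King, Queen, Jack, Ten, otherwise 0."""
    ranks = {hand[key] for key in RANK_KEYS}
    return 1 if ranks == ROYAL_RANKS else 0


royal = {'S1': 4, 'C1': 1, 'S2': 4, 'C2': 13, 'S3': 4,
         'C3': 12, 'S4': 4, 'C4': 11, 'S5': 4, 'C5': 10}
low_run = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2,
           'C3': 4, 'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}
print(check_royal(royal), check_royal(low_run))  # 1 0
```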

The next function is to categorize the hands in all the remaining categories. It might get a little tricky in some places but I have used a tree-based approach to create this function. I sorted the cards and based on the different combinations, I was able to assign different classes to the hands. I will add a more detailed explanation if someone requests it in the comments below.
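The original function walks a decision tree over the sorted cards. The sketch below swaps in a different technique, counting how often each rank occurs with `collections.Counter`, which separates the same pair-based classes. The name `check_remaining` is hypothetical, and the class numbers follow the data description.

```python
from collections import Counter

RANK_KEYS = ('C1', 'C2', 'C3', 'C4', 'C5')


def check_remaining(hand):
    """Return the class (7, 6, 3, 2, 1 or 0) implied by repeated ranks."""
    # Sizes of the groups of equal ranks, largest first, e.g. [3, 2] or [2, 1, 1, 1].
    counts = sorted(Counter(hand[key] for key in RANK_KEYS).values(), reverse=True)
    if counts[0] == 4:
        return 7  # four of a kind
    if counts == [3, 2]:
        return 6  # full house
    if counts[0] == 3:
        return 3  # three of a kind
    if counts == [2, 2, 1]:
        return 2  # two pairs
    if counts[0] == 2:
        return 1  # one pair
    return 0      # no repeated rank


one_pair = {'S1': 3, 'C1': 12, 'S2': 3, 'C2': 2, 'S3': 3,
            'C3': 11, 'S4': 4, 'C4': 5, 'S5': 2, 'C5': 5}
print(check_remaining(one_pair))  # 1
```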

Function to check the remaining classes

Assign Labels to Hands

It’s time to use the above functions to categorize the data into different labels.
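The sketch below shows how the checks could be combined into one labelling function. It is self-contained and inlines simplified versions of the flush, straight, royal, and rank-count checks. The name `assign_label` and the order of the tests are assumptions rather than the original notebook's code.

```python
from collections import Counter

SUIT_KEYS = ('S1', 'S2', 'S3', 'S4', 'S5')
RANK_KEYS = ('C1', 'C2', 'C3', 'C4', 'C5')


def assign_label(hand):
    """Map a hand dictionary to its class label (0-9)."""
    suits = [hand[key] for key in SUIT_KEYS]
    ranks = sorted(hand[key] for key in RANK_KEYS)

    flush = len(set(suits)) == 1
    royal = ranks == [1, 10, 11, 12, 13]
    # Assumption: Ten-to-Ace counts as a straight as well.
    straight = royal or (len(set(ranks)) == 5 and ranks[4] - ranks[0] == 4)
    counts = sorted(Counter(ranks).values(), reverse=True)

    # Test the strongest hands first so weaker patterns cannot shadow them.
    if flush and royal:
        return 9
    if flush and straight:
        return 8
    if counts[0] == 4:
        return 7
    if counts == [3, 2]:
        return 6
    if flush:
        return 5
    if straight:
        return 4
    if counts[0] == 3:
        return 3
    if counts == [2, 2, 1]:
        return 2
    if counts[0] == 2:
        return 1
    return 0


test = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2,
        'C3': 4, 'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}
print(assign_label(test))  # 4 (a straight in mixed suits)
```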

Function to assign labels

Iterate over the training data

After all this hard (smart) work, let’s put our algorithm to test by iterating over the training data.
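The loop itself can be sketched as below. `predict_all` is a hypothetical helper that accepts any labelling function, and `flush_only_label` is a deliberately tiny stand-in used only to keep the snippet short. In practice you would pass your full labelling function instead.

```python
SUIT_KEYS = ('S1', 'S2', 'S3', 'S4', 'S5')


def flush_only_label(hand):
    """Stand-in labeller for illustration: 5 for a flush, 0 otherwise."""
    return 5 if len({hand[key] for key in SUIT_KEYS}) == 1 else 0


def predict_all(rows, label_fn):
    """Apply label_fn to every hand and collect the predicted classes in order."""
    return [label_fn(row) for row in rows]


# With pandas, the rows could come from train_data.to_dict('records').
rows = [
    {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 9, 'S3': 1,
     'C3': 4, 'S4': 1, 'C4': 12, 'S5': 1, 'C5': 7},
    {'S1': 1, 'C1': 2, 'S2': 3, 'C2': 9, 'S3': 2,
     'C3': 4, 'S4': 4, 'C4': 12, 'S5': 1, 'C5': 7},
]
predictions = predict_all(rows, flush_only_label)
print(predictions)  # [5, 0]
```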

Iterate over the training data

The above function seems to be going pretty well. Let’s evaluate the results.
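Accuracy here is simply the share of hands whose predicted class equals the true class. The pure-Python sketch below computes that fraction, which is what `sklearn.metrics.accuracy_score(y_true, y_pred)` returns by default. The function name `accuracy` and the toy label lists are illustrative, not results from the dataset.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return matches / len(y_true)


# Toy labels for illustration only.
true_labels = [0, 1, 4, 5]
print(accuracy(true_labels, [0, 1, 4, 5]))  # 1.0
print(accuracy(true_labels, [0, 1, 4, 0]))  # 0.75
```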

Evaluating our results

Output:
1.0

These results look really satisfying and we barely had to use an hour of our time. It was a good day! Here’s a link to my notebook for a more detailed explanation.

Conclusion

The conclusion? Pretty basic. Not everything requires machine learning. Sometimes, plain old iteration can do the magic. Start thinking from the stakeholder’s perspective and problems might start solving themselves. This doesn’t mean that what we have done here is not Analytics or Data Science. We are dealing with data and analysing it, but what I present is a way of thinking that moves beyond the conventional approach to solving problems. Again, this is just one approach, and maybe you’re more comfortable with ML, and that’s alright. It’s totally your call as long as you can get the job done.