# Poker Hand Prediction

## An iterative approach to solving a prediction problem.


Recently, I came across a problem: based on the card combination you are dealt, predict the hand you hold in a game of poker. If the word ‘predict’ made your ears perk up and you’re already thinking about which classification model to use, there’s a chance you have already lost. Here’s why…

# Background

Before I discuss the approach, here’s a quick poker refresher for those of you who are new to the game. Poker is probably the world’s most popular card game at the moment, and it’s not very complicated. Each player makes a hand from 5 cards, 2 of which are in the player’s hand and 3 on the table. There are different kinds of hands a player can have, and each hand has a ranking based on the rules of poker.

## Data Description

You can get a copy of the test and training files here. Even though they follow the same format, the problem set that I found had `1,000,000` training samples and `25,010` testing samples, while the split is reversed in the files I have linked. However, as I said, a machine learning approach may not get you anywhere. I think the reason the linked version gives almost 40 times more testing examples than training examples is so that a generic machine learning approach would fail. We’ll get to that in a while.

Following is the data description of the file. Each hand consists of five cards with a given suit and rank, drawn from a standard deck of 52. Suits and ranks are represented as ordinal categories:

Attribute Information:

- `S1`: “Suit of card #1”, Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
- `C1`: “Rank of card #1”, Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)
- ...
- `S5`: “Suit of card #5”
- `C5`: “Rank of card #5”

Each row in the training set has the accompanying class label for the poker hand it comprises. The hands are omitted from the test set and must be predicted. Hands are classified into the following ordinal categories:

- `0`: Nothing in hand; not a recognized poker hand
- `1`: One pair; one pair of equal ranks within five cards
- `2`: Two pairs; two pairs of equal ranks within five cards
- `3`: Three of a kind; three equal ranks within five cards
- `4`: Straight; five cards, sequentially ranked with no gaps
- `5`: Flush; five cards with the same suit
- `6`: Full house; pair + different rank three of a kind
- `7`: Four of a kind; four equal ranks within five cards
- `8`: Straight flush; straight + flush
- `9`: Royal flush; {Ace, King, Queen, Jack, Ten} + flush

Note that the straight flush and royal flush hands are not representative of the true domain because they have been over-sampled. In the training set, the straight flush is 14.43 times more likely to occur than in an actual game, and the royal flush is 129.82 times more likely.

Here’s what the data looks like in a DataFrame after calling `train_data.head()`.

# Problem Solving

I am pretty sure the first machine learning technique that came to your mind was a tree-based approach and, yeah, that makes a lot of sense, right? Except the Random Forest benchmark reached only 62.4% accuracy. Yeah, pretty low.

Now, if you have made up your mind that this is an ML problem, that this is just a baseline result and that you can do better, I appreciate it. In fact, I am open to reading about any machine learning approach you come up with that solves this problem with very high accuracy (preferably 1). Do talk about it in the comments below. However, I am almost certain that there isn’t a ‘feasible’ ML solution to this problem. You may stack layer after layer and build a deep learning model with a huge number of parameters, but would you really be proud of yourself?

The moment I saw this problem, I started thinking like a poker player. Players almost always read their hands correctly, and they don’t seem to be using any sort of ML algorithm. Sure, they do have the best neural network architecture there is, the brain, but how do I mimic that approach? That’s when I realised that instead of machine learning, I should use a simple iterative approach with some basic functions. So, let’s get our hands dirty.

It will always come in handy to have a testing entry. This will let you modify the inputs and test each of your functions.

Creating a testing dictionary:

`test = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2, 'C3': 4, 'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}`

## Libraries

This is a fairly simple task, so you just need the pandas library for data manipulation and scikit-learn for measuring accuracy.

`import pandas as pd`

## Data Extraction

Our next job is to create a function that extracts the data from one row of the DataFrame. The row arrives as a pandas Series, and the function stores it in a dictionary with each feature as a key.

`Output:{'S1': 3, 'C1': 12, 'S2': 3, 'C2': 2, 'S3': 3, 'C3': 11, 'S4': 4, 'C4': 5, 'S5': 2, 'C5': 5}`
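An extraction function along those lines could be sketched as below. This is my own minimal version, not necessarily the one in the notebook: the name `extract_hand`, the `FEATURES` list and the label column name `hand` are assumptions, and the single row is made up to match the output shown above.

```python
import pandas as pd

# Column names taken from the data description; the order is an assumption.
FEATURES = ['S1', 'C1', 'S2', 'C2', 'S3', 'C3', 'S4', 'C4', 'S5', 'C5']


def extract_hand(row):
    """Copy the ten card features of one row (a pandas Series) into a dict."""
    return {name: int(row[name]) for name in FEATURES}


# A one-row stand-in for the training data; 'hand' is a hypothetical label column.
train_data = pd.DataFrame(
    [[3, 12, 3, 2, 3, 11, 4, 5, 2, 5, 1]],
    columns=FEATURES + ['hand'],
)

print(extract_hand(train_data.iloc[0]))
```

Converting each value with `int` keeps the dictionary free of NumPy integer types, which makes it a little easier to print and compare.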

## Functions to check card hands

Now, we need different functions to check what sort of hand a player has. These functions work in a very basic manner. We pass the card information in a dictionary and they check whether a particular poker condition is satisfied or not. They return 1 for yes and 0 for no. You can be as imaginative as you want.

First, let’s check for a flush. A flush is achieved when all five cards belong to the same suit. A fairly simple check function.
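A flush check in this style might look like the following sketch. The function name `check_flush` is my own placeholder; the only thing it relies on is the `S1` to `S5` keys from the data description.

```python
def check_flush(hand):
    """Return 1 if all five cards share one suit, otherwise 0."""
    suits = {hand[f'S{i}'] for i in range(1, 6)}
    return 1 if len(suits) == 1 else 0


test = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2, 'C3': 4,
        'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}

print(check_flush(test))                                  # 0: suits 1 and 2 are mixed
print(check_flush({**test, 'S3': 1, 'S4': 1, 'S5': 1}))   # 1: every suit is 1
```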

Next, we need to check whether a player has a straight or not. A straight is when a player has five cards of consecutive rank (e.g. 3, 4, 5, 6, 7). Pretty lucky!
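Here is one way the straight check could be written. It is a sketch under one assumption that the text above does not spell out: since the Ace is encoded as 1, I treat it as low (Ace, 2, 3, 4, 5) and also as high (Ten, Jack, Queen, King, Ace), but I do not allow wrap-around runs such as Queen, King, Ace, 2, 3. The name `check_straight` is hypothetical.

```python
def check_straight(hand):
    """Return 1 if the five ranks are consecutive, otherwise 0."""
    ranks = sorted(hand[f'C{i}'] for i in range(1, 6))
    if len(set(ranks)) != 5:
        return 0  # a repeated rank can never form a straight
    if ranks[4] - ranks[0] == 4:
        return 1  # five distinct ranks spanning exactly five values
    # Assumed special case: Ace (1) played high next to Ten..King.
    return 1 if ranks == [1, 10, 11, 12, 13] else 0


test = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2, 'C3': 4,
        'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}
ace_high = {**test, 'C1': 1, 'C2': 10, 'C3': 11, 'C4': 12, 'C5': 13}
wrap_around = {**test, 'C1': 12, 'C2': 13, 'C3': 1, 'C4': 2, 'C5': 3}

print(check_straight(test))         # 1: ranks 2, 3, 4, 5, 6
print(check_straight(ace_high))     # 1: Ten to Ace
print(check_straight(wrap_around))  # 0: Queen, King, Ace, 2, 3 is not a run
```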

Let’s check for the best card combination now: a royal. A royal is a special card combination having `{Ace, King, Queen, Jack, Ten}` in the card hand. Together with a flush, it makes a royal flush.
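A royal check can be sketched by comparing the set of ranks against the five royal ranks. The encoding (Ace = 1, Ten = 10, Jack = 11, Queen = 12, King = 13) follows the data description; the names `ROYAL_RANKS` and `check_royal` are my own. Note that this only looks at ranks, so the suit still has to be checked separately.

```python
# Ace, Ten, Jack, Queen, King in the dataset's 1-13 rank encoding.
ROYAL_RANKS = {1, 10, 11, 12, 13}


def check_royal(hand):
    """Return 1 if the ranks are exactly Ace, King, Queen, Jack, Ten; else 0."""
    ranks = {hand[f'C{i}'] for i in range(1, 6)}
    return 1 if ranks == ROYAL_RANKS else 0


royal = {'S1': 4, 'C1': 10, 'S2': 4, 'C2': 11, 'S3': 4, 'C3': 12,
         'S4': 4, 'C4': 13, 'S5': 4, 'C5': 1}

print(check_royal(royal))               # 1
print(check_royal({**royal, 'C1': 9}))  # 0: a Nine replaces the Ten
```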

The next function sorts the hands into all the remaining categories. It might get a little tricky in some places, but I used a tree-based approach to create this function: I sorted the cards and, based on the different combinations, assigned different classes to the hands. I will add a more detailed explanation if someone requests it in the comments below.
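For readers who want something concrete, here is an alternative sketch for the remaining categories. It is not the sorted, tree-style function described above. Instead it counts how often each rank appears and looks the resulting pattern up in a table. The names `PATTERN_TO_CLASS` and `check_groups` are hypothetical; the class numbers come from the data description.

```python
from collections import Counter

# Sorted rank multiplicities mapped to the class labels of the dataset.
PATTERN_TO_CLASS = {
    (1, 1, 1, 1, 1): 0,  # five different ranks: no pair-type hand
    (1, 1, 1, 2): 1,     # one pair
    (1, 2, 2): 2,        # two pairs
    (1, 1, 3): 3,        # three of a kind
    (2, 3): 6,           # full house
    (1, 4): 7,           # four of a kind
}


def check_groups(hand):
    """Classify a hand by how many times each rank occurs."""
    counts = Counter(hand[f'C{i}'] for i in range(1, 6))
    pattern = tuple(sorted(counts.values()))
    return PATTERN_TO_CLASS[pattern]


def ranks_only(*ranks):
    """Build a hand dict with the given ranks and arbitrary cycling suits."""
    hand = {}
    for i, rank in enumerate(ranks, start=1):
        hand[f'S{i}'] = (i % 4) + 1
        hand[f'C{i}'] = rank
    return hand


print(check_groups(ranks_only(12, 2, 11, 5, 5)))  # 1: one pair of fives
print(check_groups(ranks_only(7, 7, 7, 9, 9)))    # 6: full house
```

A hand with five different ranks comes back as 0 here, so straights and flushes still need the separate checks.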

## Assign Labels to Hands

It’s time to use the above functions to categorize the data into different labels.
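Putting the checks together, a labelling function could be sketched as follows. To keep the block runnable on its own, the individual checks are folded into one hypothetical function, `assign_label`, rather than calling the separate functions. It assumes the Ace (rank 1) may end a Ten-to-Ace straight, and it uses rank counting for the pair-type hands.

```python
from collections import Counter

# Rank-count patterns that fix the class regardless of suits.
GROUP_CLASS = {(1, 1, 1, 2): 1, (1, 2, 2): 2, (1, 1, 3): 3, (2, 3): 6, (1, 4): 7}
ROYAL = [1, 10, 11, 12, 13]  # Ace, Ten, Jack, Queen, King


def assign_label(hand):
    """Return the class (0-9) of a hand given as a dict with S1..S5 and C1..C5."""
    suits = [hand[f'S{i}'] for i in range(1, 6)]
    ranks = sorted(hand[f'C{i}'] for i in range(1, 6))
    pattern = tuple(sorted(Counter(ranks).values()))
    if pattern in GROUP_CLASS:
        return GROUP_CLASS[pattern]  # repeated ranks rule out straights and flushes
    flush = len(set(suits)) == 1
    royal = ranks == ROYAL
    straight = royal or ranks[4] - ranks[0] == 4
    if flush and royal:
        return 9
    if flush and straight:
        return 8
    if flush:
        return 5
    if straight:
        return 4
    return 0


test = {'S1': 1, 'C1': 2, 'S2': 1, 'C2': 3, 'S3': 2, 'C3': 4,
        'S4': 2, 'C4': 5, 'S5': 2, 'C5': 6}
print(assign_label(test))  # 4: a straight in mixed suits
```

Checking the pair-type patterns first keeps the rest simple, because once all five ranks differ only straights and flushes remain to be told apart.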

## Iterate over the training data

After all this hard (smart) work, let’s put our algorithm to the test by iterating over the training data.
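The iteration itself can be sketched with `DataFrame.apply`. Everything named here is an assumption made for illustration: `predict_all`, the label column `hand`, the two made-up rows, and especially `label_hand`, which is a deliberately tiny stand-in that only tells a flush from a non-flush. In practice you would pass in the full labelling function instead.

```python
import pandas as pd

FEATURES = ['S1', 'C1', 'S2', 'C2', 'S3', 'C3', 'S4', 'C4', 'S5', 'C5']


def label_hand(hand):
    """Stand-in classifier: 5 for a flush, 0 for anything else."""
    suits = {hand[f'S{i}'] for i in range(1, 6)}
    return 5 if len(suits) == 1 else 0


def predict_all(df, label_fn):
    """Label every row by handing its card features to label_fn as a dict."""
    return df[FEATURES].apply(lambda row: label_fn(row.to_dict()), axis=1)


# Two made-up rows: a flush (class 5) and a hand with nothing (class 0).
train_data = pd.DataFrame(
    [[1, 2, 1, 5, 1, 9, 1, 11, 1, 13, 5],
     [1, 2, 2, 5, 3, 9, 4, 11, 1, 13, 0]],
    columns=FEATURES + ['hand'],
)

predicted = predict_all(train_data, label_hand)
accuracy = (predicted == train_data['hand']).mean()  # share of matching labels
print(accuracy)
```

The accuracy is computed here as the share of rows where the predicted label equals the true one; scikit-learn’s `accuracy_score` reports the same quantity.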

The above function seems to be going pretty well. Let’s evaluate the results.

`Output:1.0`

These results look really satisfying and we barely had to use an hour of our time. It was a good day! Here’s a link to my notebook for a more detailed explanation.

# Conclusion

The conclusion? Pretty basic. Not everything requires machine learning. Sometimes, plain old iteration can do the magic. Start thinking from the stakeholder’s perspective and the problem might start solving itself. This doesn’t mean that what we have done here is not Analytics or Data Science. We are dealing with data and analysing it, but what I present is a way of thinking that moves beyond the conventional approach to solving problems. Again, this is just one approach, and maybe you’re more comfortable with ML, and that’s alright. It’s totally your call as long as you can get the job done.


I like to write about data science, machine learning and finance. I document personal experiences and projects. I love to hike and swim! Reading when not coding