A Simple Pokemon Battle Predictor
Pokemon battles are complex. There are stats to consider, movesets, and human decision making. Is it possible to accurately predict the winner of a battle? I think it is, and this post goes over the steps I'm taking to start down that path.
First, I needed some data. For the sake of getting started, and to see what even a simple dataset would look like, I chose generated battles. These battles can't account for some special game rules or the human element, so the predictions won't be perfect, but it's definitely a start.
The dataset I found contained 50,000 algorithm-generated battle results, based mostly on stats. With that in mind, I also grabbed a dataset with each Pokemon's name and base stats. Using those stats, I created new features for each of the two Pokemon involved in every battle.
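Here's a minimal sketch of that step. The file names and column names (First_pokemon, Second_pokemon, Winner, and the stat columns) are my assumptions about how the two datasets are laid out, so adjust them to match whatever files you actually have:

```python
import pandas as pd

# File and column names below are assumptions; swap in your own dataset's names.
battles = pd.read_csv("combats.csv")  # assumed columns: First_pokemon, Second_pokemon, Winner (Pokemon IDs)
stats = pd.read_csv("pokemon.csv")    # assumed columns: #, Name, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Attach each combatant's base stats as prefixed feature columns
p1_stats = stats.set_index("#")[stat_cols].add_prefix("P1_")
p2_stats = stats.set_index("#")[stat_cols].add_prefix("P2_")
df = (battles
      .join(p1_stats, on="First_pokemon")
      .join(p2_stats, on="Second_pokemon"))

# Target: 1 if the first Pokemon won, 0 otherwise
df["target"] = (df["Winner"] == df["First_pokemon"]).astype(int)
```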
Once I had my data and the features built, I moved on to fitting a model. One thing I needed to know first: what was the distribution of my target variable? That matters a lot when choosing a scoring metric, so I made a quick plot to check.
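Something like the following is enough to produce that plot, assuming the `target` column built above:

```python
import matplotlib.pyplot as plt

# How often does P1 win versus P2? Near-balanced classes make plain accuracy a reasonable metric.
df["target"].value_counts().sort_index().plot(kind="bar")
plt.xticks([0, 1], ["P2 wins (0)", "P1 wins (1)"], rotation=0)
plt.ylabel("Number of battles")
plt.title("Distribution of battle winners")
plt.show()
```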
Looking at the plot above, we can see that while there is some skew to the distribution, the classes are fairly balanced. Knowing that, a simple accuracy score from sklearn should be a fine metric. With that in mind, I used the classifier from xgboost. First, I ran it with no hyperparameter changes, just to see what kind of accuracy I'd get at baseline. I fit the model, ran some predictions, and got a score of 0.924! It's certainly not perfect, but it's not bad either. Just to be sure I wasn't getting a lot of false negatives, I went ahead and made a confusion matrix to check my results.
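Here's roughly what the baseline fit looked like. The train/validation split and the default `XGBClassifier` settings are my assumptions, since the only goal at this stage is an untuned score to compare against:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay
from xgboost import XGBClassifier

# Use every per-Pokemon stat column built earlier as a feature
feature_cols = [c for c in df.columns if c.startswith(("P1_", "P2_"))]
X_train, X_val, y_train, y_val = train_test_split(
    df[feature_cols], df["target"], test_size=0.2, random_state=42)

# Default hyperparameters, just to establish a baseline
model = XGBClassifier()
model.fit(X_train, y_train)

preds = model.predict(X_val)
print("Baseline accuracy:", accuracy_score(y_val, preds))

# Confusion matrix to check whether one class is mispredicted more than the other
ConfusionMatrixDisplay.from_predictions(y_val, preds)
plt.show()
```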
As we can see from the matrix, neither class had noticeably more false predictions than expected. Knowing that, I wanted to see which features mattered most to my model. With another few lines of code, I produced a plot showing which features drove the model's decision making.
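The plot came from xgboost's built-in helper; `importance_type="gain"` is just one reasonable choice here:

```python
import matplotlib.pyplot as plt
from xgboost import plot_importance

# Rank features by how much each one improved the splits it was used in
plot_importance(model, importance_type="gain")
plt.show()
```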
Now we can see which features our model thought made the biggest impact: Speed is far more important than the others. Seeing that, I decided to try a simple bit of feature engineering. I created a new feature that simply indicates whether Pokemon 1 (P1) has higher Speed than Pokemon 2 (P2). After engineering the feature and adding it to the training and validation sets, I got an accuracy score of 0.953, a solid three-point increase. On top of that, I wanted to see just how important my simple feature was to the model.
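The engineered feature itself is a one-liner. Picking up from the earlier snippet, and still assuming the `P1_Speed`/`P2_Speed` column names from above, the refit looks something like this:

```python
# New binary feature: does P1 outspeed P2?
X_train = X_train.assign(P1_faster=(X_train["P1_Speed"] > X_train["P2_Speed"]).astype(int))
X_val = X_val.assign(P1_faster=(X_val["P1_Speed"] > X_val["P2_Speed"]).astype(int))

# Refit with the extra column and re-score
model = XGBClassifier()
model.fit(X_train, y_train)
print("Accuracy with P1_faster:", accuracy_score(y_val, model.predict(X_val)))
```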
As we can see, my simple little feature became the model's most important one by leaps and bounds. This is a useful insight into how my model makes its decisions, and how future models might. Even a small bit of feature engineering produced a marked increase in accuracy and gave the model its single most influential feature!
In conclusion, I think this was a great first step toward predicting the winners of live Pokemon battles. I now know more about how my model makes decisions, and I have a good idea of which features are likely to matter as I bring in more data.