FPL Classifier 2.0

Raghunandh GS
Published in DataComics · Aug 20, 2019

For people wondering WTF 2.0 is, here is some back story: Building an FPL Captain Classifier

Last year, building the captain classifier model was a basic start at trying to pick captains for the upcoming game week. Even though the model did not manage to finish in the top 4, it did not get relegated either. There was a fair share of predictions it got right, and there were some howlers as well. A bunch of problems came built into the model itself, and others were baked into the way I was publishing results and calculating the model's performance. Hence, for this season, my prime focus was to improve the reliability of the model and instil some confidence in it for the people consuming its results. Along with that, I am also aiming to improve the process of how predictions are shared and results are evaluated. So let's get the ball rolling.

The Problem With Probability

Last year's binary classification model was calibrated to give better precision numbers. When fed with the historical performance of a player and some stats about the opponent the player will face in the upcoming game week, the model gives a probability score for whether he will hit (score ≥ 4 points) or blank. I had set a threshold of, say, 0.8, so every player who crossed this threshold was picked as a possible captain choice for the upcoming game week. There were game weeks in which none of the players crossed the threshold. Here is the problem: a probability of 0.8 or 0.9 doesn't mean that the player will score; it means the player is likely to score. It means that if a player with such attributes and such performance in the past weeks plays against such teams, with such strengths and weaknesses, he will score 80 out of 100 times. So there is always a 20% chance that he won't score, and that's the uncertainty. Because of this, there were multiple occasions last season on which the captain picks blanked. And since results were evaluated on a week-by-week basis, there was no way to compute how the model performed: the weekly number always represented either 0% accuracy or 100% accuracy. Hence, this season, model accuracy will also be computed and shared quarterly along with the model picks. I will share the details of how it will be computed later in this blog.
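For illustration, here is a minimal sketch of that old thresholding step, assuming the predictions sit in a pandas DataFrame; the player names and numbers are made up.

```python
import pandas as pd

# Illustrative predictions; names and numbers are made up.
predictions = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "probability_hit": [0.86, 0.74, 0.81],
})

THRESHOLD = 0.8  # last season's cut-off for captain candidates

# Only players whose probability of scoring 4+ points crosses the threshold
# are surfaced; some game weeks this list ends up empty.
captain_picks = predictions[predictions["probability_hit"] >= THRESHOLD]
print(captain_picks)
```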

The Pickle with Precision

As the model was built to give good precision numbers, accuracy wasn't given much importance. Because of that, we were able to pick only one or two players each week from the hundreds of players who were starting. Hence, this time the model was tuned to give better accuracy. The model will now give a probability score for scoring 4 or more points in the upcoming game week (probability_hit) for each player who has featured (had playing time) in the past two weeks. I will be publishing the entire list of players along with their probability scores in this GitHub repo before each game week. It will include predictions for 200-plus players for each game week. So this time, along with picking captains, this probability score can also be used to make transfers, set your starting lineup, order players on the bench, etc. These probability numbers will also help you identify whom not to pick and whom not to captain.

Yea. It does seem like a lot.
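To make the intended usage concrete, here is a hedged sketch of how the published probabilities could be consumed, assuming a per-game-week CSV with player and probability_hit columns; the file name is hypothetical and the actual repo layout may differ.

```python
import pandas as pd

# Hypothetical file name; check the repo for the actual layout.
preds = pd.read_csv("gw_predictions.csv")  # columns: player, probability_hit

# Rank everyone who featured in the last two game weeks by their probability
# of scoring 4+ points; useful for captaincy, transfers and bench ordering.
ranked = preds.sort_values("probability_hit", ascending=False)

print(ranked.head(10))  # strongest captain / transfer candidates
print(ranked.tail(10))  # players to avoid picking or captaining
```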

Meddling with Model

The new model was trained on 3 years' worth of FPL data. Compared to last year, a few more features were added, including the attacking strength and defensive weakness (goals for and against) of the team the player belongs to, of the opponents the player faced in the last two game weeks, and of the opponent in the next game week. Most of these features topped the feature importance chart from the model in terms of scaled importance.
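The post doesn't spell out the exact formulas, but a minimal sketch of the kind of team-strength features described above might look like this; the table layout, column names and per-game normalisation are assumptions.

```python
import pandas as pd

# Hypothetical per-team goal tallies; numbers are illustrative.
team_stats = pd.DataFrame({
    "team": ["Team A", "Team B"],
    "goals_for": [45, 18],
    "goals_against": [12, 40],
    "games_played": [20, 20],
})

# Attacking strength and defensive weakness expressed as goals per game.
team_stats["attack_strength"] = team_stats["goals_for"] / team_stats["games_played"]
team_stats["defensive_weakness"] = team_stats["goals_against"] / team_stats["games_played"]

# These per-team numbers would then be joined onto each player row for his own
# team, the opponents he faced in the last two game weeks and his next opponent.
print(team_stats)
```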

The model that was finally selected based on the AUC score was a stacked ensemble model. It had an AUC score of 0.73.
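The post doesn't name the ML library used, so purely as an illustration of the idea, here is a small stacked ensemble evaluated by AUC using scikit-learn on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the player/game-week feature table.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Two base learners stacked under a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("gbm", GradientBoostingClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)

# Model selection in the post was based on this AUC score.
auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.2f}")
```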

Putting the Model to Test

The model was put to the test on a sample of 3,748 data points and gave a probability score for each of them. To check how well the model probabilities match the actual numbers, the probability score was rounded off to a single decimal place and the samples were grouped by the rounded-off score. Then the number of predictions falling into each group (num_predictions) and the number of players in each group who actually scored 4 points or more in that game week (num_hits_actuals) were calculated. From these two numbers, the actual probability was calculated as the ratio num_hits_actuals / num_predictions.
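The grouping described above boils down to a few lines of pandas; here is a sketch with illustrative data (the column names follow the post, the sample values are made up).

```python
import pandas as pd

# Holdout predictions: the model's probability_hit and the actual outcome
# (1 if the player scored 4+ points that game week, else 0). Values are made up.
holdout = pd.DataFrame({
    "probability_hit": [0.12, 0.08, 0.31, 0.64, 0.87, 0.55, 0.11, 0.58],
    "hit_actual":      [0,    0,    1,    1,    1,    0,    0,    1],
})

# Round to one decimal place and group by the rounded score.
holdout["prob_bucket"] = holdout["probability_hit"].round(1)
calibration = holdout.groupby("prob_bucket").agg(
    num_predictions=("hit_actual", "size"),
    num_hits_actuals=("hit_actual", "sum"),
)

# Actual probability per bucket = num_hits_actuals / num_predictions.
calibration["actual_probability"] = (
    calibration["num_hits_actuals"] / calibration["num_predictions"]
)
print(calibration)
```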

The way to read the above table is as follows. There were 977 data points out of the 3,748 samples for which the model gave a probability score of 0.1, which means that out of 100 players falling in this group, only 10 would be expected to hit 4 points or more; the other 90 would blank. Comparing this with the actual numbers, out of those 977 players only 88 scored 4 or more points, giving a ratio of 0.09, which is very close to the model's probability. You can see how well the model probabilities are calibrated with the real-world probabilities in most cases. This is even more evident when represented as a calibration chart.

The red dots represent the model prediction vs the actual probability. The closer the dots are to the 45-degree line (the green line), the better the calibration. You can see that in multiple places the red dots hug the green line closely. There are predictions, like 0.6, where the dot is below the green line, which means overprediction by the model: the model has given a higher probability score for players in that group than the real-world probability. Similarly, there are underprediction cases like 0.9, where the model gave a probability score of 0.9 but the actual probability turned out to be 1.
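A sketch of how such a calibration chart can be drawn with matplotlib; the bucketed values below are illustrative, apart from the 0.1, 0.6 and 0.9 buckets discussed in the text.

```python
import matplotlib.pyplot as plt

# Bucketed model probabilities vs actual hit rates (illustrative values,
# except the 0.1, 0.6 and 0.9 buckets mentioned in the post).
model_prob = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
actual_prob = [0.09, 0.21, 0.28, 0.41, 0.52, 0.55, 0.71, 0.78, 1.00]

plt.scatter(model_prob, actual_prob, color="red", label="model vs actual")
plt.plot([0, 1], [0, 1], color="green", label="perfect calibration (45-degree line)")
plt.xlabel("Model probability (rounded)")
plt.ylabel("Actual probability")
plt.title("Calibration chart")
plt.legend()
plt.show()
```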

I will be using the above method to compute and monitor the model's performance throughout the season. To compute these numbers, a sample size of at least 1.5k to 2k data points is needed so that we get a good distribution of predictions falling into each range. So I am planning to do it once every 9–10 game weeks: four times during the season, plus an overall score at the end of the season.

The usual suspects and unusual subjects

As this model is well calibrated this time, we got interesting results at both ends of the spectrum. This can be used to classify players into different groups based on their probability score and their past week's performance.

The Usual Suspects
Players who scored 4 or more points in the last game week. The model was able to predict pretty well which of these usual suspects would hit and which would miss. A screenshot of a sample of predictions (probability_hit) and actuals (points_nextweek) is shared below.

The Hit List
The Miss List

The Unusual Subjects
The model was also able to predict, pretty well, a good upcoming performance from players who had a poor record in the previous week.

And of course, the top picks ordered by the probability score were a combination of usual suspects and unusual subjects.
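As a rough sketch of how such a split could be derived from the published probabilities and last week's points (the cut-offs and column names are assumptions):

```python
import pandas as pd

# Illustrative prediction table; names and numbers are made up.
preds = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "probability_hit": [0.82, 0.34, 0.76, 0.18],
    "points_lastweek": [11, 2, 1, 6],
})

high_prob = preds["probability_hit"] >= 0.7    # illustrative cut-off
hit_last_week = preds["points_lastweek"] >= 4

usual_suspects = preds[high_prob & hit_last_week]      # backed to repeat last week's return
unusual_subjects = preds[high_prob & ~hit_last_week]   # blanked last week, tipped to bounce back

print(usual_suspects)
print(unusual_subjects)
```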

The Follow-up

I will be sharing top picks from each of these categories for every upcoming game week on my Twitter account, week in and week out. I will also share the model performance results once a quarter. The link to my Twitter handle is below.

I will also be maintaining a GitHub repo with predictions for each active player across game weeks.

Wait! Did I tell you that the repo already has the predictions for the upcoming game week?

Happy Playing!
