Building an FPL Captain Classifier

Raghunandh GS
Published in DataComics
11 min read · Sep 26, 2018

Imagine: you go through a pretty rough week and want to find some peace of mind over the weekend. On Saturday evening, you switch on the television only to find that your FPL captain (whose points will be doubled) has already played 87 minutes and hasn’t scored anything beyond the two points for appearance. The week can’t get any worse. So choosing a captain who will score at least a point more than the appearance points is a must if you want serenity over the weekend.

This is a problem that had been intriguing me for a while. Hence, I decided to build a classifier that will give me a list of possible players whom I could captain in the upcoming game week without much ado, and hopefully find peace, attain enlightenment and revive the soul during the weekend.

The Data

When I was searching for data to build models, I luckily stumbled upon this GitHub repo. This guy has done the hard work of scraping the FPL data and storing it in flat files. He has also given scripts that will let us scrape the data for the ongoing season. It is because of guys like this that the earth still revolves around the Sun. I can’t thank him enough for getting this done.

There were 1,000-plus players over the past two seasons. The data has information on the performance of each player in every game week. This is a snapshot of the information available for each player on a game-to-game basis.

What else do you need in life?

The Distribution

Let’s quickly do a histogram of the point distribution for the past couple of seasons.

I have removed non-playing players, as they would make the zero bar touch the sky. As you can see here, most players end up getting 1–3 points, and there is a long tail that extends to 15. There are also bars beyond 15, but they are very few in number, hence you don’t get to see them clearly in this histogram.

Why God, Y?

The next step is to choose the Y variable that we will build a model to predict. We will be building a binary classifier. The two classes are:

0 — Players who will get 3 points or below in the upcoming game week

1 — Players who will get at least 4 points in the upcoming game week (Peace)

First, we need to create a column for each player which is nothing but the points he scored in the next game week, and based on that we will create the binary variable isCaptain, which will be our Y.
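
In pandas, this is a shift within each player's history. A minimal sketch (the column names here are assumptions, not necessarily the repo's exact schema):

```python
import pandas as pd

# Toy slice of the per-gameweek data: two players, three gameweeks each.
df = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2, 2],
    "round":     [1, 2, 3, 1, 2, 3],
    "points":    [2, 6, 1, 9, 3, 5],
})

# Points scored in the NEXT gameweek, computed per player.
df["next_gw_points"] = df.groupby("player_id")["points"].shift(-1)

# Binary target: 1 if the player returns at least 4 points next week.
df["isCaptain"] = (df["next_gw_points"] >= 4).astype(int)

# Each player's final gameweek has no "next week", so drop those rows.
df = df.dropna(subset=["next_gw_points"])
print(df[["player_id", "round", "isCaptain"]])
```

The `shift(-1)` inside the `groupby` is what keeps one player's last week from leaking into the next player's first week.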

For a better future, add features

The list of features in the dataset is in no way enough for us to build a good classifier. Plus, all the features given for a player correlate with his performance in the current game week, but that is not what we need. We need a way to classify his performance in the upcoming game week. For that, let’s add a few features about the upcoming game week: the opponent and their strength, whether it is a home match, and so on. We will also add a few features about the player’s performance in the past two game weeks, and one about the strength of the team the player is playing for.

I used the team’s overall normalised goal difference (total goal difference / number of matches) in that particular season as a proxy for their strength.
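
The proxy is a one-liner. A sketch with made-up numbers (team names and column names are illustrative, not from the dataset):

```python
import pandas as pd

# Toy season table for two teams at opposite ends of the league.
teams = pd.DataFrame({
    "team":          ["Man City", "Burnley"],
    "goals_for":     [106, 36],
    "goals_against": [27, 39],
    "matches":       [38, 38],
})

# Strength proxy: total goal difference normalised by matches played.
teams["strength"] = (teams["goals_for"] - teams["goals_against"]) / teams["matches"]
print(teams[["team", "strength"]])
```

Normalising by matches played keeps the feature comparable mid-season, when teams may have played different numbers of games.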

Minimising The Noise

The next step was to minimise the noise in the dataset. There were too many players who do not play at all. There were a few players who appear once in a blue moon, so they have no historic pattern. There are also players who are injured and out for a week, two or more. If a player did not feature in the latest game week that has ended, it is very difficult to predict how well he will fare in the next one. Including this kind of data while training will bring down the model’s performance. Hence, it is better to keep them out of the input data table. Let them warm the bench.
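
The filter itself is simple. A sketch, assuming a `minutes_last_gw` column holds each player's minutes in the most recent finished game week (the column name is my own, not the repo's):

```python
import pandas as pd

# Toy rows: players 2 and 4 did not feature in the latest gameweek.
df = pd.DataFrame({
    "player_id":       [1, 2, 3, 4],
    "minutes_last_gw": [90, 0, 13, 0],
})

# Bench anyone who did not play in the latest finished gameweek.
played = df[df["minutes_last_gw"] > 0].copy()
print(played["player_id"].tolist())
```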

No Cheating

Before the data was ready, I removed a bunch of features that might act as a cheat sheet for the model in evaluating a player’s performance in the upcoming game week. I removed all the features that identify the player, like player name, player ID, selected percentage etc. I removed the round information. I also removed the player’s transfer activity information: transfers_in, transfers_out, transfer_balance.
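
In code this is just a column drop before training. A sketch (the non-dropped feature names are placeholders for the real engineered features):

```python
import pandas as pd

# Empty frame carrying only the schema, for illustration.
df = pd.DataFrame(columns=[
    "name", "player_id", "selected", "round",
    "transfers_in", "transfers_out", "transfer_balance",
    "goals_scored", "assists", "minutes",
])

# Identity, round and transfer-activity columns act as a cheat sheet:
# they encode how the crowd already rates the player for the coming week.
leaky = ["name", "player_id", "selected", "round",
         "transfers_in", "transfers_out", "transfer_balance"]
X = df.drop(columns=leaky)
print(list(X.columns))
```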

The Precision Game

The model we will be building is a binary classifier. The performance of a classifier can be measured with different metrics; the three most widely used are accuracy, recall and precision.

Before delving into these metrics, let’s recall the problem statement once more. We are building a model that will suggest captains for the upcoming game week.

Accuracy: This metric measures how well the model classifies every player correctly as captain material or not. Clearly this is not our need. We do not need a model that can classify each player correctly. We ourselves can tell, 99 times out of 100, that Morata won’t score more than 2 points; we don’t need a model for that. Plus, it is very difficult to build a model with good accuracy, as events in sport are pretty random in nature.

Recall: This metric measures the model’s ability to find all relevant cases in the dataset. To make it clear, it measures how many captain materials we were able to identify out of all the actual captain materials in the dataset. This too is not much in our interest, as we do not need 25 captains for the next game week out of a possible 30. We just need three or four proper candidates; after all, we can captain only one player per game week.

Precision: This metric expresses the proportion of the data points our model says are relevant that actually are relevant. So if the model says a player is captain material, this tells us, on average, how often the model got it right. This is the only thing that can bring a level of certainty to choosing the captain, and thereby help us achieve serenity.
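
The three metrics on the same toy labels make the difference concrete (the labels below are invented for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy labels: 1 = captain material next week, 0 = not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

acc  = accuracy_score(y_true, y_pred)   # correct calls / all calls
rec  = recall_score(y_true, y_pred)     # captains found / all actual captains
prec = precision_score(y_true, y_pred)  # captains right / all captains flagged
print(acc, rec, prec)
```

Here the model flags three captains and gets two of them right, so precision is 2/3 even though recall is only 1/2: exactly the trade-off we want, a short list we can trust.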

The Model

I split the data into train, validation and test sets randomly. The test set had a sample of 2,482 players. After training, all the metrics were computed on this set to evaluate the model’s performance. There are 20 teams in the league with 11 players starting each match. So out of these 220 players in each game week, we must be able to rightly classify at least one captain with a good precision score. So for a sample of 2,482 we need a model that can classify at least 12 captains (2482/220 ≈ 11.3, rounded up) with a good precision score.

After trying a bunch of tree models with different hyperparameters, I decided to stick with a GBM model with parameter tuning, as it was giving a good base precision number. For each player in the test set, the model outputs a number between 0 and 1. It is up to us to set a threshold to classify them as captains.
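
The train-then-threshold step looks like this. A sketch with synthetic data standing in for the engineered feature table (the real features, hyperparameters and library choice are not specified in the post; I use scikit-learn's GBM here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 player-gameweeks with 5 numeric features.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0.8).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# The model outputs a probability in [0, 1]; the captain cut-off is ours to pick.
proba = model.predict_proba(X)[:, 1]
captains = proba >= 0.659          # same threshold as Run 1 below
print(captains.sum(), "players flagged as captain material")
```

Raising the threshold trades recall for precision: fewer players clear the bar, but the ones that do are safer picks.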

Run 1

Threshold for captain: 0.659

Number of Captains Identified: 12

Precision: 75%

Oh! The missing type

75% precision is okay for an initial model but not good enough, so I was a little upset. When I checked the data I found that one big piece of information was missing: there was no positional information on where the players play. So I added one more field with player type information: 1 — Goalkeeper, 2 — Defenders, 3 — Mid & Forwards. With this information added to the input dataset I ran the model again and… voila! I got even more upset, as the precision didn’t improve at all. After all, this is modelling, not Maggi making. Nothing is ready in two minutes.

Run 2

Threshold for captain: 0.659

Number of Captains Identified: 12

Precision: 75%

The World is round

I added back the round information that I had kept aside initially. I thought the round would help the model identify patterns around the bunch of fixtures at Christmas and New Year, where fatigue kicks in and even top players don’t perform; the round could act as a proxy for the time of the year. But unfortunately, the precision went down even further.

Run 3

Threshold for captain: 0.725

Number of Captains Identified: 12

Precision: 50%

Wisdom of the Crowds

Wisdom of the crowds is the idea that large groups of people are collectively smarter than individual experts when it comes to problem solving, decision making, innovating and predicting. The theory relies on the principle that a group of people possesses a greater level of knowledge than an individual, or just a few individuals, which makes it easier for a crowd to make good decisions than for one or two people. In other words, “collective judgment” can be a lot more accurate and higher quality than individual judgment.

So I started looking at features in the dataset that could serve as collective intelligence. There were features like selected %, transfers_in, transfers_out and transfer_balance. Since millions of people play FPL all around the world, these can be a good collective intelligence measure. Even though the game week begins on a weekend, if you mine the data on the preceding Wednesday or Thursday you can capture the trend of the crowd. Hence, I decided to feed these features into the model. And yes, it did improve the numbers. Not only did the precision improve, but also the recall: it was able to rightly classify more captains with good precision.
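
The crowd signals need no engineering beyond a net balance. A sketch (player names and numbers are invented for illustration; the column names match the ones mentioned above):

```python
import pandas as pd

# Crowd-wisdom columns already present in the scraped per-gameweek data.
df = pd.DataFrame({
    "player":        ["Hazard", "Salah"],
    "transfers_in":  [120000, 80000],
    "transfers_out": [15000, 95000],
    "selected":      [35.2, 48.1],   # selected-by percentage
})

# Net transfer activity ahead of the deadline: a simple crowd signal.
df["transfer_balance"] = df["transfers_in"] - df["transfers_out"]
print(df[["player", "transfer_balance"]])
```

Note these are the same columns removed earlier as “cheating”; the difference is that here they are read before the game week as a deliberate crowd signal, not leaked from after it.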

Run 4

Threshold for captain: 0.75

Number of Captains Identified: 18

Precision: 83.33%

Captains From the Test Set

These are the captains that the model classified as captain material from the test set of 2,482 entries. I have highlighted the ones who don’t get picked as captains often.

After all, it requires balls of steel to captain a defender.

The Real Test

All these tests were performed on randomly selected samples from the 2016–17 and 2017–18 seasons. All this effort will be in vain if the model doesn’t perform well on the ongoing season. No one wants a model that was doing better some time ago; if the model is to be used, it should perform in the current season. No one wants Maradona, but everyone wants Messi. Let’s test this on data from the current season. The test set has all the data from game weeks 4, 5 and 6; game weeks 1, 2 and 3 can’t be used, as they lack lag features like previous game week points for all three preceding weeks. The test set had a total of 590 entries. We need at least 3 captains, one belonging to each game week.

Test

With a threshold of 0.75, it was able to identify two captains with 100% precision.

But we need at least one more to satisfy the minimum-captains condition. Let’s lower the threshold to 0.70. We get 6 captains with 83.33% precision.

Salah was uncharacteristically poor against Leicester.

I believe the precision in the current season has improved because team and opponent strength are reflected much better by the current season’s goal difference. For the last couple of seasons of training data I was using the overall goal difference; now I am using the current season’s goal difference, which is a very recent measure of strength. I expect the precision to go up and down throughout the season, but on average I hope it stays around 83%.

The Test is not over Yet

For the upcoming game week, the model classifies only one player as captain material, and he is not Sergio Aguero!

Gods Must be Crazy

The model thinks Lucas Digne of Everton will score at least 4 points against Fulham. Being a defender, it is possible if he keeps a clean sheet against Fulham next week, or out of the blue scores a goal or makes an assist. Not even in my wildest dreams did I think this guy would get picked. Unfortunately, there are no other players the model can classify with a good probability.

Predictions Week in Week Out

No model is perfect, and there are always improvements to be made. I will try adding more features and tuning the model to improve performance every now and then.

I have all the pipelines in place for running this model for every upcoming game week. Every week I will suggest a list of players who are captain material with a good probability score. You might have better captain choices, and the model’s choices are not always the best ones, but they give you a certain amount of surety. Even if you don’t captain them, if you want to make an extra transfer that week these are good candidates, as there is a good chance they will return the extra 4 points spent in the upcoming game week.

I will be tweeting these suggestions from the model through my Twitter handle a day or two before the deadline for every game week. You can find my Twitter handle here.

Do not abuse me if the models are wrong. — Aristotle

Happy Playing.

Web scraping code to get the FPL data — 5 days

Getting the data in the desired format — 7 days

Model selection — 3 days

Feature Engineering — 9 days

Model building and parameter tuning — 11 days

Repeating the last two steps — 7 days

Testing and error metric computation — 2 days

Getting scolded by wife for watching too much football — 4 days

Triple Captaining Gabbiadini on a double GW and scoring 3 points — Timeless
