Fantasy Football as a Data Scientist [Part 1] Random Walk

Eugine Kang
5 min read · Aug 2, 2018


I never played fantasy sports, mainly because I never TRULY followed any of the major sports in the US. I’m not religious about the NBA, NFL, or MLB. I enjoy going to the stadium to watch, but not watching from my couch every week. The only two leagues I watch and follow are the UFC and the Premier League.

When the opportunity came to play fantasy football (soccer) with a couple of people, I was excited about the trash talk to come and about planning a strategy to WIN.

Table of Contents

  1. Understand how points are earned in fantasy football
  2. Collect players’ stats for 2017/2018
  3. Constraints in picking my starting XI
  4. Best starting XI with random walk
  5. Is this it?
  • Data and code for these posts are available on GitHub

Terminology

  • GK: Goalkeeper
  • DEF: Defender
  • MID: Midfielder
  • STR: Striker
  • XI: Starting eleven (lineup)

Understand how points are earned in fantasy football

Points are earned through individual contributions to the above stats. I personally feel “Goal Scored (GK)” should be +15, and “Penalty Given” should be a measured stat. I like how “Clean Sheet” is almost as important as a goal, and there are a lot of ways for a GK to earn points. Match wins don’t mean anything in fantasy football; it’s all about whether your players rack up the above stats.

However, there are constraints in picking your XI. You must spend £100 or less, and the lineup must be a real football formation. You can’t pick 5 GK and 6 STR.

Collect players’ stats for 2017/2018

My first obstacle was finding data for the players available in fantasy football. Luckily, the Sky Sports website already had season stats and costs available for scraping. The data cleaning process is in the GitHub repository, in [Part 0] Collecting Stats and Costs.ipynb.

Constraints in picking my starting XI

Now that I have clean data to work with, can I write a Python script to pick the best team? The biggest obstacle was writing a Python function to test all the constraints.

  • XI must cost £100 or less
  • Can’t pick the same player more than once
  • Only 1 GK allowed
  • 3 to 5 DEF allowed
  • 3 to 5 MID allowed
  • 1 to 3 STR allowed

Here I’m summing the cost of the XI and checking whether the total exceeds £100. If the output of this function (the budget left over) is greater than or equal to 0, the constraint is met and the XI is viable on cost.
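A minimal sketch of what such a cost check could look like, assuming the function returns the budget left over. The names constraint_cost, COSTS, and BUDGET and the sample costs are hypothetical, not taken from the notebook. The sign convention (non-negative means satisfied) is the one scipy.optimize.minimize expects for inequality constraints.

```python
BUDGET = 100.0  # budget cap in £, from the rules above

# Hypothetical costs for a tiny player pool, indexed by player id.
COSTS = [12.5, 9.0, 6.5, 5.0]


def constraint_cost(xi, costs=COSTS, budget=BUDGET):
    """Budget left after buying the lineup; >= 0 means it is affordable."""
    return budget - sum(costs[i] for i in xi)


# Three picks costing 28.0 in total leave 72.0 of the budget.
print(constraint_cost([0, 1, 2]))
```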

To avoid picking the same player twice, I count the unique entries and compare that count with the size of the XI. If the output of this function is equal to 0, the constraint is met and the XI has no repeated player.
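The uniqueness check can be sketched the same way, assuming it returns the number of repeated picks. The name constraint_unique is hypothetical. Returning exactly 0 for a valid lineup fits the equality-constraint convention of scipy.optimize.minimize.

```python
def constraint_unique(xi):
    """Number of repeated picks; 0 means every player appears only once."""
    return len(xi) - len(set(xi))


print(constraint_unique([3, 7, 9]))  # no repeats
print(constraint_unique([3, 7, 3]))  # player 3 picked twice
```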

Overall, I set 9 constraints for my XI, and the combined check must return True for the XI to be viable. More code is available on GitHub in [Part 1] Random Walk.ipynb.
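One way to arrive at nine checks is to count cost, uniqueness, the single GK, and a separate minimum and maximum for each of DEF, MID, and STR. That breakdown is an assumption based on the list above, not confirmed by the notebook, and the costs and positions arguments are hypothetical stand-ins for the scraped data. A rough sketch:

```python
from collections import Counter

BUDGET = 100.0
# Allowed (min, max) players per outfield position, from the list above.
FORMATION = {"DEF": (3, 5), "MID": (3, 5), "STR": (1, 3)}


def constraint_all(xi, costs, positions, budget=BUDGET):
    """Return True only if the lineup passes all nine checks."""
    counts = Counter(positions[i] for i in xi)
    checks = [
        sum(costs[i] for i in xi) <= budget,  # 1. within budget
        len(set(xi)) == len(xi),              # 2. no repeated player
        counts["GK"] == 1,                    # 3. exactly one GK
    ]
    for pos, (low, high) in FORMATION.items():  # 4-9. min and max per position
        checks += [counts[pos] >= low, counts[pos] <= high]
    return all(checks)


# Hypothetical 12-player pool: ids 0 and 11 are goalkeepers.
positions = ["GK"] + ["DEF"] * 4 + ["MID"] * 4 + ["STR"] * 2 + ["GK"]
costs = [5.0] * len(positions)

print(constraint_all(list(range(11)), costs, positions))            # 1-4-4-2
print(constraint_all([0, 11] + list(range(1, 10)), costs, positions))  # two GKs
```

Because the random walk always draws 11 players, this sketch does not check the lineup size separately.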

Best starting XI with random walk

If we assume the players will earn exactly the same stats in 2018/19 as in 2017/18, which XI will earn the most points? Let’s also assume you won’t change your XI throughout the entire season.

Now let’s plan to pick the best XI with the available data.

  • Randomly pick XI
  • Test for all constraints
  • Calculate total points from XI
  • Repeat 10,000 times
  • Select the XI with the most total points that passes all constraints

Here I’m randomly selecting an XI with random_walk(), calculating total points with objective(), and testing all constraints with constraint_all(). One odd thing you will notice is the negative sign in objective(). I plan on using scipy.optimize.minimize for the next step, which searches for the lowest value, so maximizing points means minimizing their negative.
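The loop described above can be sketched roughly as follows. The function names random_walk() and objective() come from this post, but their bodies here are assumptions, as are the search() helper, the toy points list, and the simplified viability check. Players are drawn with replacement, which is why a uniqueness constraint is needed at all. The captain bonus is left out.

```python
import random


def random_walk(n_players, size=11, rng=random):
    """Pick `size` player ids uniformly at random, with replacement."""
    return [rng.randrange(n_players) for _ in range(size)]


def objective(xi, points):
    """Negative total points, so that minimizing it maximizes points."""
    return -sum(points[i] for i in xi)


def search(points, is_viable, trials=10_000, seed=0):
    """Keep the viable lineup with the lowest objective over `trials` draws."""
    rng = random.Random(seed)
    best_xi, best_value = None, float("inf")
    for _ in range(trials):
        xi = random_walk(len(points), rng=rng)
        if not is_viable(xi):
            continue  # skip lineups that break a constraint
        value = objective(xi, points)
        if value < best_value:
            best_xi, best_value = xi, value
    # best_xi stays None if no draw was viable.
    return best_xi, -best_value


# Toy data: 50 hypothetical players worth 0..49 points each,
# with uniqueness as the only constraint.
toy_points = list(range(50))
best_xi, best_total = search(toy_points, lambda xi: len(set(xi)) == len(xi))
print(best_xi, best_total)
```

In the real search, the lambda would be replaced by the full constraint check.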

After 10,000 trials, I see one blue data point that passed all the constraints, located in the top right corner.

This XI…

  • Costs £97.7
  • Earned 1831 pts in 2017/18 with Salah, M as captain (X2 pts)

Looks like a decent XI, especially with the strong Liverpool presence. However, let’s try to brute-force our way to a better XI by running random_walk() 1,000,000 times.

TRIALS = 1000000

This XI…

  • Costs £96.4
  • Earned 1936 pts (+105) in 2017/18 with Salah, M as captain (X2 pts)

This XI costs less but earned 105 pts more. Salah, M is the Egyptian King, and two faceless DEF had 0 pts in 2017/18. I guess it’s better to save your money and focus on buying top-shelf players.

Is this it?

How good is this XI? Can it get better? We tested 1,000,000 lineups, but how many are there?

There are 800,789,882,655,737,251,012,174,233,427 total combinations to pick a lineup.
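That figure corresponds to 523¹¹, the same expression this post uses for its time estimate: 11 independent picks from a pool of 523 players, counting order and allowing repeats. As a hedged aside, the sketch below also prints the number of distinct 11-player sets, which is smaller but still far beyond brute force. The variable names are illustrative only.

```python
import math

N_PLAYERS = 523  # pool size implied by 523 ** 11
XI_SIZE = 11
TRIALS = 1_000_000

# Ordered picks with repeats allowed: what a with-replacement random walk samples.
ordered_with_repeats = N_PLAYERS ** XI_SIZE
# Distinct sets of 11 different players, ignoring order.
unordered_no_repeats = math.comb(N_PLAYERS, XI_SIZE)

print(f"{ordered_with_repeats:,}")
print(f"{unordered_no_repeats:,}")
print(f"fraction sampled: {TRIALS / ordered_with_repeats:.2e}")
```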

Yikes… we tested barely any of the possibilities. Of course, not all combinations are viable XIs. What is the distribution of pts and constraints for our 1,000,000 trials?

Simply eyeballing the above plot, roughly 1 in 8 lineups is a viable XI. Our final XI is located somewhere around the red arrow. For me to consider a lineup, the XI needs to be viable and to have earned at least 1,700 pts.

After 1,000,000 trials, only 126 passed all constraints and had pts above 1,700.
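A small sketch of how that count and its share of trials could be computed, assuming each trial is stored as a (total points, viable) pair. The function name share_of_good_lineups and the sample results are hypothetical.

```python
def share_of_good_lineups(results, min_pts=1700):
    """Count trials that are viable and reach min_pts; also return the percentage."""
    good = sum(1 for pts, viable in results if viable and pts >= min_pts)
    return good, 100 * good / len(results)


# Four made-up trials: only the first and last are viable with >= 1,700 pts.
sample = [(1800, True), (1800, False), (1600, True), (1700, True)]
print(share_of_good_lineups(sample))

# The share reported in this post: 126 lineups out of 1,000,000 trials.
print(100 * 126 / 1_000_000)  # 0.0126 (percent)
```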

A raw count of 126, or 0.0126% of trials, doesn’t give much confidence that the final XI is the best. Can we brute-force our way through all 523¹¹ lineups? The current 1,000,000 trials took 1–2 minutes.

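TOTAL_LINEUPS = 523 ** 11  # 11 picks from 523 players, repeats allowed
TIME_FOR_1M = 1  # minutes per 1,000,000 trials (low end of the 1–2 min measured)
TOTAL_TIME_IN_MIN = TOTAL_LINEUPS / (10**6) * TIME_FOR_1M
MIN_IN_YEAR = 525600  # 60 * 24 * 365
TOTAL_TIME_IN_YEAR = TOTAL_TIME_IN_MIN / MIN_IN_YEAR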

At that rate, working through every combination would take 1,523,572,836,103,000,800 years (roughly 1.5 × 10¹⁸). Let’s see if we can get a better XI using scipy.optimize.minimize.

NEXT: Fantasy Football as a Data Scientist [Part 2] Knapsack Problem
