Fantasy Premier League — Lineup Optimization Using Mixed Linear Programming (A.K.A “The Knapsack Problem”)

Published in

Sports Analytics and Data Science

7 min readOct 19, 2021

In this article I will discuss an approach to optimizing a 2 level “Knapsack Problem” while demonstrating on Fantasy Premier League data.

Background

The Knapsack problem is a known “NP-Hard” problem where you have N items, each one has a different size (m) and a different value to you (v) . You don’t have room for all the items, only M amount of room so you would like to choose the items that in total will produce the highest value while obeying the constraints. (*)

This model is considered an integer based Simplex algorithm (*)

Our Problem

In this problem we are trying to solve the following:

Buy the relevant players that return the best performances. (v)
Constraint yourself towards a salary cap. (M)
Constraint yourself to player positions on a roster.

The Data

The data was collected pertaining the historical data from the current year (2021) (*). Each player per match week receives fantasy points according to their performance in the actual game. Points are rewarded by playing , scoring, assisting, clean sheets ,and saves. They are penalized by conceding goals, yellow and red cards.

Each user for the sake of the example starts with a salary cap of 100 million dollars, and needs to build a team made of :

2 Goalkeepers
5 Defenders
5 Midfielders
3 Forwards

Each player has its cost, and most likely the higher the cost, the more points the player produces on a weekly basis (the are some arbitrages as well which will be the “gem” of this procedure.)

We will treat the value of our problem as the “Average Fantasy Points” , and use the Standard Deviation as the penalty to create a more robust and tighter optimization.

In addition to these indicators, I created a few more indicators to characterize the players performance with more closure.

Z Scores

Where MeanPlayer points for player i is :

GroupMean is :

GroupMean

Standard Deviation

Per Dollar Value — Total points per price of player
Confidence Index

Confidence Index

Example to this dataset that was created is the following:

Single Player Optimization

In this part, I decided to address the weekly substitutions that a manager can perform. The rule is that you usually get 1 free substitution, every next one is a penalty of 4 points. In this procedure you sell a player and buy one in return.

This is pretty trivial. I can iterate through all the players in my current set and per player display the top N players that would be good replacements. Once I have the list, I can sort by the most profitable substitution (most gained value from the substitution).

In the following example, I calculate per the goalie Sanchez, which player would be best to replace him with. We might be at a maximum point where we don’t need to replace. In this case you can see that there are 4 goalies that are preferable in terms of mean points while obeying the constraint of the salary cap (current reserves + current player cost — candidate cost). You can see the added value that the player would provide in terms of mean points on a weekly basis.

In the case above, given the constraint of the salary cap, it would supply us a profit of about 1.3 fantasy points a game to substitute Sanchez for Raya.

The next step is to iterate over all players to find the substitution that would provide the highest added value. Of course this take in account that you only have 1 sub, though if the added value is higher than 4, it may be profitable to perform the second one as well (long term injuries would provide 0 vs the mean points as an edge case)

Multi Player Optimization

In this problem we deal with building a team from scratch. Just like the knapsack problem , we have multiple players , each with a current price (cost) and a pre-defined value metric. In this case I decided to work with the average points per game of each player. This works fine once we reach a nice amount of games within the current season, or by utilizing pervious year’s data. Once we have these 2 categories defined, we would need to define the constraints. Per the Fantasy Premier League roster spots, we require 2 GKP , 5 DEF , 5 MID , 3 FWD. We would need to define within the optimization process that we maintain exactly that fit. Also , we would like to maintain the salary cap as a constraint.

Lets review the formal problem:

Lets define the following :

C — Cost limitation (total reserve at disposal)

N — Number of active players in the FPL

X — A binary vector size of N. X[i] = 1 if the player is chosen to be included in the roster, 0 otherwise.

P — Value vector. P[i] = value of player i

W — Cost vector. W[i] = Cost of player i

Subject to:

In this constraint we require the salary cap

Next we need to create a constraint for roster formation

From here we have a simple problem which we can enhance with using a disabled list

In the example I supplied here, I implemented this written in Python , using MIP — Mixed-Integer Linear programs. With the output we would receive the filtered list of the players that were chosen to be bought according to our problem.

Enhanced Optimization Problem

Here I wanted to briefly discuss tie breakers. Our value function was pretty straight forward , and due to the sample size and numeric values that the players can produce, there may (and are) many players with the same values in their average point rate. Theoretically we would assume that the “cheaper” player would be selected, but is this indeed true?

In addition , what if there are 2 players with the same value and same cost? how would I choose between the two?

I decided to add a layer of complexity to address these issues. In the maximization phrase I decided to add another element which resembles the penalty.

In this problem I decided that I would lean towards more “stable” and consistent players. This represents the risk of each player, implemented as the standard deviation per each set of values per player. So per player we would hold another value which is the STDev of the values per historical data.

To rephrase the problem, we are searching for the maximum total value while reducing the Log total risk.

It is worth noting that per group we are looking at similar scales since all players in each group (position) have the same scoring system.

Building Strategy

This current model does not take into account players that remain on the bench and ones that start. The model is trying to balance the total objective function. Given we want to prioritize the starting lineup, we can do the following:

Fix the bench . Choose the 4 players that would be on the bench. Example of that would be choosing a cheap sub goalie, and possibly use the first model to choose the optimal single players to choose the subs. In my case I optimized the cheapest “mean to dollar” players and minimized the cost.
Reduce your salary cap to not include the bench player costs
Add selected players from former step to disabled list
Adjust the constraints to roster spots to be only starting lineup [1,3,4,3] , [1,4,3,3] ….
Re-run the model

In this case I decided to test several cases and found that formation of [1–3–4–3] performs best having 2 DEF on the bench since there are several players performing exceptionally which are considered “mean to dollar” optimal.

Example for selected roster — Example of selected roster

Sources

(*) Knapsack Problem: https://en.wikipedia.org/wiki/Knapsack_problem

(*) Symplex Algorithm: https://en.wikipedia.org/wiki/Simplex_algorithm

Github to data extraction + lineup optimization: https://github.com/NachiLieder/fpl.git