I’m not a big sports fan but I always liked the numbers. That’s why I was interested in Fantasy Football. It struck me as a relatively simple optimization problem. And with the rise of DraftKings and FanDuel, I figured there would be a lot of historical information available.

The data was very easy to get and I was able to generalize the problem into an optimization problem. I then used PuLP to solve the problem.

To skip to the code, skip to Fantasy Football Using Linear Programming or check out my jupyter notebook.

# Linear Programming

If you’re familiar with linear programming, feel free to skip this section.

Linear programming is a method to achieve the best outcome of a given function given a series of constraints. The goal and constraints require linear relationships to have the math work in your favor.

For instance, suppose you have flour and eggs from which you can make pasta or unleavened bread to sell. Assume that pasta and bread units are in pounds and we can make partial pounds.

`flour = 30eggs = 40pasta = flour * 2.5 + egg * 5.0bread = flour * 3.5 + egg * 2.5pasta_sale_price = 3bread_sale_price = 2.5`

Now consider the constraints and objectives.

`Constraints:pasta * 2.5 + bread * 3.5 <= 30 # flourpasta * 5.0 + bread * 2.5 <= 40 # eggsbread >= 0pasta >= 0Objective Function:maximize pasta * 3 + bread * 2.5Solve for pasta and bread such that revenue is maximized.`

If we swap out the pasta and bread with x and y, respectively, we can graph all the possible solutions with the following equation:

`2.5x + 3.5y <= 30 and 5.0x + 2.5y <= 40 and x >= 0 and y >= 0`

WolframAlpha tells me the chart looks something like this:

The shaded area is the feasible region. Everything inside the region is sub-optimal. We only care about the edge. Since everything is linear, we can actually get away with only looking at the vertices.

Consider the extremes of putting all resources into making pasta or bread alone:

`100% pasta:   flour_constraint: 2.5x = 30 => x = 12  egg_constraint: 5.0x = 40 => x = 8  x = min(flour_constraint, egg_constraint) => 8  profit = 8 * 3 + 0 * 2.5 => 24100% bread:   flour_constraint: 3.5x = 30 => x = 8.57  egg_constraint: 2.5x = 40 => x = 16  y = min(flour_constraint, egg_constraint) => 8.57  profit = 0 * 3 + 8.57 * 2.5 => 21.42`

If you notice, producing solely pasta or bread, we are leaving some ingredients unused. In fact, there is only one point at which we use 100% of all our ingredients: 5.78 units of pasta and 4.44 units of bread (i.e. the final vertex).

`x = 5.78y = 4.442.5 * 5.78 + 3.5 * 4.44 = 30 # 100% of flour5.0 * 5.78 + 2.5 * 4.44 = 40 # 100% of eggsprofit = 5.78 * 3 + 4.44 * 2.5 => 28.44`

Making 5.78 pounds of pasta and 4.44 pounds of bread is the best solution, resulting in revenue of \$28.44 given our constraints.

Just because we use 100% of the ingredients doesn’t mean that’s necessarily the best choice. In this case it is, but it depends on the objective function. If the sales price of pasta were 10 times that of the bread, then it would make sense to make more pasta even if it means leaving unused ingredients.

This just scratches the surface of linear programming. If we had more variables, we could add more dimensions. In this example we only have 2, but everything generalizes to any number of dimensions. More constraints would just mean a different shape. For instance, we could have made a constraint that said no more than 6 pounds of pasta and that would just be a vertical line at x = 6. Our feasible region would only included the shaded are to the left of that. More examples can be found here.

We can use these same principles when deciding on our Fantasy Football pool.

# Fantasy Football in a Nutshell

The fantasy football I experiment with (DraftKings) consists of picking players to make up your team. Your score is a function of the players that you picked. The function isn’t so important but it’s based on their real world performance. For instance, 1 point per 25 passing yards, 4 points for a passing touchdown, and so on. You’re limited by a salary cap and positions (1 quarterback, 2 running backs, 3 wide-receivers, 1 tight end, 1 flex and 1 defensive special teams).

DraftKings makes it easy for people. They give you the average points per game for each player for the season. A general strategy involves going through each position, looking at the best historical performers (most points in the last game), and doing some kind of cost analysis (e.g.absolute points or points / salary). Here some sports fans may let their own biases influence their choices. Also, someone may consider the athlete’s opponent this week. Finally, players try to get as close as possible to their salary cap.

# Fantasy Football Using Linear Programming

I’ll be using python, pandas and PuLP to make my decision. We’ll be working off the naive assumption that whatever the person scored last time, he will score this time. We’ll optimize for the highest possible score given our salary and position constraints.

Note that API results may have changed since I ran this. The value they provide is season average so as games played, this value updates.

First we have to download and clean up the data a bit.

`import urllib, jsonimport pandas as pdimport refrom itertools import permutationsfrom pulp import *LATEST_URL = "https://api.draftkings.com/draftgroups/v1/draftgroups/21434/draftables?format=json"response = urllib.request.urlopen(LATEST_URL)data = json.loads(response.read())current = pd.DataFrame.from_dict(data["draftables"])# Remove players that are out or questionablecurrent = current[current.status == "None"]`

DraftKings has a Flex position that can be filled by any running back, wide receiver or tight end. Generally a player can only fill one role, so we need to add those eligible to the flex position back to our data frame and label them as position “FLEX”.

`# Add flex positionflex = current[current.position.isin(["RB","WR","TE"])].copy()flex.position = "FLEX"current = pd.concat([current, flex])`

The previous points the player scored is nested inside a “draftStatAttributes” field. For instance:

`"draftStatAttributes":[{"id":90,"value":"46.1","sortValue":"46.1"},{"id":-2,"value":"29th","sortValue":"29","quality":"High"}]`

For some reason its in a list. What we want is the “value” float in the list. It’s not always the first element so we need to find a float and extract that.

`def get_float(l, key):    """ Returns first float value from a list of dictionaries based on key. Defaults to 0.0 """    for d in l:        try:            return float(d.get(key))        except:            pass    return 0.0points = [get_float(x, "value") for x in  current.draftStatAttributes]current["points"] = points`

We now have everything we need. A few of the records are duplicated, so we can trim everything down and group by the fields we need: position, displayName, salary and points.

`availables = current[["position", "displayName", "salary",  "points"]].groupby(["position", "displayName", "salary",  "points"]).agg("count")availables = availables.reset_index()availables[availables.position=="QB"].head(15)` Available quarter backs

Since we have a constraint on position (i.e. only one QB, two RB, etc), we need to pivot our salaries and points on position. We also need to define the number of each position we will be constrained to.

`salaries = {}points = {}for pos in availables.position.unique():    available_pos = availables[availables.position == pos]    salary = list(available_pos[["displayName","salary"]].set_index("displayName").to_dict().values())    point = list(available_pos[["displayName","points"]].set_index("displayName").to_dict().values())    salaries[pos] = salary    points[pos] = pointpos_num_available = {    "QB": 1,    "RB": 2,    "WR": 3,    "TE": 1,    "FLEX": 1,    "DST": 1}`

If we look at the salaries variable, it’s just a dictionary of player names and salaries pivoted on position. points is the same.

`salaries{'QB': {'AJ McCarron': 4600,  'Aaron Rodgers': 6800,  'Alex Smith': 6000,  'Andrew Luck': 6200,  'Baker Mayfield': 4600,  'Ben Roethlisberger': 6900,  'Blaine Gabbert': 4700,  ...}`

Our salary cap is 50k.

`SALARY_CAP = 50000`

Now we have to define our variables. We want a variable for each position (e.g. QB). There will be an index for each player and the variable will be binary (0 or 1) meant to represent whether the player is included or excluded.

`_vars = {k: LpVariable.dict(k, v, cat="Binary") for k, v in points.items()}`

The _vars is a dictionary of position and an LpVariable.

`_vars{'QB': {'AJ McCarron': QB_AJ_McCarron,  'Aaron Rodgers': QB_Aaron_Rodgers,  'Alex Smith': QB_Alex_Smith,  'Andrew Luck': QB_Andrew_Luck,  'Baker Mayfield': QB_Baker_Mayfield,  'Ben Roethlisberger': QB_Ben_Roethlisberger,  'Blaine Gabbert': QB_Blaine_Gabbert,  ...}`

Now we can setup our problem. Our cost will just be our salaries indexed for the player times either 0 (not included) or 1 (included). Same is true for our reward. And finally we have a constraint on the positions available that we had defined earlier.

`prob = LpProblem("Fantasy", LpMaximize)rewards = []costs = []position_constraints = []# Setting up the rewardfor k, v in _vars.items():    costs += lpSum([salaries[k][i] * _vars[k][i] for i in v])    rewards += lpSum([points[k][i] * _vars[k][i] for i in v])    prob += lpSum([_vars[k][i] for i in v]) <= pos_num_available[k]    prob += lpSum(rewards)prob += lpSum(costs) <= SALARY_CAP`

Now that everything is setup, we can solve:

`prob.solve()`

The prob object is now solved. It has a variables function that has all our variables and each variable has a varValue which will be either 0 or 1. Below is a helper function to display the results.

`def summary(prob):    div = '---------------------------------------\n'    print("Variables:\n")    score = str(prob.objective)    constraints = [str(const) for const in prob.constraints.values()]    for v in prob.variables():        score = score.replace(v.name, str(v.varValue))        constraints = [const.replace(v.name, str(v.varValue)) for const in constraints]        if v.varValue != 0:            print(v.name, "=", v.varValue)    print(div)    print("Constraints:")    for constraint in constraints:        constraint_pretty = " + ".join(re.findall("[0-9\.]*\*1.0", constraint))        if constraint_pretty != "":            print("{} = {}".format(constraint_pretty, eval(constraint_pretty)))    print(div)    print("Score:")    score_pretty = " + ".join(re.findall("[0-9\.]+\*1.0", score))    print("{} = {}".format(score_pretty, eval(score)))`

And now for the optimal Fantasy Football picks for the week of 9/10/2018:

`summary(prob)Variables:DST_Jets_ = 1.0FLEX_DeSean_Jackson = 1.0QB_Ryan_Fitzpatrick = 1.0RB_Alvin_Kamara = 1.0RB_James_Conner = 1.0TE_Jared_Cook = 1.0WR_DeSean_Jackson = 1.0WR_Randall_Cobb = 1.0WR_Tyreek_Hill = 1.0---------------------------------------Constraints:2500*1.0 + 4900*1.0 + 5500*1.0 + 9500*1.0 + 6700*1.0 + 3600*1.0 + 4900*1.0 + 4600*1.0 + 7600*1.0 = 49800.0---------------------------------------Score:26.0*1.0 + 34.6*1.0 + 45.3*1.0 + 46.1*1.0 + 38.2*1.0 + 30.0*1.0 + 34.6*1.0 + 32.2*1.0 + 45.3*1.0 = 332.3`

Note that since these scores are selected based on results from last week, this is a very optimistic score. We essentially picked the best performers and their scores from last week, which likely contain a lot of outliers. In fact, if you check actual results, you’ll see that this lineup performed terribly this week, but I’ll leave that as an exercise to the reader.

Full source is available on Github.

# Alternative Methods

How much better is our search versus what most people do? Earlier I said that I would look at the numbers and do a relative kind of comparison, or just pick the top QB in points, then the top RB in points, and so on. That’s called a greedy search. To make it fair, we’ll consider all possible order permutations and pick the best one.

`def eval_players(players):    return sum([current[current.displayName == player].iloc.points for player in players])def greedy(val):    remaining = SALARY_CAP    positions = current.position.unique()    best_players = []    best_so_far = -float("inf")    for comb_position in permutations(positions):        players = []        for pos in comb_position:            for _ in range(pos_num_available[pos]):                available = current[(~current.displayName.isin(players)) &                                  (current.position == pos) &                                  (current.salary <= remaining)]                if available.size > 0:                    best = available.sort_values(val,ascending=False).iloc                    players.append(best.displayName)                    remaining -= best.salary        cur_eval = eval_players(players)        if cur_eval > best_so_far:            best_players = players            best_so_far = cur_eval    return best_players`

How does it do?

`greedy_points = greedy("points")print(greedy_points)eval_players(greedy_points)['Alvin Kamara', 'James Conner', 'Tyreek Hill', 'Michael Thomas', 'DeSean Jackson', 'Ryan Fitzpatrick', 'Jared Cook', 'Jets ']307.5`

About 25 points behind our optimal choice, which isn’t too bad. If you notice there is only 8 choices since we ran out of money by the time we got to the end. Let’s try a points per salary dollar.

`points_per_dollar = current.points / current.salarycurrent["points_per_dollar"] = points_per_dollargreedy_points = greedy("points_per_dollar")print(greedy_points)eval_players(greedy_points)['James Conner', 'Isaiah Crowell', 'DeSean Jackson', 'Randall Cobb', 'Tyreek Hill', 'Ryan Fitzpatrick', 'Jared Cook', 'Jets ', 'Austin Ekeler']300.40000000000003`

Here we got all 9 players, but a lower score. It’s likely due to leaving money on the table and although the bargains are good values, if you have some extra salary, you would do better picking up that more expensive marginal point.

# Final Thoughts

The linear programming method is not likely to bring you into elite status of fantasy football, but it greatly simplifies the problem. The basic assumption we were working on was that the points generated last week will be generated this week. Now all there is left is to solve for how many points we can expect a player to generate. This is a simpler problem that the original problem. We can run a regression, enrich the data with other sources, look at trailing averages or a combination of any number of techniques. The goal is now to predict expected points scored this week. And when we have our numbers, we can run it through this selection method.

## ml-everything

#### Machine Learning Everything

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just \$5/month. Upgrade