Drafting a Fantasy Basketball Team With Help From Statistics and a Knapsack

Published in

Fun with data and stats

Oct 21, 2014

You can find the accompanying draft optimization tool at http://fantasy-ballsack.herokuapp.com/

How does one draft the best possible fantasy basketball team? I like to think of this as a statistics and optimization problem rather than a sports knowledge problem (though knowledge only helps).

In this post I’ll present an approach to drafting an optimal team in a weekly head-to-head (H2H), 9-category league, an approach that I’ll be testing for my upcoming draft. This strategy by no means solves the problem holistically (in fact it’s far from complete), nor can I claim that it performs significantly better than any other approach, qualitative or quantitative. It’s just one that I find interesting and somewhat sensible, a thought experiment if you will.

For those unfamiliar with the rules of fantasy basketball, refer to the appendix below.

Setting the scene

Projection data — any draft strategy requires some notion of each player’s value. I’ll rely on third-party projections of players’ stats for the coming 2014–15 season, since such projections generally perform well; in this case, I’ll use CBS’s projections as an example. Our first difficult task is to determine how to value a player given his projected season performance in the 9 categories.

Single-number value — a player’s overall value must come from some combination of his 9 stat categories. To create a simple basis for comparison, I want to distill each player’s value down to a single number that represents his value relative to other players. That way I get not only a ranking of players, but also a good idea of the magnitude of the differences in value between them.

Scope of the draft pool — we don’t want to compare players to the entire pool of NBA players, since we really only care about relative value within the group of players most likely to get drafted. In a 16-team league with 13-player rosters, 16 × 13 = 208 players will be drafted, so I’m going to set the feasible draft pool at 250 players: CBS’s top 50 centers, top 100 guards, and top 100 forwards. Also note that I’ll be comparing projected season totals, which take into account, for example, that Kevin Durant will likely miss around 15 games with his current injury.

A valuation approach

Summing a player’s raw stats has no meaning, because the categories are on very different scales (a typical player scores far more points than he records steals). Instead, I want to draft players who are as far above average as possible. At this point, it’s informative to examine the frequency distribution of each of the 9 stats across our feasible draft pool.

One thing to note — these stats are not normally distributed: extreme outliers are present but infrequent. And because the distributions differ so widely in center and spread from one category to the next, we need to normalize the data before comparisons across categories make sense.

How do we measure how far a player is from average? For this analysis, I’ll simply use the number of standard deviations a player sits above the mean. This gives each player a score z for each category, calculated as

$$z = \frac{x - \mu}{\sigma}$$

where x is the player’s projected total, µ is the mean of our feasible draft pool, and σ is the pool’s standard deviation for that category.

In fact, this z-score (standard score) is commonly used to normalize data when comparing multiple variables. Normalization gives us an additional benefit: we can sum a player’s z-scores across the 9 categories without letting a single high-variance category dominate the sum. We’ve therefore arrived at our method for distilling a player’s value into a single number: calculate his z-score for each stat category, then take their sum, which gives his total standard deviations away from the mean:

$$Z = \sum_{c=1}^{9} z_c = \sum_{c=1}^{9} \frac{x_c - \mu_c}{\sigma_c}$$

where the subscript c indexes the 9 stat categories.
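A minimal Python sketch of this valuation, assuming a hypothetical three-player toy pool and only three of the nine categories (pts, reb, to). Two choices here are my assumptions rather than something stated above: σ is the population standard deviation of the pool (`statistics.pstdev`), and the turnover z-score is negated, since fewer turnovers are better.

```python
from statistics import mean, pstdev

# Hypothetical toy pool: three players, three of the nine categories.
# A real run would use all nine categories for the 250-player pool.
CATEGORIES = ["pts", "reb", "to"]
LOWER_IS_BETTER = {"to"}  # assumption: flip the sign for turnovers

players = {
    "A": {"pts": 2000.0, "reb": 500.0, "to": 200.0},
    "B": {"pts": 1500.0, "reb": 700.0, "to": 150.0},
    "C": {"pts": 1000.0, "reb": 300.0, "to": 100.0},
}


def total_z_scores(pool: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum each player's per-category z-scores, z = (x - mu) / sigma."""
    totals = {name: 0.0 for name in pool}
    for cat in CATEGORIES:
        values = [stats[cat] for stats in pool.values()]
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation of the pool
        for name, stats in pool.items():
            z = (stats[cat] - mu) / sigma
            totals[name] += -z if cat in LOWER_IS_BETTER else z
    return totals


scores = total_z_scores(players)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:+.2f}")
```

Using the sample standard deviation instead would rescale every z-score by the same factor (every category has the same number of players), so the ranking would not change.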

Here are the mean and standard deviation of each category for the 250 players in our pool:

Valuation results

Applying the z-score method gives each player a single value. The full results are in this Google Doc; here are the 50 highest-scoring players:

Some observations — Russell Westbrook is at 34, which is almost certainly too low, and Eric Bledsoe at 93 is arguably worse; John Wall seems low at 15; DeAndre Jordan at 5 is too optimistic; Kyle Korver at 24 and Robin Lopez at 39 are too high. For the most part, though, the rankings are unsurprising, which gives us a bit of confidence that we’re on the right track.

Looking at a distribution of scores makes it clear just how much better the top players are:

Observe that we also have a measure of the magnitude of value differences between players: #1 LeBron James is almost twice as valuable as #22 Paul Millsap. That is, LeBron’s summed z-score is nearly double Millsap’s, so across the 9 categories he should put up about twice as many total standard deviations above the mean.

By studying DeAndre Jordan’s stats, we can understand why he received such a high score. He is the most extreme outlier in both blocks and FG%, and a more extreme outlier than the top player in any other category (he also happens to top the rebounds category). He scores more than 4 standard deviations above the mean in blk and FG%; in every other category, no player scores above 4. But since we’re trying to identify extraordinary players, it isn’t unreasonable to score outliers highly. Note that we make no value judgement about the merits of dominating a few categories versus showing strength in many. In practice, a draft strategy should take this difference into account.

DeAndre Jordan’s strength here may also be due to overly optimistic projections by CBS, which has him throwing down 2.62 blk/game and 13.44 reb/game, compared to ESPN’s projections of 2.1 blk/game and 11.2 reb/game. This shows how much our results depend on the input data. We’ve implicitly treated a single set of projections as the players’ true future stats. We could improve on this by sampling projections from a broader range of sources and then taking the standard error into account in our calculations.
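As a rough sketch of that idea, the snippet below averages the two projections quoted above and computes a standard error for each category. With only two sources the standard error is itself a very crude estimate; the point is only to show the calculation, and in practice you would want more sources.

```python
from math import sqrt
from statistics import mean, stdev

# Per-game projections for DeAndre Jordan quoted above: [CBS, ESPN].
projections = {
    "blk": [2.62, 2.1],
    "reb": [13.44, 11.2],
}


def consensus(values):
    """Return (mean, standard error of the mean) across projection sources."""
    return mean(values), stdev(values) / sqrt(len(values))


for cat, values in projections.items():
    avg, se = consensus(values)
    print(f"{cat}: {avg:.2f} per game, standard error {se:.2f}")
# blk: 2.36 per game, standard error 0.26
# reb: 12.32 per game, standard error 1.12
```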

Solving the Knapsack Problem for draft picks

With a list of player values at our disposal, we’re now in a position to create a methodology for our draft. Snake-style drafts can be trivially solved by selecting the highest-value available player with every pick, so I’ll focus on auction-style drafts.

We can frame the auction draft problem like this: given a list of players with corresponding values, and a price that each player will likely go for, how do we pick players to maximize our total team value? Using these prices as a starting point, we can re-solve the problem with updated bid prices during the actual draft to find our new optimal team. This brings us to our final problem:

Given a set of players, each having a price and a value, pick at most N players to maximize the sum of player values, subject to the sum of their prices not exceeding our budget B.

It turns out that this problem is a close variant of a well-known optimization problem, the Knapsack Problem. How fortunate! It states: given a set of items, each having a value and a weight, pick the items to put in a knapsack that maximize total value without exceeding a weight limit. The 0–1 knapsack problem allows each item to be taken at most once. Fantasy drafting can be modeled as a 0–1 knapsack problem (players are the items, prices are the weights) with one additional constraint: the number of items (players) cannot exceed the maximum team size. Note that the optimal picks for a team size of at most N are the same as the optimal picks for a team size of exactly N, since the rest of the roster can be filled with $0 players.
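Written out, with symbols introduced here for illustration (v_i for player i’s value, i.e. his summed z-score; p_i for his expected price; x_i = 1 if we draft him and 0 otherwise; n players in the pool), the problem reads:

```latex
\begin{aligned}
\max_{x_1,\dots,x_n}\quad & \sum_{i=1}^{n} v_i x_i \\
\text{subject to}\quad & \sum_{i=1}^{n} p_i x_i \le B \\
& \sum_{i=1}^{n} x_i \le N \\
& x_i \in \{0, 1\}, \qquad i = 1, \dots, n
\end{aligned}
```

Dropping the second constraint recovers the ordinary 0–1 knapsack problem.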

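The knapsack problem has no closed-form solution (the 0–1 version is NP-hard), so it must be solved algorithmically; with whole-dollar prices, dynamic programming handles it easily. I used a modified version of the dynamic programming solution from Rosetta Code.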
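The snippet below is not the Rosetta Code solution or the code behind the app; it is a minimal Python sketch of the same dynamic programming idea, extended with the team-size cap. It assumes prices are whole dollars (averaged prices would need rounding), and the players A to E and the function name `best_team` are hypothetical. The table `best[k][b]` holds the most value reachable with at most k players and at most b dollars, so the work grows with players × roster spots × budget, roughly 250 × 13 × 200 = 650,000 steps for a 250-player pool, 13 roster spots, and a $200 budget.

```python
def best_team(players, budget, max_size):
    """0-1 knapsack with a cap on the number of items (players).

    players: list of (name, price, value) tuples; prices are whole dollars.
    Returns (best_total_value, names_of_chosen_players).
    """
    # best[k][b]: highest total value using at most k players and at most
    # b dollars, among the players processed so far.
    best = [[0.0] * (budget + 1) for _ in range(max_size + 1)]
    # took[i][k][b]: True if player i entered the optimum for (k, b)
    took = []
    for _, price, value in players:
        taken = [[False] * (budget + 1) for _ in range(max_size + 1)]
        # Descending k means best[k - 1] still holds values from before this
        # player, so each player is counted at most once.
        for k in range(max_size, 0, -1):
            for b in range(budget, price - 1, -1):
                candidate = best[k - 1][b - price] + value
                if candidate > best[k][b]:
                    best[k][b] = candidate
                    taken[k][b] = True
        took.append(taken)

    # Walk backwards through the decisions to recover the roster.
    chosen, k, b = [], max_size, budget
    for i in range(len(players) - 1, -1, -1):
        if k > 0 and took[i][k][b]:
            name, price, _ = players[i]
            chosen.append(name)
            k -= 1
            b -= price
    return best[max_size][budget], chosen[::-1]


# Hypothetical players: (name, expected price in $, summed z-score value).
players = [("A", 60, 10.0), ("B", 50, 8.0), ("C", 40, 7.5), ("D", 30, 5.0), ("E", 10, 1.0)]
print(best_team(players, budget=100, max_size=3))   # (17.5, ['A', 'C'])

# Re-solve after B's price drops to $20, as when live bids come in.
repriced = [(n, 20 if n == "B" else p, v) for n, p, v in players]
print(best_team(repriced, budget=100, max_size=3))  # (20.5, ['B', 'C', 'D'])
```

Lowering a single price (B from $50 to $20) swaps the optimal roster from A and C to B, C, and D, which is why the solver is worth re-running as prices change during the draft.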

The last piece is a list of likely player prices. For this, I used Yahoo’s auction draft prices averaged from drafts thus far this season.

Here are the results using the values calculated above, a budget of $200, and a maximum team size of 13:

To help you solve your own knapsack problems, I’ve made an app that does the calculations for you: http://fantasy-ballsack.herokuapp.com/

That’s it! This is the optimal lineup produced by our z-score/knapsack methodology. Of course, these are not necessarily the players you should end up bidding on: the solution changes with every new piece of price information. For example, if early in the draft LeBron is up for auction at $20, the knapsack algorithm (with every other player still set to his likely price by default) will choose LeBron for your team.

Shortcomings

We don’t take player positions into account — we can’t draft a team with all centers. Fantasy matchups mandate that certain player positions are filled. Basketball’s positions are flexible enough that we can get away with this level of analysis, but a more thorough analysis should take positions into account.

Percentage categories don’t take the number of attempts into account — a player may shoot 100% from the field during a week’s matchup, but that hardly helps if he made only 1 of 1 field goals that week, because team FG% is total makes divided by total attempts. Our analysis gives too much credit to players who have high percentages on a low number of attempts, and vice versa.
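A quick sketch of the underlying arithmetic, using hypothetical weekly shooting lines: because team FG% is total makes over total attempts, a perfect 1-for-1 week barely moves it, while an efficient high-volume shooter moves it much more.

```python
def team_fg_pct(lines):
    """Team FG% = total field goals made / total field goals attempted."""
    made = sum(m for m, _ in lines)
    attempted = sum(a for _, a in lines)
    return made / attempted


# Hypothetical weekly (made, attempted) lines for the rest of a roster.
base_team = [(30, 70), (25, 60), (20, 50)]
with_perfect_1_for_1 = base_team + [(1, 1)]   # 100% FG on one attempt
with_volume_shooter = base_team + [(30, 55)]  # about 54.5% FG on 55 attempts

for label, lines in [
    ("base team", base_team),
    ("+ 1-for-1 player", with_perfect_1_for_1),
    ("+ 30-for-55 player", with_volume_shooter),
]:
    print(f"{label}: {team_fg_pct(lines):.1%}")
# base team: 41.7%
# + 1-for-1 player: 42.0%
# + 30-for-55 player: 44.7%
```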

Category saturation isn’t accounted for — this method maximizes total player z-scores, but those standard deviations above the mean might all come from the same few categories. In our result, we drafted DeAndre Jordan, Blake Griffin, Andre Drummond, and Marc Gasol, meaning we’ve probably overinvested in blocks and rebounds. In a category league, winning rebounds by a huge margin counts the same as winning by one, so our money may have been better spent on other categories.

We used a single data source for projections — sampling from a pool of different projections might bring us slightly closer to a player’s true performance for the season. We also ignored the variance in a player’s performance entirely, instead assuming that his projected stats will happen with 100% certainty.

Outliers distort our variance — even a few outliers can inflate the standard deviation for a category, shrinking the z-scores of every other player in that category. A robust measure of variability that is less sensitive to outliers is the median absolute deviation (MAD). Some method using MAD may work better, but in my analysis the modified z-score based on MAD inflated outlier scores even more.
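For reference, a small sketch comparing the two scores on a hypothetical blocks-per-game column with one outlier. The modified z-score is assumed here to be the common Iglewicz and Hoaglin definition, 0.6745 × (x − median) / MAD, where the 0.6745 factor makes MAD comparable to σ for normally distributed data.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standard z-scores using the mean and population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


def modified_z_scores(values):
    """Iglewicz-Hoaglin modified z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(x - med) for x in values])  # median absolute deviation
    return [0.6745 * (x - med) / mad for x in values]


# Hypothetical blocks-per-game column with one extreme outlier at the end.
blocks = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 2.6]

print(f"standard z of outlier: {z_scores(blocks)[-1]:.2f}")
print(f"modified z of outlier: {modified_z_scores(blocks)[-1]:.2f}")
```

On this toy column the outlier’s standard z-score is about 2.3, while its modified z-score is about 6.7, consistent with the observation that MAD-based scores push outliers even higher.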

We don’t account for data dependence — the stat categories are not independently distributed data. A player with a high 3pm will also tend to have more points; a player with more rebounds likely has fewer assists. We don’t account for this, but it may not affect our analysis since we wanted to capture excellence in as many categories as possible without preference for category.

Comments on improvements are more than welcome!

Appendix — The basic rules of fantasy basketball

A league usually consists of 12–16 team managers, each of whom has 13 NBA players on their fantasy team. Each player contributes to 9 stat categories by which matchups are scored: field goal percentage (FG%), free throw percentage (FT%), 3-pointers made (3pm), rebounds (reb), assists (ast), steals (stl), blocks (blk), points (pts), and turnovers (to). If Kobe Bryant is on my fantasy team, then when he grabs a rebound, my team’s reb score goes up by 1; when he makes a steal, my stl score goes up by 1; and when he makes a 3-pointer, my team’s 3pm score goes up by 1, my team’s pts score goes up by 3, and my team’s FG% generally rises. Higher numbers are better in every category except turnovers, where lower is better. Each week, every team in the league faces off against one other team, and the team that wins a majority of the stat categories by the end of the week wins the matchup.

Managers can bid for players throughout the entire season, but each manager must select 13 players to begin with. The draft is the process at the beginning of the fantasy season that allows managers to select their starting roster.