The Algorithm: How to Predict the Outcome of Sports Games

Jacky Feng
4 min read · Dec 20, 2019


This past year, I joined my friend’s fantasy football league. My friend (currently working on his PhD in Mathematics at Northwestern) insisted that I join so that I could witness his “Algorithm” firsthand. The Algorithm is the name he has given to the football statistics prediction model that he uses during fantasy football season. Since creating it, he has finished in 2nd place and then 1st place, and this year he has an 80-point lead in his Finals matchup. You could say the Algorithm has served him well.

The actual calculations done by my buddy

As a result of his meteoric rise to fantasy football stardom, I decided to pick my buddy’s brain for an upcoming project. The goal is to simulate a game between two NBA teams, where the user can pick each team from any era or season. For example, one could simulate a game between the 1991 Chicago Bulls and the 2015 Golden State Warriors. A box score would be generated for each player on both rosters, and the result of the game would be calculated from how well each player does. I told him what I had in mind, and we came up with a solution that would not be too difficult to implement.

Who would win?

The simulated box score must be close enough to reality. For example, Steph Curry (the point guard for the Golden State Warriors) would realistically NOT grab 20 rebounds in a game. Is it possible? Sure, but it is highly improbable. The simulated stat line for each player must reflect what their stat line COULD plausibly be during the specific season that the user selected for that team. So how can we determine a plausible stat line for every single player on that team?

The solution my buddy came up with relies only on concepts from an Intro to Statistics class. Most of the commonly referenced season stats for NBA players are already “per-game” averages. In other words, we already know how much of a given statistic a player accumulates per game on average. For example, let’s look at Steph Curry again. During his legendary 2015–2016 season, Steph Curry averaged 30.1 points per game. We use that as the mean and assume that the number of points he scores per game is normally distributed. We then find the standard deviation of his points per game during that season (for instance, from his game-by-game totals). With the mean and the standard deviation, we can build a normal curve and use it to randomly generate a point total that plausibly reflects what Steph Curry could score in a game.
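To make the idea concrete, here is a minimal Python sketch of that step, not the actual code behind the project. The game log is made up (ten invented point totals chosen so that they average 30.1), and `simulate_stat` is a hypothetical helper name. Clamping at zero and rounding are my own assumptions, since a normal curve can produce negative or fractional values that a box score cannot.

```python
import random
import statistics

def simulate_stat(mean, std_dev, rng=random):
    """Draw one simulated per-game stat from a normal curve."""
    value = rng.gauss(mean, std_dev)
    # Assumption: clamp at zero and round, because box-score stats
    # are non-negative whole numbers.
    return max(0, round(value))

# Hypothetical game log: invented point totals, NOT real data.
game_log = [28, 35, 22, 41, 30, 26, 33, 19, 38, 29]

mean_points = statistics.mean(game_log)   # per-game average
std_points = statistics.stdev(game_log)   # sample standard deviation (n - 1)

rng = random.Random(42)  # seeded so the sketch is repeatable
simulated_points = simulate_stat(mean_points, std_points, rng)
print(f"mean={mean_points:.1f} std={std_points:.2f} simulated={simulated_points}")
```

The same helper would work for rebounds, assists, or any other per-game stat, given that stat's mean and standard deviation.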

Normal distribution: a symmetric, bell-shaped probability distribution in which values near the mean are more likely to occur than values farther away from the mean

Standard deviation: a measure of the amount of variation in a set of values

Formula for Sample Standard Deviation
Normal Curve
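For reference, these are the standard textbook forms of the two formulas named above, written in LaTeX. The symbols are my own labels: x_i is the player's stat in game i, n is the number of games, \bar{x} is the per-game average, and s is the sample standard deviation, which is then plugged into the normal curve f(x).

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
f(x) = \frac{1}{s\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\bar{x})^2}{2s^2}\right)
```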

In layman’s terms, this random number generator, when passed the mean and the standard deviation, will spit out a stat that Steph Curry (or any player) could plausibly accumulate in a game. If the generator is run 100 or 1000 times and each result is plotted on a histogram, the histogram will resemble a normal curve. Since Steph Curry’s PPG that season was 30.1, most of the results would be close to 30 points. The generator still allows for the occasional explosion of points, say a 50-point game, but that outcome is much less likely.
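A rough sketch of that experiment in Python is below. The mean of 30.1 comes from the paragraph above, but the standard deviation of 8.0 is a placeholder I picked for illustration, not Curry's actual figure. The text histogram uses 5-point bins, which is also an arbitrary choice.

```python
import random
from collections import Counter

MEAN_PPG = 30.1  # per-game average quoted in the article
STD_PPG = 8.0    # ASSUMED placeholder, not a real measured value

rng = random.Random(0)  # seeded so the sketch is repeatable
samples = [max(0, round(rng.gauss(MEAN_PPG, STD_PPG))) for _ in range(1000)]

# Group the simulated games into 5-point bins (0-4, 5-9, ...).
bins = Counter((s // 5) * 5 for s in samples)
for start in sorted(bins):
    # One '#' per 5 simulated games in the bin.
    print(f"{start:>2}-{start + 4:<2} {'#' * (bins[start] // 5)}")

# Count games within one standard deviation of the mean, and big nights.
near_mean = sum(1 for s in samples if abs(s - MEAN_PPG) <= STD_PPG)
big_games = sum(1 for s in samples if s >= 50)
print(f"within one std of the mean: {near_mean}, 50+ point games: {big_games}")
```

The bars should bunch up around 30 and thin out toward the edges, with 50-point games showing up only rarely.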

A potential result of the random number generator

This is a very crude way to simulate statistics. We are making a big assumption that a player’s stat is normally distributed over a season, but we felt it was a safe assumption to make. My buddy and I discussed incorporating more advanced statistics into the model (factoring an opponent’s defensive rating into the results of the random number generator, accounting for home court advantage, etc.), but for now this simple model produces a plausible outcome for a game between two teams.
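Putting the pieces together, a whole game could be sketched roughly like this. This is only an illustration under heavy assumptions: the rosters, player names, means, and standard deviations are all invented placeholders, only points are simulated, and `simulate_points` and `simulate_game` are hypothetical names rather than the project's real functions.

```python
import random

def simulate_points(mean, std_dev, rng):
    """One simulated point total, clamped at zero and rounded (assumption)."""
    return max(0, round(rng.gauss(mean, std_dev)))

def simulate_game(team_a, team_b, rng):
    """Return two box scores mapping player name -> simulated points."""
    box_a = {name: simulate_points(m, s, rng) for name, (m, s) in team_a.items()}
    box_b = {name: simulate_points(m, s, rng) for name, (m, s) in team_b.items()}
    return box_a, box_b

# Hypothetical rosters: name -> (mean points, standard deviation).
# Every number here is made up for illustration.
team_a = {"Guard A": (30.1, 8.0), "Forward A": (14.0, 5.0), "Center A": (9.0, 4.0)}
team_b = {"Guard B": (25.0, 7.0), "Forward B": (18.0, 6.0), "Center B": (11.0, 4.0)}

rng = random.Random(7)  # seeded so the sketch is repeatable
box_a, box_b = simulate_game(team_a, team_b, rng)
score_a, score_b = sum(box_a.values()), sum(box_b.values())

print("Team A:", box_a, "total", score_a)
print("Team B:", box_b, "total", score_b)
if score_a == score_b:
    print("Tied in this sketch; a fuller version would need overtime.")
else:
    print("Winner:", "Team A" if score_a > score_b else "Team B")
```

Rebounds, assists, and the rest of the box score could be generated the same way, one normal curve per player per stat.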
