Elo Against the Spread

Elo ROI in the NBA

Lucas Calestini
Sports Analytics
7 min read · Feb 4, 2020


Key Takeaways

  • Elo is a great baseline model for spread bets in the NBA
  • ROI for the 2019–20 season is +3.46% so far
  • Elo probabilities can be translated to spread using gamma distributions

When it comes to modelling, we tend to associate complexity with performance, convolution with quality. But not every neural network is informative, and basic rules can at times generalize well. Simple models can be very telling, particularly when their structure follows observed patterns.

For instance, if I wanted to guess which team would win a game tonight, I could simply look at the historical data and choose the team with the most wins in the current season. I probably wouldn’t be too far off the mark, simply because over a long period of time better teams are expected to win more.

Sounds like Elo

Over the past few weeks we have noticed that some predictions based on the Elo rating (particularly FiveThirtyEight’s NBA predictions) made really interesting picks on some games, and those picks were often right. At torneo, we have studied Elo-like rating systems (such as TrueSkill and Glicko) in the past, and implemented them on 3 years of soccer matches from all around the world to see which teams would come out on top. This time, we decided to take a closer look at Elo's potential in the NBA.

Cathy Ha summarized the model in her soccer article:

The Elo rating’s strength lies in its simplicity, and its ability to account for not only wins, but also how good the opposing team is.

We decided to test the power of a pure Elo model in sports betting, and whether it could work as a betting system with a fixed-wager strategy. Given two teams' current ratings, we can estimate the probability of each team winning, which yields a naive prediction of the outcome. However, we wanted to test Elo against the spread (the 50/50 line) rather than simply predict the winner, so we needed a reliable way to translate Elo’s probabilities into a spread.

Spread as a gamma distribution

One way to calculate the expected score difference from the probability of winning is through the past distribution of score differences in NBA games. We can fit the frequency of past score differences to the closest parametric probability distribution, and then use that distribution's quantile function (the inverse of its cumulative distribution) to turn a predicted probability into a score difference.

Sounds complicated? Let’s break down the process:

1. Probability of a team winning

Assuming teams A (home team) and B (visiting team), with ratings RA and RB, the Elo expected score of team A, or its probability of winning, is given by the standard Elo formula:

EA = 1 / (1 + 10^((RB - RA) / 400))

We can tweak the formula by adding an h factor for the home-team advantage to the home team's rating. If h = 100, we replace RA with RA + 100 in the formula above. Consequently, the expected score for team B is simply 1 - EA.
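As a minimal sketch of this step, the function below computes the home team's win probability, assuming the standard Elo expected-score formula with the conventional 400-point scale. The scale, the function name and the example ratings are assumptions for illustration; the article itself only specifies h = 100.

```python
def elo_win_probability(rating_home, rating_away, home_advantage=100.0, scale=400.0):
    """Expected score (win probability) of the home team under standard Elo.

    home_advantage is the article's h factor; scale=400 is the conventional
    Elo constant and is an assumption here.
    """
    adjusted_home = rating_home + home_advantage
    return 1.0 / (1.0 + 10.0 ** ((rating_away - adjusted_home) / scale))


# Hypothetical ratings, for illustration only.
p_home = elo_win_probability(1600.0, 1550.0)
p_away = 1.0 - p_home  # the visiting team's expected score is the complement
print(f"home: {p_home:.3f}, away: {p_away:.3f}")
```

With equal ratings and h = 100, the home team comes out at roughly a 64% favorite, which gives a feel for how much weight the home-court term carries.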

2. Past score distribution

Once we have the probabilities of each team winning the match, we turn our attention to the score distribution. We can assume that the expected score difference follows the frequency distribution of past score differences*.

Score difference distribution in the NBA from Aug/2014 to Jan/2020

Looking at the score difference from the 2014–15 season to Jan/2020 (when this article was written), we see that 50% of the games ended with a spread between 5 and 16 points, with a maximum of 61 points (Grizzlies @ Hornets in the 2017–18 season).

The distribution is important. It tells us, under our naive approach, how likely the spread is to exceed x points in any random game. For instance, looking at the box-plot above, we could say that there is a 75% chance the spread will be lower than 16 points, or a 25% chance it will be lower than 5.
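The quartile reading above can be sketched with Python's standard library. The margins below are made-up values, not the article's data, and the helper names are hypothetical.

```python
import statistics


def margin_quartiles(margins):
    """Return (Q1, median, Q3) of the absolute score differences."""
    q1, median, q3 = statistics.quantiles(margins, n=4)
    return q1, median, q3


def share_below(margins, threshold):
    """Fraction of games whose margin was strictly below the threshold."""
    return sum(1 for m in margins if m < threshold) / len(margins)


# Hypothetical absolute margins for 15 games (illustration only).
sample_margins = [2, 3, 5, 6, 7, 9, 9, 11, 12, 15, 16, 18, 21, 27, 34]

q1, median, q3 = margin_quartiles(sample_margins)
below_16 = share_below(sample_margins, 16)
print(f"Q1={q1}, median={median}, Q3={q3}, share below 16 points={below_16:.2f}")
```

On the real data set, Q1 and Q3 would correspond to the 5 and 16 points quoted above.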

3. Fitting to a Gamma Distribution

Rather than looking at the exact points for the past five and a half years, we can fit the score difference to a parametric probability distribution. We used a gamma distribution based on the observed frequency of the data, and the fit was very close to the actual distribution (slightly conservative). There are a few benefits of doing this (vs. looking at the actual scores):

  • We have a continuous distribution, so we can have an exact spread calculation for every fractional change in win probability. For instance, a 60% chance of winning should not have the same expected spread as a 61% chance of winning.
  • We remove the noise of outliers, or random gaps in the score distribution.

Actual scores vs. random variates (10,000) of fitted gamma distribution

With the parametric distribution, we can test for specific probability values and compare to what was observed in the past:

+-------------+---------------+-------------------+
| Probability | Actual Points | Gamma Func Points |
+-------------+---------------+-------------------+
| 25%         | 5 points      | 4.82 points       |
| 50%         | 9 points      | 8.91 points       |
| 75%         | 16 points     | 15.22 points      |
+-------------+---------------+-------------------+
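The fit-then-look-up idea can be sketched with the standard library alone. Several things here are assumptions: the article does not say how the gamma was fitted, so this sketch swaps in a method-of-moments fit; the data are synthetic draws from a gamma with made-up parameters (shape 1.7, scale 6.5), not the NBA margins; and the quantile is found by bisection on a series-based CDF. In practice, `scipy.stats.gamma` offers `fit` and `ppf` for the same job.

```python
import math
import random


def fit_gamma_moments(samples):
    """Method-of-moments estimate of (shape, scale) for positive samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean * mean / variance, variance / mean


def gamma_cdf(x, shape, scale):
    """P(X <= x), using the power series of the lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def gamma_ppf(p, shape, scale):
    """Quantile (inverse CDF) by bisection; p must lie in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if p == 0.0:
        return 0.0
    lo, hi = 0.0, shape * scale
    while gamma_cdf(hi, shape, scale) < p:  # widen until the bracket holds p
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Synthetic stand-in for past margins (hypothetical parameters, not the article's fit).
rng = random.Random(0)
synthetic_margins = [rng.gammavariate(1.7, 6.5) for _ in range(20000)]

shape_hat, scale_hat = fit_gamma_moments(synthetic_margins)
q25, q50, q75 = (gamma_ppf(p, shape_hat, scale_hat) for p in (0.25, 0.50, 0.75))
print(f"shape={shape_hat:.2f}, scale={scale_hat:.2f}")
print(f"25%: {q25:.2f}  50%: {q50:.2f}  75%: {q75:.2f} points")
```

Run against the real margins, the three printed quantiles would play the role of the "Gamma Func Points" column above.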

4. Winning probability and spread

Now that we have the distribution, we can allocate the spread to the winning team. To do so, we map the distance between the win probability and 50% (which is at most 50 percentage points) onto the full 0–100% range of the score distribution. For instance, if the probability of a team winning is exactly 50%, that represents a spread of 0 points. Likewise, if a team has a ~100% probability of winning, the spread would tend to infinity, or somewhere around the 61 points we observed in the past.

In other words, every 1% deviation from a toss-up (50%) represents 2% in spread probability (this is due to the symmetric characteristics of the scoreboard). Let’s look at some examples below:

  • If the Lakers have a 63.5% chance of winning a game, that represents |50% - 63.5%| * 2 = 27% in our gamma function, or 5.11 points. As they are the favorite, we would usually write the spread as -5.11 points.
  • If the Warriors have a 30% chance of winning, then the spread is at 40% of the distribution function (|50% - 30%| * 2 = 40%), or 7.14 points. As they are the underdog, the spread would be written as +7.14 points.
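The mapping in the two examples above can be sketched as follows. The quantile function is passed in as a parameter; to keep the block self-contained it is demonstrated with an exponential distribution (a gamma with shape 1) and a made-up scale, which is a stand-in and not the article's fitted gamma, so the point values it prints will not match 5.11 and 7.14. All names are hypothetical.

```python
import math


def spread_percentile(win_probability):
    """Distance from a toss-up, doubled: 63.5% -> 0.27, 30% -> 0.40."""
    return abs(win_probability - 0.5) * 2.0


def predicted_spread(win_probability, margin_quantile):
    """Spread in betting notation: negative for a favorite, positive for an underdog."""
    points = margin_quantile(spread_percentile(win_probability))
    return -points if win_probability > 0.5 else points


def exponential_quantile(q, scale=12.0):
    """Stand-in quantile function (gamma with shape 1); scale is hypothetical."""
    if q >= 1.0:
        return math.inf  # a certain win pushes the spread towards infinity
    return scale * math.log(1.0 / (1.0 - q))


for team, p in (("Lakers", 0.635), ("Warriors", 0.30)):
    pct = spread_percentile(p)
    spread = predicted_spread(p, exponential_quantile)
    print(f"{team}: percentile={pct:.2f}, spread={spread:+.2f}")
```

Swapping `exponential_quantile` for the quantile function of the fitted gamma is the only change needed to reproduce the article's point values.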

Is it profitable?

We can decide which side to bet on by comparing our predicted spreads to Vegas odds. Then, we can run a simulation for past games to see what the ROI would look like given the Elo predictions. Below is a summary of the net return (net as of today / total wagered) and the ROI evolution through time.

If we had bet $1.00 on every game since the first game of the 2014–15 season, we would have won 51.29% of bets. The total amount wagered adds up to $6,842, with a net return of -$101.54 and an ROI of -1.48%. Although we win more than half of the time, it would not have been profitable because of the vigorish. However, if we look closer, we do see that for some seasons the net return is positive. For instance, for the current season, as of today, we would be up $23.96 on $692.00 wagered.
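A fixed-wager backtest of this kind can be sketched as below. The side-selection rule is an assumption (the article does not spell out its exact rule): bet the home team when the model's home spread is lower than the Vegas line, otherwise bet the visitor. The three games are made-up, and every bet is settled at the mean decimal odds of 1.91.

```python
def pick_side(model_home_spread, vegas_home_spread):
    """Assumed rule: back the home team if the model rates it stronger than Vegas does.

    Ties go to the visitor here, which is arbitrary; a real system might skip them.
    """
    return "home" if model_home_spread < vegas_home_spread else "away"


def settle_bet(side, home_margin, vegas_home_spread, stake=1.0, decimal_odds=1.91):
    """Net profit of one spread bet; home_margin is home points minus away points."""
    adjusted = home_margin + vegas_home_spread
    if adjusted == 0:
        return 0.0  # push: the stake is returned
    home_covers = adjusted > 0
    won = home_covers if side == "home" else not home_covers
    return stake * (decimal_odds - 1.0) if won else -stake


def roi(net_return, total_wagered):
    """Return on investment as a fraction of the amount wagered."""
    return net_return / total_wagered


# Hypothetical games: (model home spread, Vegas home spread, actual home margin).
games = [(-5.1, -3.5, 8), (7.1, 4.5, -2), (-1.0, -6.5, 3)]

net = sum(settle_bet(pick_side(m, v), margin, v) for m, v, margin in games)
print(f"net={net:+.2f}, ROI={roi(net, len(games)):+.2%}")

# The article's reported totals, plugged into the same ROI formula.
print(f"all seasons: {roi(-101.54, 6842):+.2%}, 2019-20: {roi(23.96, 692):+.2%}")
```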

2019–20 Season Return on Elo Spread Model

A few things to note about the calculations:

  • We used a $1.00 wager size for all games
  • We only used Regular Season games
  • The vigorish in our odds data was on average 9% of net wins (mean odds on the spread were -110 [American] / 1.91 [decimal])
  • We used an Elo model with h=100 for home teams and a logistic decay for the score difference** after each game
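To make the vigorish note concrete, here is a small sketch of the odds arithmetic. The function names are hypothetical; the conversions themselves are the standard ones between American and decimal odds.

```python
def american_to_decimal(american_odds):
    """Convert American odds (e.g. -110 or +150) to decimal odds."""
    if american_odds < 0:
        return 1.0 + 100.0 / abs(american_odds)
    return 1.0 + american_odds / 100.0


def break_even_win_rate(decimal_odds):
    """Share of bets that must win for a fixed-wager strategy to break even."""
    return 1.0 / decimal_odds


decimal_odds = american_to_decimal(-110)
net_win_per_dollar = decimal_odds - 1.0
print(f"decimal odds: {decimal_odds:.3f}")
print(f"net win per $1 staked: {net_win_per_dollar:.3f}")
print(f"break-even win rate: {break_even_win_rate(decimal_odds):.2%}")
```

At -110 a winning $1.00 bet nets about $0.91 instead of $1.00, which is the roughly 9% haircut mentioned above, and the break-even win rate is about 52.4%. That sits above the 51.29% hit rate reported for the full sample, which is consistent with the negative overall ROI.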

Conclusion

It turns out Elo is a great baseline model for NBA spread prediction. By looking at past wins and past score difference against teams at different levels, we are able to understand fairly well the dynamics of upcoming games, and how teams stand, comparatively. In sports betting, it would be a great baseline model, which likely wouldn’t make anyone too rich or too poor, but rather bounce up and down across different season periods. Given the simplicity, Elo is still a heck of a model in the age of AI.

To see our Elo predictions for upcoming NBA games, check our torneo page.

*We used the absolute score difference, which means the spread from the winning team’s perspective.

**The logistic decay is a factor in the range [1, 2) that scales the Elo rating update based on the score difference. It ranges from 1.0 (win by 1 point) to ~2.0 (win by ~60 points).
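One way such a factor could look is sketched below. The exact function used in the article is not given, so the logistic form, the steepness of 0.1 and the K-factor of 20 are all assumptions chosen only to reproduce the stated range (1.0 for a 1-point win, close to 2.0 for a ~60-point win).

```python
import math


def margin_multiplier(point_margin, steepness=0.1):
    """Hypothetical logistic factor in [1, 2): 1.0 at a 1-point win, near 2.0 at ~60."""
    return 2.0 / (1.0 + math.exp(-steepness * (point_margin - 1)))


def update_ratings(rating_winner, rating_loser, expected_winner, point_margin, k=20.0):
    """Zero-sum Elo update scaled by the margin of victory.

    expected_winner is the winner's pre-game win probability; k is an assumed K-factor.
    """
    delta = k * margin_multiplier(point_margin) * (1.0 - expected_winner)
    return rating_winner + delta, rating_loser - delta


# Hypothetical example: a 64% favorite wins by 12 points.
new_winner, new_loser = update_ratings(1600.0, 1550.0, 0.64, 12)
print(f"multiplier: {margin_multiplier(12):.3f}")
print(f"winner: {new_winner:.1f}, loser: {new_loser:.1f}")
```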
