Predicting The Results of NBA Matches using ELO Rating System and Pythagorean Expectation

Sameer Koppolu
Aug 12, 2022


Introduction

As sports fans, we’ve all tried at some point to predict the outcomes of our favorite sports events, and the NBA is no exception. Season after season, fans and analysts make their case for who will win each matchup that the season offers. So why not put some Sports Analytics knowledge to use to predict the outcomes of matches? This article focuses on using the Pythagorean Expectation and the ELO Rating System to predict the outcomes of NBA games in the regular season and the postseason.

The ELO Rating System

Created by Arpad Elo, the ELO Rating System is a zero-sum rating system that was originally developed to calculate the relative skill level among Chess players. Today, the system is used in several sports, including the NBA, to measure the strength of teams on a game-by-game basis.

For example, for a matchup between 2 NBA teams, A & B, each team will have an ELO Rating prior to the start of the game. Based on this, the win probability of each team is determined. If Team A wins, its ELO Rating increases by a certain amount, while the ELO Rating of Team B decreases by the same amount. That’s why it is a zero-sum rating system.
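To make the mechanics concrete, here is a minimal sketch of the standard ELO win probability and the zero-sum update. It leaves out the home team advantage and margin of victory adjustments, and the update factor `k=20` and the ratings used are assumptions for illustration, not values taken from this article's dataset.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard ELO formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Return the new (rating_a, rating_b) after one game.

    k is an assumed update factor. Whatever team A gains, team B loses,
    which is what makes the system zero-sum.
    """
    expected_a = elo_win_probability(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    shift = k * (actual_a - expected_a)
    return rating_a + shift, rating_b - shift


# Hypothetical pre-game ratings for teams A and B.
p_a = elo_win_probability(1600.0, 1500.0)
new_a, new_b = elo_update(1600.0, 1500.0, a_won=True)
print(round(p_a, 3), round(new_a, 1), round(new_b, 1))
```

Because the favorite was already expected to win, its rating rises only slightly here. An upset would move both ratings by a larger amount.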

Let’s look at how the ELO Rating of the Golden State Warriors varied through the 2021–22 NBA season.

As the season began, the Warriors had an ELO Rating of 1529. Once the franchise entered the postseason, its rating climbed, and the team finished on a high with an ELO Rating of 1712. A point to note is that going into Game 6 of the NBA Finals, the Warriors had a lower win probability than the Boston Celtics, but they still managed to win the game and the title. So from the perspective of the ELO Rating System, the Golden State Warriors were the underdogs in that game!

For the NBA, Project FiveThirtyEight has implemented a working model of the system. Its model includes a home team advantage factor and a margin of victory factor, both of which affect the calculation of the win probability for each team in a given matchup. The dataset that holds the ELO Ratings of teams and the win probabilities is also available from Project FiveThirtyEight.

For convenience, I have linked the dataset and the Python code that calculates the win probabilities and ELO Ratings for each matchup in a Google Colab file (link at the end of the article).

Using the win probability obtained from the ELO Ratings before a matchup, we can predict whether the team in question wins (i.e. if its win probability is greater than 0.5, it is expected to win). Applying this logic to every game gives a set of predicted results, which can be compared to the actual game results using a confusion matrix. The confusion matrix can then be used to estimate the accuracy of the ELO Rating System.
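The thresholding and comparison step can be sketched as follows. The function name and the sample probabilities and results are made up for illustration and are not taken from the linked dataset.

```python
def confusion_and_accuracy(win_probs, actual_wins):
    """Threshold win probabilities at 0.5 and compare with actual results.

    actual_wins holds 1 for a win and 0 for a loss.
    Returns the confusion-matrix counts and the overall accuracy.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for prob, actual in zip(win_probs, actual_wins):
        predicted = 1 if prob > 0.5 else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1  # predicted win, actual win
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # predicted win, actual loss
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1  # predicted loss, actual loss
        else:
            counts["fn"] += 1  # predicted loss, actual win
    accuracy = (counts["tp"] + counts["tn"]) / len(actual_wins)
    return counts, accuracy


# Made-up pre-game win probabilities and actual results.
probs = [0.71, 0.45, 0.62, 0.30, 0.55]
results = [1, 0, 0, 1, 1]
counts, accuracy = confusion_and_accuracy(probs, results)
print(counts, accuracy)
```

Accuracy is simply the share of games on the diagonal of the confusion matrix, i.e. correctly predicted wins plus correctly predicted losses, divided by all games.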

As the confusion matrix shows, the ELO Rating System predicts the outcomes of matchups with an accuracy of 64.17%.

The Pythagorean Expectation

Developed by Bill James for Baseball, the Pythagorean Expectation is used to determine the Winning Percentage of a team based on the number of runs the team has scored and the number of runs the team has allowed. In the case of basketball, the points scored and conceded by each team are considered instead of runs scored.

Here, the RHS of the equation is the Pythagorean Expectation. In essence, the points scored and conceded by the team are used to estimate its Win Percentage. If the estimate is a good one, the Pythagorean Expectation should correlate strongly with the team’s actual Win Percentage.
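As a rough sketch, the Pythagorean Expectation can be computed as points scored raised to an exponent, divided by the sum of points scored and points conceded, each raised to the same exponent. The exponent of 2 below is Bill James’s original baseball value and is only an assumption here. Basketball analysts often use larger exponents, and this article does not state which value it uses. The point totals are hypothetical.

```python
def pythagorean_expectation(points_for: float, points_against: float,
                            exponent: float = 2.0) -> float:
    """Estimated win percentage from points scored and points conceded.

    exponent=2.0 is an assumption (the original baseball value).
    """
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


# Hypothetical season totals: 9000 points scored, 8700 conceded.
pyth = pythagorean_expectation(9000, 8700)
print(round(pyth, 3))
```

A team that scores exactly as many points as it concedes gets an expectation of 0.5, and the value rises above 0.5 as soon as it outscores its opponents.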

Consider the recent 2021–22 NBA Regular Season. If we calculate the Pythagorean Expectation and Win Percentage for each team and plot them on the X & Y axis respectively, we get the following graph.

As expected, there is a strong correlation between the Pythagorean Expectation and the Win Percentage: the correlation coefficient between the 2 variables is 0.946.
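For readers who want to reproduce this kind of check, the correlation coefficient can be computed as sketched below. The five pairs of values are invented for illustration and are not the 2021–22 team figures.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented Pythagorean Expectations and Win Percentages for five teams.
pyth_values = [0.62, 0.55, 0.48, 0.41, 0.35]
win_pcts = [0.65, 0.52, 0.50, 0.40, 0.30]
r = pearson_correlation(pyth_values, win_pcts)
print(round(r, 3))
```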

Extending this further, we can use the Pythagorean Expectation to predict the game results for each team in the regular season and postseason. To do this, we make use of the logit model. This model essentially calculates the natural log of the odds of a team winning a game based on the independent variables (i.e. Pythagorean Expectation and Home Team Advantage).

Here is the formula for the logit model.

As shown above, the natural log of the odds is modeled by an expression that has the same form as a linear regression (the RHS of the equation). We can then convert the natural log of the odds into the probability of a win. If that probability is greater than 0.5, the game is predicted as a win; otherwise, it is predicted as a loss.

It should be noted that the logit model is used because the game result is a categorical variable that can take just two distinct values (i.e. win or loss), unlike the continuous dependent variable of an ordinary linear regression.
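Written out with the two independent variables described in this article, the model has the form ln(p / (1 - p)) = b0 + b1 * pyth + b2 * home, where p is the probability of a win. The sketch below converts the log of the odds back into a probability and applies the 0.5 rule. The coefficient values are hypothetical placeholders, not the fitted values from the linked notebook.

```python
import math


def win_probability(pyth: float, home: int, intercept: float,
                    coef_pyth: float, coef_home: float) -> float:
    """Turn the linear log-odds of a logit model into a win probability."""
    log_odds = intercept + coef_pyth * pyth + coef_home * home
    # Inverting ln(p / (1 - p)) = log_odds gives the logistic function.
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict_win(probability: float) -> int:
    """1 (win) if the probability exceeds 0.5, otherwise 0 (loss)."""
    return 1 if probability > 0.5 else 0


# Hypothetical coefficients, NOT fitted values from the article.
p = win_probability(pyth=0.55, home=1, intercept=-2.0,
                    coef_pyth=4.0, coef_home=0.4)
print(round(p, 3), predict_win(p))
```

A probability above 0.5 corresponds exactly to positive log-odds, so the decision rule can equally be read off the sign of the linear expression.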

Once again, the dataset and the Python code that employs the logit model have been linked in a Google Colab file (link at the end of the article).

The variable ‘win’ is the dependent variable. It records the result of every game that every team has played over the entire season, and it takes only two values: 1 if the result is a win for the team in question, and 0 if it is a loss.

The variable ‘pyth’ is the cumulative Pythagorean Expectation of the team at each game. The variable ‘home’ indicates whether the team in question is playing at home (home = 1 for home games and 0 for away games). Together, ‘pyth’ and ‘home’ are the independent variables.
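One way to build the ‘pyth’ column is to keep running totals of points scored and conceded and recompute the Pythagorean Expectation after each game. The sketch below makes two assumptions that the article does not confirm: the exponent is 2, and each game’s own score is included in its total. For genuine forecasting, one would use only the totals from before the game. The scores are made up.

```python
def cumulative_pyth(points_for, points_against, exponent=2.0):
    """Pythagorean Expectation after each game, from season-to-date totals.

    Assumption: the current game's points are included in the running totals.
    """
    values = []
    total_for = 0
    total_against = 0
    for scored, allowed in zip(points_for, points_against):
        total_for += scored
        total_against += allowed
        f = total_for ** exponent
        a = total_against ** exponent
        values.append(f / (f + a))
    return values


# Made-up scores for one team's first three games.
team_points = [110, 98, 120]
opponent_points = [100, 105, 111]
home_flags = [1, 0, 1]  # 1 = home game, 0 = away game

pyth_column = cumulative_pyth(team_points, opponent_points)
rows = [{"pyth": p, "home": h} for p, h in zip(pyth_column, home_flags)]
print(rows)
```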

Let’s now look at the model that is generated.

Using the independent variables, the model generates values for the natural log of the odds of a win for every game that every team plays. From this, the probability of a win is calculated for every game. As stated earlier, a game is predicted as a win for the team in question only if the obtained probability is greater than 0.5; otherwise, it is predicted as a loss.

We can then generate a confusion matrix to analyze the accuracy of the game results generated by the model.

As can be seen from the confusion matrix, the model predicted game results with an accuracy of 63.08%.

Python Code

As stated earlier, a Google Colab notebook containing the necessary Python code is linked below.

URL: https://gist.github.com/sameerprasadkoppolu/d0300cfd0a682ae0c00319f49d85c952

As for the dataset, it can be accessed and downloaded using the link below.

URL: https://drive.google.com/file/d/1xTBQN-2cyl3LEmH8EAsUF60eVwoBEtej/view?usp=sharing

And Finally…

Remember that this article is just an explanatory one. It is in no way a guarantee that anyone who bets on the results of NBA matches will come out victorious, nor is it an encouragement to get into sports betting in the first place.
