Predicting the NBA playoffs using data

Kruthika Kumar
6 min read · Jun 12, 2023


“The Celtics are going all the way this year, it’s finally time”

“This is KD’s comeback year, there’s no stopping him and Booker”

It’s NBA playoff time in 2023! Amid the excitement and shock (at what will probably go down as one of the most exciting playoffs ever!), I decided to finally take the plunge and answer the question “How predictable are the playoffs, really?” by diving into 30 years of NBA playoff data and seeing what came up.

This is a post about my predictions, and the real treasure — the insights I found along the way.

1. What makes a winning playoff team?

I looked at 450 playoff series between 1989–2018, and found a few key indicators. None of these will shock you (at least not if you’ve heard NBA commentators blaming everything from altitude & oxygen concentration to whether the look in the players’ eyes in the 4th quarter was “killer” enough).

i) Regular season offensive & defensive rating

These basically represent the points scored per 100 possessions on offense, and points allowed per 100 possessions on defense.

Conference Finals seem to have the most even match-ups

Note: Net rating = offensive rating - defensive rating

The team with the better offensive & defensive ratings in that regular season has much better odds of winning — seems fair.
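
As a quick illustration of how these ratings are computed, here is a minimal sketch. The season totals are made up, and it assumes (as a simplification) that a team and its opponents use the same number of possessions over a season.

```python
def rating_per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions."""
    return 100.0 * points / possessions


def net_rating(off_rtg: float, def_rtg: float) -> float:
    """Net rating = offensive rating - defensive rating."""
    return off_rtg - def_rtg


# Hypothetical season totals, not real team data.
points_scored, points_allowed, possessions = 9000, 8600, 8000

off_rtg = rating_per_100(points_scored, possessions)   # 112.5
def_rtg = rating_per_100(points_allowed, possessions)  # 107.5
net = net_rating(off_rtg, def_rtg)                     # 5.0
print(off_rtg, def_rtg, net)
```

A lower defensive rating is better, so a positive net rating means the team outscores its opponents per 100 possessions.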

Fun facts:

  • Best Net Rating: The 2 best scores of 13.0 and 11.8 were registered by (surprise, surprise) the Jordan Bulls (95–96 & 96–97)
  • Best offense: 20–21 Brooklyn Nets (Off. rtg: 119)
  • Best defence: 03–04 San Antonio Spurs (Def. rtg: 94)

ii) Regular season win-loss ratio

Stronger regular season performers are likely to win playoff series — also seems expected and along similar lines as the Ratings — but these 2 don’t quite tell the full story.

iii) Playoff experience & past playoff performance

What this is: For each team’s top 8 players for a season:
- Their previous 5 years’ cumulative playoff minutes
- Their cumulative playoff stat totals (Pts+Ast+Rbs+Stl+Blk) during that time
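
The two experience measures above can be sketched roughly as follows. This is an illustration, not the code behind the model: the `PlayoffLine` record, the `team_experience` helper and the sample numbers are all hypothetical, and the roster of top 8 players is passed in because the rule for picking them isn't specified here.

```python
from dataclasses import dataclass


@dataclass
class PlayoffLine:
    """One player's playoff totals for one season (hypothetical record layout)."""
    player: str
    season: int      # year in which those playoffs were played
    minutes: float
    pts: int
    ast: int
    reb: int
    stl: int
    blk: int


def team_experience(top8, history, season, window=5):
    """Sum playoff minutes and Pts+Ast+Rbs+Stl+Blk over the `window` seasons before `season`."""
    names = set(top8)
    minutes, stats = 0.0, 0
    for line in history:
        in_window = season - window <= line.season < season
        if line.player in names and in_window:
            minutes += line.minutes
            stats += line.pts + line.ast + line.reb + line.stl + line.blk
    return minutes, stats


# Made-up playoff history for illustration only.
history = [
    PlayoffLine("Player A", 2017, 300.0, 100, 20, 30, 5, 5),
    PlayoffLine("Player A", 2010, 400.0, 150, 30, 40, 6, 4),  # too old, ignored
    PlayoffLine("Player B", 2016, 200.0, 80, 10, 40, 3, 7),
    PlayoffLine("Player C", 2017, 250.0, 90, 15, 20, 2, 1),   # not on the roster, ignored
]

exp_minutes, exp_stats = team_experience(["Player A", "Player B"], history, season=2018)
print(exp_minutes, exp_stats)
```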

Here’s how the series break down by playoff experience.

68% of all playoff series between 1989–2018 were won by the team with more playoff experience, and a larger experience gap improves the winning odds.

  • Most experienced team in this period: 1988–89 Lakers (1135 combined playoff minutes over the previous 5 years)
  • Biggest advantages won: Heat over Hornets (2014), Heat over Bucks (2013), Warriors over Pelicans (2018)
  • Biggest advantages lost: Blazers’ loss to the Spurs (1993), Spurs’ loss to the Grizzlies (2011)

iv) Average age of players

Older teams tend to do better when the age gap between the teams is 2+ years.

Fun fact: The biggest winning & losing age gaps involved the same 2 teams, 1 year apart.

  • Highest winning age gap: 2011 West Finals - Mavs (31 yrs) def. Thunder (23.6 yrs)
  • Highest losing age gap: 2012 West 1st round - Thunder (25 yrs) def. Mavs (31.6 yrs)

2. Predictions

Using these parameters, I i) trained a model to predict the winner of each playoff series, and ii) used the series predictions to get an overall championship-winning probability.

i) Predicting playoff series winners

The model was able to pick the winner 73% of the time for the 60 playoff series of 2019–2022, which it hadn’t been trained on.

Biggest series upsets from 2019–22:

  • Heat def. Bucks (2020) - 21% probability
  • Hawks def. 76ers (2021) - 23%
  • Heat def. Celtics (2020) - 25%

There’s just something about the Heat (let all doubts about playoff Jimmy cease immediately).

ii) Predicting the champion

Here are the model’s top 3 picks for the last 4 playoffs.

Championship probability prediction 2019–22

Clearly, predicting the final champion is significantly more difficult. Over the course of an entire playoff stretch, the intangibles & finer details start mattering a whole lot more than what the numbers can capture. In 2021, none of the top 3 predicted teams even made the Finals. In 2019 & 2022, the Finals had 2 of the top 3 predicted teams.

Odds-defying playoff runs

  • Over-achievers: Heat (2020) — Runners-up with 11th best odds, Bucks (2021) — Champs with 7th best odds
  • Under-achievers: Bucks (2020), Jazz (2021), Suns (2022)

Now, for my 2023 predictions:

2023 championship probability (results as of 10-June)

Condolences, Celtics fans. Seems like this was your best year yet, odds-wise. Also, I’m so glad I hopped on the Jokic bandwagon in time to follow & enjoy the Nuggets’ amazing 2023 run.

3. What should we make of this?

Stats can give you probabilities — odds for a team to win/lose. But that’s not why we watch sports.

We watch sports because we want to see great players & teams defy the odds.

I won’t remember most of the series that went down as predicted. But many that didn’t will stay with me for a long time: LeBron cementing his legacy with the absolute insanity of Games 5–7 of the 2016 Finals, Luka’s “everybody acts tough when they up” + Game 7 demolition job on the Suns, and Giannis putting his name up among the greats in 2021 with “the block”, leading the Finals charge back from an 0–2 start.

As a fan, I hope my model’s prediction accuracy never hits 100%.

If you’d like some more details of how I went about doing this, read on.

4. How I went about this

Creating the playoff series prediction model
(pro-tip: use a catchy & easy-to-remember model name like NBAMLPSPM_xgboost_v4.1_final_final)

  1. Watch/read 2,817,904 basketball-related videos & articles to try and figure out which data points might make sense for your prediction
  2. Now take 30 years (450 series) of playoff series data to train your model (I used 1989–2018)
  3. Note: Here, I structured the input features as {(Parameter value for team 1) / (Parameter value for team 2)}, and the dependent variable as a 0/1 flag (Did team 1 win the series?). I also added the reverse of each entry (i.e. Team2/Team1 was a training row separate from Team1/Team2)
  4. Keep one set of data outside the training set (4 playoff seasons — 2019–22, in my case), for Out Of Time (OOT) validation — basically testing if your model can predict for time periods it wasn’t trained on.
  5. Train a model & evaluate its prediction performance on individual series (both within the period, as well as OOT).
  6. Note: The model was able to achieve an F1-score of 0.80 & 0.72 for in-time & OOT validation sets, respectively.
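
The data-preparation steps above can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the actual pipeline: the feature names, the sample numbers and the helper functions (`ratio_row`, `training_rows`, `split_out_of_time`, `f1_score`) are all hypothetical, and the model-fitting step itself is left out.

```python
# Hypothetical feature names; every value is assumed positive so ratios make sense.
FEATURES = ["off_rtg", "def_rtg", "win_pct", "playoff_minutes", "avg_age"]


def ratio_row(team_a, team_b, a_won):
    """One training row: team A's stats divided by team B's, plus a 0/1 label."""
    # No guard against a zero denominator (e.g. zero playoff minutes);
    # real data would need one.
    features = {name: team_a[name] / team_b[name] for name in FEATURES}
    return features, int(a_won)


def training_rows(series_list):
    """Emit each series twice: Team1/Team2 and the mirrored Team2/Team1."""
    rows = []
    for s in series_list:
        rows.append((s["year"], *ratio_row(s["team1"], s["team2"], s["team1_won"])))
        rows.append((s["year"], *ratio_row(s["team2"], s["team1"], not s["team1_won"])))
    return rows


def split_out_of_time(rows, first_oot_year=2019):
    """Hold out every season from `first_oot_year` onwards for OOT validation."""
    train = [r for r in rows if r[0] < first_oot_year]
    oot = [r for r in rows if r[0] >= first_oot_year]
    return train, oot


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up team stats for illustration only.
strong = {"off_rtg": 112.0, "def_rtg": 104.0, "win_pct": 0.75,
          "playoff_minutes": 900.0, "avg_age": 28.0}
weak = {"off_rtg": 108.0, "def_rtg": 108.0, "win_pct": 0.5,
        "playoff_minutes": 300.0, "avg_age": 25.0}
series_list = [
    {"year": 2018, "team1": strong, "team2": weak, "team1_won": True},
    {"year": 2019, "team1": weak, "team2": strong, "team1_won": False},
]

rows = training_rows(series_list)
train_rows, oot_rows = split_out_of_time(rows)
example_f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])
print(len(train_rows), len(oot_rows), example_f1)
```

Mirroring every series means the training set has exactly as many 1 labels as 0 labels, so the model can't learn anything from which team happens to be listed first.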

Using the playoff predictor to get championship odds

  1. Chart the probable paths for each team to the championship based on its playoff bracket — and all the potential match-ups in this process
  2. Run the model for all possible match-ups and get the win probabilities for each possible series there
  3. Stitch the probabilities together along each team’s potential paths to a championship
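
One way to do this stitching is to work up the bracket recursively, as in the sketch below. It assumes series outcomes are independent given the match-up and that the two win probabilities for any pairing add up to 1. The `toy_win_prob` function and its strength values are a made-up stand-in for the trained series model.

```python
def advance(bracket, win_prob):
    """Return {team: probability of winning this (sub-)bracket}.

    `bracket` is either a team name or a pair (left, right) of sub-brackets.
    `win_prob(a, b)` is the probability that team a beats team b in a series.
    """
    if isinstance(bracket, str):
        return {bracket: 1.0}
    left, right = bracket
    left_p = advance(left, win_prob)
    right_p = advance(right, win_prob)
    out = {}
    # A team wins this round if it gets here AND beats whoever comes out of
    # the other side, weighted by how likely each opponent is to get there.
    for a, pa in left_p.items():
        out[a] = pa * sum(pb * win_prob(a, b) for b, pb in right_p.items())
    for b, pb in right_p.items():
        out[b] = pb * sum(pa * win_prob(b, a) for a, pa in left_p.items())
    return out


# Hypothetical team strengths standing in for the series-prediction model.
STRENGTH = {"A": 3.0, "B": 1.0, "C": 2.0, "D": 2.0}


def toy_win_prob(a, b):
    return STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])


# A 4-team bracket: A plays B, C plays D, and the winners meet in the final.
champ = advance((("A", "B"), ("C", "D")), toy_win_prob)
for team, p in sorted(champ.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.3f}")
```

A full 16-team playoff bracket is just a deeper nesting of the same pairs, so the same function covers it.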

5. Shortcomings

Some shortcomings of this model (that I hope to rectify in future attempts):

  1. No accounting for injuries & mid-season team changes — the model assumes that the players whose playoff experience is counted are the same players who played in the regular season, and that they will be available throughout the playoffs.
  2. No game-level predictions — that stuff is actually difficult & will need much more time & data.

Credits

  • Ajith Patnaik, the friendly neighbourhood data scientist, who was a constant brainstorming buddy & guide throughout this project. It was his idea to use the ratio between the 2 teams’ stats instead of the difference, which helped improve the model accuracy.
  • Basketball Reference, the source of the data used. Thank you for the amazingly curated collection of player & team stats going back decades. Yeoman service for data enthusiasts.
  • Dozens of helpful YouTubers & article authors (special shout out to JxmyHighroller here)
