Predicting 3-pt FG%

Ben Howell
Texas Sports Analytics Group
Feb 4, 2021

Last week, I published a piece here that explored some NBA tracking data for LeBron James passes that led to Laker three-point attempts last season. You can find that dataset here.

Now that we’ve explored the data and pulled some insights from simple manipulation, what if we wanted to predict how likely a shot was to go in? After all, as we saw, Kuzma shot 60% on corner threes, which is just not sustainable. What if we could put together an “expected” 3-pt% for Kuzma, based on where he shot the ball from, where the pass came from, and other variables, such as how close the defense was?

Every shot has a chance to go in, but not all shots are created equal. For example, Steph Curry taking a wide open corner 3 is very different from JaVale McGee taking a contested 27-footer with a second left on the shot clock. The quality of the shooter also influences the probability of making the shot, so we’d want to account for that. With that in mind, here are the variables that we’ll use to try and predict the probability of a shot being made.

  • x and y: the location of the shot
  • pass_x and pass_y: where the pass came from
  • shottype: Corner3 or Arc3
  • wide_open: whether the shooter was wide open, based on how close the defender was (“true” or “false”)
  • seconds_remaining_on_shot_clock: how much time was left on the shot clock

What we’ll do is create an overall xShot% for every player, then we’ll regress that xShot% to the player’s career average 3-pt%. To do so, we’re going to use a Generalized Additive Model (GAM), which is flexible and works well when the outcome is binary, like a make or a miss. In the past, I’ve used a GAM to model swing probabilities for baseball players.
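Written out, the model fit in the code below has roughly the following form. This assumes the default logit link for the binomial family; the symbols f_1, f_2 and the betas are my own notation for the two smooth surfaces and the linear coefficients.

```latex
\operatorname{logit}\, P(\text{make}) =
  \beta_0
  + f_1(x,\, y)
  + f_2(\text{pass\_x},\, \text{pass\_y})
  + \beta_1\, \text{wide\_open}
  + \beta_2\, \text{seconds\_remaining\_on\_shot\_clock}
```

The smooth terms let the make probability vary freely across the floor instead of forcing a straight-line relationship with the coordinates.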

#using the mgcv package that we loaded earlier
#check out the first article to pick up with the code
train <- shots %>%
  dplyr::select(x, y, pass_x, pass_y, pct, shottype, wide_open,
                seconds_remaining_on_shot_clock, pctFG3)

set.seed(2253) #fix the seed so the 75% sample is reproducible
train2 <- slice_sample(train, prop = 0.75)
#using the 'binomial' family b/c we have an either or situation
#note: shottype is selected above but is not a term in this formula
shot_mod <- gam(pct ~ s(x, y) + s(pass_x, pass_y) + wide_open +
                  seconds_remaining_on_shot_clock,
                family = "binomial", data = train2, scale = 0)
#type = "response" gives us the probability of each result
train$pred <- predict(shot_mod, newdata = train, type = "response")

We fit the model, then apply it with the predict function in R. One note of caution: we selected 75% of our dataset to train this model, but then predicted on the entire shots file, which includes the 572 shots that we used to train the model. Normally, we’d train on 70–75% of the data and then test on the other 25–30%, because predictions on shots the model has already seen will look somewhat better than predictions on new shots. However, with just 763 data points, I opted to go this route so that we could see how the xFG% is distributed across the court.
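For comparison, here is a rough sketch of the more conventional approach, where the model is scored only on shots it never saw. The data is simulated as a stand-in for the shots file, and names like sim, test_set, and brier are hypothetical, so the numbers it produces say nothing about the real Lakers data.

```r
library(mgcv)

set.seed(2253)
n <- 763 # same size as the article's dataset, but the values are simulated

# Simulated stand-in for the shots file (assumption: column names mirror the article)
sim <- data.frame(
  x = runif(n, -250, 250),
  y = runif(n, 0, 300),
  wide_open = sample(c(TRUE, FALSE), n, replace = TRUE),
  seconds_remaining_on_shot_clock = runif(n, 0, 24)
)
# Make/miss outcome with a slightly higher make rate on wide-open shots
sim$pct <- rbinom(n, 1, plogis(-0.8 + 0.4 * sim$wide_open))

# 75/25 split: fit on one part, score on the other
train_idx <- sample(n, size = floor(0.75 * n))
train_set <- sim[train_idx, ]
test_set <- sim[-train_idx, ]

mod <- gam(pct ~ s(x, y) + wide_open + seconds_remaining_on_shot_clock,
           family = binomial, data = train_set)

# Predicted make probability for the held-out shots only
test_set$pred <- as.numeric(predict(mod, newdata = test_set, type = "response"))

# Brier score (mean squared error of the probabilities); lower is better
brier <- mean((test_set$pred - test_set$pct)^2)
# Baseline: predict the training make rate for every held-out shot
baseline <- mean((mean(train_set$pct) - test_set$pct)^2)
```

If brier is not clearly below baseline on held-out shots, the location and defender terms aren't adding much beyond the overall make rate, which is a real possibility with a sample this small.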

Below I’ve plotted our xFG% metric on the court. Unsurprisingly, the reddest sections are in the corners, while above-the-break threes have an xFG% around 25–30%. Note that the left corner is about average (35%), while the right corner is 40+%. I believe that is due to the small sample, and could be influenced by the players who were included in the dataset, even though the player name/identifier wasn’t used as a variable.

train %>%
  court() +
  stat_summary_hex(aes(x = x, y = y, z = pred), fun = mean, binwidth = c(20, 20)) +
  labs(x = "Baseline", y = "Sideline",
       title = "xFG% by Location") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5)) +
  scale_fill_gradient2(low = "blue", mid = "grey", midpoint = 0.35, high = "red") +
  facet_grid()
xFG% by location

Bringing back our table from earlier and comparing it with our xFG%, we see the same pattern: Corner3 shots consistently rate higher than Arc3 shots. There’s a little bit of variation between the sample FG% and xFG%, but not too much.

x3P% by Shot Type, compared with Sample FG%
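A comparison like this can be built by averaging the make indicator and the model prediction within each shot type. Here is a minimal sketch in base R; the data frame shots_sim is simulated and the output column names are my own, so only the shape of the calculation carries over.

```r
set.seed(1)

# Simulated stand-in: one row per shot, with made/missed and a model prediction
shots_sim <- data.frame(
  shottype = sample(c("Corner3", "Arc3"), 200, replace = TRUE),
  pct = rbinom(200, 1, 0.36),   # 1 = make, 0 = miss
  pred = runif(200, 0.25, 0.45) # pretend xFG% from the model
)

# Mean of makes (sample FG%) and mean prediction (xFG%) per shot type
by_type <- aggregate(cbind(pct, pred) ~ shottype, data = shots_sim, FUN = mean)
names(by_type) <- c("shottype", "sample_fg", "xfg")

# Attach attempt counts so small samples are easy to spot
by_type$attempts <- as.vector(table(shots_sim$shottype)[by_type$shottype])
```

Keeping the attempt count next to each percentage makes it easier to judge how much weight a gap between sample FG% and xFG% deserves.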

What we’re really interested in is who over-performed and who under-performed relative to their xFG%. The difference is a player’s Sample 3P% minus their x3P% (which has been regressed to their Career 3P%): negative values indicate bad luck and positive values indicate good luck.

x3P% vs Career and Sample 3P%

A few names pop out. Alex Caruso had extremely poor luck on passes from LeBron, with a Sample 3P% of just 28, and an x3P% of 38.2. That’s a difference of -10.2.

Kyle Kuzma, who we talked about earlier, had a similarly large swing in the other direction, as he was the beneficiary of good luck. He posted a difference of 7.5 between his Sample 3P% of 42.9 and x3P% of 35.4.

This could potentially be a sign that playing with LeBron elevated Kuzma’s level of play while hurting Caruso’s, but we’d need more data to draw such a conclusion.
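The regression step isn't shown in the code above, so here is a minimal sketch of one common approach: a weighted average that pulls a small-sample estimate toward a prior. The function name regress_to_career and the weight k are hypothetical, not the values behind the tables in this piece. The second half simply reproduces the two differences quoted above from the stated Sample 3P% and x3P% figures.

```r
# Hypothetical shrinkage: weight the model's xFG% by the number of shots and
# the career 3P% by k "phantom" shots. k = 50 is an illustrative choice only.
regress_to_career <- function(xfg, n_shots, career, k = 50) {
  (n_shots * xfg + k * career) / (n_shots + k)
}

# Illustrative inputs (not from the dataset): with n_shots equal to k,
# the result lands halfway between the two rates
example_x3p <- regress_to_career(xfg = 0.40, n_shots = 50, career = 0.30)

# Differences from the figures quoted in the text: Sample 3P% minus x3P%
sample_3p <- c(Caruso = 28.0, Kuzma = 42.9)
x3p <- c(Caruso = 38.2, Kuzma = 35.4)
luck <- round(sample_3p - x3p, 1)
print(luck) # Caruso -10.2, Kuzma 7.5
```

A larger k trusts the career number more, which matters most for players with only a handful of attempts off LeBron passes.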

This was a fun project, and a good way to get started with NBA shot tracking data. You can find the code for this piece here.

Ben Howell
Texas Sports Analytics Group

Sophomore studying Sport Management and Economics at the University of Texas. Writing about Baseball from an analytical and scouting perspective