The Most Clutch Shot Clock Shooters in the NBA — A Statistical Approach

Chris Buetti
Oct 25, 2017

A friend made a comment that stuck with me while we were watching basketball last week. "Steph Curry is just so clutch," he said. "He definitely gets better as the shot clock runs down." At first I thought there was no way that's possible, but it did make me curious. Is that the case for any one player? Is there any shooter so clutch that not only does the shot clock not faze him, but he actually gets better as it approaches zero? A simple Google search didn't sufficiently answer my question, so I took it upon myself to find out. I found a data set on Kaggle with information on every single shot taken during the 2014–2015 season. I know it's a bit old, but it's the most recent information of its kind available. It has a large number of attributes, such as time on the game clock, the closest defender, how many dribbles were taken before the shot, and more. My hypothesis is that no single shooter has a statistically significant increase in shot percentage as the shot clock runs down — rather, every outcome will be random noise or even a decrease in shot percentage. Let's see if we can use the data to answer the question.

Note: This is a beginning-to-end statistical analysis using the R programming language. The code for this analysis can be found on my GitHub, here. The post includes exploration and data cleaning in the section "Exploratory Data Analysis". For just the results, please skip to the subheading "Shot Clock Analysis".

Exploratory Data Analysis

I've loaded in the data along with the dplyr (data manipulation) and ggplot2 (data visualization) packages. Let's look at the data and get a feel for its structure.
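The setup looks roughly like this (a minimal sketch; the file name shot_logs.csv is an assumption, and the full script lives on GitHub):

```r
# Load packages for data manipulation and plotting
library(dplyr)
library(ggplot2)

# Read the Kaggle shot-log file (file name is an assumption)
nba <- read.csv("shot_logs.csv", stringsAsFactors = FALSE)

# Inspect the structure and the first few rows
str(nba)
head(nba)
```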

Data Structure

The dataset has 21 variables and 128,069 recorded shots. Many of these variables, although intriguing, aren’t applicable.

The variables of interest are the shot clock, game clock, player name, and shot result. The shot clock values are in a float format, so we will need to round them to the nearest whole number.

str(nba) — Structure of the Data Set
head(nba) — First Six Rows
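The rounding mentioned above is a one-liner in base R (SHOT_CLOCK is the column name as it appears in the Kaggle file):

```r
# Round the shot clock to the nearest whole second
nba$SHOT_CLOCK <- round(nba$SHOT_CLOCK)
```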

Shot Attempts per Player

The data consists of 281 players and their 126,065 combined shot attempts. Only two- and three-point attempts are included; free throws were not recorded. Let's look at the distribution of shot attempts per player.

We can see that the distribution is right-skewed, meaning a small number of players took far more shot attempts than the rest of the NBA. James Harden had the highest number of field goal attempts with 1,044 (shocker), which is about 13 shots per game. Monta Ellis, LaMarcus Aldridge, and LeBron James follow at #2, #3, and #4, respectively. Conversely, Greg Smith took the fewest shots at 47, which is almost one shot attempt every two games. The average was 456 shots per player (dashed red line), which works out to about 5.5 shot attempts per game.

I decided to limit my dataset to players who took at least 164 shots. I chose this number semi-arbitrarily, but I felt including players who took fewer than two shots per game was unfair. After sifting through the data and implementing this filter, the new mean rose to 483 shot attempts per player, which brings the shots per game to around 6. Andre Roberson became the new low, with 168 field goal attempts.
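In dplyr terms, the counting and filtering step might look like this (a sketch; player_name is the column name in the Kaggle file, and the 164-shot cutoff is the one described above):

```r
# Count shot attempts per player
attempts <- nba %>%
  group_by(player_name) %>%
  summarise(n_shots = n()) %>%
  arrange(desc(n_shots))

# Keep players with at least 164 attempts (roughly two per game)
keepers <- attempts %>% filter(n_shots >= 164)
nba <- nba %>% filter(player_name %in% keepers$player_name)
```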

Missing Values

It’s important to look at missing values before we start. If there are missing values in relevant variables we’ll need to account for them.

The only NAs are in the shot clock variable, which is obviously pertinent to our analysis. They only make up 4.3% of our data so one option would be to simply throw those observations out. Let’s take a closer look first and see if we can figure out what went wrong before doing so.
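A quick way to see where the NAs live is to count them per column:

```r
# Count missing values in each column
colSums(is.na(nba))
```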

In basketball, once the game clock runs below 24 seconds, the shot clock turns off. My guess is that most, if not all, of these NA values exist because the game clock was under 24 seconds and there was no shot clock number to record. We can examine this by looking at the distribution of missing shot clock values versus the game clock. We'll need to convert the game clock into seconds rather than its current mm:ss format.
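One way to do that conversion (a sketch, assuming GAME_CLOCK is stored as an "mm:ss" string as it is in the Kaggle file):

```r
# Convert the game clock from "mm:ss" to total seconds
clock_parts <- strsplit(as.character(nba$GAME_CLOCK), ":")
nba$GAME_CLOCK_SEC <- sapply(clock_parts, function(x) {
  as.numeric(x[1]) * 60 + as.numeric(x[2])
})
```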

We can see that almost 65% of the missing shot clock values occur when the game clock is at or below 24 seconds. It becomes obvious why these values are coded as NA — the shot clock simply wasn't running when those shots were taken. I will impute those missing values by replacing them with the game clock value whenever the game clock is less than or equal to 24.
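The imputation itself is a conditional replacement, using the GAME_CLOCK_SEC column created in the sketch above:

```r
# Where the shot clock is missing and the game clock is at or below 24 seconds,
# use the game clock as the shot clock value
fix_idx <- is.na(nba$SHOT_CLOCK) & nba$GAME_CLOCK_SEC <= 24
nba$SHOT_CLOCK[fix_idx] <- nba$GAME_CLOCK_SEC[fix_idx]
```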

Now that I've imputed most of the NAs, let's have another look at the histogram.

We fixed 3,554 missing shot clock values and are now left with only 2,013. Looking at the histogram, they seem evenly distributed along the game clock. I can't figure out why this is occurring, so I'm assuming there was some sort of recording error. Regardless, the remaining NAs only make up 1.6% of the overall data, so I will simply remove them.

Selecting Relevant Variables

After accounting for the missing values and selecting only the relevant variables, we now have our final dataset. It consists only of the player's name, the field goal outcome, and the time on the shot clock when the attempt was taken. This is all we need for our analysis.
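With dplyr, dropping the last unexplained NAs and keeping only the relevant columns might look like this (column names again taken from the Kaggle file):

```r
# Drop the remaining unexplained NAs and keep only the columns we need
nba_final <- nba %>%
  filter(!is.na(SHOT_CLOCK)) %>%
  select(player_name, SHOT_RESULT, SHOT_CLOCK)
```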

Shot Clock Analysis

We will need to look at the relationship between the shot clock and the shot percentage for each second of that 24-second period. We can do this by running a linear regression with the shot clock time as our predictor variable, and shot percentage for each second as the response variable. For example, we will calculate the FG% for every shot a player took with 24 seconds remaining on the shot clock, 23 seconds remaining, 22, and so on.
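Here is a rough sketch of that step for a single player. It assumes the nba_final frame from above, lowercase player names and a "made"/"missed" coding for SHOT_RESULT (as in the Kaggle file), and it uses seconds elapsed on the shot clock as the predictor so that a positive slope means a player improves as the clock runs down; the original script may be set up differently.

```r
# FG% at each second of the shot clock for one player, then a linear fit.
# seconds_gone = 24 - SHOT_CLOCK, so a positive slope means the player
# improves as the clock runs down.
curry <- nba_final %>%
  filter(player_name == "stephen curry") %>%        # name format is an assumption
  mutate(seconds_gone = 24 - SHOT_CLOCK) %>%
  group_by(seconds_gone) %>%
  summarise(fg_pct = mean(SHOT_RESULT == "made"),   # "made"/"missed" coding assumed
            attempts = n())

curry_fit <- lm(fg_pct ~ seconds_gone, data = curry)
summary(curry_fit)
```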

Let’s take a look at the results for Steph Curry:

We can see that my friend was mistaken — Steph does not become a better shooter as the shot clock runs down. In fact, it looks as if he gets worse as the clock approaches zero. We can even go so far as to say this negative relationship is statistically significant at the 10% significance level (p-value: 0.0919).

Aside: A p-value is one way of interpreting the strength of a regression result; it measures the evidence that the relationship between a coefficient and the response variable is real rather than noise. Standard practice is to use a threshold of 0.05 (5%) for significance; however, 0.10 is commonly used for a looser interpretation. If a p-value is small, the probability that we would see a relationship this strong by random chance alone is low. Loosely, we can say the chance of observing a relationship this strong under pure noise is about a 1 in (1/p-value) event. In our case, that's 1 in 1/0.0919, or roughly 1 in 11.
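The p-values quoted here come straight from the coefficient table of the fitted model; for example, using the curry_fit sketch from above:

```r
# Extract the slope's p-value from the model summary
coef_table <- summary(curry_fit)$coefficients
coef_table["seconds_gone", "Pr(>|t|)"]
```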

This is not to take away from Curry's phenomenal shooting ability. Intuition tells us that as the shot clock runs down, a player's shot percentage drops as the nerves kick in. I'm willing to say that Steph's results are analogous to those of most, if not all, of the players in the league. However, our goal was to see if any players broke from the norm and actually did the opposite.

After applying the same process we used for Curry to every player in the data, we find that there are ten players whose shot percentage actually improves as the clock runs down. Here they are graphically:
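Sketching the sweep over all players, reusing the same per-player regression (the column names, the "made"/"missed" coding, and the elapsed-time orientation are all assumptions carried over from the earlier sketches):

```r
# For each player: FG% at each second of the shot clock, then a linear fit
fit_player <- function(df) {
  per_sec <- df %>%
    mutate(seconds_gone = 24 - SHOT_CLOCK) %>%
    group_by(seconds_gone) %>%
    summarise(fg_pct = mean(SHOT_RESULT == "made"))
  fit <- lm(fg_pct ~ seconds_gone, data = per_sec)
  cf <- summary(fit)$coefficients
  data.frame(slope = cf["seconds_gone", "Estimate"],
             p_value = cf["seconds_gone", "Pr(>|t|)"])
}

results <- nba_final %>%
  split(.$player_name) %>%
  lapply(fit_player) %>%
  bind_rows(.id = "player_name")

# Players whose FG% trends upward as the clock runs down
results %>% filter(slope > 0) %>% arrange(p_value)
```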

Notice that the slope for each player here is positive, unlike Curry's, which had a clear negative slope. These ten players cover all five positions and a diverse range of playing styles — there are actually more big men up there than pure shooters. It's important to note that these are by no means "big name" players, and most weren't even starters. No player averaged more than 10 points a game that season — Jason Smith had the highest PPG at 8.

We can only separate one player’s results from randomness — Kosta Koufos, who had a p-value of 0.04. Let’s take a closer look.

His increase in shot percentage as the shot clock runs down is apparent. The chance that we would see a relationship this strong by random chance alone is 1 in 25, so it's rather significant. My initial hypothesis that no one player would have a statistically significant increase was incorrect.

So if the shot clock is running down and you are desperate for a basket, call a timeout and bring this guy off your bench:

Kosta Koufos: Via NBC Sports

Limitations of the Findings

It's important to note that I didn't weight the number of attempts taken at each second on the clock differently. For example, if Steph took 100 shots when there were 20 seconds left and only 10 shots when there were 5 seconds left, those two percentages counted equally in the regression. Although I played around with different weights, they didn't drastically alter the results. I couldn't settle on a convention that felt correct, so I stuck with weighting every second equally.
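For anyone who wants to experiment with weighting, lm accepts a weights argument; one natural (though not uniquely correct) convention is to weight each second's FG% by the number of attempts taken at that second, reusing the per-second table from the Curry sketch above:

```r
# Weight each second's FG% by the number of attempts taken at that second
lm(fg_pct ~ seconds_gone, data = curry, weights = attempts)
```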

And as I mentioned earlier, this is only data from the 2014–2015 season, which is about three years old at the time of writing this. Therefore, it is hard to take this as absolute fact and apply it to today’s NBA. Kosta could have had an unusually good season that will never be repeated. However, in reality, many statistical analyses are done with the most recent data available and I do believe it to hold true for the most part.
