# Python and Basketball: When did the NBA become all about the three?

In an article published January 23 of this year on NBA.com (http://www.nba.com/magic/news/3-point-shot-more-valuable-ever-20170123), a telling statistic was shared: “This season, only Houston, Golden State, Cleveland, Boston and the Los Angeles Clippers rank in the NBA’s top 10 in 3-point makes, attempts and accuracy percentage”. These five teams included both NBA Finals teams and 3 of the 4 Conference Finals competitors, and all 5 made the playoffs. In today’s NBA, 3-point success clearly correlates with team success. But when did the three-pointer become so prominent?

To explore this question, I wrote a Python script (full code here: https://github.com/kevinwan1996/NBA-Stats-Analysis-) to scrape data from http://www.basketball-reference.com/ and analyze it. I classified 3-point shooters into clusters (which I call tiers) using the Mean Shift Algorithm. This algorithm works by “hill climbing” toward the densest regions of the data. Starting from each data point, it draws a window of a given radius (called the bandwidth), moves the window to the mean of the data points inside it, and repeats until the window stops moving. The spot where it stops is a cluster center, and data points whose windows end up at the same center belong to the same cluster. As such, we don’t choose the number of clusters; the algorithm discovers it, which makes Mean Shift a textbook example of unsupervised learning.
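The hill-climbing procedure described above can be sketched from scratch in plain Python. This is a minimal flat-kernel version for 2-D points, not the `Mean_Shift_classify` helper from the repository; merging converged windows that land within half a bandwidth of each other is an arbitrary choice made for this sketch. In practice, scikit-learn’s `sklearn.cluster.MeanShift` (which takes a `bandwidth` parameter) does the same job.

```python
import math

def mean_shift(points, bandwidth, tol=1e-6, max_iter=300):
    """Flat-kernel Mean Shift on 2-D points; returns (centers, labels)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    modes = []
    for start in points:
        center = start
        for _ in range(max_iter):
            # Every data point inside the window of radius `bandwidth`
            window = [p for p in points if dist(p, center) <= bandwidth]
            if not window:
                break
            # Climb: move the window to the mean of the points it contains
            new_center = (sum(p[0] for p in window) / len(window),
                          sum(p[1] for p in window) / len(window))
            moved = dist(new_center, center)
            center = new_center
            if moved < tol:  # the window stopped moving: a density peak
                break
        modes.append(center)

    # Starting points whose windows converged to the same peak share a cluster
    # (the bandwidth / 2 merge threshold is an assumption of this sketch)
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Two obvious clumps; the algorithm is never told to look for two clusters
points = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
centers, labels = mean_shift(points, bandwidth=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters falls out of the data and the bandwidth: a larger bandwidth merges nearby peaks into fewer clusters, a smaller one splits them.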

I used a few different statistics to classify the three-point shooters. The first was 3P% vs. 3-pointers made (3PM). I wanted to identify tiers, or clusters, of shooters that account for both accuracy and volume. The NBA is more nuanced than just “good” shooters and “bad” shooters, so I let the data discover the clusters. Using the Mean Shift Algorithm, I generally found 4 tiers of shooters, with tier 1 holding the best shooters. I then tracked how the shooting percentage and volume of each tier changed over the seasons. The second thing I looked at was the league-wide change in 2-point attempts versus 3-point attempts, to discover any trends. The third was how many wins each tier of 3-point shooters collectively contributes over time, measured with the Win Shares statistic. I will compare the average Win Shares that the shooters I ranked as “Tier 1” and “Tier 2” contribute to their teams over the years. If this number rises, it will indicate either a) threes directly contribute to more wins or b) better players are shooting more threes. Either way, a rise would point to the growing importance of the three and help show whether shooting more threes is actually a viable strategy for winning more games.
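Mean Shift hands back cluster labels, but a label by itself does not say which cluster holds the best shooters, so the clusters have to be ranked to become tiers. Here is a minimal sketch, assuming each center is a (3P%, 3PM) pair and that tiers are ordered by the center’s 3P%. That ordering is consistent with the tier averages reported later in this post, but the repository may rank clusters differently, and `rank_tiers` and `assign_tiers` are hypothetical helpers.

```python
def rank_tiers(centers):
    """Hypothetical helper: map each cluster label to a tier number (1 = best).

    Assumes centers[label] is a (three_pt_pct, threes_made) pair and ranks
    clusters by the center's 3P%, highest first."""
    order = sorted(range(len(centers)), key=lambda i: centers[i][0], reverse=True)
    return {label: tier for tier, label in enumerate(order, start=1)}

def assign_tiers(labels, centers):
    """Hypothetical helper: turn a player -> cluster-label dict into player -> tier."""
    tier_of = rank_tiers(centers)
    return {player: tier_of[label] for player, label in labels.items()}

# Stand-in centers as (3P%, 3PM), using the 2015 tier averages from this post
centers = [(0.371, 126), (0.443, 286), (0.275, 158), (0.409, 198)]
print(rank_tiers(centers))  # {1: 1, 3: 2, 0: 3, 2: 4}
```

Ranking on a single coordinate keeps the tiers easy to interpret; a combined score of accuracy and volume would be an equally defensible choice.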

Additionally, I separated the players into Guards and Forwards when looking at their three-point shooting. It wouldn’t be fair to lump Forwards and Guards together, since Guards are generally the better shooters. Furthermore, this distinction highlights the emergence of the “Stretch 4” in the NBA. Centers were excluded because they attempted too few threes to analyze.

3PT% Vs. 3PM

The change over these 37 years is enormous. To pin down when it happened, I tracked the change in both makes and percentage for each tier from 1980 to 2017.
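Tracking each tier over time comes down to averaging every tier’s 3P% and makes within a season and then differencing seasons. A minimal sketch follows; the dictionary layouts (`players` as name → (3P%, 3PM), `tiers` as name → tier number) and both function names are hypothetical, not taken from the repository. Note that a Mean Shift cluster center is the mean of the points inside its final window, which need not equal the plain average of every player assigned to that tier.

```python
from collections import defaultdict

def tier_averages(players, tiers):
    """Average 3P% and 3PM per tier for one season.

    players: dict of name -> (three_pt_pct, threes_made)   (hypothetical layout)
    tiers:   dict of name -> tier number, 1 = best         (hypothetical layout)
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # tier -> [pct total, makes total, count]
    for name, (pct, made) in players.items():
        s = sums[tiers[name]]
        s[0] += pct
        s[1] += made
        s[2] += 1
    return {tier: (pct / n, made / n) for tier, (pct, made, n) in sorted(sums.items())}

def tier_change(avg_start, avg_end):
    """Change in (3P%, 3PM) per tier between two seasons, for tiers present in both."""
    return {t: (avg_end[t][0] - avg_start[t][0], avg_end[t][1] - avg_start[t][1])
            for t in avg_start if t in avg_end}
```

Tiers that exist in only one of the two seasons are skipped by `tier_change`, which matters when the number of clusters shifts from one year to the next.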

The 3-point shot wasn’t a huge factor during the 80s and 90s. In the first season it was introduced, the San Diego Clippers led the league with 6.6 3-point attempts per game, while the average team took 2.8 a night. Now individual players take more than that in a single game!

This quote by Larry Bird further supports the notion that the three was an afterthought in his era: “Somebody asked me in an interview if back in the 1980s did your coach design plays for you to take 3-point shots. We never thought of that. If you look back at 1980 we didn’t take a lot. We wanted to pound it inside.” As the years progressed, the three gained popularity, but the pound-it-inside mentality continued through the 1990s and early 2000s, with Hakeem, Shaq, and Tim Duncan dominating. Furthermore, the generational stars of these eras, Michael Jordan and Kobe Bryant, were primarily mid-range assassins and drivers. The three-point line still hadn’t been fully embraced.

As seen in the graph, 3-point shooting spiked a little in 2006. This was due to Ray Allen’s best season, when he hit 269 threes. However, he was so far above the rest of the league (second-place Gilbert Arenas hit 199) that the three hadn’t truly “arrived”. 2013 was another spike, but again it was driven mostly by one transcendent player: Steph Curry hit 272, more than 60 ahead of second place. The league hadn’t caught up. According to both the data and the eye test, 2015 is the year the three-pointer gained the league-wide prominence it has now. The truly poor shooters in tier 4 averaged 27.5% with 158 makes, tier 3 jumped to 37.1% with 126 makes, tier 2 averaged 40.9% with 198 makes, and tier 1 averaged 44.3% with 286 makes (Steph Curry was in a tier of his own). Even setting aside the “Curry tier”, tier 2 shooters were shooting well enough to have essentially led the league in previous years. The 3-pointer had arrived, at least in terms of makes and percentage.

The three-pointer has traditionally been a guard’s shot. However, forwards have begun to adopt it as post-up centers are traded in for stretch 4s and 5s. The 3-point data for forwards is shown below.

What is curious about this data is that 3-point volume doesn’t seem to be increasing for tier 1 forwards. Their volume spiked along with the shortened 3-point line from 1995–97, but then fell off a cliff. Over the years there have been great 3-point shooting forwards (e.g., Robert Horry, Kevin Garnett, Arvydas Sabonis, Channing Frye, Dirk Nowitzki), so tier 1 for forwards has historically been well stocked. Now that most of these forwards are retired or retiring, the need for shooting forwards has grown. This has led to tighter clustering of forward shooters across the league: there aren’t as many transcendent shooters, but more and more forwards are shooting. Thus, tier 1 shooting is declining, while the other tiers have generally been on the upswing. In fact, in 2017 there are only 3 tiers of forward shooters rather than 4; the forwards have clumped together so thoroughly that an entire tier has disappeared. The averages for the three tiers are (from 3 to 1, respectively): 26% with 11 makes, 36% with 103 makes, and 38% with 189 makes. Some forwards are clearly not shooters, but forwards as a group are transitioning. Since the “modern three” only arrived in 2015, it’s quite possible that forwards simply haven’t fully adjusted yet. If so, we should see further clustering of these shooters as both 3-point percentages and volumes increase.

2PA vs. 3PA

This graph shows the ratio of 3-point attempts to total field-goal attempts over the years, to measure how often players take 3-pointers. It shows a strong upward trend in the share of attempts that are 3-pointers. There are two spikes: one in 1995–1997, due to the shortened three-point line, and one happening now. The current spike is far steeper and appears to begin in 2013. That is earlier than my estimate of 2015 for the three-point explosion, but remember that this graph measures attempts, not makes. 2013 marks when 3-point shooting became more common, while 2015 marks when it became both more common and more effective.
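The attempt-share calculation behind this graph can be sketched as follows, assuming league-wide season totals of 2-point and 3-point attempts have already been scraped into a dictionary. The function names, the data layout, and the “largest year-over-year jump” heuristic for spotting where a spike starts are illustrative assumptions, and the totals in the usage lines are made up.

```python
def three_point_rate(seasons):
    """Share of field-goal attempts that were 3-pointers, per season.

    seasons: dict of year -> (two_pt_attempts, three_pt_attempts), league-wide totals
    """
    return {year: fg3a / (fg2a + fg3a) for year, (fg2a, fg3a) in sorted(seasons.items())}

def largest_jumps(rates, k=3):
    """Seasons with the k largest year-over-year increases in 3PA share."""
    years = sorted(rates)
    jumps = [(rates[y] - rates[prev], y) for prev, y in zip(years, years[1:])]
    return [y for _, y in sorted(jumps, reverse=True)[:k]]

# Made-up totals, only to show the shape of the calculation
made_up = {2011: (800, 200), 2012: (790, 210), 2013: (760, 240), 2014: (740, 260)}
print(largest_jumps(three_point_rate(made_up), k=1))  # [2013]
```

Using the share of attempts rather than raw 3PA keeps the trend from being distorted by changes in pace or the number of games per season.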

Win Shares of Tier 1 and 2 shooters over time

Win Shares is a stat that estimates how many wins a player contributes to their team over a season. I will take the average Win Shares of the players in the top two tiers of shooters since 2013, and then the average Win Shares of the players in the top two tiers from 1980–2013. The comparison of the average Win Shares from 2013–present and 1980–2013 is illustrated below.

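```python
import pandas as pd

# position_statistic_df, df_to_array, Mean_Shift_classify,
# get_all_players_stats_year and get_win_shares are helpers from the repository.
def average_win_shares(year_1, year_2):
    """Average Win Shares of tier 1 and tier 2 shooters, year_1 to year_2 inclusive."""
    W_S = 0.0
    total_players = 0
    for year in range(year_1, year_2 + 1):
        # Season totals for guards and forwards (centers are excluded)
        Z = position_statistic_df(year, '3P%', '3P', 'Guard', 'totals')
        Y = position_statistic_df(year, '3P%', '3P', 'Forward', 'totals')
        # Merge the two DataFrames (DataFrame.append was removed in pandas 2.0)
        Z = df_to_array(pd.concat([Z, Y]))
        # threes maps each player to a tier label; centers holds the cluster centers
        threes, centers = Mean_Shift_classify(Z, f'{year} Three Point Shooting Tiers', '3P%', '3PM')
        df = get_all_players_stats_year(year, 'advanced')
        for key in threes:
            # The two highest labels are tier 1 and tier 2
            if threes[key] in (len(centers) - 1, len(centers) - 2):
                total_players += 1
                W_S += get_win_shares(df, key)
    # Guard against an interval with no qualifying players
    return W_S / total_players if total_players else 0.0
```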

This is the code used to calculate the average Win Shares of players in the top two tiers across an interval of years.

• The first two lines initialize the running totals to zero
• The next two lines call functions that return pandas DataFrames holding all of the counting stats for a specific position (Guard or Forward)
• Then I merge the two DataFrames
• Then I use the Mean Shift Algorithm, which returns a dictionary that maps each player to the label (tier) they were assigned. It also returns the center of each cluster.