Playground Mining: Understanding NBA Teams’ Identity

Julien Milon
Read Between Numbers
Feb 12, 2020

Sports analytics is booming in the professional sports world, whether it focuses on team or individual sports, on amenities, on media, on commercial strategy, …

Team sports in general are extremely complex to analyse, and basketball in particular: most of the time, all five players have a role in the action taking place, whether or not they touch the ball. They all contribute to the outcome of the game. For a few years now, the NBA has been setting up cameras and developing artificial intelligence to collect, store and analyse large amounts of data on both plays and players.

For a few years now, this data has been analysed in order to define “play types”. They are really useful for understanding teams’ strategies and games. Unfortunately, European basketball seems to lag behind on these important studies, and I see an opportunity to analyse its game in this article and the many others to come. There is in fact a significant difference between NBA and European basketball, although some NBA teams are getting closer and closer to the European style with the growing number of European players and coaches. The questions I cannot answer yet, because there is not enough data on European championships, are: how can we characterize games to define teams’ play styles, and can we define some of them for the European championships? The evolution of the NBA game over the decades since game statistics have been recorded could also be studied using data analysis and machine learning.

Speed, 3-point shooting and pick and roll plays: this is what best characterizes the current NBA as a whole. Within the NBA, a multitude of different play types and styles have emerged over time, some of them getting closer to the European style in set (half-court) offense. Some even define a whole team, such as the San Antonio Spurs, well known for their rapid ball movement. In this article, I intend to show the results of my work on data that I gathered from the official NBA website, along with my analysis of those results. I will characterize the game styles, or identities, of NBA teams.

Classify and Build the Model

Imports

As a reminder, my study aims to group teams by their game style. A group is called a cluster, and to create clusters I have to use an unsupervised algorithm. I am using the KMeans method and importing the libraries needed for the classification.

Libraries imported for Classification

Loading Data

The data I decided to use comes from the official NBA website (section Teams > Play Type). I based my study on the past four seasons, from 2015–2016 to 2018–2019. The data covers 11 play types: Transition, Isolation, Pick & Roll Ball Handler (P&R_BH), Pick & Roll Roll Man (P&R_RM), Post up, Spot up (Catch and Shoot), Handoff, Cut, Off Screen, Putbacks, Misc. I only had access to data from a few NBA regular seasons, though I wish I had access to the 2012–2015 seasons, which contain information on the San Antonio Spurs, whose game style I love and which is quite European. When I started my work, I could already predict a few things. First, all NBA teams would have a high percentage of transition, pick and roll ball handler and spot up plays. Second, the Houston Rockets (with James Harden) would record a high percentage of isolation. Last but not least, I could not wait to see what the data would say about the Golden State Warriors over the 2015–2019 period (with their 3 snipers Stephen Curry, Klay Thompson and Kevin Durant).

Offensive Frequency Play Types by Team — Season

Number of Clusters : The Elbow Method

The first and fundamental step with any unsupervised clustering algorithm is to determine the optimal number of clusters, named k. Hence the use here of the elbow method, one of the most popular methods for choosing the number of clusters.

As its name suggests, the method consists in building a graph of the distortion against the number of clusters k (with the algorithm detailed below) and locating an elbow in it to determine k. The distortion is calculated as the average of the squared Euclidean distances from each point to the centre of its cluster (detail on this site).
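To make the distortion curve concrete, here is a minimal, dependency-free sketch. It is not the script used for this article: the tiny `kmeans` helper, the toy 2D points and the range of k are all assumptions made for illustration, and in practice a library implementation such as scikit-learn's `KMeans` would replace the hand-written loop.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Very small k-means (Lloyd's algorithm); returns the k cluster centres."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:  # nothing moved: converged
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Average squared distance from each point to its nearest centre."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


# Toy data (invented): three noisy blobs in 2D. The real study uses
# 11 dimensions, one per play type.
rng = random.Random(42)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    [cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
    for cx, cy in blobs
    for _ in range(20)
]

# One distortion value per candidate k; plotting these values against k
# gives the elbow graph.
curve = {k: distortion(points, kmeans(points, k)) for k in range(1, 7)}
for k, value in curve.items():
    print(k, round(value, 3))
```

On well-separated toy data like this the curve drops sharply and then flattens. On real play type data the bend can be much less visible.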

In this case, the number of clusters k is quite hard to determine: we cannot clearly identify an elbow on the graph. Extensive research led me to the conclusion that 7 clusters was the most relevant choice, given the play types it highlighted and which I expected to stand out (isolation and pick and roll ball handler, used a lot by the Houston Rockets).

Elbow Method

Unsupervised Classification : KMeans Model

Team sports in general, and above all basketball, are all about adapting the strategy to the opponent while always staying loyal to the team spirit. As I am trying for now to identify teams’ styles, I will not go any further into how specific teams can adapt their game. I would rather add a little disclaimer: of course, each game is different, and the spontaneous and inspired moves and actions during a game will never be entirely described by a mathematical algorithm.

Clustering is all about placing elements, teams for example, in a multidimensional space (one dimension for each of the 11 play types) and creating groups, that is to say clusters, of the elements that are closest to each other.
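The idea of teams as points in an 11-dimensional space can be sketched in a few lines. The three teams and their frequencies below are invented for illustration (they are not NBA data), and `nearest_team` is a hypothetical helper name. The only library call is `math.dist`, the Euclidean distance from the Python standard library (Python 3.8+).

```python
import math

# The 11 play types listed in the data, in a fixed order.
PLAY_TYPES = [
    "Transition", "Isolation", "P&R_BH", "P&R_RM", "Post up", "Spot up",
    "Handoff", "Cut", "Off Screen", "Putbacks", "Misc",
]

# Invented frequency profiles (percent of plays, each row sums to 100).
profiles = {
    "Team A": [15, 12, 20, 6, 4, 20, 5, 6, 3, 5, 4],
    "Team B": [16, 11, 19, 7, 4, 21, 5, 6, 3, 4, 4],
    "Team C": [20, 5, 10, 5, 5, 22, 7, 10, 8, 4, 4],
}


def nearest_team(name, profiles):
    """Return the other team whose profile is closest in Euclidean distance."""
    target = profiles[name]
    others = (team for team in profiles if team != name)
    return min(others, key=lambda team: math.dist(target, profiles[team]))


print(nearest_team("Team A", profiles))  # Team B
```

A clustering algorithm such as KMeans builds on exactly this notion of distance: teams whose profiles are close end up in the same cluster.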

[For the brave ones, here is the link to more detail on the mathematical principles and formulas used to calculate the distance between elements and create clusters with the KMeans method]

Unsupervised KMeans algorithm to classify NBA Teams on Play Types
Extract of KMeans Model Classification

Results

KMeans model of NBA Teams Classification

The analysis of the results being very subjective, I will share my own point of view. Of course, you might have another opinion, which I would be glad to hear about in the comments. As said before and seen in the data extract, there are 11 play types and 7 clusters. Given how many there are, I will study the clusters in groups of 2 or 3 that are significantly close. I will also illustrate those 7 clusters with the teams that fall in the same cluster 3 or 4 times over the 4 seasons, which is significant for game identity.
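The "3 or 4 times in the same cluster" rule can be written as a small filter. This is only a sketch of that rule: the function name `team_identity`, the team names and the cluster labels per season are hypothetical, not the article's results.

```python
from collections import Counter


def team_identity(labels_by_season, min_seasons=3):
    """Return the cluster a team fell into at least `min_seasons` times, else None."""
    cluster, count = Counter(labels_by_season).most_common(1)[0]
    return cluster if count >= min_seasons else None


# Hypothetical cluster labels for four seasons (2015-2016 to 2018-2019).
labels = {
    "Team A": [6, 6, 6, 6],  # same cluster every season
    "Team B": [1, 1, 0, 1],  # same cluster three seasons out of four
    "Team C": [0, 3, 1, 6],  # no stable cluster
}

identities = {team: team_identity(seasons) for team, seasons in labels.items()}
print(identities)  # {'Team A': 6, 'Team B': 1, 'Team C': None}
```

Teams that come out as `None` are the ones without a strong identity over the period.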

First, I intend to talk about the following clusters: Pick and roll ball handler NBA style teams (cluster 0), Big men post up NBA style teams (cluster 3), Penetration NBA ‘old school’ style teams (cluster 6). Those three clusters build their offense on transition, pick and roll ball handler and spot up, like the typical NBA teams that I mentioned previously. The difference lies in the use of isolation, post up and cut plays. The pick and roll ball handler NBA style cluster contains Portland, which uses a lot of isolation and pick and roll ball handler for Damian Lillard or CJ McCollum, and the LA Clippers. These teams are also the ones using pick and roll ball handler the most in the NBA at this time.

Teams that use the post up play a lot are the ones built around big men who can play with their back to the basket, such as Joel Embiid (Philadelphia), Nikola Jokic (Denver), and Brook Lopez and Giannis Antetokounmpo, the 2019 MVP (both Milwaukee). Those teams are members of the cluster named big men post up NBA style.

Last but not least, the third cluster, penetration NBA ‘old school’ style teams, is balanced equally between the 3 play types defining the NBA clusters, although it uses a bit less spot up (catch and shoot) in favour of penetration plays (isolation, pick and roll roll man and cut). Indiana and Oklahoma are among the teams of this type. This cluster is very close to ‘old school’ NBA teams and also close to the European style teams that we will see below, but it uses more transition and speed than the European style.

The second group I want to develop contains two clusters gathering the teams whose play is closest to the European style. Indeed, they use less transition and fewer fast plays than their NBA counterparts. Speaking of European-type basketball, the San Antonio Spurs naturally come to mind, hence starting this analysis with the cluster Post up European style teams (cluster 1). This style is offensive, balanced and offers many possibilities. It relies on catch and shoot (named spot up) and pick and roll ball handler, while the other play types are quite well balanced, their frequency during the game ranging from 4% to 13%, with 4 play types between 7% and 13%. This is the biggest cluster in terms of number of teams, gathering San Antonio, Minnesota, Memphis, Dallas and New York between 2015 and 2019.

The second cluster uses pick and roll for up to 25% of its plays, with fast guards and big men to set the screens: this cluster gathers Utah, Charlotte and Miami. The center of those teams is a big man who often finishes with an alley-oop after the screen, which is why I decided to call this cluster Pick and roll alternation European style teams.

The particularity of the last two clusters, which form the third group, is that each is represented by one and only one team. The first one bases its game on fast play, small ball and isolation, and I guess you already know which team I am talking about… Houston! Not very surprising: their coach Mike D’Antoni said he wanted the Rockets to play in ‘6 seconds or less’ with the arrival of Russell Westbrook. However, between 2015 and 2019 the mindset was already the same: they played fast, building the game around James Harden’s offensive skills. As a European, I personally hate watching this type of game, but it is certainly a strategy that gets good results, such as reaching the Western Conference finals in 2018.

Last but not least, the second cluster of the third group: the Golden State Warriors, with their unique game. Thanks above all to the key players Stephen Curry, Kevin Durant and Klay Thompson, they play a lot of spot up, but much more transition and notably less pick and roll. Between 2015 and 2019, the Golden State Warriors were the team using the most ball movement and off-ball screens, which is why the study shows this team with the highest percentage of cut and off screen plays. I would have loved to be able to get the data to compare this team to the San Antonio Spurs between 2012 and 2015.

You would be right to tell me I have not talked about all NBA teams. As a matter of fact, I decided to focus on teams that were in the same cluster three or four times during the last 4 seasons. Of course, the offensive style changes depending on the players, and I will dedicate the following sections to the classification of the teams in the current season.

Visualization of the different Game Styles

Imports & Radar Chart Building

Libraries imported for Visualization

[You can find the Radar Factory class in my script, linked below]

Radar Chart of the 7 Clusters

The radar chart seems like the best type of graphic to visualize the use of all play types by each cluster. The fact that this graph shows seven clusters makes it a little hard to read; however, it highlights that many clusters are very close.

Gregg Popovich said not long ago that the NBA’s game style was becoming more and more boring and that teams copy each other. These statistics and graphics prove him right on teams’ offensive styles, although the Houston Rockets and the Golden State Warriors differ from the other teams.

On the defensive side, I really love what the head coach of the Toronto Raptors is doing, with many changes in defensive play types, while other teams use man-to-man defense most of the time. It could make an interesting subject of analysis with more statistics on defensive plays.

Classification for 2019

In this part, I intend to use the model and clusters built previously to see which game style each team belongs to in the 2019–2020 season. Half of the games have been played, so this should be relevant.
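Reusing a fitted KMeans model on a new season comes down to assigning each new team profile to the nearest existing cluster centre. The sketch below shows that single step with invented numbers: the three-play-type profile, the centre values and the cluster names "A", "B", "C" are assumptions for illustration, whereas the real model works on all 11 play types and 7 clusters.

```python
import math

# Hypothetical cluster centres learned on past seasons, reduced to three
# play types (Transition, Isolation, Spot up) to keep the example short.
centers = {
    "A": [15.0, 8.0, 20.0],
    "B": [14.0, 14.0, 18.0],
    "C": [20.0, 6.0, 22.0],
}


def predict_cluster(profile, centers):
    """Assign a profile to the cluster whose centre is nearest (Euclidean)."""
    return min(centers, key=lambda name: math.dist(profile, centers[name]))


# Invented half-season profile for one team in 2019-2020.
new_profile = [19.0, 7.0, 21.0]
print(predict_cluster(new_profile, centers))  # C
```

Because the centres stay fixed, a team changes cluster only when its own play type frequencies move closer to another centre.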

Script Prediction

Script of Classification for Teams in Season 2019–2020

Results

Results Classification for Teams in Season 2019–2020

As mentioned in a previous section, a team’s offensive style obviously depends on its players, so teams can change their style. In the first part, I made a list of ‘Game Identity’ with the teams that were often in each category, with explanations. Here we will see all teams, and for the teams mentioned in the first part we will see which style they are playing this season.

First, the colour green represents teams that are still in the same cluster, and therefore keep the same offensive identity. We have also seen previously that some clusters can be very close to others, such as the two European clusters, or the European clusters and penetration NBA ‘old school’ style teams. Teams that moved to a very close cluster are drawn in orange. The teams that were in the cluster Post up European style teams and are now in Pick and roll alternation European style teams (San Antonio, Dallas, Minnesota) moved because they have used more pick and roll in the first half of this season. For example, Dallas, with Luka Doncic now the centre of the team and Dirk Nowitzki retired, has replaced post up with pick and roll. The case of Memphis, which was also a Post up European style team but is now penetration NBA ‘old school’ style, can be explained by the arrival of Ja Morant, who speeds up the game and uses more transition.

Oklahoma goes the opposite way because the team changed its point guard, from Russell Westbrook to Chris Paul. Chris Paul plays less in transition than Russell Westbrook does. Moreover, he varies the offensive plays more. The same goes for Indiana, which has had Victor Oladipo injured for the first half of the season.

The Golden State Warriors are a red team and a special case, because since 2015 this team has had its own identity. The change is more prominent, but it happens for the same reasons as for the orange teams. Since 2015, the Warriors had their three snipers Stephen Curry, Klay Thompson and Kevin Durant; the last one has left the franchise, and the Splash Brothers (Curry and Thompson) have been seriously injured since the last Playoffs. The team has changed so much that this year it is not playing as it did in the past few years. Will they play like in past years when Stephen Curry and Klay Thompson come back on court? I cannot answer this interesting question yet, but I cannot wait to at least try.

The black teams are the ones that were not mentioned previously because they had no strong identity defined between the 2015–2016 and 2018–2019 seasons.
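The colour code described above can be summarised as a small decision rule. This is a sketch of my reading of that rule rather than the script behind the results table: the function name `status_colour` is hypothetical, and the set of "close" cluster pairs is taken from the proximity described in the text (the two European clusters, and each of them with the penetration ‘old school’ cluster).

```python
# Pairs of clusters treated as close, following the description in the text.
CLOSE_PAIRS = {
    frozenset({"Post up European", "Pick and roll alternation European"}),
    frozenset({"Post up European", "Penetration 'old school'"}),
    frozenset({"Pick and roll alternation European", "Penetration 'old school'"}),
}


def status_colour(identity, current):
    """Colour for a team given its 2015-2019 identity and its current cluster."""
    if identity is None:
        return "black"   # no strong identity over the previous seasons
    if identity == current:
        return "green"   # same cluster as before
    if frozenset({identity, current}) in CLOSE_PAIRS:
        return "orange"  # moved to a neighbouring cluster
    return "red"         # moved to a clearly different cluster


print(status_colour("Post up European", "Pick and roll alternation European"))  # orange
```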

Next week, I will work on whether play type data can be used to predict how many points a particular player will score.

[You can find the script that I made here]
