Data Science: The science of moving dots in Basketball and shot value

Haider Hussain
Published in Analytics Vidhya
7 min read · Sep 4, 2019

By representing players as dots and tracking their movement, data science methods can now study player behavior and give coaches and team executives insight into game planning and which plays to run down the stretch for the highest success rate. Statistics have always been used in sports, but how accurately do basic statistics measure a player's behavior? Do they capture what coaches and sports analysts have a ‘feel’ for? Having a ‘feel’ for the game and using the eye test is an important skill for coaches and experts, but how can you quantify it? In basketball, for example, an average fan can tell when a player takes a bad shot because it was heavily contested or because a teammate had a higher-percentage shot available, but how do you measure that? Companies like Second Spectrum are teaming up with NBA teams and using machine learning and spatio-temporal pattern recognition to help answer these questions. Here is a simple description of how this information is captured through the STATS SportVU system:

‘STATS SportVU utilizes a six-camera system installed in basketball arenas to track the real-time positions of players and the ball 25 times per second. Utilizing this tracking data, STATS is able to create a wealth of innovative statistics based on speed, distance, player separation and ball possession.’
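To make the quote concrete, here is a minimal sketch of how statistics like speed and player separation could be derived from position samples taken 25 times per second. The coordinate layout (x, y in feet), the function names, and the sample values are all assumptions made for illustration; they are not the SportVU data format.

```python
import math

SAMPLE_RATE_HZ = 25  # from the SportVU description: 25 position samples per second


def distance(p, q):
    """Euclidean distance between two (x, y) court positions (assumed to be in feet)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def average_speed(track, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average speed in feet per second over a list of consecutive (x, y) samples."""
    if len(track) < 2:
        return 0.0
    travelled = sum(distance(a, b) for a, b in zip(track, track[1:]))
    elapsed_seconds = (len(track) - 1) / sample_rate_hz
    return travelled / elapsed_seconds


# Hypothetical samples: one second of a player moving 0.4 ft per frame along x.
player = [(10.0 + 0.4 * i, 20.0) for i in range(26)]
defender = (23.0, 24.0)  # hypothetical defender position at the last frame

print("speed (ft/s):", average_speed(player))
print("separation (ft):", distance(player[-1], defender))
```

Player separation here is simply the distance between two dots in the same frame, which is the same quantity the shot quality discussion later calls defender distance.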

With these tools, every movement a player makes can be captured as data. A machine learning model then uses this movement data to learn a player's behavior. The model can learn to recognize basic events such as a pass, a rebound, or a shot. Even more advanced plays such as pick and rolls, post-ups, and down screens can be identified. What is even more impressive is that the models become more accurate as more data is gathered. In simpler terms, the model is a classifier: it sorts what is a shot from what isn't, or what is a pick and roll from what isn't, by learning from many variations of each play. Ultimately, the models can then identify which types of plays, and in which situations, most often led to success (i.e. points).

Here are some visuals of what the machine learning model is capturing by the second:

Machine learning and spatio-temporal recognition can identify terminology such as rebounding
Machine learning and passing example
Machine learning post-up example
Machine learning identifying a pick and roll
Machine learning and making a shot

Coaching staffs and teams can then use this information to answer questions such as: Which players have the highest success rate coming off a pick and roll? Which combinations of players are most effective when the game is on the line? Which plays create the most scoring opportunities when the team needs them most?

Highest Shot Value

A more specific application of this technology is shot value. Currently, the most common and widely accepted measure of whether a player is a good shot maker is effective field goal percentage (EFG%). It is the ratio of shots made to shots taken, with each made 3 pointer counted as 1.5 makes because it is worth 1.5 times as many points as a 2 pointer. Simply put, EFG captures the ability to make shots, but it does not quantify the quality of those shots. Am I getting my best shooters the best shots available? How much do factors such as whether the shot was contested, or how close the defender was to the shooter, matter? How can I capture both shooting ability and shot quality and create a better metric?
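As a quick illustration of how EFG% is calculated, the sketch below uses the standard formula, (field goals made + 0.5 × 3 pointers made) / field goal attempts. The function name and the two example stat lines are hypothetical.

```python
def effective_fg_pct(fg_made, three_made, fg_attempted):
    """Effective field goal percentage: a made 3 pointer counts as 1.5 makes."""
    if fg_attempted <= 0:
        raise ValueError("fg_attempted must be positive")
    return (fg_made + 0.5 * three_made) / fg_attempted


# Hypothetical stat lines, both on 10 attempts:
inside_scorer = effective_fg_pct(fg_made=5, three_made=0, fg_attempted=10)
three_point_shooter = effective_fg_pct(fg_made=4, three_made=4, fg_attempted=10)

print(inside_scorer, three_point_shooter)
```

The second player makes fewer shots but ends up with the higher EFG%, because all four makes are 3 pointers. Notice that nothing in the formula knows how open either player was, which is the gap shot quality tries to fill.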

Two of the more important features Second Spectrum uses to define effective shot quality (ESQ) are the defender's distance to the shooter and whether the shot was a catch and shoot or off the dribble. They then enhance the EFG metric by factoring in ESQ to measure true shooting value. The full set of features Second Spectrum uses to determine shot quality includes shot distance, shot angle, defender distance, defender angle, player speed, and player velocity angle.

Visual of EFG:

Effective Field Goal percentage shot

Above is an illustration of the chances of an average NBA player making a shot from various distances. The closer you are to the basket, the higher the chances. Shots from beyond the 3 point line (22 feet from the basket in the corners, 23 feet 9 inches elsewhere) are worth more: 3 points instead of 2. Besides EFG (shooting ability), there are other factors that affect shot value.

Components of Shot Quality: Defender distance and Catch&Shoot shot vs. Off the Dribble shot

Heatmap indicating how the same EFG is affected by defender distance.

This heat map indicates that the farther the defender is from the shooter, the higher the chances of the shot going in. For each additional foot of defender distance, the chance of making the shot goes up by approximately 9%. Coaches should design plays that get their best shooters the most wide-open shots. Before the STATS SportVU system, this kind of information on every player at every moment was not being captured.

How catch and shoot shots increase the chances of making a shot versus off the dribble shots

Another important aspect of shot quality is whether the shot was ‘off the dribble’ or a ‘catch and shoot’. Catch and shoot means the shooter received a pass and then shot the ball without dribbling. On a catch and shoot, there is more likely to be some distance between the shooter and the defender, and the play was often designed for the shooter, so the chances of being open are higher as well. The shooter is also more likely to be balanced and able to square up, which means a higher chance of the shot going in and a better shot to take. Off the dribble means the shooter was probably less balanced when shooting, which takes away from the fundamentals of shooting. A player with the ball in hand also draws more eyes and is very likely already being defended, which makes for a more difficult shot.
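The distinction can be sketched as a simple rule. This is only an illustration of the definition given here: the `ShotEvent` fields and the 2 second hold limit are assumptions, not Second Spectrum's actual features or thresholds.

```python
from dataclasses import dataclass


@dataclass
class ShotEvent:
    """Hypothetical summary of what happened just before a shot."""
    received_pass: bool
    dribbles_before_shot: int
    seconds_holding_ball: float


MAX_HOLD_SECONDS = 2.0  # assumed cutoff for "moments before"; not from the article


def shot_type(shot, max_hold=MAX_HOLD_SECONDS):
    """Label a shot as catch and shoot, off the dribble, or other."""
    if (shot.received_pass
            and shot.dribbles_before_shot == 0
            and shot.seconds_holding_ball <= max_hold):
        return "catch_and_shoot"
    if shot.dribbles_before_shot > 0:
        return "off_the_dribble"
    return "other"


print(shot_type(ShotEvent(received_pass=True, dribbles_before_shot=0, seconds_holding_ball=0.8)))
print(shot_type(ShotEvent(received_pass=True, dribbles_before_shot=3, seconds_holding_ball=4.0)))
```

A label like this becomes one of the input features of a shot quality model, alongside defender distance.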

Model 5 has the lowest MSE and is the best at predicting shot quality.

By taking these variables (defender distance and off the dribble vs. catch and shoot) into consideration, along with others (e.g. shot distance, shot angle, player speed, player velocity angle), Second Spectrum built several models to see which was best at predicting shot quality. They chose the best model as the one with the lowest mean squared error (MSE). The models included decision trees (using ID3 and M5P), logistic regression, and Gaussian process regression. The data science methods used to validate the models included a train/test split and 10-fold cross-validation.
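Here is a minimal sketch of that selection procedure: score each candidate model with 10-fold cross-validation and keep the one with the lowest MSE. To keep it dependency-free, the two candidates are deliberately simple stand-ins (a constant baseline and a one-feature straight line), not the decision trees, logistic regression, or Gaussian process regression named above, and the data is synthetic, generated only so the example runs.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean(xs, ys):
    """Baseline: always predict the average target seen in training."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_val_mse(fit, xs, ys, k=10, seed=0):
    """Average held-out MSE over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        predictions = [model(xs[i]) for i in fold]
        scores.append(mse([ys[i] for i in fold], predictions))
    return sum(scores) / k


# Synthetic data (NOT real NBA data): defender distance in feet vs. a noisy shot value.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [0.3 + 0.03 * x + rng.gauss(0, 0.05) for x in xs]

candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {name: cross_val_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```

Scoring on held-out folds, rather than on the data the model was trained on, is what makes the lowest-MSE comparison a fair estimate of how each model would predict shots it has not seen.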

EFG+ vs. ESQ

This graph illustrates a player's shooting ability and the quality of the shots they take. Graphs like this can help general managers, executives, and teams decide whether a player they are considering paying millions takes good shots and is a good shooter. What combination of players would a team prefer? Is it better to pay one good shooter who takes good shots or a few good players who take okay shots but cost less overall? Visuals and insights like this can help teams better answer that question.
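The article does not spell out how the two axes relate. One reading, stated here as an assumption, is that ESQ is the EFG an average shooter would be expected to post on a player's shots, and that the shooting ability axis is the gap between the EFG the player actually achieved and that expectation. The sketch below uses hypothetical numbers.

```python
def shooting_ability(efg, esq):
    """Assumed definition: actual EFG minus the EFG expected from shot quality (ESQ)."""
    return efg - esq


def describe(efg, esq):
    """Plain-language reading of the gap between results and shot quality."""
    gap = shooting_ability(efg, esq)
    if gap > 0:
        return "shoots better than the quality of their shots predicts"
    if gap < 0:
        return "shoots worse than the quality of their shots predicts"
    return "shoots exactly as shot quality predicts"


# Hypothetical players: same EFG, very different shot diets.
print(describe(efg=0.52, esq=0.47))  # hard shots, made anyway
print(describe(efg=0.52, esq=0.56))  # easy shots, underperforming
```

Under this reading, two players with the same EFG can land in very different parts of the graph, which is the kind of difference a team would want to see before signing either one.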

Real World Application:

Which shot is the best shot?

In this example, using Second Spectrum's technology, we can see the shot value of each player second by second. In this case, Antetokounmpo has a 34% chance of making this shot from near the free throw line with a defender a few feet away. Should he take it? Or should he pass to a more open player like Ilyasova for a catch and shoot 3 pointer, who also has a higher chance of making his shot? Or should he pass to Snell, who is closer to the basket and has the highest chance of scoring?
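One way to compare these options is expected points: the chance of making the shot multiplied by what the shot is worth. The sketch below uses only the 34% figure from the example and treats the shot near the free throw line as a 2 pointer. The function names are hypothetical, and since the example gives no make probabilities for Ilyasova or Snell, the code computes break-even thresholds instead of guessing them.

```python
def expected_points(make_probability, shot_value):
    """Average points a shot yields: chance of making it times its point value."""
    return make_probability * shot_value


def break_even_probability(target_points, shot_value):
    """Make probability another shot needs in order to match target_points."""
    return target_points / shot_value


# 34% is from the example; the shot near the free throw line is worth 2 points.
antetokounmpo = expected_points(0.34, 2)

# How good would the alternatives have to be to match that?
three_pointer_needed = break_even_probability(antetokounmpo, 3)
two_pointer_needed = break_even_probability(antetokounmpo, 2)

print(f"Antetokounmpo's shot: {antetokounmpo:.2f} expected points")
print(f"A 3 pointer matches it at {three_pointer_needed:.1%}")
print(f"Another 2 pointer matches it at {two_pointer_needed:.1%}")
```

Any 3 pointer made more often than roughly 23% of the time is worth more than the contested shot, and any closer 2 pointer only needs to beat 34%. That is why a pass to either teammate is likely the better decision here.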
