NBA Court Vision

Graham McAlister
7 min read · Jun 4, 2018

Introduction

In this blog post, I'm going to walk through the process of building a system to predict which home court an NBA game is being played on. I picked this as the first project to work on while going through the wonderful fast.ai course. I won't heavily discuss the theoretical aspects of deep learning, but rather the pitfalls I ran into while building the model. I have tried to link to blogs explaining relevant concepts (much better than I ever could) if you're interested in learning more. There are plenty of resources discussing the technical aspects of model tuning in deep learning, but I couldn't find many that walk through the practical steps of finding model pain points and troubleshooting them. This is my attempt at building a resource I would have appreciated a month ago.

Getting the data

I knew I wanted to start experimenting with all of the new techniques I was learning but couldn't find a data set that I was really interested in. So, I built my own. I'm a big sports fan and have been watching a lot of playoff basketball lately, so something related to the NBA seemed like a natural choice. I wrote a little script using Selenium to search for YouTube videos of NBA game highlights and take some screenshots. My first cut was ~3500 images at about 3 MB each, with 10 pictures per game. I took 5 screenshots in the first half, then skipped to the second half (which was about 5 minutes into most videos) to get both teams playing offense on both sides of the court. Most of the images look something like this:

One reason I chose this data set is that there are multiple things you can do with it. A few ideas I have so far:

  1. Identify which home court the game is being played on
  2. Predict which two teams are playing (the home-court model from step 1 seems like it could help here).
  3. Use object detection to find all players, the ball and the basket in each frame.
  4. Combine models 2 and 3 to tag the team each player belongs to.
  5. Get a new data set where there is a series of images for every few frames of the video. Use the step 4 model to generate images of player and ball movement.

In order to start on step 1, I'm going to need some labels. Unfortunately, even when you follow a uniform search pattern on YouTube (mine was "[Team 1] vs. [Team 2] highlights 2018"), different games and types of videos appear first in the results (the slot my simple Selenium script pulled from). For instance, searching for a game between the Chicago Bulls and the Utah Jazz gave me a highlight reel of James Harden from a recent Rockets vs. Jazz playoff game. With no pattern to follow for knowing the home team, I had to tag the photos by hand (don't try this at home, kids).
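For reference, here is a rough sketch of the kind of scraping script described above. This is not the code I actually ran; the selectors, timings, and file naming are assumptions, and YouTube's page structure changes often.

```python
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

def screenshot_game(team1, team2, out_dir, n_shots=5, interval=30):
    """Search YouTube for a highlights video and save periodic screenshots."""
    driver = webdriver.Chrome()
    try:
        query = f"{team1} vs. {team2} highlights 2018"
        driver.get("https://www.youtube.com/results?search_query="
                   + query.replace(" ", "+"))
        time.sleep(2)
        # Click the first result (the slot mentioned above).
        driver.find_element(By.ID, "video-title").click()
        time.sleep(5)
        for i in range(n_shots):
            driver.save_screenshot(f"{out_dir}/{team1}_{team2}_{i}.png")
            time.sleep(interval)
    finally:
        driver.quit()
```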

Modeling

With a dataset labeled with the correct home team in hand, I started building a model. I used the training method employed by fast.ai, which is basically just fine-tuning a ResNet34 model: first, identify an appropriate learning rate; then fine-tune the last couple of layers in the network; finally, unfreeze and train all of the layers with different learning rates (earlier layers get lower ones).
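To make that concrete, here is a minimal sketch of the procedure using the current fastai API. The post was written against the 2018 course library, so the exact calls below are assumptions rather than the code I ran, and the folder layout (one directory per home team) is hypothetical.

```python
from fastai.vision.all import *

# Hypothetical layout: data/<team_name>/<screenshot>.png
path = Path("data")
dls = ImageDataLoaders.from_folder(
    path, valid_pct=0.25, seed=42,
    item_tfms=Resize(224), batch_tfms=aug_transforms(),
)

learn = cnn_learner(dls, resnet34, metrics=accuracy)

# 1. Find a reasonable learning rate.
learn.lr_find()

# 2. Train only the new head (the pretrained body stays frozen).
learn.fit_one_cycle(3, 1e-3)

# 3. Unfreeze and train all layers with discriminative learning rates
#    (earlier layers get smaller ones).
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-5, 1e-3))
```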

I was super excited to look at the results, but was slightly disappointed with what I was seeing. The accuracy on the validation set was only about 84%. Although there are 30 classes, each team has a pretty distinct home court, so I was expecting something a bit better than that. So, let's figure out together what's going on here!

My first thought is to look at the accuracy by team:
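(For those curious, a per-team breakdown like this can be computed from the validation predictions. This is a sketch against the current fastai API rather than the post's actual code, and the variable names are assumptions.)

```python
import pandas as pd

# Validation-set probabilities and true labels from the trained learner.
preds, targs = learn.get_preds()
pred_idx = preds.argmax(dim=1)

df = pd.DataFrame({
    "team": [learn.dls.vocab[int(t)] for t in targs],
    "correct": (pred_idx == targs).numpy(),
})
per_team_acc = df.groupby("team")["correct"].mean().sort_values()
print(per_team_acc)
```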

We see that a good number of the teams are perfect, but others have an accuracy well below 70%, which seems strange. As an NBA fan, I know those teams' courts are no less distinctive than anyone else's, so this kind of skew is not something I expected. My next suspect is the validation set. I had been careful when splitting the data to make sure the same games weren't included in both the train and validation sets, so I simply made the validation set the last 25% of the data.

These three pictures together clearly tell the story. The teams with the lowest accuracy had the lowest ratios of training examples to validation examples and also the most images in the validation set by count. It looks like the Raptors have no examples in the training set at all, so their 40% accuracy seems like a miracle. Fortunately, it's also a problem that looks fixable.

Taking a closer look, we notice that the pattern seems to be related to team names: teams towards the end of the alphabet have the highest validation-set counts. Sure enough, when I looked at the .csv containing the file paths, the files had ended up sorted alphabetically by team when I scp'd them to my remote box. By taking the last cut of games as my validation set, I had given the model very few training examples for several teams, while those same teams made up the bulk of the validation set!

Like I said, definitely fixable. I changed the validation set to make sure that each team had about 25% of its home games in the validation set and that the games in each set were distinct. I reran the same modeling procedure and saw significantly better results: the accuracy was around 89%. Much better, but still definitely room to improve!
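A rough sketch of that kind of split is below: sample ~25% of each team's games for validation and keep every frame from a game on the same side. The column names (path, home_team, game_id) are assumptions about how a labels file might look, not my actual file.

```python
import pandas as pd

df = pd.read_csv("labels.csv")  # one row per screenshot: path, home_team, game_id

# For each home team, hold out ~25% of its games for validation.
val_games = (
    df.drop_duplicates("game_id")
      .groupby("home_team")["game_id"]
      .apply(lambda g: g.sample(frac=0.25, random_state=42))
)
df["is_valid"] = df["game_id"].isin(set(val_games))

# The is_valid column can then drive the split, e.g. with fastai's from_df:
# dls = ImageDataLoaders.from_df(df, fn_col="path", label_col="home_team",
#                                valid_col="is_valid", item_tfms=Resize(224))
```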

To diagnose the model this time, let’s take a look at the pictures that we classified incorrectly. Some of the records we missed look like this:

Predicted and actual labels for the second iteration of the model
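To surface misses like these without clicking through folders, fastai's interpretation helpers are handy. A minimal sketch with the current API (again, not necessarily what I used at the time):

```python
from fastai.vision.all import ClassificationInterpretation

interp = ClassificationInterpretation.from_learner(learn)

# Show the validation images with the highest loss, with predicted vs. actual labels.
interp.plot_top_losses(9, figsize=(12, 9))

# Which pairs of teams get confused most often?
interp.most_confused(min_val=2)
```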

Photos like these are not pictures I'd expect the model to get right. The middle picture has the best chance, but keep in mind the model hasn't seen many close-ups of Jazz jerseys (if any at all); hardcore fans will recognize that the court does kind of look like the Nets' home floor (the class the model predicted). But there are also a fair number of pictures the model seemingly should have gotten correct. In fact, going through them, it looks like the majority of these photos are of Raptors games. More importantly, I notice that the Raptors seem to have two different courts they flip between:

So we're going to need more data for them. While we're at it, we might as well stock up on some of the other teams that have lower accuracy for whatever reason (with only ~3500 images, I figure more data is definitely going to help our model).

We tag the new data (~500 photos, more fun!) and then add it to the training set. Again, we follow the same process and find that our accuracy is now up to about 96%. So, one more time, we take a look at the pictures we missed (only 36 out of 930!) and see that they are mostly close-ups of players or weird end-of-video shots we got as a result of the time skip to the second half for videos less than 5 minutes long (damn you, design decisions). Only around 5 of the 36 misses were images I would expect the model to get right, so I'm calling the effective accuracy somewhere around 99%, which for 30 classes is a result I'm pretty happy with.

Let’s take a look at the good work we did!

Images the model got right that I'm pretty impressed with. Even the most avid fans might have trouble with the top right since it doesn't have a scoreboard. The others are also pretty impressive, with a court in a sub-screen, color bleeding, and a weird angle, respectively.

That's a wrap on this first article. I'll try to write more as I continue working on different parts of the project. If you have questions, feel free to leave them in the comments or drop me a line on LinkedIn; if you're interested in the code, most of it is on my GitHub. Thanks to those of you who made it all the way to the bottom!

