NBA Game Simulation Using RNN & Adversarial Networks — Part 1

Nachi Lieder
Sports Analytics and Data Science
6 min read · Sep 22, 2021

We have all heard of machine-learning-based models that attempt to characterize, predict, or emphasize different aspects of the game.

We’ve also seen different ways of designing a method to slice the data so that hidden insights rise from the underlying set.

Let’s observe three scenarios:

Scenario A:

A coach is preparing his team for the upcoming game. He goes over the tapes again and again to detect the opponent’s weak and strong spots, as well as his own team’s. There might be footage of a previous game between the two teams, or there might not. What if he were able to “look into the future” and watch the game ahead of time? Knowing what would happen would be a kind of prophecy, and he could plan preventive actions to eliminate uncertainty or failure in the real upcoming game!

Scenario B:

A gaming company wants to enhance its game-simulation feature. It would like to simulate games not only from individual player attributes but also from game flow, which would lift the simulation to a whole other level.

Scenario C:

A betting company would like to validate its odds by running simulations to check that the odds are fair.

Research Proposal

Using GAN models or RNN event generation would give us a solid way to simulate a game by “learning” the flow of other games. The deep learning models would take each team’s historical games as input, learn that team’s patterns, and generate an entire simulation of a game, play by play. These patterns could be an individual team’s characteristics, as well as a team’s mutual behavior when playing another specific team of a given class. This is something I will address later in the paper.

What are these models?

GAN — the Generative Adversarial Network — is a relatively new technique for generating new samples while learning patterns. The generated samples should be indistinguishable from the original set.

Discriminator vs. Generator:

The way a GAN works is by setting two sub-models against each other. The classic example is the “cop vs. the counterfeiter”. The counterfeiter tries to create fake money and passes it to the cop. At first it is fairly easy for the cop to detect which bills are fake and which are not, but as the counterfeiter learns from his mistakes, he becomes more accurate and more faithful to the original. Over the iterations between the two, it becomes harder and harder for the cop to tell the difference. In this example, our generator is the counterfeiter, trying to mimic the nominal behavior while making sure the discriminator (the cop) won’t easily detect the differences.
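To make the cop-vs.-counterfeiter loop concrete, here is a minimal adversarial training step in Keras. The article includes no code for this part, so the dense architecture, dimensions, and hyperparameters below are illustrative assumptions, not the model actually used:

```python
import numpy as np
from tensorflow.keras import layers, models

latent_dim = 32  # size of the generator's noise input (assumption)
data_dim = 16    # size of one "real" sample (assumption)

# Generator: noise -> fake sample (the counterfeiter).
generator = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(data_dim),
])

# Discriminator: sample -> P(real) (the cop).
discriminator = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(data_dim,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model for training the generator: freeze the cop and
# reward the counterfeiter whenever the cop is fooled.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_batch):
    """One adversarial round; real_batch is an (n, data_dim) array."""
    n = len(real_batch)
    noise = np.random.normal(size=(n, latent_dim))
    fake_batch = generator.predict(noise, verbose=0)
    # 1) Train the cop on labeled real (1) vs. fake (0) samples.
    discriminator.train_on_batch(real_batch, np.ones((n, 1)))
    discriminator.train_on_batch(fake_batch, np.zeros((n, 1)))
    # 2) Train the counterfeiter to make the frozen cop say "real".
    gan.train_on_batch(noise, np.ones((n, 1)))
```

Each round first sharpens the cop on labeled batches, then improves the counterfeiter against the frozen cop, which is exactly the back-and-forth described above.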

In this particular case, I would consider using SeqGAN, which is a sequence generator (Sequence -> Sequence). (1)

Word-Level Neural Language Model — with this methodology we take a more traditional approach, using a sliding window of events to predict the single next event, with DL methods such as an RNN or LSTM. Each new observation depends on the sequence preceding it. Given a new event, the model re-evaluates and produces the next observation. There are pros and cons to this method, such as domain shift or local maxima, but generally it is known to work quite well. (2)
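As a sketch of the sliding-window idea, each position in a game’s event list yields one (window, next event) training pair. The window size here is an assumption; the POC below uses roughly 10 events:

```python
import numpy as np

def make_windows(events, window_size=10):
    """Turn one game's event list into (window -> next event) pairs."""
    X, y = [], []
    for i in range(len(events) - window_size):
        X.append(events[i:i + window_size])  # the sliding window
        y.append(events[i + window_size])    # the single next event
    return np.array(X), np.array(y)

# Example: a short run of event codes yields len(events) - window_size pairs.
game = [12, 10, 5, 1, 1, 6, 3, 3, 1, 5, 2, 4]
X, y = make_windows(game, window_size=5)  # X.shape == (7, 5), y.shape == (7,)
```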

The Data

For the sake of this research I decided to use data from the NBA. We will explore this approach in multiple layers, such as:

  • Play-by-play data — this will be our set of events, which we will try to simulate
  • Characterization of teams based on clusters — this can be considered an indicator of who our team played against and “how” they play in terms of tactics, style, and pace.

Within the play-by-play data, we have several layers of events. The first layer is the more general one, which gives us a basic indication of the flow of the game. There we find a small set of events:

EVENTMSGTYPE:

1 — Shot scored
2 — Shot missed
3 — FT attempt
4 — Rebound
5 — Turnover
6 — Foul
7 — Violation
8 — Substitution
9 — Timeout
10 — Jump ball
11 — Ejection
12 — Start of period
13 — End of period
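For convenience when reading the generated sequences later in the post, here are the same codes as a small Python mapping (the value names are just my shorthand for the list above):

```python
# EVENTMSGTYPE codes from the play-by-play data, as listed above.
EVENTMSGTYPE = {
    1: "Shot scored",
    2: "Shot missed",
    3: "FT attempt",
    4: "Rebound",
    5: "Turnover",
    6: "Foul",
    7: "Violation",
    8: "Substitution",
    9: "Timeout",
    10: "Jump ball",
    11: "Ejection",
    12: "Start of period",
    13: "End of period",
}
```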

At first I will attempt to create a game sequence using this small set of events; following that, I will attempt a more complex version using the EVENTMSG_ACTION_TYPE field, which contains close to 100 different event types, ranging from different shot types to blocks, steals, etc.

Test 1.1

I decided to run the POC on the higher, “macro” level data, which as stated above has a limited set of events. This can be represented as a “language” of 14–15 words. The idea was to create a sentence-completion model: I supply the model with the initial flow of the game, about 10 events (or even fewer), and it completes the flow itself.

The model is an LSTM neural network with an embedding layer and a softmax output activation, since this is a categorical classification problem.
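A minimal sketch of such an architecture in Keras; the embedding dimension, LSTM width, and optimizer are assumptions, as the article does not specify them:

```python
from tensorflow.keras import layers, models

vocab_size = 15   # the macro-level "language" of 14-15 event words
window_size = 10  # events fed to the model per prediction (assumption)

model = models.Sequential([
    # Map each integer event code to a dense vector (dim 16 is an assumption).
    layers.Embedding(input_dim=vocab_size + 1, output_dim=16),
    layers.LSTM(64),  # sequence encoder; 64 units is an assumption
    # Softmax over event types: a probability for each possible next event.
    layers.Dense(vocab_size + 1, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.build(input_shape=(None, window_size))
```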

The training set contained 889,632 observations (sequences of events), and the test set 214,656.

The model trains on sub-sequences, which resemble “chunks” of a game, and predicts the next event given the current sequence. I then run the event generator continuously on a sliding window for N events (assuming N is the mean number of events per game).

Example:

For the following sequence of events, taken from an actual game:

['12', '10', '5', '1', '1', '6', '3', '3', '1', '5', '2', '4']

I run the generator over the next series of 60 events and receive the sequence:

1 13 12 10 13 12 18 13 12 6 9 5 5 8 1 18 13 12 7 6 8 8 18 8 3 8 3 13 12 1 10 2 4 13 12 10 5 6 7 2 4 10 8 7 6 8 8 8 11 11 7 3 8 8 13 12 10 2 4 6

One note that needs to be stated here: given a fixed sequence, the neural network will always give the same prediction. I didn’t want to constrain the model to “pre-decided” sequences, and therefore added an additional layer of randomness, as follows:

Given a sequence, I pass it through the neural network and receive a list of probabilities, one per event type, representing the odds that each event comes next. I then sample the next event using these probabilities as weights.
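A minimal sketch of this sampling layer plus the sliding-window generation loop, assuming the Keras model sketched above (the function names are mine, not the author’s):

```python
import numpy as np

def sample_next_event(model, window):
    # One probability per event type for the next event.
    probs = model.predict(np.array([window]), verbose=0)[0]
    probs = probs / probs.sum()  # guard against float32 rounding
    # Weighted sampling instead of argmax keeps the simulation stochastic.
    return int(np.random.choice(len(probs), p=probs))

def generate_game(model, seed_events, n_events=60, window_size=10):
    """Extend a seed sequence by n_events, sliding the window each step.
    Assumes the seed already holds at least window_size events."""
    events = list(seed_events)
    for _ in range(n_events):
        window = events[-window_size:]
        events.append(sample_next_event(model, window))
    return events

# e.g. generate_game(model, [12, 10, 5, 1, 1, 6, 3, 3, 1, 5, 2, 4])
```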

This layer of randomness is quite helpful, since the ground truth here is not concrete. We can be flexible about the probabilities of the next event and generate a very large number of possible simulations for a given game (versus only one without it). This generation is, of course, subject to the given weights, which are deterministic, but we still manage to create a stochastic environment.

Next article: Test 1.2

Word-Level Neural Language Model — Micro Level

In the next article I will dive one level deeper, into a more sophisticated sequence of events that contains 80–100 event types versus the POC’s ~15. We would expect a more diverse, well-distributed (though not necessarily uniform) set, with more permutations to simulate.
