The Worst Passes in Ultimate — Modelling Completion Likelihood

Colin Scott
Sep 11, 2023


Continuing from my work on an xG model for ultimate frisbee, I wanted to start developing a more complex model that assigns a value to every action. Part of this process should be an investigation into how often different types of passes are actually completed, which will be useful for judging a player’s decision making and throwing ability.

The eventual aim is to find the optimal strategy in different situations by balancing how likely a pass is to be completed against how threatening it is. In this post, I will only investigate completions. I decided to call the model expected completions, or “xC”, in keeping with the naming convention of xG.

Preparing the Data

Once again I used every event from 2021–2023 for this model. We are only investigating passes, so I needed to select them. Passes correspond to the event types 18 (completed pass), 19 (Goal), 20 (Drop), 22 (Throwaway), and 23 (Callahan). I also found that some drops and throwaways were duplicated, so I removed the duplicates.

I also decided to remove point blocks, which seem to be recorded as throwaways with no distance travelled, along with some similar drop data that seemed to be incorrect when I watched back some footage. Finally, each pass’s end location can either be recorded in ‘receiverX’ and ‘receiverY’ or ‘turnoverX’ and ‘turnoverY’, so I combined these into the columns ‘toX’ and ‘toY’.

# Get passes (copy so the column assignments below do not touch GAME_EVENTS)
passTypes = [18, 19, 20, 22, 23]
PASSES = GAME_EVENTS.loc[GAME_EVENTS['type'].isin(passTypes)].copy()

# Remove duplicate events
duplicates = PASSES.loc[((PASSES['type'] == 22) | (PASSES['type'] == 20)) &
(PASSES.duplicated(subset=['line', 'thrower', 'throwerX', 'throwerY', 'gameID'], keep='first'))]
PASSES = PASSES.drop(duplicates.index)

# Combine end locations into 2 columns
PASSES['toX'] = PASSES['turnoverX'].fillna(PASSES['receiverX'])
PASSES['toY'] = PASSES['turnoverY'].fillna(PASSES['receiverY'])

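# Distance travelled along each axis (assumed definition: end minus start, signed)
PASSES['travelledX'] = PASSES['toX'] - PASSES['throwerX']
PASSES['travelledY'] = PASSES['toY'] - PASSES['throwerY']

# Remove point blocks (throwaways that barely moved) and weird drop data
pointBlocks = PASSES.loc[(PASSES['type'] == 22) & (PASSES['travelledX'].abs() < 0.1) & (PASSES['travelledY'].abs() < 0.1)]
PASSES = PASSES.drop(pointBlocks.index)
badDropData = PASSES.loc[(PASSES['type'] == 20) & (PASSES['travelledX'] == 0) & (PASSES['travelledY'] == 0)]
PASSES = PASSES.drop(badDropData.index)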

# Drop events with missing data
PASSES = PASSES.dropna(subset=['throwerX', 'throwerY', 'toX', 'toY'])

Now, we need a single column to mark whether a throw was a completion or not, to use as a target variable. This is very simple, as all events of type 18 or 19 are completions, and others are not.

# Mark completions: types 18 and 19 are completions, everything else is not
compTypes = [18, 19]
PASSES['completion'] = PASSES['type'].isin(compTypes).astype(int)

Decisions for the Model

I experimented a lot with which input variables to use for this model. I wanted to include both thrower and receiver location, and the distance the disc travelled. I was also aware that x values are negative for one half of the pitch and positive for the other. I wanted to allow some asymmetry between the two halves while keeping the model mostly symmetrical. I also wanted to use linear combinations of variables to allow for more complexity.

I had another consideration, though. Complexity should not be too high, because the end location of a throw should not be treated as exact. The thrower won’t hit the exact location they were aiming for, so the function should be smooth enough to give a similar value across the general area the thrower is aiming for.

In the end, I used the magnitudes of ‘throwerX’, ‘toX’, and ‘travelledX’ (a new variable telling us how far the disc travelled in the x direction), while also creating variables to say which side of the pitch the disc was on. The normal y values were used, and linear combinations of all variables, up to combinations of three variables, were created. Complexity was limited using L1 and L2 regularisation, and the model was built using XGBoost.
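The feature construction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code behind the model: the helper names `build_features` and `make_xc_model`, the 0/1 encoding of the side variables, the signed `travelledY`, and the regularisation strengths are all hypothetical. It also reads "combinations" as products of two or three variables, which is only one possible interpretation.

```python
from itertools import combinations

import pandas as pd


def build_features(passes: pd.DataFrame) -> pd.DataFrame:
    """Build a hypothetical xC feature table from pass start and end locations."""
    base = pd.DataFrame(index=passes.index)

    # Magnitudes make the features symmetric about the centre line (x = 0).
    base['absThrowerX'] = passes['throwerX'].abs()
    base['absToX'] = passes['toX'].abs()
    base['absTravelledX'] = (passes['toX'] - passes['throwerX']).abs()

    # Side indicators (assumed encoding: 1 for x >= 0, otherwise 0) let the
    # model keep some left/right asymmetry.
    base['throwerSide'] = (passes['throwerX'] >= 0).astype(int)
    base['toSide'] = (passes['toX'] >= 0).astype(int)

    # The y values are used as they are (travelledY is assumed to be signed).
    base['throwerY'] = passes['throwerY']
    base['toY'] = passes['toY']
    base['travelledY'] = passes['toY'] - passes['throwerY']

    # Combinations of two and three variables, taken here as products.
    combos = {}
    for size in (2, 3):
        for cols in combinations(base.columns, size):
            combos['*'.join(cols)] = base[list(cols)].prod(axis=1)

    return pd.concat([base, pd.DataFrame(combos, index=passes.index)], axis=1)


def make_xc_model():
    """Create an XGBoost classifier with L1 and L2 penalties (placeholder values)."""
    # Imported here so the feature code above runs without xgboost installed.
    from xgboost import XGBClassifier

    return XGBClassifier(
        reg_alpha=1.0,   # L1 regularisation strength (placeholder)
        reg_lambda=1.0,  # L2 regularisation strength (placeholder)
    )
```

With eight base variables this gives 92 columns in total. Training would then be along the lines of `make_xc_model().fit(build_features(PASSES), PASSES['completion'])`, with the regularisation strengths tuned to keep the fitted function smooth.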

Results

After creating the model, I used it to give an ‘xC’ value to every pass from the 2023 season. This value is an estimate of the probability that a pass will be a completion, i.e. that it reaches the receiver and is caught. Below are the passes with the highest and lowest xC values from that season.

The 150 highest-xC passes (left) and lowest-xC passes (right) made in the 2023 season.
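The scoring and selection behind a plot like this could be sketched as below. It assumes a fitted classifier with a scikit-learn style `predict_proba` method (XGBoost's `XGBClassifier` provides one) and a feature table with one row per pass; the function names `add_xc` and `extreme_passes` are hypothetical.

```python
import pandas as pd


def add_xc(passes: pd.DataFrame, features: pd.DataFrame, model) -> pd.DataFrame:
    """Return a copy of `passes` with an 'xC' column of completion probabilities."""
    scored = passes.copy()
    # Column 1 of predict_proba is the probability of the positive class
    # (completion = 1) when the classes are 0 and 1.
    scored['xC'] = model.predict_proba(features)[:, 1]
    return scored


def extreme_passes(scored: pd.DataFrame, n: int = 150):
    """Return the n highest-xC passes and the n lowest-xC passes."""
    return scored.nlargest(n, 'xC'), scored.nsmallest(n, 'xC')
```

Calling `extreme_passes(add_xc(passes2023, features2023, model))` on a season's passes would return the two groups to plot, where `passes2023` and `features2023` are placeholder names for that season's rows.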

So, according to this model, the passes that are most likely to be completed are short, generally in your own half, and often away from the centre. All of the passes in the plot on the left have been assigned an xC greater than 98%.

Meanwhile, we see that the worst passes in terms of xC are all hucks, either heading out to very wide areas or coming from wide areas to the centre near the front of the endzone. The passes in this plot all have an xC of 40–50%.

So, now I hand over to you. Do you think that these plots and xC values make sense? Personally, I think they seem fairly reasonable, but I feel that 40–50% might be a bit low for the worst passes. The model can very easily be tweaked though, so let me know what you think!

Colin Scott

Data enthusiast and occasional ultimate frisbee player