Using Location Statistics to map AFL player roles

Where’s Wally … statistics which tell the story of player roles

Published in Analytics Vidhya, Aug 26, 2020

“Reach For It” by NatalieTracy is licensed under CC BY-SA 2.0

A casual observation from previous research was that some player statistics contained sufficient information to define player positions in the absence of video footage and/or actual x-y coordinates of ball action.

Classifying AFL Team Stats: Identifying meaningful structures in AFL team stats using cluster analysis (medium.com)

Our starting point is the following pair of charts, which show the relative frequency of each variable grouped by player position. The chart on the left required a painful process of manually ordering the statistics, but it clearly indicates the average location on the field (Defensive 50, Midfield, Forward 50), which in turn corresponds well to player positions.

Breakdown of Player Positions by Location-Information Statistics : Manual Process vs PCA

The principal components chart on the right more elegantly illustrates our theory that location-information statistics can be a viable way of distinguishing player positions. The axis through the colour clusters can be interpreted as various locations on the field, i.e. red = Defensive 50, beige = Forward 50, blue = Midfield.

In the analysis that follows, we back-solve player positions through the season with a robust machine learning algorithm, using the location statistics as a proxy for relative position on the field.

Player roles are fluid: while a player may have a dominant role, there are occasions when they drift to another area of the field and take on another role for short periods. We use our data subset to identify the type of role rather than to compare the quality of player performance in that role. Doing so allows us to identify mismatches between AFL Fantasy player position labels and actual player actions, with the hope of gaining an edge in player selection.

As part of this analysis, we have developed a simple app which allows users to explore the model results for each player (in the “Player” tab).

Data Quality

For this article, we use the player positions defined by the AFL Fantasy or Dream Team (DT) competition, as published by Footywire, as the source of truth. Here’s the rub: they are absolutely correct only for the end of the season. Why?

In DT, there are four key positions: defence, forward, midfield and ruck. Most players fall into only one of these categories.
Around 15% of players are dual position players (DPPs), meaning that they have been designated multiple roles. This designation is determined by Champion Data; our understanding is that it uses player statistics for the last 50 games. Dual position players are announced at the start of each season, and then after every 6 games.
Initial analysis of the Footywire data shows no player upgrades from single to dual position within individual seasons, which means that the position for the last match of the season is applied retrospectively to all earlier matches of that season. To look at these changes we therefore need to rely on written articles, which is not ideal.

Dual position players are especially valuable in DT as they provide flexibility in team construction. The ability to identify players with a higher chance of being announced as a DPP during the season was therefore the initial motivation for this analysis.

Data and Methodology

Data for player positions and location statistics for the 2015–2020 seasons were scraped from Footywire, comprising a total of 48,000 player-match combinations.

The model was trained on single player positions for the 2015–2019 seasons and validated on the 2020 single player position set. The final analysis is applied to the data across the full player dataset. The variables that we have identified as providing information on field location are :

Forward 50 : Goals (GL), Behinds (BH), Inside 50s (I5), Tackles Inside 50 (T5), Marks Inside 50 (M5), Goal Assists (GA)
Defensive 50 : Rebound 50s (R5)
Midfield : Centre Clearances (CC), Score Involvements excluding Goals (SX)
Not in Forward 50 : Tackles not Inside 50 (TX), Marks not Inside 50 (MX), both calculated as the total statistic minus the Inside 50 statistic.
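As a minimal sketch of how the two "not Inside 50" variables could be derived for one player-match record: the keys "TK" (total tackles) and "MK" (total marks) are hypothetical names chosen for illustration, while T5, M5, TX and MX follow the abbreviations listed above.

```python
def add_outside_50_stats(row):
    """Return a copy of one player-match record with TX and MX added.

    Assumes hypothetical keys "TK" (total tackles) and "MK" (total marks).
    """
    out = dict(row)
    out["TX"] = row["TK"] - row["T5"]  # tackles not inside 50
    out["MX"] = row["MK"] - row["M5"]  # marks not inside 50
    return out


example = {"TK": 6, "T5": 2, "MK": 7, "M5": 3}
derived = add_outside_50_stats(example)
print(derived["TX"], derived["MX"])
```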

We have also included player information such as experience (logarithm of game number) and height in order to improve model results.
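The experience feature might look something like the sketch below. The base of the logarithm is not stated, so the natural log is an assumption here, as is counting a debut as game number 1.

```python
import math


def log_experience(game_number):
    """Natural log of a player's career game number (assumed to start at 1)."""
    if game_number < 1:
        raise ValueError("game_number must be at least 1")
    return math.log(game_number)


print(log_experience(1))  # a debut game maps to 0.0
print(log_experience(50), log_experience(100))
```

A log scale compresses differences between veterans: moving from game 50 to game 100 adds the same amount to the feature as moving from game 1 to game 2.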

Given the shortened games in 2020, we have scaled that season's statistics by a 1.25 multiplier so that they are on the same scale as the other seasons.
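A rough sketch of the adjustment, assuming it is applied only to the count statistics and not to height or experience (the source does not say which columns were scaled). The factor presumably reflects 16-minute quarters in 2020 against the usual 20 (20 / 16 = 1.25). The record layout and the "season" key are illustrative.

```python
# Count statistics named in the variable list; height and experience are left alone.
COUNT_STATS = ["GL", "BH", "I5", "T5", "M5", "GA", "R5", "CC", "SX", "TX", "MX"]
SHORT_SEASON = 2020
MULTIPLIER = 1.25


def standardise(row):
    """Scale count statistics for the shortened season; other seasons are unchanged."""
    out = dict(row)
    if row["season"] == SHORT_SEASON:
        for stat in COUNT_STATS:
            if stat in out:
                out[stat] = out[stat] * MULTIPLIER
    return out


row_2020 = standardise({"season": 2020, "I5": 4, "R5": 2, "height": 190})
row_2019 = standardise({"season": 2019, "I5": 4, "R5": 2, "height": 190})
print(row_2020)
print(row_2019)
```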

From the dataset we build several multi-class classification models (random forest, gradient boosting and k-nearest neighbours) and aggregate their predictions into the final model. On the unseen 2020 data, the final model classifies 77% of observations in agreement with the DT player position data.
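The aggregation rule is not described, so the sketch below assumes a simple average of each model's class probabilities ("soft voting"), which is only one of several reasonable choices. The three probability dictionaries are made-up numbers standing in for the outputs of already-fitted models.

```python
ROLES = ["Defence", "Forward", "Midfield", "Ruck"]


def aggregate(prob_sets):
    """Average per-role probabilities from several models for one observation."""
    n = len(prob_sets)
    return {role: sum(p[role] for p in prob_sets) / n for role in ROLES}


def predict_role(prob_sets):
    """Pick the role with the highest averaged probability."""
    avg = aggregate(prob_sets)
    return max(ROLES, key=lambda role: avg[role])


def accuracy(predicted, actual):
    """Share of observations where the prediction matches the label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical outputs of three models for a single player-match observation.
rf = {"Defence": 0.6, "Forward": 0.1, "Midfield": 0.3, "Ruck": 0.0}
gb = {"Defence": 0.5, "Forward": 0.2, "Midfield": 0.3, "Ruck": 0.0}
knn = {"Defence": 0.2, "Forward": 0.2, "Midfield": 0.6, "Ruck": 0.0}

print(aggregate([rf, gb, knn]))
print(predict_role([rf, gb, knn]))
```

Here the k-nearest neighbours model alone would say Midfield, but the averaged probabilities favour Defence.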

Results and Observations

In order to fully appreciate and understand the model predictions, we split the results into three categories: (1) single position players accurately predicted, (2) single position players inconsistently predicted and (3) predictions for dual position players.

The model calculates the number of votes for each role type for a given player, where each vote share can be interpreted as the percentage of time spent performing that role (and hence mapped to specific locations on the field). Agreement between the model and the listed DT player positions is highest for defenders and lowest for midfielders.
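To illustrate the vote interpretation, the sketch below turns a list of role votes for one player-match into shares that sum to one. Where the votes come from (for example individual trees or individual models) is left open, and the vote list is invented for the example.

```python
from collections import Counter

ROLES = ["Defence", "Forward", "Midfield", "Ruck"]


def vote_shares(votes):
    """Convert a list of role votes into the share of votes per role."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {role: counts.get(role, 0) / total for role in ROLES}


# Hypothetical votes for one player in one match.
votes = ["Midfield"] * 6 + ["Forward"] * 3 + ["Defence"] * 1
shares = vote_shares(votes)
print(shares)
```

Read this way, the player in the example spent most of the match in a midfield role with some time drifting forward.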

Consistency of Model Performance vs Champion Data (DT) methodology

The charts for each player concisely present the following information :

The dominant player role is the darkest colour block for each — Defence, Forward, Midfield and Ruck.
The average player role for the last 10 games at a 0.70 threshold is identified as the right-most observation on the chart. Note that while the DT methodology uses a 50 game average, we attempt to be more current by using fewer games.
In order to benchmark the quality of our results, we also consider the actual team line-ups for recent matches.
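How the 0.70 threshold is applied is not spelled out, so the following is one possible reading rather than the method actually used: average the per-match role shares over the last 10 games, then keep adding roles in descending order until their combined share reaches 0.70. One role returned suggests a single position player, two or more a dual position candidate.

```python
def recent_roles(match_shares, window=10, threshold=0.70):
    """Roles covering at least `threshold` of the average share over recent games.

    `match_shares` is a non-empty, chronologically ordered list of
    {role: share} dictionaries.
    """
    recent = match_shares[-window:]
    roles = list(recent[0].keys())
    avg = {r: sum(m[r] for m in recent) / len(recent) for r in roles}
    ranked = sorted(roles, key=lambda r: avg[r], reverse=True)
    chosen, cumulative = [], 0.0
    for role in ranked:
        chosen.append(role)
        cumulative += avg[role]
        if cumulative >= threshold:
            break
    return chosen


# Hypothetical players: one settled defender, one midfielder drifting forward.
settled = [{"Defence": 0.8, "Forward": 0.0, "Midfield": 0.2, "Ruck": 0.0}] * 12
drifting = [{"Defence": 0.1, "Forward": 0.3, "Midfield": 0.6, "Ruck": 0.0}] * 12

print(recent_roles(settled))
print(recent_roles(drifting))
```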

Single position players accurately predicted: The model accurately predicts the single position players where there is consensus between the DT position and the actual team line-ups. These players have generally remained in consistent playing positions across multiple seasons.

Results : Consistently predicted single position players

Single position players inconsistently predicted: In the following chart, we identify single position players for whom the model's consensus differs from the DT methodology.

Results : Inconsistently predicted single position players

We note the lag in the DT methodology in identifying players who have transitioned positions between seasons. This is because the DT methodology uses a 50 game average, which spans more than two seasons.
The model assessments for individual rounds present a richer view of the fluidity of player roles as seasons evolve, allowing us to visualise player breadth.

Dual position players: Similar to the inconsistently predicted single position players, we can observe the lag in the DT methodology for DPPs.

Examining the model results in conjunction with the actual team line-up data gives us confidence in the model predictions and results interpretation.

Model Applications

One of the applications of the analysis is to consider the differences between the DT position and model predictions. The table below shows the top player discrepancies for the 2020 season to date.

Discrepancy between DT position and model predictions
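A discrepancy list of this kind could be assembled along the lines of the sketch below. The record layout, field names and players are hypothetical, and ranking by the model's share for its dominant role is an assumption about what "top" means.

```python
def discrepancies(players):
    """Players whose dominant model role is not among their DT positions,
    ordered by how strongly the model favours that role."""
    flagged = [p for p in players if p["model_role"] not in p["dt_positions"]]
    return sorted(flagged, key=lambda p: p["model_share"], reverse=True)


# Hypothetical records; dual position players carry two DT positions.
players = [
    {"name": "Player A", "dt_positions": ["Forward"], "model_role": "Midfield", "model_share": 0.72},
    {"name": "Player B", "dt_positions": ["Defence"], "model_role": "Defence", "model_share": 0.90},
    {"name": "Player C", "dt_positions": ["Midfield", "Forward"], "model_role": "Forward", "model_share": 0.55},
    {"name": "Player D", "dt_positions": ["Midfield"], "model_role": "Defence", "model_share": 0.81},
]

for p in discrepancies(players):
    print(p["name"], p["dt_positions"], "->", p["model_role"], p["model_share"])
```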

Reflections and Next Steps

The model has produced promising results in identifying player positions from on-field statistics that contain some location information.

Model inaccuracies arise for players who have barely touched the ball during a match. Because the model predictions are based on the actual and relative interaction between the variables, a match with relatively few observations can produce unusual predictions. This would most likely apply to injured players, rookies or serial underperformers.

As part of this analysis, we have developed a simple app which allows users to explore the model results for each player.

https://denisewong1.shinyapps.io/AFLapp/

References

1. Footywire AFL fantasy player rankings (link)

2. Dual position players (link) and (link)

3. Principal component analysis (link)

4. AFL Player Season Prices and Positions (link) and (link)

5. Classifying players’ positions using public data by the Arc (link)

6. Classifying Recent AFL Players by Position by MatterOfStats (link)