Introducing PELE: The Data-Driven Football Commentator

We built an insightful football commentator using Statsbomb data.

Chris Schon
Applied Data Science
7 min read · Nov 11, 2019

adsp.ai/pele

The ball. It’s the single most important object in football. Obviously. The clue is in the name. But think about it very simply for a second: what is football? It can be boiled down to players from two opposing teams taking a series of actions to move the ball around the pitch to achieve their team’s objectives: score more and concede less. Each player tries to take actions that improve their team’s chances of scoring and reduce its chances of conceding.

This isn’t a story that would go down well with purists, who dream up images of the players’ hearts and emotions dictating the ‘feel’ and the ‘flow’ of the game. These stories resonate with anyone who has played the game, be it on cold winter nights at Shoreditch Powerleague or in the Champions League final. However, the desired end result for everyone is the same: keep the ball away from your goal, and close to the opposition’s.

Statsbomb is one of the leaders in football data analytics

This scientific footballing philosophy has led to new markets in football data collection and analytics. Football data companies such as Statsbomb, Opta and Wyscout support clubs, gamblers and researchers in carrying out objective analysis of game ‘events’. Every pass, dribble and foul is tracked and logged to provide a foundation for objective analytics, whether for player analysis or for placing bets.

We were curious, and after attending the Statsbomb conference we got our hands on some of their data. We asked the question: from the raw, event-based data, could we build a real-feeling commentator that describes the events in a game and gives real-time insights into the top-performing players on the pitch?

PELE describing Eriksen’s strike

Introducing PELE

PELE is an interactive GUI that gives you a top-down replay of a game with commentary on each action taken in the game, as well as ‘live’ insights into the strongest and weakest players on the pitch. For example, PELE may just say ‘Kante plays it long to Morata’, but if that’s Kante’s 12th successful pass in the last 15 minutes, PELE will notice that that is a lot for a midfield player in such a short time, and will also say something like ‘That’s 12 successful passes by Kante in the last 15 minutes, he’s dictating the midfield’.

How we did it

Here’s a whistlestop tour of the process taken to develop PELE.

1. Process Statsbomb data using the SPADL framework.

Football event data is collected in a non-standardised and sometimes unintuitive fashion. For PELE to work, each row needs to represent a single logical action, in sequence. With raw Statsbomb data, you often need to combine two or three records to resolve a single higher-level concept. For example, in the raw data, a pass is not represented by a single JSON record, but by multiple: the pass is described in one record, and the (intended) receiver with success/failure fields in another.

SPADL, developed by Tom Decroos and his team at the KU Leuven Machine Learning Research Group, gives us a standardised framework with a simplified set of ‘actions’ with a ‘success/fail’ column.

After standardising with SPADL, here’s what the data looks like for each match (I’ve only kept the most important columns here):

Here’s a description of each column:

  • action_id: incremental id of each action
  • timestamp: precise time of the action
  • player: full player name
  • type_name: the type of action
  • result_name: the outcome of the action
  • bodypart_name: body part used by the player to complete the action
  • start_x/start_y/end_x/end_y provide the pitch coordinates for the start and finish of each action
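To make the row structure concrete, here is a minimal sketch of one standardised action as a Python dataclass. The field names follow the column list above; the field types, the unit of `timestamp`, the `distance` helper and the sample values are all assumptions made for illustration, not the actual SPADL schema.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One row of the standardised table (field names follow the column list)."""
    action_id: int
    timestamp: float      # assumed here: seconds since the start of the half
    player: str
    type_name: str        # e.g. "pass", "dribble", "shot"
    result_name: str      # e.g. "success", "fail"
    bodypart_name: str    # e.g. "foot", "head"
    start_x: float
    start_y: float
    end_x: float
    end_y: float

    def distance(self) -> float:
        """Straight-line distance between the start and end coordinates."""
        return math.hypot(self.end_x - self.start_x, self.end_y - self.start_y)


# Made-up values, purely to show the shape of a row.
example = Action(1, 12.5, "Christian Eriksen", "pass", "success", "foot",
                 50.0, 30.0, 53.0, 34.0)
print(example.type_name, example.distance())
```

Having every action in this one shape is what lets the later steps treat passes, dribbles and shots uniformly.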

2. Inject additional events.

Once we have this consistent structure for our data, we can begin to picture how PELE would describe the actions in the match. There were some vital pieces missing, however. For example, whenever the ball went out for a goal-kick from an off-target shot the ball would immediately travel back upfield with no pause for the goal-kick. So we injected additional rows to introduce these pauses, covering all dead-ball situations, kickoffs and the half-time and full-time whistles.

An example injection for a throw-in
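The injection step could be sketched roughly as follows. This is not our production code: the rows are plain dictionaries, the set of dead-ball action names is an assumption about how restarts are labelled, and the `pause_` prefix is a hypothetical naming choice. The idea is simply to add a synthetic row just before each restart so the replay has something to pause on.

```python
# Assumed names for restart actions; adjust to match the actual data.
DEAD_BALL_TYPES = {"goalkick", "throw_in", "corner_crossed", "corner_short",
                   "freekick_crossed", "freekick_short"}


def inject_pauses(actions):
    """Return a new list with a synthetic 'pause' row before each restart."""
    out = []
    for action in actions:
        if action["type_name"] in DEAD_BALL_TYPES:
            out.append({
                "timestamp": action["timestamp"],
                "player": None,
                "type_name": "pause_" + action["type_name"],  # hypothetical label
                "result_name": None,
                # The ball waits where the restart will be taken.
                "start_x": action["start_x"], "start_y": action["start_y"],
                "end_x": action["start_x"], "end_y": action["start_y"],
            })
        out.append(dict(action))  # copy so the input rows are left untouched
    # Renumber so action_id stays incremental after the insertions.
    for new_id, row in enumerate(out):
        row["action_id"] = new_id
    return out


# Made-up rows: an off-target shot, the goal-kick that follows, then a pass.
match = [
    {"action_id": 0, "timestamp": 100.0, "player": "A", "type_name": "shot",
     "result_name": "fail", "start_x": 90.0, "start_y": 30.0,
     "end_x": 105.0, "end_y": 20.0},
    {"action_id": 1, "timestamp": 130.0, "player": "B", "type_name": "goalkick",
     "result_name": "success", "start_x": 5.0, "start_y": 34.0,
     "end_x": 50.0, "end_y": 40.0},
    {"action_id": 2, "timestamp": 134.0, "player": "C", "type_name": "pass",
     "result_name": "success", "start_x": 50.0, "start_y": 40.0,
     "end_x": 60.0, "end_y": 42.0},
]
with_pauses = inject_pauses(match)
print([row["type_name"] for row in with_pauses])
```

Kickoffs and the half-time and full-time whistles would need their own rules, since they are not triggered by a restart action in the same way.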

3. Build comments table for player actions.

Not to say football is boring or anything, but there are on average over 2000 actions in each game, most of which don’t amount to anything particularly noteworthy. Around 1700 actions will be passes or dribbles. Most ‘dribbles’ are very short too. We decided to annotate each action based on 4 categories:

  • action type (pass/dribble/cross/shot etc.)
  • result (success/fail/offside)
  • direction (forward/backward/sideways/insignificant)
  • distance (short/medium/long)
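The direction and distance categories can be derived from each action's start and end coordinates. The sketch below shows one way to do it. The thresholds are hypothetical placeholders rather than the values PELE uses, and it assumes the acting team always attacks towards increasing x.

```python
import math

# Hypothetical thresholds, in the same units as the pitch coordinates.
MIN_SIGNIFICANT = 3.0   # shorter than this: direction is not worth mentioning
SHORT_MAX = 10.0
MEDIUM_MAX = 25.0


def classify(start_x, start_y, end_x, end_y):
    """Return (direction, distance) labels for one action."""
    dx = end_x - start_x
    dy = end_y - start_y
    dist = math.hypot(dx, dy)

    if dist < SHORT_MAX:
        distance = "short"
    elif dist < MEDIUM_MAX:
        distance = "medium"
    else:
        distance = "long"

    if dist < MIN_SIGNIFICANT:
        direction = "insignificant"
    elif abs(dy) > abs(dx):
        direction = "sideways"      # more movement across the pitch than along it
    elif dx > 0:
        direction = "forward"       # assumes attacking towards increasing x
    else:
        direction = "backward"
    return direction, distance


print(classify(50.0, 30.0, 53.0, 34.0))
print(classify(10.0, 34.0, 50.0, 34.0))
```

With these placeholder thresholds, the first call gives a short sideways action and the second a long forward one.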

To cover all possible combinations of the above categories, we built a lookup table of over 200 ‘comments’. PELE randomly samples an ‘action’ comment from this table where the action matches the type/result/direction/distance in the table. We then inject the relevant player names for that action. For example, for a short, sideways, successful pass from Eriksen to Alli, PELE could say:

Short ball sideways from Eriksen to Alli

or simply

Short ball square to Alli
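In code, the lookup-and-sample step could look something like the sketch below. The dictionary layout, the `comment_for` function and the fallback template are illustrative assumptions. The first two templates mirror the examples above, and the third is invented to show a second key.

```python
import random

# Keyed by (action type, result, direction, distance).
COMMENTS = {
    ("pass", "success", "sideways", "short"): [
        "Short ball sideways from {player} to {receiver}",
        "Short ball square to {receiver}",
    ],
    ("pass", "fail", "forward", "long"): [
        "{player} looks long for {receiver} but it's cut out",  # invented
    ],
}
FALLBACK = "{player} with the {type_name}"  # hypothetical catch-all


def comment_for(type_name, result, direction, distance, player, receiver,
                rng=random):
    """Sample one template for this combination and fill in the names."""
    key = (type_name, result, direction, distance)
    templates = COMMENTS.get(key, [FALLBACK])
    template = rng.choice(templates)
    # str.format ignores keyword arguments a template does not use.
    return template.format(player=player, receiver=receiver,
                           type_name=type_name)


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(comment_for("pass", "success", "sideways", "short",
                  "Eriksen", "Alli", rng))
```

Sampling at random from several templates per combination keeps the commentary from repeating the same sentence every time a common action occurs.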

4. Calculate insights.

So that PELE does more than literally describe every single action of the game, we brainstormed a bit and agreed it would be cool to have a running commentary on the best and worst performers on the pitch in real time, based on their running in-game stats, as well as similar stats at the team level.

We did this by aggregating the previous 10 games of the season and building frequency distributions of successful and unsuccessful passes, dribbles, crosses, tackles and interceptions in 5, 10 and 15 minute time windows. These distributions told us what would be unusually good or bad for a player or team to do in those time windows.
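One simple way to turn such a distribution into a cut-off for "unusual" is a percentile threshold, sketched below. The nearest-rank percentile, the choice of the 90th percentile and the historical counts are all assumptions for illustration. They are not the rule or the data PELE actually uses.

```python
import math


def percentile_threshold(window_counts, percentile=90):
    """Nearest-rank percentile of the historical per-window counts."""
    if not window_counts:
        raise ValueError("need at least one historical window")
    ordered = sorted(window_counts)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up history: successful passes by one midfielder in past
# 15-minute windows.
history = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
threshold = percentile_threshold(history)

current_count = 12
if current_count >= threshold:
    print(f"{current_count} successful passes is unusually high "
          f"(threshold {threshold})")
```

The same function with a low percentile would give a cut-off for unusually quiet periods, and a separate history per stat and window length keeps the comparison fair.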

We built running statistics for all the players on the pitch for the game, so PELE could comment on individual performance. For example, PELE may say something like:

Sloppy period of play from Jan Vertonghen, who’s misplaced 3 passes in the last 5 minutes

or a positive one like:

N’Golo Kante has touched the ball 10 times in the last 5 minutes, he’s winning the midfield battle

Similar stats are calculated for each side so that PELE can provide the same insights at a team level.
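The running in-game stats boil down to counting a player's (or team's) events inside a trailing time window and comparing that count with a threshold. Here is a minimal sketch: the event tuples, the stat names, the threshold of 3 and the timings are made up, and the template wording follows the Vertonghen example above.

```python
def count_in_window(events, player, stat, now_min, window_min):
    """Count a player's events of one kind in the trailing window."""
    start = now_min - window_min
    return sum(
        1
        for minute, who, what in events
        if who == player and what == stat and start < minute <= now_min
    )


def insight(events, player, stat, now_min, window_min, threshold, template):
    """Return a commentary line if the count reaches the threshold, else None."""
    n = count_in_window(events, player, stat, now_min, window_min)
    if n >= threshold:
        return template.format(player=player, n=n, window=window_min)
    return None


# (minute, player, stat) tuples; values are invented for the example.
events = [
    (30.0, "Jan Vertonghen", "pass_fail"),
    (40.5, "Jan Vertonghen", "pass_fail"),
    (42.0, "Jan Vertonghen", "pass_fail"),
    (43.0, "N'Golo Kante", "pass_success"),
    (44.0, "Jan Vertonghen", "pass_fail"),
]
line = insight(
    events, "Jan Vertonghen", "pass_fail", now_min=44.5, window_min=5,
    threshold=3,
    template="Sloppy period of play from {player}, who's misplaced {n} "
             "passes in the last {window} minutes",
)
print(line)
```

Passing a team name instead of a player name, with events tagged by team, gives the team-level version of the same check.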

5. Visualise in Tableau.

With the data all in the right structure and PELE ready to provide action and insight commentary, we just needed to bring it to life in an interactive front-end. Tableau was good for this; with its playback options and trace functionality, we could build a Football Manager-like experience.

Bringing PELE to life

GOAL! Here’s a stream of PELE’s commentary alongside Eriksen’s long-range strike at Stamford Bridge. We can see under the hood a bit here: PELE annotates each action with a direction and distance to understand how to describe it. PELE knows the exact distance covered by each action, making for detailed insights.

PELE insights

Here’s a sequence of play where PELE picks up on a few insights in a short period of time:

PELE provides 4 insights in this passage:

  • Tottenham completed their 3rd interception in 15 minutes, showing they were reading the game well but perhaps were on the back foot.
  • Hazard had made 9 dribbles in the last 15 minutes.
  • Hazard also made 4 successful passes in the last 15 minutes, a high number for a forward.
  • Alvaro Morata had also completed 4 passes in the last 15 minutes, showing good hold-up play.

All of this came in the minutes leading up to Chelsea’s first goal, perhaps an indication that things were going in Chelsea’s favour.

Try PELE for yourself

Check out PELE here! We currently have two games live from the 2017/18 Premier League season: Chelsea vs. Tottenham and Arsenal vs. Everton. Enjoy!

Applied Data Science Partners is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website.
