Tracking N’Golo Kante, the Sung Hero of FIFA World Cup 2018

A quick dive into sports visualization with Python, Seaborn, and Matplotlib

Kevin Rexis Velasco
6 min read · May 1, 2019

The volume of data, and the number of methods we use to collect it, grows every year. Every corner of the business world hopes to use it to learn more about our world, and sports is a part of that. The values of sports teams, in particular football (soccer) teams across the world, grow every year. This year alone, the German-based Transfermarkt website values the entire English Premier League, the country’s top-flight football league, at a staggering $9.57 billion. The prize money in the FIFA World Cup, the world’s biggest sports tournament, has also increased with every four-year iteration.

With the rising cost of operating a club, teams hope to use modern analytics tools to improve their players, using big data to help them draft, develop, and create tactics, decisions that have traditionally been made on old-school gut feeling.

The best player at the 2018 World Cup has not scored a goal. He does not have an assist. He’s made no saves, had no moments that made SportsCenter’s top 10. He’s picked up one yellow card, but other than that, you won’t find him on the score sheet. His name is N’Golo Kante, he plays for France, and he deserves the Golden Ball. In this blog post, I will be using Python, matplotlib, and seaborn to follow the pivotal defensive midfielder’s performance throughout the tournament. In all but one of France’s first four matches, Kante covered more ground than any of his teammates, harassing opponents, cutting out passes, breaking up moves and getting back possession.

Kante vs the GOAT

As a staunch fan of the sport, I wanted to find any soccer-related data I could. Last fall, StatsBomb released a massive dataset that follows every play made during the 2018 World Cup, which can be found here. The data set is enormous, and I have just scratched the surface of what can be learned from it.

Using code from Tuan Doan Nguyen’s Medium blog post, found here, I was able to quickly parse through some of the data to create visualizations of Kante’s seven-match performance.
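For readers who have not opened the event files, here is a minimal sketch of what that parsing step can look like. It assumes each match is a JSON list of events with nested "type", "player", "location", and "pass" fields, which is my understanding of the StatsBomb layout. The helper names and the sample values are made up for illustration and are not Nguyen’s code.

```python
import json

# Tiny made-up sample shaped like a StatsBomb events file (values are illustrative only).
SAMPLE_EVENTS = json.loads("""
[
  {"type": {"name": "Starting XI"}, "period": 1},
  {"type": {"name": "Pass"}, "period": 1, "player": {"name": "N'Golo Kante"},
   "location": [45.0, 38.0], "pass": {"end_location": [60.0, 30.0]}},
  {"type": {"name": "Ball Recovery"}, "period": 2, "player": {"name": "N'Golo Kante"},
   "location": [52.0, 41.0]},
  {"type": {"name": "Pass"}, "period": 1, "player": {"name": "Someone Else"},
   "location": [10.0, 10.0], "pass": {"end_location": [20.0, 15.0]}}
]
""")


def player_events(events, player_name):
    """Keep only the events attributed to one player (some events have no player)."""
    return [e for e in events if e.get("player", {}).get("name") == player_name]


def touch_locations(events):
    """Return (x, y) tuples for every event that carries a location."""
    return [tuple(e["location"]) for e in events if "location" in e]


def extract_passes(events):
    """Return (period, start, end) for every pass event."""
    return [
        (e["period"], tuple(e["location"]), tuple(e["pass"]["end_location"]))
        for e in events
        if e["type"]["name"] == "Pass"
    ]


# The player name must match the spelling used in the data; this one is hypothetical.
kante = player_events(SAMPLE_EVENTS, "N'Golo Kante")
locations = touch_locations(kante)
passes = extract_passes(kante)
```

The locations feed the heat map and the passes feed the arrows; with a real match file you would replace SAMPLE_EVENTS with the result of json.load on that file.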

First, I had to create a soccer field, complete with penalty boxes and a center circle, using matplotlib:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Arc, Rectangle, ConnectionPatch

# Jupyter notebook magic: render figures inline
%matplotlib inline

def draw_pitch(ax):
    # Draws the full 120 x 80 pitch onto the given axes
    # Pitch outline
    Pitch = Rectangle([0,0], width = 120, height = 80, fill = False)
    # Left, right penalty area and midline
    LeftPenalty = Rectangle([0,22.3], width = 14.6, height = 35.3, fill = False)
    RightPenalty = Rectangle([105.4,22.3], width = 14.6, height = 35.3, fill = False)
    midline = ConnectionPatch([60,0], [60,80], "data", "data")

    # Left, right 6-yard box
    LeftSixYard = Rectangle([0,32], width = 4.9, height = 16, fill = False)
    RightSixYard = Rectangle([115.1,32], width = 4.9, height = 16, fill = False)

    # Centre circle and centre spot
    centreCircle = plt.Circle((60,40),8.1,color="black", fill = False)
    centreSpot = plt.Circle((60,40),0.71,color="black")
    # Penalty spots and arcs around penalty boxes
    leftPenSpot = plt.Circle((9.7,40),0.71,color="black")
    rightPenSpot = plt.Circle((110.3,40),0.71,color="black")
    leftArc = Arc((9.7,40),height=16.2,width=16.2,angle=0,theta1=310,theta2=50,color="black")
    rightArc = Arc((110.3,40),height=16.2,width=16.2,angle=0,theta1=130,theta2=230,color="black")

    # Add every shape to the axes
    element = [Pitch, LeftPenalty, RightPenalty, midline, LeftSixYard, RightSixYard, centreCircle,
               centreSpot, rightPenSpot, leftPenSpot, leftArc, rightArc]
    for i in element:
        ax.add_patch(i)

There is a lot going on above, but in essence it’s a collection of lines and circles. Using this, I can overlay a heat map and pass map to give the visualization more meaning. Python’s Seaborn library makes plotting a tidy dataset incredibly easy with ‘.kdeplot()’. Calling it with location data from a player’s movements during a match will give you something like this:

Cool, we have a contour plot, which groups lines closer to each other where we have more density in our data.

Let’s customise this with a couple of additional arguments:

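  • shade: fills the gaps between the lines to give us more of the heatmap effect that we are looking for (newer seaborn releases call this argument fill).
  • n_levels: draws more lines; adding lots of these will blur the lines into a heatmap (newer seaborn releases call this argument levels).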

and the result:

Using more of Nguyen’s code, I put the soccer field and heatmap together and, with some beautiful code to capture a player’s passes, was able to visualize Kante’s performance in each match!

Kante Performance
#1 Group Stage vs Australia
#2 Group Stage vs Peru
#3 Group Stage vs Denmark
#4 Round of 16 vs Argentina

In these visualizations, blue arrows indicate passes made in the first half and red arrows indicate passes made in the second half. The left side of each graph is the defensive side, with the right being the offensive side.
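The color coding can be sketched as below. This is a minimal illustration, not Nguyen’s code: the draw_passes helper, the (period, start, end) tuple layout, the gray fallback for extra-time periods, and the sample passes are all assumptions of mine.

```python
import matplotlib.pyplot as plt

# Period 1 = first half (blue), period 2 = second half (red).
PERIOD_COLORS = {1: "blue", 2: "red"}


def draw_passes(ax, passes):
    """Draw one arrow per pass, colored by the half in which it was played."""
    arrows = []
    for period, start, end in passes:
        # Anything outside the two regular halves falls back to gray (my choice).
        color = PERIOD_COLORS.get(period, "gray")
        arrow = ax.annotate(
            "",              # no label, only the arrow
            xy=end,          # arrow head at the pass destination
            xytext=start,    # arrow tail at the pass origin
            arrowprops=dict(arrowstyle="->", color=color),
        )
        arrows.append(arrow)
    return arrows


# Made-up passes as (period, (x_start, y_start), (x_end, y_end)).
sample_passes = [
    (1, (45.0, 38.0), (60.0, 30.0)),
    (2, (70.0, 50.0), (95.0, 62.0)),
]

fig, ax = plt.subplots(figsize=(9, 6))
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
arrows = draw_passes(ax, sample_passes)
```

Drawing the arrows on the same axes as the pitch and the density plot layers all three into one figure.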

Although the data set cannot fully capture the player’s defensive dominance, one can see how widely he ranged across the field over the whole match.

#5 Quarterfinal vs Uruguay

During the first two matches, you can see just how busy he was and how much of the field he covered. Long arrows indicate successful crosses.

In the third match in particular, Denmark did an amazing job keeping him confined to a smaller area than in the first two matches. But he was still able to complete passes on the offensive end of the field.

Against Argentina, he was tasked with keeping an eye on Leo Messi (the best player on the Argentinian side) and had to track back more to cut out passes made in France’s end. Compared to his other matches, he primarily stayed on the defensive side of the pitch.
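A reading like “primarily stayed on the defensive side” can also be checked with a number next to the heat map. The sketch below assumes locations are (x, y) pairs on a 120-unit-long pitch with the defensive half on the left, as in the plots in this post. The defensive_share helper and the sample points are hypothetical.

```python
def defensive_share(locations, pitch_length=120.0):
    """Fraction of locations that fall in the defensive (left) half of the pitch."""
    if not locations:
        return 0.0
    halfway = pitch_length / 2
    in_defensive_half = sum(1 for x, _ in locations if x < halfway)
    return in_defensive_half / len(locations)


# Made-up touches: three in the defensive half, one in the attacking half.
sample_locations = [(20.0, 40.0), (35.0, 10.0), (58.0, 70.0), (80.0, 30.0)]
share = defensive_share(sample_locations)  # 3 of 4 touches -> 0.75
```

Computing this share for each match would give a simple way to compare one game with another.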

During the match vs Uruguay, his heat map is more spread out across the pitch, and he positioned himself more centrally than in the game before.

#6 Semifinal vs Belgium

Against Belgium, his activity appears to be concentrated on the defensive side of the field. At the time, Belgium were the heavy favorites and had a potent attacking force with Eden Hazard and Kevin De Bruyne, among others. Yet, with Kante, France was able to keep their runs and passes at bay.

The last heat map differs drastically from the six above. After receiving a yellow card for a foul early in the game, he was limited in play, and he was substituted for tactical reasons 10 minutes into the second half. It was later revealed that he was actually sick, suffering from gastroenteritis (aka the stomach flu) the night before the final. Speaking from my own experience with the stomach flu, it isn’t hard to understand his lackluster performance. Nevertheless, France won the final against the Croatian national team 4–2, securing their second World Cup title. Because Kante was too shy to ask, a teammate had to ask the others to give him the trophy during the celebration! During the celebrations that followed, the team also sang of his heroic performance!

In this brief application of data analytics tools to the data set, I was able to extract and display just one aspect of what this massive amount of data holds. I have only scratched the surface, and I am excited to dive deeper later. I look forward to sharing what I find next!
