Op-Ed: How data science is transforming soccer

By Jelly Zhu, MEng ’22 (IEOR)

This op-ed is part of a series from E295: Communications for Engineering Leaders. In this course, Master of Engineering students were challenged to communicate a topic they found interesting to a broad audience of technical and non-technical readers. As an opinion piece, the views shared here are neither an expression of nor endorsed by UC Berkeley or the Fung Institute.

Photo by Izuddin Helmi Adnan on Unsplash

If you are a soccer fan, you have probably heard this story: in the quarterfinal of the 2006 FIFA World Cup, Germany beat Argentina 4–2 in a penalty shoot-out and knocked Argentina out of the tournament before the semifinals. Germany’s goalkeeper, Jens Lehmann, saved two of Argentina’s penalties with the help of a piece of paper from the German coaching team. The paper contained hand-compiled notes on the directions in which Argentina’s penalty takers most often kicked. This was likely one of the earliest well-known applications of data science in soccer.

As the world enters the era of “Big Data,” many industries are experimenting with applications of data science, and soccer is no exception. Sports generate enormous amounts of data every day, but how exactly is data generated from a soccer game? How can stakeholders in soccer benefit from data science technology? How is data science changing soccer today?

How is data generated from a soccer game?

There are three main ways to generate data from a soccer game: manual recording, high-speed cameras, and wearable sensing equipment.

Manual recording is the earliest but least efficient way to collect data in a game. Companies hire data collectors to record players’ movements during a game. The data generated this way is usually not very accurate and is often used for game reports.

High-speed cameras are the most commonly used tool for collecting data from soccer games. In nearly every big stadium, several specially designed high-speed cameras are installed around the pitch to automatically track the trajectories of the players and the ball using optical tracking algorithms. From these trajectories, software can calculate each player’s running distance in real time and even break it down into jogging, fast running, sprinting, and other categories. Camera data is much more accurate than manual records and is widely used in analyses such as player scoring systems.
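
To make the idea of splitting running distance into categories concrete, here is a minimal Python sketch. It assumes a player's position has already been extracted by the tracking system as (x, y) coordinates in metres, sampled at a fixed interval. The speed thresholds, the extra "walking" category, and the function names are illustrative assumptions, not values used by any real tracking provider.

```python
import math

# Hypothetical upper speed limits in metres per second (assumed for illustration).
# Anything at or above the last limit counts as sprinting.
ZONES = [("walking", 2.0), ("jogging", 4.0), ("fast running", 7.0)]


def classify_speed(speed):
    """Return the name of the speed zone for a speed in metres per second."""
    for name, upper_limit in ZONES:
        if speed < upper_limit:
            return name
    return "sprinting"


def distance_by_zone(positions, dt):
    """Sum the distance covered in each speed zone.

    positions: list of (x, y) points in metres, one every dt seconds.
    """
    totals = {}
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)   # distance between two samples
        zone = classify_speed(step / dt)      # average speed over the interval
        totals[zone] = totals.get(zone, 0.0) + step
    return totals


# Example: four samples taken one second apart along a straight line.
example = distance_by_zone([(0, 0), (1, 0), (4, 0), (12, 0)], dt=1.0)
```

Real systems sample many times per second and smooth the raw positions before computing speed, so this is only the core arithmetic.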

In recent years, some clubs have started using wearable sensing equipment to collect data from their players during a game. It collects not only movement data but also biological data such as heart rate, which opens up many possibilities for future applications and analysis.

What can data do in a soccer game?

Evaluating players in a game
It’s easy for us to tell which players performed well or badly after watching a whole game, but with data science methodology, we can evaluate players’ performance in a more statistical and scientific way. Soccer mathematician David Sumpter created a machine learning model that assigns points to players during a game based on how much each of their actions increases the team’s chances of creating a goal-scoring opportunity.
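
The bookkeeping behind this kind of scoring can be sketched in a few lines of Python. This is not Sumpter's model: it assumes that some trained model has already estimated the team's chance of creating a goal-scoring opportunity before and after each action, and it only shows how those estimates could be turned into points. The player names and probabilities are made up for illustration.

```python
from collections import defaultdict


def score_players(actions):
    """Credit each player with the change in chance their actions produced.

    actions: iterable of (player, chance_before, chance_after) tuples, where
    both chances are probabilities supplied by some external model (assumed).
    """
    points = defaultdict(float)
    for player, chance_before, chance_after in actions:
        points[player] += chance_after - chance_before
    return dict(points)


# Hypothetical actions: a good pass, a lost ball, and a dribble into the box.
actions = [
    ("Player A", 0.02, 0.10),
    ("Player B", 0.10, 0.05),
    ("Player A", 0.05, 0.30),
]
points = score_players(actions)
```

An action that lowers the team's chances earns negative points, so a player who often loses the ball in dangerous areas is penalized in the same currency that rewards a good pass.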

Predicting expected goals
Winning a soccer game is closely tied to scoring goals, which are rare events. Studies have shown that a match produces about 2.5 goals on average. It is therefore valuable to study how goals are scored and how to predict them.

xG (expected goals) is a predictive model that measures the probability that a shot will result in a goal, based on factors such as the distance from which the shot was taken, the angle of the shot with respect to the goal line, the current game score, and other game statistics.
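
As a rough illustration of how such a model turns shot features into a probability, the Python sketch below uses a logistic function of distance and angle. The coefficients are invented for illustration and are not fitted to any data. The angle used here is the opening between the two goalposts as seen from the shot location, which is one common choice and an assumption on my part. A real xG model would be trained on thousands of recorded shots and would include more factors.

```python
import math

GOAL_WIDTH = 7.32  # metres, the standard width of a soccer goal

# Hypothetical coefficients, chosen only so that closer and more central
# shots score higher. They are NOT fitted to real match data.
INTERCEPT = -1.0
DISTANCE_WEIGHT = -0.1   # per metre from the centre of the goal
ANGLE_WEIGHT = 1.5       # per radian of visible goal mouth


def shot_features(x, y):
    """Return (distance, angle) for a shot.

    x: distance from the goal line in metres (must be positive).
    y: sideways offset from the centre of the goal in metres.
    """
    distance = math.hypot(x, y)
    # Angle between the lines from the shot location to each goalpost.
    angle = math.atan2(y + GOAL_WIDTH / 2, x) - math.atan2(y - GOAL_WIDTH / 2, x)
    return distance, angle


def expected_goal(x, y):
    """Estimate the probability of scoring with a logistic model."""
    distance, angle = shot_features(x, y)
    z = INTERCEPT + DISTANCE_WEIGHT * distance + ANGLE_WEIGHT * angle
    return 1.0 / (1.0 + math.exp(-z))


# Compare a central shot from 11 metres with one from 30 metres.
close_shot = expected_goal(11, 0)
long_shot = expected_goal(30, 0)
```

With these made-up weights, the closer shot gets the higher probability, which matches the intuition that distance and a narrow angle both make scoring harder.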

Image courtesy of Ian Dragulet

Changing how teams strategize
With a player scoring model, we can evaluate players’ abilities in both attack and defense, which provides an important reference for scouting and developing a team. And with the results of goal prediction, a team can design its tactics to increase its chances of scoring and, in turn, win more games.

With the widespread application of data science, not only are soccer teams and clubs reforming the ways they train, play, and scout, but fans can also enjoy new ways to watch and talk about soccer games and players. There is also a noticeable need for data scientists and data analysts at soccer clubs, which creates many job opportunities. In all these ways, data science is helping soccer become more interesting, more competitive, and more appealing.


  1. David Sumpter, “Evaluating actions in football using machine learning,” Medium, May 29, 2021, accessed October 14, 2021, https://soccermatics.medium.com/evaluating-actions-in-football-using-machine-learning-69517e376e0c.
  2. Ian Dragulet, “An Exploration of Expected Goals,” Towards Data Science, January 1, 2021, accessed October 14, 2021, https://towardsdatascience.com/a-guide-to-expected-goals-63925ee71064.
  3. Justin Harper, “Data experts are becoming football’s best signings,” BBC News, March 5, 2021, accessed October 14, 2021, https://www.bbc.com/news/business-56164159.

Connect with Jelly.

Edited by Alison Huh.


