Metrics Used in Sports Data Analytics

Swetank Pathak
Published in Analytics Vidhya
6 min read · Jun 19, 2021


Are You Prepared to Meet The Next Level of Competition?

One way to demystify sports data is to quantify it through key performance indicators, which can be grouped into sports metrics and performance metrics. These differ from sport to sport and from body to body. Whenever a player takes part in training or competition, a large amount of data is generated in two forms: human body performance data and sport-specific data.

“The objective of a sports science data scientist is not only to track match-day metrics but also to monitor the bodily changes that may affect performance.”

For human performance, the metrics fall into two groups:

Intrinsic Data: The human body generates a huge amount of data every second, whether at rest or in motion. In the context of sport, this covers exercise physiology, physical therapy, nutrition, anthropometry, strength and conditioning, biomechanics, and psychology. Examples include:

Hydration Level

Heart Rate

Rating of Perceived Exertion (RPE)

Heart Rate Variability

Excess Post-Exercise Oxygen Consumption (EPOC)

Anxiety Level

Lactate Threshold

Strength

Power

Aerobic Capacity and many more

Extrinsic Data: Every athlete interacts with the environment and with the living and non-living things around them. In sport, playing conditions (whether the event is indoors or outdoors) have an impact, along with confounding factors such as kit and opposition.

Human Performance Data Set
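As a small illustration of how intrinsic metrics such as heart rate and heart rate variability are derived from raw sensor data, the sketch below computes mean heart rate and RMSSD (a common time-domain HRV measure) from beat-to-beat intervals. The interval values are made up, and choosing RMSSD is an assumption, since the article does not name a specific HRV measure.

```python
from math import sqrt

# Hypothetical RR intervals (time between heartbeats) in milliseconds.
rr_ms = [812, 790, 845, 830, 801, 822]

def rmssd(intervals):
    """Root mean square of successive differences between RR intervals."""
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

hrv = rmssd(rr_ms)
# Mean heart rate in beats per minute: 60,000 ms divided by the mean interval.
mean_hr = 60000 / (sum(rr_ms) / len(rr_ms))

print(f"RMSSD: {hrv:.1f} ms, mean heart rate: {mean_hr:.1f} bpm")
```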

For sport-specific metrics, there is a constant effort to develop better measures of player ability, but almost no effort has been made to quantify and compare the metrics that already exist.

Whenever we develop metrics for a machine learning model, we should consider three criteria:

Constancy: does the metric measure the same thing over time?

Discrimination: does the metric differentiate between players?

Objectivity: does the metric provide new information?

Discrimination: To be useful, a metric that assesses player skill must distinguish between different athletes. This means that most of the variance between players should be due to true differences in playing ability rather than chance or noise from limited sample sizes.
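One way to make discrimination concrete is to compare the sampling noise in a metric with its spread across players. The sketch below is a minimal illustration for a shooting percentage. The player names, the shot counts, and the formula (one minus average sampling variance over between-player variance) are assumptions chosen for illustration, not a method specified in this article.

```python
from statistics import mean, pvariance

# Hypothetical season data: player -> (shots made, shots attempted).
shots = {
    "player_a": (180, 400),
    "player_b": (150, 420),
    "player_c": (90, 300),
    "player_d": (200, 410),
}

def discrimination(data):
    """Share of between-player variance that is not sampling noise."""
    rates = [made / att for made, att in data.values()]
    # Binomial sampling variance of each observed rate: p * (1 - p) / n.
    noise = [(m / a) * (1 - m / a) / a for m, a in data.values()]
    return 1 - mean(noise) / pvariance(rates)

score = discrimination(shots)
print(f"discrimination: {score:.3f}")
```

A value close to 1 suggests that differences between players dominate the noise; a value near 0 suggests the metric mostly reflects chance.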

Constancy: In addition to discrimination, which describes variation within a single season, it is vital to consider how much a particular player's metric changes from season to season. The idea of consistency is especially important in sports management when it comes to future deals. After removing chance variability, we use constancy to describe how much we expect a single player's metric to change over time. This criterion measures a metric's sensitivity to changes in circumstances or in a player's intrinsic ability over time.
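A rough way to see constancy in numbers is to compare how much a metric moves within each player across seasons with how much it varies overall. The sketch below uses invented values, and for brevity it skips the removal of chance variability described above, so treat it as a simplified illustration rather than the full calculation.

```python
from statistics import mean, pvariance

# Hypothetical per-season values of one metric (e.g. rebounds per game).
seasons = {
    "player_a": [10.1, 9.8, 10.4],
    "player_b": [4.2, 4.9, 4.5],
    "player_c": [7.0, 6.1, 7.6],
}

def constancy(history):
    """1 - (average within-player variance) / (variance of all observations)."""
    within = mean([pvariance(values) for values in history.values()])
    total = pvariance([v for values in history.values() for v in values])
    return 1 - within / total

score = constancy(seasons)
print(f"constancy: {score:.3f}")
```

A score near 1 means each player's value barely moves from season to season relative to the differences between players.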

Objectivity: We should not treat several metrics that quantify related facets of a player's skill as separate pieces of information. This is particularly critical for sports management decision-makers who depend on these measurements. Accurate assessments of player skill are possible only when the available data are properly synthesized.
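A simple first check for overlapping metrics is to look at pairwise correlations and flag pairs that move almost in lockstep. The sketch below assumes invented per-player values and an arbitrary cut-off of 0.9; both are illustrative choices, and correlation is only one of several ways to test whether a metric adds new information.

```python
from itertools import combinations
from math import sqrt

# Hypothetical per-player values (same player order in every list).
metrics = {
    "points": [25.0, 18.0, 12.0, 30.0, 8.0],
    "field_goal_attempts": [20.0, 15.0, 10.0, 24.0, 7.0],
    "blocks": [0.4, 2.1, 0.3, 0.9, 1.8],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

THRESHOLD = 0.9  # assumed cut-off for "carries little new information"

redundant = [
    (m1, m2)
    for m1, m2 in combinations(metrics, 2)
    if abs(pearson(metrics[m1], metrics[m2])) > THRESHOLD
]
print(redundant)
```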

Let’s talk about a few sports in brief:

Analysis in Basketball

For simple box score statistics, discrimination and stability scores match intuition. The least discriminative and least stable of these statistics is raw three-point percentage; empirical Bayes estimates of three-point ability improve both stability and discrimination. Metrics like rebounds, blocks, and assists are strong indicators of player position and, for this reason, are highly discriminative and stable. Per-minute or per-game statistics are generally more stable but less discriminative. Alongside match statistics, video analytics powered by artificial intelligence has had a great impact on player development.
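The Bayesian adjustment of three-point percentage mentioned above can be sketched as a beta-binomial shrinkage: each player's raw rate is pulled toward the league average, more strongly when the player has few attempts. The player labels and counts are invented, and fitting the prior by the method of moments on raw rates is a simplifying assumption, not necessarily the procedure behind the results quoted here.

```python
from statistics import mean, pvariance

# Hypothetical (made, attempted) three-pointers per player.
players = {
    "high_volume": (160, 400),
    "mid_volume": (70, 200),
    "low_volume": (6, 10),
    "bench": (1, 8),
}

raw = [made / att for made, att in players.values()]
mu, var = mean(raw), pvariance(raw)

# Method-of-moments fit of a Beta(alpha, beta) prior to the raw rates.
common = mu * (1 - mu) / var - 1
alpha, beta = mu * common, (1 - mu) * common

# Posterior mean for each player: (made + alpha) / (attempted + alpha + beta).
estimates = {
    name: (made + alpha) / (att + alpha + beta)
    for name, (made, att) in players.items()
}

for name, (made, att) in players.items():
    print(f"{name}: raw={made / att:.3f} shrunk={estimates[name]:.3f}")
```

Players with many attempts keep an estimate close to their raw percentage, while small samples are moved noticeably toward the group mean.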

An Overview of Basketball Statistics Sheet

Analysis in Football

In football analytics, quantitative analysis covers possession rate, expected goals, the success of the high press, the correlation between events and results, the number of shots on target, pass rate, and much more. Qualitative analysis, by contrast, covers player positioning and whether the outcome of the game was positive or negative. Semi-quantitative analysis is used to assess the likelihood of counter-attacks, loss of the ball, and set pieces. Video analysis also plays a significant role during technical and tactical sessions.
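Several of the quantitative measures listed above reduce to simple counts and ratios over an event log. The sketch below assumes a made-up log format of (event type, success flag) and made-up possession times; real tracking data is richer, and "pass rate" is interpreted here as pass completion rate.

```python
# Hypothetical event log for one team: (event type, was it successful?).
# For shots, "successful" is taken to mean "on target".
events = [
    ("pass", True), ("pass", True), ("pass", False), ("shot", True),
    ("pass", True), ("shot", False), ("pass", True), ("shot", True),
]

passes = [ok for kind, ok in events if kind == "pass"]
shots = [ok for kind, ok in events if kind == "shot"]

pass_completion = sum(passes) / len(passes)  # completed passes / all passes
shots_on_target = sum(shots)                 # count of on-target shots

# Possession as the team's share of total time on the ball (seconds, assumed).
team_seconds, opponent_seconds = 1620, 1080
possession = team_seconds / (team_seconds + opponent_seconds)

print(f"pass completion: {pass_completion:.0%}")
print(f"shots on target: {shots_on_target}")
print(f"possession: {possession:.0%}")
```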

An Overview of Football Statistics Sheet

Analysis in Cricket

In cricket, data analytics and modelling depend heavily on how the data are represented and on the model, and their sophistication grows with the type of predictive questions posed during research. Counterfactual questions about realistic play, such as what would have happened if the batsman had struck the ball at a different angle, are much more demanding in terms of calculation and data comparison. Machine learning helps to predict the runs of an individual player as well as of a team, and whether a player will score a fifty or a hundred. Descriptive statistics, meanwhile, give insight into measures such as the averages and strike rates of batsmen and bowlers. A vast range of variables can be tracked, including the number of players on the field, their characteristics, the ball, and a variety of possible actions.
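The descriptive statistics mentioned above are straightforward to compute. The sketch below uses the standard definitions (batting average is runs per dismissal, batting strike rate is runs per 100 balls faced) on a hypothetical list of innings; the tuple layout is an assumption for illustration.

```python
# Hypothetical innings for one batsman: (runs scored, balls faced, dismissed?).
innings = [
    (54, 60, True),
    (12, 20, True),
    (101, 88, False),  # not out
    (33, 32, True),
]

runs = sum(r for r, _, _ in innings)
balls = sum(b for _, b, _ in innings)
dismissals = sum(1 for _, _, out in innings if out)

batting_average = runs / dismissals  # runs per dismissal
strike_rate = 100 * runs / balls     # runs per 100 balls faced

fifties = sum(1 for r, _, _ in innings if 50 <= r < 100)
hundreds = sum(1 for r, _, _ in innings if r >= 100)

print(f"average: {batting_average:.2f}, strike rate: {strike_rate:.1f}")
print(f"fifties: {fifties}, hundreds: {hundreds}")
```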

An Overview of Cricket Statistics Sheet

Principal component analysis and Sports

Across the three criteria of constancy, discrimination, and objectivity, every sport produces a flood of parameters, and we cannot blindly exclude some and keep others. In basketball, for example, simple box score statistics and sensor-based technology (such as Garmin, Catapult, or Equivital) provide human performance data such as heart rate, EPOC, maximum heart rate, and recovery heart rate, along with sport-specific metrics such as minutes played, field goal attempts, three-point attempts, two-point attempts, personal fouls, points, offensive rebounds, defensive rebounds, turnover percentage, value over replacement, and many more. Principal component analysis is a method for reducing the dimensionality of such data sets, increasing interpretability with minimal information loss.
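A minimal sketch of principal component analysis on a player-by-metric table is shown below, using NumPy's singular value decomposition. The data are simulated (six made-up metrics driven by two hidden factors), so the numbers carry no sporting meaning; the point is only the sequence of steps: standardise, decompose, then keep the leading components.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_metrics = 50, 6

# Simulated data: two hidden factors generate six correlated metrics plus noise.
latent = rng.normal(size=(n_players, 2))
loadings = rng.normal(size=(2, n_metrics))
X = latent @ loadings + 0.1 * rng.normal(size=(n_players, n_metrics))

# Standardise each metric so that differing units do not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / np.sum(S**2)  # share of variance per component

# Project every player onto the first two components.
scores = Z @ Vt[:2].T

print("explained variance ratio:", np.round(explained, 3))
print("reduced shape:", scores.shape)
```

In practice the number of components to keep is a judgment call, often guided by the cumulative explained variance.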

A simplistic linear model formula:

The metrics for a sport can be arranged as a three-dimensional array of seasons, players, and metrics, where s stands for season, p for player, and m for metric. A linear model for metric m of player p in season s can be stated as:

$$Y_{spm} = \mu_m + X_{sm} + X_{pm} + X_{spm} + e_{spm}$$

Where:

$\mu_m$: overall mean of metric m

$X_{sm}$: random effect of season on the metric (mean 0, with its own variance)

$X_{pm}$: random effect of player on the metric (mean 0, with its own variance)

$X_{spm}$: random effect of the season-player interaction on the metric (mean 0, with its own variance)

$e_{spm}$: variation induced by sampling

Here $Y_{spm}$ denotes the value of metric m for player p in season s; the sum without the sampling term $e_{spm}$ can be read as the true value of the skill.
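To see how the terms of this model combine, the sketch below simulates one metric for many players over several seasons. All standard deviations and sizes are arbitrary assumptions, and Y is treated here as the observed value, with the sum excluding the sampling term taken as the underlying skill.

```python
import numpy as np

rng = np.random.default_rng(42)
n_seasons, n_players = 5, 200

mu_m = 10.0  # overall mean of the metric (assumed)
# Assumed standard deviations of each random effect and of the sampling noise.
sd_season, sd_player, sd_interaction, sd_noise = 0.5, 2.0, 1.0, 0.8

X_s = rng.normal(0, sd_season, size=(n_seasons, 1))               # season effect
X_p = rng.normal(0, sd_player, size=(1, n_players))               # player effect
X_sp = rng.normal(0, sd_interaction, size=(n_seasons, n_players)) # interaction
e = rng.normal(0, sd_noise, size=(n_seasons, n_players))          # sampling noise

skill = mu_m + X_s + X_p + X_sp  # underlying skill per season and player
Y = skill + e                    # observed metric

# Within a season the season effect is constant, so the spread across players
# comes from the player, interaction, and noise terms.
within_season_var = Y.var(axis=1).mean()
expected = sd_player**2 + sd_interaction**2 + sd_noise**2

print(f"within-season variance: {within_season_var:.2f} (theory: {expected:.2f})")
```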

“Real-time data analytics can assist in extracting information even after a match, enabling the team and related companies to change strategies for economic advantage and growth.”

