The Number Games — How Machine Learning is Changing Sports

Nabeel Abdul Latheef
Jul 21, 2017

Elite sport is now awash with data. As athletes and management look to gain every competitive advantage they can, they are gathering information about all aspects of individual and team performance in both training and match play, as well as a raft of other metrics.

The confidence often needed to get coaches to the pinnacle of their field means that some are still reluctant to cede ground to algorithms and machines. But inherent prejudices and the fallibility of human memory make the brain an inefficient tool for processing complex information, especially in the time available during a game. This is especially true in team sports, where coaches must monitor a number of players at once.

Machine Learning can be applied to sports in a range of ways, with data now accessible about almost anything.

What’s Sports Analytics?

Sports analytics is the set of processes that identify and acquire knowledge and insight about players’ potential performance from a variety of data sources, such as game data and individual player performance data. These advanced analytics should be able to extract valuable, actionable insights for coaches and managers to use.

Sports analytics can be utilized in various domains including:

  • Predicting the outcome of a game
  • Predicting the performances of teams or individual players
  • Building new strategies for upcoming competitions
  • Deciding the price of a player if a club were to loan, sell, or buy them
  • Connecting players to brands and sponsors

Of course, not all teams use analytical tools. In addition to the costs involved, there’s also the problem of explaining complex analytical methods to coaches in ways they can understand.

In soccer, more and more of the right data is becoming available thanks to wearable technology and RFID tags worn by players.

S.L. Benfica — Portugal’s top football team — makes as much money from carefully nurturing, training, and selling players as actually playing football. Football teams have always sold and traded players, of course, but Benfica has turned it into an art form: buying young talent; using advanced technology, data science, and training to improve their health and performance; and then selling them for tens of millions of pounds — sometimes as much as 10 or 20 times the original fee.

With machine learning and predictive analytics running on Microsoft Azure, combined with Benfica’s expert data scientists and the learned experience of the trainers, each player receives a personalized training regime where weaknesses are ironed out, strengths enhanced, and the chance of injury significantly reduced.

Sample Data collected for Player Performance Indexing

Data such as players’ vital stats and movements, collected in training and in play on game day, is being analyzed to enhance player performance and match strategy. Coaches now have access to information such as how fast players are running, the distance they are covering in a game, and their levels of fatigue or dehydration, both in real time and after a match.
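As a rough illustration of how two of these metrics could be derived, the sketch below computes distance covered and top speed from timestamped pitch positions. The sample format (seconds, x and y in metres) and the function name are assumptions made for this example, not the output of any particular tracking system, and the positions are made up.

```python
import math

def distance_and_top_speed(samples):
    """Return (total distance in metres, top speed in m/s).

    `samples` is a list of (t_seconds, x_metres, y_metres) tuples sorted by
    time. This format is an assumption made for the sketch.
    """
    total = 0.0
    top_speed = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        step = math.hypot(x1 - x0, y1 - y0)  # straight-line distance between samples
        total += step
        top_speed = max(top_speed, step / dt)
    return total, top_speed

# Made-up positions, for illustration only.
track = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 10.0), (3.0, 3.0, 10.0)]
total_m, top_mps = distance_and_top_speed(track)
print(total_m, top_mps)  # 11.0 6.0
```

Real tracking data is noisier and sampled far more often, so a production version would likely smooth the positions before measuring speed.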

This information enables coaches to identify weaker or stronger players and assess their physical state, and it supports decisions about whom to replace during a match or whom to keep on the bench. And by studying patterns of play and player movements, coaches can reconfigure play strategy to make use of each player’s strengths and offset their weaknesses to improve overall team performance. Over time, coaches can study the impact of data-driven decisions and strategies on overall player and team performance by analyzing the change in player data.

In cricket, machine learning algorithms can be used to identify complex yet meaningful patterns in the data, which then allow us to predict or classify future instances or events.

One welcome change would be to revamp the current Duckworth-Lewis (D/L) method with ML, since the method can be unfair to teams that score big in crucial matches.

We can use data from the first innings, such as the number of deliveries bowled, wickets left, runs scored per delivery faced, and the partnership for the last wicket, and compare that against total runs scored. Machine learning techniques such as SVMs, neural networks, and random forests can be used to build a model from the historical first-innings data, taking into account the teams playing the match. The same model can then be used to predict the outcome of a second innings that is interrupted by rain. This should give a more accurate prediction than the D/L method, as it draws on a lot of historical data and all relevant variables.
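To make the idea concrete, here is a minimal sketch of projecting a final total from the state of an interrupted innings. To stay dependency-free it uses a simple nearest-neighbour average as a stand-in for the SVM, neural network, or random forest models mentioned above. The feature set, the scaling constants, and every number in HISTORY are hypothetical and made up for illustration.

```python
import math

# Hypothetical snapshots: (balls_bowled, wickets_left, runs_so_far) -> final total.
# All values are made up; a real model would learn from many completed innings.
HISTORY = [
    ((120, 8, 100), 280),
    ((120, 6, 90), 240),
    ((180, 7, 150), 270),
    ((180, 4, 130), 220),
    ((240, 5, 210), 275),
    ((240, 2, 180), 215),
]

def predict_final_total(state, history=HISTORY, k=2):
    """Average the final totals of the k most similar historical states."""
    # Crude scaling so balls, wickets, and runs contribute comparably (an assumption).
    scales = (300.0, 10.0, 300.0)

    def dist(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

    nearest = sorted(history, key=lambda row: dist(row[0], state))[:k]
    return sum(total for _, total in nearest) / len(nearest)

# Rain stops play after 180 balls, with 6 wickets left and 140 runs scored.
projected = predict_final_total((180, 6, 140))
print(projected)  # 245.0 with the made-up history above
```

With real data, the same interface could wrap a trained regression model; the nearest-neighbour version is only meant to show the shape of the inputs and output.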

Another application is WASP (Winning and Score Predictor), which uses machine learning techniques to predict the final score in the first innings and to estimate the chasing team’s probability of winning in the second innings. However, it has been used in very few tournaments so far. WASP was created by Scott Brooker as part of his Ph.D. research, along with his supervisor Seamus Hogan, at the University of Canterbury. New Zealand’s Sky TV first introduced WASP during its coverage of domestic limited-overs cricket. The models are based on a database of all non-shortened ODI and 20-20 games played between top-eight countries since late 2006 (slightly further back for 20-20 games).

The first-innings model estimates the additional runs likely to be scored as a function of the number of balls and wickets remaining. The second innings model estimates the probability of winning as a function of balls and wickets remaining, runs scored to date, and the target score.

Let V(b,w) be the expected additional runs for the rest of the innings when b (legitimate) balls have been bowled and w wickets have been lost, and let r(b,w) and p(b,w) be, respectively, the estimated expected runs and the probability of a wicket on the next ball in that situation. Then:

V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1 - p(b,w)) V(b+1,w)
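The recursion can be evaluated by working backwards from the end of the innings, where no further runs are possible. The sketch below does this for a 50-over innings. The functions runs_per_ball and wicket_prob are hypothetical placeholders for r(b,w) and p(b,w), which in practice would be estimated from historical data.

```python
def expected_additional_runs(r, p, total_balls=300, max_wickets=10):
    """Fill a table V[b][w] using V = r + p*V(b+1,w+1) + (1-p)*V(b+1,w)."""
    # Boundary cells stay at 0: no balls left, or all wickets lost.
    V = [[0.0] * (max_wickets + 1) for _ in range(total_balls + 1)]
    for b in range(total_balls - 1, -1, -1):
        for w in range(max_wickets - 1, -1, -1):
            pw = p(b, w)
            V[b][w] = r(b, w) + pw * V[b + 1][w + 1] + (1 - pw) * V[b + 1][w]
    return V

def runs_per_ball(b, w):
    """Hypothetical r(b,w): scoring speeds up late and slows as wickets fall."""
    return 0.8 + 0.6 * (b / 300) - 0.04 * w

def wicket_prob(b, w):
    """Hypothetical p(b,w): risk of dismissal rises as the innings goes on."""
    return 0.02 + 0.03 * (b / 300)

table = expected_additional_runs(runs_per_ball, wicket_prob)
print(round(table[0][0], 1))    # projected total at the start of the innings
print(round(table[180][4], 1))  # expected extra runs after 30 overs, 4 wickets down
```

Adding the runs already scored to table[b][w] gives a projected first-innings total for any match situation.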

Factors like the history of games at that venue and conditions on the day (pitch, weather, etc.) are considered, and scoring rates and probabilities of dismissal are used to make the predictions.
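The second-innings model can be sketched in the same recursive style as the first-innings equation, this time tracking runs scored against the target. The code below is one possible formulation under simplifying assumptions, not WASP’s actual model: each ball is either a wicket or a scoring outcome drawn from a fixed distribution, ties and extras are ignored, and the numbers in RUN_PROBS are made up.

```python
def chase_win_probability(target, p_wicket, run_probs, total_balls=120, max_wickets=10):
    """Return W[b][w][s]: chance of reaching `target` with b balls bowled,
    w wickets lost, and s runs scored (only s < target is stored)."""
    # Cells for "no balls left" or "all out" stay at 0.0 (chase failed).
    W = [[[0.0] * target for _ in range(max_wickets + 1)]
         for _ in range(total_balls + 1)]
    for b in range(total_balls - 1, -1, -1):
        for w in range(max_wickets - 1, -1, -1):
            pw = p_wicket(b, w)
            for s in range(target):
                survive = 0.0
                for runs, q in run_probs.items():
                    nxt = s + runs
                    # Reaching the target counts as a win with certainty.
                    survive += q * (1.0 if nxt >= target else W[b + 1][w][nxt])
                W[b][w][s] = pw * W[b + 1][w + 1][s] + (1 - pw) * survive
    return W

# Made-up distribution of runs on a ball that does not take a wicket.
RUN_PROBS = {0: 0.35, 1: 0.35, 2: 0.10, 4: 0.15, 6: 0.05}

# Hypothetical scenario: 12 runs needed from the final 6 balls, no wickets lost.
W = chase_win_probability(12, lambda b, w: 0.05, RUN_PROBS, total_balls=6)
last_over = W[0][0][0]
print(round(last_over, 3))
```

A fuller version would let both the wicket probability and the run distribution vary with the match situation, just as r(b,w) and p(b,w) do in the first innings.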

Machine Learning vs Human Interpretation

Improvements in technology and machine learning continue to progress the field towards artificial intelligence and real-time use in sport. But is it possible that artificial intelligence will ever replace the coach/manager?

Well, in some ways, it already has. Many elite sporting clubs already set specific thresholds for athletes during training. These are based on perceived reductions in performance or increases in injury risk if a threshold is exceeded.

The judgement on what is appropriate treatment of the athlete is made solely by a computer-based analysis of data collected in the field. For the moment, at least, the decision on whether to act on this information remains with the coach/manager. However, that may change in the future!
