Predicting the Outcome of the Premier League using Neural Networks

Tyler Marchiano
Bucknell AI & CogSci
Dec 22, 2019

By: Said Mrad, Vince Minnocci, Tyler Marchiano

Introduction

Almost every year, the most-watched television program in the United States is the Super Bowl[1]. Worldwide, the most-watched television program is the Olympics[2]. People love watching sporting events, and sports are an integral part of many lives: people bet on the games, cry over the games, and spend lots of money to attend them. Now imagine that you are sitting at home watching a game, but you already know who is going to win. That is the goal of our Midterm Project.

For our Midterm Project, we decided to build an artificial intelligence system that could predict the results of sporting events. Specifically, we trained a neural network on statistics from the previous Premier League season and used it to predict the next season. We want our system to predict both the outcome of each head-to-head game during the season and the overall standings at the end of the season. This project is important for several reasons. First, sports betting has become more and more popular, and a reliable predictor could change the way sports betting is done. Second, an AI that can predict the outcome of soccer games could help professional teams: managers and coaches could look at the data and work out what their team needs to change to beat individual opponents. Finally, prediction software like this doesn't have to be used only for sports; similar approaches could be applied to other forecasting problems such as traffic patterns and weather.

Ethical Dilemmas

There are also ethical dilemmas to consider when creating an artificial intelligence system that predicts the outcome of sporting events. The first is the use of the system in sports betting. Using an AI predictor to make money can be considered cheating and unfair, and if many people used such systems, there is a good chance that casinos and "the house" would start losing money because more bettors would be winning. The second is how prediction software could affect the game itself. Will fans continue to watch and stay involved in the sport if they can accurately predict the outcome before the game takes place? Knowing what is going to happen before it happens could make the game boring. The final dilemma is how this would affect jobs. If an AI system can predict the outcome of soccer games correctly, will there be any need for sports experts and analysts, or will those jobs become obsolete? We need to consider all of these questions while creating this AI system.

Our Implementation

To implement our project, we decided to use a neural network with 6 inputs, 4 hidden layers (with 32, 16, 8, and 4 nodes respectively), and 2 outputs. This can also be expressed as a 6->32->16->8->4->2 neural network.

Figure 1: Our 6->32->16->8->4->2 Neural Network
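To make the shape of the network concrete, here is a minimal sketch of a 6->32->16->8->4->2 forward pass in plain Python. It is not our training code: the weights are random, the helper names (`init_network`, `forward`, `count_parameters`) and the example input values are hypothetical, and using ReLU on the hidden layers with a linear output layer is an assumption made for illustration.

```python
import random

# Layer sizes taken from the post: 6 inputs, four hidden layers, 2 outputs.
LAYER_SIZES = [6, 32, 16, 8, 4, 2]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_network(sizes, seed=0):
    """Build one (weights, biases) pair per pair of consecutive layer sizes."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def relu(x):
    return x if x > 0.0 else 0.0


def forward(network, inputs):
    """Propagate one input vector through the network."""
    activations = list(inputs)
    last = len(network) - 1
    for i, (weights, biases) in enumerate(network):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        # Assumption: ReLU on hidden layers, linear activation on the output layer.
        activations = z if i == last else [relu(v) for v in z]
    return activations


def count_parameters(network):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)


network = init_network(LAYER_SIZES)

# Hypothetical input: (team id, attack strength, defence strength) for home, then away.
features = [1, 0.25, -0.10, 2, -0.05, 0.15]
home_goals, away_goals = forward(network, features)

print(count_parameters(network))  # 934
```

The parameter count follows from the layer sizes: (6·32+32) + (32·16+16) + (16·8+8) + (8·4+4) + (4·2+2) = 934.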

We decided to use ReLU as the activation function for our neural network. The 6 inputs were three values for each of the home team and the away team: team name (represented by a number in the range 1–20), relative attacking strength, and relative defending strength. We calculated a team's relative attacking strength as (team's average goals scored per game - league average goals scored per game)/league average goals scored per game. We calculated a team's relative defending strength as (league average goals conceded per game - team's average goals conceded per game)/league average goals conceded per game. Both values are positive for a team that is better than the league average. The two outputs were the predicted goals for the home team and the predicted goals for the away team. We then ran the network on every game of the season and kept track of the outcomes: a win earned a team 3 points, a tie earned each team 1 point, and a loss earned 0 points. This gave us our predicted standings for the 2019/2020 Premier League season.
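The feature formulas and the points tally can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than our exact code: the function names, the team labels, and the example numbers are all hypothetical, and rounding the predicted goals to the nearest integer before comparing them is an assumption about how a real-valued prediction becomes a win, tie, or loss.

```python
def attacking_strength(team_scored_per_game, league_scored_per_game):
    """(team average goals scored - league average) / league average."""
    return (team_scored_per_game - league_scored_per_game) / league_scored_per_game


def defending_strength(team_conceded_per_game, league_conceded_per_game):
    """(league average goals conceded - team average) / league average."""
    return (league_conceded_per_game - team_conceded_per_game) / league_conceded_per_game


def tally_points(predictions):
    """Turn predicted scorelines into a table sorted by points.

    predictions: iterable of (home_team, away_team, home_goals, away_goals).
    Assumption: predicted goals are rounded to whole goals before comparison.
    """
    points = {}
    for home, away, home_goals, away_goals in predictions:
        points.setdefault(home, 0)
        points.setdefault(away, 0)
        hg, ag = round(home_goals), round(away_goals)
        if hg > ag:
            points[home] += 3   # home win
        elif hg < ag:
            points[away] += 3   # away win
        else:
            points[home] += 1   # tie: one point each
            points[away] += 1
    # Highest points first; ties in points broken alphabetically for determinism.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical team: scores 1.8 and concedes 1.2 per game in a league averaging 1.5.
attack = attacking_strength(1.8, 1.5)    # about 0.2 (20% above league average)
defence = defending_strength(1.2, 1.5)   # about 0.2 (concedes 20% fewer goals)

# Hypothetical predicted scorelines for three teams.
predicted = [
    ("A", "B", 2.1, 0.8),   # rounds to 2-1, A wins
    ("B", "C", 1.2, 1.4),   # rounds to 1-1, tie
    ("C", "A", 0.3, 1.7),   # rounds to 0-2, A wins
]
table = tally_points(predicted)
print(table)  # [('A', 6), ('B', 1), ('C', 1)]
```

A real league table would break ties on goal difference and goals scored; the alphabetical tie-break here only keeps the sketch deterministic.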

Alan McCabe and Jarrod Trevathan are two computer scientists who also tried to create an AI system that predicted the results of sporting events. They built a neural network with 20 inputs, 1 hidden layer (with 10 nodes), and 1 output node (20->10->1). Their network was a multi-layer perceptron trained with backpropagation. They defined their 20 inputs for each team as follows[4]:

  • Points-for: the total points scored by the team in matches so far this season
  • Points-against: the total points scored against the team in matches so far this season, expressed as a negative number
  • Overall Performance: the team’s performance based on their win/loss record. Two points are awarded for a win (or three points in the case of EPL), one point for a draw and no points for a loss. Performance is then the sum of these values over each round of competition so far.
  • Home Performance and Away Performance: the cumulative performance value calculated using only home games and only away games respectively. The Overall Performance feature can hide specific details such as home-ground performance, for example if the team has a 90% success rate at their home ground and a 10% success rate away from home, then an overall success rate of 50% hides some important information when trying to predict a winner.
  • Performance in Previous Game: the team’s performance in their most recent game. In the first round this value is typically taken from the previous season’s final game.
  • Performance in Previous n Games: the average of the performance for the most recent n games. Up to five previous games were considered in the feature set. This is an attempt to gauge the recent form and take into account whether the team is on a winning or losing streak.
  • Team Ranking: the team’s position on the league ladder based on a list of the teams, sorted by their overall Performance value. This feature’s use is obvious as, with all other things being equal, a team with a high ranking is expected to defeat a team with a low ranking.
  • Points-for in Previous n Games: the average of the points scored by the team in the most recent n games. Values for n of one to five were used as five separate features.
  • Points-against in Previous n Games: the average of the points scored against the team (expressed as a negative number) in the most recent n games. Values for n of one to five were used in the feature set.
  • Location: a value indicating whether the current game is played at the team’s home venue or elsewhere. The value 1 is taken for a home game, and 0 for an away game.
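A few of these features can be sketched in Python to show how they are derived from a team's match history. This is only an illustration of the definitions quoted above, not McCabe and Trevathan's implementation: the function names and the example results are hypothetical, only some of the 20 inputs are computed, and returning 0.0 when a team has no matches yet is an assumption (the paper instead falls back on the previous season).

```python
def match_performance(goals_for, goals_against, win_points=3):
    """Performance for one match: 3 for a win (EPL), 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return win_points
    if goals_for == goals_against:
        return 1
    return 0


def average_of_last(values, n):
    """Average of the most recent n values (fewer if the season is young)."""
    recent = values[-n:]
    # Assumption: with no matches played yet, report 0.0.
    return sum(recent) / len(recent) if recent else 0.0


def form_features(matches, max_n=5):
    """Build a subset of the listed features for one team.

    matches: chronological list of (goals_for, goals_against) tuples.
    """
    performance = [match_performance(gf, ga) for gf, ga in matches]
    scored = [gf for gf, _ in matches]
    conceded = [-ga for _, ga in matches]   # expressed as negative numbers

    features = {
        "points_for": sum(scored),
        "points_against": sum(conceded),
        "overall_performance": sum(performance),
    }
    for n in range(1, max_n + 1):
        features[f"performance_last_{n}"] = average_of_last(performance, n)
        features[f"points_for_last_{n}"] = average_of_last(scored, n)
        features[f"points_against_last_{n}"] = average_of_last(conceded, n)
    return features


# Hypothetical season so far: a 2-0 win, a 1-1 draw, a 0-3 loss, a 2-1 win.
features = form_features([(2, 0), (1, 1), (0, 3), (2, 1)])
print(features["overall_performance"])   # 7
print(features["performance_last_2"])    # 1.5
print(features["points_against_last_2"]) # -2.0
```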

This model achieved 58% accuracy[4]. The average human soccer expert predicts results with 60–65% accuracy[4], so their model was a little worse than human experts.

Our project ended up performing worse than both the human experts and McCabe and Trevathan's model. Our model never predicted the exact number of goals a team scored in a game, but it did predict the outcome of 43 out of 110 games correctly, which is roughly 40% accuracy. That is only a little better than the 33% expected from random guessing, given that a soccer game has 3 possible outcomes. So why did our model do so poorly? We have a few ideas about where we could have gone wrong. Soccer is very unpredictable and there are many upsets; we don't think our model accounted for this very well, or even at all. We could have used more inputs to capture more detail, as McCabe and Trevathan did. We could also only check our results against the games that have already been played this season. Because the current season is only about halfway done, we don't know how our algorithm would perform over the entire season; it could do better or even worse in the second half.
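The accuracy figure can be reproduced with a short Python sketch. The 43 and 110 come from our results above; everything else (the function names, the "H"/"D"/"A" labels, and the toy scorelines) is hypothetical and only illustrates how outcome accuracy differs from predicting the exact score.

```python
def outcome(home_goals, away_goals):
    """Classify a scoreline as a home win, draw, or away win."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def outcome_accuracy(predicted, actual):
    """Fraction of games where the predicted outcome matches the real one."""
    hits = sum(outcome(*p) == outcome(*a) for p, a in zip(predicted, actual))
    return hits / len(actual)


# Toy example with made-up scorelines: outcomes match in games 1 and 3 only,
# even though no exact score is predicted correctly.
predicted = [(2, 1), (1, 1), (0, 2), (3, 0)]
actual = [(1, 0), (2, 0), (1, 3), (1, 1)]
toy_accuracy = outcome_accuracy(predicted, actual)
print(toy_accuracy)  # 0.5

# Figures reported in the post: 43 correct outcomes out of 110 games.
reported_accuracy = 43 / 110
random_baseline = 1 / 3   # uniform guess among win, draw, loss
print(round(reported_accuracy, 3))  # 0.391
print(reported_accuracy > random_baseline)  # True
```

A uniform guess is a weak baseline; always predicting the most common outcome would be a stricter comparison.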

Sources

[1] “Super Bowl LIII Draws 98.2 Million TV Viewers, 32.3 Million Social Media Interactions.” Nielsen, 4 Feb. 2019, https://web.archive.org/web/20190205000209/https://www.nielsen.com/us/en/insights/news/2019/super-bowl-liii-draws-98-2-million-tv-viewers-32-3-million-social-media-interactions.html.

[2] Chase, Chris. “Rankings! The World Cup, Super Bowl and the 11 Biggest Sporting Events in the World.” USA Today, Gannett Satellite Information Network, 13 June 2018, https://touchdownwire.usatoday.com/gallery/world-cup-super-bowl-biggest-sports-events-world-globe-greatest/.

[3] Bunker, Rory P., and Fadi Thabtah. “A Machine Learning Framework for Sport Result Prediction.” Applied Computing and Informatics, Elsevier, 19 Sept. 2017, https://www.sciencedirect.com/science/article/pii/S2210832717301485?via=ihub.

[4] McCabe, Alan, and Jarrod Trevathan. “Artificial Intelligence in Sports Prediction.” IEEE Xplore, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4492661.

[5] “Getting Started with the Keras Functional API.” Guide to the Functional API — Keras Documentation, https://keras.io/getting-started/functional-api-guide/.

[6] “BT Sport: The Heart of Sport.” BT.com, https://sport.bt.com/.
