Estimating Football Game Results with Statistics

Cem Kilicli
Published in Productization, Dec 8, 2018 (5 min read)

What I present here is a simple model for predicting football (soccer) match results. You can use the model as it is to inform your bets, or treat it as a basis and improve it.

I have used data from the 2017 season; the same approach works if this data is replaced or extended with previous seasons.

I selected the English Premier League data from the data set and tried to estimate the upcoming games. To do this, I created a simplified version of the stats and used it to build a model on the Poisson distribution. The main reason for choosing the Poisson distribution is that it suits count data such as goals: it is defined as a discrete frequency distribution which gives the probability of a number of independent events occurring in a fixed time.* I used MS Excel to create the distilled data, and all the calculations for the model were also done in Excel.

I built the model on four variables: home attack strength, away attack strength, home defense strength and away defense strength. Using these variables, I constructed a model on the Poisson distribution. From the model's output I calculated the probability of each possible score and combined these into an overall probability for each result (home win, away win, draw, over 2.5 goals, under 2.5 goals).

Based on these results, I decided which bets to place.

Generating Statistics

From the given data I derived several new variables, which form the basis of the modeling process. All the calculations are built on these variables, so if we change the league or plug in a different data set, the calculations will continue to work.

The idea is to create an attack score and a defense score for both the home and away cases. I start with the full-time goals scored (FTGS) and full-time goals conceded (FTGC) for each team in the Premier League.

Using this data, I calculated the average goals scored as follows.
For Home:
Average Goals Scored = Full-Time Goals Scored (home) / Total Home Games Played
For Away:
Average Goals Scored = Full-Time Goals Scored (away) / Total Away Games Played

The second variable I created is the average goals conceded.
For Home:
Average Goals Conceded = Full-Time Goals Conceded (home) / Total Home Games Played
For Away:
Average Goals Conceded = Full-Time Goals Conceded (away) / Total Away Games Played
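The averages above were calculated in Excel, but the same step can be sketched in Python. This is only an illustration: the `VenueRecord` class, the `average_goals` function and the numbers are hypothetical and do not come from the original data set.

```python
from dataclasses import dataclass


@dataclass
class VenueRecord:
    """Season totals for one team at one venue (home or away)."""
    games_played: int
    goals_scored: int    # full-time goals scored (FTGS)
    goals_conceded: int  # full-time goals conceded (FTGC)


def average_goals(record):
    """Return (average goals scored, average goals conceded) per game."""
    if record.games_played == 0:
        raise ValueError("no games played at this venue")
    scored = record.goals_scored / record.games_played
    conceded = record.goals_conceded / record.games_played
    return scored, conceded


# Hypothetical home record, for illustration only.
home_record = VenueRecord(games_played=10, goals_scored=21, goals_conceded=8)
avg_scored, avg_conceded = average_goals(home_record)
print(avg_scored, avg_conceded)
```

The same function is applied once to the home record and once to the away record of each team.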

The next phase is to create the attack and defense strengths. For this I used a simple idea: how well is the team performing compared with the league average? The formula I use is:

Attack Score = Average Goals Scored / League Average Goal Scored

By this formula, I can say how well the team is performing compared to the rest of the league. As an example;

Attack Score for Arsenal = 2.1 / 1.646, which is approximately 1.28. Looking at this score, we can say that Arsenal is scoring about 28% more than the league average.

Using the same methodology, the defense scores for home and away are created by dividing a team's average goals conceded by the league average of goals conceded.
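As a quick check of the arithmetic, here is a small Python sketch of the strength calculation. The `strength` function name is my own, the defense figures are hypothetical, and applying the same ratio to goals conceded is my reading of "the same methodology".

```python
def strength(team_average, league_average):
    """Ratio of a team's per-game average to the league's per-game average."""
    if league_average <= 0:
        raise ValueError("league average must be positive")
    return team_average / league_average


# Attack score from the Arsenal example in the text.
arsenal_attack = strength(2.1, 1.646)
print(round(arsenal_attack, 3))

# Defense score with hypothetical numbers: average goals conceded
# divided by the league average of goals conceded.
example_defense = strength(0.8, 1.0)
print(example_defense)
```

Note that with this definition a defense score below 1 means the team concedes fewer goals than the league average, so a lower defense score is better.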

Creating the Poisson Model

With the attack and defense scores generated for the home and away cases, I moved on to the next stage: finding the expected goals for each match. To calculate each team's goal expectancy, I used the formulas below.

Here AS is the attack score and DS is the defense score.
For Home Team:
Expected Goals = Home Team AS × Away Team DS × League Average of Home Goals
For Away Team:
Expected Goals = Away Team AS × Home Team DS × League Average of Away Goals
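The two formulas have the same shape, so a single helper covers both. The sketch below is in Python rather than Excel, and every number in it is hypothetical, chosen only to show how the inputs combine.

```python
def expected_goals(attack_score, opponent_defense_score, league_average_goals):
    """Expected goals = own attack score x opponent defense score x league average."""
    return attack_score * opponent_defense_score * league_average_goals


# Hypothetical scores and league averages, for illustration only.
home_expected = expected_goals(
    attack_score=1.2,            # home team's home attack score
    opponent_defense_score=0.9,  # away team's away defense score
    league_average_goals=1.5,    # league average of home goals
)
away_expected = expected_goals(
    attack_score=0.8,            # away team's away attack score
    opponent_defense_score=0.7,  # home team's home defense score
    league_average_goals=1.1,    # league average of away goals
)
print(home_expected, away_expected)
```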

The result for the next match in week 19 of the English Premier League is calculated as follows.

Having found the expected goals for the match, I calculated the probability of every score from 0–0, 1–0, 2–0, ... up to 10–10 using the Poisson formula below.

P(x; μ) = (e^(-μ) · μ^x) / x!
This leads to the matrix given below.
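As a rough illustration, the score matrix can be computed in Python as follows. The function names and the expected-goals values are hypothetical. The sketch also assumes that home and away goals are independent, so that the probability of a score is the product of the two Poisson probabilities.

```python
import math


def poisson_pmf(goals, mu):
    """P(x; mu) = e^(-mu) * mu^x / x! for a non-negative integer x."""
    return math.exp(-mu) * mu ** goals / math.factorial(goals)


def score_matrix(home_mu, away_mu, max_goals=10):
    """matrix[h][a] = probability that the home team scores h and the away team a."""
    home = [poisson_pmf(g, home_mu) for g in range(max_goals + 1)]
    away = [poisson_pmf(g, away_mu) for g in range(max_goals + 1)]
    return [[h * a for a in away] for h in home]


# Hypothetical expected goals, for illustration only.
matrix = score_matrix(home_mu=1.62, away_mu=0.616)
print(len(matrix), len(matrix[0]))  # 11 rows and 11 columns: 0 to 10 goals each
print(matrix[1][0])                 # probability of a 1-0 home win
```

Cutting the matrix off at 10 goals per side leaves out only a very small amount of probability when the expected goals are in the usual range for football.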

Using this table, I calculated the probabilities of a home win, an away win, a draw, over 2.5 goals and under 2.5 goals. The table below compares the betting companies' odds with my own decimal odds, generated with the formula 1 / probability. I also multiplied my odds to make them more comparable with those of the betting companies.
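The aggregation step can be sketched in Python as below. This is an illustration under the same assumptions as the Poisson formula (independent home and away goals, scores capped at 10 goals per side), with hypothetical expected-goals values. It does not include the extra multiplication of the odds mentioned above, since that factor is not given.

```python
import math


def poisson_pmf(goals, mu):
    """Poisson probability of exactly `goals` goals given an expectation `mu`."""
    return math.exp(-mu) * mu ** goals / math.factorial(goals)


def outcome_probabilities(home_mu, away_mu, max_goals=10):
    """Sum score probabilities into match-result and total-goals markets."""
    probs = {"home_win": 0.0, "draw": 0.0, "away_win": 0.0,
             "over_2_5": 0.0, "under_2_5": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mu) * poisson_pmf(a, away_mu)
            if h > a:
                probs["home_win"] += p
            elif h == a:
                probs["draw"] += p
            else:
                probs["away_win"] += p
            # Over 2.5 goals means three or more goals in total.
            if h + a >= 3:
                probs["over_2_5"] += p
            else:
                probs["under_2_5"] += p
    return probs


def decimal_odds(probability):
    """Decimal odds implied by a probability: 1 / probability."""
    return 1.0 / probability


# Hypothetical expected goals, for illustration only.
probs = outcome_probabilities(home_mu=1.62, away_mu=0.616)
for market, p in probs.items():
    print(market, round(p, 4), round(decimal_odds(p), 2))
```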

Conclusion

Finally, looking at the odds and my calculations, I decided to place my bet on under 2.5 goals for Liverpool vs. Southampton. The first reason is that the model puts the probability of under 2.5 goals at 93.31%, which makes this a well-calculated risk. Secondly, the odds that the betting sites are offering are much higher than my calculated odds, so the bet offers a good chance of profit.
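The comparison between the model's odds and the offered odds can be sketched in Python. The 93.31% figure is the one quoted above; the bookmaker odds of 1.50 are a hypothetical placeholder, and the function names are my own.

```python
def fair_decimal_odds(probability):
    """Decimal odds at which a bet with this win probability breaks even."""
    return 1.0 / probability


def expected_profit_per_unit(probability, offered_odds):
    """Expected profit on a 1-unit stake if the model probability is correct."""
    return probability * offered_odds - 1.0


model_probability = 0.9331  # under 2.5 goals, from the model
offered_odds = 1.50         # hypothetical bookmaker odds, not from the article

print(round(fair_decimal_odds(model_probability), 3))
print(round(expected_profit_per_unit(model_probability, offered_odds), 4))
```

A bet looks attractive under this check whenever the offered odds exceed the fair odds, but the result is only as reliable as the model probability that goes into it.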

If we consider the overall model and the variables that affect the outcome, a more precise model could clearly be built by adding factors such as player attack and defense scores or the effect of the team's coach. The model is therefore open to improvement. Adding more variables may make it more precise or more ambiguous; it is a cycle of testing, learning and adapting.

Also, more data could be used to obtain a more stable distribution, for example by adding last season's data. We could also compare seasons of the same team to understand how its average attack and defense scores differ.

In conclusion, this model can deliver statistical estimates for predicting match results, and with the Excel file provided it can be adapted to any data with the same structure.
