Football Odds data analysis using Poisson distribution in Python — Part 1

Aritra
9 min read · Mar 20, 2023

Introduction

For the past couple of days, I have been studying a bit of statistics and statistical distributions, and a lot of football odds data. I do not promote gambling; I am just interested in the mathematics behind it. I read a lot about how bookmakers price each team according to its chances of winning. I came across many good articles, and one of them in particular says: “In betting, the trick is to find markets in which the bookmaker’s odds are bigger than the true odds, for that is where the value lies.”

Now the question is: how do you predict those chances? Anyone who has looked at betting markets will notice that there is a price for almost every event: a team winning, a team’s total shots, a team’s shots on target, over or under a certain number of goals, the correct score, a player scoring or assisting, and many more. That means bookmakers have information on each of the things they price. So if we want to bet on something, we need to start somewhere and predict the chances ourselves. (Please bet only for fun or to check the accuracy of your predictions, and only if you have a large enough pool of money; otherwise, please don’t.)

Bookmaker’s odds

Before diving into mathematical calculations, simulations or predictions, let’s take a look at the bookmaker’s calculations. On the 18th of March, 2023, Aston Villa (Home) played Bournemouth (Away), and the odds offered by a certain bookmaker were as follows:

Aston Villa (Home): 1.73
Draw: 3.75
Bournemouth (Away): 4.75

That means if you stake 1 pound on each outcome, you get back 1.73 pounds if the Home team wins, 3.75 pounds if the match is drawn, and 4.75 pounds if the Away team wins, which is a profit of 0.73, 2.75 and 3.75 pounds respectively on the winning bet.

Now, there is a formula relating odds and probability:
odds = 1 / probability, or equivalently, probability = 1 / odds

So if we calculate the implied probability of each outcome, we get:

Home team winning probability = 1 / Home team odds = 1 / 1.73 = 0.5780 = 57.80%
Draw probability = 1 / Draw odds = 1 / 3.75 = 0.2667 = 26.67%
Away team winning probability = 1 / Away team odds = 1 / 4.75 = 0.2105 = 21.05%
Total probability set by the Bookmaker is: 0.5780 + 0.2667 + 0.2105 = 1.0552
Total winning chances set by the Bookmaker is: 57.80% + 26.67% + 21.05% = 105.52%

But we know that the probabilities of all possible outcomes of a match (home win, draw, away win) must add up to exactly 1, or 100%, because exactly one of them happens. So what does the extra 105.52% - 100% = 5.52% mean? This 5.52% is the bookmaker’s margin (also called the overround), which bookmakers build into their prices to balance their books. Different bookmakers have different margins. The probabilities calculated above are therefore called implied probabilities, because they are implied by the bookmaker’s odds rather than being true probabilities.
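The conversion from odds to implied probabilities and the margin can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `implied_probabilities` and the dictionary keys are illustrative choices, not taken from the original code.

```python
def implied_probabilities(odds):
    """Convert decimal odds to implied probabilities (probability = 1 / odds)."""
    return {outcome: 1.0 / price for outcome, price in odds.items()}


# Decimal odds quoted in the text for Aston Villa vs Bournemouth.
bookmaker_odds = {"home": 1.73, "draw": 3.75, "away": 4.75}

implied = implied_probabilities(bookmaker_odds)
total = sum(implied.values())   # sum of the three implied probabilities
overround = total - 1.0         # whatever exceeds 1 is the bookmaker's margin

for outcome, probability in implied.items():
    print(f"{outcome}: {probability:.4f}")
print(f"total: {total:.4f}, overround: {overround:.2%}")
```

Running it with a different bookmaker’s prices for the same match shows how the margin varies from one bookmaker to another.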

Data collection and calculating necessary team statistics

Now the question is how we calculate the true odds and probabilities of the outcomes mentioned above. This is where our own calculation begins, using mathematical concepts and predictions. First, we need data. Everything is out there on the internet, trust me. football-data.co.uk provides home and away match results along with the odds offered by different bookmakers. We need access to that data, and a simple line of code will do the job:

The result will look like this:

We will keep our process as simple as we can so that we can build our intuition on top of it. We do not need this many columns. We need the home team name, the away team name, the home team goals and the away team goals. That’s it for now. So our final working data set will look like this:
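Loading the file and trimming it to the four columns can be sketched with pandas as below. Two things here are assumptions rather than part of the original post: the URL is my guess at the 2022/23 Premier League file on football-data.co.uk (check the site for the current path), and `load_matches` is a hypothetical helper name. The column names `HomeTeam`, `AwayTeam`, `FTHG` and `FTAG` are the site’s labels for the teams and the full-time home and away goals. So that the sketch runs without a network connection, it is demonstrated on two made-up rows in the same layout, not on real results.

```python
import io

import pandas as pd

# Assumed location of the 2022/23 Premier League file; the path may change.
DATA_URL = "https://www.football-data.co.uk/mmz4281/2223/E0.csv"


def load_matches(source=DATA_URL):
    """Read a football-data.co.uk style CSV and keep only the four columns we need."""
    df = pd.read_csv(source)
    return df[["HomeTeam", "AwayTeam", "FTHG", "FTAG"]]


# Offline demonstration with made-up rows (NOT real results).
sample_csv = io.StringIO(
    "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR\n"
    "E0,01/01/2023,Team A,Team B,2,1,H\n"
    "E0,02/01/2023,Team B,Team C,0,0,D\n"
)
matches = load_matches(sample_csv)
print(matches)
```

Calling `load_matches()` with no argument would try to download the real file from the assumed URL instead.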

Remember, our sole purpose is to find the true probabilities for ourselves, that is, the true winning chances. First, select the home team (Aston Villa) and the away team (Bournemouth).

Task 1. Calculate the league-wide average of home goals scored per match (total_home_goals_mean).

Task 2. Calculate the league-wide average of away goals scored per match (total_away_goals_mean).

Task 3. Calculate the average goals the Home team scored at home.

Task 4. Calculate the average goals the Home team conceded at home.

Task 5. Calculate the average goals the Away team scored away.

Task 6. Calculate the average goals the Away team conceded away.

Task 7. Calculate the Home team’s attacking strength (i.e., average Home team goals scored at home / average home goals scored in the league).

Task 8. Calculate the Home team’s defensive strength (i.e., average Home team goals conceded at home / average away goals scored in the league).

Task 9. Calculate the Away team’s attacking strength (i.e., average Away team goals scored away / average away goals scored in the league).

Task 10. Calculate the Away team’s defensive strength (i.e., average Away team goals conceded away / average home goals scored in the league).

Task 11. Calculate the Home team’s goal expectancy (home_team_att_strength * away_team_def_strength * total_home_goals_mean).

Task 12. Calculate the Away team’s goal expectancy (away_team_att_strength * home_team_def_strength * total_away_goals_mean).

The concept has been borrowed from here. The code I used to calculate the above-mentioned parameters is available on my GitHub.
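For readers who want to see the twelve tasks in one place, here is a minimal sketch. It is not the code from the GitHub repository: the function name `goal_expectancy` is hypothetical, the data frame is assumed to have the columns `HomeTeam`, `AwayTeam`, `FTHG` and `FTAG`, and the six matches are made up purely to show the mechanics.

```python
import pandas as pd


def goal_expectancy(matches, home_team, away_team):
    """Tasks 1-12: league averages, team strengths and expected goals."""
    # Tasks 1-2: league-wide average goals per match for home and away sides.
    total_home_goals_mean = matches["FTHG"].mean()
    total_away_goals_mean = matches["FTAG"].mean()

    # Tasks 3-4: the home team's record in its home matches.
    home_games = matches[matches["HomeTeam"] == home_team]
    home_scored = home_games["FTHG"].mean()
    home_conceded = home_games["FTAG"].mean()

    # Tasks 5-6: the away team's record in its away matches.
    away_games = matches[matches["AwayTeam"] == away_team]
    away_scored = away_games["FTAG"].mean()
    away_conceded = away_games["FTHG"].mean()

    # Tasks 7-10: strengths relative to the league averages.
    home_team_att_strength = home_scored / total_home_goals_mean
    home_team_def_strength = home_conceded / total_away_goals_mean
    away_team_att_strength = away_scored / total_away_goals_mean
    away_team_def_strength = away_conceded / total_home_goals_mean

    # Tasks 11-12: expected goals for each side in this fixture.
    home_exp = home_team_att_strength * away_team_def_strength * total_home_goals_mean
    away_exp = away_team_att_strength * home_team_def_strength * total_away_goals_mean
    return home_exp, away_exp


# Made-up results (NOT real data), just to exercise the function.
toy_matches = pd.DataFrame(
    {
        "HomeTeam": ["A", "A", "B", "C", "B", "C"],
        "AwayTeam": ["B", "C", "A", "B", "C", "A"],
        "FTHG": [2, 3, 1, 0, 1, 2],
        "FTAG": [0, 1, 1, 2, 0, 2],
    }
)
home_exp, away_exp = goal_expectancy(toy_matches, home_team="A", away_team="B")
print(round(home_exp, 3), round(away_exp, 3))
```

Note that only home matches of the home team and away matches of the away team enter the team-level averages, exactly as the task list describes.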

The results of the above tasks look like this:

Now we have to ask ourselves a simple question: can a team score 2.61 or 1.22 goals? Simply, no. That is where the Poisson distribution comes into play: it turns an expected number of goals into probabilities for whole numbers of goals (0, 1, 2, and so on). If you want to get a feel for the Poisson distribution and Monte Carlo simulation, and how they are applied to football data, please take a look at these posts: Monte Carlo simulation, Poisson distribution, and randomness.

Now a single line of code will draw 100,000 samples of the number of goals from a Poisson distribution with the above-mentioned goal expectancy.
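One way this sampling step might look with NumPy is shown below. It is a sketch under assumptions: 2.61 is taken to be the home goal expectancy and 1.22 the away one, and the seed and variable names are arbitrary choices that only make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility

home_goal_expectancy = 2.61  # assumed to be the home team's figure from the text
away_goal_expectancy = 1.22  # assumed to be the away team's figure from the text
n_samples = 100_000

# Each draw is a whole number of goals for one simulated match.
home_goals_sim = rng.poisson(lam=home_goal_expectancy, size=n_samples)
away_goals_sim = rng.poisson(lam=away_goal_expectancy, size=n_samples)

print(home_goals_sim[:10])
print(home_goals_sim.mean(), away_goals_sim.mean())
```

The sample means should land close to the two expectancies, which is a quick sanity check that the simulation matches its inputs.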

The next lines of code will compute the chances of each number of goals for the home and away teams.

If we put them in a dataframe, it will look like below. (The index on the left-hand side also represents the number of goals.)
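Turning simulated goals into chances and collecting them in a dataframe could be sketched as follows. The helper name `goal_probabilities`, the column names and the seed are illustrative assumptions, and 2.61 and 1.22 are again assumed to be the home and away goal expectancies.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed
home_goals_sim = rng.poisson(lam=2.61, size=100_000)
away_goals_sim = rng.poisson(lam=1.22, size=100_000)

max_goals = 5  # the table in the text stops at 5 goals


def goal_probabilities(simulated_goals, max_goals):
    """Share of simulations that produced exactly 0, 1, ..., max_goals goals."""
    counts = np.bincount(simulated_goals, minlength=max_goals + 1)
    return counts[: max_goals + 1] / simulated_goals.size


goal_probs = pd.DataFrame(
    {
        "home": goal_probabilities(home_goals_sim, max_goals),
        "away": goal_probabilities(away_goals_sim, max_goals),
    },
    index=pd.RangeIndex(max_goals + 1, name="goals"),
)
print(goal_probs)
```

Because the table stops at 5 goals, each column sums to a little less than 1; the remainder is the chance of six or more goals.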

Okay, we have come this far: we now know the chances of the home and away teams scoring exactly 0, 1, 2, 3, 4 or 5 goals in the match.

It is now getting long. I think we should take a coffee break, think about what we have done so far, and digest the relevant calculations.
I took a break. Did you? Let’s start again! Remember our main goal? Calculating true odds. We are close, I promise you.

So far, we know each team’s chances of scoring 0, 1, 2, 3, 4 or 5 goals. Now we have to calculate the chances of a draw (0–0, 1–1, 2–2, etc.) and of a win for each team (scores such as 1–0, 2–1, 3–1, 3–2, etc.). We simply take the outer product of the two sets of chances, so that every home score chance is multiplied by every away score chance, and it will look like below:
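The score matrix can be sketched with `numpy.outer`. To keep the block self-contained, it uses the exact Poisson formula instead of the simulated frequencies, so the numbers will differ slightly from simulated ones. It also assumes that home and away goals are independent (which is what multiplying the chances implies) and that 2.61 and 1.22 are the home and away goal expectancies; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

import numpy as np


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam**k / factorial(k)


goals = range(6)  # 0 to 5 goals, as in the text
home_probs = np.array([poisson_pmf(k, 2.61) for k in goals])
away_probs = np.array([poisson_pmf(k, 1.22) for k in goals])

# Rows are home goals, columns are away goals:
# score_matrix[i, j] is the chance of the match ending i-j.
score_matrix = np.outer(home_probs, away_probs)
print(score_matrix.round(4))
```

For example, `score_matrix[2, 1]` is the chance of a 2–1 home win under these assumptions.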

Now, think of this as a matrix with home goals along the rows and away goals along the columns. The sum of the upper triangle (above the diagonal) is the away team’s total chance of winning, the sum of the diagonal is the total chance of a draw, and the sum of the lower triangle (below the diagonal) is the home team’s total chance of winning.

For better understanding, I have highlighted the matrix. The green part is the away team’s total chance of winning, the black part is the total chance of a draw, and the red part is the home team’s total chance of winning. Close, close, very close.

Time to sum up.
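Summing the three regions of the matrix might look like this. As before, this is a sketch that uses the exact Poisson formula rather than simulated frequencies, assumes rows are home goals and columns are away goals, and takes 2.61 and 1.22 as the home and away goal expectancies; `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial

import numpy as np


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam**k / factorial(k)


goals = range(6)  # 0 to 5 goals
home_probs = np.array([poisson_pmf(k, 2.61) for k in goals])
away_probs = np.array([poisson_pmf(k, 1.22) for k in goals])
score_matrix = np.outer(home_probs, away_probs)  # rows: home, columns: away

away_win = np.triu(score_matrix, k=1).sum()   # strictly above the diagonal
draw = np.trace(score_matrix)                 # the diagonal itself
home_win = np.tril(score_matrix, k=-1).sum()  # strictly below the diagonal

print(f"home: {home_win:.4f}, draw: {draw:.4f}, away: {away_win:.4f}")
print(f"covered probability: {home_win + draw + away_win:.4f}")
```

One thing worth noticing: because the matrix stops at 5 goals per team, the three totals add up to slightly less than 1. Dividing each total by their sum, or extending the matrix to more goals, are two possible ways to account for the missing part.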

We did it. Machineball is my website’s name, and I am acting like a bookmaker now. 😉💁 From the home and away teams’ scoring chances, we found the chances of each match result. True chances (according to our analysis).

Time to calculate the true odds. Remember, odds = 1 / probability. Here is what I calculated.

Implied odds vs True odds

Machineball home team win odds: 1.61
Bookmaker home team win odds: 1.73

Machineball draw odds: 5.88
Bookmaker draw odds: 3.75

Machineball away team win odds: 6.29
Bookmaker away team win odds: 4.75

They say “In betting, the trick is to find markets in which the bookmaker’s odds are bigger than the true odds, for that is where the value lies.” Did we find anything? Yes, 1.73 > 1.61.
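The comparison can also be written as a small check. The sketch below takes the two sets of odds quoted above, flags every outcome where the bookmaker’s price is higher than the model’s, and computes the expected profit per unit stake under the assumption that the model’s probability (1 / model odds) is correct. The function name `expected_profit` is illustrative.

```python
# Model ("Machineball") odds and bookmaker odds quoted in the text.
model_odds = {"home": 1.61, "draw": 5.88, "away": 6.29}
bookmaker_odds = {"home": 1.73, "draw": 3.75, "away": 4.75}


def expected_profit(model_price, bookmaker_price, stake=1.0):
    """Expected profit of a stake IF the model probability (1 / model_price) is right."""
    p_win = 1.0 / model_price
    # Win: receive stake * bookmaker_price back; either way the stake is paid.
    return stake * (p_win * bookmaker_price - 1.0)


value_bets = {
    outcome: expected_profit(model_odds[outcome], bookmaker_odds[outcome])
    for outcome in model_odds
    if bookmaker_odds[outcome] > model_odds[outcome]
}
print(value_bets)
```

The result is only as good as the model behind `model_odds`; a positive number here means value according to our analysis, not a guaranteed profit.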

Some questions and answers 😉

Question 1. Shall we bet?
Answer: If you can calculate the true odds correctly and find situations like the one above, you should, because in the long term you win. Mathematics says that. Expert bettors say that.

Question 2. Are the calculated true odds really the true odds?
Answer: Our model is very simple. Bookmakers use lots of data, match sentiment, lineups, and complex machine learning algorithms to calculate odds. If you think you can do better than them, go on.

Conclusions

The above was a general look at true odds: how the bookmakers set their odds and how you can calculate your own. This was one way of thinking and one way of calculating. Make your own assumptions and go. At least give it a try. Mathematics is beautiful.

Last but not least, the final score of the game was Aston Villa 3–0 Bournemouth.

An important note:

I want to discuss an important point regarding the true odds calculation. On the 20th of March, 2023 there was a match between Barcelona (The home team) and Real Madrid (the Away team).

First, try the above calculation for this match by yourself. You know what to do, you have to select all the La Liga matches between those teams before the above-mentioned date and then run the calculation. You will think that this calculation is flawed. Your calculated odds for Barcelona will be something like 1.58, for the draw 3.42 and Real Madrid will be 13.89. No chance the bookmaker will offer you something like 13.89 for Real Madrid. What happened?

Before looking at my answer, try to work out for yourself what happened. Maybe go back to each calculation you made and analyse it.

Here is what happened. Barcelona at home had scored 27 goals and conceded only 2. Our calculation includes the “average home team goals conceded at home”, which feeds into the “Home team defensive strength”, and that in turn scales the away team’s goal expectancy. With so few goals conceded, Real Madrid’s goal expectancy becomes very small. Now you know why the calculated odds are so high for Real Madrid. I redid the analysis with a bias term added to the away team’s goal expectancy and used the result as the lambda (rate) parameter of the Poisson distribution. The bias is yours to choose. Think of it as bias = choice / 100: I thought Real Madrid had at least a 50% chance of scoring in this game, so I added 50 / 100 = 0.5 to the away team’s goal expectancy and recalculated. The recalculated odds are 2.05 for Barcelona, 3.45 for a draw and 4.55 for Real Madrid. You can take a look at the analysis here.
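The effect of the bias term can be illustrated with a short sketch. It is not the linked analysis: it uses the exact Poisson formula instead of simulation, `match_odds` and `poisson_pmf` are hypothetical helper names, and the two rates below are placeholders chosen only to show the direction of the effect, not the values from the Barcelona vs Real Madrid calculation.

```python
from math import exp, factorial

import numpy as np


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam**k / factorial(k)


def match_odds(home_lam, away_lam, bias=0.0, max_goals=5):
    """Decimal odds (home, draw, away), with `bias` added to the away rate."""
    goals = range(max_goals + 1)
    home_probs = np.array([poisson_pmf(k, home_lam) for k in goals])
    away_probs = np.array([poisson_pmf(k, away_lam + bias) for k in goals])
    matrix = np.outer(home_probs, away_probs)  # rows: home, columns: away
    home_win = np.tril(matrix, k=-1).sum()
    draw = np.trace(matrix)
    away_win = np.triu(matrix, k=1).sum()
    return 1 / home_win, 1 / draw, 1 / away_win


# Placeholder rates (NOT the real Barcelona / Real Madrid figures).
home_lam, away_lam = 1.5, 0.3

without_bias = match_odds(home_lam, away_lam)
with_bias = match_odds(home_lam, away_lam, bias=50 / 100)
print([round(o, 2) for o in without_bias])
print([round(o, 2) for o in with_bias])
```

Adding the bias raises the away team’s scoring rate, so the away odds shorten and the home odds lengthen, which is the same direction of change described above.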

In this match, the bookmaker offered 2.20 for Barcelona, 3.40 for a draw and 3.20 for Real Madrid. Technically, did we again find some value to place a bet? 😎 Look at the match result 😉

Link to the post on my website: https://themachineball.com/blog-details/13
Link to the GitHub repository for the code: https://github.com/kaii55/Odds_calculation
Link to my website: https://themachineball.com
Link to my Twitter: https://twitter.com/themachineball

References:

1. Tutorial — 1

2. Tutorial — 2

3. Soccermatics

4. Implied odds vs True odds

5. How to think about odds

Aritra

PhD in "Football Injury Prediction" at Bournemouth University and AFC Bournemouth football club. Watching football and through data giving insights.