Accurately Predicting Football with Python & SQL
And betting on the outcomes
“I’ve got a hunch that Chelsea is going to win” isn’t a very good argument for why you should place a bet.
“I’m using an algorithm that is proven to predict outcomes more accurately than the bookmakers” is much better.
To predict these outcomes, my co-developer (Estèphe Corlin) and I built a data warehouse over the 2021/22 football season. It stores bookmakers’ odds, our own odds, player stats, fixture outcomes and more. Below is an example of our “HomeOdds” table being created in the warehouse:
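-- One row per fixture, team and market (enforced by the primary key).
CREATE TABLE Football.HomeOdds(
    FixtureId INT
  , TeamId INT
  , Market VARCHAR(50)
  , Odd DECIMAL(4,2)                 -- two decimal places, maximum 99.99
  , DataTimeStamp DATETIME NOT NULL
  , CONSTRAINT PK_BookieOdds PRIMARY KEY (FixtureId, TeamId, Market)
);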
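To show how a row might be written to a table like this from Python, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the real warehouse (whose engine is not stated here), so the "Football." schema prefix is dropped. The helper name store_odd and the sample values are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: SQLite stands in for the real warehouse engine, so the
# "Football." schema prefix is dropped and timestamps are ISO-8601 text.
DDL = """
CREATE TABLE HomeOdds(
    FixtureId INT
  , TeamId INT
  , Market VARCHAR(50)
  , Odd DECIMAL(4,2)
  , DataTimeStamp DATETIME NOT NULL
  , CONSTRAINT PK_BookieOdds PRIMARY KEY (FixtureId, TeamId, Market)
);
"""


def store_odd(conn, fixture_id, team_id, market, odd):
    """Insert one price, or overwrite the row that already has the same key."""
    conn.execute(
        """
        INSERT INTO HomeOdds (FixtureId, TeamId, Market, Odd, DataTimeStamp)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (FixtureId, TeamId, Market)
        DO UPDATE SET Odd = excluded.Odd,
                      DataTimeStamp = excluded.DataTimeStamp
        """,
        (fixture_id, team_id, market, odd,
         datetime.now(timezone.utc).isoformat()),
    )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Hypothetical fixture, team and market values, for illustration only.
store_odd(conn, 1001, 49, "Match Result", 1.85)
store_odd(conn, 1001, 49, "Match Result", 1.80)  # same key: updates in place

rows = conn.execute(
    "SELECT FixtureId, TeamId, Market, Odd FROM HomeOdds"
).fetchall()
print(rows)
```

Because the primary key covers the fixture, team and market, re-running a load for the same fixture replaces the stored price rather than creating a duplicate row.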
The data warehouse is a fully managed relational database containing five seasons of data. It is populated through a mixture of backfills, web scraping and our own algorithmic outcome calculations.
These calculations are based on a team’s expected goals (xG), their current form, their opponent’s ability to defend and whether they’re playing home or away. More details of this calculation can be found in the article below.
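As a rough illustration of how those four inputs could be combined (this is not the calculation from the linked article), the sketch below assumes each side's goals follow an independent Poisson distribution, a common modelling choice for football scores. The multiplicative adjustment, the function and parameter names, and the 1.1 home-advantage factor are all hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_goals(team_xg, form, opp_defence, is_home, home_advantage=1.1):
    """Hypothetical adjustment: scale xG by form, opponent defence and venue.

    form and opp_defence are multipliers around 1.0 (above 1.0 means the
    team is in good form, or the opponent defends poorly).
    """
    venue = home_advantage if is_home else 1.0
    return team_xg * form * opp_defence * venue


def match_result_probs(home_lambda, away_lambda, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Sums the joint probability of every scoreline up to max_goals per side,
    assuming the two sides score independently.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_lambda) * poisson_pmf(a, away_lambda)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise for the truncated tail
    return home / total, draw / total, away / total


# Made-up inputs, for illustration only.
home_lambda = expected_goals(1.6, form=1.05, opp_defence=1.0, is_home=True)
away_lambda = expected_goals(1.2, form=0.95, opp_defence=1.1, is_home=False)
p_home, p_draw, p_away = match_result_probs(home_lambda, away_lambda)

# A fair decimal odd is the reciprocal of the probability.
fair_odds = [round(1 / p, 2) for p in (p_home, p_draw, p_away)]
print(fair_odds)
```

Comparing fair odds like these with the bookmakers’ prices is one way to spot selections where the model disagrees with the market.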
By iteratively modifying our model’s parameters, we can retrospectively simulate the results of different betting strategies over a given season.
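A retrospective simulation of this kind can be sketched in a few lines. The example below assumes a flat-stake strategy that only bets when the model’s edge over the bookmaker exceeds a threshold, then repeats the run for several thresholds. The Bet class, the backtest function, the min_edge parameter and the three sample bets are hypothetical and are not taken from our warehouse.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    model_prob: float  # our probability that the selection wins
    bookie_odd: float  # decimal odds offered by the bookmaker
    won: bool          # whether the selection actually won


def backtest(bets, min_edge, stake=1.0):
    """Flat-stake simulation: bet only when our edge exceeds min_edge.

    Edge is the expected profit per unit staked according to the model:
    model_prob * bookie_odd - 1.
    """
    placed = 0
    profit = 0.0
    for bet in bets:
        edge = bet.model_prob * bet.bookie_odd - 1.0
        if edge > min_edge:
            placed += 1
            if bet.won:
                profit += stake * (bet.bookie_odd - 1.0)
            else:
                profit -= stake
    return placed, profit


# Made-up history, for illustration only.
history = [
    Bet(model_prob=0.60, bookie_odd=2.00, won=True),
    Bet(model_prob=0.50, bookie_odd=2.10, won=False),
    Bet(model_prob=0.30, bookie_odd=3.00, won=True),
]

# Re-run the same season with different values of one parameter.
for min_edge in (0.0, 0.1):
    placed, profit = backtest(history, min_edge)
    print(f"min_edge={min_edge}: {placed} bets, profit {profit:+.2f}")
```

Because the simulation only reads stored probabilities, odds and results, the same season can be replayed as many times as needed while a parameter such as the edge threshold is tuned.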
Project Architecture
All of the data-gathering processes and outcome calculations are decoupled to enable backtesting, improve reliability and separate concerns.
These calculations are run for every game across five leagues (EPL, La Liga, Bundesliga, Ligue 1 and Serie A) and cover the following markets:
- Match result — e.g. home win, away win or draw
- Double chance — either of…