Accurately Predicting Football with Python & SQL

And betting on the outcomes

Liam Hartley
Systematic Sports


“I’ve got a hunch that Chelsea is going to win” isn’t a very good argument for why you should place a bet.

“I’m using an algorithm that is proven to accurately predict outcomes better than the bookmakers” is much better.

To predict these outcomes, my co-developer (Estèphe Corlin) and I built a data warehouse over the 2021/22 football season. It stores bookies’ odds, our own odds, player stats, fixture outcomes and more. Below is the statement that creates our “HomeOdds” table in the warehouse:

CREATE TABLE Football.HomeOdds(
FixtureId INT
, TeamId INT
, Market VARCHAR(50)
, Odd DECIMAL(4,2)
, CONSTRAINT PK_BookieOdds PRIMARY KEY (FixtureId, TeamId, Market)
);

The data warehouse is a fully managed relational database that contains five seasons of data. It is populated through a mixture of backfills, web scraping and our own algorithmic outcome calculations.
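To illustrate how a scraping job might write into a table shaped like “HomeOdds”, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the warehouse, so the column types and the `load_home_odds` helper are assumptions for illustration, not our production loader.

```python
import sqlite3


def load_home_odds(conn, rows):
    """Create a HomeOdds-like table if needed and upsert the given rows.

    rows: iterable of (fixture_id, team_id, market, odd) tuples.
    Hypothetical helper; SQLite stands in for the real warehouse.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS HomeOdds (
            FixtureId INTEGER NOT NULL,
            TeamId    INTEGER NOT NULL,
            Market    TEXT    NOT NULL,
            Odd       REAL    NOT NULL,
            PRIMARY KEY (FixtureId, TeamId, Market)
        )
        """
    )
    # INSERT OR REPLACE keeps the load idempotent: re-running a scrape
    # overwrites the existing row for the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO HomeOdds (FixtureId, TeamId, Market, Odd) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_home_odds(connection, [(1, 10, "Match result", 1.85)])
    print(connection.execute("SELECT * FROM HomeOdds").fetchall())
```

Keying on (FixtureId, TeamId, Market) means a repeated load replaces a row rather than duplicating it, which is one way to make backfills safe to re-run.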

These calculations are based on a team’s expected goals (xG), their current form, their opponent’s ability to defend and whether they’re playing home or away. More details of this calculation can be found in the article below.
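As a rough illustration of how an xG figure can become match-result probabilities, the sketch below treats each side’s goals as an independent Poisson variable. This is a common textbook approach and an assumption here, not the calculation described in the article; it ignores form, defensive strength and home advantage unless they are already baked into the xG inputs. The function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_result_probs(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson goal counts and sums over every
    scoreline up to max_goals goals per side.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    # Renormalise to account for the truncated tail beyond max_goals.
    total = home + draw + away
    return home / total, draw / total, away / total


def fair_odds(prob):
    """Decimal odds with no bookmaker margin for a given probability."""
    return 1.0 / prob


if __name__ == "__main__":
    probs = match_result_probs(home_xg=1.6, away_xg=1.1)
    print([round(fair_odds(p), 2) for p in probs])
```

Comparing these margin-free odds with a bookmaker’s price for the same market is what lets a model flag bets it considers good value.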

By iteratively modifying our model’s parameters, we can retrospectively simulate the results of different betting strategies over a given season.
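A retrospective simulation of this kind can be sketched in a few lines. The example below assumes a simple flat-stake strategy that bets only when the model’s probability implies a positive edge over the bookmaker’s decimal odds. The `Bet` record, the `min_edge` parameter and the sample figures are all hypothetical, chosen only to show the shape of a backtest.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    model_prob: float   # our estimated probability of the outcome
    bookie_odds: float  # bookmaker's decimal odds for the same outcome
    won: bool           # whether the outcome actually happened


def simulate_flat_stakes(bets, min_edge=0.0, stake=1.0):
    """Replay historical bets and return (bets placed, total profit).

    A bet is placed only when its expected return per unit staked,
    model_prob * bookie_odds - 1, is greater than min_edge.
    """
    placed = 0
    profit = 0.0
    for bet in bets:
        edge = bet.model_prob * bet.bookie_odds - 1.0
        if edge <= min_edge:
            continue  # not enough perceived value, skip this fixture
        placed += 1
        if bet.won:
            profit += stake * (bet.bookie_odds - 1.0)
        else:
            profit -= stake
    return placed, profit


if __name__ == "__main__":
    history = [
        Bet(model_prob=0.60, bookie_odds=2.0, won=True),
        Bet(model_prob=0.30, bookie_odds=2.0, won=False),
        Bet(model_prob=0.50, bookie_odds=2.5, won=False),
    ]
    # Sweep the threshold to compare strategies over the same season.
    for threshold in (0.0, 0.1, 0.22):
        print(threshold, simulate_flat_stakes(history, min_edge=threshold))
```

Sweeping `min_edge` (or any other parameter) over the same historical fixtures is the iterative step: each setting yields a different set of bets and a different profit figure to compare.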

Project Architecture

All of the data-gathering processes and outcome calculations are decoupled to enable backtesting, improve reliability and separate concerns.

Project architecture. Decoupled processes run to load data into the database which is used to calculate outcomes and to simulate betting strategies.

These calculations are run for every game across five leagues (EPL, La Liga, Bundesliga, Ligue 1 and Serie A) and cover the following markets:

  • Match result — e.g. home win, away win or draw
  • Double chance — either of…