Sports Betting Strategies: Machine Learning

IG Quant Research
Jan 27, 2020

Machine learning has become accessible and widely used. With a language like Python, one can easily import scikit-learn (sklearn) and similar libraries and experiment with different models.

In this article we talk about how we use supervised and unsupervised learning techniques to produce betting tips/signals.

Outlier detection — Unsupervised Learning

One of our goals is to find potential inefficiencies in the market. Every bookmaker has to deal with mispriced odds. Using outlier detection techniques, we try to find these odds before they are fully corrected.

We analyze the odds time series for each bet using z-scores (with the mean and variance calculated by Welford's algorithm) and more complex models such as Isolation Forests to detect outliers. This process outputs real-time alerts and data that feed the filter/forecast models detailed below.
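As a rough illustration of the z-score step, here is a minimal sketch of Welford's online mean/variance update applied to a stream of odds. The class and function names, the threshold of 3, the warm-up length, and the sample quotes are all assumptions made for this example, not our production settings.

```python
import math


class RunningStats:
    """Welford's online algorithm for a running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; reported as 0.0 until there are two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def zscore(self, x):
        std = math.sqrt(self.variance)
        return (x - self.mean) / std if std > 0 else 0.0


def flag_outliers(odds, threshold=3.0, warmup=5):
    """Return indices of quotes whose z-score against the earlier quotes exceeds the threshold."""
    stats = RunningStats()
    flagged = []
    for i, quote in enumerate(odds):
        # Score the new quote against history first, then fold it into the statistics.
        if stats.n >= warmup and abs(stats.zscore(quote)) > threshold:
            flagged.append(i)
        stats.update(quote)
    return flagged


# Hypothetical odds series for a single bet.
odds_series = [1.90, 1.92, 1.91, 1.89, 1.90, 1.91, 2.60, 1.90]
print(flag_outliers(odds_series))
```

Because the statistics are updated one quote at a time, nothing needs to be recomputed over the full history, which suits real-time alerts. Note that this simple version also folds flagged quotes into the running statistics; a real system might exclude them.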

Figure: Isolation Forest
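For the Isolation Forest side, a minimal sketch with scikit-learn's IsolationForest could look like the following. The two features (the quoted odds and the change from the previous quote), the synthetic data, and the contamination value are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic quotes: columns are [odds, change from previous quote] (assumed features).
normal_quotes = np.column_stack([
    rng.normal(1.90, 0.02, 200),
    rng.normal(0.00, 0.01, 200),
])
# Two hand-made mispriced quotes appended at rows 200 and 201.
spikes = np.array([[2.60, 0.70], [1.30, -0.60]])
X = np.vstack([normal_quotes, spikes])

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = model.fit_predict(X)        # -1 marks an outlier, 1 an inlier
scores = model.decision_function(X)  # lower values are more anomalous

outlier_idx = np.where(labels == -1)[0]
print(outlier_idx)
```

The anomaly score from decision_function is the kind of value that can be passed on as an input feature to a downstream model.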

Supervised Learning — Filtering and Forecasting

We have different goals for each sport when using supervised learning, because the range of data available differs from sport to sport. Our base dataset for each sport is composed of player/team and tournament/league statistics plus bookmaker data (odds time series from different houses). We also use the outputs of the anomaly detection models as features for our models. Each sport has its own data preparation process, consisting of cleaning, labeling and standardizing the data.

For sports like Tennis or Soccer we use binary classification to filter out bad tips: if our models predict a different outcome from the one forecast by our tipsters, the tip is not placed as a bet.

For sports like Volleyball or Handball we try to forecast the outcome of the match. We look for convergence between the tips produced by our tipsters and our models.

We use a mix of different models. Currently we obtain the best F-scores using Random Forests and Support Vector Machines with a radial basis function (RBF) kernel. Generalized linear models and principal component analysis are also used to improve our results through better feature selection.
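The filtering rule described above can be sketched in a few lines. The Tip record, its field names, and the model_picks mapping are hypothetical and stand in for whatever a classifier and a tipster feed would actually produce.

```python
from dataclasses import dataclass


@dataclass
class Tip:
    match_id: str
    tipster_pick: str  # e.g. "home" or "away" (assumed labels)


def filter_tips(tips, model_picks):
    """Keep only the tips whose pick matches the model's predicted outcome."""
    return [t for t in tips if model_picks.get(t.match_id) == t.tipster_pick]


# Hypothetical inputs: two tips, and the model's predicted winner per match.
tips = [Tip("match_1", "home"), Tip("match_2", "away")]
model_picks = {"match_1": "home", "match_2": "home"}

kept = filter_tips(tips, model_picks)
print([t.match_id for t in kept])
```

A tip for a match the model has no prediction for is dropped here as well; whether that is the right default is a design choice.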
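To make the model comparison concrete, here is a minimal scikit-learn sketch that scores a Random Forest and an RBF-kernel SVM by cross-validated F1. The synthetic dataset, the hyperparameters, and the number of PCA components are assumptions for illustration. PCA is used here as a dimensionality-reduction step in front of the SVM, which is only one possible way it could fit into the process described above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a prepared (cleaned, labeled) sports dataset.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVMs are sensitive to feature scale, so standardize before PCA and the SVM.
    "svm_rbf": make_pipeline(
        StandardScaler(),
        PCA(n_components=10),
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    ),
}

# Mean F1 over 5 cross-validation folds for each model.
f1_by_model = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}

for name, f1 in f1_by_model.items():
    print(f"{name}: mean F1 = {f1:.3f}")
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, which avoids leaking information from the validation fold into the score.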

Figure: Support Vector Machines with different kernels.

Betting with the Machine Learning models

The first thought most bettors have when applying machine learning to sports betting is to forecast match results and bet on them. We have found that using our models' outputs for handicap, number of sets/games, or number of goals/points betting gives higher accuracy and returns than betting on the result. This is especially true for football, where betting on the match result is rarely a good idea. For example, if the model forecasts that the favorite will lose, one may conclude that the probability of a closely contested game (a high number of points) is higher.

We have also noticed a seasonal bias in our forecasts. At the beginning of each season/league there are more miscalculated odds (odds are mostly calculated from the previous year's results). In this early period anomaly detection can be very profitable when used to identify these inefficiencies. At later stages there is more information about each team/player, making it harder to spot mispriced odds but easier to forecast a game winner.

In sports where the favorite (lower odds) is more likely to win, such as Handball or Volleyball, we combine the output of different models with tips provided by our tipsters to create a ranking that is used to set the stake. The more certain the bet and the higher the odds, the larger the stake.
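One simple way to turn such a ranking into stakes is sketched below. The scoring rule (confidence multiplied by decimal odds), the proportional split of a fixed budget, and the sample bets are assumptions for illustration and not the exact formula we use.

```python
def allocate_stakes(candidates, budget):
    """Rank bets by confidence * odds and split the budget in proportion to that score.

    candidates: list of (bet_id, confidence in [0, 1], decimal odds).
    Returns a list of (bet_id, stake), highest-ranked bet first.
    """
    scored = [(bet_id, confidence * odds) for bet_id, confidence, odds in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    total = sum(score for _, score in scored)
    if total == 0:
        return []
    return [(bet_id, round(budget * score / total, 2)) for bet_id, score in scored]


# Hypothetical candidates: confidence would come from combining models and tipsters.
candidates = [
    ("handball_1", 0.80, 1.50),
    ("volleyball_1", 0.60, 2.10),
    ("handball_2", 0.55, 1.30),
]

stakes = allocate_stakes(candidates, budget=100.0)
for bet_id, stake in stakes:
    print(bet_id, stake)
```

With this rule a bet that is both likely and well priced rises to the top of the ranking, and a less certain bet at short odds receives the smallest share.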

IG Quant Research

Collective of Engineers, Scientists and Quants writing about their experiments in Quantitative Sports Betting https://www.patreon.com/igquants