DATA STORIES | SPORT ANALYTICS | KNIME ANALYTICS PLATFORM

Choose The Football Game You Want To Bet On Wisely

3 strategies to enhance your chances of success

Hans Samson
Low Code for Data Science


Photo by Travel Nomades on Unsplash.

In one of my previous blog posts, I discussed the possibility of creating a prediction model that forecasts the outcome of a football match (win, loss, draw).

Read here: Can I predict the outcome of a football match (and make money)?

I believed that the predictions were reasonably accurate, with an overall model accuracy of around 52%. The question I posed in the blog post was whether it would be possible to make money if I had bet 1 Euro on each match’s predicted outcome during the 2022–2023 season. The conclusion was that this approach would not yield a profit for me by the end of the season.

Since then I kept thinking about it, and I realized that it might not be wise to bet on every single match. For some matches, the likelihood of a correct prediction was simply not very high. So the question became whether I could make a profit by strategically choosing the matches to bet on. But how do you select matches smartly to increase your chances of an accurate prediction (and thus a better return)? That is what this post is all about. First, though, I will explain how I set up the analysis.

Download my KNIME workflow from the KNIME Community Hub.

Screenshot from the KNIME workflow to gain insights by matching predictions by the model and the bookies.

Expanding my Dataset

I performed my analyses using KNIME. KNIME Analytics Platform is a free, open-source, no-code/low-code data science platform. It offers a combination of standard KNIME nodes by default and the ability to incorporate code (Python/R/Java) when necessary, making it my go-to solution.

The foundation for the analyses is a data file where the model predictions of a football match are linked to the bookmakers’ odds. I shared a sample workflow (including the data) in my personal space on the KNIME Community Hub. You can download the workflow and import it into your own KNIME Analytics Platform (and customize it to your liking). In this section, I will discuss some steps of the workflow (i.e., the steps framed in yellow in the figure above).

Two data files form the basis of the analysis: “online quotes 22–23” and “predictions by the model & actual score.” These involve matches from the Dutch football league (Eredivisie) in the 2022–2023 season. The online quotes are directly read from a website using a CSV Reader node. The CSV Reader has many options, making the imported file usually directly usable. The “Transformation” tab of this node is very useful. In this tab, you can indicate which columns you want to read or not. You can even easily adjust the column name and data type (see figure below).

Screenshot from the Transformation tab to configure the CSV Reader node.

The file with the model predictions and the actual result (score) of the match comes from the validation sets used during the training of my model. The prediction model generates three probabilities: the chance of winning, losing, and ending in a draw. The model’s prediction is the result associated with the highest probability. With the Column Aggregator node in combination with the Rule Engine node, it is straightforward to determine which result the model predicts.

Screenshot of the configurations of the Column Aggregator node. This node determines the highest value (p_model_max) of the three probabilities.

The next Rule Engine node then determines what the model has predicted.

$p_model_max$ = $p_model_home$ => "home"
$p_model_max$ = $p_model_away$ => "away"
$p_model_max$ = $p_model_draw$ => "draw"

In the next step, we need to find out whether the model’s predicted result matches the actual score, and thus whether the prediction is correct. This is also done through a Rule Engine node.

$model_prediction$ = $score$ => "Y"
TRUE => "N"
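For readers who prefer code over nodes, the Column Aggregator step and the two Rule Engine steps above can be approximated in a few lines of Python. This is only a sketch and not part of the workflow: the function names are made up for illustration, and ties are resolved in the order home, away, draw, mirroring the top-to-bottom order of the rules.

```python
def model_prediction(p_home, p_away, p_draw):
    """Return the outcome with the highest probability (ties: home, away, draw)."""
    p_model_max = max(p_home, p_away, p_draw)
    if p_model_max == p_home:
        return "home"
    if p_model_max == p_away:
        return "away"
    return "draw"


def model_predict_ok(prediction, score):
    """Return "Y" when the prediction equals the actual result, else "N"."""
    return "Y" if prediction == score else "N"


# Hypothetical probabilities for a single match.
prediction = model_prediction(0.55, 0.20, 0.25)
print(prediction, model_predict_ok(prediction, "home"))
```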

Now that we know whether the model’s prediction for a match is correct, the bookmakers’ odds can be linked to the matches using the Joiner node. The model’s revenue is then determined with another Rule Engine node.

Screenshot from the Rule Engine node determining model revenue.

In my analysis, I assumed a stake of 1 euro on each match. This means that for a correctly predicted match, I keep the model_revenue minus 1 euro, while an incorrectly predicted match ($model_predict_ok$ = “N”) always costs me 1 euro.
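Expressed in code, the bookkeeping for a single bet could look like the sketch below. It assumes that model_revenue is the gross payout for a correct prediction (stake times the bookmaker's decimal odds); the function name and the example odds are illustrative and not taken from the workflow.

```python
STAKE = 1.0  # euro staked on every match, as in this post


def net_result(model_predict_ok, decimal_odds, stake=STAKE):
    """Net result of one bet: payout minus stake when correct, minus stake otherwise."""
    if model_predict_ok == "Y":
        model_revenue = stake * decimal_odds  # assumption: gross payout at decimal odds
        return model_revenue - stake
    return -stake


print(net_result("Y", 1.80))  # bet won at hypothetical odds of 1.80
print(net_result("N", 1.80))  # bet lost: the stake is gone
```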

I can already reveal that groups of probabilities (bins) are central to the analyses. For creating these bins, KNIME has several standard Binning nodes (Auto-Binner; Numeric Binner). For this use case, I didn’t find these nodes very convenient, so I created the bins with (once again) a Rule Engine node. I divide the probabilities into 20 groups with a bin width of 0.05, such as the interval 0.40–0.45.

$p_model_max$ >= 0.40 AND $p_model_max$ < 0.45 => "0.40 - 0.45"
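Writing 20 such rules by hand is repetitive. As an alternative, the same labels can be computed directly, for instance in a Python Script node. This is not what the workflow does; it is a minimal sketch that assumes half-open intervals (lower bound included, upper bound excluded) and puts a probability of exactly 1.0 in the last bin. The function name is hypothetical.

```python
def probability_bin(p, width=0.05):
    """Return a label such as "0.40 - 0.45" for a probability between 0 and 1."""
    n_bins = round(1 / width)
    # The small epsilon guards against floating-point noise at the bin edges.
    index = min(int(p / width + 1e-9), n_bins - 1)
    lower = index * width
    return f"{lower:.2f} - {lower + width:.2f}"


for p in (0.40, 0.4499, 0.45, 1.0):
    print(p, "->", probability_bin(p))
```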

After binning, all the ingredients are ready to explore how I can wisely choose the matches to bet on. Note that the dataset described in this section (and attached to the workflow in my space on the KNIME Community Hub) is not the complete dataset on which I conducted my analysis. I performed the analysis on matches in the Eredivisie over the past seven seasons, involving a total of 1743 matches.

A Smart Strategy

Of course, I am not the only one seeking an answer to the question: on which matches is the risk of loss the smallest when you want to place a bet? After some thought and online research, I identified the following three strategies that seem worth exploring beforehand. I can increase my returns (make a profit) if I bet only on matches where:

1. the model calculates a high chance of success;

2. the “Expected Value” is greater than 0;

3. the likelihood of a correct prediction is high.

While describing the first two strategies, I will primarily focus on the outcomes. I will illustrate the implementation of the third strategy based on the steps I took in KNIME.

Note. This blog post occasionally refers to amounts for winnings, losses, and bets. The stake for each match is consistently €1, and the amounts for correctly predicted matches already account for the stake being deducted.

Only Betting on Matches with a High Chance of Success

At first glance, it seems like a good idea to place your money on a match where you believe the outcome is quite certain. However, there is a catch.

Let me explain this step by step. The model’s output consists of three probabilities: the probability of winning, the probability of losing, and the probability of a draw. The sum of these probabilities is always 100%. The outcome associated with the highest probability is the model’s prediction. I have grouped this highest probability into the bins described earlier, which I will call percentiles from here on (each percentile covers a probability range of 5 percentage points). For each percentile, I initially examined two metrics: first, the percentage of correctly predicted matches, and second, the average maximum return when the matches are correctly predicted.

Table 1: The higher the maximum probability, the lower the potential profit per game.

In table 1, you can see, for example, that if the model puts the chance of success for the prediction between 0.60 and 0.65, the average maximum profit per match is € 0.53. The “Percentage games predicted ok” column shows that the model performs as expected in this percentile, with an accuracy of 63%. It is also noteworthy that the average maximum profit per match decreases as the calculated chance of success increases (down to € 0.11 when 95% of the matches are correctly predicted). So the higher the probability of a correct prediction, the lower the possible profit.

The above finding is also illustrated in the figure below. The x-axis represents the probability calculated by the model, and the y-axis represents the potential profit (assuming the model has correctly predicted the match). As the imaginary line of maximum profit descends from left to right, the number of red dots (incorrectly predicted matches) also decreases from left to right.

As the chance of success increases, the potential profit decreases.

If we then consider the profit from correctly predicted matches and the lost stake from incorrectly predicted matches, this results in the average profit per match per percentile consistently falling just below or occasionally just above €0 (mostly below), with an overall average of € -0.09 (see table 2).

Table 2: The Average Net Result per game is independent of the maximum probabilities.
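The per-percentile figures in tables 1 and 2 come down to a simple group-by. The sketch below shows the calculation in plain Python on a handful of made-up bets: the records are invented for illustration and are not rows from the dataset, and profit_if_correct is an assumed name for the net winnings of a correct €1 bet.

```python
from collections import defaultdict

# Invented example bets: (bin label, model_predict_ok, profit_if_correct in euro).
bets = [
    ("0.60 - 0.65", "Y", 0.50),
    ("0.60 - 0.65", "Y", 0.56),
    ("0.60 - 0.65", "N", 0.55),
    ("0.90 - 0.95", "Y", 0.10),
    ("0.90 - 0.95", "Y", 0.12),
]


def summarize(bets):
    """Per bin: share predicted correctly and average net result per 1 euro bet."""
    grouped = defaultdict(lambda: {"n": 0, "correct": 0, "net": 0.0})
    for bin_label, ok, profit_if_correct in bets:
        stats = grouped[bin_label]
        stats["n"] += 1
        if ok == "Y":
            stats["correct"] += 1
            stats["net"] += profit_if_correct  # a correct bet earns its profit
        else:
            stats["net"] -= 1.0  # an incorrect bet loses the 1 euro stake
    return {
        label: (s["correct"] / s["n"], s["net"] / s["n"])
        for label, s in grouped.items()
    }


for label, (hit_rate, avg_net) in sorted(summarize(bets).items()):
    print(f"{label}: {hit_rate:.0%} correct, average net result {avg_net:+.2f}")
```

The same arithmetic links the two tables: the average net result equals the share of correct predictions times the average profit of a correct bet, minus the share of incorrect predictions. With the rounded values from table 1 for the 0.60–0.65 percentile, that gives roughly 0.63 × 0.53 − 0.37 ≈ −0.04 euro per match.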

In my introductory blog post, I demonstrated that my football prediction model is slightly better at predicting matches where the home team wins than matches where the away team wins. However, betting only on matches where the model predicts a home win still results in an average loss of € 0.06 per match. For matches where the model predicts an away win, the average loss per match is € 0.15. Unfortunately, there are no percentiles that guarantee a profit.

Therefore, placing bets on matches where the model expects a high probability of a correct prediction or on matches with a relatively high return will not be successful in the long run with this model. While matches with high probabilities are more often predicted correctly, the average profit per correctly predicted match is lower. Additionally, matches with a high predicted probability occur less frequently, reducing the opportunities for success.

Betting on Matches with an Expected Value Greater than 0

There is a considerable amount of information, strategies, and tips available on the internet for successfully betting on sports (football) matches. One such strategy is to use the Expected Value as a criterion for deciding whether or not to bet on a match.

The formula to calculate the Expected Value is as follows:

Expected Value = P_Winning × (Amount Won per Bet) − P_Losing × (Amount Lost per Bet)

Matches that have the potential for profit will yield an outcome greater than 0 with this equation, preferably much greater than 0. But how does this work for the model I have created? Does an Expected Value greater than 0 indeed lead to a successful bet that generates profit?
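To make the formula concrete, here is a small sketch for a €1 bet. It assumes that P_Winning is the model's probability for the predicted outcome, that P_Losing = 1 − P_Winning, and that the amount won is the stake times the bookmaker's decimal odds minus the stake. The function name and the example numbers are illustrative only.

```python
def expected_value(p_winning, decimal_odds, stake=1.0):
    """Expected net result of one bet at decimal odds."""
    amount_won = stake * (decimal_odds - 1)  # net winnings when the bet succeeds
    amount_lost = stake                      # the stake is lost otherwise
    p_losing = 1 - p_winning
    return p_winning * amount_won - p_losing * amount_lost


# The model gives 60%, while odds of 1.80 imply roughly 56% (1 / 1.80).
print(round(expected_value(0.60, 1.80), 2))
# Same odds, but the model gives only 50%.
print(round(expected_value(0.50, 1.80), 2))
```

In this made-up example the first bet has an Expected Value of about +0.08 and the second of about −0.10, so only the first would pass an "Expected Value greater than 0" filter.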

Firstly, I examined how often it occurs that the model predicts a match with an Expected Value greater than 0. It turns out that only about one-third of the matches have an Expected Value greater than 0. Secondly, I created percentiles of Expected Value and examined the return for each percentile.

Table 3: The average net result per game is independent of the Expected Value (EV).

What catches my attention in table 3 is that for matches with an EV > 0, the percentage of correctly predicted matches is significantly higher than for matches with an EV < 0. What is disappointing, however, is that the net result of the bets is (again) mostly negative. The exception is an EV greater than 0.5, where the average return for every euro staked is € 0.02. This is a positive result, but somehow it feels a bit “tricky” to fully embrace.

Similar to the previous section where I looked at percentiles with the expected chance of success, the average return per match also fluctuates around € 0 here. Thus, choosing matches based on Expected Value for a successful bet with this model doesn’t seem to be a good choice.

Betting on Matches with Likely Correct Predictions

There is a third possibility that I explored. My idea was to examine my validation set to see which football matches the model predicted correctly and which it did not. I used this characteristic as the target variable in a Decision Tree model.

A Decision Tree splits the football matches into subsets based on the features that best separate correctly predicted matches from incorrectly predicted ones. This creates groups of matches which, based on the values of the various features, each come with a percentage (probability) indicating how often the matches in that group are predicted correctly. To create the decision tree, I used the KNIME Decision Tree Learner node.

The output of the Decision Tree Learner is a model object. This model object is converted into business rules using the Decision Tree to Ruleset node.

Screenshot of the way the business rules generated by the Decision Tree Learner are applied in the Rule Engine (Dictionary) node.

Each group of matches (a final leaf in the Decision Tree) can be represented by a set of business rules. A business rule may look like this (see the bottom right node in the Decision Tree above):

home_points_cum_avg > 1.2 AND delta_points_cum_avg > 1.3 AND delta_goal_difference > 0.7  => succesrate 0.829

Out of the 2949 matches that meet this rule, 2246 have been predicted correctly. In other words, a football match that meets this rule has an 83% chance of being correctly predicted by the model. In total, the Decision Tree generated over 200 business rules, each with the associated expectation that the prediction will be correct.

These business rules are applied to the features of the predicted match. I used the Rule Engine (Dictionary) node for this purpose. The business rules, along with the probability of a correct prediction, together form the dictionary. The table with the rules is connected to the lower input port of the Rule Engine (Dictionary) node. The rules are then evaluated against the data of the matches predicted by the model (the upper input port). If a rule applies, the probability that the match is correctly predicted is added to that match. The first matching rule in order of definition determines the outcome, so it is important that the set of business rules is sorted (from high to low probability) before it is connected to the Rule Engine (Dictionary) node.
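The first-match behaviour is easy to mimic in code. The sketch below is not the KNIME implementation; it uses Python functions instead of the node's rule syntax. One rule is the example rule shown earlier, while the other rule, its probability, and the match values are invented for illustration.

```python
# Each entry: (condition on a match's features, probability that the prediction is correct).
rules = [
    (lambda m: m["home_points_cum_avg"] > 1.2, 0.55),  # invented rule
    (lambda m: m["home_points_cum_avg"] > 1.2
               and m["delta_points_cum_avg"] > 1.3
               and m["delta_goal_difference"] > 0.7, 0.829),  # example rule from this post
]

# Sort from high to low probability, so the highest probability wins
# when more than one rule applies to the same match.
rules.sort(key=lambda rule: rule[1], reverse=True)


def success_rate(match, rules, default=None):
    """Return the probability attached to the first rule that applies."""
    for condition, rate in rules:
        if condition(match):
            return rate
    return default  # no rule applies


# Hypothetical feature values for one match.
match = {
    "home_points_cum_avg": 1.9,
    "delta_points_cum_avg": 1.5,
    "delta_goal_difference": 1.1,
}
print(success_rate(match, rules))
```

Without the sort, the broader invented rule would be evaluated first and the match would receive 0.55 instead of 0.829, which illustrates why the order of the rules matters.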

Configuring the Rule Engine (Dictionary) node (Condition = RuleSet; succes_rate = the probability that comes with the rule).

This way, the model’s prediction is accompanied by the probability that the prediction will indeed be correct. This probability correlates (correlation coefficient = 0.64) with the probability of the predicted outcome (win, lose, draw) calculated by the model.

My hypothesis is that the higher the probability of the prediction being correct, the higher the return on the bet. However, similar to the previous two strategies, the return on the investment of 1 euro is usually negative, except for matches where the percentage correctly predicted falls between 55% and 64%.

Table 4: A high probability that a prediction is correct does not necessarily lead to a positive return.

But, similar to the outcomes of the analysis on Expected Value (EV), I am still not convinced that this is the group of matches for which I should go all-in.

What’s next?

Many hours of collecting data, creating features, testing algorithms, and optimizing the parameters of a model that predicts the outcome of a football match have taught me a lot about predictive modeling and the KNIME Analytics Platform. However, making a profit with my predictions, that is, beating the bookies, does not seem achievable with the three strategies I explored. Perhaps I overlooked something. Yet it seems more likely that the model is (just) not good enough.

The journey continues, acknowledging the need for model enhancement to achieve consistent success against bookmakers. This post serves as a checkpoint, highlighting the ongoing pursuit of refining predictive models to beat the odds. Maybe I am a bit like Don Quixote, so be it. But it must be possible to beat the bookmakers with a prediction model. If you’re curious about how this turns out, follow me and don’t miss a thing.

Hans Samson
Low Code for Data Science

Hans is a data analyst/data scientist (but what's in a name)