How to make money with machine learning: value betting on predicted UFC fight outcomes
Table of contents
- Introduction
- Planning our approach
- Scraping a dataset
- Training details and network accuracy
- Turning predictions into profit
- Results over three recent events
- Scaling up and conclusions
Introduction
It goes without saying — networks have revolutionised the way we can use computers to perform complex tasks. These include the classic problem of handwritten character recognition, noble endeavours like improving the accuracy of medical diagnoses, and more esoteric and unsavoury tasks like generating fake news or toxic 4chan posts.
If you’re reading this, it’s likely you have tried your hand at training some models yourself. It’s not hard to see the potential your skills have for monetisation. But finding interesting problems to sink your teeth into with a clear/straightforward pathway to profit can be challenging.
Here, I aim to show you an example that lies somewhere in between the noble and nefarious…
I will describe 1) how a basic network can be trained to predict UFC fight outcomes, and 2) how one could theoretically utilise the network's predictions to place bets and return a profit.
The idea of using networks to capitalise on sports betting markets is well established, and has been implemented at scale. E.g. companies like Mercurius Betting Intelligence offer access to their network-driven sports trading bot. Think of this article as a demo for how this idea might work in practice.
Of course, this article only contains my opinion, offered strictly for information purposes, and is not intended as financial advice. Seek a duly licensed professional for actual investment and/or financial advice. The title of this article is just clickbait.
Planning our approach
So where do we start? When I think about training a network to perform a task, I generally begin with these steps (though not always in this order, and the process is rarely linear):
- Define the task.
- Determine the factors/features the output likely depends on.
- Gather data containing this information.
- Choose a network architecture/training strategy that can extract/manipulate these features from the available data to perform the desired task (i.e. approximate the true underlying function mapping our inputs to their corresponding output).
1. Defining the task
Here, the task is to predict the outcome of a UFC fight before it starts (as opposed to ‘live’ betting to keep things simple).
2. Determining the factors/features the output likely depends on
Career statistics encode information about how fighters typically perform and the actions they prefer to take during matches (e.g. how often they land strikes, their ability to absorb strikes, how often they perform takedowns, etc.). I hypothesise that the difference in these fighter traits can be used to determine match outcomes. For example, at first glance it might seem like a fighter who can take a large number of strikes will have an advantage. But this may be nullified if they face someone who typically lands a larger number of strikes on their opponent. Similarly, a fighter with lower reach and height may find it challenging to land strikes, but may be more effective at performing and avoiding takedowns. If their takedown accuracy and rate are high (indicating a natural preference for performing these moves during matches), their lack of reach/height may be less of a disadvantage than initially thought.
Instead of leaving everything up to the network, it may be possible to hand-engineer features from these career stats based on our prior knowledge/assumptions about what factors are important in determining the outcome of a fight. Incorporating inputs with potent predictive power can ease the burden placed on the network to learn insights on its own, ultimately lowering training time or the amount of data required for training. However, I am no MMA expert so I will leave everything up to the network for now. Given enough data, the network may learn insights that are not obvious even to the most seasoned MMA fan.
It’s important to mention that there are factors relevant to predicting fight outcomes that we can’t know or are hard to know (e.g. a fighter’s mood on the day of the fight, or how their personal relationship with the other fighter might affect the outcome). Public sentiment may encode information about the matchup that is not reflected in the stats, though extracting and assessing the quality of this information can be challenging, so we will leave this for now. A more robust feature may be the opinion of the bookmakers. Therefore, we ensure our inputs encode information about which fighter was deemed the favourite, and which was deemed the underdog.
Each input example will be a 14-element vector containing the difference between the fight favourite's and the underdog's height, weight, age, stance, reach, Significant Strikes Landed per Minute (SLpM), Significant Striking Accuracy (Str. Acc.), Significant Strikes Absorbed per Minute (SApM), Significant Strike Defence (Str. Def.), Average Takedowns Landed per 15 Minutes (TD Avg.), Takedown Accuracy (TD Acc.), Takedown Defence (TD Def.), Average Submissions Attempted per 15 Minutes (Sub. Avg.), and win ratio.
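To make the input construction concrete, here is a minimal sketch (the feature names and dict format are illustrative assumptions, not the names used by the actual pipeline):

```python
# Hypothetical feature names/order for the 14-element input vector
FEATURES = [
    "height", "weight", "age", "stance", "reach",
    "slpm", "str_acc", "sapm", "str_def",
    "td_avg", "td_acc", "td_def", "sub_avg", "win_ratio",
]

def input_vector(favourite, underdog):
    """Difference between the favourite's and the underdog's stats,
    given as dicts keyed by the names above."""
    return [favourite[k] - underdog[k] for k in FEATURES]
```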
3. Acquiring a dataset
The most straightforward way to acquire our dataset is to pay for it. Various vendors offer APIs that allow access to historical fight odds, fighter stats, and match data. However this can be costly (e.g. ~£600/yr, too much for a lowly postdoc like myself…). Instead, it’s possible to scrape a dataset off of the internet for free (though, perhaps one that is lower quality). UFCstats.com has all the current stats of each fighter, and past match outcomes. betmma.tips has historical odds data. This is what we will use for now. The limitations of using this dataset are discussed a bit later on.
4. Choosing a training strategy
Our network is tasked with learning a mapping between the difference in career stats and the likelihood that either fighter wins the fight. Given what we discussed above, we want to ensure our model will utilise the context of all the inputs when predicting fight outcomes (we expect certain stats to be more relevant to determining the outcome if other stats are high/low). Deep, fully connected feed-forward networks are a suitable option. A fully connected architecture ensures the outputs of each node in the first layer depend on all inputs. In turn, the outputs of subsequent layers depend on all of the outputs produced by the nodes in their preceding layer, ensuring a given node is always provided with information about all other inputs. Having several layers may increase the network's ability to efficiently capture more complex/abstract insights from the data by manipulating the more 'basic' information inferred by earlier layers. Furthermore, networks composed of several layers are often more effective/efficient to train (which is why they are used despite the fact that networks with one hidden layer share the same expressive power as deep networks). Though, adding too many layers can lead to model degradation, so it's important to experiment with several architectures.
To reduce the variance of predictions and sensitivity to weight initialisation, we will average the results of 5 identical network architectures with the following structure (we also add dropout to reduce overfitting):
#Define graph
from tensorflow import keras
from tensorflow.keras import layers

def build_branch(name):
    # Each branch gets its own layers, and therefore its own weights,
    # so that averaging the five softmax outputs actually reduces variance.
    branch_in = keras.Input(shape=(14,))
    x = branch_in
    for _ in range(6):
        x = layers.Dense(20, activation="relu")(x)
        x = layers.Dropout(0.05)(x)
    branch_out = layers.Dense(2, activation="softmax")(x)
    return keras.Model(inputs=branch_in, outputs=branch_out, name=name)

inputs = keras.Input(shape=(14,))
branch_outputs = [
    build_branch(f"fight_predictor{i}")(inputs) for i in range(1, 6)
]
final_output = layers.average(branch_outputs)
model = keras.Model(inputs=inputs, outputs=final_output, name="fight_predictor")
model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
)
To summarise, the network will learn a mapping f from the difference in fighter stats xᵢ ∈ X to a two element vector describing the outcome of the fight yᵢ ∈ Y, f: X→Y . The use of a softmax activation function on the final layer ensures the output’s confidence is easily interpretable as a probability—it will normalise the output to a probability distribution over the predicted output classes.
Scraping a dataset
I found this to be tedious work — though probably because I was quite new to it.
We want to train a network to predict match outcomes based on the stats of each fighter leading up to a match. Therefore, our dataset should ideally consist of the difference in the stats of each fighter on the day of their fight alongside the match outcome. However, I found it hard to track down a website containing the historical career stats of each fighter. So instead, we have to settle for the stats posted on UFCstats.com which reflect the current state of each fighter.
This means our dataset is not very high quality. The difference between the current stats of two fighters is unlikely to be the same as it was in the past (e.g. perhaps one fighter has improved their takedown accuracy over time). If we use this data, the network will learn to associate differences in current fighter stats with outcomes from the past. As a consequence, the importance it places on particular stats when predicting outcomes may be skewed. For example, there may now be a large discrepancy in one statistic between two fighters who fought in the past, even though this was not the case at the time of their match. The network may incorrectly treat this difference as highly relevant to the fight outcome, when it would not have been at the time of the fight. Ultimately, the accuracy of our model will suffer because of this.
If on average a fighter’s stats only change subtly over the course of their careers, then this may not be a big problem. Though, I am not certain as to whether this is the case. As a quick fix (if necessary), the quality of our dataset may be improved by only considering more recent fights, where the current stats more accurately reflect the properties of the fighters.
In any case, we aim to acquire the following information from these websites (UFCstats.com and betmma.tips):
- The date of each match.
- The names of the fighters involved.
- The career stats of each fighter.
- The bookmaker’s odds for each fight.
- The match outcome.
The Beautiful Soup Python library can be used to gather this data (there are also browser extensions that streamline this process if you'd like to give them a try). The library makes it straightforward to extract a website's HTML code and parse relevant information from it (e.g. the data contained in a specific table). There are several tutorials available online, and Stack Overflow is your friend. Bear in mind that some websites block IPs exhibiting bot-like behaviour (e.g. accessing hundreds of pages in a minute). I found that adding random breaks of a few seconds between visits to each site was sufficient to avoid a ban.
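As a minimal sketch of the parsing step (the HTML snippet and the 'fighter-details' URL pattern below are assumptions for illustration; inspect the live pages and adjust the filter accordingly):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def extract_links(html, pattern):
    """Collect all hyperlinks on a page whose URL contains `pattern`."""
    soup = BeautifulSoup(html, "html.parser")
    return sorted({a["href"] for a in soup.find_all("a", href=True)
                   if pattern in a["href"]})

# Tiny stand-in for a scraped index page
page = """
<table>
  <tr><td><a href="http://ufcstats.com/fighter-details/abc123">Fighter A</a></td></tr>
  <tr><td><a href="http://ufcstats.com/fighter-details/def456">Fighter B</a></td></tr>
  <tr><td><a href="http://ufcstats.com/statistics/events">Events</a></td></tr>
</table>
"""
links = extract_links(page, "fighter-details")
# only the two fighter-page links survive the filter
```

In a live scraper, the `page` string would come from an HTTP request, with a random `time.sleep` between requests as described above.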
A summary of the scraping workflow is given in the Appendix.
Data cleaning
The data we have scraped is not perfect for other reasons. UFCstats.com does not have complete information about all fighters, and betmma.tips does not have odds data for some fights. This means there are missing inputs for some examples in our dataset. One way to deal with this problem is to substitute an empty value with the mean value of that input across the dataset. I took a more conservative approach and removed all examples with missing input values. As a result, our dataset skews towards more seasoned fighters with complete profiles on UFCstats.com. Our model may therefore be less accurate when predicting match outcomes featuring new fighters, though this bias shouldn't be too big of a problem, as we will only be able to make predictions on future fights involving fighters who also have complete profiles.
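The removal step might look like this with NumPy (a sketch; it assumes missing values were stored as NaN during scraping):

```python
import numpy as np

def drop_incomplete(examples):
    """Discard any example with at least one missing (NaN) input."""
    examples = np.asarray(examples, dtype=float)
    complete = ~np.isnan(examples).any(axis=1)
    return examples[complete]

data = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
clean = drop_incomplete(data)  # the middle row is dropped
```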
I also removed all female fights from the dataset. By restricting the training set to male fights, we allow the network to learn a more optimised model as opposed to one that generalises well to both male and female fight prediction (differences in fighting styles between sexes could mean the relation between inputs and fight outcomes is significantly different). I chose to remove female match data as there were more male fights in the dataset — it is preferable to have as large a dataset as possible for training our network. In some cases, the names of certain fighters were written differently on either site (e.g. Phil vs. Phillip). Matches affected by this problem were simply not included in the dataset, but in principle, it should be straightforward to work around this issue.
After following these steps, I was left with 2169 examples.
Training details and network accuracy
Our model was trained for 30 epochs, with 1500 examples, and a batch size of 10. A validation set of 300 examples was used to assess overfitting, and a test set of 360 examples was used to assess model accuracy. The test set was composed of the 360 most recent fights in the dataset.
Overall, our network is 72% accurate — a bit better than the 67% achieved by the bookmakers. This on its own is not too impressive. But as will be discussed a bit later, overall accuracy is not necessarily that important — profitability comes from finding the specific situations where we know our model will predict fight outcomes much more accurately than the bookmakers…
If we only look at instances where our network predicts that the fight favourite will be the winner, the accuracy is higher, reaching 77%.
When the network a) predicts that the favourite will win and b) is confident in its prediction (>80% confidence), then the accuracy improves significantly, reaching 89%! If we add a further restriction, and only look at fights where the bookmaker odds for the favourite were 1.4+ (the reason for this will be made clear in the next section), then this accuracy decreases to 87%. These are fights the bookmakers were less certain about, so we would expect a drop in accuracy (i.e. they are harder to predict). It’s important to note that only 18% of fights in the test set fall within this subset (65/360). This is not exactly representative of all the matchups we may encounter ‘in the wild’, so it’s best to take this estimate with a grain of salt.
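The kind of slicing described above is easy to compute once you have the model's outputs. A sketch (the array layout is an assumption; here column 0 holds the predicted probability that the favourite wins):

```python
import numpy as np

def favourite_subset_accuracy(probs, favourite_won, odds,
                              conf_thresh=0.8, min_odds=1.4):
    """Accuracy on fights where the model backs the favourite with
    >= conf_thresh confidence and the favourite's odds are >= min_odds."""
    p_fav = np.asarray(probs)[:, 0]   # column 0: P(favourite wins)
    mask = (p_fav >= conf_thresh) & (np.asarray(odds) >= min_odds)
    if not mask.any():
        return None, 0                # no qualifying fights
    hits = np.asarray(favourite_won)[mask]
    return hits.mean(), int(mask.sum())
```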
Turning predictions into profit
So we have trained a model that is decent at predicting fight outcomes. How can we use this to our advantage? The Kelly criterion can be used to work out the percentage of your bankroll, f*, to bet to maximise profits given the likelihood your prediction is correct, p, the probability that your prediction is incorrect, q = 1 − p, and the return if the prediction is correct, b = (decimal odds − 1): f* = (bp − q)/b.
If we were to only place bets on the subset of fights where our network has predicted that the favourite will win with high confidence and when the odds are 1.4 or higher, then we would expect our predictions to be correct around 87% of the time. According to the Kelly criterion, this means we should bet 54% of our bankroll for each fight that meets these criteria (we assume all fights have odds of exactly 1.4). However, because our accuracy was derived from the results of only 65 fights, we can’t be especially confident about this figure. In practice it would be best to use a lower percentage.
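A quick check of that 54% figure, using the standard Kelly formula f* = (bp − q)/b with the values discussed above:

```python
def kelly_fraction(p_win, decimal_odds):
    """Kelly criterion: optimal fraction of bankroll to stake."""
    b = decimal_odds - 1          # net return per unit staked
    q = 1 - p_win
    return (b * p_win - q) / b

stake = kelly_fraction(0.87, 1.4)   # ~0.545, i.e. ~54% of the bankroll
```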
You can think of our strategy as the sports betting equivalent of value investing. A blog post by Mercurius provides a short explanation of this:
“For instance, the bookmaker has offered odds of 2.50 on Arsenal to beat Manchester United. The implied probability of this event occurring is 100/2.50 = 40. In other words, the bookmaker says Arsenal have a 40% chance of winning the match. If [your model says] the chances of Arsenal winning are 45%, you have an ‘edge’ because the ‘real’ odds should be 100/45 = 2.22.
If you consistently bet at odds of 2.50 when the real odds should be 2.22, you will make a profit in the long-term.”
E.g. If we only bet on fights with odds of 1.4 or higher, according to the Kelly criterion, we need to be right more than 71.42% of the time to be profitable (i.e. to wager a non-zero fraction of our bankroll). The bookmakers set these odds as their models suggest the likelihood their predictions are correct is less than 71.42%. I.e. it is their belief that if someone were to repeatedly bet on all fights with 1.4 odds, they would win less than 71.42% of the time and lose money in the long run. This is neatly summarised by calculating the expected value of the bet. This is a measure of what a bettor can expect to win or lose per bet placed on the same odds over time:
For a £10 bet placed at 1.4 odds with a 71.42% probability of a correct outcome, the EV is approximately zero: EV = (4 × 0.7142) − (10 × (1 − 0.7142)) = −0.0012
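The same calculation as a function, making it easy to explore other odds/probability combinations:

```python
def expected_value(stake, decimal_odds, p_win):
    """Expected profit per bet placed repeatedly at the same odds."""
    win_profit = stake * (decimal_odds - 1)
    return win_profit * p_win - stake * (1 - p_win)

ev = expected_value(10, 1.4, 0.7142)   # ~ -0.0012: essentially break-even
```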
However, in some cases, the bookmakers’ predictions might not be quite right, and their odds will not reflect the true probability. Different models are based on different assumptions, and in some cases one model may perform better than another. If we can isolate fights where this is the case (i.e. find fights with odds 1.4+ and where our expected accuracy will be more than 71.42%), we can exploit the fact their odds are set too high and return a profit by repeatedly betting on them.
It’s quite straightforward to find these fights with our model. For all fights where the odds for the favourite are 1.4+, we know that if our model predicts the favourite with high confidence (>80% confidence) its accuracy will be something around 87%, far exceeding this 71.42% accuracy threshold.
For each event we follow these steps:
- Predict outcomes for all fights where the favourite has odds of 1.4+
- If our model predicts the favourite will win with 80%+ confidence, we place a bet, starting with the first upcoming fight that satisfies our criteria.
- For this bet, we wager 5% of our bankroll (a stake based on a very conservative estimate of our model's accuracy). Once the fight is completed and we receive our returns (or not), we place a bet on the next fight that satisfies our criteria (if there are any).
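Put together, the staking procedure above might be sketched as follows (the fight-tuple format is an assumption for illustration):

```python
def run_event(fights, bankroll, stake_frac=0.05,
              conf_thresh=0.8, min_odds=1.4):
    """Bet sequentially on qualifying fights, updating the bankroll each time.
    fights: list of (p_favourite, favourite_odds, favourite_won) tuples."""
    for p_fav, odds, fav_won in fights:
        if odds < min_odds or p_fav < conf_thresh:
            continue                  # fight doesn't meet our criteria
        stake = stake_frac * bankroll
        bankroll += stake * (odds - 1) if fav_won else -stake
    return bankroll
```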
It is important to note that we are only profitable if a) our model is indeed more accurate than 71.42% on this subset of fights, and b) we are staking a percentage of our bankroll in line with our model's true accuracy (according to the Kelly criterion), or a conservative estimate of it. Therefore, it is vital that we monitor the performance of our model over time to ensure we remain profitable.
Results over three recent events
Given that so few fights meet our selection criteria, we may only be able to make one or two bets per event. Backtesting our approach on a large set of examples will take a while. Here, I have provided all the predictions made for three recent events (none of these fights were in the training, validation, or test sets). Out of all of these fights, only two met our criteria for placing bets. These are highlighted in green.
Overall, the accuracy seems in line with that achieved on the test set. Overall accuracy is 76%. Predictions on fight favourites are 79% accurate. High confidence (>80%) favourite predictions are 100% accurate. It’s important to bear in mind that this is only a small set of fights, and therefore these accuracy values are not necessarily representative of what we might expect in the long run. Nonetheless, this seems promising!
Scaling up and conclusions
How far can we go with this? Could we start some sort of sports-betting hedge fund and play around with a large bankroll? If you place bets with the major bookmakers and consistently return a profit, you will eventually get blacklisted or heavily restricted. Alternatively, you could use the Betfair exchange. Here, other users act as the bookmaker and set their own odds. The goal is to take money from people just as eager to take your own! You won't be kicked off if you consistently win, but the exchange will take a cut of your profits. The amount you can bet on a fight depends on the amount other users have staked on the opposite outcome. This places a hard limit on how scalable this kind of approach can be. For example, for UFC Fight Night Kattar vs. Emmett, one could wager up to £1027 on Kattar at odds of 1.45 if placing a bet at 10:55 17/06/22 (a couple of days before the fight).
This amount fluctuates leading up to the fight as more people place their bets.
Another thing to consider is stability. Calculating our stake based on the percentage given by the Kelly criterion will return maximum profits in the long run, but the road to those profits may be volatile, as we would be staking a significant portion of our bankroll on each fight. Users may prefer more stable behaviour, so it might be better to stake a much more conservative percentage of our bankroll on each fight.
It would also be a good idea to use a higher quality dataset to train our model. This would likely lead to improved accuracy, and allow people to have more confidence in our work.
With all this said, I’m not thinking too seriously about commercialising this algorithm (at the moment at least…). I’m quite pleased with it as a side project for now : ) ! However, if you’d be interested in working with me or have any suggestions, feel free to DM me on Twitter @ ciaranbench.
If you’re interested in seeing more, I post my network’s predictions on Twitter: @ third_ai_ .
Also, here is my website for more about me: https://ciaranbench.github.io
Thanks for reading!
Appendix: Scraping workflow
Acquiring fighter stats (UFCstats.com)
- Visit the page containing all fighters whose surname starts with A. http://ufcstats.com/statistics/fighters?char=a&page=all
- Scrape all fighter page hyperlinks from this page.
- Visit each fighter page, and scrape their stats.
- Save fighter name, and stats in an array that will contain all fighter data.
- Repeat steps 1–4 for fighters whose surnames start with B, C, D, etc. until all fighter data has been scraped.
Acquiring match data (UFCstats.com)
- Visit page containing links to all previous UFC matches. http://ufcstats.com/statistics/events/completed?page=all
- Scrape all match hyperlinks.
- Visit the page for each match. Scrape the date of the event, and the fighters involved with each match. The first fighter listed for a match is always the winner.
- Store this data in an array containing all match data.
- Stance is stored as a string (can’t be processed by our network). I used a dictionary to map stances to integer values.
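The stance mapping might look like this (a hypothetical dictionary; extend it with whatever stance strings actually appear in the scraped data):

```python
STANCE_MAP = {"Orthodox": 0, "Southpaw": 1, "Switch": 2}

def encode_stance(stance, default=-1):
    """Map a stance string to an integer the network can consume."""
    return STANCE_MAP.get(stance.strip(), default)
```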
Acquiring historical odds data (betmma.tips)
- Visit the site containing links to all past matches https://www.betmma.tips/mma_betting_favorites_vs_underdogs.php
- Scrape hyperlinks for each match.
- Visit the page for each match, and scrape the event date, fighter names, and their odds for the event.
- Save this information in an array.
Organising our dataset
Now we have all the information we need to construct our dataset.
- Refer to the array containing all match data to get the names of the fighters and the winner of each match.
- Refer to the array containing all fighter stats, and acquire their stats.
- Then use the event date and fighter names for each match to find the odds for each fighter from the array containing the odds data.
- Use the odds to determine the fight favourite.
- Subtract the stats of the underdog from the stats of the fight favourite. Store this in an array that will contain all examples.
- If the favourite was the winner for this match, save the ground truth in a separate array as [1, 0]. Otherwise, store it as [0,1].
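The last two steps can be sketched as follows (a minimal illustration; the array handling in the real pipeline may differ):

```python
import numpy as np

def make_example(fav_stats, dog_stats, favourite_won):
    """One training example: stat differences plus a one-hot outcome label."""
    x = np.asarray(fav_stats, dtype=float) - np.asarray(dog_stats, dtype=float)
    y = np.array([1, 0]) if favourite_won else np.array([0, 1])
    return x, y
```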