$4,718 — Using Machine Learning to Bet on the NHL

Alex Ovechkin celebrates a goal with Semin and Laich

They say it’s the hardest trophy to win in sports. 82 gruelling games of the regular season. 16 wins in the playoffs. You need your Sidney Crosbys, your Duncan Keiths, and Jonathan Quicks. You need the right management, depth, grit, and even luck. Does your team have what it takes to win the Stanley Cup this year? As a Canucks fan, no. Instead, I’ll be looking forward to having an edge in my fantasy pool with my brand new machine learning model.

Yes, this is a 15 minute read, but it also works à la carte. 
Skip to results
Skip to betting
Skip to recap

Let’s get started

In this article, I will be walking through everything I learned from this project as well as the steps I took to get there. I won’t be going through my project’s code here, but if you’d like to follow along, you can do so as my entire project is on GitHub.

The simplest sports bet one can make is guessing which team will win. Each team playing in an NHL game can have one of two results: win or loss. (An overtime loss still counts as a loss.) This value becomes the target variable, or the output. All the other variables we think contribute to the result of a match are called features, or the input. This is a perfect example of a supervised machine learning problem, specifically classification. We feed labelled data into the machine, which identifies and learns patterns and properties in this dataset. It then uses this knowledge to predict the target for a completely new dataset.

Consider this simple example:

A very simple diagram of how classification works.

Here we give the machine a list of win rates and their corresponding results as the labelled data. The machine does its calculations and a model is created. We can then input a feature, in this case the win rate, and the model will output the predicted target. The performance of a machine learning model depends on how accurately it can predict the targets of unseen features. Notice how the model has never seen an example of a 12% win rate before.
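To make the diagram concrete, here is a minimal sketch of a one-feature classifier in Python. The win rates and labels are made up for illustration, and the "model" is just a learned cutoff (a decision stump), not one of the algorithms used later in this project.

```python
# Hypothetical labelled data: (win rate, result). Not taken from the diagram.
training_data = [
    (0.80, "W"), (0.65, "W"), (0.55, "W"),
    (0.45, "L"), (0.30, "L"), (0.20, "L"),
]


def fit_threshold(data):
    """Learn the win-rate cutoff that classifies the training data best."""
    rates = sorted(rate for rate, _ in data)
    # Candidate cutoffs sit halfway between neighbouring win rates.
    candidates = [(a + b) / 2 for a, b in zip(rates, rates[1:])]

    def correct(cutoff):
        return sum((rate >= cutoff) == (label == "W") for rate, label in data)

    return max(candidates, key=correct)


def predict(cutoff, win_rate):
    """Predict W when the win rate is at or above the learned cutoff."""
    return "W" if win_rate >= cutoff else "L"


cutoff = fit_threshold(training_data)
print(cutoff)                 # the learned decision boundary
print(predict(cutoff, 0.12))  # a win rate the model never saw during training
```

The point is the same as in the diagram: the model is fitted on labelled examples and then asked about a win rate it has never seen.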

Features

In practice, we’ll be needing a lot more features than the win rate. In my case, I chose 16 features which were a variety of basic and advanced hockey team statistics. Basic hockey statistics could be ones you can find on the back of a hockey card like save percentage and number of penalties. Some examples of the advanced statistics I used were based on Corsi, Fenwick, and PDO. Since the main goal of our model is to predict NHL games, we want to make sure that all the features we use are available before each game has started.

For those who aren’t familiar, Corsi measures the difference between total shots at the net for and against (including shots on goal, missed shots, and blocked shots). Fenwick is a variation on Corsi that excludes blocked shots. PDO is the sum of a team’s shooting percentage and save percentage, and is meant to capture the presence of luck. There are many debates on the reliability of Corsi and Fenwick as indicators of individual performance, but since we’re talking from the perspective of teams, we’ll keep them in.
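As a rough sketch of those three definitions, the snippet below computes them from raw shot counts. The `ShotCounts` class, the function names, and all the numbers are hypothetical; real implementations often restrict these stats to even-strength play, which is ignored here.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Shot attempts taken by one side. All numbers used below are hypothetical."""
    on_goal: int   # shots on goal, including goals
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked by the other side
    goals: int


def corsi(team, opponent):
    """Shot attempts for minus shot attempts against, counting blocked shots."""
    attempts_for = team.on_goal + team.missed + team.blocked
    attempts_against = opponent.on_goal + opponent.missed + opponent.blocked
    return attempts_for - attempts_against


def fenwick(team, opponent):
    """Same as Corsi, but blocked shots are left out."""
    return (team.on_goal + team.missed) - (opponent.on_goal + opponent.missed)


def pdo(team, opponent):
    """Shooting percentage plus save percentage, both as fractions."""
    shooting_pct = team.goals / team.on_goal
    save_pct = 1 - opponent.goals / opponent.on_goal
    return shooting_pct + save_pct


home = ShotCounts(on_goal=30, missed=12, blocked=10, goals=3)
away = ShotCounts(on_goal=25, missed=10, blocked=15, goals=2)
print(corsi(home, away), fenwick(home, away), round(pdo(home, away), 3))
```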

Of course, there are many other and probably better approaches to obtaining features. Some might try to predict the outcome of games based on aggregated individual performances or even Twitter sentiment.

Getting the Data, Processing the Data

The information that was required for this approach was easily accessible. All I needed to do was build a web scraper in Python and call the NHL API. I chose to collect data over ten seasons from 2007–2018, except for the lockout season in 2012–2013 (regular season only, no playoffs). NHL.com actually offers stats all the way back to 1917, but of course the game was very different back then, so I stuck with more recent years. At this point in time, I had the season-to-date stats for each NHL game for all 10 seasons. Before training the model, I used pandas (a Python data manipulation library) to do some data processing:

  1. I brought all the features relative to the home team. If the Home Team had a Points Percentage of 0.8 and the Away Team had Points Percentage of 0.6, I would use Difference in Points Percentage: 0.2 in the final dataset.
  2. I removed all games before November 1 (roughly the first 12 games of the season) because of the small sample size available. Think about how you would predict the result of the first game of the season based on team stats. It would be inaccurate to use data from the previous season because of all the roster and management changes over the summer.
  3. I standardized all the features. This is common practice, especially when dealing with features in different units. It rescales each feature to have a mean of 0 and a standard deviation of 1.
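The three steps above can be sketched in plain Python as follows. This is only an illustration under assumed field names (`home_pts_pct`, `away_pts_pct`) and made-up rows; the project itself used pandas, and its actual column names are not shown in this article.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical season-to-date rows; the field names are illustrative only.
games = [
    {"date": date(2017, 10, 20), "home_pts_pct": 0.70, "away_pts_pct": 0.50},
    {"date": date(2017, 11, 5), "home_pts_pct": 0.80, "away_pts_pct": 0.60},
    {"date": date(2017, 12, 1), "home_pts_pct": 0.40, "away_pts_pct": 0.60},
    {"date": date(2018, 1, 15), "home_pts_pct": 0.55, "away_pts_pct": 0.45},
]

# Step 2: drop games played before November 1 of the season.
cutoff = date(2017, 11, 1)
kept = [g for g in games if g["date"] >= cutoff]

# Step 1: express the feature relative to the home team.
diffs = [g["home_pts_pct"] - g["away_pts_pct"] for g in kept]

# Step 3: standardize to a mean of 0 and a standard deviation of 1.
mu, sigma = mean(diffs), pstdev(diffs)
standardized = [(d - mu) / sigma for d in diffs]

print([round(d, 2) for d in diffs])
print([round(z, 2) for z in standardized])
```

With several seasons in one dataset, the November 1 cutoff would have to be applied per season rather than as a single date.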

Machine Learning

Now that we have our data ready to go, we can start doing machine learning. First, we split our dataset into two: the training set (80%) and the testing set (20%). As mentioned before, the training set is the data the model learns the properties of, and the testing set is the unseen data we use to evaluate the performance of the model.
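A minimal sketch of an 80/20 split is shown below. It assumes a random shuffle with a fixed seed; whether the project split the games randomly or chronologically is not stated here, so treat the function and its parameters as illustrative.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out the last test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    split = len(shuffled) - n_test
    return shuffled[:split], shuffled[split:]


rows = list(range(100))  # stand-ins for 100 labelled games
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model only ever sees `train` while fitting, so its score on `test` is an estimate of how it would do on games it has not seen.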

Classification Algorithms

Now, there are many different algorithms used for classification problems. In this case, I trained a model with each of the following algorithms to see which one would perform the best:

Logistic Regression: 65.47% 
Support Vector Classification (SVC): 67.22% 
XGBoost: 65.69%

The percentage value beside each classification algorithm is the F1 score. The F1 score is a better measure than accuracy alone because it balances both precision and recall. In fact, accuracy on its own can be quite misleading. Consider the following example (or skip ahead if this isn’t your thing!). This is a confusion matrix. It tallies up all the possible cases of a binary classification problem.

Confusion Matrix Example I

In this case, the accuracy would be the number of games guessed correctly divided by the number of all games, (TP + TN) / (TP + TN + FP + FN), which would be 66%.

What if we subbed out our current classifier and used one that just predicted L all the time?

Confusion Matrix Example II

Now our accuracy would be 73%! What’s going on here? This is called the accuracy paradox. It’s a great way of illustrating how accuracy alone can be misleading, because it doesn’t consider precision (out of all the times W is predicted, how often is it an actual W) and recall (out of all the times it’s an actual W, how often is W predicted).
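The paradox is easy to reproduce in a few lines. The counts below are hypothetical: they are chosen so the two accuracies come out to the 66% and 73% quoted above, and they are not the counts from the confusion matrix figures.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1, treating W as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts over 100 games, 27 of which are actual wins.
classifier = metrics(tp=17, fp=24, fn=10, tn=49)
always_l = metrics(tp=0, fp=0, fn=27, tn=73)  # never predicts W

print([round(m, 2) for m in classifier])
print([round(m, 2) for m in always_l])
```

The always-L classifier has the higher accuracy, but because it never predicts a W its recall, and therefore its F1 score, drops to zero.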

Now What?

So SVC yielded the best results with an F1 Score of 67.22%. Now what? Well, 67% is significantly better than guessing, if the person guessing was not a hockey fan and was doing it completely at random. But is it even better than the guesses of an actual hockey fan? Or just someone who had access to the season standings? For this, we turn to the people who’ve been trying to guess which team would win since the very first NHL game: bookies!

Robert De Niro as Sam “Ace” Rothstein, a brilliant bookie (Hulton Archive/Getty Images)

Vegas

Bookies, or bookmakers, are people who accept and pay out bets. You might remember De Niro from Casino, where he played a bookmaker in Vegas. Most of this happens online now, and you might have heard of sites like Bet365 and Pinnacle. We can look at the odds that were offered historically to see which teams Vegas thought were the favourites and which were the underdogs, and then compare our results with theirs.

For the purpose of this article, we’ll be using the prediction results of only the latest season (November 2017 to April 2018), in which the model had an F1 Score of 69.94% and an accuracy of 59.30% (644 / 1086 games predicted correctly). Similarly, Pinnacle achieved an F1 Score of 69.97% and an accuracy of 60.95% (662 / 1086 games predicted correctly). It looks like our model was pretty good at guessing compared to Pinnacle. In fact, we were only short by 18 games! This doesn’t mean we always picked the same winners as Vegas, though; that only happened 84.84% of the time.

I was surprised when I learned that Vegas couldn’t predict better than 61%, considering they probably used way more data than I did. So I decided to dig a little deeper. I combined all the odds offered by Vegas from 2007–2018 with the actual results and found that the favourites won only 58.41% of the time. And if Vegas is the benchmark for which team is the favourite and which is the underdog, that means 41.59% of all games in the NHL are upset victories!

There’s actually a Vox video about how unpredictable hockey is compared to other team sports. The video details the research done by Michael Mauboussin. He attributes the heavy presence of luck in hockey to the combination of the small sample size of matches, the small number of scoring chances within a game, and the even distribution of ice time between all players. The example used in the video is that Sidney Crosby only plays 20 minutes a game, compared to LeBron James’ 37.

Michael Mauboussin’s skill-luck diagram (Vox)

Betting

How much would we have made if we bet on every single game last season given what we know? We can find out from the historical odds Pinnacle offers, but first we must understand how the Money Line bet works. It’s one of the simplest bets there is, and the goal is to predict which team will win. It’s usually shown like this:

San Jose Sharks 🦈 (-160) vs. Los Angeles Kings 👑 (+140)

The minus sign indicates which team is the favourite. In order to win $100 betting on the Sharks, you need to bet $160. On the other hand, betting $100 on the Kings would yield $140. If you manually calculate the probability of each result implied by the odds, you’ll notice the two don’t add up to 1. That’s where Vegas takes its cut. 💸 Simple enough?
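Here is a small sketch of that arithmetic for the Sharks and Kings odds above. The function names are my own for illustration; the break-even probability is the win rate at which a bet at those odds neither makes nor loses money.

```python
def profit_per_dollar(moneyline):
    """Profit earned on a winning $1 bet at the given American odds."""
    if moneyline < 0:
        return 100 / -moneyline
    return moneyline / 100


def implied_probability(moneyline):
    """Win probability at which a bet at these odds breaks even."""
    return 1 / (1 + profit_per_dollar(moneyline))


sharks, kings = -160, 140
print(100 * profit_per_dollar(sharks))  # profit on a winning $100 Sharks bet
print(100 * profit_per_dollar(kings))   # profit on a winning $100 Kings bet

total = implied_probability(sharks) + implied_probability(kings)
print(round(total, 4))  # exceeds 1; the excess is the bookmaker's margin
```

The two implied probabilities sum to roughly 1.03 rather than 1, and that extra 3% or so is the cut built into the odds.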

We’ll assume we’re betting $100 per game, every game. How would we do? We’d make $1,609.04! Not bad, right? Unfortunately it’s on an investment of $108,600 (1086 games). That’s only an ROI (return on investment) of 1.48%. We’d do slightly better betting on the favourites every night, yielding $2,420.40, an ROI of 2.22%.

Can we make money with this? According to last season, we certainly can. But not a lot. I looked into this further and started to think about how much money I was putting in per bet. Using the previous odds as an example, if I had bet on the Sharks correctly, I’d make $62.50. However if I had bet on the Kings correctly, I’d make $140. Since our model predicts the favourites 84.18% of the time, and 62% of that time it’s correct, most of our correct predictions should be from guessing the favourite team. So it didn’t seem to make sense that we’d only be making so little from each bet.

Instead, I changed the betting strategy to bet whatever it takes to make $100 if correct. For example, I would bet $160 on the Sharks or $71.43 on the Kings. This way, I’d bet more heavily on the favourites and less on the underdogs. The result was an increase in profit from $1,609.04 to $2,891.30 and in ROI from 1.48% to 1.81%.
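The two staking strategies can be compared with a short sketch like the one below. The four bets are hypothetical and only show the mechanics; they do not reproduce the season totals quoted above, and the helper names are mine.

```python
def profit_per_dollar(moneyline):
    """Profit earned on a winning $1 bet at the given American odds."""
    return 100 / -moneyline if moneyline < 0 else moneyline / 100


def stake_to_win(moneyline, target=100.0):
    """Stake needed so that a winning bet returns `target` in profit."""
    return target / profit_per_dollar(moneyline)


def roi(bets, stake_fn):
    """bets: list of (moneyline, won). Returns total profit / total staked."""
    staked = profit = 0.0
    for moneyline, won in bets:
        stake = stake_fn(moneyline)
        staked += stake
        profit += stake * profit_per_dollar(moneyline) if won else -stake
    return profit / staked


# Hypothetical results: three winning favourites and one losing underdog.
bets = [(-160, True), (-160, True), (-160, True), (140, False)]

print(round(stake_to_win(-160), 2), round(stake_to_win(140), 2))
print(round(roi(bets, lambda ml: 100.0), 4))  # flat $100 on every game
print(round(roi(bets, stake_to_win), 4))      # bet to win $100 on every game
```

Betting to win a fixed amount puts more money on favourites and less on underdogs, so it pays off when most correct picks are favourites, which is the situation described above.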

Selective Betting

Now, 2% isn’t bad, but it’s not great either. The previous examples we used also assumed we bet on every single game (after November 1) which isn’t a good strategy. In this section, I’ll explore the effectiveness of selectively betting in given situations.

By Team

Betting based on a team seemed like a natural thing to do. Even if I had all the stats of every team, I’d be much more comfortable and confident betting on a team that I knew. Here is the final betting result for each of the 31 teams. This also happens to be sorted based on regular season standings.

The average ROI across all teams is about 1.74%, but the interesting measure is not the mean; it’s the range (max and min). If we had chosen the Montréal Canadiens to bet on this past year, we could have made 16% returns. But if we had chosen the Los Angeles Kings, we would have lost almost 13%. If you look at the ROIs of the best and worst teams in the league, you’ll find that they both yield very positive returns. On the other hand, teams in positions 12th to 26th are full of negative returns. Again, since our model is trained to predict the better team to win, it turns out our ROI is moderately correlated with how often the favourite team wins. The correlation coefficient is 0.63.
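For reference, a correlation coefficient like this one is typically the Pearson coefficient, which I assume is what is meant here. Below is a minimal sketch of how it is computed; the five data points are made up and will not give 0.63.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-team values: how often the favourite wins, and the ROI.
favourite_win_rate = [0.52, 0.55, 0.58, 0.61, 0.64]
team_roi = [-0.05, 0.02, -0.01, 0.06, 0.09]

print(round(pearson(favourite_win_rate, team_roi), 2))
```

A value near 1 means the two move together, a value near 0 means no linear relationship, and a value near -1 means they move in opposite directions.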

Bet Based on Teams Performance

Teams don’t necessarily have to be good relative to their opponents in order for us to make money; they just have to play consistently. The fewer upset victories there are, the more boring hockey games might be, but the more money we would make. That said, the fact that the top 11 teams all yielded positive returns suggests that doing well does help. To see how this would work without the benefit of hindsight, I went back to this power ranking article from NHL.com, written October 3, 2017, a day before the first game of the season was played. Of the top twelve teams listed in the article, six turned out to be wrong. Most notable were the Edmonton Oilers and Chicago Blackhawks, which both fell almost 20 spots from where every sports publication I found had ranked them. I still don’t think this makes betting selectively by team an impractical strategy. Upsets happen, as we can see from the Vegas Golden Knights’ miraculous season, but they don’t happen overnight. If we looked at the standings midway through the season on January 1, 2018, only three teams fell out of the top ten spots in the end.

By Time

Remember when we talked about dropping all games before November 1 because of the small sample size? Does the same apply later in the season too?

Everything in this chart looks pretty standard except for the 52.97% accuracy in November, which suggests that a large sample size really is important. Other than that, 47% of all games in January resulted in an upset victory. Unfortunately, after running the same script for the seasons from 2007–2018, that turned out to be an anomaly for this past year only.

By How Often Favourites Win

We already found that ROI is correlated with how often the favourites win, so I decided to add that as a feature and retrain the model. The F1 Score improved by only a nominal 0.2%. So the model didn’t get meaningfully more accurate, but perhaps we can bet selectively on games where an upset victory is less likely to occur. The values on the x-axis are calculated as the product of how often the favourites win in games involving each team. If the favourites win 50% of the time the Sharks are playing and 60% of the time the Kings are playing, the Combined Favourites W% (CFW%) would be 0.3.
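As a quick sketch, CFW% and the selective filter built on it could look like this. Only the Sharks and Kings rates come from the example above; "Team A", "Team B", the other rates, and the 0.32 cutoff used for the demo filter are placeholders.

```python
def combined_favourites_pct(home_fav_rate, away_fav_rate):
    """Product of how often the favourite wins in each team's games."""
    return home_fav_rate * away_fav_rate


# Share of each team's games won by the favourite (mostly hypothetical).
favourite_win_rate = {"Sharks": 0.50, "Kings": 0.60, "Team A": 0.65, "Team B": 0.55}
matchups = [("Sharks", "Kings"), ("Team A", "Team B"), ("Team A", "Kings")]

threshold = 0.32  # only bet when CFW% is above this value
selected = [
    (home, away)
    for home, away in matchups
    if combined_favourites_pct(favourite_win_rate[home], favourite_win_rate[away]) > threshold
]
print(selected)
```

The Sharks and Kings matchup has a CFW% of 0.3, so it falls below the cutoff and is skipped.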

Bets Based on Combined Favourites

The important thing to note is that if we bet selectively on games where the CFW% is more than 0.32, we would get an ROI of 2.58%, the highest we’ve seen so far! However, you might notice that the two rightmost points on the graph also have the most negative returns. This could be explained by the small sample size again. Only 100 games were played where the CFW% is greater than 0.35, compared to 932 games where it is greater than 0.32.

By How Confident the Classifier Is

Whenever the classifier makes a prediction, it also reports how confident it is in that prediction. We can selectively bet on games where the classifier is more confident than a certain threshold. Here are the results. The ConfidenceDifference (CD) is calculated as the difference between the classifier’s confidence in each of the two classes. If the classifier is 50% confident that Team A will win and 50% confident that Team B will win, it’s not very sure at all. By taking the difference, we’re only looking at the games it’s pretty sure about.
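A minimal sketch of the CD filter follows. It assumes the classifier reports a probability for each class and that the two probabilities sum to 1; the game names and probabilities are made up.

```python
def confidence_difference(p_home_win):
    """Gap between the two class probabilities of a binary prediction."""
    p_away_win = 1 - p_home_win
    return abs(p_home_win - p_away_win)


# Hypothetical predicted probabilities that the home team wins.
predictions = {"game 1": 0.50, "game 2": 0.53, "game 3": 0.62, "game 4": 0.35}

threshold = 0.07  # only bet when CD is above this value
confident = [
    game for game, p in predictions.items()
    if confidence_difference(p) > threshold
]
print(confident)
```

A 50/50 prediction has a CD of 0 and is filtered out, while a 62/38 or 35/65 prediction clears the cutoff no matter which side is favoured.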

Betting Based on Confidence Difference

The highest ROI recorded here is 3.07%, when one bets selectively on games where the CD is greater than 0.07, which is even higher than with the previous strategy. Again, the rightmost values yielding less than -10% returns are the result of using a sample size smaller than 40 games.

By How Often Favourites Win and How Confident the Classifier Is

What if we combined the two previous strategies? We actually see a high ROI of 4.05% just by selecting games where the CFW% is greater than 0.32 and the CD is greater than 0.07 (the optimal values from the previous sections). In total, we would have made $4,718.52 on $116,392.22 in bets. I was quite skeptical initially, as this was just one season of test data and we might be making the mistake of overfitting, so I used the same strategy to see how it would do over the past three seasons. Using the exact same values of 0.32 and 0.07, I would have made an ROI of 2.16%, which is about half as much, but it’s still consistently positive. The optimal values of CFW% and CD over the past three seasons are 0.36 and 0.07 respectively, yielding an ROI of 6.53%. However, in the 2017–2018 season, those values would only have produced an ROI of 0.03%.
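To illustrate how "optimal" cutoffs like these can be found, and why they can overfit, here is a sketch of a small grid search over both thresholds. The settled bets and the grid values are hypothetical; only the last line uses figures from this article.

```python
from itertools import product

# Hypothetical settled bets: (CFW%, confidence difference, stake, profit).
bets = [
    (0.30, 0.05, 100.0, -100.0),
    (0.33, 0.12, 160.0, -160.0),
    (0.34, 0.04, 120.0, -120.0),
    (0.37, 0.12, 150.0, 100.0),
    (0.38, 0.09, 140.0, -140.0),
    (0.40, 0.15, 200.0, 100.0),
]


def roi_for(bets, min_cfw, min_cd):
    """ROI over the bets passing both filters, or None if no bet passes."""
    chosen = [
        (stake, profit)
        for cfw, cd, stake, profit in bets
        if cfw > min_cfw and cd > min_cd
    ]
    staked = sum(stake for stake, _ in chosen)
    return sum(profit for _, profit in chosen) / staked if staked else None


# Try every pair of cutoffs and keep the one with the highest ROI.
grid = product([0.32, 0.36], [0.07, 0.11])
results = {(cfw, cd): roi_for(bets, cfw, cd) for cfw, cd in grid}
best = max((k for k, v in results.items() if v is not None), key=results.get)
print(best, round(results[best], 4))

# The article's combined result: profit divided by total amount bet.
print(round(4718.52 / 116392.22 * 100, 2))
```

In this toy data the best pair of cutoffs leaves only two bets, which is exactly the small-sample trap: cutoffs tuned on one season can look great there and still do poorly on another.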

Recap

Here’s a recap so you don’t have to read this again.

  1. I built a model with an F1 Score of 67.22%. Over the latest season, it was 18 games short of being as accurate as the Pinnacle betting site.
  2. In the past ten seasons in the NHL (2007–2018), underdogs won 41.59% of the time. Hence, the league is very unpredictable.
  3. ❌ Betting flat $100 on each bet.
    ✅ Betting to make $100 on each bet (more money on favourites, less on underdogs).
  4. Betting on teams that play most consistently works. Consistently meaning fewer upset victories.
  5. Our model is less accurate in November, when the sample size is smaller.
  6. When we bet only on games where an upset victory is less likely, we make more money.
  7. When we bet only on games where our model is more confident, we make even more money.
  8. When we bet on both of the previous conditions combined, we make the most money, 4.05% last year.

What’s Next?

I think this is a pretty good start on trying to predict NHL games. However, there is a lot that could be done to improve it. Almost all the features I used were directly correlated with team performance, which is why the predicted results were quite similar to just betting on the favourite team on a betting site. Factors like injuries, winning streaks, teams playing back-to-back games, tough schedules, morale, line chemistry, starting goaltender, the arena, and the weather can affect the result of a game and were not considered in this project. As mentioned above, interesting choices for a future project would be to use aggregated individual performance data or Twitter sentiment, or even to explore the playoffs.

Would I make the bets listed in this article?

Before I answer, I should mention I’m not an expert on hockey, statistics, machine learning, or betting/gambling. Since this article was written retrospectively, there may be certain biases. All bets mentioned were hypothetical, but all data used was real and was sourced either from the NHL or SportsBookReview. If you do choose to bet, please do so responsibly and make sure it is legal where you live.

Now that I’m able to identify the factors that affect each NHL game and have the stats to back it up, I am much more confident placing these bets than I would be if I hadn’t pursued this project. There is no doubt that if I decide to bet for fun this upcoming season, I will keep the strategies listed in this article in mind.

While working on this project and noticing the unpredictable nature of the league, I thought it might be better to create a model to assist with live betting (betting during a game). I don’t have proof yet, but comebacks must be rarer than upset victories (41.59% of the time), right? Also, I might consider creating a model for sports where luck has a smaller presence.

That’s all. Thanks for reading this far. Good luck with betting if you choose to do so and Go Canucks Go! 🏒