Beating the Books: Using Machine Learning to Make Money Sports Betting

Ben Jensen
Published in CodeX
10 min read · Mar 31, 2022

Here’s how my “just for fun” data science project made me a little extra cash…

Photo by MontyLov on Unsplash

If you want to dig into the code, I’ve documented the process in a Jupyter Notebook on my GitHub here:

Originally published in March 2022

In this article, I’m going to cover how I built a basic machine learning model to bet on the NBA. The quick version: I built a model based on sklearn’s GradientBoostingRegressor to bet over/under with a 58% success rate over 140 NBA games. The long version:

I am a grad student in Arizona State University’s MSBA program and a former Division 3 student-athlete at Beloit College. I graduated from Beloit College in the spring of 2021 with a Bachelor’s in Quantitative Economics and hopes of becoming a professional data scientist. I decided the best way for me to accomplish this was to come down to Tempe, AZ, and enroll in a Master’s in Business Analytics. When I moved from Wisconsin to Arizona in August of 2021, something magical happened: online sports betting went live in Arizona.

My goal in such a long introduction isn’t just to brag about myself, but also to qualify myself as a little knowledgeable and interested in the intersection between sports and data science. When online sports betting opened up in Arizona, I immediately began placing all sorts of small bets (college student budget) on almost every NFL game. I was quickly humbled and learned that my sports knowledge alone could not beat Vegas.

Getting started

By February 2022, I had enough coursework under my belt to start adding some science to my sports knowledge. After some practice building basic machine learning models using Python’s sklearn library, I was ready to try them out in the real world.

I turned to the NBA because the LA Rams had just beaten Joe Burrow and the Bengals in the Super Bowl and I didn’t want to wait until football season rolled back around to test out my skills.

Now that I had set my sights on betting on the NBA, I needed to make a couple of important decisions: where to get my data and what kind of bets to make.

For my data, I turned to the official NBA website. Instead of doing the smart thing and building a Beautiful Soup scraper, I was impatient and manually transferred 349 games’ worth of team stats and box score data into an Excel workbook.

I picked over/under bets because they are based on a sum of both teams’ scores and not a difference and direction like spreads. Both of these decisions were inexact. Improvements could be made in the data collection process and the intentionality/theory of picking the bet type. With that said, I was itching to get going and this start was good enough for me.

Over/under bets

In data science projects, it is important to understand your problem thoroughly before attempting to solve it. The problem I have selected is whether or not I can predict if the total score of an NBA game will exceed a certain, predefined point total. This certain point total is generated by each sportsbook ahead of each game and is unique to that game.

For example, DraftKings Sportsbook could offer a line of 230.5 total points for a regular-season game between the Milwaukee Bucks and the Phoenix Suns. As the bettor, if I feel there will be more than 230.5 points scored by the Bucks and Suns combined, then I will bet the over.

An example result where I would win would be a final score of 120–116 in favor of the Suns (120+116=236>230.5).
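The settlement rule for an over/under bet fits in a few lines of Python. This is only an illustrative sketch: the function name `settle_total` is made up for this example, and the "push" branch (the total landing exactly on a whole-number line, which refunds the stake) is a general betting convention rather than something from my notebook.

```python
def settle_total(home_points, away_points, line):
    """Return which side of an over/under bet wins for a final score."""
    total = home_points + away_points
    if total > line:
        return "over"
    if total < line:
        return "under"
    return "push"  # only possible when the line is a whole number


# The Suns vs. Bucks example: 120 + 116 = 236, which is above 230.5
print(settle_total(120, 116, 230.5))  # over
```

Lines ending in .5 exist precisely so that the third branch can never happen.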

Boiled down, over/under betting is a simple binary classification problem.

How much do you have to win to make money?

More than about 52.4% of your bets at -110 odds, a figure that is usually rounded up to 53%. Most over/under bets are -110 odds, but I’ve seen them occasionally offered at odds from -105 to -120.

Why do I need to win 53% of my bets and not just 50%?

Because sportsbooks are businesses that are interested in making money. The way they do this isn’t by being smarter than you and predicting the outcome of sports games with more accuracy than you can. They could if they wanted to. They have the resources. Instead, they’ve figured out a better, safer way to make tons of money.

The (oversimplified) business model of the sportsbook is to split the public’s bets evenly, right down the middle, and take a cut from the winners’ winnings. The cut they take is built into the odds they offer on the bet. That is why -110 is such an important number.

What -110 odds in the American betting system means is that you need to wager $110 to win $100. Equivalently, a winning $100 bet at -110 odds pays about $90.91 in profit.
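The arithmetic behind American odds can be checked with a short sketch. The function names below are made up for illustration. The break-even formula comes from setting expected profit to zero: if a win pays `w` per $1 staked and a loss costs $1, then `p * w = (1 - p)`, so `p = 1 / (1 + w)`.

```python
def profit_on_win(stake, american_odds):
    """Profit (not counting the returned stake) on a winning bet."""
    if american_odds < 0:
        return stake * 100 / abs(american_odds)
    return stake * american_odds / 100


def break_even_win_rate(american_odds):
    """Win rate at which a bettor neither makes nor loses money long run."""
    win_per_dollar = profit_on_win(1.0, american_odds)
    return 1 / (1 + win_per_dollar)


print(round(profit_on_win(100, -110), 2))          # 90.91
print(round(break_even_win_rate(-110) * 100, 2))   # 52.38

# The less common prices mentioned earlier move the break-even point
for odds in (-105, -110, -120):
    print(odds, round(break_even_win_rate(odds) * 100, 2))
```

The exact payout on $100 is $90.909..., and the exact break-even rate at -110 is roughly 52.4%, which is where the rounded "53%" rule of thumb comes from.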

Let’s imagine a sportsbook has 500 people who all want to bet $100 each on a certain game. The sportsbook picks a line that divides the people evenly into 2 groups. 250 people bet $100 on Outcome 1 at -110 odds. The remaining 250 people bet $100 on Outcome 2 at -110 odds. Now, the sportsbook has guaranteed a profit no matter what happens in the game.

If Outcome 1 happens, they keep the $25,000 from the people who bet on Outcome 2 and pay out only about $22,727 ($100 / 1.10 ≈ $90.91 each x 250) to the winners who bet on Outcome 1. The math is the same if Outcome 2 happens instead of 1. In both situations, the sportsbook makes about $2,273 just for hosting the bet. This profit is truly risk-free only if they can evenly split the money that is bet on each outcome (note: they don’t have to split the number of people evenly on each outcome, just the money. I split the people evenly to keep the illustration simple).
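To see why the even split matters, here is a small sketch of the sportsbook’s side of the ledger. The function name and the lopsided $35,000 / $15,000 split are hypothetical, chosen only to contrast with the balanced case, and the function assumes negative (favorite-style) odds such as -110.

```python
def book_profit(money_on_outcome_1, money_on_outcome_2, american_odds=-110):
    """Sportsbook profit under each result, given money staked per side.

    Assumes both sides are priced at the same negative American odds.
    """
    payout_per_dollar = 100 / abs(american_odds)  # winnings per $1 staked
    if_outcome_1 = money_on_outcome_2 - money_on_outcome_1 * payout_per_dollar
    if_outcome_2 = money_on_outcome_1 - money_on_outcome_2 * payout_per_dollar
    return if_outcome_1, if_outcome_2


balanced = book_profit(25_000, 25_000)
lopsided = book_profit(35_000, 15_000)

print([round(x, 2) for x in balanced])  # [2272.73, 2272.73]
print([round(x, 2) for x in lopsided])  # [-16818.18, 21363.64]
```

With balanced money the book earns the same amount either way. With lopsided money it is effectively gambling on the result, which is why it has a reason to move the line.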

Among the sports betting community, there are many names for this cut the casino takes for brokering the bets. My favorites are “juice” and “vig”.

The juice is the reason why sports bettors need to win 53% of their bets to be profitable. It can be annoying but does not always have to be a bad thing.

The Plus Side of the Juice

Photo by Zlatko Đurić on Unsplash

There is good news. Sportsbooks are focused on splitting the money and not beating YOU at all costs. Instead of betting against a couple of huge multimillion-dollar businesses in a multibillion-dollar industry, you’re betting against the public.

The public is Larry. Larry is a Cowboys fan. Larry is going to bet on the Cowboys to cover the spread no matter what because he wants them to win. There are also a bunch of other Larrys out there who bet on the Cowboys to cover and pressure the sportsbooks to move their line. They’ll move the line so that the money can continue to be split down the middle and they can take home their precious juice.

When this line is moved one way by the public for an irrational reason, it leaves value on the other side. That window of value is where rational sports bettors have an opportunity. We aren’t beating the sportsbook, we’re beating Larry.

Now, don’t worry about Larry. You aren’t stealing his money. He doesn’t know you exist and he’s going to bet on the Cowboys anyway. Hopefully, it’s a responsible amount. The key is that there are a lot of Larrys doing the same thing.

On the flip side, smart sports bettors (sometimes called “sharps”) publish their takes, and a large number of people “tail” or copy their bets, which also creates a shift in the line. However, this shift brings the line closer to the reality of the situation and closes some of the value window opened by the Larrys.

Lastly, there is still some randomness introduced by the variance of outcomes in sports. That’s why sports are fun. We can’t always know who wins before the game starts, no matter what the numbers say. On any given day, Larry could win his bet on the Cowboys.

As rational sports bettors, we just want to be on the winning side more often than not in the long run. If we are winning our bets 53% of the time in the long run, then we are making money.
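A rough way to put numbers on "making money" is the expected profit per bet at a given long-run win rate. The sketch below assumes flat $100 bets at -110 and a hypothetical helper name; it ignores pushes and any variation in odds.

```python
def expected_profit_per_bet(win_rate, stake=100.0, american_odds=-110):
    """Long-run average profit per bet at a fixed win rate (negative odds)."""
    win_amount = stake * 100 / abs(american_odds)
    return win_rate * win_amount - (1 - win_rate) * stake


for rate in (0.50, 0.53, 0.58):
    print(f"{rate:.0%}: {expected_profit_per_bet(rate):+.2f}")
# 50%: -4.55
# 53%: +1.18
# 58%: +10.73
```

A coin-flip bettor loses about $4.55 per $100 bet to the juice, while a 53% bettor barely clears it.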

Getting Over 53% with Machine Learning

I’m going to leave most of the nuts and bolts of my model in the Jupyter Notebook on GitHub, which I linked at the beginning of this article. Here I’ll describe 1) what it is and 2) why it works.

Photo by Edge2Edge Media on Unsplash

What is the Model?

The model is a Gradient Boosting Regressor. This means it uses a fancy algorithm (gradient boosting, which builds a sequence of small decision trees where each new tree corrects the errors left by the previous ones) to make a number prediction based on the factors you think are important (regression). The number prediction is the final total score of an NBA basketball game and the factors are the team stats I took from the NBA website.
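Gradient boosting builds its prediction by adding small models one at a time, each fit to the errors that remain. The toy sketch below shows that idea with one feature and one-split "stumps". It is not how sklearn implements GradientBoostingRegressor (which uses full decision trees and many more options), and the pace and total-points numbers are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature, by squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict_total(model, x):
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left_mean if x <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps
    )


def training_mse(model, xs, ys):
    return sum((predict_total(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Made-up data: possessions per game vs. combined final score
pace = [95, 97, 98, 100, 101, 103, 104, 106]
total_points = [205, 210, 214, 221, 224, 229, 233, 240]

mean_only = fit_boosted_stumps(pace, total_points, n_rounds=0)
boosted = fit_boosted_stumps(pace, total_points, n_rounds=50)
print(training_mse(mean_only, pace, total_points))
print(training_mse(boosted, pace, total_points))
```

The second number is smaller than the first because every round removes a little of the remaining error on the training data. Whether that carries over to new games is a separate question, which is why a held-out test set matters.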

The model is trained using data from games that have already happened. This allows the model to look at the important factors and compare them to the actual final score of that game and decide just how much each factor affects the final score. This process is where the machine learns.

If you’re curious about how machines learn I’d encourage you to do some exploring on the topic because it is quite cool. I have already done a little bit of my own studying on the subject and for this problem just needed to decide which learning method I wanted this machine to use and what to teach it.

Choosing which learning method I wanted the machine to use was trial and error. I tried several models, including a Random Forest Regressor, a Multilayer Perceptron Regressor, and a Gradient Boosting Regressor. The Gradient Boosting Regressor performed best in terms of my model evaluation metric, root mean squared error, so that’s the learning method I went with.
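The trial-and-error comparison can be sketched roughly as follows. This is not the code from my notebook: the data here is synthetic (random features with a made-up relationship to the total), the hyperparameters are arbitrary, and the scaling pipeline around the MLP is an assumption added because neural networks tend to train better on standardized inputs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 349 games, 6 made-up team-stat features
rng = np.random.default_rng(0)
n_games = 349
X = rng.normal(size=(n_games, 6))
y = 220 + 8 * X[:, 0] + 5 * X[:, 1] - 4 * X[:, 2] + rng.normal(scale=10, size=n_games)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate and score it on held-out games with RMSE
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

best = min(rmse, key=rmse.get)
for name, score in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.2f}")
print("lowest RMSE:", best)
```

Which model comes out on top depends on the data, so the ranking on this synthetic set says nothing about the ranking on real NBA stats.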

Deciding what you want to teach your model is called feature selection. It is the practice of including factors that you think are important in determining your outcome and leaving out the ones you don’t. Luckily for me, sports fans and analysts have spent years arguing about what stats are important, so the ones recorded on the NBA’s official website are pretty well-selected features. They include but aren’t limited to:

  • shots taken and of what type
  • shot conversion percentages
  • fouls committed
  • fouls drawn
  • rebounds
  • turnovers
  • blocks

For each game I wanted to predict, I used each team’s per-game season averages of these features, computed only from the games played before that game.
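Using only earlier games matters because an average that includes the game being predicted would leak the answer into the features. Here is a minimal sketch of that bookkeeping, with a hypothetical function name and made-up scores, using a single stat (points) for brevity:

```python
from collections import defaultdict


def pregame_averages(games):
    """Average of each team's earlier games only.

    `games` must be sorted by date. Returns one value per row, or None
    when the team has not played yet.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    averages = []
    for game in games:
        team = game["team"]
        if counts[team] == 0:
            averages.append(None)
        else:
            averages.append(totals[team] / counts[team])
        # Update the running totals only after recording the pre-game value
        totals[team] += game["points"]
        counts[team] += 1
    return averages


games = [
    {"team": "PHX", "points": 110},
    {"team": "MIL", "points": 120},
    {"team": "PHX", "points": 120},
    {"team": "PHX", "points": 100},
    {"team": "MIL", "points": 104},
]
print(pregame_averages(games))  # [None, None, 110.0, 115.0, 120.0]
```

The same pattern applies to every stat in the list above, one running average per team per stat.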

After training and testing, the model is ready to be used. All I have to do now is use the most up-to-date team stats and have the trained model predict what the total score of the game will be.

Reminder: I am betting over/under and not trying to have a more accurate score prediction than the sportsbook. My model is a regression model, but my task is binary classification. This just means that I need the predictions to be on the correct side of the line given by the sportsbook for each game. When the predicted score is greater than the line, I bet the over. When it is less than the line, I bet the under. Through about 140 games this method has had a 58% success rate. Flipping a coin would have a 50% success rate. Remember that it only takes a 53% success rate to be profitable betting at -110 odds. This model is profitable with a 58% success rate.
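Turning a predicted total into a bet is a one-line comparison against the sportsbook’s line. The sketch below uses a hypothetical function name and made-up matchups and numbers; the "no bet" branch for an exact tie is my own assumption for completeness.

```python
def pick_side(predicted_total, line):
    """Map a regression prediction onto an over/under decision."""
    if predicted_total > line:
        return "over"
    if predicted_total < line:
        return "under"
    return "no bet"  # prediction sits exactly on the line


# Made-up slate: (matchup, model prediction, sportsbook line)
slate = [
    ("MIL @ PHX", 234.2, 230.5),
    ("BOS @ MIA", 209.8, 213.5),
]
for matchup, predicted, line in slate:
    print(matchup, pick_side(predicted, line))
# MIL @ PHX over
# BOS @ MIA under
```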

Photo by Japheth Mast on Unsplash

Why it Works

The model works because it beats Larry. Based on the variance that naturally occurs in basketball, it would be unreasonable to try to predict what will happen in a game with a high level of certainty. What is reasonable is to expect to hit the value window that is left open by the public betting based on their feelings/other reasons that are less grounded in truth than our statistical approach.

If the majority of bettors were to switch their strategy and make more informed/data-driven decisions, this system would be much less successful.

Closing Remarks

I jumped right into betting with this model because I trusted it, was betting with tiny units (not much money), and didn’t have the historical betting line data to match back with the games I trained on. It would be better practice to evaluate the model’s performance against the task you’re using it for before deploying it. In this case, the model should be tested on the percentage of bets it wins before being deployed and not just the regressor’s RMSE.
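If historical lines were available, a backtest on held-out games could score the model on bets won rather than on RMSE. The sketch below shows one way to do that. The function name is hypothetical, all numbers are made up, and it assumes flat one-unit bets at -110 with pushes and exact ties skipped.

```python
def backtest(predicted_totals, lines, actual_totals, american_odds=-110):
    """Grade over/under picks against actual results at fixed negative odds."""
    wins = losses = 0
    for predicted, line, actual in zip(predicted_totals, lines, actual_totals):
        if predicted == line or actual == line:
            continue  # no bet placed, or a push that refunds the stake
        bet_over = predicted > line
        went_over = actual > line
        if bet_over == went_over:
            wins += 1
        else:
            losses += 1
    graded = wins + losses
    units = wins * 100 / abs(american_odds) - losses  # profit in 1-unit bets
    return {
        "bets": graded,
        "win_rate": wins / graded if graded else None,
        "units": units,
    }


# Made-up held-out games
predicted = [232.0, 215.5, 224.0, 228.0, 219.0]
lines = [230.5, 218.5, 221.5, 231.5, 216.5]
actual = [236, 210, 219, 225, 221]

report = backtest(predicted, lines, actual)
print(report["bets"], report["win_rate"], round(report["units"], 2))  # 5 0.8 2.64
```

Five games is far too few to conclude anything; the point is only that the win rate and units won, not RMSE, are the numbers that match the betting task.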

Lastly, I encourage you to do any of your personal betting legally and responsibly. This was a fun exercise for me. It blended my learning with a personal interest. I’m thankful for the success I’ve seen so far. With that said, this system relies on an interaction between my model, the betting market, and the nature of NBA games. While I might not change my model, I am not in control of the betting market or the future of basketball. If you want to make money with your money, create a diversified portfolio investing in the stock market. The returns are proven to be higher.

Photo by micheile .com on Unsplash

Thank you!

This is my first article published here. If you enjoyed it and hopefully learned something while reading, then please like and share with a friend. If you know anyone that is looking to hire a motivated data scientist fresh out of grad school, definitely share with them.

If you are the person who is looking to hire a motivated data scientist fresh out of grad school, I’d love to connect on LinkedIn:

Ben Jensen
Writer for CodeX

I am a professional data scientist and enjoy sharing my data-driven takes and analysis regarding sports.