Sports Analytics Framework in Python (Part-1)

A data-driven system to make smarter bets

Yavuz Selim Sefunc
Analytics Vidhya
4 min read · Dec 29, 2020


Photo by Thomas Serer on Unsplash

Football is the most popular sport worldwide. It is a global game that connects almost every single person on the planet. It has been a part of my life since I was a child. I’ve watched almost every match so far.

Over the last couple of months, I have been working on a betting analytics project for football matches, and I want to implement a simple betting framework in Python. The series has two parts: data preprocessing (this part) and exploratory data analysis (Part 2). In this part, I go through how to preprocess the football data and turn it into action.

What is sports (betting) analytics, and what are its use cases?

Sports analytics is a broad field, and a sports analytics professional can take on many different types of roles. In this series, I focus on the area where data analysis meets betting analytics.

Finding a dataset

It can be quite hard to find a specific football match dataset, but I found a website, http://football-data.co.uk/data.php, that provides current and historical football results, including 27 seasons of results and 20 seasons of match statistics. Leagues from around the world are available.

The datasets have 22 variables, and only the English Premier League data includes a referee column. The data dictionary is available at http://football-data.co.uk/notes.txt. Some of the variables are listed below:

  • FTHG = Full-Time Home Team Goals
  • FTAG = Full-Time Away Team Goals
  • FTR = Full-Time Result (H=Home Win, D=Draw, A=Away Win)
  • HS = Home Team Shots
  • HST = Home Team Shots on Target
Example dataset from Premier League (1)
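As a minimal sketch of loading such a file with pandas, the block below keeps the columns from the dictionary above. It assumes the usual Date, HomeTeam and AwayTeam columns (documented in notes.txt but not listed above) and uses a few made-up rows instead of a downloaded file, so it runs offline.

```python
import io

import pandas as pd

# Hypothetical sample rows in the football-data.co.uk CSV layout (not real results).
# With a downloaded season file you would pass its path to pd.read_csv instead.
SAMPLE_CSV = """Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HS,HST
01/01/2021,Team A,Team B,2,1,H,14,6
02/01/2021,Team C,Team A,0,0,D,9,2
03/01/2021,Team B,Team C,1,3,A,11,4
"""

# Columns described in the data dictionary (Date/HomeTeam/AwayTeam assumed).
COLUMNS = ["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR", "HS", "HST"]

matches = pd.read_csv(io.StringIO(SAMPLE_CSV))[COLUMNS]
# The files use day-first dates (dd/mm/yyyy).
matches["Date"] = pd.to_datetime(matches["Date"], dayfirst=True)
print(matches)
```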

The most popular type of football bet aims to predict the final result of the match: a home win (1), a draw (X), or an away win (2). Nowadays, however, there are many more options to play, and I have listed some popular ones below. For example, an over/under 2.5 goals bet is placed on the 2.5 goals market. You select one of two outcomes: under 2.5 goals means the match ends with two goals or fewer in total, and over 2.5 goals means three goals or more.
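The settlement rules above fit in two small functions. This is a hypothetical sketch (the function names are mine) that maps a final score to the 1/X/2 outcome and settles an over/under bet on a half-goal line such as 2.5.

```python
def match_result(home_goals: int, away_goals: int) -> str:
    """Return the 1X2 outcome: '1' home win, 'X' draw, '2' away win."""
    if home_goals > away_goals:
        return "1"
    if home_goals == away_goals:
        return "X"
    return "2"


def over_under(home_goals: int, away_goals: int, line: float = 2.5) -> str:
    """Settle an over/under bet on total goals against a half-goal line like 2.5."""
    total = home_goals + away_goals
    # With a .5 line the total can never equal the line, so there is no refund case.
    return "over" if total > line else "under"


# 2-1: home win, three goals in total, so "over 2.5".
print(match_result(2, 1), over_under(2, 1))  # 1 over
# 1-1: draw, two goals in total, so "under 2.5".
print(match_result(1, 1), over_under(1, 1))  # X under
```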

Coding part

Below are some examples of how to code bet options as columns: under/over 2.5 goals at half-time or full-time, mutual goals (both teams score), MS_1_under_1_5, and the conversion to numeric columns for each team strategy.
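A minimal sketch of this kind of encoding with pandas follows, turning each bet option into a 0/1 column that is 1 when the bet would have won. It assumes the half-time goal columns HTHG/HTAG from notes.txt. Column names other than MS_1_under_1_5 are hypothetical, and reading MS_1_under_1_5 as "home win with fewer than 1.5 total goals" is my assumption. The rows are made up.

```python
import pandas as pd

# Hypothetical sample rows; HTHG/HTAG are half-time goals, FTHG/FTAG full-time goals.
matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team C", "Team B"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "HTHG": [1, 0, 1],
    "HTAG": [0, 0, 2],
    "FTHG": [1, 0, 1],
    "FTAG": [0, 0, 3],
})

ht_total = matches["HTHG"] + matches["HTAG"]
ft_total = matches["FTHG"] + matches["FTAG"]

# Each bet option becomes a numeric 0/1 column (1 = the bet would have won).
matches["HT_over_2_5"] = (ht_total > 2.5).astype(int)
matches["FT_over_2_5"] = (ft_total > 2.5).astype(int)
matches["FT_under_2_5"] = 1 - matches["FT_over_2_5"]
# Mutual goals: both teams scored at full time.
matches["mutual_goals"] = ((matches["FTHG"] > 0) & (matches["FTAG"] > 0)).astype(int)
# Assumed meaning: home win (MS 1) with fewer than 1.5 goals in total.
home_win = matches["FTHG"] > matches["FTAG"]
matches["MS_1_under_1_5"] = (home_win & (ft_total < 1.5)).astype(int)

# Show only the new bet-option columns.
print(matches.iloc[:, 6:])
```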

Data preprocessing part (2)

You can apply the same pattern to code the remaining bet options; in total, 59 variables are created. After preprocessing, the dataset looks like the GIF below.

Dataset GIF (3)

Team Analysis Part

Once the data wrangling is complete, I create a function named “probability” that builds an information card for each team. The card shows the percentage of wins, draws, and losses, along with the other bet options in detail, sorted by percentage.
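The article does not show the body of “probability”, so the block below is only a hypothetical stand-in (`team_card` is my name) for what such an information card could compute. It assumes the HomeTeam/AwayTeam/FTHG/FTAG columns and the 1 (home), 2 (away), 3 (all matches) option used in the dashboard, and it runs on made-up rows.

```python
import pandas as pd


def team_card(matches: pd.DataFrame, team: str, venue: int = 3) -> pd.Series:
    """Hypothetical information card: % of matches in which each outcome occurred.

    venue: 1 = home matches only, 2 = away matches only, 3 = all matches.
    """
    is_home = matches["HomeTeam"] == team
    is_away = matches["AwayTeam"] == team
    mask = {1: is_home, 2: is_away, 3: is_home | is_away}[venue]
    games = matches[mask]
    if games.empty:
        return pd.Series(dtype=float)

    # Goals scored and conceded from the chosen team's point of view.
    team_home = games["HomeTeam"] == team
    scored = games["FTHG"].where(team_home, games["FTAG"])
    conceded = games["FTAG"].where(team_home, games["FTHG"])
    total = scored + conceded

    outcomes = pd.DataFrame({
        "win": scored > conceded,
        "draw": scored == conceded,
        "lose": scored < conceded,
        "over_2_5": total > 2.5,
        "under_2_5": total < 2.5,
        "mutual_goals": (scored > 0) & (conceded > 0),
    })
    # The mean of a boolean column is the share of matches; scale to percent and sort.
    return (outcomes.mean() * 100).round(1).sort_values(ascending=False)


matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team C", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C", "Team C"],
    "FTHG": [2, 0, 1, 3],
    "FTAG": [1, 0, 3, 1],
})
print(team_card(matches, "Team A"))
```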

Ipywidgets are interactive HTML widgets for Jupyter notebooks that make information easy to access. Users gain control over their data and can visualize changes to it. I use a slider and a dropdown menu with the team list so that you can manage the analysis easily.
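A minimal sketch of wiring a slider and a dropdown to a callback with ipywidgets follows. It assumes the slider selects the 1/2/3 option and the dropdown lists team names. `summary`, `build_dashboard` and the sample rows are hypothetical, and ipywidgets is imported inside `build_dashboard` so the plain helper also runs outside Jupyter.

```python
import pandas as pd

# Hypothetical sample rows; FTR is the full-time result (H, D or A).
matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team C", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C", "Team C"],
    "FTR": ["H", "D", "A", "H"],
})


def summary(venue: int, team: str) -> str:
    """Win record of a team; venue 1 = home, 2 = away, 3 = all matches."""
    home = matches[matches["HomeTeam"] == team]
    away = matches[matches["AwayTeam"] == team]
    home_wins = int((home["FTR"] == "H").sum())
    away_wins = int((away["FTR"] == "A").sum())
    played = {1: len(home), 2: len(away), 3: len(home) + len(away)}[venue]
    won = {1: home_wins, 2: away_wins, 3: home_wins + away_wins}[venue]
    pct = 100 * won / played if played else 0.0
    return f"{team}: {won}/{played} won ({pct:.1f}%)"


def build_dashboard():
    """Connect a slider (1/2/3) and a team dropdown to summary(); use in Jupyter."""
    import ipywidgets as widgets  # lazy import: only needed inside a notebook

    teams = sorted(set(matches["HomeTeam"]) | set(matches["AwayTeam"]))
    venue = widgets.IntSlider(value=3, min=1, max=3, step=1, description="Venue:")
    team = widgets.Dropdown(options=teams, description="Team:")
    # interactive re-runs the callback and shows its printed output on each change.
    return widgets.interactive(
        lambda venue, team: print(summary(venue, team)), venue=venue, team=team
    )
```

In a notebook, ending a cell with `build_dashboard()` should display the slider, the dropdown, and a line of text that updates as you change them.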

Implementing the information card for each team (4)
Sample dashboard (5)

When you run the code, you will see the ipywidgets at the end of the notebook:

  • First, choose the type of analysis: 1 (home matches), 2 (away matches), or 3 (all matches).
  • Choose the first team to analyze.
  • Choose the second team to analyze.
  • Finally, compare both teams' results to decide which bet is the best option. The GIF below shows how the system works.

You will see a sample dashboard like this. (Note: you can use the Split Cells Notebook extension from nbextensions to split cells vertically and avoid extra white space on the right-hand side of your Jupyter notebooks.) (6)
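Once both teams' cards are computed, one simple way to compare them is to place them side by side. This is a hypothetical sketch: `compare_cards` and the `gap` column are my own illustration, not necessarily how the author's dashboard ranks bet options, and the percentages are made up.

```python
import pandas as pd


def compare_cards(card1: pd.Series, card2: pd.Series,
                  team1: str, team2: str) -> pd.DataFrame:
    """Place two teams' percentage cards side by side for visual comparison."""
    table = pd.concat([card1, card2], axis=1, keys=[team1, team2])
    # Absolute gap: large values flag bet options where the two teams differ most.
    table["gap"] = (table[team1] - table[team2]).abs()
    return table.sort_values("gap", ascending=False)


# Hypothetical cards, e.g. percentages taken from each team's information card.
card_a = pd.Series({"win": 66.7, "draw": 33.3, "over_2_5": 66.7, "mutual_goals": 66.7})
card_b = pd.Series({"win": 20.0, "draw": 40.0, "over_2_5": 80.0, "mutual_goals": 60.0})
print(compare_cards(card_a, card_b, "Team A", "Team B"))
```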

Summary

In this article, we went through how to implement data-driven betting analytics in Python. All of the code is available on my GitHub account. Thank you for reading. Please let me know if you have any feedback.

For those interested, continue with Part 2 of this series via the link below, where I dive into exploratory data analysis, using data visualizations to find out which variables matter when analyzing football data.

Part 2: Follow the link below.
