Predict UFC Fights with Deep Learning II — Data collection and implementation in PyTorch

Yuan Tian
Nov 26, 2018

Since the publication of my first post “Predict UFC Fights with Deep Learning”, I have received many requests for the datasets used in the project. Therefore, I have decided to open source the code for scraping the data in this follow-up post. Additionally, I will implement a neural network for UFC prediction in PyTorch.

Data collection

We need two types of data to predict UFC fights. First, we need information about each UFC bout, such as the opponents and the result. Second, we need each fighter's record and statistics, including striking and grappling numbers. Luckily, FightMetric hosts both types of data, so we will write two web crawlers with Scrapy, one for bouts and one for fighters, to scrape it. Both crawlers are in the GitHub repository for this post. To run them, cd into the corresponding directory and run scrapy crawl ufc_bout or scrapy crawl ufc_fighter.

Once the two crawlers have finished, the data can be exported from PostgreSQL to CSV files by running \copy (SELECT * FROM ufc_bouts) to 'path\ufc_bouts.csv' with csv header; and \copy (SELECT * FROM ufc_fighters) to 'path\ufc_fighters.csv' with csv header; in psql.

Next, we will clean up the two datasets using Pandas and merge them, following the procedure in the preprocessing notebook. The final dataset used for deep learning is shown below:

The final dataset, saved in the file named ufc_combined.csv, contains 4591 bouts and 41 categorical and continuous features.
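The merge step can be sketched roughly as follows. This is not the code from the preprocessing notebook: the toy rows, the column names (fighter1, fighter2, winner, name, stance, strikes_per_min), and the helper attach_stats are all hypothetical. The idea is to join the fighter table onto each bout twice, once per corner, and derive a binary target.

```python
import pandas as pd

# Toy stand-ins for the two exported CSV files; all column names are assumed.
bouts = pd.DataFrame({
    "fighter1": ["Fighter A", "Fighter C"],
    "fighter2": ["Fighter B", "Fighter A"],
    "winner": ["Fighter A", "Fighter A"],
})
fighters = pd.DataFrame({
    "name": ["Fighter A", "Fighter B", "Fighter C"],
    "stance": ["Orthodox", "Southpaw", "Orthodox"],
    "strikes_per_min": [3.5, 2.1, 4.0],
})


def attach_stats(bouts, fighters, corner):
    """Join fighter stats onto one corner of each bout, suffixing the stat columns."""
    stats = fighters.add_suffix(f"_{corner}")
    merged = bouts.merge(stats, left_on=corner, right_on=f"name_{corner}", how="left")
    return merged.drop(columns=f"name_{corner}")


combined = attach_stats(attach_stats(bouts, fighters, "fighter1"), fighters, "fighter2")
# Binary target: 1 if the first-listed fighter won, otherwise 0.
combined["fighter1_won"] = (combined["winner"] == combined["fighter1"]).astype(int)
```

A left join keeps every bout even when a fighter is missing from the fighter table, which makes gaps visible as NaN values instead of silently dropping rows.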

Neural network in PyTorch

In the original post, I used Keras to implement the neural network. Here, we will use PyTorch instead, because I have been following the fastai course, which builds on PyTorch.

The dataset contains two types of features, so we will encode the categorical features as integer category codes and normalize the continuous features. The resulting dataset is shown below:
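The encoding described above can be sketched with Pandas on a toy frame. The column names and values are made up for illustration, and standardizing to zero mean and unit variance is one common choice of normalization, assumed here because the post does not say which one was used.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "stance": ["Orthodox", "Southpaw", "Orthodox", "Switch"],
    "weight_class": ["Lightweight", "Welterweight", "Lightweight", "Welterweight"],
    "reach": [70.0, 74.0, 72.0, 76.0],
})
cat_cols = ["stance", "weight_class"]
cont_cols = ["reach"]

# Categorical features: replace each label with its integer category code.
cardinalities = {}
for col in cat_cols:
    as_category = df[col].astype("category")
    cardinalities[col] = len(as_category.cat.categories)  # needed later to size embeddings
    df[col] = as_category.cat.codes

# Continuous features: standardize to zero mean and unit variance.
# With a train/test split, the mean and std should come from the training rows only.
mean = df[cont_cols].mean()
std = df[cont_cols].std()
df[cont_cols] = (df[cont_cols] - mean) / std
```

The number of distinct categories per column is recorded because an embedding layer needs it as its number of rows.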

We will then build a neural network like the one below:

The picture was inspired by Yashu Seth.

We will use embeddings for categorical features such as fighter names, weight classes, and fighter stances. The implementation details can be found in the notebook named ufc_prediction.ipynb in the GitHub repository for this post.
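As a rough illustration of the embedding idea, the sketch below builds one embedding table per categorical feature, concatenates the embedded vectors with the continuous features, and passes the result through a small feed-forward network. It is not the model from ufc_prediction.ipynb: the class name TabularNet, the embedding size, the hidden size, and the example cardinalities are all assumptions.

```python
import torch
import torch.nn as nn


class TabularNet(nn.Module):
    """Embeddings for categorical columns, concatenated with continuous columns."""

    def __init__(self, cardinalities, n_cont, emb_dim=8, hidden=32):
        super().__init__()
        # One embedding table per categorical feature (e.g. name, weight class, stance).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, emb_dim) for n in cardinalities]
        )
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim * len(cardinalities) + n_cont, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single logit for a binary outcome
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_cat) integer codes; x_cont: (batch, n_cont) floats.
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [x_cont], dim=1)
        return self.mlp(x).squeeze(1)


torch.manual_seed(0)
example_cardinalities = [100, 8, 4]  # made-up counts of distinct categories
model = TabularNet(cardinalities=example_cardinalities, n_cont=5)

# Random inputs with valid codes for each categorical column.
x_cat = torch.stack(
    [torch.randint(0, n, (16,)) for n in example_cardinalities], dim=1
)
x_cont = torch.randn(16, 5)
logits = model(x_cat, x_cont)  # one logit per row
```

Using the same embedding size for every feature keeps the sketch short. In practice the size is often scaled with the number of categories.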

I trained the model for 4 epochs and then evaluated it on the test set:

The accuracy on the test set is about 69%, which I hope can be significantly improved in the future.
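For readers who want to see the shape of such a train-then-evaluate run, here is a minimal sketch. Everything in it is a stand-in: the data is random, the model is a small placeholder network, and the loss (binary cross-entropy on logits), optimizer (Adam), and learning rate are assumptions, since the post does not state them. The accuracy it produces says nothing about the 69% reported above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 200 rows, 10 features, random binary outcomes.
X = torch.randn(200, 10)
y = torch.randint(0, 2, (200,)).float()
train_dl = DataLoader(TensorDataset(X[:160], y[:160]), batch_size=32, shuffle=True)
test_dl = DataLoader(TensorDataset(X[160:], y[160:]), batch_size=32)

# Placeholder model; the real one uses embeddings for the categorical features.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()  # assumed loss for a binary target
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer

for epoch in range(4):  # 4 epochs, as in the post
    model.train()
    for xb, yb in train_dl:
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb)
        loss.backward()
        optimizer.step()


def accuracy(model, loader):
    """Fraction of rows whose predicted class matches the label."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in loader:
            preds = (model(xb).squeeze(1) > 0).float()  # logit > 0 means p > 0.5
            correct += (preds == yb).sum().item()
            total += yb.numel()
    return correct / total


test_accuracy = accuracy(model, test_dl)
```

Because the labels here are random, the resulting accuracy should hover around chance. The point is only the structure of the loop and the held-out evaluation.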

Source code

The datasets and source code are available on GitHub at https://github.com/naity/DeepUFC2.
