Predicting Rocket League Match Outcome With Machine Learning

Walker Payne
Published in CodeX · Nov 16, 2021
Figure 0. Alternative Data (source)

I’ve always been fascinated by the concept of an “informational advantage”, specifically in the finance world. When the conventional economic data taught in textbooks is the status quo, investors (mainly hedge funds) seek out alternative data sources to gain an edge over their competitors.

Examples of these alternative data sources are compelling and seemingly endless:

  • Swiss investment firm UBS Investment Research uses satellite imagery to estimate occupancy rates of Walmart parking lots, extrapolating quarterly sales and thus gaining useful investment insights. For example, consistently empty parking lots could signal poor foot traffic, indicating that a fund might want to short the stock.
  • Thinknum, an alternative data provider, uses web-scraping data to analyze vehicle inventory at Carvana and CarMax. Funds can act on this data to predict upcoming earnings for these companies, or even to assess broader auto-industry demand for vehicles from manufacturers of interest like Tesla or Ford.
  • Two Sigma, a quant hedge fund with $60 billion AUM, has a Macro Alpha Capture Platform that pays investment professionals to submit trade ideas with a timeframe, rationale, and conviction level. This information is then (presumably) used in various machine learning models to either execute trades or provide insights to Two Sigma fund managers.
Figure 1. eSports betting volume by game. Source.

With alternative data sources in mind, I decided I wanted to implement my own attempt at creating an informational advantage.

Sports betting is predicted to be a $140 billion industry by 2028 and happens to be rife with exploitable, underutilized datasets. The umbrella of “sports betting” also includes eSports, which features games such as CS:GO, League of Legends, Dota 2, Starcraft, and others (Figure 1).

One game that’s not on this list but may well be in the future is Rocket League (RL). It’s my favorite game at the moment and can be explained quite easily — it’s basically soccer but with flying cars instead of people. If that sounds silly, well, it kind of is, but it’s a beautiful game — a simplistic concept combined with an incomprehensibly high skill ceiling. And with nearly 100 million monthly active players, it’s nothing to laugh at. It’s no stretch to think that RL will have some widely available forms of betting in the near future.

With this in mind, my goal began to crystallize: to exploit publicly available data in a unique way and attempt to predict Rocket League match outcomes, creating an informational advantage for a future sports betting scenario.

This exercise serves as a great analogy to the broader financial world and their utilization of alternative datasets. Is the data publicly available? Sure. Does the average retail investor have the knowledge, skillset, and capital available to scrape, aggregate, and process that data into actionable insights? Absolutely not. Fairness, it seems, is out of the question.

The next three sections cover the technical details: scraping the dataset, exploring it, and evaluating the machine learning models. Feel free to skip to the end for “lessons learned”.

Alternative Data Sources: Scraping the Web

My idea was to gather data on RL 1v1 matches and the players involved in each. To make a predictive model, I needed input data:

  • Player 1 and player 2 statistics — win rate, total number of wins, Match Making Rating (MMR), goal/shot ratio, etc. (Goal/shot ratio is the number of goals a player scores divided by the number of shots they take, a proxy metric for accuracy. More on this later.)

and output data:

  • Game outcome (win or loss, from the perspective of player 1).

I would then use the input data to train a machine learning model, which would output the probability of a player winning or losing in a 1v1 match. If the accuracy of the model is high enough to warrant usefulness, it could be used in the future to inform betting on matches.

Having outlined what data I needed, I ended up scraping two separate websites. It would have been super convenient if all of the above information were served up on a silver platter, but unfortunately I had to work hard for this informational advantage. The two websites I scraped are described below:

  • Ballchasing.com, which allows users to upload replays of their own matches and automatically analyze them. From here I gathered match information and player names — player1_name, player2_name, and player1_outcome.
  • Rocketleague.tracker.network, which automatically tracks player profiles and creates summary statistics for each. From here I gathered skill stats for both players — total wins, MMR, and goal/shot ratio.

The code is a rat’s nest that even I don’t fully understand, but if you’re sadistically curious you can check it out on my GitHub. I’ll describe the process at a high level here:

  • Ballchasing.com has an API. Using Python’s built-in json module along with the requests and pandas packages, I accessed JSON data for 200 pages of the website, scraped the relevant information into a dictionary, converted it to a DataFrame, and saved it as an Excel spreadsheet. This resulted in about 40,000 data points, one for each match.
  • I took that data and used it to parse Rocketleague.tracker.network, which unfortunately did not have a public API. I imported the data from Ballchasing.com, iterated through each match, and constructed custom URLs for each player based on either their Steam profile or Xbox/PSN/PsyNet name. Then for each profile I scraped the player stats into another DataFrame and Excel spreadsheet. Both steps are sketched below.
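Here’s a condensed sketch of both steps. The request pattern follows Ballchasing’s public API, but the JSON keys I parse are illustrative placeholders (the real response is more deeply nested), and the tracker URL format is an assumption you’d want to verify against the live site:

```python
import requests
import pandas as pd

HEADERS = {"Authorization": "YOUR_BALLCHASING_API_TOKEN"}  # your API key here

def fetch_duel_replays(pages: int = 200) -> pd.DataFrame:
    """Pull ranked 1v1 replay metadata from the Ballchasing API."""
    url = "https://ballchasing.com/api/replays?playlist=ranked-duels&count=200"
    rows = []
    for _ in range(pages):
        resp = requests.get(url, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        for replay in payload.get("list", []):
            rows.append({
                "player1_name": replay["blue"]["name"],       # illustrative keys;
                "player2_name": replay["orange"]["name"],     # the real response
                "player1_outcome": replay["blue"]["result"],  # is nested deeper
            })
        url = payload.get("next")  # the API supplies the next page's URL
        if not url:
            break
    return pd.DataFrame(rows)

def tracker_url(platform: str, name: str) -> str:
    """Build a (hypothetical) profile URL for Rocketleague.tracker.network."""
    return (f"https://rocketleague.tracker.network/rocket-league/"
            f"profile/{platform}/{name}/overview")

matches = fetch_duel_replays()
matches.to_excel("matches.xlsx", index=False)
```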

The end result can be seen in Figure 2 below.

Figure 2. Dataset resulting from scraping various websites. (Image by author)

Exploratory Data Analysis & Feature Engineering

The next step is to clean the data and engineer some new features to use as inputs for the machine learning model. After removing data with null values (e.g. players with private profiles or deleted accounts, or simple HTTP request timeouts), I ended up with about 10,000 usable data points.

I also like to explore the dataset and identify any interesting trends or relationships among variables. A quick way to start off the exploration is to get summary statistics:
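In pandas that’s a one-liner. The column names below are stand-ins for the ones in Figure 2:

```python
import pandas as pd

df = pd.read_excel("dataset.xlsx")  # the merged match/player dataset

# count, mean, std, min, quartiles, and max for every numeric column
print(df[["p1_wins", "p1_mmr", "p1_goal_shot_ratio",
          "p2_wins", "p2_mmr", "p2_goal_shot_ratio"]].describe())
```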

From the above, a few interesting points stand out:

  • Player 1, the one who uploaded the replay to Ballchasing.com, had on average 5,954 wins. Their average MMR was 1,240.
  • One player in particular had 35,000 wins. Their name is Scripts_RL, and while they’re in the top 0.1% of players in terms of total number of wins, they’re only in the top 40% of players in terms of rank. (For reference, I have about 3,000 wins and am in the top ~1% of players by rank. I’m not sure if this should be a source of pride for me or humility for them.)
  • Assuming a match length of about 7 minutes and a 50% win rate, that equates to 8,000 hours of in-match play time. Since RL’s release in July of 2015, this equates to nearly four hours of game play, every single day, for six years straight. Wild.
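For the curious, the back-of-the-envelope math behind that last bullet:

```python
matches_played = 35_000 / 0.5               # 50% win rate -> ~70,000 matches
hours_in_game = matches_played * 7 / 60     # ~7 min per match -> ~8,167 hours
days_since_release = 6.3 * 365              # July 2015 to late 2021
print(hours_in_game / days_since_release)   # ~3.6 hours per day, every day
```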

It’s also important to check for a balanced dataset, which can be done quite easily:
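For example, with the outcome column from the scraped data:

```python
# fraction of wins vs. losses from player 1's perspective;
# values near 50/50 mean the classes are balanced
print(df["player1_outcome"].value_counts(normalize=True))
```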

Then I moved on to the relationships between various features, such as MMR and goal/shot ratio (Figure 3).

Figure 3. Goal/shot ratio vs. skill rating (MMR). (Image by author)

From the graph we can see a clear negative correlation between MMR and goal/shot ratio. It makes sense that this metric should decrease as skill level increases: a skilled opponent will save or block more of the shots on their net than an unskilled one, so fewer shots become goals. Because of this relationship, I may want to exclude one of these two features from the model training process to avoid multicollinearity.
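A quick way to put a number on that relationship (same stand-in column names as before):

```python
# Pearson correlation between skill rating and shooting accuracy;
# a clearly negative value supports dropping one of the two features
print(df["p1_mmr"].corr(df["p1_goal_shot_ratio"]))
```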

To train the ML model, I needed to do some feature engineering to create predictors based on my scraped data, producing a final dataset ready to be split into training and testing subsets.
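The exact predictors in my final dataset aren’t reproduced here, but the idea is relative features (the gap between the two players) plus a binary label. A sketch with hypothetical column names:

```python
from sklearn.model_selection import train_test_split

# relative features: the gap between the players should matter more
# than either player's raw numbers
df["mmr_diff"] = df["p1_mmr"] - df["p2_mmr"]
df["wins_diff"] = df["p1_wins"] - df["p2_wins"]
df["gs_ratio_diff"] = df["p1_goal_shot_ratio"] - df["p2_goal_shot_ratio"]
df["label"] = (df["player1_outcome"] == "win").astype(int)

X = df[["mmr_diff", "wins_diff", "gs_ratio_diff"]]
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```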

Model Fitting & Results

I used two different models to predict match outcome. It’s trivial to play around with various models when you have non-production code and static data, so I chose to implement both XGBoost and scikit-learn’s Logistic Regression.

XGBoost is a relatively new algorithm (released in 2014) which gained popularity due to its success in various machine learning competitions. It’s an easily accessible “implementation of gradient boosted decision trees designed for speed and performance”. The algorithm can be used for both regression and classification problems (among others). (Note that XGBoost, a decision tree based algorithm, does not require feature scaling/normalization.)

Logistic Regression is a supervised machine learning classification algorithm that predicts the probability of the dependent variable being 0 (in our case, a loss) or 1 (a win).

A significant difference between the two is that XGBoost is an ensemble model (new trees are added sequentially, each one correcting the errors of the models before it) whereas Logistic Regression is a single linear model. Additionally, XGBoost is much harder to interpret than Logistic Regression, which is an important consideration depending on the topic at hand.
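Fitting and scoring both takes only a few lines, continuing from the split above (the hyperparameters here are illustrative defaults, not my tuned values):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# tree-based model: no feature scaling required
xgb = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
xgb.fit(X_train, y_train)

# linear model: standardize features so no single scale dominates
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_train, y_train)

for name, model in [("XGBoost", xgb), ("Logistic Regression", logreg)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.1%}")
```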

The results are as follows:

  • XGBoost yielded an accuracy of 57%.
  • Logistic Regression yielded an accuracy of 57% as well.

Which brings up the question — what level of accuracy is acceptable? In this instance, my goal is for the model to be more accurate than a human at predicting the winner of a match. To establish this baseline, I manually classified around 100 matches based on the same data that the model is trained on. Using bootstrap sampling, I achieved an average accuracy of… 57.3%.
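(The bootstrap simply resamples my ~100 manual calls with replacement to stabilize the accuracy estimate. A sketch:)

```python
import numpy as np

def bootstrap_accuracy(correct, n_boot=10_000, seed=0):
    """Bootstrap the mean of a boolean 'my call was correct' array."""
    correct = np.asarray(correct)
    rng = np.random.default_rng(seed)
    resamples = [rng.choice(correct, size=correct.size, replace=True).mean()
                 for _ in range(n_boot)]
    return float(np.mean(resamples))

# usage: correct[i] is True where my manual prediction matched the
# real outcome of match i, one entry per hand-labeled match
# print(bootstrap_accuracy(correct))
```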

Figure 4. Player 1 vs. Player 2 total wins. (Image by author)

My results using these two models bring up a fundamental and oft-overlooked issue: machine learning is not an appropriate tool for all datasets. If a well-trained human cannot satisfactorily classify the data points, you shouldn’t expect a machine learning model to do so either.

There are a few issues with the dataset I’ve gathered that explain this relatively poor accuracy. First, take a look at Figure 4 above. Each point on the graph represents a single match between two players. Orange points indicate that player 1 won, blue points indicate that player 1 lost. Would you be able to group the data into orange or blue points by simply looking at the graph? I know I couldn’t — there are no distinguishable trends or linear breakpoints that would indicate whether player 1 should have won or lost. We shouldn’t expect an ML model to be able to magically make these predictions either.

Another issue arises when we consider what is actually expressed by each feature, knowledge that many might say requires “domain expertise”. I chose to scrape player MMR, or Match Making Rating, and use it to train the ML model. It sounds like a good idea at first glance, but when we stop to think about it, the usability of MMR in predicting match outcome starts to unravel. Rocket League assigns an MMR that is updated after every match based on the difference between your MMR and the opposing player’s MMR, and on whether you won or lost. In general, the reason MMR exists is to match players against other players of the same skill level. It follows that in most 1v1 matches the players will have similar if not equal MMR. The insight provided to the model is quite difficult to parse — MMR difference doesn’t have an easily discernible relationship with match outcome if the vast majority of matches are paired up evenly.

Despite the relatively poor accuracy of the models, they may not be entirely unusable. You can evaluate the variability of a model’s performance using k-fold cross-validation. If variability is low (e.g. the accuracy of each fold hovers around the same value: 57.5%, 56.7%, 57.2%, etc.), then it might be safe to use the model in production to place actual bets. To elaborate, imagine you are 95% confident that your model accuracy is 57%, plus or minus a few percentage points. Over enough bets, your rate of being correct will approach 57%, so at roughly even odds you would be executing bets with positive expected value. Conversely, if you are less confident in the variability of the model outputs, you should be wary of using it in any real capacity.
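With scikit-learn, that check is one call:

```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation on the full feature set; a small standard
# deviation across folds suggests the ~57% figure is stable rather
# than an artifact of one lucky train/test split
scores = cross_val_score(xgb, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```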

Lessons Learned

Alternative data sources are ripe for exploitation, especially on the internet. Web scraping is a powerful and useful skill to acquire, allowing you to generate unique datasets that may never have been utilized before. Though this betting scenario is hypothetical, there may be similar datasets out there that are more expressive and directly actionable, creating an informational advantage that results in real gains.

To reiterate, machine learning is a very valuable and useful tool but its applicability is not universal. If the features in your dataset are not descriptive enough for a human to predict the outcome with high accuracy, then you cannot expect a machine learning algorithm to do any better.

And lastly, even if your model accuracy is poor you may learn something in the process that makes you more knowledgeable and better prepared to take advantage of future opportunities. As Matthew McConaughey once said,

“I’ll take an experienced C over an ignorant A any day.”

Thanks for reading!
