TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Predict Matches in League of Legends While Learning PyTorch Basics

May 31, 2020


By Richard So

League of Legends is one of my all-time favorite games, even if I honestly suck at it. LOL is an extremely competitive MOBA in which two teams of five (blue and red) are pitted against each other, each trying to destroy the other's base (the nexus). Winning usually requires a great amount of teamwork, coordination, or, for a tilted player, “luck”. Regardless, it is not too hard for a League player (even a pretty new one) to tell which team is probably going to win, based on the number of kills, deaths, and the numerous other stats that the game keeps track of. That sounds like something a machine learning model could predict… (this is supposed to be a badly executed foreshadowing of this article).

I’m starting to learn a little bit of machine/deep learning with PyTorch as well (through this PyTorch Zero to GANs course). One of the course assignments was to build a regression model with PyTorch for some use case, and thus, here we are now! Sure, we could use scikit-learn and call it done, but the point for me (and you) is to learn the fundamentals of building PyTorch models. This is a walkthrough of my journey as I learn PyTorch by building a simple League of Legends match predictor, which estimates the chances of either team winning with reasonable accuracy.

(Image: xkcd)

The Dataset

First, I used a Kaggle dataset of about 10 thousand League of Legends ranked matches. The data for each match was collected within its first 10 minutes, and all matches were played by players ranked around high Diamond to low Master. Luckily, there were no missing values to fill, nor was there any non-numeric data to one-hot encode. All in all, not much data manipulation was needed.

To start off, we’ll import all the necessary libraries:
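A minimal set of imports that covers the snippets in this article, assuming pandas and PyTorch are installed, would be something like the following. The aliases (`pd`, `nn`, `F`) are the conventional ones and are an assumption here, chosen to match how the later snippets refer to them.

```python
# Data loading and manipulation
import pandas as pd

# Core PyTorch, the layer module, and the functional API (for the loss)
import torch
import torch.nn as nn
import torch.nn.functional as F

# Utilities for wrapping tensors into datasets, splitting, and batching
from torch.utils.data import DataLoader, TensorDataset, random_split
```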

And we’ll see what the data looks like:

df = pd.read_csv('high_diamond_ranked_10min.csv')
df.head()

This dataset has 40 columns, but some of them are redundant. For example, the number of kills made by the blue team (blueKills) translates almost directly to the number of deaths on the red team (redDeaths), so we only need one of the two.
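One way to sanity-check a redundancy claim like this before dropping anything is to compare the two columns directly. The frame below is a hand-made toy that only borrows the column names; it is not the Kaggle data, and the numbers are invented for illustration.

```python
import pandas as pd

# Toy rows (invented) that mimic the kill/death columns of the dataset.
toy = pd.DataFrame({
    "blueKills": [9, 5, 7],
    "redDeaths": [9, 5, 7],
    "redKills": [6, 5, 11],
    "blueDeaths": [6, 5, 11],
})

# If two columns match on every row, one of them adds no information.
blue_kills_redundant = bool((toy["blueKills"] == toy["redDeaths"]).all())
red_kills_redundant = bool((toy["redKills"] == toy["blueDeaths"]).all())

print(blue_kills_redundant, red_kills_redundant)  # True True
```

On the real data, a result of False would simply mean the pair is only approximately redundant, which is worth knowing before deciding to drop a column.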

df.drop(['gameId','redFirstBlood','blueTotalGold','redTotalGold','blueTotalExperience','redTotalExperience','redGoldDiff','redExperienceDiff','redKills','redDeaths'], axis=1, inplace=True)

This leaves us with 30 columns, including the label column. The label of this dataset (what we want to predict) is the column blueWins, a 0 or 1 denoting whether the blue side went on to win the game. So finally, we separate the labels from the features…

targets = df[['blueWins']].values #only the `blueWins` column
features = df.drop('blueWins', axis=1).values #all the other columns

…split the data (with an 80/10/10 split for training, validation, and testing) and turn them into PyTorch tensors…

test_size = int(.10 * 9879) # 10% of the total size of the dataset
val_size = test_size
train_size = 9879 - test_size*2
train_size, val_size, test_size
dataset = TensorDataset(torch.tensor(features).float(), torch.from_numpy(targets).float()) # turning arrays into tensors
train_ds, val_ds, test_ds = random_split(dataset, [train_size, val_size, test_size]) # doing an 80/10/10 split on the data
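The sizes above are hardcoded to the 9,879 rows of this particular dataset. As a sketch of the same split on a dataset of any length, the version below derives the sizes from `len(dataset)`. The random tensors are stand-ins for the real features and labels, and the seeded generator is an addition for reproducibility that the original code does not use.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in data (assumption): 1,000 rows with 29 features and a 0/1 label.
features = torch.randn(1000, 29)
targets = torch.randint(0, 2, (1000, 1)).float()
dataset = TensorDataset(features, targets)

# Derive the split sizes from the dataset instead of hardcoding them.
test_size = int(0.10 * len(dataset))
val_size = test_size
train_size = len(dataset) - val_size - test_size

# A seeded generator makes the random split repeatable between runs.
generator = torch.Generator().manual_seed(42)
train_ds, val_ds, test_ds = random_split(
    dataset, [train_size, val_size, test_size], generator=generator
)

print(len(train_ds), len(val_ds), len(test_ds))  # 800 100 100
```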

…and make DataLoader objects out of them with a batch size of 128.

batch_size = 128
# making data loader objects out of the splits
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
test_loader = DataLoader(test_ds, batch_size)

The Model

Well, here is where the “fun” begins! It’s time to make the logistic regression model using PyTorch. Here is my train of thought while building this:

  • A logistic regression model only needs one nn.Linear layer in our PyTorch model, accepting 29 inputs, and outputting 1: nn.Linear(29, 1).
  • Since we are predicting a value from 0 to 1, using nn.Sigmoid() would narrow the model’s output to that range.
  • How will we figure out how “wrong” the model is? An accuracy metric alone won’t do, because it is not usefully differentiable: no meaningful gradient can be computed from it, and the gradient is what lets the model improve during training. A quick look at the PyTorch docs yields a suitable cost function: binary_cross_entropy.
  • Just because an accuracy metric can’t be used for training doesn’t mean it shouldn’t be implemented! Accuracy here is measured with a threshold: a prediction counts as correct if the difference between the model’s prediction and the actual label is lower than that threshold.
  • We want to track the validation losses/accuracies after each epoch, and every time we do so we have to make sure the gradient is not being tracked.
  • We want to print out the mean validation losses and accuracies every few epochs.
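To make the sigmoid and loss bullets concrete, here is a small sketch on made-up numbers (the logits and labels are invented for illustration). It squashes raw linear outputs with a sigmoid, then checks that `F.binary_cross_entropy` agrees with the textbook formula, the batch mean of -[y·log(p) + (1 - y)·log(1 - p)].

```python
import torch
import torch.nn.functional as F

# Made-up raw outputs of a linear layer (logits) and their true labels.
logits = torch.tensor([[2.0], [-1.0], [0.5]])
labels = torch.tensor([[1.0], [0.0], [0.0]])

# Sigmoid squashes every logit into the (0, 1) range.
probs = torch.sigmoid(logits)

# Binary cross-entropy written out by hand, averaged over the batch.
manual_loss = -(labels * torch.log(probs)
                + (1 - labels) * torch.log(1 - probs)).mean()

# The built-in version (mean reduction is the default).
builtin_loss = F.binary_cross_entropy(probs, labels)

print(manual_loss.item(), builtin_loss.item())
```

Unlike a thresholded accuracy, this loss changes smoothly as the predicted probability moves, which is what makes it usable for gradient-based training.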

With all of that said, here is the implementation:

Defining some variables for use later.
The python class for our model, `LOLModel`.
The accuracy function.
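A minimal sketch of what these three pieces could look like, following the bullet points above. Only the name `LOLModel`, the `nn.Linear(29, 1)` layer, the sigmoid, and `binary_cross_entropy` come from the text. The 0.5 threshold, the variable names, and the `training_step`/`validation_step` method names are assumptions, so the notebook linked at the end remains the reference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Variables for later use (the threshold value is an assumption).
input_size = 29   # number of feature columns
output_size = 1   # a single probability that blue wins
threshold = 0.5   # max |prediction - label| still counted as correct


def accuracy(preds, labels):
    """Fraction of predictions within `threshold` of the true label."""
    return (torch.abs(preds - labels) < threshold).float().mean()


class LOLModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, xb):
        # Linear layer followed by a sigmoid gives a value between 0 and 1.
        return self.sigmoid(self.linear(xb))

    def training_step(self, batch):
        # Loss on one training batch; gradients are tracked here.
        inputs, labels = batch
        return F.binary_cross_entropy(self(inputs), labels)

    def validation_step(self, batch):
        # Loss and accuracy on one validation batch.
        inputs, labels = batch
        preds = self(inputs)
        return {
            "val_loss": F.binary_cross_entropy(preds, labels).detach(),
            "val_acc": accuracy(preds, labels),
        }
```

With a threshold of 0.5, this accuracy is equivalent to rounding the prediction to 0 or 1 and comparing it with the label.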

Then, we need to make a training loop, and decide which optimizer to use. For the optimizer, I went with Adam, after trying SGD first:

The training loop, implemented in the `fit()` function.

We define an evaluate() and a fit() function, the latter being the main training loop that:

  1. Feeds a batch of data into the model
  2. Computes the gradient of the loss and applies an optimizer step with the set learning rate
  3. Tests the model against the validation data, yielding the validation loss/accuracy
  4. Appends the results to the list history for plotting
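The four steps above can be sketched roughly as follows. The function signatures, the 0.5 threshold, the `nn.Sequential` stand-in for the model, and the random data are all assumptions made to keep the example self-contained; the toy run also uses far fewer epochs and a larger learning rate than the real training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

threshold = 0.5  # assumed cutoff for counting a prediction as correct


@torch.no_grad()  # no gradient tracking while validating
def evaluate(model, loader):
    losses, accs = [], []
    for inputs, labels in loader:
        preds = model(inputs)
        losses.append(F.binary_cross_entropy(preds, labels))
        accs.append((torch.abs(preds - labels) < threshold).float().mean())
    # Mean of the per-batch means
    return {"val_loss": torch.stack(losses).mean().item(),
            "val_acc": torch.stack(accs).mean().item()}


def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.Adam):
    history = []
    optimizer = opt_func(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for inputs, labels in train_loader:
            loss = F.binary_cross_entropy(model(inputs), labels)  # 1. forward pass
            loss.backward()        # 2. compute gradients ...
            optimizer.step()       #    ... and update the weights
            optimizer.zero_grad()  # reset gradients for the next batch
        result = evaluate(model, val_loader)  # 3. validate after each epoch
        history.append(result)                # 4. keep the metrics for plotting
    return history


# Tiny synthetic run (random stand-in data, not the League dataset).
torch.manual_seed(0)
x = torch.randn(256, 29)
y = (x[:, :1] > 0).float()  # label depends on the first feature only
data = TensorDataset(x, y)
loader = DataLoader(data, batch_size=128, shuffle=True)
model = nn.Sequential(nn.Linear(29, 1), nn.Sigmoid())
history = fit(5, 1e-2, model, loader, loader)
print(history[-1])
```

Swapping the optimizer only requires passing a different `opt_func`, for example `torch.optim.SGD`.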

Training & The Final Reveal

For the hyperparameters, I trained for 750 epochs with a learning rate of 0.0001:

model = LOLModel() # instantiate the model
Let’s make sure data passes through the model properly first.
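A quick way to run such a check is to push one batch through the model and look at the output shape and value range. In this sketch, an `nn.Sequential` with one linear layer and a sigmoid stands in for `LOLModel`, and the batch is random data with the shape a `DataLoader` batch would have; both are assumptions made for self-containment.

```python
import torch
import torch.nn as nn

# Stand-in for LOLModel (assumption): one linear layer plus a sigmoid.
model = nn.Sequential(nn.Linear(29, 1), nn.Sigmoid())

# A fake batch: 128 matches with 29 features each.
xb = torch.randn(128, 29)

# No gradients are needed for a simple forward-pass check.
with torch.no_grad():
    out = model(xb)

print(out.shape)  # torch.Size([128, 1])
print(out.min().item() >= 0, out.max().item() <= 1)  # True True
```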

The final validation loss and accuracy are 0.5494 and 0.7209, respectively (accuracy in this model is measured from 0 to 1).

Here, we can see the change in validation accuracy as the model iterates through every epoch. There is a sharp increase at approximately epoch 150, after which the accuracy hovers at around 0.8.

Now, here comes testing the model over the last 10% split (drumroll, please!):

Ignore ‘val_loss’ and ‘val_acc’!

Using evaluate(), we find that the model is about 74.5% accurate when predicting LOL match outcomes. Not too bad!

Conclusion

All in all, I consider this a success for me; I’ve learned a lot more about the fundamentals of making models in PyTorch, even if the model’s a simple one. And what’s cooler: I had a lot of fun seeing something I made work out decently well in the end! Hopefully, you enjoyed reading the process just as much as I did making the model. Anyways, happy coding (or keep on playing League)!

If you want the source of the Jupyter notebook used for this mini-project, look here: https://jovian.ml/richardso21/lol-logistic.


Written by Richard So

https://sorichard.com | BS/MS CS @ Georgia Tech, Class of ’25. Pursuing everything code. Always learning!
