Predicting League of Legends Match Outcome with Embeddings and Deep Learning

Yuan Tian
May 24, 2018


I’m taking the Practical Deep Learning For Coders course taught by Jeremy Howard, and it has been a very fun learning experience. We learned about embeddings in lessons 4 and 5 of the first part of the course, and I thought it would be interesting to do a small project using embeddings.

I used to play League of Legends (LoL), a 5 vs. 5 multiplayer online battle arena (MOBA) game. Each player selects and controls a champion with a unique set of abilities, and the goal is to destroy the opposing team’s nexus. The map is divided into four sections: the top, middle, and bottom lanes plus the jungle. Each lane is usually assigned one champion, except the bottom lane, which typically has two (an attack damage carry and a support); the remaining player roams the jungle.

In this project, I’ll try to predict LoL match outcomes using pre-match information: the players in each lane, the champions they select, team tags, and a few additional features such as the year, season, and league in which a match was played, as well as the match type. The dataset, uploaded by Chuck Ephron, was downloaded from Kaggle (https://www.kaggle.com/chuckephron/leagueoflegends).

Data preparation

The first thing we need to do is format the data for deep learning. Information on over 7,000 matches is stored in a Pandas DataFrame `match`. Let’s take a look:

The bResult column will serve as our target: 1 means the blue team won, while 0 means it lost. Interestingly, the blue team’s win rate is 54%, so the blue side may have a slight advantage over the red side. We will use 54% as our baseline accuracy.
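The baseline comes from simply always predicting a blue-side win. Here is a minimal sketch of that calculation; the miniature DataFrame is entirely made up for illustration (the column names follow the Kaggle dataset, but the team tags and results are invented):

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle match table; values are invented.
match = pd.DataFrame({
    "League": ["NALCS", "EULCS", "NALCS", "LCK"],
    "blueTeamTag": ["TSM", "FNC", "C9", "SKT"],
    "redTeamTag": ["C9", "G2", "TSM", "KT"],
    "bResult": [1, 0, 1, 1],  # 1 = blue team won, 0 = blue team lost
})

# Always predicting "blue wins" scores exactly the blue-side win rate,
# which is why the 54% win rate doubles as the baseline accuracy.
baseline_acc = match["bResult"].mean()
print(f"Blue-side win rate (baseline accuracy): {baseline_acc:.2f}")
```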

Apart from the result column, there are 26 feature columns: league, year, season, match type, 2 team tags, 10 players, and the 10 corresponding champions. We can treat all 26 as categorical features, which allows us to use embeddings.
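Before any embedding lookup, each categorical column has to be mapped to integer codes. A sketch with pandas, using a toy frame with a few of the 26 columns (the column names follow the dataset; the values are invented):

```python
import pandas as pd

# Toy frame with a handful of the 26 categorical feature columns.
match = pd.DataFrame({
    "League": ["NALCS", "EULCS", "NALCS"],
    "blueTop": ["Hauntzer", "sOAZ", "Impact"],
    "blueTopChamp": ["Maokai", "Gnar", "Maokai"],
})

cat_cols = ["League", "blueTop", "blueTopChamp"]
for col in cat_cols:
    match[col] = match[col].astype("category")

# Integer codes are what an embedding lookup consumes.
codes = {col: match[col].cat.codes for col in cat_cols}
print(codes["blueTopChamp"].tolist())  # [1, 0, 1] (categories sorted alphabetically)
```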

Embeddings

We can use one-hot encoding to represent categorical features. However, this sparse representation can use a lot of storage if a feature has many distinct values. Furthermore, the inner product of any two one-hot vectors is zero, so we cannot capture the relationship between values. For example, the champion Garen may be more similar to Darius than to Lux, since Garen and Darius are both top lane champions while Lux is a support champion, but one-hot encoding would not be able to learn this.
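The orthogonality problem is easy to see directly: every pair of distinct one-hot vectors has a dot product of zero, regardless of how similar the champions actually are.

```python
import numpy as np

champions = ["Garen", "Darius", "Lux"]
one_hot = np.eye(len(champions))  # one row per champion

# Distinct one-hot vectors are always orthogonal, so the encoding
# carries no similarity information at all.
garen, darius, lux = one_hot
print(garen @ darius)  # 0.0 even though both are top laners
print(garen @ lux)     # 0.0 as well
```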

In contrast to one-hot encoding, an embedding uses a set of learned features to represent each value. This saves storage, since the number of features is usually smaller than the cardinality. More importantly, we can use embeddings to learn the relationships between values. For example, we can learn a champion embedding matrix in which each row represents a champion and the columns may capture traits such as defense, health, and speed. We will learn an embedding matrix for each of the categorical variables: league, year, season, type, team tag, player, and champion.
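Mechanically, an embedding layer is just a table lookup into a trainable matrix. A NumPy sketch of the idea (the sizes are assumptions for illustration: roughly 140 champions, a 10-dimensional embedding, and random weights standing in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: ~140 champions, 10-dimensional embedding.
n_champions, emb_dim = 140, 10
champ_emb = rng.normal(size=(n_champions, emb_dim))

# An embedding layer is a lookup: row i is champion i's vector.
champ_ids = np.array([3, 17, 3])
vectors = champ_emb[champ_ids]
print(vectors.shape)  # (3, 10)

# Similarity between champions = similarity of their rows, e.g. cosine
# similarity (only meaningful once the matrix has been trained).
a, b = champ_emb[3], champ_emb[17]
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```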

Champion Embedding Matrix

Neural Network

Next, we will extract the embedding vectors for the values of the 26 feature columns and concatenate them into a single long vector, which we then feed through a neural network. Our final model will look something like the one below:
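The forward pass can be sketched in plain NumPy: look up one embedding per feature, concatenate, then apply a small MLP ending in a sigmoid for the probability that the blue team wins. All sizes here are assumptions for illustration (cardinalities, embedding dimensions, and the 64-unit hidden layer are invented, and the weights are random rather than trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed cardinalities for the 26 features: league, year, season, type,
# 2 team tags, 10 players, 10 champions. Embedding dims are a rule of
# thumb (half the cardinality, capped at 50).
cardinalities = [8, 5, 2, 4] + [400] * 2 + [1000] * 10 + [140] * 10
emb_dims = [min(50, (c + 1) // 2) for c in cardinalities]
tables = [rng.normal(size=(c, d)) for c, d in zip(cardinalities, emb_dims)]

def forward(feature_ids, w1, b1, w2, b2):
    """Look up each feature's embedding, concatenate, then run an MLP."""
    x = np.concatenate([t[i] for t, i in zip(tables, feature_ids)])
    h = np.maximum(0, x @ w1 + b1)   # ReLU hidden layer
    logit = h @ w2 + b2
    return 1 / (1 + np.exp(-logit))  # P(blue team wins)

in_dim = sum(emb_dims)
w1 = rng.normal(size=(in_dim, 64)) * 0.01
b1 = np.zeros(64)
w2 = rng.normal(size=64) * 0.01
b2 = 0.0

ids = [rng.integers(c) for c in cardinalities]
p_blue_win = forward(ids, w1, b1, w2, b2)
```

In training, the embedding tables and MLP weights would all be updated together by backpropagation against the bResult labels.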

Results

After training the model for 3 epochs, the prediction accuracy on the training and validation sets is 73% and 67%, respectively. Although not very high, this is much better than the 54% baseline.

Future directions

I relied entirely on embeddings in this project. I expect that including additional information, such as player win rates and champion abilities, would further improve prediction accuracy.

Source code:

The source code for the project can be found at: https://github.com/naity/LoL
