Playing around League of Legends with Neo4j — Prologue
It has been a while (few years actually) since I wanted to use Neo4j, but until now I couldn’t find an interesting dataset to get my hands on.
A few days ago, the League of Legends European Championship (LEC) 2019 Summer Regular Season just ended and this is the time, looking into playoffs coming up next week, to look back on what happened during the last ten weeks. I came across a site which gathers a lot of interesting information about the League of Legends competitive scene and decided to scrape some data from here.
This post aims to import simple data and to build a small graph in Neo4j, then execute some Cypher queries and see it we can get some insights from the data.
1. Build the graph
1.1. Import the data
You can find the data in the Github repo.
Copy the 3 files into the import
folder of your Neo4j installation, then run the following queries.
Note : I didn’t include the scraping part in this post because I am planning to write a dedicated post about it in the future (after I clean up my code).
The teams
LOAD CSV WITH HEADERS FROM "file:///teams.csv" AS row
MERGE (t:Team { nameShort: row.nameShort, nameFull: row.nameFull })
10 teams are participating in the 2019 LEC Summer Season, namely G2 Esports, Fnatic, Splyce, FC Schalke 04, Rogue, Team Vitality, SK Gaming, Origen, Misfits Gaming and Excel Esports.
The matches
LOAD CSV WITH HEADERS FROM "file:///matches.csv" AS row
MATCH (t1:Team { nameShort: row.team1 }),
(t2:Team { nameShort: row.team2 }),
(winner:Team { nameShort: row.winner })
MERGE (m:Match { date: row.date, team1: row.team1, team2: row.team2 })
MERGE (t1)-[:PLAYED]->(m)<-[:PLAYED]-(t2)
MERGE (winner)-[:WON]->(m)
91 matches were played this season so far over 9 weeks, with an additional tiebreaker game at the end. We link the match to the teams with a PLAYED
relation and add an additional relationship representing which team WON
.
The casters
LOAD CSV WITH HEADERS FROM "file:///casters.csv" AS row
MATCH (m:Match { team1: row.team1, team2: row.team2 })
MERGE (c:Caster { name: row.casterName, type: row.casterType })
MERGE (c)-[:CASTED]->(m)
Lastly, we create the caster nodes and add a relationship to the matches they CASTED
.
1.2. Check the graph
The schema is very simple, so let’s look a bit closer to one match.
MATCH (m:Match)
WITH m
LIMIT 1
RETURN m, (m)--(:Team), (m)--(:Caster)
A match (purple node) was played on June 7 by Splyce and G2 Esports (orange nodes), and eventually won by G2 Esports. Ender and Quickshot (blue nodes) were casting this match.
2. Sample Queries
2.1. Number of matches casted per Caster
MATCH (m:Match)
WITH COUNT(m) AS numberOfMatchesTotal
MATCH (c:Caster)-[:CASTED]-(m:Match)
RETURN c.name AS casterName,
COUNT(m) AS numberOfMatchesCasted,
COUNT(m) * 100.0 / numberOfMatchesTotal AS percentageOfMatches
ORDER BY numberOfMatchesCasted DESC
Looking from the bottom of the table, PapaSmithy and Foxdrop actually casted only one week worth of matches, so we expected to see them at the bottom. Though, I was surprised that Quickshot casted so few matches compared to the others casters (20 matches compared to 30+ matches of the top 5 casters).
2.2. Teams casted by Medic & Vedius
MATCH (:Caster {name: "Medic"})-[:CASTED]->(m:Match)<-[:CASTED]-(:Caster {name: "Vedius"})
WITH m
MATCH (t:Team)-[:PLAYED]->(m)
RETURN t.nameFull AS teamName,
COUNT(*) AS numberOfTimesCasted
ORDER BY numberOfTimesCasted DESC
In League of Legends, usually two casters comment together and the better their synergy, the more they make the game entertaining. I have a personal preference for the “MediVedi” duo, composed of Medic and Vedius.
I was interested to know if we could find something interesting among the teams they casted together. We can see that they casted more SK Gaming and Origen compared to Excel Esports and Team Vitality.
2.3. Matches which had a tri-cast
MATCH (t:Team)-[:PLAYED]->(m:Match)<-[:CASTED]-(c:Caster)
WITH m,
COLLECT(DISTINCT t.nameFull) AS matchup,
COLLECT(DISTINCT c.name) AS cast
WHERE size(cast) = 3
RETURN m.date AS date,
matchup[0] + " VS " + matchup[1] AS Match,
cast AS Cast
ORDER by date ASC
Since tri-cast usually happen when a match is important, it is understandable that the more you progress into the season, the more matches matter. By looking at the results from the Cypher query, we notice that the number of matches that was tri-casted was higher in August, which also corresponds to the last month of the season.
Noticeably, Vedius was part of 14 among the 21 casting trios.
2.4. Casters who never casted together
MATCH (c1:Caster),
(c2:Caster)
WHERE id(c1) < id(c2)
AND NOT( (c1)-[:CASTED]->(:Match)<-[:CASTED]-(c2) )
RETURN c1.name,
c2.name
Because Neo4j is built around relationships between entities, we can also look for relationships that don’t exist. This is useful when building a recommendation engine for example.
In this case, “casters who never casted together” means that there is no relation (c1)-[:CASTED]->(:Match)<-[:CASTED]-(c2)
.
3. Conclusion
Using Neo4j to build a graph from simple data, I was able to play around simple Cypher queries and get some interesting insights from the data. Some of the queries are, I think, actually very hard to do in SQL, but that would deserve its own post.
In the next steps, I will build a more complex graph using richer data such as players, champions, picks & bans, items builds, detailed stats for each match, and add other regional leagues and international competitions to do even more interesting analysis.