Now Available: World Cup 2018 Graph

Mark Needham
Neo4j Developer Blog
4 min readJun 20, 2018

World Cup 2018 is into its 2nd week and we felt it was about time that we refreshed the previous World Cup Graph that we created for the World Cup in Brazil 4 years ago.

Image courtesy of https://www.behance.net/gallery/20875693/Russia-World-Cup-2018-Branding-Identity

Graph Model

There have been some tweaks to the graph model so let’s have a look at our new and improved model for 2018.

World Cup Graph Model

The WorldCup node sits in the middle of our graph and all other parts of the model revolve around that. We have one WorldCup node for each tournament.

Next up we have the host Country which is connected to the WorldCup node by a HOSTED_BY relationship. Matches also belong to a WorldCup and each Country names a Squad of Players that will represent them in a WorldCup tournament.

A player is connected to an Appearance node for each match that they participate in either as a starter or substitute. If they score a Goal the Appearance node will connect to that Goal node.

Show me the data!

We have a Neo4j browser guide that you can use to import the data into your own, local Neo4j instance if you want to play along.

:play worldcup-2018

Let’s have a look at some of the queries that we can run against this dataset:

Who’s hosted the World Cup?

MATCH path = (wc:WorldCup)-[:HOSTED_BY]->(country)
RETURN path

Most of the World Cup nodes have a single host Country. The outlier is 2002 which was hosted by Korea and Japan.

Who’s hosted the World Cup more than once?

MATCH (host:Country)<-[:HOSTED_BY]-(wc)
WITH wc, host ORDER BY wc.year
WITH host, count(*) AS times, collect(wc.year) AS years
WHERE times > 1
return host.name, times, years
╒═══════════╤═══════╤═══════════╕
│"host.name"│"times"│"years" │
╞═══════════╪═══════╪═══════════╡
│"Mexico" │2 │[1970,1986]│
├───────────┼───────┼───────────┤
│"France" │2 │[1938,1998]│
├───────────┼───────┼───────────┤
│"Brazil" │2 │[1950,2014]│
├───────────┼───────┼───────────┤
│"Italy" │2 │[1934,1990]│
└───────────┴───────┴───────────┘

This one is interesting because you might expect Germany to show up as they hosted the World Cup in 1974 and 2006. The reason they don’t is because the first World Cup was pre-reunification when they were playing as West Germany which is represented by a different node in the dataset.

Have the hosts ever won the World Cup?

MATCH (match:Match {round: "Final"})<-[hostPlayed:PLAYED_IN]-(host:Country),
(host)<-[:HOSTED_BY]-(worldCup),
(worldCup)-[:CONTAINS_MATCH]->(match),
(match)<-[oppositionPlayed:PLAYED_IN]-(opposition)
WHERE (hostPlayed.score > oppositionPlayed.score)
OR (hostPlayed.penalties > oppositionPlayed.score)
RETURN host.name, worldCup.year, hostPlayed.score + "-" + oppositionPlayed.score AS score, opposition.name
ORDER BY worldCup.year
╒════════════╤═══════════════╤═══════╤═════════════════╕
│"host.name" │"worldCup.year"│"score"│"opposition.name"│
╞════════════╪═══════════════╪═══════╪═════════════════╡
│"Uruguay" │1930 │"4-2" │"Argentina" │
├────────────┼───────────────┼───────┼─────────────────┤
│"Italy" │1934 │"2-1" │"Czechoslovakia" │
├────────────┼───────────────┼───────┼─────────────────┤
│"England" │1966 │"4-2" │"Germany FR" │
├────────────┼───────────────┼───────┼─────────────────┤
│"Germany FR"│1974 │"2-1" │"Netherlands" │
├────────────┼───────────────┼───────┼─────────────────┤
│"Argentina" │1978 │"3-1" │"Netherlands" │
├────────────┼───────────────┼───────┼─────────────────┤
│"France" │1998 │"3-0" │"Brazil" │
└────────────┴───────────────┴───────┴─────────────────┘

Who are the top scorers across all the World Cups?

MATCH (player)-->(stats)-[:SCORED_GOAL]->(goal),
(stats)-[:IN_MATCH]->()<-[:CONTAINS_MATCH]-(wc:WorldCup)
WHERE goal.type IN ["goal", "penalty"]
WITH player.name AS player, count(*) AS goals,
collect(DISTINCT wc.year) AS competitions
UNWIND competitions AS competition
WITH player, goals, competition ORDER BY player, goals, competition
RETURN player, goals, collect(competition) AS competitions
ORDER BY goals DESC
LIMIT 5
╒════════════════════════════════════╤═══════╤═════════════════════╕
│"player" │"goals"│"competitions" │
╞════════════════════════════════════╪═══════╪═════════════════════╡
│"Miroslav Klose" │16 │[2002,2006,2010,2014]│
├────────────────────────────────────┼───────┼─────────────────────┤
│"Ronaldo" │15 │[1998,2002,2006] │
├────────────────────────────────────┼───────┼─────────────────────┤
│"Gerd Mueller" │14 │[1970,1974] │
├────────────────────────────────────┼───────┼─────────────────────┤
│"Just Fontaine" │13 │[1958] │
├────────────────────────────────────┼───────┼─────────────────────┤
│"Pelé (Edson Arantes Do Nascimento)"│12 │[1958,1962,1966,1970]│
└────────────────────────────────────┴───────┴─────────────────────┘

Who’s the top scorer playing in the 2018 World Cup?

MATCH (player:Player)-->(stats)-[:SCORED_GOAL]->(goal),
(stats)-[:IN_MATCH]->()<-[:CONTAINS_MATCH]-(wc:WorldCup)
WHERE goal.type IN ["goal", "penalty"]
WITH player, count(*) AS goals
ORDER BY goals DESC
MATCH (player)-[:IN_SQUAD]->(squad:Squad {year: 2018}),
(squad:Squad)<-[:NAMED_SQUAD]-(country)
RETURN player.name, country.name, goals
LIMIT 5
╒═══════════════════╤══════════════╤═══════╕
│"player.name" │"country.name"│"goals"│
╞═══════════════════╪══════════════╪═══════╡
│"Thomas Mueller" │"Germany" │10 │
├───────────────────┼──────────────┼───────┤
│"Cristiano Ronaldo"│"Portugal" │9 │
├───────────────────┼──────────────┼───────┤
│"James Rodriguez" │"Colombia" │6 │
├───────────────────┼──────────────┼───────┤
│"Luis Suarez" │"Uruguay" │5 │
├───────────────────┼──────────────┼───────┤
│"Tim Cahill" │"Australia" │5 │
└───────────────────┴──────────────┴───────┘

Next Steps

We hope you enjoy the dataset and if you have any questions or suggestions on what we should do next let us know in the comments or send us an email to devrel@neo4j.com.

We encourage you to take the data and either build your own APIs or applications or analysis notebooks on top of it. We’d love to hear all about your ideas. You can use libraries like py2neo, spring-data-neo4j, neo4j-client or neode. We, ourselves are really excited about GraphQL, so that’s what we build next — A World Cup API in GraphQL using the GRANDstack.io. Stay tuned.

--

--