Football Transfers Graph Analysis — Linked Data Edition

Daniel Heward-Mills
OpenLink Virtuoso Weblog
6 min readSep 13, 2019

I stumbled upon a Neo4j-oriented blog post about using a Graph to analyze transfer record data (provided by Transfermarkt) using the Cypher query language. Knowing that Semantic Webs of Linked Data are also about Graphs, I decided to repeat this effort with data constructed using RDF as the Data Definition Language (DDL) and Virtuoso’s SPARQL-BI (an extension of SPARQL that brings it closer to Cypher) as the Data Manipulation Language (DML).

The transfer window has closed for some leagues, but is still open for others and will reopen in a few months.

Translating the JSON document content to RDF

I mapped the JSON data to RDF-Turtle (my preferred RDF content-type for exercises like this) using LODRefine (mapping used for translation).

JSON Data

RDF Data Transformation using LOD Refine

I mapped the data outwards from the transfer, producing relations to players and to clubs.

Here is a visualization of the RDF Graph, which I produced in draw.io for easy comparison with Neo4j depictions:

I then exported the mapped results to an RDF-TURTLE document.

OpenLink Structured Data Sniffer view of RDF-Turtle Document Content Snippet

Loading the data

The TURTLE documents for Transfers and Leagues have now been uploaded to my fork of the football transfers GitHub repository, and can be loaded directly into any Virtuoso SPARQL endpoint (on which you have the necessary permissions) by running the following commands:

LOAD <https://raw.githubusercontent.com/danielhmills/football-transfers/master/data/football-transfers.ttl>LOAD <https://raw.githubusercontent.com/danielhmills/football-transfers/master/data/leagues.ttl>LOAD <https://raw.githubusercontent.com/danielhmills/transfermarkt2rdf/master/ontology.ttl>

Once the load is completed, we’re ready to query the data.

Why SPARQL-BI?

In its current official form from the W3C, SPARQL does not allow SELECT list solution projections comprising values that haven’t been retrieved through a triple pattern in the body of a query. Expressions are allowed in filters, but cannot be returned to the output (a projection) derived from a SELECT list.

Virtuoso lifts this restriction by allowing expressions in the SELECT list that don’t necessarily have an explicit match in the body of the query — by introducing an +> operator for dereferencing property values.

For example, the query solution for —

SELECT 
?x
?name
?mbox
WHERE
{
?x foaf:knows ?y .
?y foaf:name ?name ;
foaf:mbox ?mbox .
}

— can be reconstructed in SPARQL-BI with SELECT list expressions, as shown below —

SELECT
?x
( ?y +> foaf:name ) AS ?name
( ?y +> foaf:mbox ) AS ?mbox
WHERE
{
?x foaf:knows ?y .
}

Transfer Graph Query Examples

Top 10 Transfers

This equivalent SPARQL query returns the string values for player names and recipient clubs, in addition to their transfer values.

PREFIX : <https://transfermarkt.com/ontology#>SELECT DISTINCT
(?player +> foaf:name) AS ?playerName
xsd:integer ( ?transfer +> :hasFee ) AS ?price
(?transfer +> :fromClub +> rdfs:label) AS ?fromName
(?transfer +> :toClub +> rdfs:label) AS ?toName
WHERE
{
?transfer :transferredPlayer ?player .
}
ORDER BY DESC (?price)
LIMIT 10
Top 10 Most Expensive Transfers
Harry Maguire

João Félix’s transfer to Atlético Madrid is still the most valuable transfer. Englishman Harry Maguire’s transfer from Leicester to Manchester United has also overtaken fellow centre-back Matthijs de Ligt as the fourth most expensive defender transfer!

The next query to be replicated focuses on the money spent on transfers.

Money In and Money Out

Let’s also make the equivalent query for fees spent between different countries, and the most expensive transfer between each:

PREFIX : <https://transfermarkt.com/ontology#>SELECT 
?clubName
?in
?out
WHERE
{
{
SELECT
(?transfer +> :fromClub +> rdfs:label) AS ?clubName
SUM(?transfer +> :hasFee) AS ?in
WHERE
{
?transfer a :Transfer .
}
}
OPTIONAL
{
SELECT
(?transfer2 +> :toClub +> rdfs:label) AS ?clubName
SUM(?transfer2 +> :hasFee) AS ?out
WHERE
{
?transfer2 a :Transfer .
}
}
}
ORDER BY DESC(?in + ?out)
LIMIT 10
Top 10 total money in and out by club

Two English clubs (Manchester City and Manchester United) have entered the Top 10 list since the original rendition was published.

Money Out

This query identifies the Top 10 clubs that have spent and received the most transfer window money for this period:

PREFIX : <https://transfermarkt.com/ontology#>SELECT 
(?transfer +> :toClub +> rdfs:label) AS ?clubName
(?transfer +> :toClub +> :country +> rdfs:label) AS ?countryName
SUM(?transfer +> :hasFee) AS ?out
WHERE
{
?transfer a :Transfer .
}
ORDER BY DESC(?out)
LIMIT 10
Top 10 Spenders by league country

The top three teams are still from the Spanish La Liga, but Barcelona have managed to climb into second place, above Atlético Madrid.

Money Flow by Country

PREFIX : <https://transfermarkt.com/ontology#>SELECT 
?fromCountryName
?toCountryName
?total
?player
xsd:integer(?fee) as ?fee
?totalTransfers
WHERE
{
{
SELECT
(?transfer +> :fromClub +> :country +> rdfs:label) AS ?fromCountryName
(?transfer +> :toClub +> :country +> rdfs:label) AS ?toCountryName
SUM(?transfer +> :hasFee) AS ?total
(COUNT(?transfer)) AS ?totalTransfers
WHERE
{
?transfer a :Transfer.
FILTER(?transfer +> :fromClub +> :country != ?transfer +> :toClub +> :country)
}
GROUP BY (?transfer +> :fromClub +> :country +> rdfs:label)
(?transfer +> :toClub +> :country +> rdfs:label)
ORDER BY DESC(?total)
}
{
SELECT
(?transfer +> :fromClub +> :country +> rdfs:label) AS ?fromCountryName
(?transfer +> :toClub +> :country +> rdfs:label) AS ?toCountryName
(?transfer +> :transferredPlayer +> foaf:name) AS ?player
MAX(?transfer +> :hasFee) AS ?fee
WHERE
{
?transfer a :Transfer .
FILTER(?transfer +> :fromClub +> :country != ?transfer +> :toClub +> :country)
}
GROUP BY (?transfer +> :fromClub +> :country +> rdfs:label)
(?transfer +> :toClub +> :country +> rdfs:label)
(?transfer +> :transferredPlayer +> foaf:name)
ORDER BY DESC(?fee)
}
}
LIMIT 10
Top 10 money transfer amounts between countries, and the most expensive transfer for each

Portugal to Spain has an additional transfer; transfers from England to Spain now have a higher total value than Spain to England; and Brazil to Spain transfers are no longer in the Top 10.

New Query: Top 10 Position Values by League

This query determines how much each league spent per-player position, and orders the results by the highest price aggregate.

PREFIX : <https://transfermarkt.com/ontology#>SELECT
(?player +> :hasPosition +> rdfs:label) AS ?position
(?transfer +> :toClub +> :league +> rdfs:label) AS ?toLeague
SUM(xsd:integer(?transfer +> :hasFee)) AS ?price
WHERE
{
?transfer :transferredPlayer ?player .
}
ORDER BY DESC (?price)
Top 10 expenditures per player position, by league

The English Premier League teams have the highest combined transfer purchase value on a single position: 270.51m on centre-forwards. They heavily populate this table alongside La Liga, and Serie A, with three entries each.

Future Articles

Over time, additional posts focused on this data set will become available:

  • Part II: Custom Inference & Reasoning — using artificial intelligence to assign players to teams based on transfer data
  • Part: III: Native Faceted Browsing with the data and embedded HTTP URIs
  • Part IV: Built-In inference and reasoning — owl:sameAs reconciliation to DBPedia URIs
  • Part V: Data Visualization with third-party applications such as Tableau

Related

--

--