Football Transfers Graph Analysis — Linked Data Edition
I stumbled upon a Neo4j-oriented blog post about using a Graph to analyze transfer record data (provided by Transfermarkt) using the Cypher query language. Knowing that Semantic Webs of Linked Data are also about Graphs, I decided to repeat this effort with data constructed using RDF as the Data Definition Language (DDL) and Virtuoso’s SPARQL-BI (an extension of SPARQL that brings it closer to Cypher) as the Data Manipulation Language (DML).
The transfer window has closed for some leagues, but is still open for others and will reopen in a few months.
Translating the JSON document content to RDF
I mapped the JSON data to RDF-Turtle (my preferred RDF content-type for exercises like this) using LODRefine (mapping used for translation).
JSON Data
RDF Data Transformation using LOD Refine
I mapped the data outwards from the transfer, producing relations to players and to clubs.
Here is a visualization of the RDF Graph, which I produced in draw.io for easy comparison with Neo4j depictions:
I then exported the mapped results to an RDF-TURTLE document.
Loading the data
The TURTLE documents for Transfers and Leagues have now been uploaded to my fork of the football transfers GitHub repository, and can be loaded directly into any Virtuoso SPARQL endpoint (on which you have the necessary permissions) by running the following commands:
LOAD <https://raw.githubusercontent.com/danielhmills/football-transfers/master/data/football-transfers.ttl>LOAD <https://raw.githubusercontent.com/danielhmills/football-transfers/master/data/leagues.ttl>LOAD <https://raw.githubusercontent.com/danielhmills/transfermarkt2rdf/master/ontology.ttl>
Once the load is completed, we’re ready to query the data.
Why SPARQL-BI?
In its current official form from the W3C, SPARQL does not allow SELECT
list solution projections comprising values that haven’t been retrieved through a triple pattern in the body of a query. Expressions are allowed in filters, but cannot be returned to the output (a projection) derived from a SELECT
list.
Virtuoso lifts this restriction by allowing expressions in the SELECT
list that don’t necessarily have an explicit match in the body of the query — by introducing an +>
operator for dereferencing property values.
For example, the query solution for —
SELECT
?x
?name
?mbox
WHERE
{
?x foaf:knows ?y .
?y foaf:name ?name ;
foaf:mbox ?mbox .
}
— can be reconstructed in SPARQL-BI with SELECT
list expressions, as shown below —
SELECT
?x
( ?y +> foaf:name ) AS ?name
( ?y +> foaf:mbox ) AS ?mbox
WHERE
{
?x foaf:knows ?y .
}
Transfer Graph Query Examples
Top 10 Transfers
This equivalent SPARQL query returns the string values for player names and recipient clubs, in addition to their transfer values.
PREFIX : <https://transfermarkt.com/ontology#>SELECT DISTINCT
(?player +> foaf:name) AS ?playerName
xsd:integer ( ?transfer +> :hasFee ) AS ?price
(?transfer +> :fromClub +> rdfs:label) AS ?fromName
(?transfer +> :toClub +> rdfs:label) AS ?toName
WHERE
{
?transfer :transferredPlayer ?player .
}
ORDER BY DESC (?price)
LIMIT 10
- Live Query Results: https://tinyurl.com/y3wcv49a
João Félix’s transfer to Atlético Madrid is still the most valuable transfer. Englishman Harry Maguire’s transfer from Leicester to Manchester United has also overtaken fellow centre-back Matthijs de Ligt as the fourth most expensive defender transfer!
The next query to be replicated focuses on the money spent on transfers.
Money In and Money Out
Let’s also make the equivalent query for fees spent between different countries, and the most expensive transfer between each:
PREFIX : <https://transfermarkt.com/ontology#>SELECT
?clubName
?in
?out
WHERE
{
{
SELECT
(?transfer +> :fromClub +> rdfs:label) AS ?clubName
SUM(?transfer +> :hasFee) AS ?in
WHERE
{
?transfer a :Transfer .
}
}
OPTIONAL
{
SELECT
(?transfer2 +> :toClub +> rdfs:label) AS ?clubName
SUM(?transfer2 +> :hasFee) AS ?out
WHERE
{
?transfer2 a :Transfer .
}
}
}
ORDER BY DESC(?in + ?out)
LIMIT 10
- Live Query Results: https://tinyurl.com/y52f2kur
Two English clubs (Manchester City and Manchester United) have entered the Top 10 list since the original rendition was published.
Money Out
This query identifies the Top 10 clubs that have spent and received the most transfer window money for this period:
PREFIX : <https://transfermarkt.com/ontology#>SELECT
(?transfer +> :toClub +> rdfs:label) AS ?clubName
(?transfer +> :toClub +> :country +> rdfs:label) AS ?countryName
SUM(?transfer +> :hasFee) AS ?out
WHERE
{
?transfer a :Transfer .
}
ORDER BY DESC(?out)
LIMIT 10
- Live Query Results: https://tinyurl.com/y2k6zhgm
The top three teams are still from the Spanish La Liga, but Barcelona have managed to climb into second place, above Atlético Madrid.
Money Flow by Country
PREFIX : <https://transfermarkt.com/ontology#>SELECT
?fromCountryName
?toCountryName
?total
?player
xsd:integer(?fee) as ?fee
?totalTransfers
WHERE
{
{
SELECT
(?transfer +> :fromClub +> :country +> rdfs:label) AS ?fromCountryName
(?transfer +> :toClub +> :country +> rdfs:label) AS ?toCountryName
SUM(?transfer +> :hasFee) AS ?total
(COUNT(?transfer)) AS ?totalTransfers
WHERE
{
?transfer a :Transfer.
FILTER(?transfer +> :fromClub +> :country != ?transfer +> :toClub +> :country)
}
GROUP BY (?transfer +> :fromClub +> :country +> rdfs:label)
(?transfer +> :toClub +> :country +> rdfs:label)
ORDER BY DESC(?total)
}
{
SELECT
(?transfer +> :fromClub +> :country +> rdfs:label) AS ?fromCountryName
(?transfer +> :toClub +> :country +> rdfs:label) AS ?toCountryName
(?transfer +> :transferredPlayer +> foaf:name) AS ?player
MAX(?transfer +> :hasFee) AS ?fee
WHERE
{
?transfer a :Transfer .
FILTER(?transfer +> :fromClub +> :country != ?transfer +> :toClub +> :country)
}
GROUP BY (?transfer +> :fromClub +> :country +> rdfs:label)
(?transfer +> :toClub +> :country +> rdfs:label)
(?transfer +> :transferredPlayer +> foaf:name)
ORDER BY DESC(?fee)
}
}
LIMIT 10
- Live Query Results: https://tinyurl.com/yych8doq
Portugal to Spain has an additional transfer; transfers from England to Spain now have a higher total value than Spain to England; and Brazil to Spain transfers are no longer in the Top 10.
New Query: Top 10 Position Values by League
This query determines how much each league spent per-player position, and orders the results by the highest price aggregate.
PREFIX : <https://transfermarkt.com/ontology#>SELECT
(?player +> :hasPosition +> rdfs:label) AS ?position
(?transfer +> :toClub +> :league +> rdfs:label) AS ?toLeague
SUM(xsd:integer(?transfer +> :hasFee)) AS ?price
WHERE
{
?transfer :transferredPlayer ?player .
}
ORDER BY DESC (?price)
- Live Query Results: https://tinyurl.com/y4v9q8qt
The English Premier League teams have the highest combined transfer purchase value on a single position: 270.51m on centre-forwards. They heavily populate this table alongside La Liga, and Serie A, with three entries each.
Future Articles
Over time, additional posts focused on this data set will become available:
- Part II: Custom Inference & Reasoning — using artificial intelligence to assign players to teams based on transfer data
- Part: III: Native Faceted Browsing with the data and embedded HTTP URIs
- Part IV: Built-In inference and reasoning —
owl:sameAs
reconciliation to DBPedia URIs - Part V: Data Visualization with third-party applications such as Tableau
Related
- Virtuoso’s SPARQL-BI extensions to SPARQL
- Virtuoso Home Page
- About Virtuoso
- Free Evaluation Downloads
- Pre-configured Virtuoso on Amazon Machine Images (AMIs) in the Amazon Web Services (AWS) Marketplace
- Pre-configured Virtuoso in Docker Containers