SPARQL and DBpedia

Humam Fauzi
Jan 21, 2019
DBpedia Homepage

We love Wikipedia and its extensive information. Sometimes we need to research a topic and compile the results into a table. Say we want to compile all members of bands in the alternative rock genre. We could search Wikipedia for every documented alternative rock band and write the members down one by one, but that is a lot of work.

Fortunately, DBpedia comes to the rescue. DBpedia extracts structured data from Wikipedia (largely from infoboxes) and publishes it as RDF (Resource Description Framework). That means we can query it much like an SQL database, with several differences. We need to learn a few things about RDF before we can work with it.

Excerpt from the official W3C resource:

RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.

In short, RDF describes relations between entities as subject-predicate-object statements (triples), and those entities can come from sources with different schemas. We will show how to extract the relations between one page and another by querying DBpedia's RDF data with SPARQL.

Before we query DBpedia with SPARQL, let's look at how a DBpedia page is structured, so we have some intuition when writing queries.

Let's go to the DBpedia page for the band Muse. Every DBpedia page lists property-value pairs. These work like key-value pairs in a hash map, except that one property can hold several values. Say we want to know Muse's genre: we look at the dbo:genre property and read its values.
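The hash-map analogy can be made concrete with a short example. Below is a minimal Python sketch over a handful of illustrative triples. Only dbr:Muse_(band) and the property names come from the article; the `ex:` values are hypothetical placeholders, not real DBpedia data. Each RDF statement is a (subject, property, value) tuple, and grouping one subject's triples gives the property-value view shown on its page. Each property maps to a list, because one property can hold several values.

```python
from collections import defaultdict

# Illustrative triples only: the "ex:" values are hypothetical placeholders,
# not a copy of what DBpedia stores for Muse.
TRIPLES = [
    ("dbr:Muse_(band)", "dbo:genre", "ex:GenreA"),
    ("dbr:Muse_(band)", "dbo:genre", "ex:GenreB"),
    ("dbr:Muse_(band)", "foaf:name", "Muse"),
    ("ex:OtherBand", "dbo:genre", "ex:GenreA"),
]


def page_view(triples, subject):
    """Group one subject's triples into a property -> list-of-values map."""
    props = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            props[p].append(o)
    return dict(props)


if __name__ == "__main__":
    muse = page_view(TRIPLES, "dbr:Muse_(band)")
    print(muse["dbo:genre"])  # ['ex:GenreA', 'ex:GenreB']
```

Unlike a plain dictionary that maps one key to one value, dbo:genre here maps to two values, which is exactly why later queries talk about "at least one" matching genre.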

Muse (band) DBpedia page

We can get the same information by querying with SPARQL:

SELECT ?genre WHERE
{
<http://dbpedia.org/resource/Muse_(band)> <http://dbpedia.org/ontology/genre> ?genre
}
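Besides the web form, the same query can be sent over HTTP. Below is a minimal Python sketch that uses only the standard library. It assumes that DBpedia's public endpoint at https://dbpedia.org/sparql accepts a GET request with a `query` parameter, as in the SPARQL 1.1 Protocol, and that it returns JSON when asked via the Accept header. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint (assumed reachable over HTTPS).
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """SELECT ?genre WHERE
{
<http://dbpedia.org/resource/Muse_(band)> <http://dbpedia.org/ontology/genre> ?genre
}"""


def build_request(query, endpoint=ENDPOINT):
    """Put the query in a GET parameter and ask for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the request; call run_query(QUERY) to actually contact DBpedia.
    req = build_request(QUERY)
    print(req.get_method(), req.full_url)
```

Calling `run_query(QUERY)` needs network access. It returns the decoded JSON, and the result rows are under `["results"]["bindings"]`.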

Note: to run a query, copy and paste it into the query form on DBpedia's SPARQL endpoint page (https://dbpedia.org/sparql).

We can read the query as "select the variable genre, whose values come from the property dbo:genre of the page http://dbpedia.org/resource/Muse_(band)". Variables in SPARQL are marked with a question mark (?). The WHERE clause here is a single triple pattern made of three components:

  • Subject: <http://dbpedia.org/resource/Muse_(band)> (the source page)
  • Predicate: <http://dbpedia.org/ontology/genre> (the property)
  • Object: ?genre (the variable we want filled in)
Muse Genre according to DBpedia in form of HTML table
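The web form renders the results as an HTML table. When results come back as JSON instead, they follow the W3C SPARQL 1.1 Query Results JSON format. The variable names are under `head.vars`, and there is one dictionary per row under `results.bindings`. Here is a minimal sketch that flattens such results into a plain-text table. The sample is hand-written with placeholder IRIs, not real DBpedia output, and `to_rows` and `format_table` are hypothetical helper names.

```python
# Hand-written sample in the SPARQL 1.1 JSON results format; the IRIs are placeholders.
SAMPLE = {
    "head": {"vars": ["genre"]},
    "results": {
        "bindings": [
            {"genre": {"type": "uri", "value": "http://example.org/GenreA"}},
            {"genre": {"type": "uri", "value": "http://example.org/GenreB"}},
        ]
    },
}


def to_rows(result):
    """Flatten each binding to {variable: value}; unbound variables become None."""
    names = result["head"]["vars"]
    return [
        {name: (b[name]["value"] if name in b else None) for name in names}
        for b in result["results"]["bindings"]
    ]


def format_table(rows, names):
    """Render rows as a plain-text table with one column per variable."""
    widths = {n: max([len(n)] + [len(str(r[n])) for r in rows]) for n in names}
    lines = [" | ".join(n.ljust(widths[n]) for n in names)]
    lines.append("-+-".join("-" * widths[n] for n in names))
    for r in rows:
        lines.append(" | ".join(str(r[n]).ljust(widths[n]) for n in names))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(to_rows(SAMPLE), SAMPLE["head"]["vars"]))
```

A variable that a solution leaves unbound is simply missing from that row's dictionary in the JSON format, which is why `to_rows` fills in None.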

Now say we want to know which bands share a genre with Muse. We need to find every band that has at least one dbo:genre value in common with Muse. The SPARQL query looks like this:

SELECT ?genre ?band WHERE 
{
?band <http://dbpedia.org/ontology/genre> ?genre .
<http://dbpedia.org/resource/Muse_(band)> <http://dbpedia.org/ontology/genre> ?genre .
}

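We can read the query as "find the dbo:genre values of <http://dbpedia.org/resource/Muse_(band)> and bind each one to ?genre, then find every resource that has at least one dbo:genre value equal to ?genre". Because ?genre appears in both patterns, it works like a join key. A band that shares several genres with Muse appears once per shared genre, and Muse itself matches too.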
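The matching logic can be spelled out by hand. Here is a toy Python emulation of the two triple patterns over a few hypothetical triples; the `ex:` names are placeholders, not DBpedia resources. The first step collects every value ?genre can take from Muse's pattern. The second step keeps every resource whose genre matches one of those values, much like a join on ?genre. This is only an illustration of the semantics, not how a SPARQL engine works internally.

```python
# Hypothetical triples; "ex:" names are placeholders, not DBpedia resources.
MUSE = "dbr:Muse_(band)"
GENRE = "dbo:genre"
TRIPLES = [
    (MUSE, GENRE, "ex:GenreA"),
    (MUSE, GENRE, "ex:GenreB"),
    ("ex:BandX", GENRE, "ex:GenreA"),
    ("ex:BandX", GENRE, "ex:GenreB"),
    ("ex:BandY", GENRE, "ex:GenreC"),
    ("ex:SongZ", GENRE, "ex:GenreA"),
]


def same_genre(triples, resource):
    """Emulate: ?band dbo:genre ?genre . <resource> dbo:genre ?genre ."""
    # Muse's pattern: every value the shared variable ?genre can take.
    genres = {o for s, p, o in triples if s == resource and p == GENRE}
    # Band pattern: keep each (?genre, ?band) pair whose genre is one of those values.
    return [(o, s) for s, p, o in triples if p == GENRE and o in genres]


if __name__ == "__main__":
    for genre, band in same_genre(TRIPLES, MUSE):
        print(genre, band)
```

In this data, ex:SongZ stands in for a song. It shares a genre with Muse, so it slips into the results, which is the problem the next query addresses.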

The query above matches many pages, and some of them are not bands: songs and albums are included too. We need an additional filter to exclude them. The query looks like this:

SELECT ?genre ?band WHERE
{
?band <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/MusicGroup> .
?band <http://dbpedia.org/ontology/genre> ?genre .
<http://dbpedia.org/resource/Muse_(band)> <http://dbpedia.org/ontology/genre> ?genre .
}

The interpretation is the same as the previous query, with one additional condition: each ?band must have the property <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (rdf:type) with the value <http://schema.org/MusicGroup>.
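One option for reusing this query with a band other than Muse is to generate it from a template. Here is a minimal Python sketch. `same_genre_query` and `iri` are hypothetical helper names, and the type defaults to schema.org's MusicGroup, as in the query above. The `iri` check rejects characters that SPARQL does not allow inside `<...>`, so malformed input fails early instead of producing a broken query.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO_GENRE = "http://dbpedia.org/ontology/genre"
MUSIC_GROUP = "http://schema.org/MusicGroup"


def iri(value):
    """Wrap a full IRI in <...>, rejecting characters SPARQL forbids there."""
    if any(ch in '<>"{}|^`\\' or ch <= " " for ch in value):
        raise ValueError(f"not a valid IRI for a SPARQL query: {value!r}")
    return f"<{value}>"


def same_genre_query(resource, type_iri=MUSIC_GROUP):
    """Build 'resources of type_iri that share a dbo:genre with resource'."""
    return "\n".join([
        "SELECT ?genre ?band WHERE",
        "{",
        f"?band {iri(RDF_TYPE)} {iri(type_iri)} .",
        f"?band {iri(DBO_GENRE)} ?genre .",
        f"{iri(resource)} {iri(DBO_GENRE)} ?genre .",
        "}",
    ])


if __name__ == "__main__":
    print(same_genre_query("http://dbpedia.org/resource/Muse_(band)"))
```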

So far, all we get back are DBpedia URIs. If we want to store and manipulate the data, we need string values rather than URIs. Most DBpedia pages have a property that holds the page's name. Let's write a query that uses it.

SELECT ?genrename ?bandname WHERE
{
?genre <http://xmlns.com/foaf/0.1/name> ?genrename .
?band <http://xmlns.com/foaf/0.1/name> ?bandname .
?band <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/MusicGroup> .
?band <http://dbpedia.org/ontology/genre> ?genre .
<http://dbpedia.org/resource/Muse_(band)> <http://dbpedia.org/ontology/genre> ?genre
}
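Because the goal is a table, the name results can be grouped per genre. Since ?genre and ?band are not selected, two different solutions can project to the same pair of names. The sketch therefore collects band names in a set; adding DISTINCT after SELECT is the query-side alternative. The sample is hand-written in the SPARQL JSON results format, the names are hypothetical, and `bands_by_genre` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written sample in the SPARQL 1.1 JSON results format; names are hypothetical.
SAMPLE = {
    "head": {"vars": ["genrename", "bandname"]},
    "results": {"bindings": [
        {"genrename": {"type": "literal", "xml:lang": "en", "value": "Genre A"},
         "bandname": {"type": "literal", "xml:lang": "en", "value": "Band X"}},
        {"genrename": {"type": "literal", "xml:lang": "en", "value": "Genre A"},
         "bandname": {"type": "literal", "xml:lang": "en", "value": "Band Y"}},
        {"genrename": {"type": "literal", "xml:lang": "en", "value": "Genre B"},
         "bandname": {"type": "literal", "xml:lang": "en", "value": "Band X"}},
        {"genrename": {"type": "literal", "xml:lang": "en", "value": "Genre A"},
         "bandname": {"type": "literal", "xml:lang": "en", "value": "Band X"}},
    ]},
}


def bands_by_genre(result):
    """Map each genre name to its sorted, de-duplicated band names."""
    table = defaultdict(set)
    for row in result["results"]["bindings"]:
        # Keep only the plain string value; the language tag is dropped here.
        table[row["genrename"]["value"]].add(row["bandname"]["value"])
    return {genre: sorted(bands) for genre, bands in sorted(table.items())}


if __name__ == "__main__":
    for genre, bands in bands_by_genre(SAMPLE).items():
        print(genre, "->", ", ".join(bands))
```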

We can make the query more compact with PREFIX declarations, which define short aliases for long namespace IRIs (they are abbreviations, not variables). Rewritten, the query becomes:

PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sch:<http://schema.org/>
SELECT ?genrename ?bandname WHERE
{
?genre foaf:name ?genrename .
?band foaf:name ?bandname .
?band rdf:type sch:MusicGroup .
?band dbo:genre ?genre .
<http://dbpedia.org/resource/Muse_(band)> dbo:genre ?genre
}

Here foaf:name is simply shorthand for <http://xmlns.com/foaf/0.1/name>. The properties in the query now also look more like what is shown on the DBpedia page. To find the full IRI of a property, hover your mouse over it on the page, then declare its namespace as a prefix.
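A prefixed name is plain string concatenation: the namespace IRI declared for the prefix, followed by the local part. Here is a minimal Python sketch of that expansion, using the four prefixes declared in the query above. The `expand` and `prefix_header` helpers are hypothetical, and `expand` ignores SPARQL's rules on which characters a local name may contain.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbo": "http://dbpedia.org/ontology/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sch": "http://schema.org/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'prefix:local' into the full <IRI> by concatenating namespace + local part."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return "<" + prefixes[prefix] + local + ">"


def prefix_header(prefixes=PREFIXES):
    """Emit the PREFIX lines that declare the same abbreviations in a query."""
    return "\n".join(f"PREFIX {p}:<{ns}>" for p, ns in prefixes.items())


if __name__ == "__main__":
    print(expand("foaf:name"))  # <http://xmlns.com/foaf/0.1/name>
    print(prefix_header())
```

Those character rules are also why writing Muse's IRI in full is the simplest choice. As a prefixed name, the parentheses in Muse_(band) would have to be escaped with backslashes.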

Quick tip:
If you want to design a query, explore the DBpedia pages manually first so you know which distinctive properties a page has. Then use those properties as filters to eliminate unwanted results.
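That exploration can also be done with a query. A pattern whose property and value are both variables lists everything stated about a resource as subject. Here is a minimal sketch, assuming the results come back in the SPARQL JSON results format. `describe_query` and `distinct_properties` are hypothetical helpers, and the LIMIT of 200 is an arbitrary cap.

```python
def describe_query(resource_iri, limit=200):
    """List every property/value pair attached to one resource (as subject)."""
    return (
        "SELECT ?property ?value WHERE\n"
        "{\n"
        f"<{resource_iri}> ?property ?value .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def distinct_properties(result):
    """Return each property IRI found in SPARQL JSON results once, sorted."""
    return sorted({row["property"]["value"] for row in result["results"]["bindings"]})


if __name__ == "__main__":
    print(describe_query("http://dbpedia.org/resource/Muse_(band)"))
```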

That's all for now. SPARQL has many more features we could use to make our queries more precise. We hope you learned something new. Happy querying!
