How to set up DBPedia and Geonames on OpenLink Virtuoso

Nadjet Bouayad-Agha
3 min read · Nov 7, 2019

In an earlier article, I described how we mined knowledge about environmental issues from DBPedia and Geonames. In this brief post, I explain how to install parts of those repositories in a triplestore.

The Linked Open Data Cloud

Why use a triplestore?

DBPedia can be queried using the online DBPedia Virtuoso SPARQL editor at https://dbpedia.org/sparql.

Similarly, Geonames can be queried via an HTTP GET request to its web services API, for example:

http://api.geonames.org/get?geonameId=6942346&username=demo
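A minimal Python sketch of such a request (note that the demo account shown in the URL above is heavily rate-limited and meant for documentation only, so you would normally substitute your own Geonames username):

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def geonames_get_url(geoname_id: int, username: str = "demo") -> str:
    """Build the URL for the Geonames 'get' web service."""
    query = urlencode({"geonameId": geoname_id, "username": username})
    return f"http://api.geonames.org/get?{query}"


url = geonames_get_url(6942346)
print(url)

# Fetching the URL returns an RDF/XML description of the feature:
# xml_doc = urlopen(url).read().decode("utf-8")
```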

However, this is not advisable if large numbers of queries are to be performed. Instead, you can install DBPedia or Geonames locally in a Virtuoso triplestore.

Getting DBPedia dumps

First, we need to download the DBPedia dumps from the DBPedia downloads page. Download the dumps you need, say in Turtle format (ttl): for example, concept labels, short abstracts, categories and their broader relations (SKOS categories), Wikipedia page links, geographical coordinates, or YAGO ontology types. Note that the downloaded files are compressed as .ttl.bz2.

As of late 2019, DBPedia data is also published on the DBPedia Databus (https://databus.dbpedia.org), which allows description, publication and sharing of Linked Open Data artifacts via SPARQL queries. The latest DBPedia data, such as the page wikilinks dataset, can be downloaded from there.

Getting Geonames dumps

The Geonames information is available for download from the Geonames download server (https://download.geonames.org/all-geonames-rdf.zip). It decompresses to a huge text file containing a sequence of Geonames URIs, each followed on the next line by its RDF/XML tree.

So you must write a script that processes this text file and transforms it into one or more RDF/XML files. I wrote a script that outputs an RDF/XML file for every 100K Geonames URIs.
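A script along these lines can be sketched as follows, assuming the dump alternates one line holding a Geonames URI with one line holding that feature's complete single-line RDF/XML document. The batch size, output filenames and namespace declarations below are illustrative choices, not necessarily the exact ones I used:

```python
import re

# Records per output file (the post uses batches of 100K URIs).
BATCH_SIZE = 100_000

# Namespaces declared on the combined root element; these are
# assumptions based on what the Geonames RDF dump typically uses.
RDF_HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
              'xmlns:gn="http://www.geonames.org/ontology#">\n')
RDF_FOOTER = '</rdf:RDF>\n'


def strip_envelope(doc: str) -> str:
    """Remove the XML declaration and the outer <rdf:RDF> element of one
    single-line RDF/XML document, keeping only its child elements."""
    doc = re.sub(r'<\?xml[^>]*\?>', '', doc)
    doc = re.sub(r'<rdf:RDF[^>]*>', '', doc, count=1)
    doc = re.sub(r'</rdf:RDF>\s*$', '', doc)
    return doc.strip()


def split_dump(lines, batch_size=BATCH_SIZE):
    """Yield (filename, content) pairs, one per batch of records.
    `lines` alternates between a Geonames URI and its RDF/XML tree."""
    batch, n_files = [], 0
    for i, line in enumerate(lines):
        if i % 2 == 1:  # odd lines hold the RDF/XML payload
            batch.append(strip_envelope(line))
        if len(batch) == batch_size:
            n_files += 1
            yield (f"geonames_{n_files:04d}.rdf",
                   RDF_HEADER + "\n".join(batch) + "\n" + RDF_FOOTER)
            batch = []
    if batch:  # flush the last, possibly partial, batch
        n_files += 1
        yield (f"geonames_{n_files:04d}.rdf",
               RDF_HEADER + "\n".join(batch) + "\n" + RDF_FOOTER)
```

In practice you would iterate over the open dump file and write each yielded `content` to its `filename` in the folder the bulk loader will read from.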

OpenLink Virtuoso

The Virtuoso Open Source Edition can be downloaded from GitHub (https://github.com/openlink/virtuoso-opensource).

Once Virtuoso is installed, run it and start VOS Database.

Go to the Virtuoso administration interface (Conductor) in the browser (you may have to give it a bit of time to start): http://localhost:8890/conductor/

Log in with the default credentials (dba/dba).

In the “Quad Store Upload” tab (after selecting “Linked Data”), you can, for testing, upload a ttl file to a specified named graph IRI, such as “http://localhost:8890/DBPedia”.

Next, you can test the triplestore in the SPARQL tab or directly at the local endpoint (http://localhost:8890/sparql). For example, to count the direct subcategories of the environmental issues category (the skos: prefix is predefined in Virtuoso):

SELECT (COUNT(*) AS ?n) WHERE
{ ?category skos:broader <http://dbpedia.org/resource/Category:Environmental_issues> }

However, the upload might fail for bigger files. For larger files, and for uploading many files at once, it is best to use Virtuoso's bulk loader.

Bulk Loading DBPedia Dumps

In order to bulk load files from anywhere (and not just the Virtuoso import folder), you must add your folder to the DirsAllowed property in the Virtuoso configuration file, virtuoso.ini, and restart Virtuoso for the change to take effect. For example, assuming the dumps are in /tmp/virtuoso_db/dbpedia/ttl, you can add the path /tmp/virtuoso_db to DirsAllowed.
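For illustration, the edited section of virtuoso.ini might look as follows (the entries other than /tmp/virtuoso_db are typical defaults and may differ on your installation):

```ini
[Parameters]
...
DirsAllowed = ., ../vad, /tmp/virtuoso_db
```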

Once Virtuoso is back up and running, go to the Interactive SQL (ISQL) window and register the files to be loaded by typing in:

ld_dir('/tmp/virtuoso_db/dbpedia/ttl/', '*.ttl.bz2', 'http://localhost:8890/DBPedia');

You can then perform the bulk load of all the registered files by typing in:

rdf_loader_run();
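While the loader runs, you can also check per-file progress from ISQL via the bulk loader's load_list table, and, as the Virtuoso bulk-loading documentation recommends, issue a checkpoint once loading finishes so the loaded data is persisted:

```sql
-- ll_state is 0 while pending, 1 while loading, 2 when done;
-- ll_error is non-NULL if a file failed to load
SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;

-- run once rdf_loader_run() has finished, to persist the loaded triples
checkpoint;
```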

You can monitor the number of triples being uploaded by performing the following SPARQL query on the local endpoint:

SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }

Bulk Loading Geoname Dumps

The same must be done for bulk loading the Geonames RDF/XML files produced earlier (their folder must likewise be covered by DirsAllowed). The bulk loading goes as follows:

ld_dir('/tmp/virtuoso_db/geonames/rdf/', '*.rdf', 'http://localhost:8890/Geonames');

followed, as before, by rdf_loader_run();

Voilà!
