How to Set Up DBPedia and Geonames on OpenLink Virtuoso

Nadjet Bouayad-Agha
Nov 7 · 3 min read

In a previous article, I described how we mined knowledge about environmental issues from DBPedia and Geonames. In this brief post, I explain how to install part of those repositories in a triplestore.

The Linked Open Data Cloud

Why use a triplestore?

DBPedia can be queried using the online DBPedia Virtuoso SPARQL editor.

Similarly, Geonames can be queried via an HTTP GET request, for example:

http://api.geonames.org/get?geonameId=6942346&username=demo
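The same request can be issued from a script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the shared "demo" account taken from the URL above is only suitable for a quick test.

```python
import urllib.parse
import urllib.request

GEONAMES_GET = "http://api.geonames.org/get"


def geonames_get_url(geoname_id, username="demo"):
    """Build the URL of a Geonames `get` request for one feature."""
    query = urllib.parse.urlencode({"geonameId": geoname_id, "username": username})
    return f"{GEONAMES_GET}?{query}"


def fetch_geonames_feature(geoname_id, username="demo", timeout=10):
    """Return the raw response body for one feature (performs a network call)."""
    url = geonames_get_url(geoname_id, username)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling geonames_get_url(6942346) rebuilds the example URL above; fetch_geonames_feature(6942346) would then send one request per feature, which is the pattern that does not scale to many lookups.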

However, this is not advisable if you need to run a large number of queries. Instead, you can install DBPedia or Geonames locally in a Virtuoso triplestore.

Getting DBPedia dumps

First, download the DBPedia dumps you need from the DBPedia downloads page, for instance in Turtle format (ttl). Useful datasets include concept labels, categories, short abstracts, categories and their broader relations (SKOS categories), Wikipedia page links, geographical coordinates, and YAGO ontology types.

As of the end of 2019, DBPedia data is also available on the DBPedia Databus, which allows Linked Open Data artifacts to be described, published, and shared via SPARQL queries. The latest DBPedia data, such as the DBPedia page wikilinks, can be downloaded from there.

Getting Geonames dumps

The Geonames data is available for download as a single dump. This is a huge, idiosyncratic file containing a sequence of RDF/XML elements, each preceded by a line of information. You therefore need a script that rewrites this content into a number of RDF/XML files that can be bulk loaded. For example, you can split the dump into 10 separate RDF/XML files, each containing up to 1 million elements.
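The splitting step could be sketched as follows. This is not the script used for this post but an illustration under stated assumptions: each record sits on a single line as a complete RDF/XML document wrapped in its own rdf:RDF element, all records declare the same namespaces (so the opening tag of the first record in a chunk can be reused), and every other line is an information line to skip. The function name and output file names are hypothetical.

```python
import re
from pathlib import Path

# Matches one single-line RDF/XML document; captures its opening tag and its body.
RDF_DOC = re.compile(r"(<rdf:RDF\b[^>]*>)(.*)</rdf:RDF>\s*$", re.DOTALL)


def split_geonames_dump(lines, out_dir, chunk_size=1_000_000):
    """Write the RDF/XML records found in `lines` to numbered files in `out_dir`.

    Each output file holds up to `chunk_size` records inside a single rdf:RDF
    wrapper. Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, out, count = [], None, 0
    for line in lines:
        match = RDF_DOC.search(line)
        if match is None:
            continue  # information line or blank line: nothing to load
        open_tag, body = match.groups()
        if out is None or count == chunk_size:
            if out is not None:  # close the chunk that just filled up
                out.write("</rdf:RDF>\n")
                out.close()
            path = out_dir / f"geonames_{len(written):03d}.rdf"
            out = path.open("w", encoding="utf-8")
            out.write('<?xml version="1.0" encoding="UTF-8"?>\n' + open_tag + "\n")
            written.append(path)
            count = 0
        out.write(body + "\n")
        count += 1
    if out is not None:
        out.write("</rdf:RDF>\n")
        out.close()
    return written
```

Because the function only iterates over lines, the dump can be streamed from an open file handle without loading it into memory, for example by passing the handle of the unzipped dump and a target folder that Virtuoso is allowed to read.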

OpenLink Virtuoso

The Virtuoso Open Source Edition can be downloaded from here.

Once Virtuoso is installed, run it and start VOS Database.

Go to the Virtuoso admin page in the browser (you may have to give it a bit of time to start): http://localhost:8890/conductor/

Log in with the default credentials (dba/dba).

For testing, use the “Quad Store Upload” tab to upload a ttl file to a named graph IRI of your choice, such as “http://localhost:8890/DBPedia”.

Next you can test the triplestore in the SPARQL tab or directly at the local endpoint. For example:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(*) AS ?count) WHERE
{?category skos:broader <http://dbpedia.org/resource/Category:Environmental_issues>}
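The same test can be run from a script. The sketch below uses only the Python standard library and assumes the local endpoint is at http://localhost:8890/sparql, which is Virtuoso's default; the function names are hypothetical, and the query is written in standard SPARQL 1.1 with an explicit skos prefix so that it does not rely on prefixes predefined by the server.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://localhost:8890/sparql"  # assumption: Virtuoso's default endpoint

COUNT_QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(*) AS ?n) WHERE {
  ?category skos:broader <http://dbpedia.org/resource/Category:Environmental_issues>
}
"""


def build_request(query, endpoint=SPARQL_ENDPOINT):
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def first_value(results, variable):
    """Extract `variable` from the first row of a SPARQL JSON result document."""
    return results["results"]["bindings"][0][variable]["value"]


def count_subcategories(endpoint=SPARQL_ENDPOINT):
    """Run COUNT_QUERY against the endpoint (performs a network call)."""
    with urllib.request.urlopen(build_request(COUNT_QUERY, endpoint)) as resp:
        return int(first_value(json.load(resp), "n"))
```

With Virtuoso running and the SKOS categories loaded, count_subcategories() should return the same number as the query typed into the SPARQL tab.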

However, the upload might fail for bigger files.

For bigger files, and for uploading multiple files at once, it is best to use the bulk loader.

Bulk Loading DBPedia Dumps

To bulk load files from anywhere (and not just the Virtuoso import folder), you must add your folder to the DirsAllowed property in the Virtuoso configuration file virtuoso.ini. For example, assuming that the dumps are in /tmp/virtuoso_db/dbpedia/ttl, you can add the path /tmp/virtuoso_db to DirsAllowed. You must restart Virtuoso for the changes in virtuoso.ini to take effect.
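As an illustration, the entry sits in the [Parameters] section of virtuoso.ini. The existing values shown here (. and ../vad) are an assumption and differ between installations, so keep whatever your file already lists and append the new path to it.

```ini
[Parameters]
; Other settings in this section stay unchanged.
; The first two entries are placeholders for whatever your file already contains.
DirsAllowed = ., ../vad, /tmp/virtuoso_db
```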

Once Virtuoso is back up and running, go to the Interactive SQL (ISQL) window and register the files to be loaded by typing in:

ld_dir('/tmp/virtuoso_db/dbpedia/ttl/', '*.ttl', 'http://localhost:8890/DBPedia');

You can then perform the bulk load of all the registered files, followed by a checkpoint so that the loaded data is persisted, by typing in:

rdf_loader_run();
checkpoint;

You can monitor the number of triples being uploaded by performing the following SPARQL query on the local endpoint:

SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }

The final count was 49,329,082 triples, taking up more than 11 GB.
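Rather than re-running the count by hand, the check can be automated. The helper below is a hypothetical sketch: get_count stands for any function of yours that runs the count query on the local endpoint and returns an integer. It polls until the count stops changing, which is only a heuristic for "the load has finished" and not a guarantee.

```python
import time


def wait_until_stable(get_count, interval=30.0, stable_polls=3, sleep=time.sleep):
    """Poll `get_count()` until it returns the same value `stable_polls` times in a row.

    Prints each observed count and returns the final, stable one.
    """
    last, repeats = None, 0
    while True:
        current = get_count()
        print(f"{current:,} triples loaded")
        # Count consecutive identical readings; any change resets the streak.
        repeats = repeats + 1 if current == last else 1
        if repeats >= stable_polls:
            return current
        last = current
        sleep(interval)
```

The sleep parameter exists only so that the waiting can be replaced in tests; in normal use the default time.sleep is fine, and a longer interval keeps the monitoring query from competing with the loader.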

Bulk Loading Geonames Dumps

The same procedure applies to bulk loading the Geonames RDF/XML files. The final count is more than 166 million triples.

Voilà!

Written by Nadjet Bouayad-Agha

KAWARISM

Expert in Natural Language Processing, Natural Language Generation & Knowledge Engineering
