How to: Apache Jena Fuseki — 3.x.x

Rricha Jalota
Dec 21, 2017


If you are into the Semantic Web and Linked Data, chances are you have heard of (and probably worked with) Apache Jena. And if you are just starting out, you are at the right place! For starters,

Apache Jena is a free and open source Java framework for building Semantic Web and Linked Data applications. It is composed of different APIs interacting together to process RDF data. Apache Jena Fuseki is a SPARQL server which can be run as an operating system service, as a Java web application (WAR file), and as a standalone server.

In this post, we’ll run Fuseki as a standalone server, i.e. from the command line. So if you want to create a SPARQL endpoint locally, download Apache Jena and Apache Jena Fuseki from the Apache Jena download page and extract both archives (this post uses version 3.5.0). But before getting started, let’s understand the various commands that Jena offers to run a Fuseki server.

  1. ./fuseki-server --update --mem /ds : This command creates an empty, non-persistent, in-memory dataset, i.e. the data is lost when the service is stopped. The --update argument enables writes to the datastore, so we can upload data while the server is running. Without it, the server services read requests only. /ds is the name under which our dataset will be accessible over HTTP and in Fuseki.
  2. ./fuseki-server --file=path/to/file /ds : creates an empty, in-memory dataset, then loads the given file in the memory.
  3. ./fuseki-server --loc=path/to/DB /ds: tells the server to use an existing TDB database or create an empty one if it doesn’t exist.
  4. ./fuseki-server --config=path/to/configuration/file: constructs one or more service endpoints based on the configuration description. Note that unlike other commands, it doesn’t require a service name (such as /ds) because we provide it in the configuration file itself.
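If you start the server from a script rather than by hand, the four modes above can be summarised as a small helper that assembles the argument list. This is only a sketch: the function name fuseki_argv and its parameters are made up for this post, and the flags are exactly the ones listed above.

```python
def fuseki_argv(mode, target=None, name="/ds", update=False):
    """Return the argument list for one of the four launch modes listed above."""
    if mode not in ("mem", "file", "loc", "config"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode != "mem" and target is None:
        raise ValueError(f"mode {mode!r} needs a file, directory or config path")

    argv = ["./fuseki-server"]
    if update:
        # Without --update the server only answers read requests.
        argv.append("--update")
    if mode == "mem":
        return argv + ["--mem", name]
    if mode == "config":
        # The service name comes from the configuration file, so none is appended.
        return argv + [f"--config={target}"]
    # "file" and "loc" both take a path plus the service name.
    return argv + [f"--{mode}={target}", name]


print(" ".join(fuseki_argv("mem", update=True)))
print(" ".join(fuseki_argv("loc", "path/to/DB")))
```

The returned list can be handed to subprocess.run with the Fuseki folder as the working directory.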

We will first try to create a SPARQL server against a simple RDF file (example.ttl). Our example.ttl looks like:

<http://dbpedia.org/resource/Barack_Obama> <http://dbpedia.org/ontology/spouse> <http://dbpedia.org/resource/Michelle_Obama> .
<http://dbpedia.org/resource/Barack_Obama> <http://dbpedia.org/ontology/award> <http://dbpedia.org/resource/Nobel_Peace_Prize> .

And here’s a snippet of our main working directory:

apache-jena-3.5.0 apache-jena-fuseki-3.5.0 example.ttl

If you have a different directory structure, please change the file paths in the commands accordingly. Run the server from the terminal by executing:

cd apache-jena-fuseki-3.5.0/
./fuseki-server --file=../example.ttl /ds

The terminal should look something like this:

$ apache-jena-fuseki-3.5.0/fuseki-server --file=example.ttl /ds
[2017-12-04 17:30:09] Server INFO Dataset: in-memory: load file: example.ttl
[2017-12-04 17:30:09] Server INFO Running in read-only mode for /ds
[2017-12-04 17:30:09] Server INFO Apache Jena Fuseki 3.5.0
[2017-12-04 17:30:09] Config INFO FUSEKI_HOME=/home/<USER>/v2-fuseki/apache-jena-fuseki-3.5.0
[2017-12-04 17:30:09] Config INFO FUSEKI_BASE=/home/<USER>/v2-fuseki/run
[2017-12-04 17:30:09] Config INFO Shiro file: file:///home/<USER>/v2-fuseki/run/shiro.ini
[2017-12-04 17:30:11] Config INFO Register: /ds
[2017-12-04 17:30:11] Server INFO Started 2017/12/04 17:30:11 CET on port 3030

So now, we have our SPARQL endpoint running on http://localhost:3030/.

Apache Jena Fuseki homepage

Click on the query button to run a query against the dataset.

If the SPARQL ENDPOINT field is empty, click on the ‘info’ tab, copy any one of the SPARQL QUERY links and paste it into the SPARQL ENDPOINT field in the query tab.

Now we are ready to run a SPARQL query against our dataset. From the example queries, select the Selection of triples option and run the resulting query. If we get the triples from our dataset as output, then it’s a success! Our fuseki server is up and successfully running!
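The same check can be done without the browser, because the query service speaks the standard SPARQL protocol over HTTP. Below is a minimal Python sketch using only the standard library. It assumes the server is running locally on port 3030 and that the query service is published at /ds/query (one of the SPARQL QUERY links on the ‘info’ tab); the helper names build_request and run_select are made up for this post.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the dataset is served as /ds, so its query service is /ds/query.
ENDPOINT = "http://localhost:3030/ds/query"

QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 25
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query):
    """Send the query and return each result row as a plain dictionary."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        results = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


# Usage (needs the server from the steps above to be running):
#     for row in run_select(ENDPOINT, QUERY):
#         print(row["subject"], row["predicate"], row["object"])
```

With example.ttl loaded, the rows returned should be the triples from that file.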

Apache Jena also offers a high-performance RDF store named TDB. There are two versions of TDB available, but we will look at version 1 (version 2 is described in the Jena documentation). We first load a dataset into the TDB datastore. For this post, we download and extract the ‘labels’ dataset from the DBpedia downloads and save it in our working directory. Then we make a folder for the TDB datastore in the same directory.

mkdir data
cd data
mkdir dataForTDB

Now on the terminal, change the working directory to the main directory.

cd ..

Our main working directory now has the following folders and files:

apache-jena-3.5.0 data apache-jena-fuseki-3.5.0 example.ttl labels_en.ttl

Run the following commands to load the dataset (labels_en.ttl) in the TDB datastore.

export JENA_HOME=apache-jena-3.5.0
export FUSEKI_HOME=apache-jena-fuseki-3.5.0
APACHE_JENA_BIN=apache-jena-3.5.0/bin
JENA_FUSEKI_JAR=apache-jena-fuseki-3.5.0/fuseki-server.jar
$APACHE_JENA_BIN/tdbloader2 --loc=data/dataForTDB labels_en.ttl

The tdbloader2 script creates a database in the given location from the input file ‘labels_en.ttl’. (Just a heads up: the database creation might take some time, so you can go for a walk in the meantime!) Once it’s done, we are ready to run the Fuseki server against our TDB datastore.

apache-jena-fuseki-3.5.0/fuseki-server --loc=data/dataForTDB /ds

Go to http://localhost:3030/ on your browser and follow the steps mentioned above to check if querying against TDB is successful.

Diving into Full Text Search

A good reason to use Jena Fuseki is its full-text search feature, backed by Lucene or Elasticsearch, which lets applications perform indexed full-text searches within SPARQL queries. If you’re wondering what an indexed full-text search is, hold on for a while; we’ll get to it!

Suppose we have to search for the resources which have occurrences of the word ‘movement’ in their rdfs:label property. Normally we would first search for the resources with rdfs:label property and then apply regex in FILTER to look for ‘movement’ in the object.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?object
WHERE {
?subject rdfs:label ?object .
FILTER regex(?object, "movement")
}
LIMIT 25

If there are many such statements and many such uses of regex, it may be appropriate to consider an alternative approach. Apache Jena Fuseki provides an efficient way to execute such queries by means of text indexes. A text index gives the application indexed access to the internal structure of string literals, rather than treating such literals as opaque items (which cannot be queried directly). Unlike FILTER, an index can set the values of variables. We can write the above query using the full-text search functionality via a special “ARQ property function extension”, text:query. ARQ is the query engine for Jena that supports the SPARQL RDF query language.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?subject ?object
WHERE {
?subject text:query (rdfs:label "movement") .
?subject rdfs:label ?object .
}
LIMIT 25

This query makes a text query for ‘movement’ on the rdfs:label property; and then looks in the RDF data and retrieves the complete label for each match.
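To get a feel for the difference between the two queries, here is a toy Python sketch. It is not how Lucene works internally; it only contrasts scanning every label with a regex against looking a word up in a prebuilt word-to-subject map. The sample URIs and labels are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample data: these URIs and labels are invented for this sketch.
LABELS = {
    "http://example.org/resource/A": "Civil rights movement",
    "http://example.org/resource/B": "Art Nouveau",
    "http://example.org/resource/C": "Labour movement in Europe",
}


def regex_scan(labels, word):
    """FILTER regex style: test every label, one after another."""
    pattern = re.compile(word)
    return sorted(uri for uri, label in labels.items() if pattern.search(label))


def build_index(labels):
    """Map each lower-cased token to the set of subjects whose label contains it."""
    index = defaultdict(set)
    for uri, label in labels.items():
        for token in re.findall(r"\w+", label.lower()):
            index[token].add(uri)
    return index


def index_lookup(index, word):
    """text:query style: one dictionary lookup, no scan over the labels."""
    return sorted(index.get(word.lower(), set()))


index = build_index(LABELS)
print(regex_scan(LABELS, "movement"))
print(index_lookup(index, "movement"))
```

Both calls print the subjects A and C. The scan has to touch every label on every query, while the index pays that cost once when it is built. Note one difference in behaviour: the regex matches substrings, whereas the toy index only matches whole tokens.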

How to create the text index?

In Apache Jena, assemblers are used to describe a text dataset, which has an underlying RDF dataset and a text index. A text index has an entity map, which defines the properties to index, the name of the Lucene field for each property, and the field that stores the URI itself.

We first create a folder to store the indexes.

On the terminal,

cd data
mkdir luceneIndexing
cd ..

Our assembler file “fuseki-textsearch.ttl” for indexing is given below:

@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix : <http://localhost/jena_example/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .
# A TDB dataset used for RDF storage
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "data/dataForTDB" ; # path to TDB
    .
# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:data/luceneIndexing> ; # path where the index will be created
    text:entityMap <#entMap> ;
    .
<#entMap> a text:EntityMap ;
    text:defaultField "text" ;
    text:entityField "uri" ;
    text:map (
        # RDF label abstracts
        [ text:field "text" ;
          text:predicate <http://www.w3.org/2000/01/rdf-schema#label> ;
          text:analyzer [
              a text:StandardAnalyzer ]
        ]
    ) .
<#service_text_tdb> rdf:type fuseki:Service ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset ;
    .

As can be seen from the entity map, we use the rdfs:label property for indexing but you can use any other property as well that is present in your dataset or add more properties to index!

On the terminal, run:

java -Xmx16G -cp $JENA_FUSEKI_JAR jena.textindexer --desc=fuseki-textsearch.ttl

The above command will create a text index using the assembler file “fuseki-textsearch.ttl” and now we should be able to use the full-text search functionality! Run the following command on the terminal and check if “text:query” gives appropriate results.

java -Xmx32G -jar apache-jena-fuseki-3.5.0/fuseki-server.jar --conf=fuseki-textsearch.ttl --timeout=10000
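Once the server is up, the text:query check can also be scripted. The Python sketch below builds the full-text query from this post for an arbitrary search term and sends it as a form-encoded POST, which the SPARQL protocol allows. It assumes the endpoint http://localhost:3030/ds/query, which follows from fuseki:name "ds" and fuseki:serviceQuery "query" in the assembler file. The helper names are made up for this post, and only the characters that would break the SPARQL string literal are escaped; Lucene's own query syntax is passed through untouched.

```python
import json
import urllib.parse
import urllib.request

# From the assembler file: fuseki:name "ds" and fuseki:serviceQuery "query".
ENDPOINT = "http://localhost:3030/ds/query"


def text_query(term, limit=25):
    """Return the full-text SPARQL query from this post for the given term."""
    # Escape backslashes and double quotes so the term stays one string literal.
    literal = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "SELECT ?subject ?object\n"
        "WHERE {\n"
        f'  ?subject text:query (rdfs:label "{literal}") .\n'
        "  ?subject rdfs:label ?object .\n"
        "}\n"
        f"LIMIT {int(limit)}\n"
    )


def build_request(endpoint, query):
    """Build a SPARQL protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def search_labels(term, endpoint=ENDPOINT):
    """Return (subject, label) pairs for the subjects matched by the text index."""
    request = build_request(endpoint, text_query(term))
    with urllib.request.urlopen(request) as response:
        bindings = json.load(response)["results"]["bindings"]
    return [(row["subject"]["value"], row["object"]["value"]) for row in bindings]


# Usage (needs the Fuseki server started with the command above):
#     for subject, label in search_labels("movement"):
#         print(subject, label)
```

If the index was built correctly, every label printed should contain the search word.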

Typical reasons for Fuseki Server to not run against the dataset

  1. Problem with the assembler file.
    Check if the paths to the TDB database and the Lucene index are correct.
  2. Using the wrong command to run the “.ttl” file
    Using the --desc=filename.ttl argument instead of --conf=filename.ttl often gives an error with two matches to the dataset. This is because the --desc argument works with only one dataset, so if you are trying to run a text dataset against the server which has an RDF dataset defined, use the --conf argument instead.
  3. Check if the paths for FUSEKI_HOME and JENA_HOME have been set correctly.
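
The path-related checks from this list are easy to automate before starting the server. Here is a small Python sketch, assuming the directory layout used in this post; the function name preflight is made up, and the script only checks that the paths exist, not that their contents are valid.

```python
import os
from pathlib import Path


def preflight(env, tdb_dir, index_dir, config_file):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in ("FUSEKI_HOME", "JENA_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")
    for label, path in (("TDB database", tdb_dir), ("Lucene index", index_dir)):
        if not Path(path).is_dir():
            problems.append(f"{label} directory not found: {path}")
    if not Path(config_file).is_file():
        problems.append(f"assembler file not found: {config_file}")
    return problems


# Paths as used in this post, relative to the main working directory.
for problem in preflight(
    os.environ, "data/dataForTDB", "data/luceneIndexing", "fuseki-textsearch.ttl"
):
    print("WARNING:", problem)
```

Run it from the main working directory; no output means the environment variables and paths look fine.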

If you are aware of more reasons, feel free to mention them in the comments below! The code above can be found at https://github.com/dice-group/fuseki-sample-setup.
