Pro tips for writing scripts

Published in

Oxford Semantic Technologies

8 min readJul 16, 2020

Using scripts with RDFox

Photo by Cesar Carlevarino Aragon on Unsplash

Scripts can help you save time and avoid duplication of code for setting up data stores, and executing your rules and queries. This article will explain what a script is, how to write scripts, when to use scripts, how to use scripts with persistence, how to structure your workspace, and finally how use scripts within scripts, to set up your data source.

This article uses RDFox, a high-performance knowledge graph and semantic reasoning engine, that was designed from the ground up with reasoning and performance in mind. The powerful reasoning engine is unmatched in efficiency and reasoning capabilities, and by using rules it can provide flexible, incremental addition and retraction of data, as well as, fast parallel materialisation of new facts.

So why not combine the efficiency of scripts with the power of RDFox in your production environment? Click here to request an RDFox license.

What is a script?

A script is a text file which contains the instructions you would enter through the command line. Rather than entering each command manually, the script is run at import time and executes all the commands at once.

For example, the following script includes all the commands from the getting started guide, which were entered sequentially in the command line. By putting the commands into a script, which can be run when you first start RDFox, you can save a considerable amount of time:

On Mac/Linux: ./RDFox sandbox . start.script

On Windows: RDFox.exe sandbox . start.script

dstore create family par-complex-nnactive family import data.ttlset output outprefix : <https://oxfordsemantic.tech/RDFox/getting-started/>select ?s ?p ?o where { ?s ?p ?o }select ?p ?n where { ?p rdf:type :person . ?p :forename ?n }insert { ?x :married_to ?y } where { ?y :married_to ?x }import ! [?p, :has_child, ?c] :- [?c, :has_parent, ?p] .select ?p ?c where { ?p :has_child ?c }delete { :stewie :has_parent :lois } where { :stewie :has_parent :lois }endpoint start

Best practices for writing a script

In order to get the most out of your scripts, the team at Oxford Semantic Technologies have developed and refined the following tips.

Structuring your workspace

OST recommend creating multiple scripts, for example, for commands and prefixes, and structuring your workspace to help improve clarity and maintainability. For example:

The OST team suggest storing the queries, rules, data sources, and other scripts (see below) separately within the workspace and having one main script to control the others, i.e. the startup script.

Structuring the start-up script

The scripts including an explanation for each step, following the hashtags..

The first part of the script initialises the datastore settings:

#Template startup script #name your datastore and initialise it
#---------------active myDsdstore create myDs par-complex-nn   
#This will let us see query answers on the terminal
#---------------set output out 
#This will specify where we can find data, rules, and queries.
#Other options are available too.
#This helps keep a tidy working directory.
#---------------set dir.facts "$(dir.root)Data/"set dir.dlog "$(dir.root)Rules/"set dir.queries "$(dir.root)Queries/"set dir.scripts "$(dir.root)Scripts/"

In the scripts folder, we find it useful to store the prefixes in a dedicated script, for example prefixes.script would look like:

#Defining different namespaces can be useful.
#---------------prefix tt: <http://data.example.org/tupletables/>prefix type: <http://data.example.org/ontology/classes/>prefix prop: <http://data.example.org/ontology/properties/>prefix : <http://data.example.org/entities/>

We can use scripts to import the prefixes, add and attach live datasources to RDFox, along with the rules which map the datasources to triples. Additionally, we can set the number of threads, which allows for parallelisation.

#This is for defining prefixes, taking the prefix script from a file.
# — — — — — — — -prefixes.script 
#This adds and attaches live datasource to RDFox.
#In this case, we attach a csv, but one can also connect to PostgreSQL natively.
#One can also connect to other SQL databases via ODBC.
# — — — — — — — -dsource.script 
#Then we import rules mapping the attached datasources to triples.
# — — — — — — — -import mapRule.dlog 
#We can set threads as well, in order to parallelise
# — — — — — — — -threads 3

Next we can import the triples and rules in parallel, and materialise the rules within the datastore.

#Here we import data with out first transaction.
# — — — — — — — -
begin
#Importing data (triples) in parallel.
# — — — — — — — -import triples1.ttl triples2.trig triples3.ttl
#Import other rules, also in parallel.
# — — — — — — — -import rule1.dlog rule2.dlog rule3.dlog rule4.dlog 
#We can materialise the rules within a transaction with mat.
#Alternatively, the rules will be materialised automatically when the transaction is committed
# — — — — — — — -matcommit

To see the information about our datastore we can include the ‘info’ command. Additionally, we can expose an endpoint and access RDFox’s console. As well as querying or managing your knowledge graph using an IDE of your choice, such as Emacs or Visual Studio Code, you can type SPARQL queries into the RDFox console.

#This will print information about our datastore.
# — — — — — — — -
info #We set a port for our endpoint, with 12110 being the default.
#Security can also be configured, see docs.
#We can also start querying from the console, which will be at http://localhost:8080/console/myDs
# — — — — — — — -set endpoint.port 8080endpoint start

You can find more information on initiating the console in our docs.

As well as data and rules being imported in parallel, queries can also be answered in parallel. This is done using the following commands within the script:

#Queries too can be answered in parallel.
# — — — — — — — -answer q1.sparql q2.sparql q3.sparql q4.sparql 
#Or they can be answered on after the other, using echo to space the results out.
# — — — — — — — -
answer q1.sparqlecho “\n\n=====”answer q2.sparql

RDFox can provide multiple formats for the query results, including printing the results to a file. You can include how the query output format in the script.

#The output can also be printed to a file.
#Furthermore, there are multiple answer formats available.
# — — — — — — — -
set output myOutput.txtset query.answer-format “text/csv”answer q3.sparql

You can find this script on github.

Structuring the rule scripts

We recommend keeping your rules in separate files so that you can reuse the rule templates.

For example, a rule folder could contain the main rule-patterns as defined in our Rule Guide which would then be called upon when you need them.

Rule files can also be tailored to the use case you are solving. For example, a mapping rule can turn tabular data into triples that looks like this:

[?id, rdfs:label, ?name],
[?id, prop:dateOfBirth, ?dob],
[?id, prop:registrationDateTime, ?registrationDT],
[?id, a, type:Person]
:-
tt:people(?id,?name,?dob,?registrationDT) .

When to use scripts?

Scripts are useful in the development phase of your projects since they can be reused and allow you to pick up where you left things. Tailoring rules to your problem often involves deleting and restarting datastores which means that a script can be useful for quickly iterating until you reach a solution.

Scripts are also useful for setting up RDFox instances in a production environment.On cloud (Linux in this case), it could be done like by having a .service file where you specify the script to run:

[Unit]Description=RDFoxAfter=network.targetStartLimitIntervalSec=0[Service]Type=simpleRestart=alwaysRestartSec=5User=ubuntuExecStart=/home/ubuntu/RDFox/3.0.1/RDFox daemon <working_directory> start.script[Install]WantedBy=multi-user.target

Copy this into /etc/systemd/system using sudo:

sudo cp rdfox.service /etc/systemd/system

Enable the service when the machine boots:

sudo systemctl enable rdfox

Start the service manually:

sudo service rdfox start

Persistence?

Since RDFox can be persisted, one might ask why scripts are needed if RDFox can simply start from its persisted state.

The answer here has two parts:

1) Only the database itself is persisted at this stage, but not the shell variables, such as the active datastore, prefixes, working directories, endpoint ports, etc. Endpoints also need to be restarted every time one quits the RDFox process.

OST advises that a script is run to set these up for a better user experience.

This could be achieved with a restart.script which would look like this:

#We need to specify which datastore we want to work on.
# — — — — — — — -active myDs#Set output to be shown on the shell
# — — — — — — — -set output out
#Set our directories for a tidier workspace
# — — — — — — — -set dir.facts “(dir.root)Data/”set dir.dlog “(dir.root)Rules/”set dir.queries “(dir.root)Queries/”set dir.scripts “(dir.root)Scripts/” 
#Specify your prefixes
# — — — — — — — -prefixes.script
#Set the threads in order to parallelise
# — — — — — — — -threads 4
#Configure the endpoint and start it# — — — — — — — -set endpoint.port 8080set endpoint.access-control-allow-origin *endpoint start

Notice that there was no need to import triples or rules in this script, as they will have been persisted and reloaded into RDFox.

2) RDFox in-memory only (i.e. persistence switched off) may be faster, and will load more quickly when starting from scratch than from a persisted state, so if you’re just in an exploratory phase, it may be helpful to work in-memory and from a script.

Note that you can always save your datastore from the shell and then reload it like this:

save myDs.rdfox 
quit 
 
restart.script 
load myDs.rdfox

Set up your data sources without clogging up your starting script

You can use a script to set up data sources without clogging up the start script. You can call it dsource1.script

dsource add delimitedFile People \         file “./Data/people.csv” \         quote ‘“‘ \         header true \         delimiter “,” dsource attach tt:people People \         columns 4 \         “1” “data.example.org/entities/person_{id}” \         “1.datatype” “iri” \         \        “2” “{name}” \         “2.datatype “string” \         \        “3” “{dob}” \         “3.datatype” “xsd:date” \         “3.if-empty” “absent” \         \         “4” “{registration_date}T{registration_time}” \         “4.datatype” “xsd:dateTime” \         “4.default” “2000–01–01T00:00:00”

Once the script is ready, we will be able to execute it from the starting script. A script within a script!

Looking for script inspiration?

You can find this article’s template workspace and scripts here. Alternatively, you can also book a script consultation with an OST Knowledge Engineer by emailing info@oxfordsemantic.tech.

To learn more about RDFox go to our website or medium publication. To request an evaluation license click here. Academic licenses are available for free.

Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.