Working with RDF Database Named Graphs

Marcelo Barbieri
Feb 16 · 13 min read

Introduction

This demonstration focuses on clarifying how SPARQL behaves in Stardog* and GraphDB* when querying data across multiple graphs in a database.

* “These semantic technologies are the core technologies for any Enterprise Knowledge Graph (EKG)”

The RDF data model expresses information as graphs consisting of triples with subject, predicate, and object. Many RDF data stores hold multiple RDF graphs and record information about each graph, allowing an application to make queries that involve information from more than one graph.

A SPARQL query is executed against an RDF Dataset which represents a collection of graphs defined as:

  • One default graph, which does not have a name and may be empty in the data.
  • Zero or more named graphs, where each named graph is identified by an IRI.

A SPARQL query can match different parts of the query pattern against different graphs. Because the same node can appear in more than one graph, it is possible to traverse the edges starting from one named graph and continue into another named graph via these shared nodes. It is through this sharing of nodes across named graphs that the collection of named graphs (conceptually) constitutes a larger unified graph.

Each graph in a dataset is still a set of triples which means there can be no duplication of triples within a graph. However, there is no similar requirement across multiple graphs, so the same triple may appear in multiple graphs and each occurrence is considered a distinct triple.

When querying a collection of graphs, the GRAPH keyword is used to match patterns against named graphs. GRAPH can take an IRI to select one graph, or a variable that ranges over the IRIs of all the named graphs in the query's RDF dataset.

The use of GRAPH changes the active graph for matching graph patterns within that part of the query. Outside the use of GRAPH, matching is done using the default graph.

Another way to think of named graphs is as a set of quads where the fourth component added to the triple is the name of the graph, which is possibly empty for the default graph.

Demonstration

To be able to run the SPARQL queries in this demonstration, you will need to set up an empty RDF database on your local machine. If you don’t have one, please follow the instructions in the Setting up the Northwind database on Stardog or GraphDB sections in the Northwind SQL vs SPARQL article.

Populating test graphs

Execute each of the following statements individually in an existing database of your choice. There will be examples to be executed in Stardog and GraphDB.

Populate graph1

PREFIX ns: <http://mysparql.ai/ns#>

Populate graph2

PREFIX ns: <http://mysparql.ai/ns#>

Populate graph3

PREFIX ns: <http://mysparql.ai/ns#>

Note that some of the books are added to more than one graph for demonstration purposes.
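As a minimal sketch, an update of the following shape would produce three graphs consistent with the results discussed later in this article (book1 in all three graphs, book2 in graph2 and graph3). The ns:title property, the literal values, and ns:book3 are assumptions made for illustration only. The three GRAPH blocks could equally be split into three separate INSERT DATA statements and executed one at a time.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

# Hypothetical sample data: ns:title, ns:book3 and the literals are illustrative assumptions.
INSERT DATA {
  GRAPH ns:graph1 {
    ns:book1 ns:title "Book 1" .
  }
  GRAPH ns:graph2 {
    ns:book1 ns:title "Book 1" .
    ns:book2 ns:title "Book 2" .
  }
  GRAPH ns:graph3 {
    ns:book1 ns:title "Book 1" .
    ns:book2 ns:title "Book 2" .
    ns:book3 ns:title "Book 3" .
  }
}
```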

Querying Graphs with Stardog

The following queries must be executed on Stardog.

Stardog is configured with query.all.graphs=false by default, which has been used for the initial queries below. See details on how to set up this database property further down in this article.

Using FROM

Each FROM clause contains an IRI that indicates a graph to be used to form the default graph. This does not put the graph in as a named graph.

PREFIX ns: <http://mysparql.ai/ns#>
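A minimal sketch of a query using FROM, assuming the graphs ns:graph1 and ns:graph2 populated earlier. The choice of these two graphs is illustrative. The pattern is matched against a default graph formed by merging both graphs, so a triple stored in both appears only once.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?s ?p ?o
# Both graphs are merged to form the default graph of this query.
FROM ns:graph1
FROM ns:graph2
WHERE {
  ?s ?p ?o
}
ORDER BY ?s
```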

Using FROM NAMED

A query can supply IRIs for the named graphs in the RDF Dataset using the FROM NAMED clause. The GRAPH keyword is used to bind the ?g variable to each named graph in the RDF Dataset.

PREFIX ns: <http://mysparql.ai/ns#>
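A minimal sketch of a query using FROM NAMED, assuming the graphs ns:graph1 and ns:graph2 populated earlier (the choice of graphs is illustrative). Each result row carries the IRI of the named graph in which the triple was found.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
# Only these two graphs form the named part of the dataset.
FROM NAMED ns:graph1
FROM NAMED ns:graph2
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
ORDER BY ?g ?s
```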

Combining FROM and FROM NAMED

The Basic Graph Patterns (BGPs) outside of GRAPH {} are evaluated against the default part of the RDF Dataset (i.e. defined using FROM) while BGPs within GRAPH {} are evaluated for each graph in the named part of the RDF Dataset.

The following query is evaluated against the default part of the RDF Dataset, which is made of graph1 and graph2, and completely ignores graph3.

PREFIX ns: <http://mysparql.ai/ns#>
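A sketch of what such a query might look like, assuming graph1 and graph2 are supplied with FROM and graph3 with FROM NAMED. The pattern sits outside GRAPH {}, so only the default part is consulted.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?s ?p ?o
FROM ns:graph1
FROM ns:graph2
FROM NAMED ns:graph3
WHERE {
  # No GRAPH keyword: matched against the merge of graph1 and graph2 only.
  ?s ?p ?o
}
ORDER BY ?s
```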

The following query is evaluated for each graph in the named part of the RDF Dataset, which is graph3, and completely ignores graph1 and graph2.

PREFIX ns: <http://mysparql.ai/ns#>
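A sketch of what such a query might look like, assuming the same dataset definition (graph1 and graph2 via FROM, graph3 via FROM NAMED). The pattern sits inside GRAPH {}, so only the named part is consulted.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
FROM ns:graph1
FROM ns:graph2
FROM NAMED ns:graph3
WHERE {
  # Inside GRAPH: matched against each named graph, here only graph3.
  GRAPH ?g { ?s ?p ?o }
}
ORDER BY ?g ?s
```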

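The following is an example of a query with FROM (and no FROM NAMED) and GRAPH {}, which cannot return results: once FROM is used without FROM NAMED, the named part of the RDF Dataset is empty, so GRAPH {} has nothing to match.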

PREFIX ns: <http://mysparql.ai/ns#>
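A sketch of such a query, assuming ns:graph1 is the graph supplied with FROM (an illustrative choice). Because no FROM NAMED clause is given, the dataset has no named graphs and the GRAPH pattern has nothing to range over.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
FROM ns:graph1
WHERE {
  # The named part of the dataset is empty, so this pattern matches nothing.
  GRAPH ?g { ?s ?p ?o }
}
```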

Restricting by Graph IRI

As seen in a previous example, you can restrict the named graphs in the RDF Dataset by using the FROM NAMED clause.

PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
FROM NAMED ns:graph2
FROM NAMED ns:graph3
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
ORDER BY ?g ?s

Alternatively, you can restrict the named graphs by supplying their IRIs directly in GRAPH {}.

In the following example, the Basic Graph Patterns will be evaluated against graphs 2 and 3.

PREFIX ns: <http://mysparql.ai/ns#>
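A sketch of a query of this kind, assuming the same triple pattern is repeated in one GRAPH block per graph. The two blocks share the variables ?s, ?p and ?o, so they are joined and only triples present in both graphs are returned.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?s ?p ?o
WHERE {
  # Shared variables join the two patterns: a triple must exist in both graphs.
  GRAPH ns:graph2 { ?s ?p ?o }
  GRAPH ns:graph3 { ?s ?p ?o }
}
ORDER BY ?s
```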

Only book1 and book2 can be found in both graphs.

In the following query, the Basic Graph Patterns (BGPs) will be evaluated against graphs 1, 2, and 3.

PREFIX ns: <http://mysparql.ai/ns#>
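Extending the same idea to three graphs, a sketch under the same assumptions could look as follows. A triple is returned only if it is present in graph1, graph2 and graph3.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?s ?p ?o
WHERE {
  # The join keeps only triples that exist in all three graphs.
  GRAPH ns:graph1 { ?s ?p ?o }
  GRAPH ns:graph2 { ?s ?p ?o }
  GRAPH ns:graph3 { ?s ?p ?o }
}
ORDER BY ?s
```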

Only book1 can be found in all 3 graphs.

As seen in a previous example, the GRAPH keyword can be used to bind a variable to each named graph in the RDF Dataset. However, this time we are going to use the bound variable to filter the graphs, instead of the FROM NAMED clause, as per the following two examples:

PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  }
}
ORDER BY ?g ?s

PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  }
  FILTER (?g IN (ns:graph2 , ns:graph3))
}
ORDER BY ?g ?s

Union

You can explicitly union graphs and therefore alter the default “merge” behaviour. The following example returns all books from all graphs, including books that are repeated in different graphs.

PREFIX ns: <http://mysparql.ai/ns#>
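A sketch of an explicit union over the three graphs, assuming one GRAPH block per graph. Unlike a merge via FROM, UNION keeps every solution, so a triple stored in several graphs is returned once per graph (no DISTINCT is applied).

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?s ?p ?o
WHERE {
  { GRAPH ns:graph1 { ?s ?p ?o } }
  UNION
  { GRAPH ns:graph2 { ?s ?p ?o } }
  UNION
  { GRAPH ns:graph3 { ?s ?p ?o } }
}
ORDER BY ?s
```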

Union and Join

The following example shows a query that unions graphs 1 and 2 into the default graph and joins on the named graph 3.

PREFIX ns: <http://mysparql.ai/ns#>
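A sketch of such a query, assuming graph1 and graph2 are merged with FROM and graph3 is supplied with FROM NAMED. The pattern outside GRAPH {} and the pattern inside it share their variables, so only triples found in both parts are returned.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?s ?p ?o
FROM ns:graph1
FROM ns:graph2
FROM NAMED ns:graph3
WHERE {
  # Matched against the merge of graph1 and graph2 ...
  ?s ?p ?o .
  # ... and joined with the same triple in graph3.
  GRAPH ns:graph3 { ?s ?p ?o }
}
ORDER BY ?s
```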

Union all the graphs (default and named)

The following example shows a query that unions graphs 1 and 2 into the default graph and unions again on the named graph 3.

Note that the ?g variable is bound only for named graphs, never for the default graph.

PREFIX ns: <http://mysparql.ai/ns#>
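A sketch of such a query, assuming the same dataset definition as in the previous section. Rows coming from the default part leave ?g unbound, while rows coming from the named part carry the graph IRI.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g ?s ?p ?o
FROM ns:graph1
FROM ns:graph2
FROM NAMED ns:graph3
WHERE {
  # Default part: ?g stays unbound for these rows.
  { ?s ?p ?o }
  UNION
  # Named part: ?g is bound to the graph IRI.
  { GRAPH ?g { ?s ?p ?o } }
}
ORDER BY ?g ?s
```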

Counting triples in specific named graphs

This query returns the counts only for the named graphs listed in the FROM NAMED clause.

PREFIX ns: <http://mysparql.ai/ns#>
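A sketch of a per-graph count, assuming all three demonstration graphs are listed in FROM NAMED (the selection is illustrative). Any named graph not listed is left out of the result.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

SELECT ?g (COUNT(*) AS ?size)
FROM NAMED ns:graph1
FROM NAMED ns:graph2
FROM NAMED ns:graph3
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY ?g
```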

Counting triples in named graphs

The following query returns the count for all existing named graphs on the database by using a Stardog extension.

SELECT ?g (count(*) as ?size)
FROM NAMED stardog:context:named
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY asc(?size)

Counting triples in the default graph

Whether or not query.all.graphs=false is set, the following Stardog extension counts only the triples in the default graph, so you don't need to rely on that option.

SELECT (count(*) as ?size)
FROM stardog:context:default
WHERE {?s ?p ?o}

The default graph shown above is from the Northwind sample database.

Counting triples in all graphs in the database

The following query returns the count of triples in the default and all named graphs using the stardog:context:all Stardog extension.

SELECT ?g (count(*) as ?size)
FROM NAMED stardog:context:all
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY asc(?size)

The following query is equivalent to the previous one; however, it does not rely on the Stardog extension, which makes it compatible with other triplestore vendors.

SELECT ?g (count(*) as ?size)
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  } UNION {
    # Triples from the default graph are labelled "default".
    ?s ?p ?o
    BIND("default" AS ?g)
  }
}
GROUP BY ?g
ORDER BY asc(?size)

Searching data across multiple graphs

Search for book1 in the default graph.

SELECT ?s ?p ?o
WHERE {
  { ?s ?p ?o }
  FILTER (
    (CONTAINS (STR(?s), ?searchString)) ||
    (CONTAINS (STR(?p), ?searchString)) ||
    (CONTAINS (STR(?o), ?searchString))
  )
  BIND("book1" AS ?searchString)
}
ORDER BY ?s

Book1 was not found in the default graph, because it was added to the named graph graph1. In fact, all books in these demonstrations were added to named graphs.

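Note that ?searchString is already bound when the filter is evaluated, even though the BIND comes after the FILTER in the code. A FILTER applies to the whole group graph pattern in which it appears, regardless of its position within that group.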

However, setting the query context to stardog:context:all in the Stardog Studio drop-down (through the SPARQL Protocol) makes all triples in named graphs available on the default graph.

The query below can be used to achieve the same result. It searches for book1 across all graphs, named and default, and unions the results.

SELECT ?g ?s ?p ?o
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  } UNION {
    ?s ?p ?o
    BIND("default" AS ?g)
  }
  FILTER (
    (CONTAINS (STR(?s), ?searchString)) ||
    (CONTAINS (STR(?p), ?searchString)) ||
    (CONTAINS (STR(?o), ?searchString))
  )
  BIND("book1" AS ?searchString)
}
ORDER BY ?g ?s

Setting context to stardog:context:all in the Stardog Studio drop-down (through the SPARQL Protocol) makes all triples available on the default graph. Therefore, the query won’t return any data from named graphs.

Stardog has a database property called “Query All Graphs” (query.all.graphs), which provides the same behaviour as stardog:context:all, but is set at the database level.

Note that this setting is required to be set to “true” when working with some visualisations and query tools.

To be more specific, given the RDF Dataset as a structure of two parts (the default part and the named part), here is how query.all.graphs affects query behaviour:

  • With false, the default dataset will be <context:default, context:named>.
  • With true, the default dataset will be <context:all, context:named>.

This option applies only when the query does not use any FROM or FROM NAMED clause and the dataset is not set through the SPARQL Protocol (which happens when you select a graph in the drop-down list in Stardog Studio).
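The dataset that query.all.graphs=true implies can also be spelled out per query. The sketch below assumes that the stardog: prefix resolves without a declaration, as it does in the other Stardog queries in this article, and that the special graphs can be combined in FROM and FROM NAMED in this way.

```sparql
SELECT ?s ?p ?o
# Explicit form of the dataset <context:all, context:named>.
FROM stardog:context:all
FROM NAMED stardog:context:named
WHERE {
  ?s ?p ?o
}
ORDER BY ?s
```

Because this query names its dataset explicitly, the database-level option no longer influences it.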

Setting query.all.graphs=true in Stardog can be handy when you want to provide an easy way to execute a triple pattern query over all stored RDF statements in the database. This is the default behaviour in GraphDB (refer to the Querying Graphs in GraphDB section), and it allows you to query all the data without having to worry about where it is stored, reducing the complexity of queries. It also gives visualisation tools full visibility of the data available in the RDF database. Note that you can still access data in named graphs individually, if needed, using the methods described in this article.

Note that the query below does not use the GRAPH keyword to reference named graphs. Triples from the named graphs are now available on the default graph.

SELECT ?s ?p ?o
WHERE {
  { ?s ?p ?o }
  FILTER (
    (CONTAINS (STR(?s), ?searchString)) ||
    (CONTAINS (STR(?p), ?searchString)) ||
    (CONTAINS (STR(?o), ?searchString))
  )
  BIND("book1" AS ?searchString)
}
ORDER BY ?s

However, when the GRAPH keyword is used, duplicates will appear, as the same triples are available through both the default and the named graphs.

SELECT ?g ?s ?p ?o
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  } UNION {
    ?s ?p ?o
    BIND("default" AS ?g)
  }
  FILTER (
    (CONTAINS (STR(?s), ?searchString)) ||
    (CONTAINS (STR(?p), ?searchString)) ||
    (CONTAINS (STR(?o), ?searchString))
  )
  BIND("book1" AS ?searchString)
}
ORDER BY ?g ?s

Setting the query context to stardog:context:all in the Stardog Studio drop-down (through the SPARQL Protocol) seems to correct this behaviour, as all data is made available through the default graph only.

Querying Graphs in GraphDB

GraphDB constructs the default dataset as follows:

  • The dataset’s default graph contains the merge of the database’s default graph AND all the database named graphs.
  • The dataset contains all named graphs from the database.

Triples will appear to be in both the default and named graphs.

There are two reasons for this behaviour:

  1. It provides an easy way to execute a triple pattern query over all stored RDF statements.
  2. It allows all named graph names to be discovered, i.e., with this query: SELECT ?g { GRAPH ?g { ?s ?p ?o } }.
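The discovery query in the second point returns one row per triple. A small variant that lists each graph name only once could look like this:

```sparql
# DISTINCT collapses the per-triple rows into one row per named graph.
SELECT DISTINCT ?g
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
ORDER BY ?g
```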

Examples

Querying the default graph

SELECT ?s ?p ?o
WHERE {
  { ?s ?p ?o }
  FILTER (
    (CONTAINS (STR(?s), ?searchString)) ||
    (CONTAINS (STR(?p), ?searchString)) ||
    (CONTAINS (STR(?o), ?searchString))
  )
  BIND("book1" AS ?searchString)
}
ORDER BY ?s

Note that book1 was added to the named graphs 1, 2, and 3, but not to the default graph, at the beginning of this demonstration. Also note that there are no book1 duplicates in the default graph.

Querying the default and named graphs

SELECT ?g ?s ?p ?o
WHERE {
  {
    GRAPH ?g { ?s ?p ?o }
  } UNION {
    ?s ?p ?o
    BIND("default" AS ?g)
  }
  FILTER (
    (CONTAINS (STR(?s), ?searchString)) ||
    (CONTAINS (STR(?p), ?searchString)) ||
    (CONTAINS (STR(?o), ?searchString))
  )
  BIND("book1" AS ?searchString)
}
ORDER BY ?g ?s

Note that there are occurrences of book1 in each of the named graphs, as well as the default graph. That is the expected behaviour in GraphDB, as explained previously.

Clearing the Graphs

Run one statement at a time to clear the graphs used in this demonstration.

PREFIX ns: <http://mysparql.ai/ns#>
CLEAR GRAPH ns:graph1
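The remaining demonstration graphs can be cleared in the same way. The sketch below assumes graph2 and graph3 are the only other graphs created. SPARQL Update allows several operations in one request when they are separated by a semicolon, but each CLEAR can equally be run on its own.

```sparql
PREFIX ns: <http://mysparql.ai/ns#>

# Two operations in one update request, separated by ";".
CLEAR GRAPH ns:graph2 ;
CLEAR GRAPH ns:graph3
```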

Visualisation and Query Tools

Visualisation and natural language query tools (and query builders) can offer great options to explore an RDF Graph Database.

Below are two examples of tools that require all named graphs to be available in the default graph for them to work properly. This is the default behaviour in GraphDB and can be accomplished in Stardog by setting the query.all.graphs=true database property.

The following visualisation was created using metaphacts.

Sparklis is a natural language query builder and offers a very intuitive way of exploring an RDF Graph Database. The tool generates the SPARQL query for you automatically.

agnos.ai

The Enterprise Knowledge Graph Company

Written by Marcelo Barbieri, Knowledge Graph Engineer - agnos.ai

agnos.ai is a specialist consultancy that designs and implements Enterprise Knowledge Graphs. We harness the power of semantic technology to solve your most complex enterprise data challenges.
