Weaving Linked Data Cloud with (Hyper)GraphQL

Federated querying revisited

Jan 16, 2018

Musical rain gutter wall in Dresden (https://goo.gl/RkJnMm)

Successful deployment of data via live queryable services is very much about finding a good balance between the expressive power of the query interface the service provider is willing to support and the cost of keeping that service available, performant and easy to use for the clients. In simplest terms:

There’s value in obtaining answers to complex questions, but there’s a cost involved in formulating those questions and a cost of getting them answered efficiently.

In the linked data space, one prominent source of such value and cost is the possibility of querying information across different datasets and services, by joining data along shared identifiers (URIs) that represent the same things in different contexts and sources. Effective exploitation of such joins at the query level is known as federated querying in Semantic Web lingo.

In this post, we take a brief look at how federated querying can be supported using GraphQL, or more specifically, its extension HyperGraphQL — a GraphQL-based interface for querying and serving linked data on the Web — an open-source project developed at Semantic Integration.

Controlled deployment of queryable linked data

The trade-off between the complexity of queries and the cost of supporting them in public linked data query endpoints has been articulated very clearly, and brought to the attention of a broader audience, by researchers working on Linked Data Fragments. This framework explores alternative approaches positioned between two extreme interface types: SPARQL endpoints, which are powerful yet expensive on the server side; and RDF dumps, which are crude yet cheap to deliver, and which shift the cost of actual data retrieval entirely to the client.

In another blog post, we have outlined how the GraphQL interface can be conveniently utilised to facilitate controlled, live access to RDF graphs, in the spirit of Linked Data Fragments.

The approach, implemented as HyperGraphQL, is simple: you only need to define a GraphQL schema and map it onto the URIs of the vocabulary employed in your RDF graph. Under the hood, HyperGraphQL performs a rather straightforward rewriting of GraphQL queries to SPARQL, delegates them to the SPARQL endpoint, and returns the responses as JSON-LD objects.
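To make the rewriting step more tangible, here is a minimal sketch in Python. It is not HyperGraphQL's actual implementation: the query is assumed to be already parsed into nested dicts, and the `FIELD_URIS` mapping, the `to_sparql` function and the `?x0, ?x1, ...` variable naming are all hypothetical choices made for illustration.

```python
from itertools import count

# Hypothetical mapping of GraphQL field names to RDF vocabulary URIs.
FIELD_URIS = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "sameAs": "http://www.w3.org/2002/07/owl#sameAs",
    "subject": "http://purl.org/dc/terms/subject",
}


def to_sparql(type_uri, selection):
    """Rewrite a tree-shaped selection into a SPARQL SELECT query.

    `selection` maps field names to nested selections (dicts);
    a leaf field maps to an empty dict.
    """
    counter = count(1)
    patterns = [f"?x0 a <{type_uri}> ."]

    def walk(parent_var, fields):
        # Each field becomes one triple pattern hanging off its parent.
        for name, children in fields.items():
            var = f"?x{next(counter)}"
            patterns.append(f"{parent_var} <{FIELD_URIS[name]}> {var} .")
            walk(var, children)

    walk("?x0", selection)
    body = "\n".join("  " + p for p in patterns)
    return "SELECT * WHERE {\n" + body + "\n}"


query = to_sparql(
    "http://data.linkedmdb.org/resource/movie/film",
    {"label": {}, "sameAs": {"subject": {}}},
)
print(query)
```

Because the selection is a tree, every field contributes exactly one triple pattern, so the size of the generated SPARQL query grows linearly with the size of the GraphQL request.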

One of the strengths of GraphQL in this application scenario lies in its restrictive schemas, which determine a set of permitted tree-shaped queries of an easily controllable depth and width. Depending on your need, you can expose only a single triple pattern, a set of patterns or the entire set of types and properties used in an RDF graph — it’s really straightforward.
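The way a schema bounds the permitted queries can be sketched in a few lines of Python. The `SCHEMA` structure below is a hypothetical, simplified stand-in for a GraphQL schema (it is not HyperGraphQL's format), and the type and field names are assumptions chosen for illustration.

```python
# Hypothetical schema: each type lists its permitted fields and the type
# each field leads to (None marks a leaf value such as a string).
SCHEMA = {
    "Film": {"label": None, "sameAs": "Resource"},
    "Resource": {"subject": None},
}


def is_permitted(type_name, selection, schema=SCHEMA):
    """Return True if every requested field is declared in the schema."""
    fields = schema.get(type_name, {})
    for name, children in selection.items():
        if name not in fields:
            return False
        target = fields[name]
        if target is None:
            if children:  # a leaf cannot carry sub-selections
                return False
        elif not is_permitted(target, children, schema):
            return False
    return True


def max_depth(type_name, schema=SCHEMA, path=()):
    """Longest chain of fields that a query starting at `type_name` can follow."""
    if type_name in path:
        return float("inf")  # a recursive type allows unbounded nesting
    depths = [0]
    for target in schema[type_name].values():
        if target is None:
            depths.append(1)
        else:
            depths.append(1 + max_depth(target, schema, path + (type_name,)))
    return max(depths)


print(is_permitted("Film", {"label": {}, "sameAs": {"subject": {}}}))
print(is_permitted("Film", {"director": {}}))
print(max_depth("Film"))
```

In this toy model, removing the `sameAs` field from `Film` would reduce the maximum query depth to one, which is the kind of declarative control over query shape that the paragraph above describes.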

GraphiQL UI over a HyperGraphQL instance exposing a subset of DBpedia.

HyperGraphQL is not limited to serving as a wrapper around SPARQL endpoints. Being natively equipped with its own in-memory triplestore, it can also be used to deploy linked data directly from RDF files, while exposing exactly the same GraphQL interface to the clients.

In many cases it makes good sense to deploy smaller and more modularised fragments of linked data using simple GraphQL instances, instead of a massive know-it-all RDF knowledge graph served over a single SPARQL endpoint.

With such modules, it’s easier to assure high quality of the published data, curate and update it, adopt the right level of focus and granularity, and employ the most suitable vocabulary for expressing the metadata in each module. The strict GraphQL schema is easier for the clients to exploit in order to get the requested data, without the need to understand the open-world, schema-on-read nature of the Semantic Web stack. Most importantly, the variety and shape of permitted queries, and so the average cost of serving the data per request, can be tightly controlled in a simple, declarative manner, by means of those same GraphQL schemas.

Federating services

A crucial feature of the Semantic Web architecture is the ability to perform federated queries across distributed datasets in order to uncover interesting connections in data coming from different sources. After all, it’s all about links between things, and the same things, represented by unique URIs, can be described and referred to in different places and contexts. To this end, the SPARQL language supports the SERVICE keyword, which enables delegating specific subqueries to remote SPARQL services.

For instance, the following query joins some movie data from LinkedMDB movie database with some extra annotations fetched from DBpedia:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?film ?label ?subject WHERE {
    SERVICE <http://data.linkedmdb.org/sparql> {
        ?film a movie:film .
        ?film rdfs:label ?label .
        ?film owl:sameAs ?dbpediaLink 
    }
    SERVICE <http://dbpedia.org/sparql> {
        ?dbpediaLink dcterms:subject ?subject
    }
}

If you execute it on a SPARQL endpoint with appropriate permissions in place you should be able to uncover some implicit connections across data in both services — for instance, these:

film:    <http://data.linkedmdb.org/resource/film/11>
label:   "Godzilla vs. the Sea Monster"
subject: <http://dbpedia.org/resource/Category:1960s_horror_films>

film:    <http://data.linkedmdb.org/resource/film/11>
label:   "Godzilla vs. the Sea Monster"
subject: <http://dbpedia.org/resource/Category:Giant_monster_films>
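Conceptually, the answers of the two SERVICE blocks are joined on the shared variable `?dbpediaLink`. The Python sketch below mimics that join in memory. The bindings use placeholder `example.org` URIs rather than real query results, and the `join` helper is a hypothetical illustration; actual SPARQL engines may evaluate federated joins quite differently.

```python
# Placeholder bindings standing in for the answers of the two services.
# The example.org URIs are hypothetical, not real query results.
linkedmdb_rows = [
    {"film": "http://example.org/film/1", "label": "Film One",
     "dbpediaLink": "http://example.org/dbpedia/Film_One"},
    {"film": "http://example.org/film/2", "label": "Film Two",
     "dbpediaLink": "http://example.org/dbpedia/Film_Two"},
]
dbpedia_rows = [
    {"dbpediaLink": "http://example.org/dbpedia/Film_One",
     "subject": "http://example.org/category/A"},
    {"dbpediaLink": "http://example.org/dbpedia/Film_One",
     "subject": "http://example.org/category/B"},
]


def join(left, right, key):
    """Hash join of two lists of variable bindings on a shared variable."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Keep only left rows that have at least one partner on the right.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


for row in join(linkedmdb_rows, dbpedia_rows, "dbpediaLink"):
    print(row["film"], row["label"], row["subject"])
```

As in the result listing above, a film with two matching subjects appears twice in the output, while a film whose link has no counterpart in the second source drops out of the result.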

The power of linked data at its finest! Or at least that’s the theory. The practice often turns out slightly more cumbersome…

SPARQL-based federation of linked data services enjoys a mixed reputation among Semantic Web practitioners. Some insightful observations on this topic can be found, for instance, in this blog post by Ruben Verborgh.

To put it bluntly, the bottom line might well be this:

federating data requests across different services and datasets is simply a non-trivial task and it’s unrealistic to expect it to be performed solely by the end-clients on the query level.

Federation requires a deep understanding of all the involved datasets, their context, scope and vocabularies. It takes a fair amount of knowledge about the actual implementation of the query evaluation mechanism for the request to be processed efficiently. On top of that, it might require extra configuration parameters, e.g., authentication credentials for the respective services, which simply cannot all be handled at the SPARQL query level.

Again:

there’s a great value in getting federated queries answered, but there’s also an obvious cost. In this case, the cost — in the sense of the complexity of formulating a meaningful and performant request — is significantly shifted to the client-side, while there might exist more viable alternatives.

In HyperGraphQL, the federation of services is not expressible in the query language. Instead, it is specified in the configuration of each specific HyperGraphQL instance that effectively supports federated query capabilities. Configuring such instances is simple and allows for seamless federation of SPARQL endpoints, RDF dumps, and other HyperGraphQL services, in arbitrary patterns, such as, for instance, those depicted below.

The federated query about movies, used earlier in this post, could in HyperGraphQL look just like any other GraphQL request:

{
   Film_GET {
      _id
      label
      sameAs {
         subject {
            _id
         }
      }
   }
}

It is in the configuration of that HyperGraphQL instance where all the fields would be associated with appropriate URIs and linked data services.
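The idea of moving federation into configuration can be sketched as follows. This is a simplified Python illustration under stated assumptions, not HyperGraphQL's actual configuration syntax: the `SERVICES` and `FIELDS` tables and the `plan` function are hypothetical, although the endpoint URLs and vocabulary URIs are those used in the SPARQL query earlier in this post.

```python
from itertools import count

# Hypothetical instance configuration: service identifiers and endpoints.
SERVICES = {
    "linkedmdb": "http://data.linkedmdb.org/sparql",
    "dbpedia": "http://dbpedia.org/sparql",
}
# Each field is tied to a vocabulary URI and to the service that answers it.
FIELDS = {
    "label": ("http://www.w3.org/2000/01/rdf-schema#label", "linkedmdb"),
    "sameAs": ("http://www.w3.org/2002/07/owl#sameAs", "linkedmdb"),
    "subject": ("http://purl.org/dc/terms/subject", "dbpedia"),
}


def plan(selection):
    """Group the triple patterns of a selection by the endpoint serving them."""
    by_endpoint = {}
    counter = count(1)

    def walk(parent_var, fields):
        for name, children in fields.items():
            uri, service = FIELDS[name]
            var = f"?x{next(counter)}"
            pattern = f"{parent_var} <{uri}> {var} ."
            by_endpoint.setdefault(SERVICES[service], []).append(pattern)
            walk(var, children)

    walk("?x0", selection)
    return by_endpoint


for endpoint, patterns in plan({"label": {}, "sameAs": {"subject": {}}}).items():
    print(endpoint)
    for p in patterns:
        print("   ", p)
```

The client's request names only fields; which endpoint answers which field is decided entirely by the tables maintained by the service provider.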

The gist of this shift is that the cost of federation is now billed to the service provider — not the service user.

In return, the provider gains full control over the types of federated queries that are effectively supported by the service. This, in turn, makes the cost of maintaining the service more predictable. The user, on the other hand, is free to tap into the benefits of federated querying using the same, uniform GraphQL query interface, without the need to declare the federation pattern herself or to really understand the underlying federation mechanism.

Such an architecture also opens up some new, potentially interesting incentive models for sharing the costs and value of linked data publishing on the Web. While exposing simpler and more self-contained linked datasets via queryable services like HyperGraphQL is relatively cheap and could be offered for free, federating such services already carries a distinct value tag. However, since federated query endpoints can be provided as a service, the data consumer could always be offered a choice: perform the federation herself, by setting up her own local federation service over the freely available endpoints, or pay for the use of an existing one.

Tutorial

A tutorial on federating linked data services with HyperGraphQL is available at http://hypergraphql.org/tutorial.

Acknowledgments: Thanks are due to all the collaborators on the HyperGraphQL project at Semantic Integration, in particular Mirko Dimartino, who has co-developed the solutions described in this post.

Written by Szymon Klarman