What is a SPARQL Endpoint, and why is it important?

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
10 min read · Aug 4, 2018


Situation Analysis

When working with data, the vital issue of Data Flow — that is, the movement of data, from creation to storage, revision, and repurposing — has historically been impeded by disparity along several lines, including —

  • Data Item (i.e., Subject or Entity) Identifiers
  • Data Access Protocols
  • Data Structures
  • Data Locations
  • Data Query Languages
What sits between You and your Data

What is the SPARQL Query Language?

The SPARQL Query Language is a Declarative Query Language (like SQL) for performing Data Manipulation and Data Definition operations on Data represented as a collection of RDF Language sentences/statements.

Every SPARQL Query has a Solution Modifier (or head) and a Query Body. The Solution Modifier provides the basis for categorizing the different types of SPARQL Query solutions. The Query Body comprises a collection of RDF statement patterns (represented using Turtle Notation) that describe the entity relationships to which a query is scoped.

Read Oriented Query Types

  • SELECT — the Solution Modifier takes the form of a SELECT list, which is very much like a SQL SELECT list, that projects a query solution in tabular form
# SELECT
# {TABULAR-OUTPUT-SELECTION}
# WHERE
# {Entity Relationship Types represented in
# RDF-Turtle, supporting use of variables,
# constants, and blank nodes as identifier
# types}
#
# Example:
# Present a Query Solution in Tabular form
# that lists sentence subjects, predicates,
# and objects identified by the variables
# ?s, ?p, and ?o, and limit row count to 100
SELECT DISTINCT
?s ?p ?o
WHERE
{ ?s ?p ?o }
LIMIT 100
  • CONSTRUCT — the Solution Modifier takes the form of an RDF sentence/statement graph specification
# CONSTRUCT
# {ENTITY-RELATIONSHIP-GRAPH-OUTPUT-SELECTION}
# WHERE
# {Entity Relationship Types represented
# in RDF-Turtle that includes support for
# variables, constants, and blank nodes
# as identifier types}
#
# Example:
# Present a Query Solution in Entity Relationship
# Graph form comprising sentence subjects,
# predicates, and objects identified by the
# variables ?s, ?p, and ?o, and limit
# relationship count to 100
CONSTRUCT
{ ?s ?p ?o }
WHERE
{ ?s ?p ?o }
LIMIT 100
  • DESCRIBE — a variant of CONSTRUCT where the Solution Modifier takes the form of an RDF sentence/statement graph specification that describes a selection of entities
# DESCRIBE
# {ENTITY-RELATIONSHIP-GRAPH-OUTPUT-SELECTION}
# WHERE
# {Entity Relationship Types represented in
# RDF-Turtle, supporting use of variables,
# constants, and blank nodes as identifier
# types}
#
# Example:
# Present a Query Solution in Entity Relationship
# Graph form that describes all subjects of
# sentences where the subjects, predicates, and
# objects are identified by the variables ?s, ?p,
# and ?o, and limit relationship count to 100
DESCRIBE ?s
WHERE
{ ?s ?p ?o }
LIMIT 100
  • ASK — this type of query returns a simple boolean (true or false) query solution
# ASK 
# WHERE
# {Entity Relationship Types represented in
# RDF-Turtle, supporting use of variables,
# constants, and blank nodes as identifier
# types}
#
# Example:
# Inquire about the existence of sentences that
# have the subjects, predicates, and objects
# identified by variables ?s, ?p, and ?o
ASK
WHERE
{ ?s ?p ?o }

Write-oriented Query Types

Write-oriented SPARQL Queries (collectively known as SPARQL Update) perform CREATE, INSERT, DELETE, and other "change" operations on collections of RDF sentences/statements associated with a specific DBMS or Triple/Quad Store Document (or Data Source) referred to as a Named Graph.

  • CREATE — creates a new empty Named Graph.
# The SILENT keyword suppresses the error that 
# would otherwise be raised if the target Named
# Graph already exists
CREATE SILENT GRAPH <urn:my:document:1>
  • INSERT — adds RDF sentences to a Named Graph explicitly or based on conditions satisfied in the SPARQL Query body
PREFIX : <#>
INSERT DATA
{ GRAPH <urn:my:document:1>
{ :entity1 :relatedTo :entity2 . }
}
  • COPY — replaces the contents of the Target Named Graph with the RDF sentences from the Source Named Graph
COPY <urn:my:document:1> 
TO <urn:my:document:2>
  • ADD — adds (or appends) RDF sentences to a Named Graph
ADD <urn:my:document:2> 
TO <urn:my:document:3>
  • MOVE — moves RDF sentences from one Named Graph to another Named Graph; i.e., the Source Named Graph and its sentences are permanently removed after successful recreation in the Target Named Graph
MOVE <urn:my:document:3> 
TO <urn:my:document:4>
  • DELETE — removes RDF sentences that satisfy conditions in the SPARQL Query body from the Target Named Graph. DELETE queries can explicitly target specific statements —
# Remove specific statements, where the sentence 
# subject, predicate, and objects are all identified
# by constants in the SPARQL query
PREFIX : <#>
DELETE DATA
{ GRAPH <urn:my:document:1>
{ :entity1 :relatedTo :entity2 . }
}

— or target whatever statements satisfy a pattern built with variables —

# Remove all attribute-names (predicates) and 
# attribute-values (objects) for a specific entity,
# i.e., the RDF sentence subject is identified by
# a constant while the predicate and object are
# identified by variables
PREFIX : <#>
DELETE
{ GRAPH <urn:my:document:1>
{ :entity1 ?p ?o . }
}
WHERE
{ GRAPH <urn:my:document:1>
{ :entity1 ?p ?o . }
}
  • CLEAR — removes all RDF sentences in a specific Named Graph; i.e., you end up with an empty Named Graph
CLEAR GRAPH <urn:my:document:2>
  • DROP — removes a Named Graph in its entirety from the DBMS or Triple/Quad Store
# The SILENT keyword suppresses the error that 
# would otherwise be raised if the target Named
# Graph does not exist
DROP SILENT GRAPH <urn:my:document:1>

What is a SPARQL Endpoint?

A SPARQL Endpoint is a Point of Presence on an HTTP network that’s capable of receiving and processing SPARQL Protocol requests. It is identified by a URL commonly referred to as a SPARQL Endpoint URL.

What is the SPARQL Protocol?

The SPARQL Protocol is an HTTP-based protocol for performing SPARQL operations against data via SPARQL Endpoints. Subject to the kind of operation being performed, requests are dispatched using the HTTP GET or POST methods.
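As a sketch of how such a request is formed, the Python fragment below builds a query-via-GET URL; the helper name `sparql_get_url` is illustrative, and the endpoint URL is the one used elsewhere in this article:

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol query-via-GET URL: the query text is
    URL-encoded into the `query` parameter, and an optional
    `default-graph-uri` parameter scopes it to a Named Graph."""
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)

url = sparql_get_url(
    "http://linkeddata.uriburner.com/sparql",
    "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10",
)
print(url)
```

Pasting the printed URL into any HTTP user agent (browser, curl, etc.) dispatches the query.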

What is Federated SPARQL?

Federated SPARQL (originally, and still sometimes, called SPARQL-FED) is a form of SPARQL Query that provides access to RDF statements/sentences (data) served by remote SPARQL Query Service endpoints; i.e., it is how SPARQL queries are federated via the SPARQL Protocol.

A SERVICE clause in the body of a SPARQL Query is used to identify a remote SPARQL Endpoint and the actual query that’s executed against that remote endpoint as part of a query solution production pipeline.

SELECT
DISTINCT ?s ?p ?o
WHERE
{
SERVICE <http://linkeddata.uriburner.com/sparql>
{
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 100
}
}

What are SPARQL Query Result Serialization Formats?

SPARQL Query Result Serialization Formats are a variety of document content types associated with SPARQL Query solutions. Some Query Types limit the possible Serialization Formats in which their Results may be delivered. Among others, these include —

  • text/html — SELECT Queries
  • application/sparql-results+xml — SELECT Queries
  • application/sparql-results+json — SELECT Queries
  • text/turtle — CONSTRUCT & DESCRIBE Queries
  • application/n-triples — CONSTRUCT & DESCRIBE Queries
  • text/plain — CONSTRUCT & DESCRIBE Queries
  • application/ld+json — CONSTRUCT & DESCRIBE Queries
  • application/rdf+xml — CONSTRUCT & DESCRIBE Queries
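As an illustrative sketch (the lookup table and helper function are mine, not part of any SPARQL library), the relationship between query type and legal result format can be captured as a simple mapping used to set an HTTP Accept header:

```python
# Result content types each read-oriented query type may emit,
# per the (non-exhaustive) list above.
RESULT_FORMATS = {
    "SELECT": {
        "text/html",
        "application/sparql-results+xml",
        "application/sparql-results+json",
    },
    "CONSTRUCT": {
        "text/turtle",
        "application/n-triples",
        "text/plain",
        "application/ld+json",
        "application/rdf+xml",
    },
}
# DESCRIBE is a variant of CONSTRUCT, so it shares the same formats.
RESULT_FORMATS["DESCRIBE"] = RESULT_FORMATS["CONSTRUCT"]

def accept_header(query_type, content_type):
    """Return `content_type` for use as an Accept header value,
    provided it is a format the given query type can produce."""
    if content_type not in RESULT_FORMATS[query_type]:
        raise ValueError(
            f"{content_type} is not a result format for {query_type} queries"
        )
    return content_type
```

For example, requesting `text/turtle` output for a SELECT query would raise an error, since a tabular solution has no Turtle serialization.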

What is a SPARQL Query Service?

A SPARQL Query Service is an HTTP Service (also known as a Web Service) that offers an API for performing declarative Data Definition and Data Manipulation operations on data represented as RDF sentence collections. Naturally, this kind of service is provided by a Database Management System (DBMS) or Triple/Quad Store associated with a URL that identifies its point of presence on an HTTP network, i.e., the address to which messages (queries) are dispatched.

Using a SPARQL Query Service

HTML-based Query Editor

More often than not, a SPARQL endpoint is paired with an HTML document that functions as a simple interface for query editing and execution. By convention, the URL of this kind of document includes the literal sparql as the final path component or as the host part of a web site’s canonical name.


Live Query Example

A query is delivered to an endpoint via an HTTP request. The server associated with the endpoint returns a query solution via an HTTP response; the solution itself is a document identified by a URL.

PREFIX schema: <http://schema.org/>
SELECT DISTINCT
( ?s3 AS ?appLabel )
( ?s7 AS ?versionLabel )
( ?s1 AS ?executableUri )
( ?s4 AS ?formatLabel )
( ?s6 AS ?OSUri )
( ?s8 AS ?OSLabel )
( ?s2 AS ?downloadUrl )
FROM <urn:data:openlink:products>
WHERE
{
?s1 a ?s9 .
?s1 <http://schema.org/downloadUrl> ?s2 .
?s1 <http://schema.org/name> ?s3 .
?s1 <http://purl.org/dc/terms/format> ?s4 .
?s1 <http://schema.org/name> ?s5 .
?s1 <http://www.openlinksw.com/ontology/software#hasOperatingSystemFamily> ?s6 .
# FILTER ( ?s6 = <http://www.openlinksw.com/ontology/software#GenericLinux> ) .
?s1 <http://www.openlinksw.com/ontology/products#versionText> ?s7 .
?s6 schema:name ?s8 .
FILTER ( ?s9 IN ( <http://www.openlinksw.com/ontology/installers#ExecutableArchive> ,
<http://www.openlinksw.com/ontology/installers#InitializationFile>
) )
}
ORDER BY ?OSUri ?appLabel
Breakdown of a SPARQL Protocol URL

Operating System Command-line

SPARQL Queries can be executed directly from your computer's operating system command line via curl — a general-purpose HTTP client utility.

Examples

Copy the following SPARQL Query text into a local file named, for example, sample-query.sparql:

PREFIX schema: <http://schema.org/>
SELECT DISTINCT
( ?s3 AS ?appLabel )
( ?s7 AS ?versionLabel )
( ?s1 AS ?executableUri )
( ?s4 AS ?formatLabel )
( ?s6 AS ?OSUri )
( ?s8 AS ?OSLabel )
( ?s2 AS ?downloadUrl )
FROM <urn:data:openlink:products>
WHERE
{
?s1 a ?s9 .
?s1 <http://schema.org/downloadUrl> ?s2 .
?s1 <http://schema.org/name> ?s3 .
?s1 <http://purl.org/dc/terms/format> ?s4 .
?s1 <http://schema.org/name> ?s5 .
?s1 <http://www.openlinksw.com/ontology/software#hasOperatingSystemFamily> ?s6 .
## Uncomment (remove the leading hash "#" from)
## the following line to filter by Operating
## System Family; in this case, Generic Linux.
# FILTER ( ?s6 = <http://www.openlinksw.com/ontology/software#GenericLinux> ) .
?s1 <http://www.openlinksw.com/ontology/products#versionText> ?s7 .
?s6 schema:name ?s8 .
FILTER ( ?s9 in (<http://www.openlinksw.com/ontology/installers#ExecutableArchive>, <http://www.openlinksw.com/ontology/installers#InitializationFile> ) )
}
## This ORDER BY clause will sort result rows
## by ?OSLabel and then ?appLabel (the 6th and
## 1st columns)
ORDER BY ?OSLabel ?appLabel

Then use the command-line to execute the command corresponding to your desired output serialization format:

  • Get output as JSON
QUERY=$(<sample-query.sparql) && curl -X POST -H "Accept:application/sparql-results+json" --data-urlencode "query=$QUERY" http://linkeddata.uriburner.com/sparql
  • Get output as HTML
QUERY=$(<sample-query.sparql) && curl -X POST -H "Accept:text/html" --data-urlencode "query=$QUERY" http://linkeddata.uriburner.com/sparql
  • Get output as CSV
QUERY=$(<sample-query.sparql) && curl -X POST -H "Accept:text/csv" --data-urlencode "query=$QUERY" http://linkeddata.uriburner.com/sparql
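The same POST requests can also be composed without curl. This Python sketch (the function name `build_sparql_post` is mine) builds — but does not send — an equivalent request using only the standard library:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_sparql_post(endpoint, query,
                      accept="application/sparql-results+json"):
    """Build a SPARQL Protocol query-via-POST request mirroring the
    curl commands above: the query travels form-urlencoded in the
    request body, and the Accept header selects the serialization."""
    body = urlencode({"query": query}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

req = build_sparql_post(
    "http://linkeddata.uriburner.com/sparql",
    "ASK WHERE { ?s ?p ?o }",
)
# Dispatching it is then one line: urlopen(req).read()
```

Changing the `accept` argument to text/html or text/csv reproduces the other two curl invocations above.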

Benefits of a SPARQL Endpoint

A SPARQL endpoint offers several unique benefits as an API:

  • Declarative data interaction (manipulation and definition) may be integrated via HTTP — targeting data represented as fine-grained RDF sentence/statement collections
  • HTTP document URLs bring extensive flexibility to data queries; i.e., each component of a URL is a slot for parameterized alteration of the query's specifics, from its target endpoint to the nature of its query solution
  • A wide variety of content types are supported for query solution documents — HTML, JSON, CSV, RDF-Turtle, RDF-N-Triples, RDF-XML, and others
  • Content types of all query solution documents are negotiable — courtesy of HTTP content-negotiation (“con-neg”)
  • Endpoints may be accessed with any HTTP-compliant user agent or service

Conclusion

In every SPARQL Endpoint, we have an access point for an HTTP-based protocol — from the same W3C that delivered HTTP itself — that offers fine-grained Data Definition and Data Manipulation operations via GET and POST, with query solutions (result sets) delivered using a variety of negotiable document types. It is hard to imagine why anyone would seek a proprietary alternative with a fraction of the expressivity and platform independence offered by this powerful solution, already delivered as a web-friendly API!

In a world where Data is the New Electricity, conducted by hyperlinks (specifically, HTTP URIs), we can effectively look to SPARQL endpoints as providers of Data Junction Box functionality. This is clearly demonstrated by the ever-increasing number of SPARQL Endpoints associated with the nodes that comprise the already massive — and still growing — Linked Open Data Cloud.

Massive Linked Open Data Cloud, in which a majority of nodes provide a SPARQL Endpoint

As you can see from the LOD Cloud pictorial, SPARQL Endpoints are part of a massive data-grid ready to fuel a new generation of modern applications, services, smart agents, and appliances. Remember, until the electricity grid was in place and functional, there was no viable market for consumer electronics (from the lightbulb and toaster, to the washing machine and vacuum cleaner, to the air conditioner and refrigerator).

An integral component of the LOD Cloud, Virtuoso lets anyone experience the full power and sophistication of a SPARQL endpoint (for local or federated use) by simply downloading a few files:

  • Virtuoso Enterprise Edition — (1) Server Binary, (2) Server Configuration File, (3) License Manager Binary, and (4) License File for Windows, macOS, or Linux
  • Virtuoso Open Source Edition — (1) Server Binary and (2) Server Configuration File for Windows, macOS, or Linux


Kingsley Uyi Idehen

CEO, OpenLink Software — High-Performance Data Centric Technology Providers.