What is a SPARQL Endpoint, and why is it important?

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
10 min read · Aug 4, 2018


Situation Analysis

When working with data, the vital issue of Data Flow — that is, the movement of data, from creation to storage, revision, and repurposing — has historically been impeded by disparity along several lines, including —

  • Data Item (i.e., Subject or Entity) Identifiers
  • Data Access Protocols
  • Data Structures
  • Data Locations
  • Data Query Languages
What sits between You and your Data

What is the SPARQL Query Language?

The SPARQL Query Language is a Declarative Query Language (like SQL) for performing Data Manipulation and Data Definition operations on Data represented as a collection of RDF Language sentences/statements.

Every SPARQL Query has a Solution Modifier (or head) and a Query Body. The Solution Modifier provides the basis for categorizing the different types of SPARQL Query solutions. The Query Body comprises a collection of RDF statement patterns (represented using Turtle Notation) that describe the entity relationships to which a query is scoped.

Read Oriented Query Types

  • SELECT — the Solution Modifier takes the form of a SELECT list, which is very much like a SQL SELECT list, that projects a query solution in tabular form
# SELECT
# {TABULAR-OUTPUT-SELECTION}
# WHERE
# {Entity Relationship Types represented in
# RDF-Turtle, supporting use of variables,
# constants, and blank nodes as identifier
# types}
#
# Example:
# Present a Query Solution in Tabular form
# that lists sentence subjects, predicates,
# and objects identified by the variables
# ?s, ?p, and ?o, and limit row count to 100
SELECT DISTINCT
?s ?p ?o
WHERE
{ ?s ?p ?o }
LIMIT 100
  • CONSTRUCT — the Solution Modifier takes the form of an RDF sentence/statement graph specification
# CONSTRUCT
# {ENTITY-RELATIONSHIP-GRAPH-OUTPUT-SELECTION}
# WHERE
# {Entity Relationship Types represented
# in RDF-Turtle that includes support for
# variables, constants, and blank nodes
# as identifier types}
#
# Example:
# Present a Query Solution in Entity Relationship
# Graph form comprising sentence subjects,
# predicates, and objects identified by the
# variables ?s, ?p, and ?o, and limit
# relationship count to 100
CONSTRUCT
{ ?s ?p ?o }
WHERE
{ ?s ?p ?o }
LIMIT 100
  • DESCRIBE — a variant of CONSTRUCT where the Solution Modifier takes the form of an RDF sentence/statement graph specification that describes a selection of entities
# DESCRIBE
# {ENTITY-RELATIONSHIP-GRAPH-OUTPUT-SELECTION}
# WHERE
# {Entity Relationship Types represented in
# RDF-Turtle, supporting use of variables,
# constants, and blank nodes as identifier
# types}
#
# Example:
# Present a Query Solution in Entity Relationship
# Graph form that describes all subjects of
# sentences where the subjects, predicates, and
# objects are identified by the variables ?s, ?p,
# and ?o, and limit relationship count to 100
DESCRIBE ?s
WHERE
{ ?s ?p ?o }
LIMIT 100
  • ASK — this type of query returns a simple boolean (true or false) query solution
# ASK 
# WHERE
# {Entity Relationship Types represented in
# RDF-Turtle, supporting use of variables,
# constants, and blank nodes as identifier
# types}
#
# Example:
# Inquire about the existence of sentences that
# have the subjects, predicates, and objects
# identified by variables ?s, ?p, and ?o
ASK
WHERE
{ ?s ?p ?o }

Write-oriented Query Types

Write-oriented SPARQL Queries (collectively known as SPARQL Update) perform CREATE, INSERT, DELETE, and other "change" operations on collections of RDF sentences/statements associated with a specific DBMS or Triple/Quad Store Document (or Data Source) referred to as a Named Graph.

  • CREATE — creates a new empty Named Graph.
# The SILENT keyword suppresses the error that 
# would otherwise be raised if the target Named
# Graph already exists
CREATE SILENT GRAPH <urn:my:document:1>
  • INSERT — adds RDF sentences to a Named Graph explicitly or based on conditions satisfied in the SPARQL Query body
PREFIX : <#>
INSERT DATA
{ GRAPH <urn:my:document:1>
{ :entity1 :relatedTo :entity2 . }
}
  • COPY — replaces the contents of the Target Named Graph with the RDF sentences from the Source Named Graph
COPY <urn:my:document:1> 
TO <urn:my:document:2>
  • ADD — adds (or appends) RDF sentences to a Named Graph
ADD <urn:my:document:2> 
TO <urn:my:document:3>
  • MOVE — moves RDF sentences from one Named Graph to another Named Graph; i.e., the Source Named Graph and its sentences are permanently removed after successful recreation in the Target Named Graph
MOVE <urn:my:document:3> 
TO <urn:my:document:4>
  • DELETE — removes RDF sentences that satisfy conditions in the SPARQL Query body from the Target Named Graph. DELETE queries can explicitly target specific statements —
# Remove specific statements, where the sentence 
# subject, predicate, and objects are all identified
# by constants in the SPARQL query
PREFIX : <#>
DELETE DATA
{ GRAPH <urn:my:document:1>
{ :entity1 :relatedTo :entity2 . }
}

— or target whatever statements satisfy a pattern built with variables —

# Remove all attribute-names (predicates) and 
# attribute-values (objects) for a specific entity,
# i.e., the RDF sentence subject is identified by
# a constant while the predicate and object are
# identified by variables
PREFIX : <#>
DELETE
{ GRAPH <urn:my:document:1>
{ :entity1 ?p ?o . }
}
WHERE
{ GRAPH <urn:my:document:1>
{ :entity1 ?p ?o . }
}
  • CLEAR — removes all RDF sentences in a specific Named Graph; i.e., you end up with an empty Named Graph
CLEAR GRAPH <urn:my:document:2>
  • DROP — removes a Named Graph in its entirety from the DBMS or Triple/Quad Store
# The SILENT keyword suppresses the error that 
# would otherwise be raised if the target Named
# Graph does not exist
DROP SILENT GRAPH <urn:my:document:1>

What is a SPARQL Endpoint?

A SPARQL Endpoint is a Point of Presence on an HTTP network that’s capable of receiving and processing SPARQL Protocol requests. It is identified by a URL commonly referred to as a SPARQL Endpoint URL.

What is the SPARQL Protocol?

The SPARQL Protocol is an HTTP-based protocol for performing SPARQL operations against data via SPARQL Endpoints. Subject to the kind of operation being performed, requests are dispatched using the HTTP GET or POST methods.
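As a sketch of how such a request is formed, the Python fragment below builds a query-via-GET URL; the helper name `sparql_get_url` is illustrative, and the endpoint URL is the one used elsewhere in this article:

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol query-via-GET URL: the query text is
    URL-encoded into the `query` parameter, and an optional
    `default-graph-uri` parameter scopes it to a Named Graph."""
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)

url = sparql_get_url(
    "http://linkeddata.uriburner.com/sparql",
    "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10",
)
print(url)
```

Pasting the printed URL into any HTTP user agent (browser, curl, etc.) dispatches the query.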

What is Federated SPARQL?

Federated SPARQL (originally, and still sometimes, called SPARQL-FED) is a form of SPARQL Query that provides access to RDF statements/sentences (data) served by remote SPARQL Query Service endpoints; i.e., it is how SPARQL queries are federated via the SPARQL Protocol.

A SERVICE clause in the body of a SPARQL Query is used to identify a remote SPARQL Endpoint and the actual query that’s executed against that remote endpoint as part of a query solution production pipeline.

SELECT
DISTINCT ?s ?p ?o
WHERE
{
SERVICE <http://linkeddata.uriburner.com/sparql>
{
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 100
}
}

What are SPARQL Query Result Serialization Formats?

SPARQL Query Result Serialization Formats are a variety of document content types associated with SPARQL Query solutions. Some Query Types limit the possible Serialization Formats in which their Results may be delivered. Among others, these include —

  • text/html — SELECT Queries
  • application/sparql-results+xml — SELECT Queries
  • application/sparql-results+json — SELECT Queries
  • text/turtle — CONSTRUCT & DESCRIBE Queries
  • application/n-triples — CONSTRUCT & DESCRIBE Queries
  • text/plain — CONSTRUCT & DESCRIBE Queries
  • application/ld+json — CONSTRUCT & DESCRIBE Queries
  • application/rdf+xml — CONSTRUCT & DESCRIBE Queries
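As an illustrative sketch (the lookup table and helper function are mine, not part of any SPARQL library), the relationship between query type and legal result format can be captured as a simple mapping used to set an HTTP Accept header:

```python
# Result content types each read-oriented query type may emit,
# per the (non-exhaustive) list above.
RESULT_FORMATS = {
    "SELECT": {
        "text/html",
        "application/sparql-results+xml",
        "application/sparql-results+json",
    },
    "CONSTRUCT": {
        "text/turtle",
        "application/n-triples",
        "text/plain",
        "application/ld+json",
        "application/rdf+xml",
    },
}
# DESCRIBE is a variant of CONSTRUCT, so it shares the same formats.
RESULT_FORMATS["DESCRIBE"] = RESULT_FORMATS["CONSTRUCT"]

def accept_header(query_type, content_type):
    """Return `content_type` for use as an Accept header value,
    provided it is a format the given query type can produce."""
    if content_type not in RESULT_FORMATS[query_type]:
        raise ValueError(
            f"{content_type} is not a result format for {query_type} queries"
        )
    return content_type
```

For example, requesting `text/turtle` output for a SELECT query would raise an error, since a tabular solution has no Turtle serialization.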

What is a SPARQL Query Service?

A SPARQL Query Service is an HTTP Service (also known as a Web Service) that offers an API for performing declarative Data Definition and Data Manipulation operations on data represented as RDF sentence collections. Naturally, this kind of service is provided by a Database Management System (DBMS) or Triple/Quad Store associated with a URL that identifies its point of presence on an HTTP network, i.e., the address to which messages (queries) are dispatched.

Using a SPARQL Query Service

HTML-based Query Editor

More often than not, a SPARQL endpoint is paired with an HTML document that functions as a simple interface for query editing and execution. By convention, the URL of this kind of document includes the literal sparql as the final path component or as the host part of a web site’s canonical name.


Live Query Example

A query is delivered to an endpoint via an HTTP request. The server associated with the endpoint returns a query solution via an HTTP response; the solution itself is a document identified by a URL.

PREFIX schema: <http://schema.org/>
SELECT DISTINCT
( ?s3 AS ?appLabel )
( ?s7 AS ?versionLabel )
( ?s1 AS ?executableUri )
( ?s4 AS ?formatLabel )
( ?s6 AS ?OSUri )
( ?s8 AS ?OSLabel )
( ?s2 AS ?downloadUrl )
FROM <urn:data:openlink:products>
WHERE
{
?s1 a ?s9 .
?s1 <http://schema.org/downloadUrl> ?s2 .
?s1 <http://schema.org/name> ?s3 .
?s1 <http://purl.org/dc/terms/format> ?s4 .
?s1 <http://schema.org/name> ?s5 .
?s1 <http://www.openlinksw.com/ontology/software#hasOperatingSystemFamily> ?s6 .
# FILTER ( ?s6 = <http://www.openlinksw.com/ontology/software#GenericLinux> ) .
?s1 <http://www.openlinksw.com/ontology/products#versionText> ?s7 .
?s6 schema:name ?s8 .
FILTER ( ?s9 IN ( <http://www.openlinksw.com/ontology/installers#ExecutableArchive> ,
<http://www.openlinksw.com/ontology/installers#InitializationFile>
) )
}
ORDER BY ?OSUri ?appLabel
Breakdown of a SPARQL Protocol URL

Operating System Command-line

SPARQL Queries can be executed directly from your computer's operating system command line via curl — a general-purpose HTTP client utility.

Examples

Copy the following SPARQL Query text into a local file named, for example, sample-query.sparql:

PREFIX schema: <http://schema.org/>
SELECT DISTINCT
( ?s3 AS ?appLabel )
( ?s7 AS ?versionLabel )
( ?s1 AS ?executableUri )
( ?s4 AS ?formatLabel )
( ?s6 AS ?OSUri )
( ?s8 AS ?OSLabel )
( ?s2 AS ?downloadUrl )
FROM <urn:data:openlink:products>
WHERE
{
?s1 a ?s9 .
?s1 <http://schema.org/downloadUrl> ?s2 .
?s1 <http://schema.org/name> ?s3 .
?s1 <http://purl.org/dc/terms/format> ?s4 .
?s1 <http://schema.org/name> ?s5 .
?s1 <http://www.openlinksw.com/ontology/software#hasOperatingSystemFamily> ?s6 .
## Uncomment (remove the leading hash "#" from)
## the following line to filter by Operating
## System Family; in this case, Generic Linux.
# FILTER ( ?s6 = <http://www.openlinksw.com/ontology/software#GenericLinux> ) .
?s1 <http://www.openlinksw.com/ontology/products#versionText> ?s7 .
?s6 schema:name ?s8 .
FILTER ( ?s9 in (<http://www.openlinksw.com/ontology/installers#ExecutableArchive>, <http://www.openlinksw.com/ontology/installers#InitializationFile> ) )
}
## This ORDER BY clause will sort result rows
## by ?OSLabel and then ?appLabel (the 6th and
## 1st columns)
ORDER BY ?OSLabel ?appLabel

Then use the command-line to execute the command corresponding to your desired output serialization format:

  • Get output as JSON
QUERY=$(<sample-query.sparql) && curl -X POST -H "Accept:application/sparql-results+json" --data-urlencode "query=$QUERY" http://linkeddata.uriburner.com/sparql
  • Get output as HTML
QUERY=$(<sample-query.sparql) && curl -X POST -H "Accept:text/html" --data-urlencode "query=$QUERY" http://linkeddata.uriburner.com/sparql
  • Get output as CSV
QUERY=$(<sample-query.sparql) && curl -X POST -H "Accept:text/csv" --data-urlencode "query=$QUERY" http://linkeddata.uriburner.com/sparql
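The same POST requests can also be composed without curl. This Python sketch (the function name `build_sparql_post` is mine) builds — but does not send — an equivalent request using only the standard library:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_sparql_post(endpoint, query,
                      accept="application/sparql-results+json"):
    """Build a SPARQL Protocol query-via-POST request mirroring the
    curl commands above: the query travels form-urlencoded in the
    request body, and the Accept header selects the serialization."""
    body = urlencode({"query": query}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

req = build_sparql_post(
    "http://linkeddata.uriburner.com/sparql",
    "ASK WHERE { ?s ?p ?o }",
)
# Dispatching it is then one line: urlopen(req).read()
```

Changing the `accept` argument to text/html or text/csv reproduces the other two curl invocations above.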

Benefits of a SPARQL Endpoint

A SPARQL endpoint offers several unique benefits as an API:

  • Declarative data interaction (manipulation and definition) may be integrated via HTTP — targeting data represented as fine-grained RDF sentence/statement collections
  • HTTP document URLs bring extensive flexibility to data queries; i.e., each component of a URL is a slot for parameterized alteration of the query's specifics, from its target endpoint to the nature of its query solution
  • A wide variety of content types are supported for query solution documents — HTML, JSON, CSV, RDF-Turtle, RDF-N-Triples, RDF-XML, and others
  • Content types of all query solution documents are negotiable — courtesy of HTTP content-negotiation (“con-neg”)
  • Endpoints may be accessed with any HTTP-compliant user agent or service

Conclusion

In every SPARQL Endpoint, we have an access point for an HTTP-based protocol — from the same W3C that delivered HTTP itself — that offers fine-grained Data Definition and Data Manipulation operations via GET and POST, with query solutions (result sets) delivered using a variety of negotiable document types. It is hard to imagine why anyone would seek a proprietary alternative with a fraction of the expressivity and platform independence offered by this powerful solution, already delivered as a web-friendly API!

In a world where Data is the New Electricity, conducted by hyperlinks (specifically, HTTP URIs), we can effectively look to SPARQL endpoints as providers of Data Junction Box functionality. This is clearly demonstrated by the ever-increasing number of SPARQL Endpoints associated with the nodes that comprise the already massive — and still growing — Linked Open Data Cloud.

Massive Linked Open Data Cloud, in which a majority of nodes provide a SPARQL Endpoint

As you can see from the LOD Cloud pictorial, SPARQL Endpoints are part of a massive data-grid ready to fuel a new generation of modern applications, services, smart agents, and appliances. Remember, until the electricity grid was in place and functional, there was no viable market for consumer electronics (from the lightbulb and toaster, to the washing machine and vacuum cleaner, to the air conditioner and refrigerator).

An integral component of the LOD Cloud, Virtuoso lets anyone experience the full power and sophistication of a SPARQL endpoint (for local or federated use) by simply downloading a few files:

  • Virtuoso Enterprise Edition — (1) Server Binary, (2) Server Configuration File, (3) License Manager Binary, and (4) License File for Windows, macOS, or Linux
  • Virtuoso Open Source Edition — (1) Server Binary and (2) Server Configuration File for Windows, macOS, or Linux


Kingsley Uyi Idehen

CEO, OpenLink Software — High-Performance Data Centric Technology Providers.