Feature Proposals for SPARQL 1.2

Marcelo Barbieri
Feb 16 · 9 min read

The aim of this article is to introduce the Northwind and AdventureWorks sample databases to the community as a valuable resource to work on development of feature proposals for new versions of SPARQL, as well as introduce a preliminary list of proposed features.

These databases can be easily set up on SQL Server by following the instructions in the Setting up the sample databases on SQL Server section of the Northwind SQL vs SPARQL article.

RDF Triplestore versions of the Northwind database have also been made available, which can be used to validate new features once they have been implemented in SPARQL, by executing SQL and SPARQL queries side by side and comparing their results. The article above contains instructions on how to set up the Northwind database on Stardog* and GraphDB* RDF Graph Databases.

* “These semantic technologies are the core technologies for any Enterprise Knowledge Graph (EKG)”

AdventureWorks is only available in SQL Server.

The SPARQL query language for RDF is currently in version 1.1. The SPARQL 1.2 Community Group is a forum for discussion and refinement of SPARQL 1.1 and has been working on new feature proposals for version 1.2.

This article contains examples with the following proposed features:

  • Correlated Subquery
  • Window Functions
  • Ranking Functions
  • Aggregate Functions
  • Temporary Tables
  • Select Top with Ties

You may choose to skip the environment set up if you don’t want to execute the queries yourself, but only browse the results in the screenshots.

Correlated Subquery

In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

Select the 3 most recent orders from each customer.

USE Northwind;
SELECT
cst.CustomerID,
cst.City,
cpp.OrderID,
cpp.OrderDate
FROM
Customer AS cst
CROSS APPLY
(
SELECT TOP 3
ord.OrderID, ord.OrderDate, cst.CustomerID
FROM
[Order] AS ord
WHERE
ord.customerid = cst.customerid
ORDER BY
ord.OrderDate DESC
) AS cpp
ORDER BY
cst.CustomerID,
cst.City,
cpp.OrderDate DESC

Window Functions

The SELECT — OVER clause determines the partitioning and ordering of a rowset before the associated window function is applied. It defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

The following example replaces the previous correlated subquery with a window function.

Select the 3 most recent orders from each customer

USE Northwind;
SELECT
ptt.*
FROM
(
SELECT
cst.CustomerID,
cst.City,
ord.OrderID,
ord.OrderDate,
ROW_NUMBER() OVER(PARTITION BY cst.CustomerID ORDER BY ord.OrderDate DESC) AS [RowNumber]
FROM Customer AS cst
INNER JOIN [Order] AS ord
ON cst.CustomerID = ord.CustomerID
) ptt
WHERE
ptt.[RowNumber] <= 3

Refer to the Ranking Functions section for more information on ROW_NUMBER.

The following are two more examples on window functions.

Top 5 most expensive product in each product category

USE Northwind;
SELECT
ptt.*
FROM
(
SELECT
ctg.CategoryName,
prd.ProductName,
prd.UnitPrice,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 5
ORDER BY
ptt.CategoryName,
ptt.RowNumber

Order total quantity and percentage by product

USE Northwind;
SELECT
ord.OrderID,
ord.ProductID,
ord.Quantity,
SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) AS Total,
CAST(1. * ord.Quantity / SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) * 100 AS DECIMAL(5,2)) AS "PercByProduct"
FROM
OrderDetail ord
WHERE
ord.OrderID IN(10248,10249, 10250)

Ranking Functions

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.

ROW_NUMBER: Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

The following example calculates a row number for the salespeople in Adventure Works based on their year-to-date sales ranking.

USE AdventureWorks2017; 
SELECT
ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName,
LastName,
ROUND(SalesYTD,2,1) AS "Sales YTD"
FROM
Sales.vSalesPerson
WHERE
TerritoryName IS NOT NULL
AND SalesYTD <> 0

RANK: Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question. ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK provides the same numeric value for ties (for example 1, 2, 2, 4, 5).

The following example ranks the products in inventory the specified inventory locations according to their quantities. The result set is partitioned by LocationID and logically ordered by Quantity. Notice that products 494 and 495 have the same quantity. Because they are tied, they are both ranked one.

USE AdventureWorks2017;  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID

DENSE_RANK: This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.

This example ranks the products in inventory, by the specified inventory locations, according to their quantities. DENSE_RANK partitions the result set by LocationID and logically orders the result set by Quantity. Notice that products 494 and 495 have the same quantity. Because they both have the same quantity value, they both have a rank value of one.

USE AdventureWorks2017;  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,DENSE_RANK() OVER
(PARTITION BY
i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID

NTILE: Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

The following example divides rows into four groups of employees based on their year-to-date sales. Because the total number of rows is not divisible by the number of groups, the first two groups have four rows and the remaining groups have three rows each.

USE AdventureWorks2017;   
SELECT p.FirstName, p.LastName
,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile
,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD
, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0

The following is an example with all the above Ranking Functions using the Northwind database.

Top 3 most expensive product in each product category

USE Northwind;
SELECT
ptt.*
FROM
(
SELECT
ctg.CategoryName,
prd.ProductName,
prd.UnitPrice,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber],
RANK() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RANK],
DENSE_RANK() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [DENSE_RANK],
NTILE(6) OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [NTILE]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 3
ORDER BY
ptt.CategoryName,
ptt.RowNumber

Update Scenarios

The following is an example of common update scenario using window functions.

Apply a 10% discount on the top 3 most expensive product in each product category

USE Northwind;
UPDATE
Product
SET
UnitPrice = UnitPrice * 0.9
WHERE
ProductID IN
(
SELECT
ptt.ProductID
FROM
(
SELECT
prd.ProductID,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 3
)

Temporary Tables

Temporary Tables can be used in SQL to store a dataset that goes under many calculation steps before being committed to a physical table on the database. It holds intermediate results that can be consumed multiple times at different stages of a long SQL query. Currently, there is no such a feature available in SPARQL.

The following query uses a temp table to save the list of products to be updated from the previous example.

Apply a 10% discount on the top 3 most expensive product in each product category.

USE Northwind;
SELECT
ptt.ProductID,
ptt.UnitPrice
INTO
#ProdDiscount
FROM
(
SELECT
prd.ProductID,
prd.UnitPrice,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 3
UPDATE
Product
SET
UnitPrice = UnitPrice * 0.9
WHERE
ProductID IN (SELECT ProductID FROM #ProdDiscount)

Check the updated records

USE Northwind;
SELECT
prd.ProductID,
prd.UnitPrice
FROM
Product prd
INNER JOIN #ProdDiscount tpr
ON prd.ProductID = tpr.ProductID
ORDER BY
prd.ProductID

SELECT TOP WITH TIES

Returns two or more rows that tie for last place in limited results set. You must use this argument with the ORDER BY clause. WITH TIES might cause more rows to be returned than the value specified in expression. For example, if expression is set to 5 but two additional rows match the values of the ORDER BY columns in row 5, the result set will contain seven rows.

Top 5 Supplier Representative by number of products sold.

SELECT 
TOP 5
WITH TIES -- returns rows that tie for last place
spl.ContactName,
COUNT(prd.ProductID) as ProductCount
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
INNER JOIN Supplier spl
ON prd.SupplierID = spl.SupplierID
GROUP BY
spl.SupplierID,
spl.ContactName
ORDER BY
ProductCount DESC

SPARQL cannot return rows that tie for last place.

SELECT 
?supplierContactName
(COUNT(?product) as ?productCount)
WHERE
{
?product a :Product ;
:hasSupplier ?supplier .
?supplier a :Supplier ;
:contactName ?supplierContactName .
}
GROUP BY
?supplierContactName
ORDER BY
DESC(?productCount)
LIMIT 5

References

Refer to the following W3C github page for a discussion on SQL Window Function proposal to be included in SPARQL 1.2.

agnos.ai

The Enterprise Knowledge Graph Company

Marcelo Barbieri

Written by

Knowledge Graph Engineer - agnos.ai

agnos.ai

agnos.ai

agnos.ai is a specialist consultancy that designs and implements Enterprise Knowledge Graphs. We harness the power of semantic technology to solve your most complex enterprise data challenges.

Marcelo Barbieri

Written by

Knowledge Graph Engineer - agnos.ai

agnos.ai

agnos.ai

agnos.ai is a specialist consultancy that designs and implements Enterprise Knowledge Graphs. We harness the power of semantic technology to solve your most complex enterprise data challenges.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store