Testing Invisible XML using XSpec

XPath 4.0 and BaseX Pave the Way

Amanda Galtman
Aug 7, 2024

At last week’s “Balisage: The Markup Conference 2024” event, I heard a lot about Invisible XML (iXML). It is helping people do interesting and practical things, successfully and easily. It is being compared with regular expressions: iXML is much less commonly used, for sure, but more powerful for certain tasks and a handy tool to know about for some of the same reasons that regexes are handy. If you are new to iXML, I recommend Norm Tovey-Walsh’s tutorials, linked from the end of this article.

Some conference speakers talked about how their iXML grammars evolved incrementally or how they needed to fiddle with whitespace to get it right. I wondered how people ensure that today’s whitespace bug fix or feature evolution doesn’t ruin yesterday’s whitespace bug fix. I wondered if XSpec can be made to test an iXML grammar.

It turns out that, because of how XSpec relies on XPath and because XPath 4.0 (still in draft status) calls for a function named invisible-xml, XSpec can test an iXML grammar. This topic shows two designs for testing an iXML grammar in XSpec. Thanks to John Lumley for mentioning the invisible-xml function during his conference talk.

Both XSpec designs depend on BaseX and Markup Blitz software for a preliminary implementation of the invisible-xml function.

A Sample iXML Grammar

The working draft of the XPath 4.0 specification includes the following sample iXML grammar.

Example 1. Sample iXML Grammar

date = year, -'-', month, -'-', day .
year = d, d, d, d .
month = '0', d | '1', ['0'|'1'|'2'] .
day = ['0'|'1'|'2'], d | '3', ['0'|'1'] .
-d = ['0'-'9'] .

The purpose of this grammar is to enable us to parse a date in a particular textual format and produce an equivalent representation in XML. For example, this grammar enables us to convert 2024-08-07 into the following XML format.

Example 2. Sample Date in XML

<date><year>2024</year><month>08</month><day>07</day></date>

A Function that Produces a Function

The invisible-xml function in XPath 4.0 converts our iXML grammar into a function that, in turn, is capable of converting data into an XML document according to the rules of our iXML grammar. You can interpret this operation by saying that the invisible-xml function creates a parser for our iXML grammar.

Continuing the example above, if we apply the invisible-xml function to our sample grammar, we get a function item that we can store in a variable named p. The p function is capable of converting the data 2024-08-07 into the XML document shown in Example 2. Sample Date in XML. In symbols, $p('2024-08-07') equals <date><year>2024</year><month>08</month><day>07</day></date>. The $ symbol is just the prefix for referencing a variable.

The same p function is capable of converting any data that conforms to the grammar into the equivalent XML markup, where “equivalent” means according to the rules of this grammar.

To recap, we wrote five grammar rules to describe how to interpret text like 2024-08-07 and convert it into a certain XML format, and then the invisible-xml function gave us a nifty tool that actually performs that conversion—not only for 2024-08-07, but for 1918-11-11 or 2087-01-31 or any other string that the grammar rules support.

What Does Testing iXML Mean?

If we are developing an iXML grammar, we want to check what it “does” with input data. A parser for the grammar is the concrete tool that processes input data and produces either an XML document or a failure. We want to ensure that the parser does the right thing with each set of input data that we try. If the data is meant to be supported by the grammar, the parser must convert the data into the correct XML document. We also expect the parser to fail for data we don’t expect the grammar to support, although testing unsupported data might be lower priority or perhaps not important at all.

To test the iXML grammar, we can use the invisible-xml function to generate a parser function for the grammar, ask the parser function to process lots of data, and verify that the parser function produces the expected XML document each time (or raises an error for unsupported data). On the surface, XSpec doesn’t officially claim to “support” testing iXML. But testing the grammar parser function is a lot like testing an XQuery function or XSLT stylesheet function, except that we got the parser function from invisible-xml instead of by writing it directly, and the parser function is stored in a variable instead of in a function declaration in XQuery or XSLT.

We illustrate two XSpec designs, and you can decide which one you prefer.

XSpec Design 1

Design 1 starts with the idea, “invisible-xml is a function, and we know how to use <x:call> in XSpec to call a function.”

Creating the Parser in XSpec

If the iXML grammar is in a file named my-ixml-grammar.txt in the same directory as the XSpec file, we write the following <x:call> element.

Example 3. Design 1 Calling invisible-xml

<x:call function="invisible-xml">
  <x:param
    select="'my-ixml-grammar.txt'
      => resolve-uri($x:xspec-uri)
      => unparsed-text()"/>
</x:call>

The invisible-xml function parses the text in my-ixml-grammar.txt and produces a parser function for the grammar. That means the actual result, $x:result, in this XSpec scenario is a parser function.

Verifying Parser Behavior

Next, we want to ask the parser function to process data. We can’t use <x:call> again in the same scenario or chain function calls together in one <x:call> element. However, we can make good use of the <x:expect> syntax that verifies something derived from the actual result. The thing we want to derive from $x:result is the output of calling this parser function on some data.

Example 4. Design 1 Verifying Result for One Input

<x:expect label="date element with year, month, and day children"
  test="$x:result('2024-08-07')" select="/">
  <date>
    <year>2024</year>
    <month>08</month>
    <day>07</day>
  </date>
</x:expect>

The attribute test="$x:result('2024-08-07')" calls the parser function (produced by invisible-xml) on the input data '2024-08-07'. The embedded <date> element provides the expected result. The attribute select="/" says that the expected result is the document node containing the <date> element, not the <date> element itself.

With only this one <x:call function="invisible-xml"> element, we can have many <x:expect> elements, each of which verifies the behavior of the parser function for a different string of input data.

The Test Target

The invisible-xml function is part of XPath 4.0 rather than a function we wrote in a stylesheet or XQuery library module. So, what does the <x:description> element point to as the code we are testing? Both Design 1 and Design 2 use XQuery as the language to test, because the invisible-xml function has been experimentally implemented in BaseX, which is an XQuery engine. Beyond that, this design doesn’t actually need any substantive code in an XQuery library module. We can use a nearly empty XQuery module like this:

Example 5. Design 1 Trivial XQuery Module

xquery version "4.0";
module namespace noop = "urn:x-xspectacles:functions:ixml";

In XSpec, the <x:description> element’s start tag looks like this, assuming the XQuery module file is named no-op.xqm:

Example 6. Design 1 XSpec Top Element Tag

<x:description
  query="urn:x-xspectacles:functions:ixml"
  query-at="no-op.xqm"
  xquery-version="4.0"
  xmlns:x="http://www.jenitennison.com/xslt/xspec">

The xquery-version="4.0" attribute documents the dependency on (experimental support for the draft of) XQuery version 4.0, because the invisible-xml function is in the draft for XPath version 4.0.
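Putting the pieces of Design 1 together, a complete XSpec file might look like the following sketch. It assumes that my-ixml-grammar.txt contains the grammar from Example 1 and that no-op.xqm is the module from Example 5, both in the same directory as the XSpec file. The scenario and expectation labels are illustrative choices, not required names. The two extra dates are the ones mentioned in the recap earlier in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes no-op.xqm (Example 5) and my-ixml-grammar.txt
     (Example 1) sit next to this XSpec file. Labels are illustrative. -->
<x:description
  query="urn:x-xspectacles:functions:ixml"
  query-at="no-op.xqm"
  xquery-version="4.0"
  xmlns:x="http://www.jenitennison.com/xslt/xspec">

  <x:scenario label="Parser generated from the date grammar">
    <!-- One call to invisible-xml; $x:result is the parser function. -->
    <x:call function="invisible-xml">
      <x:param
        select="'my-ixml-grammar.txt'
          => resolve-uri($x:xspec-uri)
          => unparsed-text()"/>
    </x:call>

    <!-- Each x:expect applies the same parser to a different input string. -->
    <x:expect label="parses 2024-08-07"
      test="$x:result('2024-08-07')" select="/">
      <date><year>2024</year><month>08</month><day>07</day></date>
    </x:expect>

    <x:expect label="parses 1918-11-11"
      test="$x:result('1918-11-11')" select="/">
      <date><year>1918</year><month>11</month><day>11</day></date>
    </x:expect>

    <x:expect label="parses 2087-01-31"
      test="$x:result('2087-01-31')" select="/">
      <date><year>2087</year><month>01</month><day>31</day></date>
    </x:expect>
  </x:scenario>
</x:description>
```

Because the grammar is read and compiled once per scenario, adding another input to this design costs only one more <x:expect> element.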

XSpec Design 2

Design 2 takes a different approach: Write XQuery code that calls invisible-xml to generate the parser function, and then use <x:call> in XSpec to call the parser function.

Creating the Parser in XQuery

In Design 2, the XQuery library module declares a global variable that stores the grammar parser as a function item.

Example 7. Design 2 XQuery Calling invisible-xml

xquery version "4.0";
module namespace gr = "urn:x-xspectacles:functions:ixml";

declare variable $gr:parser as function(*) :=
  unparsed-text('my-ixml-grammar.txt') => invisible-xml();

Verifying Parser Behavior

As in Design 1, we want to ask the parser function to process data, but this time we can use <x:call> to do it. Because the parser function is a function item stored in a variable, <x:call> uses the attribute call-as="variable".

The call-as attribute is new in XSpec v3.0, so please use Design 1 if you are using an earlier XSpec version.

The input data is a parameter of the parser function, so the data goes in <x:param>.

Example 8. Design 2 Calling Parser Function for One Input

<x:call function="gr:parser" call-as="variable">
  <x:param select="'2024-08-07'"/>
</x:call>

The actual result ($x:result) is the XML document that the parser function creates from the data '2024-08-07'. The <x:expect> element does not need to derive anything from this actual result, so it merely provides the expected result in the same manner as in Design 1.

Example 9. Design 2 Verifying Result for One Input

<x:expect label="date element with year, month, and day children"
  select="/">
  <date>
    <year>2024</year>
    <month>08</month>
    <day>07</day>
  </date>
</x:expect>

To verify the behavior of the parser for a different string of input data, we would use a separate <x:scenario> element.

The Test Target

In XSpec, the <x:description> element’s start tag looks like this, where version="3.0" documents the dependency on XSpec version 3.0 due to the call-as attribute.

Example 10. Design 2 XSpec Top Element Tag

<x:description
  query="urn:x-xspectacles:functions:ixml"
  query-at="my-ixml.xqm"
  version="3.0"
  xquery-version="4.0"
  xmlns:gr="urn:x-xspectacles:functions:ixml"
  xmlns:x="http://www.jenitennison.com/xslt/xspec">
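For comparison with Design 1, here is a sketch of a complete Design 2 XSpec file that checks two inputs. It assumes that my-ixml.xqm is the module from Example 7 and that my-ixml-grammar.txt contains the grammar from Example 1. The scenario labels are illustrative, and the second date is one of those mentioned in the recap earlier in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes my-ixml.xqm (Example 7) declares $gr:parser and
     that the grammar file holds Example 1. Labels are illustrative. -->
<x:description
  query="urn:x-xspectacles:functions:ixml"
  query-at="my-ixml.xqm"
  version="3.0"
  xquery-version="4.0"
  xmlns:gr="urn:x-xspectacles:functions:ixml"
  xmlns:x="http://www.jenitennison.com/xslt/xspec">

  <!-- One scenario per input string: x:call invokes the parser directly. -->
  <x:scenario label="Parsing 2024-08-07">
    <x:call function="gr:parser" call-as="variable">
      <x:param select="'2024-08-07'"/>
    </x:call>
    <x:expect label="date element with year, month, and day children"
      select="/">
      <date><year>2024</year><month>08</month><day>07</day></date>
    </x:expect>
  </x:scenario>

  <x:scenario label="Parsing 1918-11-11">
    <x:call function="gr:parser" call-as="variable">
      <x:param select="'1918-11-11'"/>
    </x:call>
    <x:expect label="date element with year, month, and day children"
      select="/">
      <date><year>1918</year><month>11</month><day>11</day></date>
    </x:expect>
  </x:scenario>
</x:description>
```

Each input gets its own scenario here, so the test report lists every input string as a separately labeled scenario, at the cost of repeating the <x:call> element.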

Running the XSpec Tests with BaseX

The Getting Started instructions for running an XSpec test use Saxon, but at this time, Saxon-HE does not support the invisible-xml function. Instead, we will use BaseX to run our XSpec tests that use invisible-xml. In the XSpec wiki, Run an XSpec test for XQuery with BaseX standalone provides general instructions.

Using invisible-xml in XSpec requires two special things not in the general instructions:

  • BaseX must be version 11.0 or later, so it includes the invisible-xml function implementation.
  • An additional library named Markup Blitz must be on the classpath when BaseX is running. One way to achieve that is to include the Markup Blitz location in the same argument that provides the BaseX location:
-p basex-jar="%BASEX_HOME%\lib\markup-blitz-1.4.jar;%BASEX_HOME%\BaseX.jar"

Without Markup Blitz, BaseX returns this error:

[basex:function] Function invisible-xml requires missing class: de.bottlecaps.markup.Blitz.

My commands on Windows, after I navigate to the directory containing my iXML, XQuery, and XSpec files (not the directory containing the XSpec implementation), look like the following.

set XSPEC_HOME_URI=file:///C:/.../xspec/
set BASEX_HOME=C:\...\BaseX111
set XMLCALABASH_JAR=C:\...\xmlcalabash-1.5.7-120\xmlcalabash-1.5.7-120.jar

java -jar "%XMLCALABASH_JAR%" -i source=./ixml-design1.xspec -p xspec-home=%XSPEC_HOME_URI% -p basex-jar="%BASEX_HOME%\lib\markup-blitz-1.4.jar;%BASEX_HOME%\BaseX.jar" -o result=./ixml-design1-result.html %XSPEC_HOME_URI%src\harnesses\basex\basex-standalone-xquery-harness.xproc

I abbreviated the dependency locations using ..., but the commands show the form that each value takes. Fill in the paths that point to the dependency locations on your own system.

You might see the following output:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

If the HTML report was generated, then the process worked and you can ignore that output.

Selected Resources

Key Takeaways

  • An XSpec scenario can call a function that you generated by calling another function. That capability and the invisible-xml function give you a means for testing an Invisible XML grammar with XSpec.
  • Design 1 uses <x:call> to generate a grammar parser and then uses any number of <x:expect> elements to process data with that parser and verify the resulting XML document.
  • Design 2 uses <x:call> to process one data set with a grammar parser and then uses <x:expect> to verify the resulting XML document.

Code is downloadable from https://github.com/galtm/xspectacles/ on GitHub, in the src/ixml folder.

Amanda Galtman

I'm an XML software developer, a maintainer of the XSpec infrastructure, and a contributor to a couple of other open source projects.