Perl 5 XML Validation with DTD and XSD

Kirk Lewis
Published in Cultured Perl, Jan 15, 2018

This post is a brief but practical demonstration of validating an XML document against Document Type Definition (DTD) and XML Schema Definition (XSD) files. Such validation helps ensure that XML sent between a client and the server hosting an XML-based web service arrives in the expected form. Both DTD and XSD allow you to define the elements and attributes an XML document should contain. XSD has the advantage of being written in XML itself, making it easier to read. XSD also allows more information to be defined about an element, such as its data type, namespace and restrictions on its values. There are a few modules for validating XML in Perl 5, but I will be using two from the XML::LibXML namespace.

Project Setup

Project Files

I’ll be using the following files contained in a directory named perl5-xml-validation to demonstrate both DTD and XSD validation.

perl5-xml-validation/
├── bin
│   ├── dtd_validation.pl
│   └── xsd_validation.pl
├── cpanfile
└── xml
    ├── book.xml
    └── schemas
        ├── book.dtd
        └── book.xsd

Once you’ve created the directory, be sure to enter it using cd perl5-xml-validation.

Dependencies

Open the file cpanfile and list the following modules:

requires 'XML::LibXML';
requires 'Try::Tiny';

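Run cpanm -L local --installdeps . to install the modules into a ./local directory. Because they are installed there rather than system-wide, you may need to add local/lib/perl5 to the PERL5LIB environment variable before running the scripts. I will explain the other files as we start using them.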

The XML

The XML document used for this demonstration simply contains book details within the file ./xml/book.xml as follows:

<?xml version="1.0" encoding="utf-8" ?>
<book>
  <title>XML Validation</title>
  <numPages>-200</numPages>
</book>

The value ‘-200’ for the numPages element has been set deliberately. The reason should be clear later.

Validating the XML with DTD

The DTD File

The file ./xml/schemas/book.dtd contains the following definitions:

<!ELEMENT book (title,author,numPages?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT numPages (#PCDATA)>

For any XML document associated with this DTD to be valid, it must contain the elements book, title, author, and optionally numPages. As you will notice, the author element is missing from the XML we defined earlier — we’ll correct this later.

Now let’s validate the XML. The file ./bin/dtd_validation.pl contains the following code, which I explain below it.

The Code

#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;
use Try::Tiny qw(try catch);

# Parse the XML document and load the external DTD (public ID, system ID)
my $xml_doc = XML::LibXML->load_xml(location => './xml/book.xml');
my $dtd_doc = XML::LibXML::Dtd->new('', './xml/schemas/book.dtd');

# validate returns true on success and dies with the error on failure
my $is_xml_valid = try {
    $xml_doc->validate($dtd_doc);
}
catch {
    say '==> ' . $_;
    return 0;
};

say $is_xml_valid ? 'Valid' : 'Invalid';

Regarding validation, the code above does the following:

  • Loads the XML file using load_xml, which returns an XML::LibXML::Document object that is assigned to $xml_doc.
  • Then the DTD file book.dtd is loaded as part of XML::LibXML::Dtd instance construction.
  • Finally, the validation itself is done using the validate method.

To make sure the ternary expression at the bottom of the code is always executed, we use the functions try and catch, which are exported by Try::Tiny. These two functions handle any error thrown by the validate method. An unhandled error from validate would halt the script, and the ternary expression would never run.
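If you only need a yes/no answer, XML::LibXML::Document also offers an is_valid method, which returns a boolean instead of throwing. The sketch below is an alternative to the script above, not a replacement for it: it uses inline copies of the DTD and the still-invalid XML (via parse_string and load_xml with string) so that it runs without the project files. The trade-off is that is_valid does not tell you why a document failed.

```perl
#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;

# Inline copy of the DTD, so the sketch does not depend on the project files
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT book (title,author,numPages?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT numPages (#PCDATA)>
DTD

# Inline copy of the XML, still missing the author element
my $xml_doc = XML::LibXML->load_xml(string => <<'XML');
<?xml version="1.0" encoding="utf-8" ?>
<book>
  <title>XML Validation</title>
  <numPages>-200</numPages>
</book>
XML

# is_valid returns true or false and does not die on an invalid document
my $is_xml_valid = $xml_doc->is_valid($dtd) ? 1 : 0;

say $is_xml_valid ? 'Valid' : 'Invalid';
```

Use validate with try and catch, as in the script above, whenever the error message itself matters.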

Validation Errors

Now if you run perl ./bin/dtd_validation.pl without having corrected the XML, you should see output similar to the following in your shell:

==> ./xml/book.xml:0: validity error : Element book content does not follow the DTD, expecting (title , author , numPages?), got (title numPages )
Invalid

The message indicates that the book element’s children are not as expected. The validation expected title, author and optionally numPages (in that order), but it got only title and numPages.
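The error caught from validate is not necessarily just a string. In recent XML::LibXML releases it is normally an XML::LibXML::Error object that stringifies to the message shown in this section. The sketch below assumes that behaviour and falls back to plain stringification otherwise. It uses eval rather than Try::Tiny to stay dependency-free, and describe_error is a hypothetical helper name, not part of the library.

```perl
#!/usr/bin/env perl

use v5.18;
use warnings;

use Scalar::Util qw(blessed);
use XML::LibXML;

my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT book (title,author,numPages?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT numPages (#PCDATA)>
DTD

my $xml_doc = XML::LibXML->load_xml(string => <<'XML');
<book>
  <title>XML Validation</title>
  <numPages>-200</numPages>
</book>
XML

# describe_error (hypothetical helper): summarise whatever validate threw
sub describe_error {
    my ($err) = @_;

    # Structured error object: pull out individual fields
    if (blessed($err) && $err->isa('XML::LibXML::Error')) {
        my $text = $err->message // '';
        $text =~ s/\s+\z//;    # drop the trailing newline
        return sprintf 'domain=%s line=%d message=%s',
            $err->domain // 'unknown', $err->line // 0, $text;
    }

    # Anything else: fall back to the stringified error
    (my $text = "$err") =~ s/\s+\z//;
    return $text;
}

# eval returns undef if validate dies, and 1 otherwise
my $ok = eval { $xml_doc->validate($dtd); 1 };
my $report = $ok ? 'no error' : describe_error($@);

say $report;
```

Having the fields separately is handy when you want to log or return validation errors in a structured form rather than print them.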

Fixing the Error

Adding an author element directly after the title element and giving it a value, for example <author>Johny Bravo</author>, will make the validation succeed the next time the script is run. As such, Valid will be printed.

Validating XML with XSD

The XSD File

The file ./xml/schemas/book.xsd defines the structure which the ./xml/book.xml document should adhere to as follows:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="numPages" type="xs:positiveInteger" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

As in the DTD defined earlier, the elements title and author are mandatory, while numPages is optional (indicated by minOccurs="0"). Note also that the value of numPages must be of type positiveInteger, which admits only whole numbers greater than zero. XSD provides many built-in data types and lets you restrict them further, whereas a DTD can only declare element content such as numPages as character data (#PCDATA).
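To get a feel for what positiveInteger accepts, here is a small sketch that validates a few candidate values against a one-element schema. The cut-down schema and the helper name num_pages_ok are assumptions made for illustration; they are not part of the project files.

```perl
#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;

# Cut-down schema (illustrative): a lone numPages element of the same type
my $schema = XML::LibXML::Schema->new(string => <<'XSD');
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="numPages" type="xs:positiveInteger"/>
</xs:schema>
XSD

# num_pages_ok (hypothetical helper): true when the schema accepts the value
sub num_pages_ok {
    my ($value) = @_;
    my $doc = XML::LibXML->load_xml(string => "<numPages>$value</numPages>");

    # validate dies on failure, so eval yields undef for rejected values
    return eval { $schema->validate($doc); 1 } ? 1 : 0;
}

for my $value (qw(200 1 0 -200 12.5 many)) {
    printf "%-5s %s\n", $value, num_pages_ok($value) ? 'accepted' : 'rejected';
}
```

Only 200 and 1 should be accepted here. Zero, negative numbers, decimals and non-numeric text all fall outside positiveInteger.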

The Code

The file ./bin/xsd_validation.pl contains the following code, which I explain below it:

#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;
use Try::Tiny qw(try catch);

# Parse the XML document and compile the XSD schema
my $xml_doc = XML::LibXML->load_xml(location => './xml/book.xml');
my $xsd_doc = XML::LibXML::Schema->new(location => './xml/schemas/book.xsd');

# validate returns 0 on success and dies on failure, hence the "not"
my $is_xml_valid = try {
    not $xsd_doc->validate($xml_doc);
}
catch {
    say '==> ' . $_;
    return 0;
};

say $is_xml_valid ? 'Valid' : 'Invalid';

The code above follows the same overall process as the DTD validation code seen earlier. The only two implementation differences are:

1) The module XML::LibXML::Schema is used both to load the XSD file and to validate the XML document against it.

2) The validate method of XML::LibXML::Schema returns 0 when validation succeeds and dies otherwise. As such, I precede the call to $xsd_doc->validate($xml_doc) with the not keyword, which turns that false return value into a true one.
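Because a return value of 0 meaning success is easy to misread, one option is to hide it behind a small helper that returns an explicit flag and the error text. The sketch below is one way to do that, using eval instead of Try::Tiny. The name check_book and the trimmed inline schema are assumptions made for illustration.

```perl
#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;

# Trimmed inline copy of the book schema, so the sketch runs on its own
my $schema = XML::LibXML::Schema->new(string => <<'XSD');
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="numPages" type="xs:positiveInteger" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

# check_book (hypothetical helper): returns (is_valid, error_text)
sub check_book {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml_string);

    # On success validate returns 0; on failure it dies and $rc stays undef
    my $rc = eval { $schema->validate($doc) };
    return (0, "$@") unless defined $rc;
    return ($rc == 0 ? 1 : 0, '');
}

my ($ok, $error) = check_book(
    '<book><title>XML Validation</title><author>Johny Bravo</author>'
  . '<numPages>-200</numPages></book>'
);

say $ok ? 'Valid' : "Invalid: $error";
```

Callers then test a plain boolean and never have to remember the inverted return value.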

Validation Errors

Run the XSD validation using perl ./bin/xsd_validation.pl. With the author element added earlier and numPages still set to -200, the script should give the following output:

==> ./xml/book.xml:0: Schemas validity error : Element 'numPages': '-200' is not a valid value of the atomic type 'xs:positiveInteger'.
Invalid

The error indicates that the XSD expects a positive integer value for numPages, but the XML contains a negative integer, in this case ‘-200’.

Fixing the Error

You’ve probably beaten me to it, but changing the numPages value from -200 to 200 and running the script again should output Valid.

Conclusion

Validating the XML document with the two XML::LibXML modules discussed required only two objects, a call to the validate method, and the use of try and catch to handle any errors that may be thrown.

If you are interested in seeing the validation done using Perl tests, see the GitHub repository containing the code for this article.

Thank you for reading this article! Feel free to leave a clap👏 and also recommend this article to others.
