Validating XML file using external XSD Schema

Utkarsh Sandeep Singh
Xebia Engineering Blog
5 min read · Mar 2, 2021

When we generate an XML document, it is important to validate it against an external XSD (XML Schema Definition) schema. Validation checks the document's vocabulary and structure against the grammar rules that the schema defines.

Validating against an XSD surfaces the warnings, errors, and fatal errors present in an XML document, which helps us make sure the XML is both well-formed and valid.

  • Well-Formed XML: An XML document is well-formed if it follows XML's syntax rules: for example, every opening tag has a matching closing tag of the form “</name>”, and the elements are properly nested.
  • Valid XML: An XML document is valid only if it is well-formed and also conforms to the rules of its associated external schema.

For a detailed explanation of the XSD Schema, one can refer to this link.

Let’s see an example of a well-formed and valid XML document:

<?xml version="1.0" encoding="UTF-8"?>
<Vehicles>
<Cars>BMW</Cars>
<Cars>Audi</Cars>
<Cars>Mercedes</Cars>
<Cars>Toyota</Cars>
<Cars>Honda</Cars>
<Bikes>Royal Enfield</Bikes>
<Bikes>Hero</Bikes>
<Bikes>Suzuki</Bikes>
</Vehicles>

Now, let’s see an example of an invalid XML document:

<?xml version="1.0" encoding="UTF-8"?>
<Vehicles>
<Cars>BMW</Cars>
<Cars>Audi</Cars>
<Cars>Mercedes</Cars>
<Cars>Toyota</Bikes>
<Cars>Honda</Bikes>
<Bikes>Royal Enfield</Cars>
<Bikes>Hero</Cars>
<Bikes>Suzuki</Bikes>
</Vehicles>

Above are two XML documents: the first is valid and the second is invalid. The second is invalid because it is not even well-formed: an opening tag and its closing tag must match, but here several pairs do not (for example, <Cars>Toyota</Bikes>).
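To see this distinction in practice, we can hand both kinds of document to a plain XML parser, which checks well-formedness without any schema. The following is only a minimal sketch: the class name WellFormedCheck and the method firstWellFormednessError are hypothetical names chosen for illustration, and the XML strings are shortened versions of the examples above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the original article's code.
class WellFormedCheck {

    // Returns null if the XML is well-formed, otherwise the parser's message
    // for the first problem it hit. No schema is involved at this stage.
    static String firstWellFormednessError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // Malformed XML is reported as a SAXException (usually a SAXParseException).
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String wellFormed = "<Vehicles><Cars>Toyota</Cars><Bikes>Hero</Bikes></Vehicles>";
        String mismatched = "<Vehicles><Cars>Toyota</Bikes><Bikes>Hero</Cars></Vehicles>";
        System.out.println("well-formed document: " + firstWellFormednessError(wellFormed));
        System.out.println("mismatched document:  " + firstWellFormednessError(mismatched));
    }
}
```

A document has to pass this check before schema validation can say anything useful about it. The JDK's default parser may also echo the problem to standard error.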

To validate an XML file against an external schema in Java, we can proceed with the code below:
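A minimal sketch of such a validator, built from the SchemaFactory, Schema, and Validator classes, could look like this. The class name XmlValidator and the command-line arguments are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical class name; a sketch of the flow, not a definitive implementation.
class XmlValidator {

    // Validates the XML against the XSD. With the default error handling,
    // the first validation error is thrown as a SAXException.
    static void validate(Source xsd, Source xml) throws SAXException, IOException {
        // Select the W3C XML Schema language and compile the XSD.
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        // A Validator carries the schema's constraints and checks one document.
        Validator validator = schema.newValidator();
        validator.validate(xml);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: XmlValidator <schema.xsd> <document.xml>");
            return;
        }
        validate(new StreamSource(new File(args[0])), new StreamSource(new File(args[1])));
        System.out.println("XML is valid against the schema");
    }
}
```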

In the above code, we have used a few classes: SchemaFactory, Schema, and Validator. Let’s look at the role each of these classes plays in XML validation:

  • SchemaFactory: This class is in the javax.xml.validation package. It acts as a schema compiler: it reads the XSD file and prepares it for validating XML files. We obtain an instance for a given schema language by passing a constant such as XMLConstants.W3C_XML_SCHEMA_NS_URI, which selects W3C XML Schema.
  • Schema: This class is also in the javax.xml.validation package. SchemaFactory is not thread-safe, but a Schema object is immutable and thread-safe, so it can be shared across multiple threads and parsers. It represents the set of constraints, read from the XSD file, against which an XML document is validated.
  • Validator: A javax.xml.validation.Validator is created from a Schema. We pass it the XML to be validated as a Source (javax.xml.transform.Source), an interface that represents XML input; StreamSource is the implementation that wraps a file or stream.

In the above code, we create a SchemaFactory object and use it to create a Schema object from the XSD file; the Schema holds the set of constraints that the XML file must satisfy. We then create a Validator from the Schema, which carries those constraints. Finally, the validate() method of the Validator takes the XML file as input, wrapped in a StreamSource.

When the above code is executed, the first validation failure throws a SAXParseException. This exception helps us locate the error in the XML file through methods such as getLineNumber(), getColumnNumber(), and getMessage(), which report the line, the column, and the exact error.
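As a rough sketch of how these methods can be used, the helper below catches the exception and formats its line, column, and message. FirstErrorReporter, format, and firstError are hypothetical names, and the schema and document are passed as strings purely to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration only.
class FirstErrorReporter {

    // Builds one report line from the location and message carried by the exception.
    static String format(SAXParseException e) {
        return "ERROR:: at Line: {" + e.getLineNumber() + "}"
                + " Column: {" + e.getColumnNumber() + "}"
                + " Message: {" + e.getMessage() + "}";
    }

    // Returns null when the XML is valid, otherwise the formatted first error.
    // A problem in the XSD itself surfaces the same way in this sketch.
    static String firstError(String xsd, String xml) {
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return format(e);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Demonstrates the formatting with a hand-built exception at line 16, column 54.
        System.out.println(format(new SAXParseException("sample message", null, null, 16, 54)));
    }
}
```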

Below is an example of the format in which an error in the XML file is printed:

ERROR:: at Line: {16} Column: {54} Message: {cvc-pattern-valid: Value 'Ver1.1' is not facet-valid with respect to pattern 'Ver1.0' for type '#AnonType_SchemaVerForm_ITR6'.}

If the XML contains many errors, this code reports only the first SAXParseException it encounters. To capture all of them in a single run, we need to modify our code: we create an XSDErrorHandler class that implements the ErrorHandler interface and records every problem reported during validation instead of stopping at the first one.

Below is the code snippet:
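One way such a handler might look is sketched here. The field name errors, the helper class CollectAllErrors, and the "WARNING" and "FATAL ERROR" labels are assumptions made for illustration; only the XSDErrorHandler name and the ErrorHandler interface come from the text.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical driver class: runs a validation with the collecting handler attached.
class CollectAllErrors {

    static List<String> collectErrors(Source xsd, Source xml) {
        XSDErrorHandler handler = new XSDErrorHandler();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(xsd);
            Validator validator = schema.newValidator();
            validator.setErrorHandler(handler); // replaces the default throw-on-first-error behaviour
            try {
                validator.validate(xml);
            } catch (SAXParseException fatal) {
                // After a fatal error (such as malformed XML) the parser typically still stops;
                // the handler has usually recorded that error already.
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.getErrors();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: CollectAllErrors <schema.xsd> <document.xml>");
            return;
        }
        collectErrors(new StreamSource(new File(args[0])), new StreamSource(new File(args[1])))
                .forEach(System.out::println);
    }
}

// Records every reported problem instead of throwing, so validation keeps going.
class XSDErrorHandler implements ErrorHandler {
    private final List<String> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) { record("WARNING", e); }

    @Override
    public void error(SAXParseException e) { record("ERROR", e); }

    @Override
    public void fatalError(SAXParseException e) { record("FATAL ERROR", e); }

    private void record(String level, SAXParseException e) {
        errors.add(level + ":: at Line: {" + e.getLineNumber() + "}"
                + " Column: {" + e.getColumnNumber() + "}"
                + " Message: {" + e.getMessage() + "}");
    }

    List<String> getErrors() { return errors; }
}
```

Because none of the handler methods rethrow the exception, ordinary validation errors no longer abort the run.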

The XSDErrorHandler class captures all the exceptions in the variable “Error” and classifies each one by type: Warning, Error, or FatalError. When the program finishes executing, all of the captured errors are displayed together.

All the XSD errors captured in the list of strings can be displayed like this:

ERROR:: at Line: {16} Column: {54} Message: {cvc-pattern-valid: Value 'Ver1.1' is not facet-valid with respect to pattern 'Ver1.0' for type '#AnonType_SchemaVerForm_ITR6'.}
ERROR:: at Line: {16} Column: {54} Message: {cvc-type.3.1.3: The value 'Ver1.1' of element 'ITRForm:SchemaVer' is not valid.}
ERROR:: at Line: {21} Column: {34} Message: {cvc-complex-type.2.4.a: Invalid content was found starting with element 'ITRForm:PARTA_BSFor6FrmAY13'. One of '{"http://incometaxindiaefiling.gov.in/Corpmaster":PartA_GEN1}' is expected.}
ERROR:: at Line: {1548} Column: {32} Message: {cvc-complex-type.2.4.a: Invalid content was found starting with element 'ITRForm:TotSplRateInc'. One of '{"http://incometaxindiaefiling.gov.in/Corpmaster":SplCodeRateTax}' is expected.}
ERROR:: at Line: {1723} Column: {45} Message: {cvc-complex-type.2.4.b: The content of element 'ITRForm:ComputationOfTaxLiability' is not complete. One of '{"http://incometaxindiaefiling.gov.in/Corpmaster":AggregateTaxInterestLiability}' is expected.}
ERROR:: at Line: {1731} Column: {38} Message: {cvc-complex-type.2.4.a: Invalid content was found starting with element 'ITRForm:BankAccountDtls'. One of '{"http://incometaxindiaefiling.gov.in/Corpmaster":RefundDue}' is expected.}
ERROR:: at Line: {1737} Column: {39} Message: {cvc-complex-type.2.4.b: The content of element 'ITRForm:BankAccountDtls' is not complete. One of '{"http://incometaxindiaefiling.gov.in/Corpmaster":BankDtlsFlag}' is expected.}
ERROR:: at Line: {1739} Column: {25} Message: {cvc-complex-type.2.4.b: The content of element 'ITRForm:PartB_TTI' is not complete. One of '{"http://incometaxindiaefiling.gov.in/Corpmaster":AssetOutsideIndiaFlg}' is expected.}

The above example lists all the errors encountered while validating the XML file against an external schema.

I hope this blog is useful to anyone trying to validate XML against an external schema and capture all the XSD validation errors at once instead of one by one.
Thank you!
