XML Basics

Note: Most of the examples are taken from http://www.ibm.com/ and are presented here in simplified form.


XML stands for Extensible Markup Language. It was created by the World Wide Web Consortium (W3C) to overcome the limitations of HTML. XML is a self-descriptive language in which you can store and transport data.

Sample XML

There are three common terms used to describe parts of an XML document: tags, elements, and attributes.

A tag is the text between the left angle bracket (<) and the right angle bracket (>). An element is the starting tag, the ending tag, and everything in between. An attribute is a name-value pair inside the starting tag of an element.
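To see the three terms in practice, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. The <address> document is a small hypothetical sample, not the original figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <address type="home"> is a starting tag,
# type="home" is an attribute, and <address>...</address> is an element.
SAMPLE = """<address type="home">
  <city>Anytown</city>
  <state>NC</state>
</address>"""

root = ET.fromstring(SAMPLE)
print(root.tag)     # the element name from the tag: address
print(root.attrib)  # attributes as a dict: {'type': 'home'}
for child in root:  # child elements, each with its own tags and text
    print(child.tag, "=", child.text)
```

The parser exposes each element's name, attributes, and text separately, which is exactly the tag/element/attribute split described above.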

XML document rules

The XML specification requires a parser to reject any XML document that doesn't follow these basic rules:

1. A document must contain exactly one root element

A document that contains two root elements, for instance, must be rejected by the XML parser, regardless of the information it contains.

2. XML elements can’t overlap

If you begin an <i> element inside a <b> element, you have to end it there as well. For example, <b><i>text</b></i> breaks this rule, while <b><i>text</i></b> follows it.

3. End tags are required

4. Elements are case sensitive

5. Attributes must have values and must be enclosed within quotation marks
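The rules above can be checked with Python's built-in parser, which is based on expat and raises ParseError for any document that is not well-formed. This is a minimal sketch; the broken documents and the is_well_formed helper are hypothetical examples, one per rule.

```python
import xml.etree.ElementTree as ET

# One deliberately broken document per rule.
BROKEN = {
    "two root elements": "<a></a><b></b>",
    "overlapping elements": "<b><i>text</b></i>",
    "missing end tag": "<p>text",
    "case mismatch": "<Name>Mary</name>",
    "unquoted attribute": "<a id=1></a>",
    "attribute without value": "<a checked></a>",
}

def is_well_formed(text):
    """Return True if the parser accepts text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

for rule, doc in BROKEN.items():
    print(f"{rule}: {is_well_formed(doc)}")  # False for every entry

print(is_well_formed("<b><i>text</i></b>"))  # True: properly nested
```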

XML declaration

The XML declaration provides basic information about the document to the parser. It is recommended but not required; if there is one, it must be the first thing in the document.

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
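A short sketch of how a parser uses the declaration, assuming Python's xml.etree.ElementTree: when the input is raw bytes, the parser decodes them with the encoding named in the declaration, and it rejects a declaration that is not at the very start. The <name> documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Bytes encoded in ISO-8859-1, where byte 0xE9 is the character "é".
DATA = b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?><name>Jos\xe9</name>'

root = ET.fromstring(DATA)  # decoded using the declared encoding
print(root.text)            # José

# A comment before the declaration makes it misplaced, so parsing fails.
MISPLACED = b'<!-- too early --><?xml version="1.0"?><name/>'
try:
    ET.fromstring(MISPLACED)
    misplaced_ok = True
except ET.ParseError:
    misplaced_ok = False
print(misplaced_ok)         # False
```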

Other things in XML documents

1. Comments

<!-- comment needs to be placed like this-->
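Comments are part of the document, but applications normally skip them. As a sketch, Python's xml.dom.minidom keeps them as Comment nodes, and the parser enforces the rule that a comment may not contain a double hyphen (--). The <note/> root element is a hypothetical placeholder.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# minidom keeps comments as Comment nodes, even outside the root element.
doc = minidom.parseString("<!-- comment needs to be placed like this--><note/>")
comments = [n.data for n in doc.childNodes if n.nodeType == n.COMMENT_NODE]
print(comments)         # [' comment needs to be placed like this']

# "--" is not allowed inside a comment, so this input is rejected.
try:
    minidom.parseString("<!-- a -- b --><note/>")
    double_dash_ok = True
except ExpatError:
    double_dash_ok = False
print(double_dash_ok)   # False
```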

2. Processing instructions (PI)

<!-- Here's a PI for Cocoon: -->
<?cocoon-process type="sql"?>

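A processing instruction is passed through to the application named by its target (here, Cocoon), which processes the document according to the given type. The type="sql" pseudo-attribute tells Cocoon that the XML document contains an SQL statement.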
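The parser does not act on a PI; it only hands the target and data to the application. A minimal sketch with Python's xml.dom.minidom shows what an application such as Cocoon would receive. The <query> element is a hypothetical stand-in for the rest of the document.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!-- Here's a PI for Cocoon: -->
<?cocoon-process type="sql"?>
<query>SELECT * FROM customers</query>"""

doc = minidom.parseString(DOC)
# Collect (target, data) pairs; the data string is passed through uninterpreted.
pis = [(n.target, n.data) for n in doc.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
print(pis)  # [('cocoon-process', 'type="sql"')]
```

Note that the XML declaration on the first line is not reported as a PI, even though it uses the same <? ?> delimiters.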

3. Entities

<!-- Here's an entity: -->
<!ENTITY dw "developerWorks">

Wherever the XML processor finds the string &dw;, it replaces the entity reference with the string developerWorks.
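In a real document, an entity declaration like the one above lives in a DTD, for example in the internal subset of the <!DOCTYPE> declaration. This sketch, assuming Python's xml.etree.ElementTree and a hypothetical <article> root, shows the parser expanding &dw; while reading.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset, then referenced as &dw;.
DOC = """<!DOCTYPE article [
  <!ENTITY dw "developerWorks">
]>
<article>Welcome to &dw;!</article>"""

root = ET.fromstring(DOC)
print(root.text)  # Welcome to developerWorks!
```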

Predefined entities in XML

  • &lt; for the less-than sign
  • &gt; for the greater-than sign
  • &quot; for a double-quote
  • &apos; for a single quote (or apostrophe)
  • &amp; for an ampersand.
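A quick sketch of both directions in Python's standard library: the parser replaces predefined entities with their characters, and xml.sax.saxutils.escape produces them when writing. The sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: each predefined entity becomes the character it stands for.
root = ET.fromstring("<t>&lt;b&gt; &amp; &quot;x&quot; &apos;y&apos;</t>")
print(root.text)  # <b> & "x" 'y'

# Writing: escape() handles &, < and > by default; the dict adds the double quote.
escaped = escape('Tom & Jerry <3 "quotes"', {'"': "&quot;"})
print(escaped)    # Tom &amp; Jerry &lt;3 &quot;quotes&quot;
```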

4. Namespaces

To use a namespace, define a namespace prefix and map it to a particular string with an attribute of the form xmlns:prefix="string".

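In the example, the namespace prefixes are addr and books. A prefix applies only to the elements and attributes that carry it; it is not inherited, so the unprefixed <title> element inside <addr:Name> does not belong to the addr namespace. To make all child elements belong to a namespace without prefixing each one, declare a default namespace on the parent with xmlns="string". The string in a namespace definition is just a string that the parser compares; it is never fetched. You could define xmlns:addr="mike", but it has to be unique, which is why URLs are commonly used.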
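A minimal sketch with Python's xml.etree.ElementTree shows how the parser resolves prefixes: each namespaced tag becomes {uri}localname. The example.com URIs are hypothetical placeholders, and the document is a trimmed stand-in for the missing sample.

```python
import xml.etree.ElementTree as ET

ADDR = "http://www.example.com/addresses/"  # hypothetical namespace URIs
BOOKS = "http://www.example.com/books/"

DOC = f"""<customer_summary xmlns:addr="{ADDR}" xmlns:books="{BOOKS}">
  <addr:Name><title>Mrs.</title></addr:Name>
  <books:title>Lord of the Rings</books:title>
</customer_summary>"""

ns = {"addr": ADDR, "books": BOOKS}
root = ET.fromstring(DOC)

name = root.find("addr:Name", ns)
print(name.tag)        # {http://www.example.com/addresses/}Name
print(name[0].tag)     # title (the unprefixed child is in no namespace)
book_title = root.find("books:title", ns).text
print(book_title)      # Lord of the Rings

# A default namespace declaration does apply to unprefixed descendants.
default = ET.fromstring(f'<Name xmlns="{ADDR}"><title>Mrs.</title></Name>')
print(default[0].tag)  # {http://www.example.com/addresses/}title
```

The two <title> elements stay distinct because their expanded names differ, which is the ambiguity namespaces are meant to remove.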

Document Type Definition (DTD)

A DTD defines the elements that can appear in an XML document, the order in which they can appear, and how they can be nested inside each other. In short, it specifies the basic structure of the document.
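Python's standard library reads DTDs (for entities) but does not validate documents against them; third-party libraries such as lxml can. As a rough sketch of what a validating parser checks, the hypothetical rules below are written in DTD syntax and hand-translated into regular expressions over each element's child names. This is a simplification for illustration, not a DTD parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD rules, shown in DTD syntax for reference:
#   <!ELEMENT address (name, street, city)>
#   <!ELEMENT name (title?, first-name, last-name)>
# Each model is a regex over child names joined by single spaces.
CONTENT_MODELS = {
    "address": r"name street city",
    "name": r"(title )?first-name last-name",  # "?" = optional, as in the DTD
}

def children_match(elem):
    """True if elem's children fit its content model; elements without a model pass."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True
    sequence = " ".join(child.tag for child in elem)
    return re.fullmatch(model, sequence) is not None

def validate(root):
    """Check every element in the tree, including the root."""
    return all(children_match(e) for e in root.iter())

GOOD = ("<address><name><title>Mrs.</title><first-name>Mary</first-name>"
        "<last-name>McGoon</last-name></name>"
        "<street>1401 Main Street</street><city>Anytown</city></address>")
BAD = ("<address><street>1401 Main Street</street>"
       "<name><first-name>Mary</first-name><last-name>McGoon</last-name></name>"
       "<city>Anytown</city></address>")

print(validate(ET.fromstring(GOOD)))  # True
print(validate(ET.fromstring(BAD)))   # False: street comes before name
```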

Defining attributes

XML schemas (XSD)

An XML schema defines what XML documents of a particular type must look like.

XML Schema (XSD) vs DTD

Both are ways of describing the structure and content of an XML document; DTD is the older of the two. XSD is namespace-aware, which removes the ambiguity that can arise when a document mixes elements and attributes from multiple XML vocabularies. An XSD is itself an XML document, so it can be processed programmatically just like any other XML, and it is strongly typed.

An XSD can define many elements. The <xsd:sequence> element defines the order in which child elements must appear inside their parent element.
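Because an XSD is ordinary XML, it can be read with the same tools as any other document. This sketch, assuming Python's xml.etree.ElementTree and a hypothetical schema for <address>, extracts the names listed in an <xsd:sequence> and checks whether an instance's children appear in that order. It assumes each child occurs exactly once (the default minOccurs and maxOccurs of 1) and is not a full schema validator.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
NS = {"xsd": XSD_NS}

# Hypothetical schema: <address> holds name, street, city, in that order.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="address">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

def sequence_for(schema_root, element_name):
    """Return the child names listed in the element's <xsd:sequence>, in order."""
    for decl in schema_root.iterfind("xsd:element", NS):
        if decl.get("name") == element_name:
            seq = decl.find("xsd:complexType/xsd:sequence", NS)
            return [child.get("name") for child in seq.iterfind("xsd:element", NS)]
    raise KeyError(element_name)

def follows_sequence(instance, expected):
    """True if the instance's child element names match the expected order exactly."""
    return [child.tag for child in instance] == expected

expected = sequence_for(ET.fromstring(SCHEMA), "address")
print(expected)  # ['name', 'street', 'city']

good = ET.fromstring("<address><name>Mary</name><street>Main St</street><city>Anytown</city></address>")
bad = ET.fromstring("<address><city>Anytown</city><name>Mary</name><street>Main St</street></address>")
print(follows_sequence(good, expected), follows_sequence(bad, expected))  # True False
```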

Define Element contents

Written by Prakhash Sivakumar.