Programming in Scala [Chapter 26] — Working with XML
Introduction
This is part of my personal reading summary of the book Programming in Scala. Note that we use Scala 2, since it is still widely used in industry and has far more learning resources available than Scala 3.
This chapter introduces Scala’s XML support, covering basics like creating, saving, and manipulating XML nodes using query methods and pattern matching. It provides a brief but practical starting point for working with XML in Scala through the following sections:
- Semi-structured data
- XML overview
- XML literals
- Serialization
- Taking XML apart
- Deserialization
- Loading and saving
- Pattern matching on XML
1. Semi-structured data
XML is a widely used form of semi-structured data, organizing information into a tree-like structure. While less structured than programming language objects, it’s more organized than plain strings. Semi-structured data, like XML, simplifies serialization for tasks such as file storage or network transmission. Scala provides robust XML processing support, including construction, manipulation through regular methods, and pattern matching. As XML’s popularity grows, so does support across operating systems and programming languages, making it a valuable skill for software engineers.
2. XML overview
XML consists of text and tags, where tags are enclosed in angle brackets. Tags can be start tags, such as <pod>, or end tags, such as </pod>. Start and end tags must match, similar to parentheses in programming. Nesting is essential in XML, where elements are structured within each other. Empty elements can be represented using a shorthand notation with a slash after the tag's label, such as <peas/>. Tags can also have attributes written as name-value pairs, enclosed in double or single quotes, like <pod peas="3" strings="true"/>.
Here are some code snippets illustrating these concepts:
<!-- Start and end tags -->
<pod>Text content</pod>
<!-- Nesting elements -->
<pod>
Text content
<nested>Inner content</nested>
</pod>
<!-- Empty element shorthand notation -->
<pod/>
<!-- Tags with attributes -->
<pod peas="3" strings="true"/>
3. XML literals
Scala enables XML literals, allowing XML to be directly typed as expressions. You can start with a start tag and continue writing XML content until the compiler encounters the matching end tag. The result is of type Elem, representing an XML element with children. Other important XML classes include Node for all XML nodes and Text for text nodes. NodeSeq holds a sequence of nodes, and you can switch between XML and Scala code within curly braces {}. Expressions inside braces can evaluate to XML nodes or any Scala value, which will be converted to a string and inserted as a text node. XML literals prevent issues like unintended tag inclusion compared to low-level string operations.
// XML literal example
val xmlElem = <a>
This is some XML.
Here is a tag: <atag/>
</a>
// Evaluating Scala code within XML literal
val yearMade = 1955
val xmlElem2 = <a> { if (yearMade < 2000) <old>{yearMade}</old> else xml.NodeSeq.Empty } </a>
// Evaluating Scala code producing non-XML values within XML literal
val xmlElem3 = <a> {3 + 4} </a>
// Escaping characters in text nodes
val xmlElem4 = <a> {"</a>potential security hole<a>"} </a>
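To make the last point concrete, here is a minimal sketch contrasting a brace escape with plain string concatenation. The object name EscapingDemo and the value untrusted are hypothetical names chosen for illustration, and the sketch assumes the scala-xml module is on the classpath.

```scala
// Assumes the scala-xml module is available (it is a separate dependency in recent Scala 2 releases).
object EscapingDemo {
  // Hypothetical untrusted input that tries to close the enclosing tag early.
  val untrusted = "</a>potential security hole<a>"

  // XML literal: the brace escape inserts the string as text,
  // so < and > are escaped when the element is printed.
  val viaLiteral: String = <a>{untrusted}</a>.toString

  // String concatenation: the input is spliced in verbatim
  // and silently changes the structure of the document.
  val viaConcat: String = "<a>" + untrusted + "</a>"
}
```

Printing EscapingDemo.viaLiteral shows the angle brackets as &lt; and &gt;, while EscapingDemo.viaConcat contains two separate a elements that the author never intended.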
4. Serialization
Serialization in Scala using XML literals allows for easy conversion from internal data structures to XML. By adding a toXML method utilizing XML literals and brace escapes, instances of a class can be converted to XML effortlessly. For instance, in a catalog of vintage Coca-Cola thermometers, an abstract class CCTherm with various attributes can be serialized into XML. This method creates an XML representation of the object, facilitating data interchange and storage.
Here’s an example demonstrating serialization:
// Abstract class representing Coca-Cola thermometers
abstract class CCTherm {
  val description: String
  val yearMade: Int
  val dateObtained: String
  val bookPrice: Int // in US cents
  val purchasePrice: Int // in US cents
  val condition: Int // 1 to 10
  // Convert instance to XML
  def toXML =
    <cctherm>
      <description>{description}</description>
      <yearMade>{yearMade}</yearMade>
      <dateObtained>{dateObtained}</dateObtained>
      <bookPrice>{bookPrice}</bookPrice>
      <purchasePrice>{purchasePrice}</purchasePrice>
      <condition>{condition}</condition>
    </cctherm>
}
// Example usage
val therm = new CCTherm {
  val description = "hot dog #5"
  val yearMade = 1952
  val dateObtained = "March 14, 2006"
  val bookPrice = 2199
  val purchasePrice = 500
  val condition = 9
}
// Serialize to XML
val xmlRepresentation = therm.toXML
5. Taking XML apart
Scala provides methods for extracting information from XML easily, based on the XPath language. These methods allow for text extraction, sub-element extraction, and attribute extraction without the need for external tools.
// Extracting text from XML
val textContent = <a>Sounds <tag/> good</a>.text
// Output: "Sounds  good" (two spaces, because the empty <tag/> contributes no text)
// Decoding encoded characters automatically
val decodedText = <a> input ---&gt; output </a>.text
// Output: " input ---> output "
// Extracting sub-elements by tag name
val subElement = <a><b><c>hello</c></b></a> \ "b"
// Output: <b><c>hello</c></b>
// Deep search for sub-sub-elements
val deepSearchResult = <a><b><c>hello</c></b></a> \\ "c"
// Output: <c>hello</c>
// Extracting attributes
val employee = <employee name="Joe" rank="code monkey" serial="123"/>
val nameAttribute = employee \ "@name"
// Output: Joe
val serialAttribute = employee \ "@serial"
// Output: 123
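The queries above return NodeSeq values rather than plain strings. The following sketch, under the assumption that scala-xml is on the classpath, shows how .text converts a result into a String, what happens when an attribute is absent, and how \ differs from \\. The object name QueryDemo and the @salary attribute are made up for illustration.

```scala
object QueryDemo {
  val employee = <employee name="Joe" rank="code monkey" serial="123"/>

  // \ "@name" yields a NodeSeq; .text turns it into a plain String.
  val name: String = (employee \ "@name").text

  // Once it is a String, ordinary conversions such as toInt apply.
  val serial: Int = (employee \ "@serial").text.toInt

  // A missing attribute gives an empty result instead of throwing.
  val missing: String = (employee \ "@salary").text

  val tree = <a><b><c>hello</c></b></a>

  // \ only inspects direct children, so it does not find the nested <c>.
  val directMatches: Int = (tree \ "c").length

  // \\ searches at any depth and does find it.
  val deepText: String = (tree \\ "c").text
}
```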
6. Deserialization
Deserialization in Scala involves parsing XML back into internal data structures using methods for extracting information from XML nodes. For example, you can define a method fromXML to parse a CCTherm instance from an XML node:
// Deserialize CCTherm instance from XML node
def fromXML(node: scala.xml.Node): CCTherm =
  new CCTherm {
    val description = (node \ "description").text
    val yearMade = (node \ "yearMade").text.toInt
    val dateObtained = (node \ "dateObtained").text
    val bookPrice = (node \ "bookPrice").text.toInt
    val purchasePrice = (node \ "purchasePrice").text.toInt
    val condition = (node \ "condition").text.toInt
  }
// Example XML node representing a CCTherm instance
val node = therm.toXML
// Deserialize CCTherm instance from XML node
val deserializedTherm = fromXML(node)
// Output: CCTherm instance representing "hot dog #5"
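One thing to keep in mind is that .text.toInt throws a NumberFormatException when a numeric field is missing or malformed. As a hedged sketch of one way to handle that (not something the book prescribes), the parser below wraps the conversion in scala.util.Try and returns an Option. ThermSummary is a simplified, hypothetical stand-in for CCTherm with only two fields.

```scala
import scala.util.Try
import scala.xml.Node

object SafeParseDemo {
  // Simplified, hypothetical stand-in for CCTherm (two fields only).
  final case class ThermSummary(description: String, yearMade: Int)

  // Returns None when <yearMade> is missing or is not an integer.
  def parse(node: Node): Option[ThermSummary] =
    Try((node \ "yearMade").text.trim.toInt).toOption.map { year =>
      ThermSummary((node \ "description").text, year)
    }

  // Well-formed input parses successfully.
  val good: Option[ThermSummary] =
    parse(<cctherm><description>hot dog #5</description><yearMade>1952</yearMade></cctherm>)

  // A non-numeric year is reported as None instead of an exception.
  val bad: Option[ThermSummary] =
    parse(<cctherm><description>Sprite Boy</description><yearMade>unknown</yearMade></cctherm>)
}
```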
7. Loading and saving
Loading and saving XML data in Scala involves converting between XML and streams of bytes. Library routines like XML.saveFull and XML.loadFile handle this conversion.
To save XML to a file with an explicit encoding and an XML declaration, use XML.saveFull:
// Save XML to file with specified encoding and XML declaration
// Note: saveFull is the API shown in the first edition of the book. It was deprecated
// in Scala 2.8 in favor of XML.save, which accepts the same arguments.
scala.xml.XML.saveFull("therm1.xml", node, "UTF-8", true, null)
To load XML from a file, simply call XML.loadFile with the file name:
// Load XML from file
val loadnode = xml.XML.loadFile("therm1.xml")
These methods simplify the process of loading and saving XML data, ensuring proper encoding and XML structure.
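A save-and-load round trip can be sketched as follows. This version assumes a scala-xml release where XML.save is available and writes to a temporary file instead of therm1.xml so that it leaves nothing behind in the working directory. The object name SaveLoadDemo and the shortened cctherm element are illustrative only.

```scala
import java.nio.file.Files
import scala.xml.XML

object SaveLoadDemo {
  // Shortened example element (illustrative, not the full CCTherm record).
  val node = <cctherm><description>hot dog #5</description><yearMade>1952</yearMade></cctherm>

  // Temporary file used only for this demonstration.
  private val path = Files.createTempFile("therm", ".xml")

  // Arguments: file name, node, encoding, write XML declaration?, doctype (none here).
  XML.save(path.toString, node, "UTF-8", true, null)

  // The raw file contents, including the XML declaration on the first line.
  val raw: String = new String(Files.readAllBytes(path), "UTF-8")

  // Parse the file back into an Elem.
  val loaded = XML.loadFile(path.toString)
}
```

After the round trip, queries such as (SaveLoadDemo.loaded \ "yearMade").text work on the loaded element just as they do on the original literal.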
8. Pattern matching on XML
Pattern matching on XML in Scala allows for flexible handling of XML structures, especially when the structure may vary or include whitespace. By using XML patterns within pattern matching, you can sift through different XML structures effectively.
Here’s an example demonstrating pattern matching on XML:
// Pattern matching function for processing XML nodes
def proc(node: scala.xml.Node): String =
  node match {
    case <a>{contents}</a> => "It's an a: "+ contents
    case <b>{contents}</b> => "It's a b: "+ contents
    case _ => "It's something else."
  }
// Example usage
val result1 = proc(<a>apple</a>)
// Output: "It's an a: apple"
val result2 = proc(<b>banana</b>)
// Output: "It's a b: banana"
val result3 = proc(<c>cherry</c>)
// Output: "It's something else."
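The patterns above match only elements with exactly one child node, so inputs such as <a>a <em>red</em> apple</a> or <a/> fall through to the default case. To match a sequence of sub-elements instead of a single sub-element, use the _* pattern: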
// Updated pattern matching function for processing XML nodes with sequence of sub-elements
def proc(node: scala.xml.Node): String =
  node match {
    case <a>{contents @ _*}</a> => "It's an a: "+ contents
    case <b>{contents @ _*}</b> => "It's a b: "+ contents
    case _ => "It's something else."
  }
// Example usage
val result4 = proc(<a>a <em>red</em> apple</a>)
// Output: "It's an a: ArrayBuffer(a , <em>red</em>, apple)"
val result5 = proc(<a/>)
// Output: "It's an a: Array()"
Pattern matching on XML can also be used in conjunction with for expressions to iterate through XML nodes while ignoring whitespace:
// Example iterating through XML nodes with for expression
catalog match {
  case <catalog>{therms @ _*}</catalog> =>
    for (therm @ <cctherm>{_*}</cctherm> <- therms)
      println("processing: "+ (therm \ "description").text)
}
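The snippet above relies on a catalog value that is not defined in this summary. A self-contained sketch might look like the following, where the catalog contents are abbreviated sample data and the object name CatalogDemo is hypothetical. It collects the descriptions into a list instead of printing them, which makes the effect of the pattern easier to check.

```scala
object CatalogDemo {
  // Sample catalog with two records (abbreviated, for illustration only).
  val catalog =
    <catalog>
      <cctherm>
        <description>hot dog #5</description>
        <yearMade>1952</yearMade>
      </cctherm>
      <cctherm>
        <description>Sprite Boy</description>
        <yearMade>1964</yearMade>
      </cctherm>
    </catalog>

  // The whitespace before, between, and after the two <cctherm> elements
  // counts as text nodes, so <catalog> has more than two children.
  val childCount: Int = catalog.child.length

  // The pattern in the generator keeps only <cctherm> elements and
  // silently skips the whitespace text nodes.
  val descriptions: List[String] = catalog match {
    case <catalog>{therms @ _*}</catalog> =>
      (for (therm @ <cctherm>{_*}</cctherm> <- therms) yield (therm \ "description").text).toList
  }
}
```

Without the <cctherm>{_*}</cctherm> pattern in the generator, the loop would also visit the whitespace nodes and produce empty descriptions for them.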
Concluding Thoughts
In this chapter, we explored Scala’s robust support for XML, covering various aspects from basic concepts to advanced techniques. Scala’s XML support allows for seamless serialization, deserialization, loading, saving, and manipulation of XML data. With XML literals, serialization becomes straightforward, while pattern matching facilitates flexible handling of XML structures. The ability to extract text, sub-elements, and attributes from XML nodes simplifies data processing tasks. Furthermore, Scala’s integration with XML enables efficient loading and saving of XML data.
With this, we conclude the twenty-sixth chapter of this series. I hope it adds value to people who are interested in learning Scala.
Please don’t hesitate to contact me on LinkedIn with any comments or feedback.
Other chapters in this series can be found in this reading list.
Resources:
Odersky, M., Spoon, L., & Venners, B. (2008). Programming in Scala. Artima Inc.