Working with XML in Scala

XML is a semi-structured data format that lets us organize data according to a schema. I recently had a task where I needed to transform an XML file based on certain rules.

I enjoy writing Scala very much, so I decided to learn how to accomplish this task in Scala. To demonstrate, we will look at some tasks that we would like to perform on XML data.

Querying XML with Scala

Before we dive into some common tasks, we need to know how to query XML using Scala. The only operators we need to remember are \ and \\.

It’s better to look at them in action.

scala> val person =
| <person>
| <firstName>John</firstName>
| <lastName>Doe</lastName>
| <emails>
| <email type="primary">john.doe@noone.com</email>
| <email type="secondary">john.doe@noone.com</email>
| </emails>
| <address>
| <street>595 Market Street</street>
| <city>San Francisco</city>
| <zip>94105</zip>
| </address>
| </person>
person: scala.xml.Elem =
<person>
<firstName>John</firstName>
<lastName>Doe</lastName>
<emails>
<email type="primary">john.doe@noone.com</email>
<email type="secondary">john.doe@noone.com</email>
</emails>
<address>
<street>595 Market Street</street>
<city>San Francisco</city>
<zip>94105</zip>
</address>
</person>
scala>

To find the direct children of an element that have a given label, we use \.

scala> person \ "firstName"
res4: scala.xml.NodeSeq = NodeSeq(<firstName>John</firstName>)

As you can see, we get back the element we expected. Let us try another one:

scala> person \ "email"
res5: scala.xml.NodeSeq = NodeSeq()

Wait, what? Why didn't we get back any emails? We have two of them. It turns out that \ searches only one level deep: it looks at the direct children of person, and each email is a child of emails, not of person. To search the whole XML tree, we use \\. Let's try it:

scala> person \\ "email"
res6: scala.xml.NodeSeq = NodeSeq(<email type="primary">john.doe@noone.com</email>, <email type="secondary">john.doe@noone.com</email>)

Aha! Now we get them back. So remember the difference between \ and \\.
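The same queries also work outside the REPL. Below is a minimal sketch, assuming Scala 2.12/2.13 with the scala-xml module on the classpath (it has been a separate dependency since Scala 2.11). It also shows chaining \ to follow an explicit path one level at a time. The object name QueryDemo is ours, for illustration only.

```scala
import scala.xml._

object QueryDemo {
  val person: Elem =
    <person>
      <firstName>John</firstName>
      <emails>
        <email type="primary">john.doe@noone.com</email>
        <email type="secondary">john.doe@noone.com</email>
      </emails>
    </person>

  // \ only looks at the direct children of person, so no email is found
  val shallow: NodeSeq = person \ "email"

  // \\ searches all descendants, so both email elements are found
  val deep: NodeSeq = person \\ "email"

  // Chaining \ follows an explicit path: person -> emails -> email
  val byPath: NodeSeq = person \ "emails" \ "email"
}
```

When the structure is known, chaining \ is often the safer choice, because \\ would also pick up email elements nested anywhere else in the document.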

Let's search some more. What if we want to search by attribute name? For example, to get the values of the type attribute in our person structure, we do:

scala> person \\ "@type"
res10: scala.xml.NodeSeq = NodeSeq(primary, secondary)

Notice the @ in the search string: to search by attribute, we prefix the attribute name with @.
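Attribute queries combine naturally with ordinary collection methods. Here is a small sketch (same scala-xml assumption as above; AttributeDemo and the second address are made up for the example) that collects all type values and then keeps only the email whose type is primary:

```scala
import scala.xml._

object AttributeDemo {
  val emails: Elem =
    <emails>
      <email type="primary">john.doe@noone.com</email>
      <email type="secondary">jdoe@example.com</email>
    </emails>

  // All values of the type attribute found anywhere under emails
  val types: Seq[String] = (emails \\ "@type").map(_.text)

  // Keep only the email elements whose type attribute equals "primary"
  val primary: NodeSeq =
    (emails \ "email").filter(e => (e \ "@type").text == "primary")
}
```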

We can also pattern match on the XML to find what we need:

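import scala.xml.Elem

person match {
  // Elem's extractor exposes the prefix, label, attributes, scope and children
  case Elem(prefix, label, attributes, scope, child @ _*) => println("Found Person!"); person
  case _ => println("could not find node"); person
}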

Or you can use a typed pattern, which binds the matched element to a variable (e here):

person match {
  case e: Elem => println(s"${e.label}"); e
  case _ => println("could not find node"); person
}
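Typed patterns are also handy when walking over an element's children, because child includes the whitespace text nodes between elements. A minimal sketch, assuming scala-xml; MatchDemo is an illustrative name:

```scala
import scala.xml._

object MatchDemo {
  val person: Elem =
    <person>
      <firstName>John</firstName>
      <address>
        <city>San Francisco</city>
      </address>
    </person>

  // The typed pattern e: Elem skips whitespace Text nodes,
  // and the guard keeps only the address element
  val cities: Seq[String] = person.child.collect {
    case e: Elem if e.label == "address" => (e \ "city").text
  }

  // Labels of all element children, ignoring text nodes
  val labels: Seq[String] = person.child.collect { case e: Elem => e.label }
}
```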

Now that we know how to query XML, let us look at some common tasks and how to perform them using Scala.

Task 1: Add a node given a path

Consider the following data:

val continents =
<continents>
<continent>Africa</continent>
<continent>Antarctica</continent>
<continent>Europe</continent>
<continent>North America</continent>
<continent>Australia</continent>
<continent>South America</continent>
</continents>

To add a new <continent>Asia</continent> node, we can do the following:

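import scala.xml._

// Appends newNode to the children of `to`; non-element nodes are returned unchanged
def addNode(to: Node, newNode: Node): Node = to match {
  case Elem(prefix, label, attributes, scope, child @ _*) =>
    Elem(prefix, label, attributes, scope, true /* minimizeEmpty */, child ++ newNode: _*)
  case _ => println("could not find node"); to
}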

To test it:

val asia = <continent>Asia</continent>
val continentsOnAdd = addNode(continents, asia)
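addNode always appends to the root element it is given. Since the task talks about a path, here is a sketch of one way to honour it: walk down a list of element labels and append at the element reached. addNodeAtPath is a hypothetical helper, not part of scala-xml, and it assumes the path is given as labels below the root.

```scala
import scala.xml._

object AddAtPath {
  // Hypothetical helper: descends through the child elements named in `path`
  // and appends newNode to the children of the element it reaches.
  // An empty path appends directly to root.
  def addNodeAtPath(root: Elem, path: List[String], newNode: Node): Elem = path match {
    case Nil => root.copy(child = root.child :+ newNode)
    case label :: rest =>
      root.copy(child = root.child.map {
        case e: Elem if e.label == label => addNodeAtPath(e, rest, newNode)
        case other                       => other
      })
  }
}
```

If several siblings share the label on the path, this sketch descends into all of them; stopping at the first match would need an extra flag.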

Task 2: Delete a node given a path and value

Using the same example, let's say we want to delete the continent Asia. To do that, we can write:

import scala.xml._

// Keeps every child of n for which f returns false
def deleteNodes(n: Elem, f: Node => Boolean): NodeSeq =
  n.child.foldLeft(NodeSeq.Empty)((acc, elem) => if (f(elem)) acc else acc ++ elem)

// Removes the children of n whose text equals value
def deleteNodesWithValue(n: Elem, value: String): Elem = {
  val children = deleteNodes(n, elem => elem.text == value)
  n.copy(child = children)
}

We can then delete it and verify the result:

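// addNode returns a Node, so cast back to Elem before deleting
val continentsOnDelete = deleteNodesWithValue(continentsOnAdd.asInstanceOf[Elem], asia.text)
continentsOnDelete == continents // true: removing Asia restores the original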
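The foldLeft in deleteNodes keeps exactly the children for which f is false, which is what filterNot does on any Scala collection. A shorter equivalent sketch, assuming scala-xml; deleteWhere is a name we made up:

```scala
import scala.xml._

object DeleteDemo {
  // Same effect as deleteNodes + copy: keep every child for which f is false
  def deleteWhere(n: Elem, f: Node => Boolean): Elem =
    n.copy(child = n.child.filterNot(f))

  val continents: Elem =
    <continents><continent>Europe</continent><continent>Asia</continent></continents>

  val withoutAsia: Elem = deleteWhere(continents, _.text == "Asia")
}
```

Note that text compares the full text content of each child, including any nested elements, so this matches on the whole value rather than a substring.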

Task 3: Delete a node given a path and attribute value

Let us consider a different piece of XML data:

val menu =
<menu>
<dish spicy="high">Pad Thai</dish>
<dish spicy="medium">Pasta</dish>
<dish spicy="light">Burrito</dish>
<dish spicy="high">Green Curry</dish>
</menu>

To delete the nodes with a certain attribute value, we can write:

// Removes the children of n whose spicy attribute equals value
def deleteNodesWithAttributeValue(n: Elem, value: String): Elem = {
  val children = deleteNodes(n, elem => (elem \ "@spicy").text == value)
  n.copy(child = children)
}

deleteNodesWithAttributeValue(menu, "high")

As you can see, it reuses the deleteNodes function from Task 2.
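deleteNodesWithAttributeValue hard-codes the spicy attribute. A sketch that takes the attribute name as a parameter instead; deleteNodesWithAttribute is a hypothetical name, and filterNot stands in for deleteNodes so the block is self-contained:

```scala
import scala.xml._

object AttributeDelete {
  // Hypothetical generalization: remove the children of n whose attribute
  // `attr` has the given value. Text nodes have no attributes, so they are kept.
  def deleteNodesWithAttribute(n: Elem, attr: String, value: String): Elem =
    n.copy(child = n.child.filterNot(c => (c \ s"@$attr").text == value))
}
```

One caveat: a missing attribute yields an empty string, so passing an empty value would also remove whitespace text nodes and any element without that attribute.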

Task 4: Delete a node given a path

Consider the following example:

val person =
<person>
<firstName>John</firstName>
<lastName>Doe</lastName>
<email>john.doe@noone.com</email>
<city>SomeWhere</city>
</person>

Let's say we want to delete the <city> node from person. We can do it like this:

def deleteCityFromPerson = person.copy(child = deleteNodes(person, (elem) => elem.label == "city"))
deleteCityFromPerson

Again, this relies on the deleteNodes function from Task 2.
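deleteCityFromPerson only works when city is a direct child of person. To delete a node given a path, for example address/city, one option is to recurse along the labels. A minimal sketch; deleteAtPath is hypothetical and assumes the path lists element labels below the root:

```scala
import scala.xml._

object PathDelete {
  // Hypothetical helper: the last label in `path` names the elements to remove,
  // the earlier labels name the parent elements to descend into
  def deleteAtPath(root: Elem, path: List[String]): Elem = path match {
    case Nil => root // empty path: nothing to delete
    case last :: Nil =>
      root.copy(child = root.child.filterNot {
        case e: Elem => e.label == last
        case _       => false
      })
    case label :: rest =>
      root.copy(child = root.child.map {
        case e: Elem if e.label == label => deleteAtPath(e, rest)
        case other                       => other
      })
  }
}
```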

Task 5: Add an attribute to a node given a path

Now suppose we want to add an attribute to an email node from our person data, for example primary="true". To do this, we can write:

import scala.xml._

// % returns a copy of n with the given attribute added
def addAttributeToNode(n: Elem, attribute: Attribute): Elem = n % attribute
val attributeAdded = addAttributeToNode(<email>none@one.com</email>, Attribute(None, "primary", Text("true"), Null))
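In a real document the email usually sits inside a parent element, so we need to rebuild the parent with the updated children. A sketch that adds an attribute to every direct child with a given label; addAttributeToChildren and the verified attribute are made up for illustration:

```scala
import scala.xml._

object AttributeAdd {
  // Hypothetical helper: add key="value" to every direct child of parent labelled `label`
  def addAttributeToChildren(parent: Elem, label: String, key: String, value: String): Elem =
    parent.copy(child = parent.child.map {
      case e: Elem if e.label == label => e % Attribute(None, key, Text(value), Null)
      case other                       => other
    })

  val emails: Elem =
    <emails><email>john.doe@noone.com</email><email>jdoe@example.com</email></emails>

  val verified: Elem = addAttributeToChildren(emails, "email", "verified", "true")
}
```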

As we can see, in all these examples we wrote functions and passed them the XML data along with what we wanted done to it. This works, but Scala lets us do better with RuleTransformer. In the next post, I will show how to perform these transformations more easily using RuleTransformer.
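As a brief preview, here is a minimal sketch of Task 4 written with scala.xml.transform, assuming scala-xml's RewriteRule and RuleTransformer classes; RuleDemo and removeCity are our own names:

```scala
import scala.xml._
import scala.xml.transform.{RewriteRule, RuleTransformer}

object RuleDemo {
  // A rule that drops every <city> element and leaves all other nodes unchanged
  val removeCity: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "city" => NodeSeq.Empty
      case other                        => other
    }
  }

  val person: Elem =
    <person><firstName>John</firstName><city>SomeWhere</city></person>

  // RuleTransformer applies the rule to every node in the tree, recursively
  val transformed: Seq[Node] = new RuleTransformer(removeCity).transform(person)
}
```

Unlike deleteCityFromPerson, the rule does not care where city appears in the tree.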