Saturday, October 22, 2011

Groovy XML Parser Gotcha - it's not a Map you're parsing!

When I first looked at examples of the Groovy XmlParser, it appeared that the parser took some nice raw XML and turned it into a map for easy traversal. I could take the following source:

<person>
<face>
<eyes>
blue
</eyes>
</face>
</person>

And parse it to find the eye color like this:

def xml = new XmlParser().parseText(xmlSrc)
// parseText returns the root <person> node, so the path starts at its children
assert xml.face.eyes.text().trim() == 'blue'


Confident in my understanding, I thought it would be trivial to handle the following XML source I was receiving from a domain registrar:

<?xml version="1.0"?>
<interface-response>
<ErrCount>1</ErrCount>
<errors>
<Err1>Domain name not available</Err1>
</errors>
</interface-response>

I parsed the source in order to extract the errors from the response as follows:

def xml = new XmlParser().parseText(xmlSrc)
def errMsg = "Errors:"
// iterates over the <errors> element itself, not over the <Err1> inside it
xml.errors.each { err ->
    errMsg += "\n ${err.text()}"
}


This made perfect sense to me, but I wasn't getting the expected output in my error message. The problem was my "understanding". I had not grasped what the xml.errors Groovy magic was actually doing behind the scenes. If the parser had created a Map representation of my XML, my logic would have been right, but it hadn't. Being able to write xml.errors, or a dotted path like the eye-color lookup above, might give the impression that you can bring the ubiquitous map paradigm here, but it is a "faux ami" (a false friend, as we say here in France). The dot notation is actually a GPath expression, which works like an XPath-style query run against the parsed node tree: each step returns the list of all child nodes with that name, not a single value. So xml.errors is a list holding the one <errors> element, and iterating over it visits that element rather than the <Err1> inside it. Sometimes such a query is equivalent to retrieving data from a Map, but often it is not. Now here is code that follows the path-query paradigm (and it works!):

def xml = new XmlParser().parseText(xmlSrc)
def errMsg = "Errors:"
// '*' selects every child element of <errors>, whatever its name
xml.errors.'*'.each { err ->
    errMsg += "\n ${err.text()}"
}
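
To see why the first attempt misbehaves, it can help to look at what each step of the path returns. The following is only a sketch under a few assumptions: xmlSrc is a cleaned-up copy of the registrar response, the second error (Err2) is made up for illustration, and the import line assumes Groovy 3 or later (on older versions XmlParser lives in groovy.util and is imported by default, so the import can be dropped).

```groovy
import groovy.xml.XmlParser   // assumption: Groovy 3+; drop this line on older versions

// Cleaned-up registrar response; Err2 is a hypothetical second error
def xmlSrc = '''<?xml version="1.0"?>
<interface-response>
<ErrCount>2</ErrCount>
<errors>
<Err1>Domain name not available</Err1>
<Err2>Hypothetical second error</Err2>
</errors>
</interface-response>'''

def xml = new XmlParser().parseText(xmlSrc)

// parseText hands back the root element as a Node, not a Map
assert xml instanceof groovy.util.Node
assert xml.name() == 'interface-response'

// A property access returns a NodeList of matching children
assert xml.errors instanceof groovy.util.NodeList
assert xml.errors.size() == 1

// So each() on xml.errors visits the single <errors> element only
def visited = []
xml.errors.each { visited << it.name() }
assert visited == ['errors']

// '*' steps down to the children of every node in that list
def children = xml.errors.'*'
assert children*.name() == ['Err1', 'Err2']

def errMsg = 'Errors:'
children.each { err ->
    errMsg += "\n ${err.text().trim()}"
}
assert errMsg == 'Errors:\n Domain name not available\n Hypothetical second error'
```

The asserts double as documentation: the list returned by xml.errors has one entry no matter how many errors the registrar sends back, which is why the loop has to go one level deeper.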