Monday, September 26, 2011

XML & HTML Slurping with Groovy

XML Slurping String Content:
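// source is a String holding well-formed XML; def avoids needing an import for GPathResult
def feed = new XmlSlurper().parseText(source)
// feed is the root element; take every TR of the first TABLE except the first one
def rows = feed.FORM[0].TABLE[0].TR[1..-1]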
// ...

Note: element names are case-sensitive, so FORM and form refer to different elements.
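
A minimal, self-contained sketch of the snippet above. The sample markup is hypothetical and only stands in for `source`. It assumes a Groovy version where XmlSlurper is available without an import; on Groovy 4 and later, add `import groovy.xml.XmlSlurper` first.

```groovy
// Hypothetical sample document standing in for `source`
def source = '''
<HTML>
  <FORM>
    <TABLE>
      <TR><TH>Name</TH><TH>Qty</TH></TR>
      <TR><TD>apple</TD><TD>3</TD></TR>
      <TR><TD>pear</TD><TD>5</TD></TR>
    </TABLE>
  </FORM>
</HTML>
'''

def feed = new XmlSlurper().parseText(source)

// feed is the root element (HTML), so navigation starts at its children
def rows = feed.FORM[0].TABLE[0].TR[1..-1]

// Collect the text of every TD cell, row by row
def data = rows.collect { tr -> tr.TD.collect { it.text() } }
assert data == [['apple', '3'], ['pear', '5']]

// Lower-case names do not match the upper-case elements
assert feed.form.size() == 0
```

A name that matches nothing yields an empty result rather than an exception, so a wrong-case element name tends to fail silently.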

HTML Slurping with TagSoup:
// requires the TagSoup jar on the classpath
slurper = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser())
url = new URL("http://...")   // url is a java.net.URL here
url.withReader { reader ->
    def html = slurper.parse(reader)
    // ...
}
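
A small sketch of the same idea without a network call, parsing a hypothetical, deliberately sloppy HTML string instead of a URL. The @Grab coordinates are an assumption about how the TagSoup jar is obtained; any other way of putting it on the classpath works as well. As above, Groovy 4 and later also need `import groovy.xml.XmlSlurper`.

```groovy
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser

def slurper = new XmlSlurper(new Parser())

// Hypothetical input: upper-case tags and unclosed <P> elements
def html = slurper.parseText('<HTML><BODY><P>one<P>two</BODY></HTML>')

// TagSoup reports element names in lower case, so the GPath uses body and p
def paragraphs = html.body.p.collect { it.text() }
assert paragraphs == ['one', 'two']

// The upper-case spelling from the input no longer matches anything
assert html.BODY.size() == 0
```

Because element names are case-sensitive, GPath expressions have to follow the case the parser reports, not the case used in the page source.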

HTML Slurping with NekoHTML:
// requires the NekoHTML jar (and its Xerces dependency) on the classpath
slurper = new XmlSlurper(new org.cyberneko.html.parsers.SAXParser())
new URL(url).withReader { reader ->   // url is a String here
    def html = slurper.parse(reader)
    // ...
}