2008-01-23T15:02:08Z
Dave Pawson.
XML parse error
I responded to a question on xml-dev recently, about the editing problem of finding a parse error in an XML file. I've edited a file, I save it to disk and start processing it. Oops, application X, which uses the XML, first parses it and finds an error. Find the error, correct it and move on? No. "Line 63, column 24" is nice in a compiler, and the idea seems to have carried over into the world of XML where line numbers, though possible, are simply impractical in many cases. The worst case is the editor designed by the know-it-all. It buggers up the layout and writes the file to disk as a single line of text. Line 63? You can have line 0 or 1, but nothing more useful than that.
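To make the single-line problem concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The sample documents and the helper name `parse_error_position` are made up for illustration. The same mismatched end tag is parsed twice, once with the layout intact and once flattened to a single line.

```python
import xml.etree.ElementTree as ET


def parse_error_position(xml_text):
    """Return the (line, column) of the first parse error, or None if well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None


# Illustrative document: the second <para> is closed with the wrong tag.
pretty = "<doc>\n  <para>one</para>\n  <para>two</item>\n</doc>\n"

# The same document as an editor might write it: one long line.
flat = "".join(line.strip() for line in pretty.splitlines())

print("with layout:", parse_error_position(pretty))  # line part is 3
print("single line:", parse_error_position(flat))    # line part is 1
```

With the layout intact the line number points at the offending line. Once the file is a single line, only the column is left to go on, and in a real document that column can be thousands of characters in.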
As Mike Kay put it,
there's a reason Saxon doesn't give you column numbers, which is that the column number reported by the SAX parser is usually not very meaningful. Both the line number and column number typically reflect the position of the ">" at the end of the start tag. You're therefore very likely to be reporting an error at column 68 when the actual error is at column 15, and I've generally taken the view that it's better to report nothing at all than to get it quite so badly wrong.
My personal favourite among hard-to-find errors is a wrongly encoded character, which seems to faze parsers something rotten. The more typical one is, say, bad markup. There, some parsers seem to provide iffy locations in the source file. So it's a bit of detective work to hunt down the error from the information given.
When Schematron first came on the scene it hurt my head just understanding what was happening. The two-stage process meant that the output errors needed referring back to the source file. David Carlisle experimented with Schematron and produced this form of output, validating to the W3C accessibility guidelines IIRC. What I liked about this was the clarity of the output. The error may have been detected elsewhere, but the 'error message' was loud and clear, referring to a known point in the input XML. IMHO this makes for a good lesson in error reporting. The XPath expression is used to locate the error source as described here by David and Rick.
Why can't the XPath expression gained from the parser be used to generate something for a browser, against the XML source? Seems so logical to me. The xml-dev thread briefly mentions what is done in some editors, but this form of solution is so generally applicable it could be used by any editor - refreshing the browser as the edit/parse/view loop is repeated.
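As a rough sketch of that idea (not what any particular editor or Schematron implementation does), the Python below takes an XML source and a location path, and produces an HTML fragment showing the source with the offending element highlighted. The function names `render` and `highlight`, the sample document and the path are all hypothetical. It assumes the document is well-formed enough to parse, as it would be for a validation error, and it uses the limited XPath subset that `xml.etree.ElementTree` supports, so the path is written relative to the root element.

```python
import html
import xml.etree.ElementTree as ET


def render(elem, target):
    """Serialise elem as escaped markup, wrapping the target element in <mark>."""
    attrs = "".join(f' {name}="{value}"' for name, value in elem.attrib.items())
    out = html.escape(f"<{elem.tag}{attrs}>")
    out += html.escape(elem.text or "")
    for child in elem:
        out += render(child, target)
        out += html.escape(child.tail or "")
    out += html.escape(f"</{elem.tag}>")
    if elem is target:
        out = f"<mark>{out}</mark>"
    return out


def highlight(xml_text, path):
    """Return an HTML <pre> fragment with the element at path highlighted."""
    root = ET.fromstring(xml_text)
    target = root.find(path)  # ElementTree's XPath subset, relative to root
    if target is None:
        raise ValueError(f"nothing found at {path!r}")
    return "<pre>" + render(root, target) + "</pre>"


# Illustrative input: pretend a validator reported the second section's para.
source = (
    "<doc><section><para>ok</para></section>"
    "<section><para>bad</para></section></doc>"
)
page = highlight(source, "./section[2]/para[1]")
print(page)
```

Writing `page` to a file and reloading it in the browser after each edit/parse cycle gives the loop described above. A fuller version would need a real XPath engine and would want to preserve the original whitespace and entity references rather than re-serialising from the tree.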
Keywords: xml