xsl output problems
Can I parameterise encoding in xsl:output?
> The encoding attribute of the <xsl:output> element is set as a string.
> Mike - is the fact that SAXON doesn't report an error a bug?
If the encoding isn't supported, Saxon (as permitted by the spec) reverts to UTF-8 and puts out a warning message that it is doing so. Unfortunately the encoding as written to the XML declaration or the HTML META element is the encoding that was requested, not the one that was actually used.
Which may cause browser confusion!
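A minimal sketch of what "set as a string" means in practice, assuming an XSLT 1.0 processor: the attributes of xsl:output are not attribute value templates in XSLT 1.0, so the encoding is written as a literal and cannot be taken from a parameter. The element name doc and the sample text are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The encoding is a fixed literal. Writing encoding="{$enc}" would not be
       expanded by an XSLT 1.0 processor; it would be read as an encoding name. -->
  <xsl:output method="xml" encoding="ISO-8859-1"/>

  <xsl:template match="/">
    <!-- hypothetical result element containing one non-ASCII character -->
    <doc>caf&#233;</doc>
  </xsl:template>

</xsl:stylesheet>
```

If the encoding has to vary, the usual workaround is to set it from outside the stylesheet, through whatever output options the processor's command line or API offers.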
XML to ASCII Conversion
Use the xsl:output element at the top level, and make sure you use the latest version of XT:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match='/'>
    <xsl:apply-templates select="yourTemplate"/>
  </xsl:template>
</xsl:stylesheet>
With fuller example from Mike Brown.
I want to translate the following...
<orderlist>
  <order ordernum="1">
    <customer>
      <firstname>John</firstname>
      <lastname>Doe</lastname>
      <phone>(510) 555-1212</phone>
    </customer>
  </order>
  <order ordernum="2">
    <customer>
      <firstname>Jane</firstname>
      <lastname>Smith</lastname>
      <phone>(916) 555-1212</phone>
    </customer>
  </order>
</orderlist>
into a tab-delimited ASCII file for import into QuickBooks like this...
firstname	lastname	phone
John	Doe	(510) 555-1212
Jane	Smith	(916) 555-1212
Here you go, with some comments to explain:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- if using an implementation of the current XSLT WD -->
  <xsl:output method="text"/>

  <!-- execute this template if the element name is 'orderlist' -->
  <xsl:template match="orderlist">

    <!-- emit column headers with tabs and a newline
         (written as character references so they survive reformatting) -->
    <xsl:text>firstname&#9;lastname&#9;phone&#10;</xsl:text>

    <!-- process elements named 'customer' that are children of elements
         named 'order' that are children of the current node -->
    <xsl:for-each select="order/customer">

      <!-- select as a node-set the elements named 'firstname' that are
           children of the current node, and emit the concatenation of
           text nodes contained within the first node in that node-set -->
      <xsl:value-of select="firstname"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="lastname"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="phone"/>
      <xsl:text>&#10;</xsl:text>

    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
No closing tag on html output method
When you use HTML output method, closing tags aren't generated for HTML tags that are 'empty by definition' (including <input>).
When the top-level output element is <html>, you get the HTML output method by default.
You can override the default by saying:
<xsl:output method="xml"/>
In other words: it's a feature, not a bug.
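A small sketch of the behaviour described above, assuming an XSLT 1.0 processor. The field name q is invented for illustration, and the exact serialisation (attribute order, whitespace) may differ between processors.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- With method="html" the empty-by-definition <input> element is written
       with no end tag, roughly: <input type="text" name="q">
       With method="xml" the same element is written as an empty-element tag,
       roughly: <input type="text" name="q"/> -->
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- hypothetical form field -->
        <input type="text" name="q"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Changing only the method attribute is enough to compare the two serialisations; the result tree itself is the same in both cases.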
Mike Kay adds:
If your complaint (see later messages) is that with XML output it's generating <input/> rather than <input></input>, then the answer is that you can't influence this; you shouldn't need to, because they are 100% equivalent.
Mark Hayes adds: If you REALLY need to force a </input> tag to appear, add a preserve-space element at the top level for input, and some blank text inside the <input> output element:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>
  <xsl:preserve-space elements="input"/> <!-- *** here *** -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:element name="input">
          <xsl:attribute name="type">text</xsl:attribute>
          <xsl:attribute name="name">
            <xsl:value-of select="@name"/>
          </xsl:attribute>
          <xsl:text> </xsl:text> <!-- *** here *** -->
        </xsl:element>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
RTFs and Node sets
Result Tree Fragment. Not a pretty name, and the abbreviation RTF is unfortunate, but we have to live with it.
When the body of an <xsl:variable> element is evaluated (or "instantiated" to use the correct jargon), the result is written to an RTF. There are only three things you can do with an RTF: you can use xsl:copy-of to copy it to the result tree (or to another RTF), you can convert it implicitly or explicitly to a string, and you can pass it to a function. There aren't any standard functions that process RTFs, so in practice this means an extension function.
SAXON and xt both provide extension functions to convert an RTF to a node-set. This conversion can't be done implicitly. The reason your xsl:for-each fails is that the expression in the select attribute must yield a node-set. Nothing else will do; in particular, it cannot be an RTF.
David Carlisle adds:
A node set is what you get back from a select expression, so select="aaa[@xxx]|aaa[bbb]"
gives you the set of all elements with name aaa and either an xxx attribute or a bbb child. Note this is a set, not a list. If some aaa element has both an xxx attribute and a bbb child, you only get it once. The set is however ordered (in document order, normally).
A node set is what you can apply templates to,
i.e. it's the relevant part of the input document (or of some secondary input document loaded via the document() function).
A result tree fragment is what you produce in a template. You can save it in a variable, and while it has a similar structure to a node set (it's a bunch of XML nodes), it is essentially opaque to XSL. You cannot apply templates to it or interrogate its structure. The only thing you can do is use xsl:copy-of to put the value of the variable holding the result tree fragment into the result tree at some point.
xt and saxon (at least) have an extension function that converts result tree fragments to node sets.
> <xsl:for-each select="$members">
members holds the result tree fragment, so you can't select into it.
You could use the node-set extension function mentioned above to convert it to a node set first.
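A minimal sketch of such a conversion. It assumes a processor that supports the EXSLT common function exsl:node-set(); the extension functions shipped with SAXON and xt live in their own namespaces, so check your processor's documentation for the exact name. The member elements and their content are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="text"/>

  <!-- The variable body is instantiated into a result tree fragment -->
  <xsl:variable name="members">
    <member>alpha</member>
    <member>beta</member>
  </xsl:variable>

  <xsl:template match="/">
    <!-- select="$members/member" is an error in XSLT 1.0;
         converting the RTF to a node-set first makes it selectable -->
    <xsl:for-each select="exsl:node-set($members)/member">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against any input document, this should write each member value on its own line.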
Mike Brown adds:
You can identify *any combination* of unique nodes from different places in the source tree, using an XPath expression that selects the ones you want. Those nodes are a "node set". They don't have to form a hierarchy or anything.
You can create a new hierarchy of nodes (or multiple hierarchies that are siblings of each other), using various XSLT instructions and/or literal result elements. Those nodes are a "result tree fragment". They're branches of a tree.
So a result tree fragment *is* a set of nodes. It's just not a "node set".
How to create positional ASCII text layout
It is perfectly feasible to create XSL outputs which not only do not add a single additional space, but which also strip various kinds of space from the output.
The key techniques are:
percent sign in output attributes
Q expansion: when I put this in XSL:
<a href="foo.cgi?formula=xml%2Bxsl&result=html">...</a>
It turns into this in HTML (Saxon 5.1, IBM's XML parser):
<a href="foo.cgi?formula=xml%252Bxsl&result=html">...</a>
Notice that the intended "%2B" got escaped to "%252B".
well the spec says
So the point is that _you_ shouldn't be %-escaping things: just put the character directly in the URL, and XSL will do the escaping for you. As it is, it is escaping your %.
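A hedged sketch of the "put the character in directly" advice, assuming the html output method. As I read section 16.2 of XSLT 1.0, the html method should escape non-ASCII characters in URI attribute values following HTML 4.0 section B.2.1 (UTF-8 bytes written as %HH). The query parameter name and its value are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- The e-acute is given as a plain character (here as a character
         reference), not pre-escaped as %C3%A9. A serialiser following the
         recommendation is expected to write it as %C3%A9 in the href. -->
    <a href="foo.cgi?name=caf&#233;&amp;result=html">...</a>
  </xsl:template>

</xsl:stylesheet>
```

Whether a percent sign already present in the attribute is escaped again depends on the processor, which is the effect seen in the question.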
Closure - transformation and output separation
It comes down to pipelining, or closure.
A property of XSLT is that the input data model is the same as the output data model. The operations in XSLT take trees as input and produce trees as output. The language is "closed" over the data model. The benefit of this is composability: any two transformations can be combined to produce a larger transformation. Hence pipelines.
Serialization should be separate because it breaks away from the data model and produces something different: its output is a different kind of thing from its input. Only by keeping serialization separate from transformation do you preserve the closure property of the transformation language, and hence its composability.
Dynamic output method?
Even in XSLT 1.0 there's a way to get either method="xml" or method="html" chosen dynamically.
If no value for the "method" attribute is explicitly specified and the top element generated by the transformation has the local name "html", then the method used for serialisation will be "html"; otherwise it will be "xml".
From the XSLT 1.0 specification (http://w3.org/TR/xslt#output):
"The default for the method attribute is chosen as follows. If the root node of the result tree has an element child, the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters, then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute."
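The rule quoted above can be used along these lines. This is only a sketch: the parameter name format and the doc element are hypothetical, and the stylesheet deliberately has no xsl:output element, so the processor falls back to the default method.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No xsl:output element, so the default output method applies -->

  <!-- hypothetical parameter, set by the caller -->
  <xsl:param name="format" select="'html'"/>

  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$format = 'html'">
        <!-- document element is html in no namespace: html method is used -->
        <html>
          <body><br/></body>
        </html>
      </xsl:when>
      <xsl:otherwise>
        <!-- any other document element: xml method is used -->
        <doc><br/></doc>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With the default parameter value the br element should be serialised HTML-style with no end tag; with any other value it should come out as an empty-element tag.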