Michael Kay
When building a string in a variable, and you want to avoid creating a
document node, is it preferable to use item()+ rather than xs:string+,
since that allows adjacent text nodes to be merged before atomization,
creating a sequence of one item and therefore bypassing the separator issue?
Mike Kay responds, logical as ever, with
No, if you want to build a string in a variable then you should define the
type of the variable as xs:string. You can either do:
<xsl:variable name="foo" as="xs:string">
  <xsl:value-of>
    <xsl:text>abc</xsl:text><xsl:value-of select="'def'"/>
  </xsl:value-of>
</xsl:variable>
or
<xsl:variable name="foo" as="xs:string">
  <xsl:sequence select="concat('abc', 'def')"/>
</xsl:variable>
or of course
<xsl:variable name="foo" select="concat('abc', 'def')"/>
Personally I prefer to work entirely with strings. If you don't need text
nodes, don't create them.
David Carlisle offers the same advice, with
If you want to build a (sequence of) string(s) in a variable
then you have to use xs:string+ as otherwise you will get other things
in your sequence (like nodes). Sometimes you don't need to worry about
the difference between a string and a text node, but sometimes you do
(as you've demonstrated). In general the distinction is far more
important in 2.0 than 1.0.
So, if you want a sequence of strings use xs:string+ and use
separator="" on a value-of if you don't want the spaces.
If you want a sequence of nodes and/or strings use item()+
(but personally if I knew there was a possibility of adding an unwanted
space I'd just always add separator="" rather than run through the
six-point "constructing simple content" rules in my head every time to
see if the spaces wouldn't be added)
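As a minimal sketch of the separator advice (the variable name parts and the template name main are invented for this illustration, and the stylesheet is assumed to be run with main as the initial template, for instance with Saxon's -it:main option):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point for the illustration -->
  <xsl:template name="main">
    <!-- A sequence of two strings; no text nodes are involved -->
    <xsl:variable name="parts" as="xs:string+">
      <xsl:sequence select="'abc', 'def'"/>
    </xsl:variable>

    <!-- With select= and no separator, the default separator is a
         single space, so this writes: abc def -->
    <xsl:value-of select="$parts"/>
    <xsl:text>&#10;</xsl:text>

    <!-- An explicit empty separator suppresses the space: abcdef -->
    <xsl:value-of select="$parts" separator=""/>
  </xsl:template>

</xsl:stylesheet>
```

Adding separator="" costs nothing when the sequence turns out to hold a single item, which is why it is a reasonable default habit.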
From Andrew's earlier query, Michael Kay justifies the WG decision:
> Indeed, the key here is that text nodes get merged together at stage 2
> before the atomization at stage 3 - which is why I'm still confused if I
> specify xs:string+ instead of item()+:
>
> <xsl:variable name="foo" as="xs:string+">
> <xsl:text/>abc<xsl:sequence select="'def'"/>
> </xsl:variable>
>
> <xsl:variable name="foo2" as="xs:string+">
> <xsl:text/>abc<xsl:value-of select="'def'"/>
> </xsl:variable>
>
>
> <xsl:value-of select="$foo" separator=","/>
>
> Gives ,abc,def
>
> <xsl:value-of select="$foo2" separator=","/>
>
> Gives ,abc,def
>
> I would have expected the same behaviour as specifying item()+ as
> atomization occurs after the merging of the text nodes in $foo2
>
> This suggests that by specifying xs:string Saxon is jumping the gun and
> converting the text nodes (zero length text nodes as well - the leading
> comma) to strings before stage 2.
Indeed, it's converting text nodes to strings before it even starts the
"constructing simple content" process. The overall process here is:
1. Construct the value of the variable
1a. evaluate the sequence constructor, producing a sequence of text nodes
(foo2) or a sequence containing a mixture of text nodes and strings (foo)
1b. apply the "function conversion rules" to convert the result of 1a
to the required type (xs:string+). This causes text nodes to be
individually atomized to strings
2. xsl:value-of select="$foo" then invokes the "constructing simple content"
rules to convert a sequence of text nodes and/or strings to a single text
node.
In both cases the input is a sequence of strings, so the rules for joining
adjacent text nodes don't kick in.
It's certainly true that this whole business is going to generate a lot of
questions. We've tried to design the rules so that they are (a) backwards
compatible, and (b) do the "right" thing in common cases. The downside of
this is that the rules are quite complicated, and when they don't do the
obvious thing, it's quite tricky to work out why.
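To see the effect of the declared type side by side, the following sketch builds the same sequence constructor twice, once as xs:string+ and once as item()+. The names asStrings, asItems and main are invented for this illustration. The first result is the one reported in the query above; the second is what the rules described in this thread imply, not output quoted from the thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point for the illustration -->
  <xsl:template name="main">

    <!-- Step 1b applies: each of the three text nodes (including the
         zero-length one from xsl:text) is atomized to its own string -->
    <xsl:variable name="asStrings" as="xs:string+">
      <xsl:text/>abc<xsl:value-of select="'def'"/>
    </xsl:variable>

    <!-- No conversion is needed for item()+, so the variable still
         holds three text nodes -->
    <xsl:variable name="asItems" as="item()+">
      <xsl:text/>abc<xsl:value-of select="'def'"/>
    </xsl:variable>

    <!-- Three strings joined by the separator; reported above as ,abc,def -->
    <xsl:value-of select="$asStrings" separator=","/>
    <xsl:text>&#10;</xsl:text>

    <!-- Text nodes reach "constructing simple content" intact, so the
         zero-length node is dropped and the adjacent nodes are merged
         before atomization; this should give abcdef -->
    <xsl:value-of select="$asItems" separator=","/>
  </xsl:template>

</xsl:stylesheet>
```

If only the final string matters, declaring the variable as xs:string and building it with concat(), as recommended earlier, avoids the question altogether.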