To display a list of software releases delimited by ", " when the XML delimits them with ";":
It's simpler than that; in 2.0 you can do:
<xsl:value-of select="replace(@software, ';', ', ')"/>
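XPath 2.0 replace() treats its second argument as a regular expression. A literal ';' is safe, but a delimiter such as '.' or '|' would need escaping. The following is a rough Python analogue of the expression above. re.sub stands in for replace(), and the function name join_releases is hypothetical.

```python
import re

def join_releases(software: str, old: str = ";", new: str = ", ") -> str:
    # re.escape makes regex metacharacters in the delimiter literal,
    # which is what you would have to do by hand in XPath replace().
    # `new` is a replacement template: backslashes in it are treated
    # specially, much as XPath replace() treats '\' and '$'.
    return re.sub(re.escape(old), new, software)

print(join_releases("1.0;1.1;2.0"))  # 1.0, 1.1, 2.0
```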
Recently I had fun with an f:binSearch() function and then, logically, with f:spell().
I have a dictionary of about 47000 English word forms, which I search with f:binSearch().
I also had to write a faster function than the current str-split-to-words template, whose running time is quadratic: this is the f:getWords() function.
All these functions can be downloaded from the FXSL CVS repository at SourceForge.
The combination of these functions works quite well.
This transformation (test-FuncSpell.xsl):
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:f="http://fxsl.sf.net/"
 exclude-result-prefixes="f xs">
  <xsl:import href="../f/func-getWords.xsl"/>
  <xsl:import href="../f/func-spell.xsl"/>
  <xsl:output omit-xml-declaration="yes"/>
  <!-- Space, Comma, mdash, Colon, Dot, Dash, Tab, NL, CR,
       Exclamation mark, Question mark, Semicolon -->
  <xsl:variable name="vDelim" as="xs:string"> ,&#x2014;:.-&#x9;&#xA;&#xD;!?;</xsl:variable>
  <!-- To be applied on ../data/othello.xml -->
  <xsl:template match="/">
    <xsl:variable name="vwordNodes" as="element()*">
      <xsl:for-each select="//text()/lower-case(.)">
        <xsl:sequence select="f:getWords(., $vDelim, 1)"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="vUnique" as="xs:string+">
      <xsl:perform-sort select="distinct-values($vwordNodes)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:variable name="vnotFound" as="xs:string*"
                  select="$vUnique[not(f:spell(.))]"/>
    <xsl:value-of separator="&#xA;" select="$vnotFound"/>
A total of <xsl:value-of select="count($vwordNodes)"/> words were spelt, (<xsl:value-of select="count($vUnique)"/>) distinct.
<xsl:value-of select="count($vnotFound)"/> not found.
  </xsl:template>
</xsl:stylesheet>
When applied to othello.xml (around 29000 words), it produces this result:
<a-list-of-567-unknown-words-omitted/> A total of 28622 words were spelt, (3669) distinct. 567 not found.
So, checking 3669 distinct words in 7015 milliseconds makes about 1.9 milliseconds per word (roughly 523 words per second).
The actual speed is faster, as the total time includes splitting up the words and finding the distinct words.
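The FXSL source for f:binSearch(), f:getWords() and f:spell() isn't reproduced here. The following Python sketch shows the same idea under stated assumptions. The dictionary is assumed to be a sorted list of lowercase word forms, and words are split on a delimiter set like $vDelim. The names get_words, spell and unknown_words are hypothetical and are not the FXSL API. Binary search needs at most about 16 comparisons per lookup against 47000 entries.

```python
import bisect
import re

# Assumed delimiter set mirroring $vDelim: space, comma, mdash, colon, dot,
# dash, tab, NL, CR, exclamation mark, question mark, semicolon
DELIMS = " ,\u2014:.-\t\n\r!?;"

def get_words(text, delims=DELIMS):
    # Split on any run of delimiter characters and drop empty tokens
    pattern = "[" + re.escape(delims) + "]+"
    return [w for w in re.split(pattern, text.lower()) if w]

def spell(word, dictionary):
    # `dictionary` must be sorted; bisect_left does an O(log n) binary search
    i = bisect.bisect_left(dictionary, word)
    return i < len(dictionary) and dictionary[i] == word

def unknown_words(text, dictionary):
    # Distinct words, sorted, keeping only those not found in the dictionary
    distinct = sorted(set(get_words(text)))
    return [w for w in distinct if not spell(w, dictionary)]

words = sorted(["the", "cat", "sat", "on", "mat"])
print(unknown_words("The cat sat on the Mat, betimes!", words))  # ['betimes']
```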
Among the unknown words are such nice words as:
affordeth affrighted ariseth arithmetician arrivance bethink betimes bewhored
Breaking a string into substrings
If you don't need to do different things for different delimiters, then:
The XPath 2.0 tokenize() function lets you specify the delimiter as a regular expression, so in that case you can specify whatever you want, e.g. [^a-zA-Z]+ to treat any run of non-(ASCII)-letters as a delimiter.
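The following is a rough Python analogue of tokenize(), offered as an illustration only. re.split stands in for the XPath function, and the wrapper name is hypothetical. Note that the caret must go inside the brackets: [^a-zA-Z]+ means "a run of non-letters", whereas ^[a-zA-Z]+ anchors a run of letters at the start of the string.

```python
import re

def tokenize(text, pattern):
    """Rough analogue of XPath 2.0 tokenize(): split wherever the regex matches."""
    if text == "":
        return []  # XPath returns the empty sequence for a zero-length input
    # A match at the very start (or end) yields a zero-length first (last)
    # token, as it does in XPath, so callers may want to filter empties
    return re.split(pattern, text)

print(tokenize("one, two;three", "[^a-zA-Z]+"))  # ['one', 'two', 'three']
print(tokenize("one, two", "^[a-zA-Z]+"))        # ['', ', two'], not what was meant
```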
Unwanted spaces in built strings
When building a string in a variable, and wanting to avoid creating a document node, is it preferable to use item()+ rather than xs:string+? That would allow adjacent text nodes to be merged before atomization, producing a sequence of one item and therefore bypassing the separator issue.
Mike Kay responds, logical as ever, with
No, if you want to build a string in a variable then you should define the type of the variable as xs:string.
You can either do:
<xsl:variable name="foo" as="xs:string"> <xsl:value-of> <xsl:text>abc</xsl:text><xsl:value-of select="'def'"/> </xsl:value-of> </xsl:variable>
or
<xsl:variable name="foo" as="xs:string"> <xsl:sequence select="concat('abc', 'def')"/> </xsl:variable>
or of course
<xsl:variable name="foo" select="concat('abc', 'def')"/>
Personally I prefer to work entirely with strings. If you don't need text nodes, don't create them.
David Carlisle offers the same advice, with
If you want to build a (sequence of) string(s) in a variable then you have to use xs:string+, as otherwise you will get other things in your sequence (like nodes). Sometimes you don't need to worry about the difference between a string and a text node, but sometimes you do (as you've demonstrated). In general the distinction is far more important in 2.0 than in 1.0.
So, if you want a sequence of strings use xs:string, and use separator="" on a value-of if you don't want the spaces.
If you want a sequence of nodes and/or strings use item()+ (but personally, if I knew there was a possibility of adding an unwanted space, I'd just always add separator="" rather than run through the six-point "constructing simple content" procedure in my head every time to see whether the spaces would be added).
From Andrew's earlier query, Michael Kay justifies the WG decision:
> Indeed, the key here is that text nodes get merged together at stage 2
Indeed, it's converting text nodes to strings before it even starts the "constructing simple content" process. The overall process here is:
1. Construct the value of the variable
1a. evaluate the sequence constructor, producing a sequence of text nodes (foo2) or a sequence containing a mixture of text nodes and strings (foo)
1b. apply the "function conversion rules" to convert the result of 1a to the required type (xs:string+). This causes text nodes to be individually atomized to strings
2. xsl:value-of select="$foo" then invokes the "constructing simple content" rules to convert a sequence of text nodes and/or strings to a single text node. In both cases the input is a sequence of strings, so the rules for joining adjacent text nodes don't kick in.
It's certainly true that this whole business is going to generate a lot of questions. We've tried to design the rules so that they are (a) backwards compatible, and (b) do the "right" thing in common cases. The downside of this is that the rules are quite complicated, and when they don't do the obvious thing, it's quite tricky to work out why.
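The steps above can be modelled in a few lines of Python. This is a toy model, not an XSLT processor. TextNode and the two function names are hypothetical, and a plain str stands for an xs:string. Only the zero-length removal, merge, atomize and join steps of "constructing simple content" are covered. The default separator of a single space matches xsl:value-of when a select attribute is present.

```python
from dataclasses import dataclass

@dataclass
class TextNode:
    # Stand-in for an XDM text node; a plain str stands in for xs:string
    value: str

def coerce_to_strings(items):
    # Function conversion rules for as="xs:string+" (step 1b): each text
    # node is atomized on its own, so no merging can happen later
    return [i.value if isinstance(i, TextNode) else i for i in items]

def construct_simple_content(items, separator=" "):
    """Simplified model of the XSLT 2.0 'constructing simple content' rules."""
    # Discard zero-length text nodes
    items = [i for i in items if not (isinstance(i, TextNode) and i.value == "")]
    # Merge adjacent text nodes into one
    merged = []
    for item in items:
        if isinstance(item, TextNode) and merged and isinstance(merged[-1], TextNode):
            merged[-1] = TextNode(merged[-1].value + item.value)
        else:
            merged.append(item)
    # Atomize each item to a string, then join with the separator
    strings = [i.value if isinstance(i, TextNode) else str(i) for i in merged]
    return separator.join(strings)

nodes = [TextNode("abc"), TextNode("def")]
print(construct_simple_content(nodes))                        # abcdef
print(construct_simple_content(coerce_to_strings(nodes)))     # abc def
print(construct_simple_content(coerce_to_strings(nodes), "")) # abcdef
```

With item()+ typing, the two text nodes merge before atomization. With xs:string+, they are atomized first, so the default space appears. This is why adding separator="" is the simplest cure.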
value-of with separator
If the document is
<x> <a>one<!-- here -->two</a> <a>three</a> <a>four</a> </x>
and the current node is x then
element()/text() will select four text nodes with values "one" "two" "three" "four" so <xsl:value-of select="element()/text()" separator=", "/> will generate one text node with value "one, two, three, four"
* will select three element nodes, each with name a and with string values "onetwo" "three" "four" so
<xsl:value-of select="*" separator=", "/>
will generate one text node with string value "onetwo, three, four"
>Isn't it that adjacent text nodes are being merged before the
No. The separator is added between the string values of the items in the sequence, producing a single string that is then used to construct one text node. That text node may later merge with other text nodes generated under the same parent, but that happens after the separators have been added.
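The example document can be checked with Python's minidom, which, like the XDM, keeps the comment and therefore keeps "one" and "two" as separate text nodes. This is only an illustration of the two selections. The helper names are hypothetical, and the list comprehensions stand in for element()/text() and *.

```python
from xml.dom import minidom

DOC = "<x> <a>one<!-- here -->two</a> <a>three</a> <a>four</a> </x>"

def element_children(node):
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def string_value(node):
    # String value of an element: concatenation of its descendant text nodes
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(c) for c in node.childNodes
                   if c.nodeType in (c.TEXT_NODE, c.ELEMENT_NODE))

x = minidom.parseString(DOC).documentElement

# element()/text(): every text-node child of every element child of x
text_nodes = [t.data for a in element_children(x) for t in a.childNodes
              if t.nodeType == t.TEXT_NODE]
# *: the element children themselves, each atomized to its string value
elements = [string_value(a) for a in element_children(x)]

print(", ".join(text_nodes))  # one, two, three, four
print(", ".join(elements))    # onetwo, three, four
```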
Entities are expanded by the parser _before_ XSLT starts, so XSLT sees the same input whether you use the entity reference, or just use the character directly, or if you use a numeric character reference (which doesn't need to be declared).
So if your keyboard or editor lets you type a c-cedilla character directly, you can just do that. (If your editor saves files as ISO-8859-1, you need to declare that encoding by putting <?xml version="1.0" encoding="iso-8859-1"?> at the top of the XSL file.) Alternatively, you can use the numeric character reference &#xe7;.
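Any XML parser shows the point: by the time an application sees the tree, the literal character, the character references and a declared entity are indistinguishable. In the following sketch, Python's ElementTree stands in for the parser that runs before XSLT. The entity name ccedil is declared locally in the internal subset purely for illustration.

```python
import xml.etree.ElementTree as ET

forms = [
    "<a>\u00e7</a>",                                            # literal character
    "<a>&#xe7;</a>",                                            # hex character reference
    "<a>&#231;</a>",                                            # decimal character reference
    '<!DOCTYPE a [<!ENTITY ccedil "&#xe7;">]><a>&ccedil;</a>',  # declared entity
]
texts = [ET.fromstring(f).text for f in forms]
print(texts)  # the same one-character string four times
```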
> Do i have to nest replace-functions for each of them in one
In the most general case, nesting replace() calls is always an option, but here I suspect you can do something like
<xsl:analyze-string select="." regex="\^([0-9]+)">
  <xsl:matching-substring>
    <xsl:value-of select="$replacements[number(regex-group(1))]"/>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
together with something that makes $replacements a sequence with a c-cedilla in position 12, e.g.
<xsl:variable name="replacements" select="('a', 'b', ..., '&#xe7;', '&amp;')"/>
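In Python terms, the analyze-string approach is one regex pass with a callback that looks the captured number up in a table. This sketch assumes a hypothetical table in which 12 maps to "ç" and 13 maps to "&". As in the XSLT, where an out-of-range index selects the empty sequence, an unknown number produces nothing.

```python
import re

# Hypothetical table: marker number (1-based, like an XPath position) -> text
REPLACEMENTS = {12: "\u00e7", 13: "&"}

def expand_markers(text, replacements=REPLACEMENTS):
    def sub(m):
        # m.group(1) plays the role of regex-group(1); a missing entry
        # yields "" just as $replacements[99] yields the empty sequence
        return replacements.get(int(m.group(1)), "")
    return re.sub(r"\^([0-9]+)", sub, text)

print(expand_markers("Fran^12ois ^13 Co"))  # François & Co
```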
The first is technically an "entity reference", the second is a "character reference". The only difference is that entities have to be declared in the DTD, except for the five built-in ones. Numeric character references can be used without needing a declaration.
I use Bob duCharme's "Annotated XML Specification" - very useful because it gives the actual text of the specification, then Bob's explanation of what it really means.
As well as DC's solution, another approach is to have a table of replacements:
<xsl:variable name="mods" as="element(mod)*">
  <mod from="\^12" to="ç"/>
  <mod from="\^13" to="&amp;"/>
</xsl:variable>
and run through them with a recursive function:
<xsl:function name="f:multi-replace" as="xs:string">
  <xsl:param name="in" as="xs:string"/>
  <xsl:param name="mods" as="element(mod)*"/>
  <xsl:choose>
    <xsl:when test="$mods">
      <!-- apply the first replacement, then recurse on the remaining ones -->
      <xsl:sequence select="f:multi-replace(
          replace($in, $mods[1]/@from, $mods[1]/@to),
          subsequence($mods, 2))"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$in"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
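The same recursion translates directly into Python. This is a sketch under stated assumptions: the (pattern, replacement) pairs stand in for the mod elements, and multi_replace is a hypothetical name. Because each pass sees the output of the previous one, the order of the table matters.

```python
import re

def multi_replace(text, mods):
    """Apply (pattern, replacement) pairs in order: the first one,
    then recurse on the rest, like f:multi-replace above."""
    if not mods:
        return text
    pattern, replacement = mods[0]
    return multi_replace(re.sub(pattern, replacement, text), mods[1:])

MODS = [(r"\^12", "\u00e7"), (r"\^13", "&")]
print(multi_replace("Fran^12ois ^13 Co", MODS))  # François & Co
```

If a shorter pattern such as \^1 came before \^12 in the table, it would consume "^12" first, so more specific patterns should be listed earlier.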
Conditional display of parameters
<xsl:value-of select="string-join( ('Reps'[$showreplicates=1], 'MetaData'[$showMetadata=1], 'Signatures'[$showElectronicSignature=1]), ', ')"/>
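The trick is that a predicate on a single string literal, such as 'Reps'[$showreplicates=1], removes it when the condition is false. string-join() then places separators only between the items that survive, so there are no dangling commas. A Python equivalent follows, in which the function and parameter names are hypothetical:

```python
def visible_labels(show_replicates, show_metadata, show_signature):
    # Each label survives only if its flag equals 1, like 'Reps'[$showreplicates=1]
    candidates = [
        ("Reps", show_replicates),
        ("MetaData", show_metadata),
        ("Signatures", show_signature),
    ]
    # join() puts ", " only between surviving labels, as string-join() does
    return ", ".join(label for label, flag in candidates if flag == 1)

print(visible_labels(1, 0, 1))  # Reps, Signatures
```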
Restrict a string to n words
In XSLT 2.0, that's
tokenize($in, '\W+')[position() = 1 to $n]
where $in is your input string and $n is the number of words.
It's a fair bit harder in XSLT 1.0 (most things are).
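Splitting on '\W+' rather than '\W' matters. With '\W', a comma followed by a space produces an empty token that would count as one of the n "words". The following Python approximation uses a hypothetical function name. Python's \W is close to, but not identical with, XPath's: for example, XPath treats "_" as a non-word character and Python does not.

```python
import re

def first_words(text, n):
    # Split on runs of non-word characters, like tokenize($in, '\W+'),
    # drop empty tokens caused by leading/trailing punctuation,
    # then keep positions 1..n
    words = [w for w in re.split(r"\W+", text) if w]
    return words[:n]

print(first_words("The quick, brown fox jumps.", 3))  # ['The', 'quick', 'brown']
```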
First n words of a text node
In 2.0, assuming the <a> element is the context node, the two sequences are given by
subsequence(tokenize(b/following-sibling::text(), '\s'), 1, 5)
reverse(subsequence(reverse(tokenize(b/preceding-sibling::text(), '\s')), 1,5))
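The reverse/subsequence/reverse idiom in the second expression selects the last five words while preserving their original order. In Python that is simply a negative slice. The following sketch uses a hypothetical helper name and the no-argument str.split(), which, like tokenize(normalize-space(.), ' '), ignores leading and trailing whitespace. Tokenizing on a single '\s' can instead yield empty tokens when the text starts or ends with whitespace.

```python
def words_around(preceding_text, following_text, n=5):
    # First n words after the element: subsequence(tokenize(...), 1, 5)
    after = following_text.split()[:n]
    # Last n words before it, in document order:
    # reverse(subsequence(reverse(tokenize(...)), 1, 5))
    before = preceding_text.split()[-n:] if n > 0 else []
    return before, after

before, after = words_around("one two three four five six seven ",
                             " alpha beta gamma delta epsilon zeta")
print(before)  # ['three', 'four', 'five', 'six', 'seven']
print(after)   # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
```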
String match, multiple targets
some $x in ('Hamburg', 'Koblenz', 'xxx') satisfies contains($d/ris:organ/text(), $x)
<xsl:if test="some $x in ('Hamburg', 'Koblenz', 'xxx') satisfies contains($d/ris:organ/text(), $x)">
Florent Georges adds
And the OP almost certainly wants the following instead:
some $x in ('Hamburg', 'Koblenz', 'xxx') satisfies contains($d/ris:organ, $x)
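Florent's version tests the string value of the element, which includes text inside child elements. By contrast, ris:organ/text() sees only the element's own text nodes, and in XPath 2.0 it is a type error if there is more than one such node. The following Python sketch uses minidom and a hypothetical organ element, with the ris: namespace omitted. It shows the difference; any() plays the role of "some ... satisfies".

```python
from xml.dom import minidom

PLACES = ("Hamburg", "Koblenz", "xxx")

def string_value(node):
    # Concatenate all descendant text, as XPath does when atomizing an element
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(c) for c in node.childNodes)

def organ_matches(organ_xml, places=PLACES):
    organ = minidom.parseString(organ_xml).documentElement
    # some $x in (...) satisfies contains($d/ris:organ, $x)
    return any(p in string_value(organ) for p in places)

def direct_text_nodes(organ_xml):
    # What ris:organ/text() would select: only the element's own text children
    organ = minidom.parseString(organ_xml).documentElement
    return [c.data for c in organ.childNodes if c.nodeType == c.TEXT_NODE]

doc = "<organ>Landgericht <i>Hamburg</i></organ>"
print(organ_matches(doc))      # True
print(direct_text_nodes(doc))  # ['Landgericht '], which has no "Hamburg"
```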
Quotes in XSLT 2.0
In XPath 1.0 it is impossible to have a string literal containing both kinds of quote.
That's not much hardship in XSLT 2.0 since, as you say, you can use the <xsl:variable name="x">'"'"'"</xsl:variable> form, but it does cause inconvenience in other contexts.
XPath 2.0 lets you escape the character used to delimit a string literal by doubling it, so you can have a string literal such as "'""" which is the two-character string '"
Mike Kay expands on this...
And of course you can escape the attribute delimiter using an XML entity reference.
<xsl:if test="$x = 'He said, &quot;I can''t&quot;'">
In 1.0 you can't have a string literal containing both single and double quotes. Use concat:
<xsl:variable name="quot">"</xsl:variable> <xsl:variable name="apos">'</xsl:variable> <xsl:if test="$x = concat('He said, ', $quot, 'I can', $apos, 't', $quot)">
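When stylesheets or XPath expressions are generated by a program, both rules can be applied mechanically. The following sketch uses two hypothetical helper functions that build an XPath string literal for an arbitrary string. The XPath 2.0 version doubles the apostrophe. The XPath 1.0 version falls back to concat(), the same idea as the $quot/$apos variables above. The result still needs XML escaping (for example &quot;) if it is placed in a double-quoted attribute.

```python
def xpath2_literal(s):
    # XPath 2.0: double any apostrophe inside an apostrophe-delimited literal
    return "'" + s.replace("'", "''") + "'"

def xpath1_literal(s):
    # XPath 1.0 has no escape mechanism inside literals
    if "'" not in s:
        return "'" + s + "'"
    if '"' not in s:
        return '"' + s + '"'
    # Both quotes present: split on apostrophes and rebuild with concat(),
    # inserting each apostrophe as a double-quoted "'" literal
    pieces = []
    for i, part in enumerate(s.split("'")):
        if i > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"

s = 'He said, "I can\'t"'
print(xpath2_literal(s))  # 'He said, "I can''t"'
print(xpath1_literal(s))  # concat('He said, "I can', "'", 't"')
```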