# Strings

1. Quoting of special characters within XPath string
2. URL decomposition
3. Replacing a string by another which is found by reference in the same XML document
4. Apostrophe problems
5. Quotes in a string
6. Splitting lines at n characters
7. String split into elements
8. Replacing newline with break
9. CRLF to BR
10. Trim or pad a string to a fixed length
11. Computing string-length of nodesets
12. Substituting substrings in an element's text for HTML output
13. Substring-after, last occurrence
14. lastIndexOf('char')
15. Count the number of specific characters in a string
16. Reverse a string
17. ASCII to Hex conversion
18. Testing for an empty string
19. Last occurrence of a string
20. String to numbers
21. String padding
22. Single quote in select expression
23. Count the number of tokens in a string
24. How to do multiple string replacements
25. How to count words
26. Split a line into fixed lengths
27. Access to individual characters in a string
28. How to concatenate the contents of similar elements
29. Strip non Alpha-numeric characters
30. Word highlighting

## 1. Quoting of special characters within XPath string

David Carlisle

> How do I have to escape quotes to get a well-formed XPath string?

First just consider the XPath syntax. You can use " or ' to delimit a string literal, so if you only want one then you can delimit with the other: `"'"` or `'"'`. If you want both then you cannot do it directly in a string literal, but you can construct the string `'"` using `translate('a"','a',"'")` or `concat("'",'"')`. Or, if you drop out of XPath to XSLT, put `'"` in the content of a variable x and then use `$x`, as this result tree fragment will coerce to a string.

Then you need to get one of those expressions into an XML attribute. If you use " to delimit the attribute value then you need to quote ", so you end up with

`select="translate('a&quot;','a',&quot;'&quot;)"`

which looks a bit odd, but the XML parser eats that and gives the XPath system `translate('a"','a',"'")`, which takes the string `a"` and replaces the a by '.

## 2. URL decomposition

Juliane Harbarth

> Can you think of a way to decompose a URL? I want to take something like http://www.agilic.com/purchase.htm and end up with purchase.htm.

You can do this by using a recursive named template.

Jens Lautenbacher completes the picture with a template that gets a filename and gives back the directory part of it. Giving back the filename part is achieved along the same lines. You call it with a path such as foo/bar/baz.xml.

## 3. Replacing a string by another which is found by reference in the same XML document

Jeni Tennison

> My source XML contains something like:
>
> Name="MyClass"
> Uuid="80A2B3BD-0000-520C-383BE4980006BE67"
> TargetRef="80A2B3BD-0000-520C-383BE4980006A75A">
>
> Name="MyObject"
> Uuid="80A2B3BD-0000-520C-383BE4980006B687">
>
> I have an XSL style sheet to display it on IE5 with appropriate style. I would like to get as output:
>
> Class:
> Name="MyClass"
> TargetRef="MyObject"
>
> Where the TargetRef string has been replaced by the value of the string with the same Uuid.


This seems to me to be a good instance to use xsl:key to identify the nodes that are uniquely identified through the 'Uuid' attribute. First, set up the key. An xsl:key element takes three attributes:

* name - a name for the key, anything you like
* match - an XPath matching the nodes that you want to identify
* use - an XPath (relative to the 'match' node) that identifies the node

In your case, the key can match on the 'Uuid' attribute alone. Note that I haven't named the (element) nodes that are identified by the key, because it isn't clear to me whether your 'Class' and 'AnotherObject' elements are indicative of a whole range of possible element names in your input, but we can guarantee at least that they will have a 'Uuid' attribute if they're worth identifying!

Then you can access a particular node through its 'Uuid' attribute using the key() function, looking up the node whose 'Uuid' equals the 'TargetRef' value and writing out its 'Name' in place of the TargetRef string.

## 4. Apostrophe problems

David Carlisle

> In each case it acts like there are missing end parenthesis, no doubt because the odd number of single quotes act like one quote.

No (unless you are using XPath 2, which I suspect isn't the case). All three of the things you tried are equivalent to `'''` (entity and character references are expanded before the XPath is parsed), so they are an empty string followed by a trailing ', which is a syntax error.

The FAQ has entries on this, but basically: start from the XPath you need and worry about XML quoting later.

You want the string literal `"'"` (you have to use " to delimit a string literal involving a ' in XPath 1), so you want the XPath

`string-length(substring-before($myValue,"'"))>0`

(a related test, `starts-with($myValue,"'")`, checks for a leading ' instead).

As far as XML is concerned, the above may as well be `jhgkjg"nbfjh'`, just some random string involving " and '. To get that into an attribute you always have two choices: delimit with " and quote any ", or delimit with ' and quote any '. So

`test="string-length(substring-before($myValue,&quot;'&quot;))>0"`

or

`test='string-length(substring-before($myValue,"&apos;"))>0'`

If you are using XPath 2 then a doubled ' (or ") counts as a single ' in a string literal, so you could then do

`test="string-length(substring-before($myValue,''''))>0"`

which avoids using XML quotes at all, although I'm not sure it's any more readable.

> Incidentally, in the file it is properly represented with an internal entity. The XML looks like Broker's ZIP

You could of course use a literal ' rather than the escaped form in the source, although that wouldn't change the XSLT required.

## 5. Quotes in a string

Mike Brown

How do you put ' or " into a string? How do you test for the presence of ' or " in a string? ...

Jeni Tennison rounds out the discussion.

Good question! XML defines entities for ' and " (`&apos;` and `&quot;`, somewhat unsurprisingly). In certain situations, it is possible to use these. Your first example, for instance, could also be given with the attribute

`select="'&quot;Hello, world!&quot;'"`

When this is parsed by the XML parser, the value of the 'select' attribute is set to (no extra quotes included):

`'"Hello, world!"'`

When the XSLT processor sees this, it recognises the external quotes as designating a string value, and so sets the variable $foo to the string (no extra quotes included):

`"Hello, world!"`

The thing to remember is that you are escaping the " and ' *for the XML parser* and not for the XSLT processor.
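
As a minimal sketch of this two-stage parsing, the stylesheet below declares the variable foo with the escaped select value. The wrapper stylesheet, the text output method and the template matching / are assumptions added only to make the example self-contained.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The XML parser resolves the two escaped double quotes first, so the
       XPath processor is handed a string literal delimited by ' -->
  <xsl:variable name="foo" select="'&quot;Hello, world!&quot;'"/>

  <xsl:template match="/">
    <!-- Writes the value of $foo, double quotes included -->
    <xsl:value-of select="$foo"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any XML input, this writes the string with its double quotes included, because each `&quot;` has already become " by the time the XPath expression is parsed.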

So your second example:

> How do you test for the presence of ' or " in a string?

can be escaped as:

`test="contains($foo, &quot;'&quot;) or contains($foo, '&quot;')"`

As there are no unescaped "s within the attribute value, the XML parser can parse this and emerges with the value of the 'test' attribute as:

`contains($foo, "'") or contains($foo, '"')`

The XSLT processor can again recognise that `"'"` designates a string with the value of the single character ' and that `'"'` designates a string with the value of the single character ".

Similarly, if you wanted single quotes rather than double quotes around your Hello, World!, then you should do:

`select="&quot;'Hello, world!'&quot;"` (-> `"'Hello, world!'"` att. value -> `'Hello, world!'` string)

[Or, alternatively: `select='"&apos;Hello, world!&apos;"'` (-> `"'Hello, world!'"` att. value -> `'Hello, world!'` string)]

So, for fairly simple situations like this, it is enough to use the normal XML escaping to get the XSLT processor to see something that it can understand. However, difficulties arise when the quote nesting goes deeper than this. For example, if you wanted to see whether a string contains the string (no extra quotes):

`"You're here"`

There is no way to wrap quotes around that string, and no way that I know of within XSLT/XPath to escape internal quotes like this (the XSLT processor is not an XML parser - it won't detect and recognise `&quot;`/`&apos;` itself). In these cases, your method, using variables, is the only solution. (BTW, I'd personally declare the variables with a select attribute, so that they are set as strings rather than result tree fragments.)

## 6. Splitting lines at n characters

Paul Tchistopolskii

To me that was not rudimentary (taking into account that there could be exotic situations when some long word is really > 60, so space-scanning could fail). Anyway.

The snippet takes whitespace into account so as not to split a word in the middle. It looks back to the first space to prevent that. If that fails, it just splits. tune-width does it.
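
A minimal sketch of the same idea follows. It is not the original snippet: the template name wrap, the parameter names and the default width of 30 are assumptions, and only the name tune-width is borrowed from the description above. The tune-width template scans backwards from the width limit for a space and falls back to a hard split when it finds none.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Emit $text in lines of at most $width characters -->
  <xsl:template name="wrap">
    <xsl:param name="text"/>
    <xsl:param name="width" select="30"/>
    <xsl:choose>
      <xsl:when test="string-length($text) &lt;= $width">
        <xsl:value-of select="$text"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Cut position: last space within $width, otherwise $width -->
        <xsl:variable name="cut">
          <xsl:call-template name="tune-width">
            <xsl:with-param name="text" select="$text"/>
            <xsl:with-param name="width" select="$width"/>
            <xsl:with-param name="pos" select="$width"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="substring($text, 1, $cut)"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:call-template name="wrap">
          <xsl:with-param name="text" select="substring($text, $cut + 1)"/>
          <xsl:with-param name="width" select="$width"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Look back from $pos for a space; if none is found, split at $width -->
  <xsl:template name="tune-width">
    <xsl:param name="text"/>
    <xsl:param name="width"/>
    <xsl:param name="pos"/>
    <xsl:choose>
      <xsl:when test="$pos &lt; 1">
        <xsl:value-of select="$width"/>
      </xsl:when>
      <xsl:when test="substring($text, $pos, 1) = ' '">
        <xsl:value-of select="$pos"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="tune-width">
          <xsl:with-param name="text" select="$text"/>
          <xsl:with-param name="width" select="$width"/>
          <xsl:with-param name="pos" select="$pos - 1"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical driver: wrap the text of the whole input document -->
  <xsl:template match="/">
    <xsl:call-template name="wrap">
      <xsl:with-param name="text" select="normalize-space(.)"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Each emitted line keeps the space it was cut at as its last character; a word longer than the width is simply split at the width.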

With a width of 30:

Input:

123456 2345 343434 545454 43434 343 12345 343434 545454 43434 343 32345645 343434 545454 43434 343 3422222225 343434 545454 43434 343 llllllllllllllllllllllooooooooooooooonnnnnnnnnnnggggggggg 345 343434 545454 43434 343

Output:

123456 2345 343434 545454 43434 343 12345 343434 545454 43434 343 32345645 343434 545454 43434 343 3422222225 343434 545454 43434 343 lllllllllllllllllllllloooooooo ooooooonnnnnnnnnnnggggggggg 345 343434 545454 43434 343

## 7. String split into elements

Jarno Elovirta

I had need to split out an element's content (cross references such as 5,6,7) into separate elements.

## 8. Replacing newline with break

Norman Walsh

These templates insert `<br/>`s between the lines of an address. Adapt at will :-)

Test input:

some text on multiple lines which is required to be split up into verbatim lines, for html output

## 9. CRLF to BR

Jarno Elovirta

Input (test.xml):

This is where the actual article starts. The article contains several paragraphs. Paragaraphs are separated by Carriage returns or Linefeeds, not by Tags

Output of test.xsl:

This is where the actual article starts. The article contains several paragraphs. Paragaraphs are separated by Carriage returns or Linefeeds, not by Tags

## 10. Trim or pad a string to a fixed length

Wendell Piez

> Is there a way to force string length? For example, my output contains a string that must contain 10 characters but the XML source is not guaranteed to supply that number of characters.

In order to trim a long string to 10, or pad a short one with spaces (the literal holds ten spaces):

`substring(concat(string(.), '          '), 1, 10)`

## 11. Computing string-length of nodesets

David Carlisle

... probably does what you want.

> So is there some way to construct an equivalent of sum(), but one that works on string values of a nodeset?


In simple cases you can get by as above, but usually you have to use a node-set extension function for this sort of thing (until XSLT 1.1), for instance if you wanted to apply normalize-space to each of your nodes in the node set before computing your average.

## 12. Substituting substrings in an element's text for HTML output

Steve Muench

> I have an element that looks like this:
>
> this is the first line
> this is the second line
> this is the third line
>
> I'd like to transform it so that it could be outputted to html with the same line breaks, i.e. change all \n to `<br>`

Here are a couple of templates I use for this purpose. The "br-replace" replaces carriage returns with `<br/>` tags. The "sp-replace" replaces *pairs* of leading spaces with *pairs* of leading non-breaking spaces. By combining the two in series, you can achieve the effect of keeping code listings (e.g. XML or Java source code examples in a book) properly formatted without using the `<pre>` tag, which tends to mess up the formatting of table cells (often pushing them wider than you'd like).

To use the templates, add them (or import them) into the stylesheet you're building, then at the right moment call "br-replace" and pass its result through "sp-replace".

## 13. Substring-after, last occurrence

Paul Brown

> I read about the substring-after() function, which can extract a substring after the FIRST occurrence of a specified string.
>
> My problem: Have you got an idea how I can get the substring-after of a string after the LAST occurrence of a specified substring?
>
> For example: I want the substring after the last ".".
>
> String: "A.B.C.0.1.1.hgk"
>
> The solution should return: "hgk"

The answer is recursion.

## 14. lastIndexOf('char')

Jeni Tennison

> I'm looking for an XSLT method to identify the last occurrence of a char in a string. For example, to extract automatically the name of the html page from the url.
>
> string: "http://www.thesite.com/directory1/dir2/dir3../pageindex.htm"
>
> There are the functions substring-before() and substring-after(), but they work on the first occurrence of the marker-string. Is there an XSLT function which gives the last occurrence of a marker-string (like lastIndexOf('/',"string")) in a string?

No, there isn't. You can achieve what you want through recursion. Walk through the string, taking bits off the front of it until you get to a string which has no '/' in it whatsoever.

To get the filename of a URL held in the URL child of the current node, you call this template, passing it the value of that URL child.

It's pretty verbose, but I'm afraid that's the only way to do it in XSLT at the moment.

## 15. Count the number of specific characters in a string

Michael Kay

> If I can somehow count the number of periods "." in the string "id", then I can determine what level the element is at... anyone have any ideas how to count the number of occurrences in a string with XSL?

Try:

`string-length($x) - string-length(translate($x, '.', ''))`

## 16. Reverse a string

Jeni Tennison

Dimitre Novatchev offers a further version.

I compared times from the three templates on an 800MHz Pentium with 128Mb RAM, running each test 10 times, averaging the times reported by MSXML run from the command line, and rounding to the nearest millisecond. Here are the results (times in milliseconds):

| Length | Simple | Least Recursive | Tail Recursive |
|-------:|-------:|----------------:|---------------:|
| 100 | 22 | 36 | 5 |
| 200 | 41 | 61 | 11 |
| 400 | 95 | 124 | 24 |
| 800 | 241 | 249 | 77 |
| 1600 | 650 | 485 | 220 |
| 3200 | 3465 | 975 | 1369 |

The tail recursive template is always substantially faster than the simple algorithm, but it suffers from the same problem in the end - the time taken grows much faster than linearly with the length of the string, so for really long strings the least recursive algorithm works best.
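
As an illustration only, and not necessarily the template that was timed, a "least recursive" reversal can be sketched as a divide-and-conquer template: split the string in half, reverse the second half, then the first. The template and parameter names are assumptions. The recursion depth grows with the logarithm of the string length rather than with the length itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Divide and conquer: reverse(AB) = reverse(B) followed by reverse(A) -->
  <xsl:template name="reverse">
    <xsl:param name="string"/>
    <xsl:variable name="len" select="string-length($string)"/>
    <xsl:choose>
      <!-- Zero or one character: nothing to reverse -->
      <xsl:when test="$len &lt; 2">
        <xsl:value-of select="$string"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="mid" select="floor($len div 2)"/>
        <xsl:call-template name="reverse">
          <xsl:with-param name="string" select="substring($string, $mid + 1)"/>
        </xsl:call-template>
        <xsl:call-template name="reverse">
          <xsl:with-param name="string" select="substring($string, 1, $mid)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical driver: reverse the text of the whole input document -->
  <xsl:template match="/">
    <xsl:call-template name="reverse">
      <xsl:with-param name="string" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```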

I haven't taken detailed timings, but there's a similar pattern in Saxon (although Saxon bugs out with the simple algorithm and long strings, I guess a stack overflow). A processor that doesn't optimise tail recursion would probably have similar performance from both the simple and tail-recursive templates.

## 17. ASCII to Hex conversion

Mike Brown

> Is there a function that can convert ASCII coded characters to ASCII coded hex data.

Using pure XSLT, and assuming you really meant ASCII (characters 32-127), here is a demonstration of a way to do it. It uses a string of the ASCII characters in code order:

`` !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ``

and the string of hex digits `0123456789ABCDEF`.

For the input `I have $1,001.` the output is

`49 20 68 61 76 65 20 24 31 2C 30 30 31 2E`

## 18. Testing for an empty string

Mike Brown

> `<xsl:if test="string-length()='0'">`

'0' when quoted like that is a string, and string-length() returns a number, so this will result in extra overhead as types are converted. Unquote the 0 so you are comparing a number to a number. Better yet, just test for string() or normalize-space() -- the result will be an empty string if the string is empty, and an empty string evaluates to false.

## 19. Last occurrence of a string

Jeni Tennison

> Is there a function in XSLT which obtains the position of the last occurrence of a string within another string?

No, there isn't.

To get the substring after the last occurrence of a string, you need a recursive template, for example:

```xml
<xsl:template name="substring-after-last">
  <xsl:param name="string" />
  <xsl:param name="delimiter" />
  <xsl:choose>
    <xsl:when test="contains($string, $delimiter)">
      <xsl:call-template name="substring-after-last">
        <xsl:with-param name="string"
                        select="substring-after($string, $delimiter)" />
        <xsl:with-param name="delimiter" select="$delimiter" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="$string" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

For example, to get the extension of a file from its path you could use:

```xml
<xsl:call-template name="substring-after-last">
  <xsl:with-param name="string" select="$file" />
  <xsl:with-param name="delimiter" select="'.'" />
</xsl:call-template>
```

Getting the last index would be slightly more complicated, and it looks like you want to use it to get hold of the string after the last occurrence of something, so hopefully this is sufficient.

[Note that the XQuery/XPath operators document contains an ends-with() function, so it's possible that in XPath 2.0, if you wanted to *test* the value of the string after the last '.' to see if it was 'xml', then you could do `ends-with($file, '.xml')`, but that's just speculation at the moment.]

## 20. String to numbers

Jeni Tennison

XPath can only convert a string to a number if it uses a '.' as a decimal separator and doesn't have any other non-decimal, non-whitespace characters (so input numbers can't have grouping separators). So the results of number() are *always*:

```
12.5    => 12.5
foo     => NaN
12,5    => NaN
1,234.5 => NaN
```

The point of the xsl:decimal-format element is solely to interpret the format pattern string that you use in format-number(). By default it uses '.'
for decimal points and ',' as the grouping separator, so:

```
format-number(1234.5, '#,##0.00') => '1,234.50'
```

But you can override the default decimal format so that you can use a different decimal point and grouping separator, with `<xsl:decimal-format decimal-separator="," grouping-separator="." />`. If you want to use that decimal format, you have to change the format pattern in the format-number() function:

```
format-number(1234.5, '#.##0,00') => '1.234,50'
```

If you want to chop and change between different numerical formats in different parts of the stylesheet, you should create named xsl:decimal-formats for each of the different formats you want to use (here named US, German and AltGerman). Then you can use different formats in different places:

```
format-number(1234.5, '#,##0.00', 'US')        => '1,234.50'
format-number(1234.5, '#.##0,00', 'German')    => '1.234,50'
format-number(1234.5, "#'##0,00", 'AltGerman') => "1'234,50"
```

## 21. String padding

Mike Kay

> I would like to output this as:
>
> 2002-05-02 ... This is the title ................ 18
> 2002-05-01 ... This is the second title ......... 12
> 2002-05-01 ... This is the third title .......... 5
>
> I've been unable to find a simple way in XSL to do the padding that I want.

`substring(concat(' ... ', title, ' ...................................'), 1, 25)`

(or whatever the correct number is).

Dimitre adds:

Do have a look at the archives. One can also use the string analog of the iter() function from FXSL in order to build a string of repeated patterns dynamically. Your case seems quite simple -- you could use a static global xsl:variable containing a maximum number of dots known in advance, and manipulate it using the substring() function.

## 22. Single quote in select expression

Jeni Tennison

> I have the following in my stylesheet that gives me errors:
>
> `select="'NRC Research Press: Revue du génie et de la science de l'environnement'" />`
>
> Why is this not working? How to make it work?


When this gets read by an XML parser, it reports the value of the select attribute to the XSLT (or rather XPath) processor to be:

`'NRC Research Press: Revue du génie et de la science de l'environnement'`

In other words, it's the XML parser's job to resolve the escaped apostrophe into the ' character; that's already been done by the time it gets to the XPath processor. The XPath processor then tries to parse that value as an XPath. It pairs the first ' with the second ' and then tries to parse the part after the second ' as being an operator on that string, which is why you get the error.

The solution is to make sure that the XPath processor gets a string that it can parse as an XPath, and since your string has a ' in it, that means using " to delimit the string rather than ':

`"NRC Research Press: Revue du génie et de la science de l'environnement"`

Then you can move that into the XML either by delimiting the attribute value with double quotes and escaping the double quotes in the XPath:

`select="&quot;NRC Research Press: Revue du génie et de la science de l'environnement&quot;"`

or by delimiting the attribute value by single quotes and escaping the single quote in the XPath:

`select='"NRC Research Press: Revue du génie et de la science de l&apos;environnement"'`

## 23. Count the number of tokens in a string

Oleg Tkachenko

> Is there an easy way to count the number of commas in a string?

What about `string-length($str) - string-length(translate($str,',',''))`

## 24. How to do multiple string replacements

Jeni Tennison

Here is *a* way to do it. I'm not sure that it's the most efficient, but it's the principle that counts, and the principle is to XMLise the information about what you want to find and replace, or the characters that need to be escaped.

So, make up a namespace for this information and include it in your file. For the replacements that you're using, I created two sets of elements (foo:char and foo:search in what follows). The characters to escape are `_`, `%`, `$`, `{`, `}` and `&`. The strings to replace are ± with `$\pm$`, ° with `$\degree$` and © with `\copyright`, plus an entry whose replacement is `$\mathbb{P}$`.

This separates out the data about the replacements that you want to make (the what). Now you want to specify the procedure about how to do those replacements (the how).
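
One way the "what" could be embedded in the stylesheet is sketched below. The element names foo:char and foo:search come from the discussion; the namespace URI, the wrapper elements foo:chars and foo:strings, and the children foo:find and foo:replace are assumptions made for illustration. The template at the end only lists the data, to show that it can be reached through `document('')`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:foo="http://example.org/foo"
                exclude-result-prefixes="foo">
  <xsl:output method="text"/>

  <!-- The "what": characters that need to be escaped -->
  <foo:chars>
    <foo:char>_</foo:char>
    <foo:char>%</foo:char>
    <foo:char>$</foo:char>
    <foo:char>{</foo:char>
    <foo:char>}</foo:char>
    <foo:char>&amp;</foo:char>
  </foo:chars>

  <!-- The "what": strings to find and what to put in their place -->
  <foo:strings>
    <foo:search><foo:find>±</foo:find><foo:replace>$\pm$</foo:replace></foo:search>
    <foo:search><foo:find>°</foo:find><foo:replace>$\degree$</foo:replace></foo:search>
    <foo:search><foo:find>©</foo:find><foo:replace>\copyright</foo:replace></foo:search>
  </foo:strings>

  <!-- List the data, read back from the stylesheet document itself -->
  <xsl:template match="/">
    <xsl:for-each select="document('')//foo:char">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:for-each select="document('')//foo:search">
      <xsl:value-of select="concat(foo:find, ' => ', foo:replace)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Adding a new replacement then means adding one element to the data, with no change to the templates that do the work.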

I've called your existing templates to actually do the replacement, and focussed on identifying what you want to exchange.

First, then, the 'escape_special_characters' template. Basically, you want to first replace the characters on the $input_text, then replace the strings on the output from that.

The two templates for replacing the characters and replacing the strings are much the same, so I'll only go through the one replacing the characters in detail.

First, we need to declare a couple of parameters that we're going to use. One is the text that we need to replace and the other is one to keep track of where we are in the set of replacements that we need to make. I've done this second using an index number, defaulting it to 1 as the initial value.

Then we need to create the new string, with the replacements made. We do this by calling your put_slash_in_front_of template, with the $input_text that we already have and the $special_char that is identified by the index number. We get at the character by getting the nth foo:char within the current document (the stylesheet), i.e. `document('')//foo:char[$char]`.

Now the recursive part. If we haven't got to the end of the list of characters that need to be escaped, then we have to move on to the next one, calling this same template with the next index, and with the text that we've created (i.e. that's already been escaped). If we've run out of foo:char, then we just return the escaped text.

And that's it. The template for replacing the strings follows the same pattern.

This is tested in SAXON and gives the same results as your original approach, but it is much easier to extend. You could probably apply templates to the foo:char and foo:search nodes as an alternative approach - the important thing is to separate out the what from the how, not the how you do the how :)

## 25. How to count words

Michael Kay

You can count the words without recursive processing:

```
$x := normalize-space(@attr)
$y := translate($x, ' ', '')
$wc := string-length($x) - string-length($y) + 1
```

In XPath 2.0 of course you can have sequences of strings or numbers, which makes this kind of thing very much easier.

## 26. Split a line into fixed lengths

Jarno Elovirta

> Given the following xml fragment
>
> Len_5Len_5Length_8Length__9Length__10Len__6L_3Le_4L_3
> ..
>
> I want to create the following output
>
> Len_5 Len_5 Length_8Length__9 Length__10Len__6 L_3 Le_4 L_3
>
> In other words, I want to regroup my items into fixed length of lines.

Dimitre already gave you a solution that used a LINE FEED to break the lines, but if you need the line elements, ... will get you there. Though, you can tokenize the output from Dimitre's stylesheet to get the line element and add padding. Anyhow.

## 27. Access to individual characters in a string

David Carlisle

> How do I access each char of the string 'copy99'? copy99[1], copy99[2]...?

`substring($copy99,5,1)` is the 5th character of the string.

## 28. How to concatenate the contents of similar elements

Wendell Piez

> bold="on">
> 1.(1)Clause 8 (1) (a) of the More Money for All Amendment is deleted and the following substituted:
> ...
>
> May I have suggestions as to how to concatenate content of the three into one, say, 'para' tag?

... or ... will get you what you want, since the value of a node is the concatenated string value of all its descendants.

## 29. Strip non Alpha-numeric characters

Mukul Gandhi

> I was wondering if there is any way possible of stripping any non-alphanumeric characters from an attribute, i.e. keep anything that is A-Z/0-9 and strip all other characters like ",*-+. etc.?

For example, ... when it is applied to the XML ... it produces the output ...

## 30. Word highlighting

Alexander Johannesen

> I'm trying to highlight a specific word or phrase in the text of a document.

I've got a word highlighter (it only bolds stuff, but you can put in whatever you need) that is case insensitive and Unicode compatible (using the str:to-lower function from the xsltsl project [http://xsltsl.sourceforge.net/string.html]; replace this with your own lowercaser transformation if you like). Input is $text (your full text) and $what (what to highlight: terms, words, etc.).
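
A minimal sketch of such a highlighter is given below, under stated assumptions. It is not the original code: the template name highlight and the variables upper and lower are hypothetical, and lowercasing is done with translate() over the ASCII letters only, standing in for str:to-lower, so this version is case insensitive for A-Z but not Unicode aware. Because translate() maps one character to one character, positions in the lowercased copy line up with positions in the original text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- ASCII-only case folding (assumption; swap in your own lowercaser) -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:template name="highlight">
    <xsl:param name="text"/>
    <xsl:param name="what"/>
    <xsl:variable name="lc-text" select="translate($text, $upper, $lower)"/>
    <xsl:variable name="lc-what" select="translate($what, $upper, $lower)"/>
    <xsl:choose>
      <!-- Guard against an empty $what, which contains() always matches -->
      <xsl:when test="$what != '' and contains($lc-text, $lc-what)">
        <xsl:variable name="before-len"
            select="string-length(substring-before($lc-text, $lc-what))"/>
        <!-- Text before the match, in its original case -->
        <xsl:value-of select="substring($text, 1, $before-len)"/>
        <!-- The match itself, in its original case, in bold -->
        <b>
          <xsl:value-of
              select="substring($text, $before-len + 1, string-length($what))"/>
        </b>
        <!-- Carry on with the rest of the text -->
        <xsl:call-template name="highlight">
          <xsl:with-param name="text"
              select="substring($text, $before-len + string-length($what) + 1)"/>
          <xsl:with-param name="what" select="$what"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical driver: highlight 'xslt' in the text of the input -->
  <xsl:template match="/">
    <p>
      <xsl:call-template name="highlight">
        <xsl:with-param name="text" select="string(.)"/>
        <xsl:with-param name="what" select="'xslt'"/>
      </xsl:call-template>
    </p>
  </xsl:template>
</xsl:stylesheet>
```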