Translate for hex values
Is it possible to specify a hexadecimal value (00-ff) as the third argument to the translate function?
Well, you can't refer to bytes, per se, because XPath and XSLT functions operate on the data model prescribed in the XPath spec. This model does not deal with bytes; it deals with nodes, and (following the reference to the XML Infoset spec) at a lower level, with Unicode/UCS characters, regardless of how they are represented when serialized as a byte stream.
Since an XML parser will resolve character references before the stylesheet tree is built, you should be able to put any Unicode characters in the arguments to the translate() function. For example, translate($someString,'&#x61;&#x62;&#x63;&#x64;&#x65;','ABCDE') is the same as translate($someString,'abcde','ABCDE').
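As a rough illustration of the point about character references, the sketch below maps two non-ASCII characters to ASCII ones. The choice of U+00A0 (no-break space) and U+2014 (em dash) is only an assumption for the example; any character the XML Char production allows should work the same way.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The XML parser resolves &#xA0; and &#x2014; to the characters
       U+00A0 and U+2014 before translate() ever sees the string. -->
  <xsl:template match="text()">
    <!-- no-break space becomes a plain space, em dash becomes a hyphen -->
    <xsl:value-of select="translate(., '&#xA0;&#x2014;', ' -')"/>
  </xsl:template>
</xsl:stylesheet>
```

The second and third arguments are matched up character by character, not byte by byte, so it does not matter how the input document or the stylesheet happens to be encoded on disk.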
If your goal is to output bytes that represent characters from the Windows CP-1252 (or whatever) character set, you'll need to determine what the proper Unicode/UCS code points are for those characters, and then rely on your serialization mechanism to encode the characters as the bytes you want. It may take some experimentation depending on your particular situation.
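A minimal sketch of that approach, assuming your processor supports the encoding name windows-1252 (XSLT 1.0 processors are only required to support UTF-8 and UTF-16, so this is worth checking): the stylesheet writes the Unicode character and leaves the byte value to the serializer.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask the serializer for CP-1252; support for this encoding
       name is processor-dependent. -->
  <xsl:output method="text" encoding="windows-1252"/>

  <xsl:template match="/">
    <!-- U+20AC EURO SIGN: CP-1252 encodes it as the single byte 0x80 -->
    <xsl:text>&#x20AC;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Note that the stylesheet never mentions 0x80. It names the code point U+20AC, and the mapping to a byte happens only at serialization time.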
David C adds:
> Something special about null character?
Yes, it's not allowed in XML. But then neither is character 1 (not checking for disallowed characters is a known feature of the IE parser, if I recall).
the XML char production is:
<prod id="NT-Char"><lhs>Char</lhs> <rhs>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]</rhs> <com>any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.</com> </prod>
In other words, nothing in the ASCII control range except tab (#x9), line feed (#xA) and carriage return (#xD).
I wanted a function that detected whether any of the characters in one string occurred in another string. As often with XSL, a few moments' thinking brought the realization that there is a way to do it:
<xsl:if test=". = translate(., '.,:;', '')">
Instead of testing the string to see if it contains any characters from the other string directly, you use translate() to strip out any occurrences of the query-set from (a copy of) the original string, and then compare it to the original to see if they're the same. If they're not, you know the string contains characters from the translated-out set. More complicated to describe than use, probably.
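Put into a complete stylesheet, the comparison might look like the sketch below. The element name item and the output wording are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "item" is a hypothetical element name used only for this sketch -->
  <xsl:template match="item">
    <xsl:value-of select="."/>
    <xsl:choose>
      <!-- Stripping the query set changes nothing, so none of its
           characters occur in the string -->
      <xsl:when test=". = translate(., '.,:;', '')">
        <xsl:text>: no punctuation&#10;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>: contains punctuation&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Suppress stray text outside the item elements -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```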
You can extend this principle to test, for example, if a string ends with a punctuation mark:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:call-template name="endswithpunct">
      <xsl:with-param name="str" select="'abc.'"/>
    </xsl:call-template>
    <xsl:call-template name="endswithpunct">
      <xsl:with-param name="str" select="'abc'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template name="endswithpunct">
    <xsl:param name="str"/>
    <xsl:value-of select="$str"/><xsl:text> does </xsl:text>
    <xsl:if test="string-length(translate(substring($str,string-length($str)), ',.:;?!','')) != 0">
      <xsl:text>not </xsl:text>
    </xsl:if>
    <xsl:text>end with punctuation mark. </xsl:text>
  </xsl:template>
</xsl:stylesheet>
How to remove all non-alphanumerics?
There's a trick to this:
translate($x, translate($x, 'abcde', ''), '')
will remove all characters except a,b,c,d, and e from your string $x.
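To answer the question as asked, the keep-set just needs to list every alphanumeric character. The sketch below assumes "alphanumeric" means ASCII letters and digits only; the variable name keep and the template name alnum-only are made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Characters to keep: ASCII letters and digits (an assumption;
       extend the list if accented letters should survive too) -->
  <xsl:variable name="keep"
      select="'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'"/>

  <xsl:template name="alnum-only">
    <xsl:param name="str"/>
    <!-- The inner translate() leaves only the unwanted characters;
         the outer translate() then deletes exactly those from $str -->
    <xsl:value-of select="translate($str, translate($str, $keep, ''), '')"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="alnum-only">
      <xsl:with-param name="str" select="'Hello, World! 42'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

For the string 'Hello, World! 42' the inner call yields the comma, the spaces and the exclamation mark, so the outer call returns HelloWorld42.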
Something like this would work. I haven't quite got the regexp right here, but it should show the idea: construct a regexp that matches any escape sequence, then, once you find one, look it up in a key constructed from the mapping table to see what the replacement is.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="US-ASCII"/>
  <xsl:variable name="map" select="doc('pvl_mappings.xml')"/>
  <xsl:key name="map" match="mapping" use="troff"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="[a-zA-Z&amp;&quot;~]|\\\*?\(..|\\\*\(K\\\(wi">
      <xsl:matching-substring>
        <!-- use the mapped replacement if the key finds one, otherwise the matched text itself -->
        <xsl:value-of select="(key('map',.,$map)/unicode,.)[1]"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
The OP asks
> There are two matches here: \(?s and \(?c . When my <xsl:choose> finds
Mike Kay responds
The xsl:matching-substring instruction is executed once for each match. So it's executed once to process \(?s, and once to process \(?c. In the first case, the first xsl:when fires. In the second case, the second xsl:when fires.
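The behaviour described can be seen in a small self-contained sketch. The OP's stylesheet is not quoted in full, so the input string, the regex and the bracketed replacement text here are all assumptions made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical input containing the two escapes \(?s and \(?c -->
    <xsl:analyze-string select="'a\(?sb\(?cc'" regex="\\\(\?.">
      <!-- This body runs once per match, with "." bound to that match -->
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test=". = '\(?s'">[S]</xsl:when>
          <xsl:when test=". = '\(?c'">[C]</xsl:when>
          <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should produce a[S]b[C]c: the first xsl:when fires for the first match and the second xsl:when for the second, with the non-matching stretches copied through in between.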