This is an attempt to (partially) explain regular expressions (mostly abbreviated to regex, sorry Wendell) in the XSLT domain. New in XSLT 2.0, I have found them a natural companion to XSLT; sufficiently so that I went out and bought the O'Reilly regex book (Friedl), since the W3C does nothing in terms of explaining regex use within the recommendation.
Another option. This file is a testbed I used whilst playing. XSLT based, generates the html output you see on this link.
As a test piece, I'm going to use a repetitious input file, and filter it to death to try and make (some) use of most of the regex capabilities in XSLT. My input file is plain text, the result of doing a full directory listing, on a windows box and a Linux box.
Firstly then to read in the file, in a usable (XML) format.
<xsl:variable name="source1" select="'regex.txt'"/>
<xsl:variable name="encoding" select="'iso-8859-1'"/>
<xsl:variable name="src">
<xsl:value-of select="unparsed-text($source1,$encoding)"/>
</xsl:variable>
This reads the whole file into a single string, but I'll go a stage further: I want XML.
<xsl:variable name="source1" select="'regex.txt'"/>
<xsl:variable name="encoding" select="'iso-8859-1'"/>
<xsl:variable name="src">
<doc>
<xsl:for-each select="tokenize(unparsed-text($source1,$encoding), '\n')">
<line><xsl:value-of select="."/></line>
</xsl:for-each>
</doc>
</xsl:variable>
The result of this is that I have a variable $src, holding an XML document in memory, with one line element for each line of my input text file. If you want to see some results, add
<xsl:result-document format = "xmlFormat" href = "src1.xml"> <xsl:copy-of select="$src"/> </xsl:result-document>
The output file isn't quite what I wanted. The tokenization (sorry about the spelling, W3C rec, not me) is based on using \n, whereas the file I'm using, being MSDOS based, has \r\n as its line ending. Change the splitter token in the tokenize statement to \r?\n and it's much cleaner. I'll assume you've done that, and the output file line endings have changed from
<line> Volume in drive F is fdisk
</line> to <line> Volume in drive F is fdisk</line>
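With that change made, the variable definition might look like the sketch below. It is simply the earlier snippet with the separator swapped, and it assumes the $source1 and $encoding variables declared above are in scope.

```xml
<xsl:variable name="src">
  <doc>
    <!-- \r?\n accepts both Windows (\r\n) and Linux (\n) line endings.
         select is an XPath expression, so no brace doubling is involved. -->
    <xsl:for-each select="tokenize(unparsed-text($source1, $encoding), '\r?\n')">
      <line><xsl:value-of select="."/></line>
    </xsl:for-each>
  </doc>
</xsl:variable>
```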
For those who didn't know, Linux and Windows have different line endings. This Windows file uses \r (carriage return, &#xD;) followed by \n (newline, &#xA;); Linux uses \n alone.
So now we have some XML to play with. An examination of the XML shows a few lines that I really don't want. I'll remove these using templates, each with a predicate. The full list is:
The remainder are wanted. So now I'm going to apply-templates selecting the variable $src/doc/*. See the source stylesheet for that. Now we can move on to processing the <line>'s of interest.
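As a rough illustration of removing lines with templates and predicates, empty templates like these would do it. The three patterns are hypothetical examples of mine (a blank line, the volume header, a directory entry), not the actual list used in the stylesheet.

```xml
<!-- Hypothetical filters: an empty template outputs nothing for the lines it matches. -->
<!-- The match attribute is a pattern, not an AVT, so braces are written singly. -->

<!-- Drop empty or whitespace-only lines -->
<xsl:template match="line[not(normalize-space(.))]"/>

<!-- Drop the volume header, e.g. " Volume in drive F is fdisk" -->
<xsl:template match="line[matches(., '^\p{Zs}*Volume\p{Zs}')]"/>

<!-- Drop directory entries, which show <DIR> where a file shows its size -->
<xsl:template match="line[contains(., '&lt;DIR&gt;')]"/>
```

A pattern with a predicate has a higher default priority than a plain match="line", so these win over the general template without any explicit priority attribute.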
In order to make the output readable in this (HTML) file, I'm going to present the output of the transform as a table, with various pieces of information within it. Each entry will be a row.
Now on to the regular expression use. The two places where it is most useful are the <xsl:analyze-string> element and the functions matches($input, $regex) and replace($input, $regex, $replacement). The former takes a string apart wholesale using a regex; the latter two suit the more detailed, case-by-case work done here. I'll take this last statement apart a little before moving on.
If the input form is regular, with a number of alternative patterns within a line, then the analyze-string functionality is ideal. For example, given input lines such as
(date) (withdrawn) (deposited) (who)
12/04/2002 $1202.23 \t\t Dave
18/04/2002 \t $120.50 Mary
(note that I have marked TAB characters as \t)
Then the regularity of this pattern makes an ideal input for a line analysis using the xsl:analyze-string functionality, since each 'entry' can form a group, which may be identified and extracted for markup or further processing. Four columns, (two present or replaced by the tab character), each of a regular format.
Given that some of the input data for this exercise aligns with this regularity, and another (the filenames) is far less regular, then a different approach is more appropriate, based on using <xsl:choose>, with the test attribute using the matches function.
Firstly then the <xsl:analyze-string> usage. The form used here to extract the date is used within the template matching line:
<td>
<xsl:analyze-string
regex="
^[0-9]{{2}}
/
[0-9]{{2}}
/
[0-9]{{4}}
\p{{Zs}}+"
flags="ix"
select=".">
<xsl:matching-substring>
<xsl:value-of select="regex-group(0)"/>
</xsl:matching-substring>
<!-- See the source for the non-matching-substring content -->
</xsl:analyze-string>
</td>
The text presented is a line from the input variable
<line>26/06/2004 15:26 884 version2.xml</line>
The general form of this piece of xslt is to analyze the string, then process it in one of two ways, either within the <xsl:matching-substring> child, selecting one or more matches, or to process (or not) text which fails to match, within the <xsl:non-matching-substring> child. The example shown includes the non-matching-substring to catch rogue elements of input (I introduced an 'invalid' line at the bottom of the source file for this purpose). More later.
Looking at the regular expression itself then. Starting at the end... (OK OK), the select attribute says the input text is the current element, the context node, the contents of a line element. Next, the flags attribute (W3C): the i value implies that I want a case-insensitive match. Unused here, but worthy of note. The x value says ignore whitespace within the expression. That allows me to lay out the regex so that it makes more sense to read, and I can explain it more easily. Generally useful, unless you are explicitly matching whitespace, in which case you make other arrangements! Finally, before tackling the regex itself, a note about the braces {{}}. Since XSLT already uses the brace to delimit expressions in an attribute value template (AVT), and the regex attribute is an AVT, a literal brace has to be escaped by doubling it; hence to get one brace through the XSLT engine to the regular expression engine, two braces are used. Simple? No it isn't, is it? Now let's take apart the regex.
^[0-9]{{2}} The caret, ^, is a metacharacter, that is, its meaning is special. Interpret it as 'start of line'. Hence the match starts at the first character in the input text. [0-9] is a character class, as opposed to a simple character. Interpret it as any character between the limits either side of the hyphen, -. In this case any single digit. Similarly, [a-z] may be used to represent any alphabetic character in the given range (including [A-Z], since the i flag is present). So the first thing we are looking for is start of line followed by a digit. The {{2}}, read as {2}, is interpreted as a count for the preceding item (a single digit in this case). So now we are looking for start of line, two digits. Moving on to the next item.
/ Just a little easier :-) Next (to make a match) look for a slash character... or SOLIDUS if you want the posh Unicode name. That's it, no more. Just a match on a literal character. So for a full match we need... start of line, two digits and a solidus.
[0-9]{{2}} A repeat (nearly) of the first pattern. Another two digits. As an aside, if you want somewhere between 1 and 6 digits, use [0-9]{{1,6}}. If you want none or 1 digits use [0-9]?, and [0-9]* if you want to match 0, 1, 2, ...any number of times. Other combinations of 'quantifiers' are available to specify how many of the preceding character class you want to match. See W3C for more. The match pattern grows to start of line, two digits, a solidus and two more digits.
/ OK, you get the picture.
[0-9]{{4}} And more of the same; 4 digits this time. The date pattern is now complete, as start of line, two digits, a solidus, two more digits, another solidus, and finally four digits.
\p{{Zs}}+ A bit different. This is stretching into Unicode territory, and I'm less sure of myself, but this one works. The idea is to match a whitespace character. Zs is the Unicode 'space separator' category, which includes at least the space character, &#x20;, but also such things as non-breaking spaces. The quantifier + states that there must be one, and may be more.
I've tried (waiting to see if Tony corrects me!) to show what characters are in what categories, as they are called. It's a separate document, linked from here. Due care please, it is over 1Mbyte in size.
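The quantifiers and the Zs category can be checked in isolation with matches(). The following is a minimal sketch, assuming an XSLT 2.0 processor that can start at a named template; the template name main is my own choice. Because select is an XPath expression rather than an AVT, the braces are written singly here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point: run with this as the initial template -->
  <xsl:template name="main">
    <xsl:value-of separator="&#10;" select="(
      matches('26', '^[0-9]{2}$'),          (: true:  exactly two digits :)
      matches('2604', '^[0-9]{2}$'),        (: false: four digits, anchored both ends :)
      matches('1234567', '^[0-9]{1,6}$'),   (: false: seven digits is more than six :)
      matches('', '^[0-9]?$'),              (: true:  ? allows zero occurrences :)
      matches('', '^[0-9]+$'),              (: false: + needs at least one :)
      matches('a b', '^a\p{Zs}+b$'),        (: true:  the space is in Zs :)
      matches(concat('a', codepoints-to-string(160), 'b'), '^a\p{Zs}+b$'),
                                            (: true:  no-break space is in Zs :)
      matches(concat('a', codepoints-to-string(9), 'b'), '^a\p{Zs}+b$')
                                            (: false: TAB is a control character, not Zs :)
    )"/>
  </xsl:template>
</xsl:stylesheet>
```

The last case is worth remembering: \p{Zs} does not match a tab, so tab-separated input needs \s or an explicit \t instead.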
So that completes the pattern. It matches the windows dump of dates and the following whitespace. The next XSLT child is the xsl:matching-substring element
<xsl:matching-substring>
<xsl:value-of select="regex-group(0)"/>
</xsl:matching-substring>
</xsl:analyze-string>
This is triggered when the regex matches on the input. In this case, on the date at the start of the line. The value-of expression is of interest. I've chosen regex-group(0), which is the entire match: the full date plus the trailing whitespace. When a match occurs, the matching text may be further split by parts of the match, or sub-expressions, which are numbered from 1. If the regex had been written as
<xsl:analyze-string regex="
^([0-9]{{2}})
/
([0-9]{{2}})
/
([0-9]{{4}})
\p{{Zs}}+"
flags="ix"
select=".">
(Note the three 'sub-expressions' in round brackets). Here, within the matching-substring element, the function regex-group(3) would return the third matching sub-expression, which would be the year value. Another nice feature of regex!
Finishing off with the date then, assuming a match the final output is the date, wrapped in the <td> element. In summary. The template gave us a match on the line element. The <xsl:analyze-string specified this text, and provides the regular expression. Finally the matching-substring and (if needed) the non-matching substring provide access to the results of the match.
Now what? We have a template matching the line element. We have matched content with the regex for the date, and I want to continue to analyse the input string. The <xsl:analyze-string has done its job on the first pattern, the date, but how to get to the rest of the string? A number of options are open. I could extend the regex to cope with the remainder of the line of text, and make heavy use of regex-group() to pick out the patterns and process them. I could start to nest more <xsl:analyze-string inside the xsl:non-matching-substring children of higher <xsl:analyze-string elements. The complexity of this approach begins to mount as the depth increases. I'll extend the match to catch the date and file size. You may note that the file sizes contain a comma to separate the thousands? I want to remove that, to make it work as an integer. The replace() function works nicely for this usage.
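To see the three groups at work, here is a small sketch that rearranges the date into year-month-day order. The input string is the sample line shown earlier; the template name main is a hypothetical entry point of mine.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- regex is an AVT, so the braces are doubled -->
    <xsl:analyze-string select="'26/06/2004 15:26 884 version2.xml'"
        regex="^([0-9]{{2}})/([0-9]{{2}})/([0-9]{{4}})\p{{Zs}}+">
      <xsl:matching-substring>
        <!-- group 1 = day, group 2 = month, group 3 = year -->
        <xsl:value-of select="concat(regex-group(3), '-',
                                     regex-group(2), '-',
                                     regex-group(1))"/>
      </xsl:matching-substring>
      <!-- no xsl:non-matching-substring, so the rest of the line is discarded -->
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

This writes 2004-06-26.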
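A quick sketch of the comma removal on its own, using one of the sizes from the listing as a literal. The cast to xs:integer is my addition, to show that the cleaned string really does behave as a number; main is again a hypothetical entry point.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="size" select="'16,141'"/>
    <!-- replace() removes every comma; the result can then be cast and used in arithmetic -->
    <xsl:value-of select="xs:integer(replace($size, ',', '')) + 1"/>
  </xsl:template>
</xsl:stylesheet>
```

This writes 16142.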
Mmm. This seems to be working out nicely! Next the filenames. The regex definition I want to use here is to capture, after the size, everything up to the end of the line! The regex to do this is (.*)$ It's a subexpression, since I want to use it in the output. The . (period sign) represents any single character, the * is another quantifier, meaning zero or more, and the $ means stop at the end of the input text. In summary, everything between the last part of the regex (the whitespace following the file size) and the end of the line element.
So the full regular expression is now:
<xsl:analyze-string regex="
^([0-9]{{2}}
/
[0-9]{{2}}
/
[0-9]{{4}})
\p{{Zs}}+
([0-9]{{2}}:[0-9]{{2}})
\p{{Zs}}+
([0-9,]+)
(.*)$
"
flags="ix"
select=".">
If you can spot the () pairings, you'll see the output cells, from
<xsl:matching-substring>
<td><xsl:value-of select="regex-group(1)"/></td>
<td><xsl:value-of select="regex-group(2)"/></td>
<td><xsl:value-of
select="replace(regex-group(3),',','')"/></td>
<td><xsl:value-of select="regex-group(4)"/></td>
</xsl:matching-substring>
<xsl:non-matching-substring>
<td><b style="color:red">Invalid date format</b></td>
[<xsl:value-of select="substring(.,1,10)"/>]
</xsl:non-matching-substring>
Each regex-group() picks out another wanted piece of the puzzle. The non-matching-substring is used to show up the invalid line. Very handy to show up those strings that don't fit with your specified pattern!
Just to show how the other form of regex can be used, I'm going to process the filenames even further. I want to group them according to the following rules.
I'll color them blue, green, yellow and red; but further xml markup might be more useful for automated processing. Either way, we can use regex to help.
regex-group(4) holds the data of interest, so I'll process this using the <xsl:choose> method above. The syntax for this is
matches($input ,
$pattern ,
$flags )
This provides
<td><xsl:value-of select="replace(regex-group(3),',','')"/></td>
<td>
<xsl:choose>
<xsl:when test="matches(regex-group(4),'[a-z]+.xml$','xi')">
<span
class="blue">
<xsl:value-of select="regex-group(4)"/>
</span>
</xsl:when>
<!-- Add the other cases in as needed,
as well as the otherwise to trap exceptions-->
</xsl:choose>
</td>
</xsl:matching-substring>
The only new regex idea here is the $ added to the end of the expression. Remember the ^ which was the metacharacter for the start of line? This is the metacharacter for the end of line, since a regex of .xml would match both .xml and .xml~, so the $ says and no more!, i.e. the match must be the last item on the input string.
I'm going to use the flags option, setting the ix flags, so that I can lay the regular expression out clearly. The first grouping is those files with names of the form abc.xml, and since the i flag is in use, the regex is simply [a-z]+.xml I know that some files have digits in the names, but I want to point those out for checking, so I'm using the previous expression rather than including the digits using [a-z0-9]+.xml which would allow for sect21.xml. Oh, OK, let's mark them up the same way, coloured blue. Note that the digits are optional, not mandatory, hence I've used the regex '[a-z]+[0-9]*.xml$' where the [0-9]* says it's OK to have zero or more digits.
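The effect of the $ can be tried out directly, and the same test shows one thing to watch: an unescaped . matches any character, not just a full stop. This sketch uses file names from the listing plus one made-up name, fooxml, as a counter-example; the escaped form \. is my suggestion, not what the stylesheet uses.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point: run with this as the initial template -->
  <xsl:template name="main">
    <xsl:value-of separator="&#10;" select="(
      matches('bidi.xml',  '\.xml$'),        (: true:  .xml is the last thing in the string :)
      matches('bidi.xml~', '\.xml$'),        (: false: the tilde follows .xml :)
      matches('bidi.xml~', '\.xml'),         (: true:  without $ a match anywhere will do :)
      matches('fooxml', '[a-z]+.xml$'),      (: true:  the bare dot matches the second o :)
      matches('fooxml', '[a-z]+\.xml$')      (: false: \. insists on a literal full stop :)
    )"/>
  </xsl:template>
</xsl:stylesheet>
```

For this directory listing the unescaped dot does no harm, but escaping it is the safer habit.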
The next group N([0-9]+).xml is easy enough, using a regex of N[0-9]+.xml$ The N is, again, a literal match, as is .xml at the end. Note the use of the brackets round the number? As used earlier, this is a means of accessing specific sub-patterns within the match. The difference is the syntax used to get at them, since there is no regex-group() this time. To access the number, which is the only grouping, use the classic regex access scheme, $1. If there had been a few, these would have been sequentially numbered. This could be to enable the numerically named files to be sorted. I'll put them in italics.
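As a sketch of the sorting idea, the $1 reference can feed an xsl:sort directly. The three file names are taken from the listing; the cast to xs:integer (so that 138 sorts before 1030), the escaped dot and the template name main are my own choices.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:for-each select="('N1030.xml', 'N138.xml', 'N10228.xml')">
      <!-- $1 is the text captured by the first (and only) bracketed group -->
      <xsl:sort select="xs:integer(replace(., '^N([0-9]+)\.xml$', '$1'))"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The names come out as N138.xml, N1030.xml, N10228.xml, which is numeric rather than alphabetical order.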
Note that the order of the when clauses is important. This is a choose statement, so once a when clause is matched, no others are tested, so we need to move from the most specific through to the least specific.
The final sought group is specified using '[a-z0-9]+.xml~$' which simply accepts most filenames, and seeks out the tilde on the end. If I knew there might be other than .xml files in this directory, it would be more complex.
This leaves the unwanted ones, which nicely fit into the otherwise clause of the choose. These are coloured red (Yes there is one). This section of the stylesheet now looks like this.
<td>
<xsl:choose>
<xsl:when test="matches(regex-group(4),'N[0-9]+.xml$','xi')">
<i><xsl:value-of
select="replace(regex-group(4),'N([0-9]+).xml','$1')"/>
</i>
<span class="green">
<xsl:value-of select="regex-group(4)"/>
</span>
</xsl:when>
<xsl:when test="matches(regex-group(4),'[a-z0-9]+.xml~$','xi')">
<span class="yellow"><xsl:value-of select="regex-group(4)"/></span>
</xsl:when>
<xsl:when test="matches(regex-group(4),'[a-z]+[0-9]*.xml$','xi')">
<span class="blue"><xsl:value-of select="regex-group(4)"/></span>
</xsl:when>
<xsl:otherwise>
<span class="red"><xsl:value-of select="regex-group(4)"/></span>
</xsl:otherwise>
</xsl:choose>
</td>
That completes this example.
| 27/09/2003 | 12:36 | 4500 | andAndOr.xml |
| 27/09/2003 | 12:36 | 2565 | apply-imports.xml |
| 27/09/2003 | 12:36 | 2054 | applytemplates.xml |
| 22/03/2004 | 15:53 | 16141 | approaches.xml |
| 10/05/2003 | 20:10 | 6858 | ArchForms.xml |
| 18/07/2004 | 12:02 | 48977 | autolayout.xml |
| 15/12/2003 | 16:19 | 5986 | barcodes.xml |
| 29/05/2004 | 12:34 | 25928 | bestpractice.xml |
| 25/04/2004 | 14:00 | 3298 | bib.xml |
| 18/07/2004 | 11:57 | 7474 | bidi.xml |
| 15/02/2004 | 14:24 | 5687 | bidi.xml~ |
| 18/04/2004 | 09:46 | 1954 | bk.xml |
| 25/04/2004 | 11:05 | 8762 | bool.xml |
| 15/07/2001 | 20:06 | 20998 | braille.xml |
| 25/04/2004 | 11:07 | 6764 | break.xml |
| 27/09/2003 | 12:36 | 1710 | casefor.xml |
| 27/09/2003 | 12:36 | 2836 | catalog.xml |
| 25/04/2004 | 11:14 | 17798 | cdata.xml |
| 27/09/2003 | 12:36 | 2116 | checkbox.xml |
| 27/09/2003 | 12:36 | 2957 | colour.xml |
| 28/09/2003 | 18:55 | 6605 | comments.xml |
| 27/09/2003 | 12:36 | 2853 | conform.xml |
| 25/04/2004 | 11:50 | 7979 | conformance.xml |
| 27/09/2003 | 12:36 | 4500 | Copy of andAndOr.xml |
| 27/09/2003 | 12:36 | 2054 | Copy of applytemplates.xml |
| 24/04/2004 | 19:27 | 4230 | csv2xml.xml |
| 18/07/2004 | 11:15 | 19494 | dates.xml |
| 27/09/2003 | 12:36 | 1345 | 10228 N10228.xml |
| 25/04/2004 | 21:56 | 5575 | 1030 N1030.xml |
| 24/01/2004 | 15:50 | 53814 | 10301 N10301.xml |
| 27/09/2003 | 12:36 | 4880 | 10325 N10325.xml |
| 27/09/2003 | 12:36 | 3601 | 10378 N10378.xml |
| 27/09/2003 | 12:36 | 1122 | 10446 N10446.xml |
| 27/09/2003 | 12:36 | 13273 | 10590 N10590.xml |
| 18/04/2004 | 10:41 | 15664 | 138 N138.xml |
| 25/04/2004 | 10:52 | 53987 | 1553 N1553.xml |
| 26/06/2004 | 12:57 | 7444 | 1575 N1575.xml |
| 29/05/2004 | 12:55 | 38050 | 1641 N1641.xml |
| 25/04/2004 | 14:07 | 694 | 1665 N1665.xml |
| 27/09/2003 | 12:36 | 11124 | 169 N169.xml |
| 25/04/2004 | 11:08 | 1718 | 1711 N1711.xml |
| 27/09/2003 | 12:36 | 15949 | 1755 N1755.xml |
| 25/04/2004 | 11:44 | 7957 | 1777 N1777.xml |
| 25/04/2004 | 21:55 | 1334 | 1821 N1821.xml |
| 25/04/2004 | 11:54 | 2314 | 1843 N1843.xml |
| 25/04/2004 | 11:59 | 25464 | 1930 N1930.xml |
| 18/07/2004 | 12:31 | 14961 | xslfaq.xml |
| 18/07/2004 | 12:00 | 14406 | xslfaq.xml~ |
| 27/09/2003 | 12:36 | 3529 | xslinclude.xml |
| 27/09/2003 | 12:36 | 1535 | xsloutput.xml |
| 27/09/2003 | 12:36 | 3307 | xsltuk.xml |
| 27/09/2003 | 12:36 | 2108 | xsltut.xml |
| 27/09/2003 | 12:36 | 3803 | xslvalueof.xml |
| 22/03/2004 | 15:20 | 26548 | xslvocab.xml |
| 22/03/2004 | 15:20 | 26548 | badname.xml.ext |
| Invalid date format [ 01/04/200] |
Starting in the same way, I'll read the file into a variable, tokenize it and then apply-templates to that variable's structure. I've created $src2 to hold it, and named the line elements line2 to keep the processing simple. Note that this time, I can tokenize on \n, since this is a Linux listing, which only uses the newline character. Again the stylesheet generates an intermediate listing file, src2.xml should you wish to peruse it.
There is more information provided here, and I only want to focus on a small part of it. The permissions fields, the filename, and the date. In order to do this I first need to extract it.
The goal, as before, is a table, with each line providing
| filename | File type | Owners permissions(read, write, execute) | Date(dd/mm/yyyy) |
So first of all then, to isolate the lines of interest. The code is identical to the Windows example, with just a change to the regex and the selection of groups. Firstly the regex.
<xsl:analyze-string regex="
1 ^([\-drwx]{{10}})
\p{{Zs}}+
2 [0-9]{{1,}}
\p{{Zs}}+
3 [a-z_]+
\p{{Zs}}+
4 [a-z_]+
\p{{Zs}}+
5 [0-9]+
\p{{Zs}}+
6 (
7 [a-z]{{3}}
\p{{Zs}}+
8 [0-9]+
\p{{Zs}}+
9 ([0-9][0-9]:[0-9][0-9] | [0-9]{{4}})
10 )
\p{{Zs}}+
11 (.*)$
"
flags="ixs"
select=".">
Note. The numbers are my annotations, used for reference below; they are not part of the regex, and no comments are allowed within a regular expression, which is classed as an Attribute Value Template. Taking each numbered item one at a time. Just remember the class of line we are analysing. A few examples are shown below.
drwx------ 2 dpawson dpawson 4096 Apr 28 08:09 .adobe
-rw-rw-r-- 1 dpawson dpawson 24 Jul 22 19:18 .aspell.en.prepl
-rw-r--r-- 1 dpawson dpawson 24 Oct 28 2003 .bash_logout
1. This matches the file permissions. Any combination of ten characters from the class containing - d r w x .
2. This matches the number of links. The quantifier {{1,}} is interpreted as at least one, maybe more?
3. This matches the user name, though it probably needs expanding significantly to be usable generally.
4. This matches the group
5. The file size is a string of digits
6. A grouping indicator. Needed to collect all the parts of the date.
7. The month. 3 characters.
8. The day of the month, one or two digits
9. Look at the example entries I've chosen. One has a date and time, one has a date and the year? This piece of the regex is grouped using (), then there are two options (signified by the pipe symbol | ). The first matches the time, the second matches the four digits of a year. This catches both cases.
10. Finally close off the date grouping symbol
11. Everything from here to the end of the input is signified by the (.*)$, just as we did before. I have introduced a new flag, the s flag. The rec describes this as
If the s flag is not specified, the metacharacter . matches any character except a newline (#x0A) character. In dot-all mode, the metacharacter . matches any character whatsoever.
So it's called the dot-all mode, and it affects the interpretation of the dot metacharacter. Not essential here, simply used to show its use. Now the only flag not mentioned is the m or multi-line flag, which makes ^ and $ match at the start and end of every line within the input, rather than only at the start and end of the whole string. I won't be using that in this example.
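Both flags are easiest to understand on a string that actually contains a newline. A minimal sketch follows; the two-line test string and the template name main are my own inventions for the purpose.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Build the string 'one', newline (#x0A), 'two' -->
    <xsl:variable name="text"
        select="concat('one', codepoints-to-string(10), 'two')"/>
    <xsl:value-of separator="&#10;" select="(
      matches($text, 'one.two'),        (: false: by default . will not match the newline :)
      matches($text, 'one.two', 's'),   (: true:  dot-all mode, . matches anything :)
      matches($text, '^two$'),          (: false: ^ and $ anchor to the whole string :)
      matches($text, '^two$', 'm')      (: true:  multi-line mode, ^ and $ anchor to each line :)
    )"/>
  </xsl:template>
</xsl:stylesheet>
```

Since each line element here holds a single line with no newline in it, neither flag changes the result in the listing examples.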
That concludes the first part. We now have the content marked up ready for presentational sorting.
The approach is identical to the Windows version: use an xsl:choose statement, selecting required content and marking up or modifying (using replace) as necessary. The first item to resolve then is the file type, determined by the first character of the permissions block of text, with the following interpretations:
Hence the regex is a choose on the first character of the first regex-group() of the matching-substring of the analyze-string! This cell of the table now looks like this
<td>
<xsl:choose>
<xsl:when test="matches(regex-group(1), '^-[\-rwx]{9}')">
<b>plain</b>
</xsl:when>
<xsl:when test="matches(regex-group(1), '^b[rwx\-]{9}')">
<b>block</b>
</xsl:when>
<xsl:when test="matches(regex-group(1), '^d[rwx\-]{9}')">
<b>dir</b>
</xsl:when>
<xsl:otherwise>
<span class="red">Unknown filetype:
<xsl:value-of select="substring(regex-group(1),1,1)"/></span>
</xsl:otherwise>
</xsl:choose>
</td>
Note that the regex is no longer using double braces {{}}. Don't ask me why. However, I asked for clarification from Mike Kay, and here he expounds. Again I've used the otherwise clauses to trap an error case (and introduced one, the last one, by changing the value to X).
That completes the file type. Now for owner permissions. We need another table cell for this. These permissions are in 3 groups for the owner (of the file), the group permissions, and finally the 'world'. I'll process them for the group, since the other two repeat this process and hence there is no learning. Each set is a combination of rwx-, meaning this entity has read, write, execute or no privileges on this file. Hence the process is to abstract the group, process them and produce appropriate markup. I've used two functions to obtain string values. Alternatives would be equally easy.
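The stylesheet code for this cell is not reproduced here, so the following is only a guess at its shape, assuming the two functions are substring() and contains() and that the first set of three permission characters is the one being reported. The sample permissions string is one of the example lines above, and main is a hypothetical entry point.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- In the real template this would be regex-group(1) -->
    <xsl:variable name="perms" select="'-rw-r--r--'"/>
    <!-- Character 1 is the file type; characters 2 to 4 are the first rwx set -->
    <xsl:variable name="set" select="substring($perms, 2, 3)"/>
    <xsl:text>[</xsl:text>
    <xsl:if test="contains($set, 'r')"> Read</xsl:if>
    <xsl:if test="contains($set, 'w')"> Write</xsl:if>
    <xsl:if test="contains($set, 'x')"> Exec</xsl:if>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

This writes [ Read Write], in the same form as the table cells below. Changing the start position to 5 or 8 would report the second or third set in the same way.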
That completes this example. Please report any typos and errors.
| . | dir | [ Read Write Exec] | Aug 4 08:55 |
| .. | dir | [ Read Write Exec] | Jul 11 11:07 |
| .acrobat | dir | [ Read Write Exec] | Jul 21 18:05 |
| .adobe | dir | [ Read Write Exec] | Apr 28 08:09 |
| a.out | plain | [ Read Write Exec] | Apr 26 17:47 |
| .aspell.en.prepl | plain | [ Read Write] | Jul 22 19:18 |
| .aspell.en.pws | plain | [ Read Write] | Jul 22 19:18 |
| .bash_history | plain | [ Read Write] | Aug 2 20:07 |
| .bash_logout | plain | [ Read Write] | Oct 28 2003 |
| .bash_profile | plain | [ Read Write] | Oct 28 2003 |
| .bashrc | plain | [ Read Write Exec] | Jul 9 19:51 |
| bin | dir | [ Read Write Exec] | Jul 28 08:23 |
| bookmarks.html | plain | [ Read Write] | Jul 30 19:12 |
| cd | dir | [ Read Write Exec] | Jul 21 18:33 |
| .cddbslave | dir | [ Read Write Exec] | Jul 11 13:31 |
| common | dir | [ Read Write Exec] | Mar 7 09:58 |
| contacts.ldif | plain | [ Read Write Exec] | Jul 17 12:08 |
| csrc | dir | [ Read Write Exec] | Jun 15 20:04 |
| .cvspass | plain | [ Read Write] | Mar 6 19:29 |
| .cvsrc | plain | [ Read Write] | Mar 6 19:28 |
| Desktop | dir | [ Read Write Exec] | Jul 30 19:38 |
| .dmrc | plain | [ Read Write] | Feb 27 18:39 |
| Document1.txt | plain | [ Read Write Exec] | Jul 9 19:28 |
| dpawson | dir | [ Read Write Exec] | Apr 29 18:32 |
| .emacs | plain | [ Read Write] | Jul 24 12:38 |
| .emacs~ | plain | [ Read Write] | Jul 19 20:59 |
| .emacs.d | dir | [ Read Write Exec] | Feb 29 10:04 |
| .esd_auth | plain | [ Read Write] | Feb 27 18:54 |
| evolution | dir | [ Read Write Exec] | Aug 3 19:21 |
| fedora-docs | dir | [ Read Write Exec] | Jul 28 19:00 |
| fedoranotes.txt | plain | [ Read Write] | Jul 10 20:03 |
| .fetchmailrc | plain | [ Read Write Exec] | Feb 29 11:43 |
| files | dir | [ Read Write Exec] | Aug 1 18:51 |
| .fonts.cache-1 | plain | [ Read Write] | Aug 4 08:55 |
| .forward | plain | [ Read Write] | Mar 2 06:39 |
| .fullcircle.mac.txt | dir | [ Read Write Exec] | Jul 27 13:30 |
| .gaim | dir | [ Read Write Exec] | Aug 3 19:21 |
| .gconf | dir | [ Read Write Exec] | Aug 4 08:47 |
| .gconfd | dir | [ Read Write Exec] | Aug 4 08:55 |
| getopts.py | plain | [ Read Write] | Jul 21 20:23 |
| .gimp-1.2 | dir | [ Read Write Exec] | Jul 23 18:18 |
| .gnome | dir | [ Read Write Exec] | Feb 27 18:39 |
| .gnome2 | dir | [ Read Write Exec] | Aug 3 19:21 |
| .gnome2_private | dir | [ Read Write Exec] | May 19 19:48 |
| .gnupg | dir | [ Read Write Exec] | Jul 14 20:51 |
| .gpilotd | dir | [ Read Write Exec] | Jul 27 18:20 |
| .gpilotd.pid | plain | [ Read Write] | Jul 27 18:20 |
| .gstreamer | dir | [ Read Write Exec] | Jul 11 14:15 |
| .gtkrc | plain | [ Read Write] | Oct 30 2003 |
| .gtkrc-1.2-gnome2 | plain | [ Read Write] | Feb 27 18:39 |
| ham | dir | [ Read Write Exec] | Jul 12 18:59 |
| .ICEauthority | plain | [ Read Write] | Aug 4 08:47 |
| iptabs.txt | plain | [ Read Write] | Jul 19 06:41 |
| .java | dir | [ Read Write Exec] | Jul 30 19:51 |
| .kde | dir | [ Read Write Exec] | Feb 24 20:19 |
| dir | [ Read Write Exec] | Mar 1 19:49 | |
| dir | [ Read Write Exec] | Mar 1 20:07 | |
| .mcop | dir | [ Read Write Exec] | Jun 25 18:58 |
| memTest | dir | [ Read Write Exec] | Apr 30 19:47 |
| .metacity | dir | [ Read Write Exec] | Feb 27 18:39 |
| .mozilla | dir | [ Read Write Exec] | Jul 25 12:18 |
| music | dir | [ Read Write Exec] | Jul 11 14:09 |
| mypgp.txt | plain | [ Read Write] | Jul 14 20:52 |
| my_rules_du_jour | plain | [ Read Write] | Jul 28 08:13 |
| .mysql_history | plain | [ Read Write] | Jul 28 18:33 |
| .nautilus | dir | [ Read Write Exec] | Feb 27 18:39 |
| NNdbase.txt | plain | [ Read Write Exec] | Jul 7 17:38 |
| notes.txt | plain | [ Read Write] | Jul 28 18:22 |
| notes.txt~ | plain | [ Read Write] | Jul 12 21:05 |
| palm | dir | [ Read Write Exec] | May 1 16:25 |
| pgpsig.txt | plain | [ Read Write] | Jul 15 18:27 |
| .phoenix | dir | [ Read Write Exec] | May 1 12:58 |
| pkgs.txt | plain | [ Read Write] | Jun 29 19:01 |
| Procmail | dir | [ Read Write Exec] | Mar 2 06:43 |
| .procmailrc | plain | [ Read Write] | Mar 2 06:50 |
| .recently-used | plain | [ Read Write] | Aug 1 18:20 |
| redhat-logviewer | dir | [ Read Write Exec] | Mar 6 19:52 |
| regex.py | plain | [ Read Write] | Jul 19 20:59 |
| regex.txt~ | plain | [ Read Write] | Jul 23 19:20 |
| rest | dir | [ Read Write Exec] | Jun 29 19:23 |
| res.txt | plain | [ Read Write] | Jul 20 19:02 |
| .rhn-applet | dir | [ Read Write Exec] | Feb 27 18:40 |
| .rhn-applet.conf | plain | [ Read Write] | Feb 27 18:39 |
| saconf.txt | plain | [ Read Write] | Jul 12 20:34 |
| sig.txt | plain | [ Read Write] | Jul 9 19:14 |
| spam | dir | [ Read Write Exec] | Jul 12 18:59 |
| .spamassassin | dir | [ Read Write Exec] | Aug 4 09:38 |
| spam.txt | plain | [ Read Write] | Jul 20 18:59 |
| speedtouch.pdf | plain | [ Read Write] | Jul 21 19:50 |
| .ssh | dir | [ Read Write Exec] | Jul 24 10:00 |
| struct.txt | plain | [ Read Write Exec] | Jul 9 17:15 |
| subvers | dir | [ Read Write Exec] | May 28 19:40 |
| .subversion | dir | [ Read Write Exec] | May 28 19:37 |
| .thumbnails | dir | [ Read Write Exec] | Jul 25 11:50 |
| tmp | dir | [ Read Write Exec] | Jul 23 17:04 |
| tmp.txt | plain | [ Read Write] | Jul 2 19:52 |
| .Trash | dir | [ Read Write Exec] | Jul 30 19:19 |
| user_prefs | plain | [ Read Write] | Jul 12 20:35 |
| user_prefsOLD | plain | [ Read Write] | Jul 12 20:31 |
| UsingC.xml | plain | [ Read Write] | May 20 18:40 |
| Various | dir | [ Read Write Exec] | Jul 11 14:02 |
| .viminfo | plain | [ Read Write] | Jul 12 17:44 |
| vi.txt | plain | [ Read Write] | Jul 11 16:01 |
| vs.jpg | plain | [ Read Write] | Jul 23 17:23 |
| words.txt | plain | [ Read Write] | Jul 23 18:47 |
| words.txt~ | plain | [ Read Write] | Jul 22 19:25 |
| .Xauthority | plain | [ Read Write] | Aug 4 08:47 |
| .xchat2 | dir | [ Read Write Exec] | Jul 1 20:11 |
| .xmms | dir | [ Read Write Exec] | Jul 11 14:10 |
| .xsession-errors | plain | [ Read Write] | Aug 4 08:48 |
| .xvpics | dir | [ Read Write Exec] | Jul 23 17:23 |
| yumdoco.txt | plain | [ Read Write] | Jul 8 19:40 |
| AprilFirst.txt | Unknown filetype: X | [ Read Write] | Apr 1 19:40 |
I asked Michael Kay to expand on AVT's and curly brackets {}. He replied
In XSLT, there are some attributes whose value is an XPath expression (e.g. select, test), and there are some attributes whose value is an AVT (notably attributes of literal result elements, also a few others like xsl:element/@name and xsl:analyze-string/@regex.)
In the first case, curly braces have no special significance. Curlies are not used in the XPath grammar, so the only place curlies can appear is within comments or string literals. For example,
<xsl:if test="matches($a, 'x{2}')">
uses curlies within a string literal (which happens to be a regular expression). They are written as themselves, just like any other character.
In the second case (AVTs), curly braces are used to separate the fixed part of the attribute value from the variable part. For example <a href="{$file}.html"/>. If you want to generate an attribute value that actually contains a curly brace, you need to double it. For example, to generate <a foo="{bar}"/> you need to write <a foo="{{bar}}"/>.
The regex attribute of xsl:analyze-string is an AVT, so if you want to use the regular expression "s{3}" (which matches a sequence of three s's) you need to write it as regex="s{{3}}". If you wrote regex="s{3}", the "3" would be interpreted as an XPath expression and would be evaluated and replaced by its value, so the actual regular expression used would be "s3". On the other hand, if you want to match a sequence of $n instances of s, where $n is an XPath variable, you could write regex="s{{{$n}}}"; if $n has the value 5 this would expand to the regular expression "s{5}".
If you write regex="a*", the regex is "a*". If you write regex="{$x}", this will use the regex contained in XPath variable $x. If you write regex="a{2+2}", this will be treated as an AVT, so the actual regex will be "a4" If you write regex="a{{3}}", the double curlies are used to escape single curlies, so the regex is "a{3}".
XPath expressions cannot have AVTs inside them: they can be used inside curlies within an AVT, but curlies within a string literal within an XPath expression within an AVT do not need to be doubled. If I write <a val="{concat('{', 3, '}')}"/>, the result will be <a val="{3}"/>. Similarly if I write <a val="{matches($x, 'a{3}')}"/> the result will be true if $x matches the regex "a{3}".
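The triple-brace case is the one most likely to trip people up, so here is a small sketch of it in use. The input string xsssy, the variable $n and the template name main are all made up for the illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="n" select="3"/>
    <!-- AVT expansion: {{ gives {, {$n} gives 3, }} gives }, so the regex used is s{3} -->
    <xsl:analyze-string select="'xsssy'" regex="s{{{$n}}}">
      <xsl:matching-substring>
        <xsl:text>[</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>]</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

This writes x[sss]y: the three s characters are the matching substring, and the x and y pass through the non-matching branch.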