Regular Expressions in XSLT 2.0

This is an attempt to (partially) explain regular expressions (mostly abbreviated to regex, sorry Wendell) in the XSLT domain. New in XSLT 2.0, I have found them a natural companion to XSLT; sufficiently so that I went out and bought the O'Reilly regex book (Friedl), since the W3C does nothing in terms of explaining regex use within the recommendation.

Another option: this file is a testbed I used whilst playing. It is XSLT based, and generates the HTML output you see on this link.

As a test piece, I'm going to use a repetitious input file, and filter it to death to try and make (some) use of most of the regex capabilities in XSLT. My input file is plain text, the result of doing a full directory listing, on a windows box and a Linux box.

Obtain an XML version of the file.

Firstly then to read in the file, in a usable (XML) format.

          <xsl:variable name="source1" select="'regex.txt'"/>
          <xsl:variable name="encoding" select="'iso-8859-1'"/>
          <xsl:variable name="src">
            <xsl:value-of select="unparsed-text($source1,$encoding)"/>
          </xsl:variable>
        

This reads the whole file in as a single string, but I'll go a stage further: I want XML.

<xsl:variable name="source1" select="'regex.txt'"/>
<xsl:variable name="encoding" select="'iso-8859-1'"/>
<xsl:variable name="src">
  <doc>
  <xsl:for-each select="tokenize(unparsed-text($source1,$encoding), '\n')">
    <line><xsl:value-of select="."/></line>
  </xsl:for-each>
</doc>
</xsl:variable>
         

The result of this is that I have a variable $src, holding an XML document in memory, with one line element for each line of my input text file. If you want to see some results, add

<xsl:result-document
  format = "xmlFormat"
  href = "src1.xml">
  <xsl:copy-of select="$src"/>
</xsl:result-document>

The output file isn't quite what I wanted. The tokenization (sorry about the spelling, W3C rec, not me) is based on using \n, whereas the file I'm using, being MSDOS based, has \r\n as its line ending. Change the splitter token in the tokenize statement to \r?\n and it's much cleaner. I'll assume you've done that, and the output file line endings have changed from

 <line> Volume in drive F is fdisk&#xD;</line>

to

 <line> Volume in drive F is fdisk</line>

For those who didn't know, Linux and windows have different line endings. This windows file uses \r (carriage return, &#x0D;) followed by \n (newline, &#x0A;), whereas Linux uses \n alone.
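To see the difference the splitter makes without needing the listing file, here is a minimal sketch. The inline $text string is an assumption, standing in for the result of unparsed-text(), and deliberately mixes both styles of line ending. The named template main is also only for illustration; it would be run as the initial template (for example with Saxon's -it option).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample text: one \r\n ending and one bare \n ending -->
  <xsl:variable name="text" select="'first&#xD;&#xA;second&#xA;third'"/>

  <xsl:template name="main">
    <doc>
      <!-- \r?\n : an optional carriage return followed by a newline -->
      <xsl:for-each select="tokenize($text, '\r?\n')">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Each of the three line elements should hold just its word, with no trailing &#xD; on the first.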

So now we have some XML to play with. An examination of the XML shows a few lines that I really don't want. I'll remove these using templates, each with a predicate. The full list is:

  1. Contains Volume
  2. Contains nothing, i.e. <line/>
  3. Contains File(s)
  4. Contains Dir(s)
  5. Contains Directory

The remainder are wanted. So now I'm going to apply-templates selecting the variable $src/doc/*. See the source stylesheet for that. Now we can move on to processing the <line> elements of interest.
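The filtering can be sketched roughly as follows. This is not the source stylesheet itself, just one way of doing it under stated assumptions: the sample lines in $src (including the directory name) are made up to resemble a windows listing, and the table markup is a placeholder.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample, shaped like the $src variable built earlier -->
  <xsl:variable name="src">
    <doc>
      <line> Volume in drive F is fdisk</line>
      <line/>
      <line> Directory of F:\xsl</line>
      <line>26/06/2004  15:26               884 version2.xml</line>
      <line>              52 File(s)        612,345 bytes</line>
      <line>               2 Dir(s)   1,234,567,890 bytes free</line>
    </doc>
  </xsl:variable>

  <xsl:template name="main">
    <table>
      <xsl:apply-templates select="$src/doc/*"/>
    </table>
  </xsl:template>

  <!-- One empty template per unwanted kind of line: a match produces nothing -->
  <xsl:template match="line[contains(., 'Volume')]"/>
  <xsl:template match="line[not(normalize-space(.))]"/>
  <xsl:template match="line[contains(., 'File(s)')]"/>
  <xsl:template match="line[contains(., 'Dir(s)')]"/>
  <xsl:template match="line[contains(., 'Directory')]"/>

  <!-- Everything else is a wanted line -->
  <xsl:template match="line">
    <tr><td><xsl:value-of select="."/></td></tr>
  </xsl:template>
</xsl:stylesheet>
```

The templates with a predicate have a higher default priority than the plain match on line, so they win for the unwanted lines and only the version2.xml entry should reach the table.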

Presenting the content using regex

In order to make the output readable in this (HTML) file, I'm going to present the output of the transform as a table, with various pieces of information within it. Each entry will be a row.

Now on to the regular expression use. The places where it is most useful are the <xsl:analyze-string> element and the functions matches($input, $regex) and replace($input, $regex, $replacement). The former is a black and white way of taking a string apart using regex; the latter two are better suited to the more detailed usage here. I'll take this last statement apart a little before moving on.

If the input form is regular, with a number of alternative patterns within a line, then the analyze-string functionality is ideal. For example, given input lines such as

(date)     (withdrawn) (deposited) (who)

12/04/2002  $1202.23  \t\t            Dave
18/04/2002  \t          $120.50       Mary

    (note that I have marked TAB characters as \t)

Then the regularity of this pattern makes an ideal input for a line analysis using the xsl:analyze-string functionality, since each 'entry' can form a group, which may be identified and extracted for markup or further processing. Four columns, (two present or replaced by the tab character), each of a regular format.

Given that some of the input data for this exercise aligns with this regularity, and another (the filenames) is far less regular, then a different approach is more appropriate, based on using <xsl:choose>, with the test attribute using the matches() function.

Put the date into the first column of the table.

Firstly then the <xsl:analyze-string> usage. The form used here to abstract the date is used within the template matching line:

<td>
<xsl:analyze-string 
         regex="
    ^[0-9]{{2}}
    /
     [0-9]{{2}}
    /
     [0-9]{{4}}
     \p{{Zs}}+" 
    flags="ix"
     select=".">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(0)"/>
      </xsl:matching-substring>
<!-- See the source for the non-matching-substring content -->
    </xsl:analyze-string>
</td>


The text presented is a line from the input variable

<line>26/06/2004 15:26 884 version2.xml</line> 

The general form of this piece of xslt is to analyze the string, then process it in one of two ways, either within the <xsl:matching-substring> child, selecting one or more matches, or to process (or not) text which fails to match, within the <xsl:non-matching-substring> child. The example shown includes the non-matching-substring to catch out rogue elements of input (I introduced an 'invalid' line at the bottom of the source file for this purpose). More later.

Looking at the regular expression itself then. Starting at the end... (OK OK), the select attribute says the input text is the current element, the context node, the contents of a line element. Next, the flags attribute (W3C): the i value implies that I want a case-insensitive match. Unused here, but worthy of note. The x value says ignore whitespace within the expression. That allows me to lay out the regex so that it makes more sense to read, and I can explain it more easily. Generally useful, unless you are explicitly matching whitespace, in which case you make other arrangements! Finally, before tackling the regex itself, a note about the braces {{}}. The regex attribute is an attribute value template (AVT), in which XSLT already uses the brace to mark an embedded expression, so the working group decided to escape the brace using double braces; hence to get one brace through the XSLT engine to the regular expression engine, two braces are used. Simple? No it isn't, is it? Now let's take apart the regex.

^[0-9]{{2}} The caret, ^, is a metacharacter, that is, its meaning is special. Interpret it as 'start of line'. Hence the match starts at the first character in the input text. [0-9] is a character class, as opposed to a simple character. Interpret it as any character between the limits either side of the hyphen, -. In this case any single digit. Similarly, [a-z] may be used to represent any alphabetic character in the given range (including [A-Z], since the i flag is present). So the first thing we are looking for is start of line followed by a digit. The {{2}} is read as {2}, and it is interpreted as a count for the preceding item (a single digit in this case). So now we are looking for start of line, two digits. Moving on to the next item.

/ Just a little easier :-) Next (to make a match) look for a slash character... or SOLIDUS if you want the posh Unicode name. That's it, no more. Just a match on a literal character. So for a full match we need... start of line, two digits and a solidus.

[0-9]{{2}} A repeat (nearly) of the first pattern. Another two digits. As an aside, if you want somewhere between 1 and 6 digits, use [0-9]{{1,6}}. If you want zero or one digit use [0-9]?; use [0-9]* if you want to match 0, 1, 2, ... any number of times. Other combinations of 'quantifiers' are available to specify how many of the preceding character class you want to match. See W3C for more. The match pattern grows to start of line, two digits, a solidus and two more digits.

/ OK, you get the picture.

[0-9]{{4}} And more of the same; 4 digits this time. The date pattern is now complete, as start of line, two digits, a solidus, two more digits, another solidus, and finally four digits.

\p{{Zs}}+ A bit different. This is stretching into Unicode territory, and I'm less sure of myself, but this one works. The idea is to match a space character. Zs is the Unicode 'space separator' category, which includes the space character &#x20; and also such things as the non-breaking space &#xA0;, but not the tab or the line-ending characters. The quantifier + states that there must be one, and may be more.
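A quick way to check what falls into the category is to ask matches() directly. The sketch below is illustrative only: the four code points tested (space, no-break space, tab, newline) are my choice, and the named template main is assumed to be run as the initial template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- 32 = space, 160 = no-break space, 9 = tab, 10 = newline -->
    <xsl:for-each select="(32, 160, 9, 10)">
      <!-- select is an XPath expression, not an AVT, so the braces are single -->
      <xsl:value-of select="concat('code point ', ., ' in Zs: ',
                            matches(codepoints-to-string(.), '^\p{Zs}$'),
                            '&#xA;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The first two should report true and the last two false, which is why a tab separated listing would need something other than \p{Zs}.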

I've tried (waiting to see if Tony corrects me!) to show what characters are in what categories, as they are called. It's a separate document, linked from here. Due care please, it is over 1Mbyte in size.

So that completes the pattern. It matches the windows dump of dates and the following whitespace. The next XSLT child is the xsl:matching-substring element

   <xsl:matching-substring>
        <xsl:value-of select="regex-group(0)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>

This is triggered when the regex matches on the input. In this case, on the date at the start of the line. The value-of expression is of interest. I've chosen regex-group(0), which is the entire match: the full date plus the white space that follows it. When a match occurs, the matching text may be further split into parts of the match, or sub-expressions, numbered from 1. If the regex had been written as

<xsl:analyze-string regex="
^([0-9]{{2}})  
/
([0-9]{{2}})
/
([0-9]{{4}})
\p{{Zs}}+" 
flags="ix"
select=".">

(Note the three 'sub-expressions' in round brackets). Here, within the matching-substring element, the function regex-group(3) would return the third matching sub-expression, which would be the year value. Another nice feature of regex!
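As an illustration of what the three groups make possible, this sketch rearranges the date into year-month-day order. The date element and the named template main are assumptions of mine; the input is the sample line quoted earlier, read as dd/mm/yyyy.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="main">
    <xsl:analyze-string select="'26/06/2004 15:26 884 version2.xml'"
        regex="^([0-9]{{2}})/([0-9]{{2}})/([0-9]{{4}})\p{{Zs}}+">
      <xsl:matching-substring>
        <!-- group 1 = day, group 2 = month, group 3 = year -->
        <date>
          <xsl:value-of select="concat(regex-group(3), '-',
                                       regex-group(2), '-',
                                       regex-group(1))"/>
        </date>
      </xsl:matching-substring>
      <!-- no xsl:non-matching-substring, so the rest of the line is dropped -->
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

The result should be a single date element holding 2004-06-26.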

Finishing off with the date then: assuming a match, the final output is the date, wrapped in the <td> element. In summary, the template gave us a match on the line element. The <xsl:analyze-string> element specified this text, and provides the regular expression. Finally the matching-substring and (if needed) the non-matching-substring provide access to the results of the match.

Now what? We have a template matching the line element. We have matched content with the regex for the date, and I want to continue to analyse the input string. The <xsl:analyze-string> has done its job on the first pattern, the date, but how to get to the rest of the string? A number of options are open. I could extend the regex to cope with the remainder of the line of text, and make heavy use of regex-group() to pick out the patterns and process them. I could start to nest more <xsl:analyze-string> elements inside the xsl:non-matching-substring children of higher <xsl:analyze-string> elements. The complexity of this approach begins to mount as the depth increases. I'll extend the match to catch the date and file size. You may note that the file sizes contain a comma to separate the thousands? I want to remove that, to make it work as an integer. The replace() function works nicely for this usage.
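In isolation, the comma removal looks something like this. The value 48,977 is a made-up size string, and the cast to xs:integer is my addition, included only to show that the cleaned string really does behave as a number.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Hypothetical file size as it appears in the windows listing -->
    <xsl:variable name="size" select="'48,977'"/>
    <!-- Replace every comma with nothing, then treat the result as an integer -->
    <xsl:variable name="n" select="xs:integer(replace($size, ',', ''))"/>
    <!-- Arithmetic now works: this should output 48978 -->
    <xsl:value-of select="$n + 1"/>
  </xsl:template>
</xsl:stylesheet>
```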

Mmm. This seems to be working out nicely! Next the filenames. The regex definition I want to use here is to capture, after the size, everything up to the end of the line! The regex to do this is (.*)$ It's a subexpression, since I want to use it in the output. The . (period sign) represents any single character, the * is another quantifier, meaning zero or more, and the $ means stop at the end of the input text. In summary, everything between the last part of the regex (the whitespace following the file size) and the end of the line element.

So the full regular expression is now:

<xsl:analyze-string regex="
^([0-9]{{2}}
/
[0-9]{{2}}
/
[0-9]{{4}})
\p{{Zs}}+
([0-9]{{2}}:[0-9]{{2}})
\p{{Zs}}+
([0-9,]+)
(.*)$
" 
flags="ix"
select=".">

If you can spot the () pairings, you'll see the output cells, from

  <xsl:matching-substring>
        <td><xsl:value-of select="regex-group(1)"/></td>
        <td><xsl:value-of select="regex-group(2)"/></td>
        <td><xsl:value-of 
               select="replace(regex-group(3),',','')"/></td>
        <td><xsl:value-of select="regex-group(4)"/></td>
      </xsl:matching-substring>
         <xsl:non-matching-substring>
        <td><b style="color:red">Invalid date format</b></td>
        [<xsl:value-of select="substring(.,1,10)"/>]
      </xsl:non-matching-substring>

Each regex-group() picks out another wanted piece of the puzzle. The non-matching-substring is used to show up the invalid line. Very handy to show up those strings that don't fit with your specified pattern!

Just to show how the other form of regex can be used, I'm going to process the filenames even further. I want to group them according to the following rules.

I'll color them blue, green, yellow and red; but further xml markup might be more useful for automated processing. Either way, we can use regex to help.

regex-group(4) holds the data of interest, so I'll process this using the <xsl:choose> method above. The syntax for this is

matches($input , 
       $pattern , 
       $flags ) 

This provides

  <td><xsl:value-of select="replace(regex-group(3),',','')"/></td>
        <td>
        <xsl:choose>
          <xsl:when test="matches(regex-group(4),'[a-z]+.xml$','xi')">
            <span 
             class="blue">
              <xsl:value-of select="regex-group(4)"/>
            </span>
          </xsl:when>
          <!-- Add the other cases in as needed, 
             as well as the otherwise to trap exceptions-->
        </xsl:choose>
      </td>
    </xsl:matching-substring>
    

The only new regex idea here is the $ added to the end of the expression. Remember the ^ which was the metacharacter for the start of line? This is the metacharacter for the end of line. A regex of .xml would match both .xml and .xml~, so the $ says 'and no more!', i.e. the match must be the last item in the input string.

I'm going to use the flags option, setting the ix flags, so that I can lay the regular expression out clearly. The first grouping is those files with names of the form abc.xml, and since the i flag is in use, the regex is simply [a-z]+.xml I know that some files have digits in the names, but I want to point those out for checking, so I'm using the previous expression rather than including the digits using [a-z0-9]+.xml which would allow for sect21.xml. Oh, OK, let's mark them up the same way, coloured blue. Note that the digits are optional, not mandatory, hence I've used the regex '[a-z]+[0-9]*.xml$' where the * in [0-9]* says it's OK to have zero or more digits.

The next group, N([0-9]+).xml, is easy enough, using a regex of N[0-9]+.xml$ The N is, again, a literal match. So, in intent, is the .xml at the end, although strictly an unescaped . matches any single character; write \.xml if you need to insist on a literal period. Note the use of the brackets round the number? As used earlier, this is a means of accessing specific sub-patterns within the match. The difference is the syntax used to get at them, since there is no regex-group() this time: inside the replacement string of replace(), use the classic regex access scheme, $1, to access the number, which is the only grouping. If there had been a few, these would have been sequentially numbered. This could be to enable the numerically named files to be sorted. I'll put them in italics.
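The sorting idea mentioned above might look like the sketch below. It is not part of the original stylesheet: the three filenames are taken from the listing, while the files and file elements, the n attribute and the escaped \. are my own choices for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="main">
    <files>
      <xsl:for-each select="('N10228.xml', 'N138.xml', 'N1030.xml')">
        <!-- $1 pulls out the digits; casting to xs:integer makes the sort numeric -->
        <xsl:sort select="xs:integer(replace(., '^N([0-9]+)\.xml$', '$1'))"/>
        <file n="{replace(., '^N([0-9]+)\.xml$', '$1')}">
          <xsl:value-of select="."/>
        </file>
      </xsl:for-each>
    </files>
  </xsl:template>
</xsl:stylesheet>
```

The files should come out in the order N138.xml, N1030.xml, N10228.xml, where a plain string sort would have put N10228.xml first.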

Note that the order of the when clauses is important. This is a choose statement, so once a when clause is matched, no others are tested, so we need to move from the most specific through to the least specific.

The final sought group is specified using '[a-z0-9]+.xml~$' which simply accepts most filenames, and seeks out the tilde on the end. If I knew there might be other than .xml files in this directory, it would be more complex.

This leaves the unwanted ones, which nicely fit into the otherwise clause of the choose. These are coloured red (Yes there is one). This section of the stylesheet now looks like this.

         <td>
        <xsl:choose>
         <xsl:when test="matches(regex-group(4),'N[0-9]+.xml$','xi')">
            <i><xsl:value-of 
                select="replace(regex-group(4),'N([0-9]+).xml','$1')"/>
            </i>
            <span class="green">
              <xsl:value-of select="regex-group(4)"/>
            </span>
          </xsl:when>
          <xsl:when test="matches(regex-group(4),'[a-z0-9]+.xml~$','xi')">
            <span class="yellow"><xsl:value-of select="regex-group(4)"/></span>
          </xsl:when>
          <xsl:when test="matches(regex-group(4),'[a-z]+[0-9]*.xml$','xi')">
            <span class="blue"><xsl:value-of select="regex-group(4)"/></span>
          </xsl:when>
 
          <xsl:otherwise>
            <span class="red"><xsl:value-of select="regex-group(4)"/></span>
          </xsl:otherwise>
        </xsl:choose>
      </td>

That completes this example.

This is the output of the windows file transformation.

27/09/2003 12:36 4500 andAndOr.xml
27/09/2003 12:36 2565 apply-imports.xml
27/09/2003 12:36 2054 applytemplates.xml
22/03/2004 15:53 16141 approaches.xml
10/05/2003 20:10 6858 ArchForms.xml
18/07/2004 12:02 48977 autolayout.xml
15/12/2003 16:19 5986 barcodes.xml
29/05/2004 12:34 25928 bestpractice.xml
25/04/2004 14:00 3298 bib.xml
18/07/2004 11:57 7474 bidi.xml
15/02/2004 14:24 5687 bidi.xml~
18/04/2004 09:46 1954 bk.xml
25/04/2004 11:05 8762 bool.xml
15/07/2001 20:06 20998 braille.xml
25/04/2004 11:07 6764 break.xml
27/09/2003 12:36 1710 casefor.xml
27/09/2003 12:36 2836 catalog.xml
25/04/2004 11:14 17798 cdata.xml
27/09/2003 12:36 2116 checkbox.xml
27/09/2003 12:36 2957 colour.xml
28/09/2003 18:55 6605 comments.xml
27/09/2003 12:36 2853 conform.xml
25/04/2004 11:50 7979 conformance.xml
27/09/2003 12:36 4500 Copy of andAndOr.xml
27/09/2003 12:36 2054 Copy of applytemplates.xml
24/04/2004 19:27 4230 csv2xml.xml
18/07/2004 11:15 19494 dates.xml
27/09/2003 12:36 1345 10228 N10228.xml
25/04/2004 21:56 5575 1030 N1030.xml
24/01/2004 15:50 53814 10301 N10301.xml
27/09/2003 12:36 4880 10325 N10325.xml
27/09/2003 12:36 3601 10378 N10378.xml
27/09/2003 12:36 1122 10446 N10446.xml
27/09/2003 12:36 13273 10590 N10590.xml
18/04/2004 10:41 15664 138 N138.xml
25/04/2004 10:52 53987 1553 N1553.xml
26/06/2004 12:57 7444 1575 N1575.xml
29/05/2004 12:55 38050 1641 N1641.xml
25/04/2004 14:07 694 1665 N1665.xml
27/09/2003 12:36 11124 169 N169.xml
25/04/2004 11:08 1718 1711 N1711.xml
27/09/2003 12:36 15949 1755 N1755.xml
25/04/2004 11:44 7957 1777 N1777.xml
25/04/2004 21:55 1334 1821 N1821.xml
25/04/2004 11:54 2314 1843 N1843.xml
25/04/2004 11:59 25464 1930 N1930.xml
18/07/2004 12:31 14961 xslfaq.xml
18/07/2004 12:00 14406 xslfaq.xml~
27/09/2003 12:36 3529 xslinclude.xml
27/09/2003 12:36 1535 xsloutput.xml
27/09/2003 12:36 3307 xsltuk.xml
27/09/2003 12:36 2108 xsltut.xml
27/09/2003 12:36 3803 xslvalueof.xml
22/03/2004 15:20 26548 xslvocab.xml
22/03/2004 15:20 26548 badname.xml.ext
Invalid date format [ 01/04/200]

The Linux listing example

Starting in the same way, I'll read the file into a variable, tokenize it and then apply-templates to that variable's structure. I've created $src2 to hold it, and named the line elements line2 to keep the processing simple. Note that this time, I can tokenize on \n, since this is a Linux listing, which only uses the newline character. Again the stylesheet generates an intermediate listing file, src2.xml, should you wish to peruse it.

There is more information provided here, and I only want to focus on a small part of it. The permissions fields, the filename, and the date. In order to do this I first need to extract it.

The goal, as before, is a table, with each line providing

filename File type Owners permissions(read, write, execute) Date(dd/mm/yyyy)

So first of all then, to isolate the lines of interest. The code is identical to the windows example, with just a change to the regex and the selection of groups. Firstly the regex.

     <xsl:analyze-string regex="
1                                ^([\-drwx]{{10}})  
                                \p{{Zs}}+          
2                                [0-9]{{1,}}       
                                \p{{Zs}}+          
3                                [a-z_]+             
                                \p{{Zs}}+          
4                                [a-z_]+             
                                \p{{Zs}}+
5                                [0-9]+
                                \p{{Zs}}+                                     
6                                (                  
7                                [a-z]{{3}}
                                \p{{Zs}}+          
8                                [0-9]+
                                \p{{Zs}}+
9                                ([0-9][0-9]:[0-9][0-9] | [0-9]{{4}})
10                                )     
                                \p{{Zs}}+
11                                (.*)$
                                 
                                 " 
        flags="ixs"
        select=".">

Note. The numbers down the left-hand side are my annotations, not part of the regex. No comments are allowed within a regular expression (here held in the regex attribute, which is an Attribute Value Template), so remove them before use. Taking each numbered item one at a time. Just remember the class of line we are analysing. A few examples are shown below.

drwx------   2 dpawson dpawson    4096 Apr 28 08:09 .adobe
-rw-rw-r--   1 dpawson dpawson      24 Jul 22 19:18 .aspell.en.prepl
-rw-r--r--   1 dpawson dpawson      24 Oct 28  2003 .bash_logout

1. This matches the file permissions. Any combination of ten characters from the class containing - d r w x .

2. This matches the number of links. The quantifier {{1,}} is interpreted as at least one, maybe more.

3. This matches the user name, though it probably needs expanding significantly to be usable generally.

4. This matches the group

5. The file size is a string of digits

6. A grouping indicator. Needed to collect all the parts of the date.

7. The month. 3 characters.

8. The day of the month, one or two digits

9. Look at the example entries I've chosen. One has a date and time, one has a date and the year? This piece of the regex is grouped using (), then there are two options (signified by the pipe symbol | ). The first matches the time, the second matches the four digits of a year. This catches both cases.

10. Finally close off the date grouping symbol

11. Everything from here to the end of the input is signified by the (.*)$, just as we did before. I have introduced a new flag, the s flag. The rec describes this as

If the s flag is not specified, the metacharacter . matches any character except a newline (#x0A) character. In dot-all mode, the metacharacter . matches any character whatsoever.

So it's called the dot-all mode, and it affects the interpretation of the dot metacharacter. Not essential here, simply used to show its use. Now the only flag not mentioned is the m or multi-line flag, which makes ^ and $ match at the start and end of each line within the input, rather than only at the start and end of the whole string. I won't be using that in this example.
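The effect of the s and m flags is easiest to see on a string that contains a newline. This is a small sketch with a made-up two-line string (a, newline, b); the variable names and the named template main are assumptions for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="nl" select="codepoints-to-string(10)"/>
    <!-- Two lines held in one string: a, newline, b -->
    <xsl:variable name="t" select="concat('a', $nl, 'b')"/>

    <!-- Without m, ^ and $ anchor to the whole string only: false -->
    <xsl:value-of select="concat('^b$ no flags: ', matches($t, '^b$'), $nl)"/>
    <!-- With m, ^ and $ also match at line boundaries: true -->
    <xsl:value-of select="concat('^b$ flag m:   ', matches($t, '^b$', 'm'), $nl)"/>
    <!-- Without s, the dot will not match the newline: false -->
    <xsl:value-of select="concat('a.b no flags: ', matches($t, 'a.b'), $nl)"/>
    <!-- With s (dot-all), the dot matches the newline too: true -->
    <xsl:value-of select="concat('a.b flag s:   ', matches($t, 'a.b', 's'), $nl)"/>
  </xsl:template>
</xsl:stylesheet>
```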

That concludes the first part. We now have the content marked up ready for presentational sorting.

Sorting the presentation

The approach is identical to the windows version: use an xsl:choose statement, selecting required content and marking up or modifying (using replace as necessary). The first item to resolve then is the file type, determined by the first character of the permissions block of text, with the following interpretations:

Hence the regex is a choose on the first character of the first regex-group() of the matching-substring of the analyze-string! This cell of the table now looks like this

       <td>
          <xsl:choose>
            <xsl:when test="matches(regex-group(1), '^-[\-rwx]{9}')">
              <b>plain</b>
            </xsl:when>
            <xsl:when test="matches(regex-group(1), '^b[rwx\-]{9}')">
              <b>block</b>
            </xsl:when>
            <xsl:when test="matches(regex-group(1), '^d[rwx\-]{9}')">
              <b>dir</b>
            </xsl:when>
            <xsl:otherwise>
              <span class="red">Unknown filetype: 
              <xsl:value-of select="substring(regex-group(1),1,1)"/></span>
            </xsl:otherwise>
          </xsl:choose>
         </td>
      

Note that the regex is no longer using double braces {{}}. The reason is that the test attribute holds an XPath expression rather than an attribute value template; I asked for clarification from Mike Kay, and his reply is given in the section on Attribute Value Templates at the end. Again I've used the otherwise clause to trap an error case (and introduced one, the last one, by changing the value to X).

That completes the file type. Now for owner permissions. We need another table cell for this. These permissions are in 3 groups for the owner (of the file), the group permissions, and finally the 'world'. I'll process them for the group, since the other two repeat this process and hence there is no learning. Each set is a combination of rwx-, meaning this entity has read, write, execute or no privileges on this file. Hence the process is to abstract the group, process them and produce appropriate markup. I've used two functions to obtain string values. Alternatives would be equally easy.
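The source stylesheet is not reproduced here, so the following is only a guess at the shape of that processing, using substring() and matches(). The $perms value stands in for regex-group(1), and the variable names and the bracketed output format (copied from the listing output) are assumptions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Stand-in for regex-group(1), the ten-character permissions block -->
    <xsl:variable name="perms" select="'-rw-r--r--'"/>
    <!-- Characters 2 to 4 are the first set of three; start at 5 or 8 for the other sets -->
    <xsl:variable name="set" select="substring($perms, 2, 3)"/>
    <!-- Build a sequence holding a word for each privilege that is present -->
    <xsl:variable name="words" select="
        (if (matches($set, '^r'))   then 'Read'  else (),
         if (matches($set, '^.w'))  then 'Write' else (),
         if (matches($set, '^..x')) then 'Exec'  else ())"/>
    <xsl:value-of select="concat('[ ', string-join($words, ' '), ']')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample value this should give [ Read Write], matching the style of the table entries.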

That completes this example. Please report any typos and errors.

This is the output of the Linux file transformation.

. dir [ Read Write Exec] Aug 4 08:55
.. dir [ Read Write Exec] Jul 11 11:07
.acrobat dir [ Read Write Exec] Jul 21 18:05
.adobe dir [ Read Write Exec] Apr 28 08:09
a.out plain [ Read Write Exec] Apr 26 17:47
.aspell.en.prepl plain [ Read Write] Jul 22 19:18
.aspell.en.pws plain [ Read Write] Jul 22 19:18
.bash_history plain [ Read Write] Aug 2 20:07
.bash_logout plain [ Read Write] Oct 28 2003
.bash_profile plain [ Read Write] Oct 28 2003
.bashrc plain [ Read Write Exec] Jul 9 19:51
bin dir [ Read Write Exec] Jul 28 08:23
bookmarks.html plain [ Read Write] Jul 30 19:12
cd dir [ Read Write Exec] Jul 21 18:33
.cddbslave dir [ Read Write Exec] Jul 11 13:31
common dir [ Read Write Exec] Mar 7 09:58
contacts.ldif plain [ Read Write Exec] Jul 17 12:08
csrc dir [ Read Write Exec] Jun 15 20:04
.cvspass plain [ Read Write] Mar 6 19:29
.cvsrc plain [ Read Write] Mar 6 19:28
Desktop dir [ Read Write Exec] Jul 30 19:38
.dmrc plain [ Read Write] Feb 27 18:39
Document1.txt plain [ Read Write Exec] Jul 9 19:28
dpawson dir [ Read Write Exec] Apr 29 18:32
.emacs plain [ Read Write] Jul 24 12:38
.emacs~ plain [ Read Write] Jul 19 20:59
.emacs.d dir [ Read Write Exec] Feb 29 10:04
.esd_auth plain [ Read Write] Feb 27 18:54
evolution dir [ Read Write Exec] Aug 3 19:21
fedora-docs dir [ Read Write Exec] Jul 28 19:00
fedoranotes.txt plain [ Read Write] Jul 10 20:03
.fetchmailrc plain [ Read Write Exec] Feb 29 11:43
files dir [ Read Write Exec] Aug 1 18:51
.fonts.cache-1 plain [ Read Write] Aug 4 08:55
.forward plain [ Read Write] Mar 2 06:39
.fullcircle.mac.txt dir [ Read Write Exec] Jul 27 13:30
.gaim dir [ Read Write Exec] Aug 3 19:21
.gconf dir [ Read Write Exec] Aug 4 08:47
.gconfd dir [ Read Write Exec] Aug 4 08:55
getopts.py plain [ Read Write] Jul 21 20:23
.gimp-1.2 dir [ Read Write Exec] Jul 23 18:18
.gnome dir [ Read Write Exec] Feb 27 18:39
.gnome2 dir [ Read Write Exec] Aug 3 19:21
.gnome2_private dir [ Read Write Exec] May 19 19:48
.gnupg dir [ Read Write Exec] Jul 14 20:51
.gpilotd dir [ Read Write Exec] Jul 27 18:20
.gpilotd.pid plain [ Read Write] Jul 27 18:20
.gstreamer dir [ Read Write Exec] Jul 11 14:15
.gtkrc plain [ Read Write] Oct 30 2003
.gtkrc-1.2-gnome2 plain [ Read Write] Feb 27 18:39
ham dir [ Read Write Exec] Jul 12 18:59
.ICEauthority plain [ Read Write] Aug 4 08:47
iptabs.txt plain [ Read Write] Jul 19 06:41
.java dir [ Read Write Exec] Jul 30 19:51
.kde dir [ Read Write Exec] Feb 24 20:19
mail dir [ Read Write Exec] Mar 1 19:49
Mail dir [ Read Write Exec] Mar 1 20:07
.mcop dir [ Read Write Exec] Jun 25 18:58
memTest dir [ Read Write Exec] Apr 30 19:47
.metacity dir [ Read Write Exec] Feb 27 18:39
.mozilla dir [ Read Write Exec] Jul 25 12:18
music dir [ Read Write Exec] Jul 11 14:09
mypgp.txt plain [ Read Write] Jul 14 20:52
my_rules_du_jour plain [ Read Write] Jul 28 08:13
.mysql_history plain [ Read Write] Jul 28 18:33
.nautilus dir [ Read Write Exec] Feb 27 18:39
NNdbase.txt plain [ Read Write Exec] Jul 7 17:38
notes.txt plain [ Read Write] Jul 28 18:22
notes.txt~ plain [ Read Write] Jul 12 21:05
palm dir [ Read Write Exec] May 1 16:25
pgpsig.txt plain [ Read Write] Jul 15 18:27
.phoenix dir [ Read Write Exec] May 1 12:58
pkgs.txt plain [ Read Write] Jun 29 19:01
Procmail dir [ Read Write Exec] Mar 2 06:43
.procmailrc plain [ Read Write] Mar 2 06:50
.recently-used plain [ Read Write] Aug 1 18:20
redhat-logviewer dir [ Read Write Exec] Mar 6 19:52
regex.py plain [ Read Write] Jul 19 20:59
regex.txt~ plain [ Read Write] Jul 23 19:20
rest dir [ Read Write Exec] Jun 29 19:23
res.txt plain [ Read Write] Jul 20 19:02
.rhn-applet dir [ Read Write Exec] Feb 27 18:40
.rhn-applet.conf plain [ Read Write] Feb 27 18:39
saconf.txt plain [ Read Write] Jul 12 20:34
sig.txt plain [ Read Write] Jul 9 19:14
spam dir [ Read Write Exec] Jul 12 18:59
.spamassassin dir [ Read Write Exec] Aug 4 09:38
spam.txt plain [ Read Write] Jul 20 18:59
speedtouch.pdf plain [ Read Write] Jul 21 19:50
.ssh dir [ Read Write Exec] Jul 24 10:00
struct.txt plain [ Read Write Exec] Jul 9 17:15
subvers dir [ Read Write Exec] May 28 19:40
.subversion dir [ Read Write Exec] May 28 19:37
.thumbnails dir [ Read Write Exec] Jul 25 11:50
tmp dir [ Read Write Exec] Jul 23 17:04
tmp.txt plain [ Read Write] Jul 2 19:52
.Trash dir [ Read Write Exec] Jul 30 19:19
user_prefs plain [ Read Write] Jul 12 20:35
user_prefsOLD plain [ Read Write] Jul 12 20:31
UsingC.xml plain [ Read Write] May 20 18:40
Various dir [ Read Write Exec] Jul 11 14:02
.viminfo plain [ Read Write] Jul 12 17:44
vi.txt plain [ Read Write] Jul 11 16:01
vs.jpg plain [ Read Write] Jul 23 17:23
words.txt plain [ Read Write] Jul 23 18:47
words.txt~ plain [ Read Write] Jul 22 19:25
.Xauthority plain [ Read Write] Aug 4 08:47
.xchat2 dir [ Read Write Exec] Jul 1 20:11
.xmms dir [ Read Write Exec] Jul 11 14:10
.xsession-errors plain [ Read Write] Aug 4 08:48
.xvpics dir [ Read Write Exec] Jul 23 17:23
yumdoco.txt plain [ Read Write] Jul 8 19:40
AprilFirst.txt Unknown filetype: X [ Read Write] Apr 1 19:40

Attribute Value Templates, curlies and double curlies.

I asked Michael Kay to expand on AVT's and curly brackets {}. He replied

In XSLT, there are some attributes whose value is an XPath expression (e.g. select, test), and there are some attributes whose value is an AVT (notably attributes of literal result elements, also a few others like xsl:element/@name and xsl:analyze-string/@regex.)

In the first case, curly braces have no special significance. Curlies are not used in the XPath grammar, so the only place curlies can appear is within comments or string literals. For example,

<xsl:if test="matches($a, 'x{2}')">
    

uses curlies within a string literal (which happens to be a regular expression). They are written as themselves, just like any other character.

In the second case (AVTs), curly braces are used to separate the fixed part of the attribute value from the variable part. For example <a href="{$file}.html"/>. If you want to generate an attribute value that actually contains a curly brace, you need to double it. For example, to generate <a foo="{bar}"/> you need to write <a foo="{{bar}}"/>.

The regex attribute of xsl:analyze-string is an AVT, so if you want to use the regular expression "s{3}" (which matches a sequence of three s's) you need to write it as regex="s{{3}}". If you wrote regex="s{3}", the "3" would be interpreted as an XPath expression and would be evaluated and replaced by its value, so the actual regular expression used would be "s3". On the other hand, if you want to match a sequence of $n instances of s, where $n is an XPath variable, you could write regex="s{{{$n}}}"; if $n has the value 5 this would expand to the regular expression "s{5}".

If you write regex="a*", the regex is "a*". If you write regex="{$x}", this will use the regex contained in XPath variable $x. If you write regex="a{2+2}", this will be treated as an AVT, so the actual regex will be "a4". If you write regex="a{{3}}", the double curlies are used to escape single curlies, so the regex is "a{3}".

XPath expressions cannot have AVTs inside them: they can be used inside curlies within an AVT, but curlies within a string literal within an XPath expression within an AVT do not need to be doubled. If I write <a val="{concat('{', 3, '}')}"/>, the result will be <a val="{3}"/>. Similarly if I write <a val="{matches($x, 'a{3}')}"/> the result will be true if $x matches the regex "a{3}".
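The two cases can be put side by side in one small stylesheet. This is a sketch of my own rather than part of the reply above: the input string xsssx, the out, hit and same elements, and the named template main are all made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="main">
    <xsl:variable name="n" select="3"/>
    <xsl:variable name="input" select="'xsssx'"/>
    <out>
      <!-- regex is an AVT: doubled braces are literal, {$n} is evaluated, giving s{3} -->
      <xsl:analyze-string select="$input" regex="s{{{$n}}}">
        <xsl:matching-substring>
          <hit><xsl:value-of select="."/></hit>
        </xsl:matching-substring>
      </xsl:analyze-string>
      <!-- test is an XPath expression: braces in the string literal are written once -->
      <xsl:if test="matches($input, 's{3}')">
        <same/>
      </xsl:if>
    </out>
  </xsl:template>
</xsl:stylesheet>
```

Both forms apply the regex s{3}, so the output should contain a hit element holding sss followed by the empty same element.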
