Patterns using text
1. | A few specific values |
To limit the content to one of a few values, say the three characters A, B or C, as in <select>B</select> (an approach shown in Eric Van Der Vlist's book at books.xmlschemata.org), use the following pattern:
<element name="select">
<choice>
<value type="token">A</value>
<value type="token">B</value>
<value type="token">C</value>
</choice>
</element>
The choice of values restricts what the element may contain. Because the type is token, whitespace either side of the value does not cause a failure, so <select> A </select> is also valid. A variant on this is to use the regexp form:
<element name="selectS">
<data type="token">
<param name="pattern">A|B|C</param>
</data>
</element>
This validates just the same as the previous example.
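The pattern parameter belongs to the W3C XML Schema datatype library, not to RELAX NG's built-in library (whose string and token types take no parameters), so the regexp form needs that library declared somewhere above it. A minimal sketch of a complete schema, assuming selectS is the document element:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="selectS">
      <!-- token here is the XSD token type, which accepts the pattern facet -->
      <data type="token">
        <param name="pattern">A|B|C</param>
      </data>
    </element>
  </start>
</grammar>
```

The same datatypeLibrary declaration is assumed by any other pattern on this page that uses param.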
2. | Restricted String length |
To keep a string at or below a maximum length, use the following pattern:
<element name="rlen">
<data type="string">
<param name="maxLength">25</param>
</data>
</element>
This limits the string in the instance to at most the specified number of characters. There is a minimum too. To restrict a value to between a minimum and a maximum length, e.g. between 10 and 25 characters, use the following pattern:
<element name="rlen">
<data type="string">
<param name="minLength">10</param>
<param name="maxLength">25</param>
</data>
</element>
Note that both limits are inclusive. Finally, for a fixed-length element, use the param with name="length", which accepts only that exact length; anything shorter or longer is reported as an error. One final note: string length here normally refers to a count of Unicode characters.
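The fixed-length case can be sketched in the same way. The length of 8 is only an illustrative value, and the XML Schema datatype library is assumed to be declared on an ancestor element:

```xml
<element name="rlen">
  <data type="string">
    <!-- exactly 8 characters; shorter or longer values are rejected -->
    <param name="length">8</param>
  </data>
</element>
```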
3. | Value patterns |
The basic value pattern lets a schema choose between two similar but distinguishable patterns, based on a value in the content. As an example, a URI could represent either a namespace or a real web page:
<uri type="namespace" id="homens">http://www.dpawson.co.uk/ns#</uri>
<uri type="website" id="homesite">http://www.dpawson.co.uk</uri>
The former represents a namespace, the latter a website. The pattern to define these is shown below.
<define name="Uri">
<element name="uri">
<attribute name="id"><text/></attribute>
<choice>
<group>
<attribute name="type">
<value>website</value>
</attribute>
</group>
<group>
<attribute name="type">
<value>namespace</value>
</attribute>
</group>
</choice>
<data type="anyURI"/>
</element>
</define>
Each group is one branch of the choice: either a uri which represents a namespace or a uri which represents a web page. The only differentiator is the value of the type attribute. A simple but useful pattern.
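Since each group above holds nothing but the type attribute, the choice can also be moved inside the attribute. The following is a sketch of that shorter form, which should accept the same instances; the grouped form remains the one to use when each branch carries further attributes or content of its own:

```xml
<define name="Uri">
  <element name="uri">
    <attribute name="id"><text/></attribute>
    <!-- one attribute whose value must be one of the two keywords -->
    <attribute name="type">
      <choice>
        <value>website</value>
        <value>namespace</value>
      </choice>
    </attribute>
    <data type="anyURI"/>
  </element>
</define>
```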
4. | Enumerations |
The basic pattern is shown below; the available values are wrapped in a choice element:
<choice>
<value>A</value>
<value>B</value>
<value>C</value>
</choice>
To use this as the content of an element, wrap it as shown below. Note that the default data type for value is token, which is usually a better option than string:
<element name="letter">
<choice>
<value>A</value>
<value>B</value>
<value>C</value>
</choice>
</element>
This form allows whitespace around the values. For example, if the enumeration is an element, as in the above example, <letter> A </letter> is valid despite the additional whitespace.
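If the surrounding whitespace should be rejected instead, the type can be set to string explicitly. A sketch, reusing the letter element from above:

```xml
<element name="letter">
  <choice>
    <!-- string compares the content exactly, so " A " no longer matches -->
    <value type="string">A</value>
    <value type="string">B</value>
    <value type="string">C</value>
  </choice>
</element>
```

With this form <letter>A</letter> is valid but <letter> A </letter> is not.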
5. | String value exceptions |
On occasion it is useful to allow any string value except a given one. For this purpose the except element is available. As an example, suppose a part number is an eight-character alphanumeric: 4 digits, 2 uppercase letters and 2 more digits. Validation must ensure that part number 1234AB56 is not used in this field. The except element enables this.
<define name="Pn">
<element name="pn">
<data type="token">
<param name="pattern">[0-9]{4}[A-Z]{2}[0-9]{2}</param>
<except>
<value>1234AB56</value>
</except>
</data>
</element>
</define>
This example allows any 4,2,2 combination except the named string. The pattern specifies 4 digits, 2 uppercase letters, then 2 more digits. Other values can be added within the <except> to provide a whole list of exceptions.
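A list of exceptions might look like the sketch below. The two extra part numbers are made-up values for illustration only:

```xml
<define name="Pn">
  <element name="pn">
    <data type="token">
      <param name="pattern">[0-9]{4}[A-Z]{2}[0-9]{2}</param>
      <!-- several values inside except are treated as a choice -->
      <except>
        <value>1234AB56</value>
        <value>0000AA00</value>
        <value>9999ZZ99</value>
      </except>
    </data>
  </element>
</define>
```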
6. | Regex and Grouping. |
As an example of this, I was trying to write a schema for a set of directions: left, right, 3rd left etc. This is the element I defined:
## A list of right or left turns, comma separated.
Directions = element directions {
xsd:string {pattern = " *[0-9]? (right|left)(, *[0-9]?\s?(right|left))*"}
}
OK, and the XML version. After a month with the XML version, I've slowly converted to the compact syntax.
<define name="Directions">
<a:documentation>A list of right or left turns, comma separated.</a:documentation>
<element name="directions">
<data type="string">
<param name="pattern"> *[0-9]? (right|left)(, *[0-9]?\s?(right|left))*</param>
</data>
</element>
</define>
Each entry consists of the optional numeric value and the string right or left. White space (either \s or a space followed by *) separates the parts, and the second group, which starts with a comma, can be repeated. An instance valid against this is:
<directions> 3 right, 7 left, 1 left</directions>
7. | Email address regexp. |
If you need a regular expression for addr-spec, here it is.
The pattern must not contain line breaks! The trailing \ below marks a continuation, so please re-join the lines before use.
start=element addr-spec {
xsd:token {
pattern=
"""([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'* \
+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")@([a-zA-Z0-9!#$%&'*+\-/=? \
\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.) \
*\])"""
}
}
or in XML format (where & must be escaped as &amp;),
<element name="addr-spec">
<data type="token">
<param name="pattern">
([a-zA-Z0-9!#$%&amp;'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&amp; \
'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")@([a-zA-Z0-9!#$%&amp;' \
*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&amp;'*+\-/=?\^_`{|}~]+)* \
|\[([^\[\]\\]|\\.)*\])</param>
</data>
</element>
This works for both normal addresses and forms like "David Tolpin"@[Obscure Place]. The expression above is slightly more permissive than required, but should be appropriate for the majority of cases. John Cowan then sent me ... something. It's either a regex or line noise, I'm still not sure.
8. | up to n mutually exclusive values |
I'm trying to define an attribute that can have a list of one or more (up to three) mutually exclusive values. For example, all the following are valid:
<test state="hide"/>
<test state="hide enable"/>
<test state="hide enable lower"/>
<test state="show raise"/>
<test state="raise show"/>
I tried to specify the element and attribute as follows:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="test"/>
</start>
<define name="test">
<element name="TEST">
<attribute name="state">
<interleave>
<choice>
<value>hide</value>
<value>show</value>
</choice>
<choice>
<value>enable</value>
<value>disable</value>
</choice>
<choice>
<value>raise</value>
<value>lower</value>
</choice>
</interleave>
</attribute>
</element>
</define>
</grammar>
However, when I run jing on this to verify it, I get the following message:
David Tolpin replies: The specification explains this in 7.2 String sequences. One cannot have strings interleaved or grouped. Thus you are probably out of luck unless you list all combinations of attribute constituents.
John Cowan adds: You have stepped on one of the very few arbitrary restrictions of RNG: you can't interleave typed data, only elements and text. You could write a bit of XSLT to generate all the cases. There are 27 logical cases. Total physical cases: 79.
David comes back: One should not go so far. Relax NG is not that bad.
hl=("hide"|"show")?
ed=("enable"|"disable")?
lr=("raise"|"lower")?
state=attribute state {list {
(hl,((ed,lr)|(lr,ed)))
| (lr,((hl,ed)|(ed,hl)))
| (ed,((hl,lr)|(lr,hl)))}}
start = element test { state }
Choice is allowed. But in general, regular expressions with interleave would help. The rng form is:
<define name="Options">
<element name="options">
<ref name="OptAtts"/>
</element>
</define>
<define name="OptAtts">
<attribute name="state">
<list>
<choice>
<group>
<ref name="hl"/>
<choice>
<group>
<ref name="ed"/>
<ref name="lr"/>
</group>
<group>
<ref name="lr"/>
<ref name="ed"/>
</group>
</choice>
</group>
<group>
<ref name="lr"/>
<choice>
<group>
<ref name="hl"/>
<ref name="ed"/>
</group>
<group>
<ref name="ed"/>
<ref name="hl"/>
</group>
</choice>
</group>
<group>
<ref name="ed"/>
<choice>
<group>
<ref name="hl"/>
<ref name="lr"/>
</group>
<group>
<ref name="lr"/>
<ref name="hl"/>
</group>
</choice>
</group>
</choice>
</list>
</attribute>
</define>
<define name="hl">
<optional>
<choice>
<value>hide</value>
<value>show</value>
</choice>
</optional>
</define>
<define name="ed">
<optional>
<choice>
<value>enable</value>
<value>disable</value>
</choice>
</optional>
</define>
<define name="lr">
<optional>
<choice>
<value>raise</value>
<value>lower</value>
</choice>
</optional>
</define>
<options state="show enable"/>
<options state="disable hide lower "/>
<options state="lower "/>
Each of these now shows as valid.
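For contrast, here are a few instances that the same schema should reject, worked through by hand against the options element above rather than taken from a validator run:

```xml
<!-- Invalid: hide and show come from the same mutually exclusive pair -->
<options state="hide show"/>
<!-- Invalid: the same value appears twice -->
<options state="enable enable"/>
<!-- Invalid: four values, but at most one from each of three pairs is allowed -->
<options state="hide enable lower raise"/>
```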
9. | Match any string except a specific one |
If <except> is parallel to <choice>, then it should be possible to have multiple <value> elements inside it.
<except>
<value>foo</value>
<value>fum</value>
<value>bar</value>
</except>
RELAX NG is a well thought out schema format, so much better than W3C schemas ("The Schemas from Hell"). This is the first time I've signed up for relaxng-user, and I'm delighted to find that there are helpful people involved.