Selective Validation

Revision History
Revision 0.1 2008-05-14T12:36:47Z Dave Pawson
Initial Issue

Table of Contents

Context

This chapter covers selective validation: using NVDL to pick out parts of an XML instance for despatch to a specific validator. The first example uses Atom, from the IETF. This is an XML language meant for publishing content, and one use is for a weblog or blog. Atom encapsulates content which may be in the XHTML namespace, which is ideal for a weblog.

There are a couple of wrinkles with this. Firstly, rather than including or referencing the XHTML schema, Atom chose to allow anything in the XHTML namespace within a div element. Secondly, the XHTML schema does not allow div as a start element. One solution would be to modify one or both of the schemas to address these two problems. A better one is to validate the div element and its children twice: once in the loose Atom manner, and a second time using the XHTML schema, via an include that adds the div element as a top-level element. In this way neither 'standard' schema is modified.

The Atom schema can be found at the above URL, where it is incorporated into the RFC (RFC 4287). The XHTML schema is available from the W3C, but a RELAX NG version is available as part of Jing, the RELAX NG validator from James Clark, at this location. The include file which allows div as the start element is shown in Example 8.1.

Example 8.1. A schema to enable a root element of div for xhtml. file xhtmldiv.rng

  
<?xml version="1.0" encoding="utf-8"?>
<grammar  xmlns="http://relaxng.org/ns/structure/1.0">
<include href="xhtml.rng">
<start combine="choice">
   <ref name="div"/>
</start>
</include>
</grammar>




Now for the NVDL script. The key point is to enable validation using both schemas: first against the Atom schema, where validation of the included XHTML is very loose, then by creating a validation candidate in which the content in the XHTML namespace is validated properly, against the XHTML schema. Example 8.2 shows such a script.

Example 8.2. The NVDL script for XHTML embedded in Atom

<?xml version="1.0"?>

<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"
   xmlns:a="http://www.w3.org/2005/Atom" xmlns:x="http://www.w3.org/1999/xhtml"
   startMode="doc">

   <mode name="doc">
       <namespace ns="http://www.w3.org/2005/Atom">
           <validate schema="atom.rng">             <!-- (1) -->
               <context path="feed/entry/content"
                        useMode="xhtml"/>           <!-- (2) -->
           </validate>
       </namespace>
   </mode>

   <mode name="xhtml">
       <namespace ns="http://www.w3.org/1999/xhtml">
           <attach/>                                <!-- (3) -->
           <validate schema="xhtmldiv.rng"/>        <!-- (4) -->
       </namespace>
   </mode>
</rules>

(1) The first section is the Atom content.

(2) On meeting the content element, the xhtml mode is selected.

(3) The first action is to attach this section to the parent element, i.e. to validate it against the Atom schema.

(4) Finally, the same section is validated against the XHTML schema, via the selected include.


The new construct here is the context element. The function of this is to select a new validation candidate based on the relative location of an element in the current context. The current (to this rule) context is defined by the Atom namespace. The content which requires different validation is that in the XHTML namespace, namely the div element within the content element.

Note

The div element is defined in both schemas, which is why it is validated against both.

Important

The path attribute holds the parent of the section to be validated in the changed mode. Hence the new section starts with the first child of the content element.

An instance with this structure is shown in Example 8.3. It is a blog entry from my weblog.

Example 8.3. An XML instance using this schema

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
  xmlns:x="http://www.w3.org/1999/xhtml">
   <title>Dave Pawson's Blog. nodeSets</title>
   <author>
      <name>Dave Pawson</name>
      <uri>http://www.dpawson.co.uk/nodesets/atom.xml</uri>
   </author>
   <category term="nodesets" label="nodesets"/>
   <id>http://www.dpawson.co.uk/nodesets</id>
   <updated>2005-04-15T21:21:00.0Z</updated>
   <subtitle>Getting used to Atom </subtitle>
   <category term="nodesets"/>
   <link rel="self" href="http://www.dpawson.co.uk/nodesets/atom.xml"
         type="application/atom+xml"/>
   <entry>
      <title>Entry title</title>
      <published>2005-04-01T08:00:00Z</published>
      <id xml:lang="en-GB">http://www.dpawson.co.uk/nodesets/050401</id>
      <updated>2005-11-25T12:39:47.0Z</updated>
      <author>
         <name>Dave Pawson </name>
         <uri>http://www.dpawson.co.uk/nodesets</uri>
         <email>blog@dpawson.co.uk</email>
     
      </author>
 
      <category term="ibm" label="entry"/>
      <rights>© Dave Pawson 2005-2008</rights>

      <summary>Entry Summary</summary>
      <link rel="alternate" href="http://www.dpawson.co.uk/nodesets/entries/050401.html"
            type="text/html"/>
      <content type="xhtml">
	<x:div>
	      <x:p>P content</x:p> <!-- OK  -->
	</x:div>
      </content>
   </entry>
</feed>
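
To see why the second pass matters, here is a hedged sketch of a content element that the two schemas should treat differently. It is not taken from the weblog; the x:bogus element is a made-up name used only for illustration. Since Atom allows anything in the XHTML namespace within div, the Atom pass alone should accept it, while the pass against xhtmldiv.rng should report the unknown element.

```xml
<!-- Sketch only: x:bogus is a made-up element name, not part of the original feed -->
<content type="xhtml" xmlns="http://www.w3.org/2005/Atom"
         xmlns:x="http://www.w3.org/1999/xhtml">
  <x:div>
    <x:p>P content</x:p>           <!-- known to both schemas -->
    <x:bogus>Not XHTML</x:bogus>   <!-- in the XHTML namespace, but not an XHTML element -->
  </x:div>
</content>
```

Dropping this in place of the content element in Example 8.3 is one way to check that the NVDL script really is despatching the div to the stricter schema.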

Context

The context element refines the choice of mode used to process child sections. It can appear as a child of any of the following elements:

validate
allow
reject
attach
attachPlaceholder
unwrap

James Clark phrases this differently.

Any element that takes a 'useMode' attribute can also have one or more 'context' children that override the useMode attribute in specific contexts.

This hints at how context modifies the basic useMode action. If the validate element has a useMode attribute, that mode is used for every child section; a context child overrides that choice for child sections whose parent matches its path.

The options are illustrated in Example 8.4.

Example 8.4. validate and context combinations

  

        <!-- Stay in the same mode -->
        <namespace ns="http://B">
            <validate schema="b.rng">
                <context path="a"/>                    <!-- (1) -->
            </validate>
        </namespace>

        <!-- Selective change to mode c1 -->
        <namespace ns="http://B1">
            <validate schema="b.rng">
                <context path="a" useMode="c1"/>       <!-- (2) -->
            </validate>
        </namespace>

        <!-- For context a stay in the same mode;
             for any other context, change to mode B -->
        <namespace ns="http://B2">
            <validate schema="b.rng" useMode="B">      <!-- (3) -->
                <context path="a"/>
            </validate>
        </namespace>

        <!-- For context c change to mode c1, for context d change
             to mode d1; otherwise change to mode B -->
        <namespace ns="http://B3">
            <validate schema="b.rng" useMode="B">      <!-- (4) -->
                <context path="c" useMode="c1"/>       <!-- (5) -->
                <context path="d" useMode="d1"/>
            </validate>
        </namespace>

(1) Neither the validate nor the context element uses the useMode attribute, so no mode change occurs.

(2) The mode is changed only if the a context is met in a child section.

(3) Remain in the same mode for a section matching a, otherwise change to mode B.

(4) This combination switches to mode c1, d1 or B, depending on which context, if any, is matched.

(5) Any section matching the c context is processed in mode c1.


As you can see, you can have quite a few combinations, made all the more complex by the fact that context elements can be added freely within the validate element.

Finally, just a little more about the path attribute. I really can't better James Clark's definition in his NRL tutorial, shown below. It's nearly an XPath expression, but with constraints. I only hope James doesn't mind.

The path attribute allows a restricted form of XPath: a list of one or more choices separated by '|', where each choice is a list of one or more unqualified names separated by '/', optionally preceded by '/'. It is interpreted like a pattern in XSLT, except that the names are implicitly qualified with the namespace URI of the containing 'namespace' element. When more than one path matches, the most specific is chosen. It is an error to have two or more equally specific paths. The path is tested against a single section not the entire document: a path of '/foo' means a 'foo' element that is the root of a section; it does not mean a 'foo' element that is the root of the document.

Note: the names are in the current namespace, so don't add a prefix. Tests are made against a single section, and a leading / means the root of a section, not the root of the document. As with much of James's writing, it is precise and clear; I always need to read it six times to extract all the meaning.
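
As a rough sketch of those rules in use, the fragment below shows a choice of paths and a leading /. It is an illustration rather than a tested script: the mode names xhtml and feedRoot are hypothetical and would need to be declared elsewhere in the rules, and the use of summary alongside content is my own assumption for the example.

```xml
<!-- Sketch only: mode names xhtml and feedRoot are hypothetical -->
<namespace ns="http://www.w3.org/2005/Atom">
    <validate schema="atom.rng">
        <!-- Choice: either path selects the xhtml mode.
             Names are unprefixed; the Atom namespace is implied. -->
        <context path="entry/content|entry/summary" useMode="xhtml"/>
        <!-- Leading '/': feed only when it is the root of a section,
             which need not be the root of the document -->
        <context path="/feed" useMode="feedRoot"/>
    </validate>
</namespace>
```

Here the first context covers two parents with one rule, and the second applies only to child sections of a feed element that starts its own section.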