This chapter covers selective validation: using NVDL to pick out
parts of an XML instance and dispatch them to a specific validator.
The first example uses Atom, from the IETF, an XML language meant
for publishing content such as a weblog or blog. Atom encapsulates
content which may be in the XHTML namespace, which is ideal for a
weblog. There are a couple of wrinkles with this. First, rather than
including or referencing the XHTML schema, Atom chose to allow
anything in the XHTML namespace within a div element. Second, the
XHTML schema does not allow div as a start element. One solution
would be to modify one or both of the schemas to address these two
problems. A better one is to validate the div element and its
children twice: once in the loose Atom manner, and a second time
against the XHTML schema, via an include that makes div a permitted
top-level element. In this way neither 'standard' schema is modified.
The Atom schema is incorporated into the Atom RFC. The XHTML schema
is available from the W3C, but a RELAX NG version is available as
part of Jing, the RELAX NG validator from James Clark. The include
file which allows div as a start element is shown in Example 8.1.
Example 8.1. A schema to enable a root element of div for XHTML (file xhtmldiv.rng)
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="xhtml.rng">
    <start combine="choice">
      <ref name="div"/>
    </start>
  </include>
</grammar>
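One point worth checking against the RELAX NG specification: a start element placed inside include replaces the start of the included grammar instead of combining with it. Example 8.1 therefore makes div the only permitted root, which is all this chapter needs. If you also wanted to keep the original root (html) valid, a sketch of the alternative puts the start beside the include instead of inside it. The file name xhtmldiv-choice.rng is made up for illustration, and the sketch assumes, as Example 8.1 does, that xhtml.rng defines a pattern named div.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical file xhtmldiv-choice.rng; not part of the original example set -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Include the XHTML grammar unchanged -->
  <include href="xhtml.rng"/>
  <!-- Outside the include, combine="choice" adds div to the existing start -->
  <start combine="choice">
    <ref name="div"/>
  </start>
</grammar>
```

Either form leaves the distributed XHTML schema untouched, which is the point of the exercise.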
Now for the NVDL script. The key point is to enable validation using both schemas: first the Atom schema, where validation of the included XHTML is very loose, and then a second validation candidate in which the content in the XHTML namespace is validated properly, against the XHTML schema. Example 8.2 shows such a script.
Example 8.2. The NVDL script for XHTML embedded in Atom
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"
       xmlns:a="http://www.w3.org/2005/Atom"
       xmlns:x="http://www.w3.org/1999/xhtml"
       startMode="doc">
  <mode name="doc">
    <namespace ns="http://www.w3.org/2005/Atom">
      <validate schema="atom.rng">
        <context path="feed/entry/content" useMode="xhtml"/>
      </validate>
    </namespace>
  </mode>
  <mode name="xhtml">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <attach/>
      <validate schema="xhtmldiv.rng"/>
    </namespace>
  </mode>
</rules>
The new construct here is the context element. Its function is to
select a new validation candidate based on the relative location of
an element in the current context. The current context (for this
rule) is defined by the Atom namespace. The content which requires
different validation is that in the XHTML namespace, namely the div
element within the content element.
Note: The div element is defined in both schemas, which is why it is validated against both schemas.
Important: Note that the …
An instance which uses this form is shown in Example 8.3. This is a blog entry from my weblog.
Example 8.3. An XML instance using this schema
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:x="http://www.w3.org/1999/xhtml">
  <title>Dave Pawson's Blog. nodeSets</title>
  <author>
    <name>Dave Pawson</name>
    <uri>http://www.dpawson.co.uk/nodesets/atom.xml</uri>
  </author>
  <category term="nodesets" label="nodesets"/>
  <id>http://www.dpawson.co.uk/nodesets</id>
  <updated>2005-04-15T21:21:00.0Z</updated>
  <subtitle>Getting used to Atom </subtitle>
  <category term="nodesets"/>
  <link rel="self" href="http://www.dpawson.co.uk/nodesets/atom.xml"
        type="application/atom+xml"/>
  <entry>
    <title>Entry title</title>
    <published>2005-04-01T08:00:00Z</published>
    <id xml:lang="en-GB">http://www.dpawson.co.uk/nodesets/050401</id>
    <updated>2005-11-25T12:39:47.0Z</updated>
    <author>
      <name>Dave Pawson </name>
      <uri>http://www.dpawson.co.uk/nodesets</uri>
      <email>blog@dpawson.co.uk</email>
    </author>
    <category term="ibm" label="entry"/>
    <rights>© Dave Pawson 2005-2008</rights>
    <summary>Entry Summary</summary>
    <link rel="alternate" href="http://www.dpawson.co.uk/nodesets/entries/050401.html"
          type="text/html"/>
    <content type="xhtml">
      <x:div>
        <x:p>P content</x:p> <!-- OK -->
      </x:div>
    </content>
  </entry>
</feed>
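To see why the second validation matters, consider a variant of the content element from Example 8.3. This is a sketch, not tested output: the x:para element is made up for illustration, and the exact error message depends on the validator. The Atom schema accepts any element in the XHTML namespace inside the div, so it would let x:para through; the XHTML schema has no such element, so the validation against xhtmldiv.rng should reject it. The x prefix is assumed to be bound to the XHTML namespace on the feed element, as in Example 8.3.

```xml
<content type="xhtml">
  <x:div>
    <x:p>P content</x:p>        <!-- OK in both schemas -->
    <!-- Hypothetical element: passes the loose Atom check,
         but should fail against xhtmldiv.rng -->
    <x:para>Not XHTML</x:para>
  </x:div>
</content>
```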
The context element refines the selection process by which an action
chooses the mode for child sections. It can appear as a child of any
of the following elements:
validate
allow
reject
attach
attachPlaceholder
unwrap
James Clark phrases this differently:
Any element that takes a 'useMode' attribute can also have one or more 'context' children that override the 'useMode' attribute in specific contexts.
This hints at how context modifies the basic useMode action. For example, if the validate element has a useMode attribute, then rather than changing
modes for every child section, a context child constrains or overrides the
selection of the next mode.
The options are illustrated in Example 8.4.
Example 8.4. validate and context combinations
<!-- Stay in the same mode -->
<namespace ns="http://B">
  <validate schema="b.rng">
    <context path="a"/>
  </validate>
</namespace>

<!-- Selective change to mode c1 -->
<namespace ns="http://B1">
  <validate schema="b.rng">
    <context path="a" useMode="c1"/>
  </validate>
</namespace>

<!-- For context a stay in the same mode,
     for any other context change to mode B -->
<namespace ns="http://B2">
  <validate schema="b.rng" useMode="B">
    <context path="a"/>
  </validate>
</namespace>

<!-- For context c change to mode c1,
     for context d change to mode d1,
     otherwise change to mode B -->
<namespace ns="http://B3">
  <validate schema="b.rng" useMode="B">
    <context path="c" useMode="c1"/>
    <context path="d" useMode="d1"/>
  </validate>
</namespace>
As you can see, you can have quite a few combinations, made all
the more complex by the fact that context
elements can be added freely within the validate element.
Finally, just a little more about the path attribute. I really can't better James
Clark's definition in his NRL tutorial, shown below. It's nearly an
XPath expression, but with constraints. I only hope James doesn't mind.
The path attribute allows a restricted form of XPath: a list of one or more choices separated by '|', where each choice is a list of one or more unqualified names separated by '/', optionally preceded by '/'. It is interpreted like a pattern in XSLT, except that the names are implicitly qualified with the namespace URI of the containing 'namespace' element. When more than one path matches, the most specific is chosen. It is an error to have two or more equally specific paths. The path is tested against a single section not the entire document: a path of '/foo' means a 'foo' element that is the root of a section; it does not mean a 'foo' element that is the root of the document.
Note: The names are in the current namespace (don't add a prefix). Tests are against a single section, and finally / means the root of a section! As with much of James's writing, it is precise and clear. I always need to read it six times to extract all the meaning.
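To make these rules concrete, here is a sketch of how the doc mode of Example 8.2 might use other forms of path. It is untested, and the choice of elements is an assumption made for illustration: Atom also allows XHTML in summary and title when type="xhtml", so those are used as plausible targets. The xhtml mode is the one already defined in Example 8.2.

```xml
<mode name="doc">
  <namespace ns="http://www.w3.org/2005/Atom">
    <validate schema="atom.rng">
      <!-- Names carry no prefix: they are implicitly in the Atom
           namespace given on the enclosing namespace element. -->
      <!-- Two choices separated by '|': any content or summary element. -->
      <context path="content|summary" useMode="xhtml"/>
      <!-- A leading '/' anchors the path at the root of the section,
           here the feed element that starts the Atom section. -->
      <context path="/feed/title" useMode="xhtml"/>
    </validate>
  </namespace>
</mode>
```

The two paths never match the same element, so the rule about equally specific paths does not come into play here.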