The simplest task an NVDL implementation can do is to validate a simple document with no namespaces. Example 2.1 shows a script for this.
Example 2.1. A basic NVDL script
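The listing for this example does not appear in this text. A minimal sketch, consistent with the description of the script given later in this section, might look like the following. It assumes that `ns=""` is how the script selects elements that are in no namespace; the schema name `basics.rng` is the one mentioned in the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: reconstructed from the prose description, not the original listing -->
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- ns="" is assumed here to match elements that have no namespace -->
  <namespace ns="">
    <!-- Send the matching section to the RELAX NG schema for validation -->
    <validate schema="basics.rng"/>
  </namespace>
</rules>
```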
This script runs with a simple schema shown in Example 2.2.
Example 2.2. The schema for example 1
<?xml version="1.0" encoding="iso-8859-1"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="root"/>
</start>
<define name="root">
<element name="doc">
<oneOrMore>
<element name="chapter">
<oneOrMore>
<element name="para"><text/></element>
</oneOrMore>
</element>
</oneOrMore>
</element>
</define>
</grammar>
This schema validates a simple document instance, as shown in Example 2.3.
Example 2.3. The xml instance for example 1
<doc>
<chapter>
<para>Paragraph content</para>
<para>More content</para>
</chapter>
<chapter>
<para>Paragraph content</para>
<para>More content</para>
</chapter>
</doc>
This is as simple as it gets. To see how it all fits together,
without going into detail, go back to Example 2.1. The script has a rules element as its document element. This is
in the NVDL namespace, http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0.
Its first and only child is
the namespace element. This element identifies
the namespace of interest (the default namespace), and within it, the
validate element states that the schema
basics.rng should be used to validate the
content. That content can be just a part of the XML instance, but
since there's nothing else in the script, in this case it turns out to
be the entire document. This shows no advantage at all over
validating against a single schema, as we have done since SGML. NVDL
only really becomes interesting when we start to mix
namespaces. The example does, however, introduce the idea of sections.
At its simplest, a section is a well-formed part of the document instance in which the root element and all its descendants are in the same namespace. The example above clearly meets this criterion. When more than one namespace is in use, sections are created for each element such that it and all its descendants belong to the same namespace. An attribute section is a set of attributes which belong to the same namespace. In order to be validated using conventional methods, these attributes are attached to a virtual element.
If a source document for validation is sectioned as defined here, it generates single namespaced well-formed instances which can be dispatched to a validator for treatment as a single entity. If each section is reported as being valid, then the entire document is valid to the NVDL standard.
Don't forget attribute sections. I rarely see attribute sections that would make sense to validate alone, which does not mean they don't happen. A rule which allows us to ignore them is the default,
<anyNamespace match="attributes"> <attach/> </anyNamespace>
so the attribute sections are quietly attached to the parent section (most normally the element on which the attributes occur). If, however, you have reason so to do, feel free to validate an attribute section. How? I'm not sure.
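As a tentative sketch of one way this might be done, a namespace rule can carry the same match="attributes" setting as the default rule and name a schema instead of attaching. The namespace `http://example.org/attrs` and the schema file `attrs.rng` are hypothetical, and how the schema for an attribute section has to be written may depend on the implementation in use.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- Hypothetical namespace: matches attribute sections only, not elements -->
  <namespace ns="http://example.org/attrs" match="attributes">
    <!-- Hypothetical schema describing the attributes allowed in this namespace -->
    <validate schema="attrs.rng"/>
  </namespace>
</rules>
```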
The trigger element decomposes a section into further sections within the same namespace. Its ns attribute names a namespace, and its nameList attribute holds a space-separated list of local element names. Within a section in the trigger namespace, if an element other than the section root has a name in the nameList, and its parent's name is not in the nameList, then the section is decomposed at that point, the found element becoming the root of a new section.
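A small sketch may make the trigger easier to picture. The namespace `http://body`, the element name `chapter` and the schema file `body.rng` are assumptions chosen for illustration only.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- Within http://body sections, start a new section at each chapter element
       whose parent is not itself named in nameList -->
  <trigger ns="http://body" nameList="chapter"/>
  <namespace ns="http://body">
    <!-- Hypothetical schema; it would need a start pattern for each section root -->
    <validate schema="body.rng"/>
  </namespace>
</rules>
```

With a rule like this, a body element containing several chapter children would be expected to yield one section rooted at body and one section for each chapter, all in the same namespace.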
A more realistic use of NVDL is to validate a document with two or more namespaces. Rather than present XHTML, SVG and DocBook examples, I want to use simpler namespaces, to keep the focus on what is happening. So the examples will use namespaces with simple prefixes, and I'll refer to the namespaces by the simpler value, ignoring http://. Boring, but I'm sure the reader can make the transition from simple namespaces to more useful ones. So, consider an instance like Example 2.4.
Example 2.4. An XML instance using three namespaces
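The listing for this example does not appear in this text. A plausible instance, inferred from the element names and namespaces used in the full-document schema of Example 2.6, is sketched here. The text content is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: inferred from the schema in Example 2.6, not the original listing -->
<d:doc xmlns:d="http://document"
       xmlns:h="http://head"
       xmlns:b="http://body">
  <h:head>Heading text</h:head>
  <b:body>
    <b:chapter>
      <b:para>Paragraph content</b:para>
    </b:chapter>
  </b:body>
</d:doc>
```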
Complex? No... and yes. It does illustrate the use of sectioning though. Three namespaces, three sections. Internally, an NVDL implementation will produce three sections, as shown in Example 2.5.
Example 2.5. The three sections from Example 2.4
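The listing for this example does not appear in this text. Assuming an instance with a doc root in http://document that contains one head element in http://head and one body element in http://body, the three sections could be pictured as the separate fragments below. The text content is invented, and an implementation does not actually write these fragments out.

```xml
<!-- Section 1: the root element on its own, with its foreign children removed -->
<d:doc xmlns:d="http://document"/>

<!-- Section 2: everything in the http://head namespace -->
<h:head xmlns:h="http://head">Heading text</h:head>

<!-- Section 3: everything in the http://body namespace -->
<b:body xmlns:b="http://body">
  <b:chapter>
    <b:para>Paragraph content</b:para>
  </b:chapter>
</b:body>
```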
Remember that you aren't presented with these fragments. This is just the way the standard talks about sectioning. The point to remember is that each section needs a schema against which it may be validated. From the schema for the original source document it should be possible to extract an appropriate schema for each section. Take this example. A possible schema for the full document is shown in Example 2.6.
Example 2.6. A possible schema for the full document
<?xml version="1.0" encoding="UTF-8"?>
<grammar
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
    xmlns:rng="http://relaxng.org/ns/structure/1.0"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
    xmlns:d="http://document"
    xmlns:h="http://head"
    xmlns:b="http://body">
  <start>
    <ref name="root"/>
  </start>
  <define name="root">
    <element name="d:doc">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </define>
  <define name="head">
    <element name="h:head">
      <text/>
    </element>
  </define>
  <define name="body">
    <element name="b:body">
      <oneOrMore>
        <element name="b:chapter">
          <element name="b:para">
            <text/>
          </element>
        </element>
      </oneOrMore>
    </element>
  </define>
</grammar>
The schemas that we want for the three sections are there, just
rather buried. The key point to note is that the 'sections' that we
need for NVDL are within define
elements. This allows them to be referenced from within the start
pattern. If another schema is created, which includes this one, but
with a greater choice of start patterns, we have just what is needed
to validate the sections, without spoiling the main schema! This is
shown in Example 2.7.
Example 2.7. A schema with additional start patterns
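The listing for this example does not appear in this text. One way such a schema might be written is sketched here, assuming the full-document schema of Example 2.6 has been saved under the hypothetical file name `full.rng`. The included grammar keeps its own start pattern (root), and the extra start patterns are combined with it by choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: full.rng is a hypothetical file name for the schema in Example 2.6 -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Pull in the root, head and body definitions unchanged -->
  <include href="full.rng"/>
  <!-- Add head and body as alternative entry points alongside root -->
  <start combine="choice">
    <choice>
      <ref name="head"/>
      <ref name="body"/>
    </choice>
  </start>
</grammar>
```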
The effect of using this as a subschema is
that we now have a schema for each of the sections isolated by the
NVDL script. There is one catch: the root element, taken as a section (see above), is empty, whereas the schema requires it to contain the
head and body content. So the doc element needs
to be declared empty, as shown in Example 2.8.
Example 2.8. A reduced schema, matching the sections
<grammar
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
xmlns:rng="http://relaxng.org/ns/structure/1.0"
xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
xmlns:d="http://document"
xmlns:h="http://head"
xmlns:b="http://body" >
<start combine="choice">
<choice>
<ref name="root"/>
<ref name="head"/>
<ref name="body"/>
</choice>
</start>
<define name="root">
<element name="d:doc">
<empty/>
<!--
<ref name="head"/>
<ref name="body"/>
-->
</element>
</define>
<define name="head">
<element name="h:head">
<text/>
</element>
</define>
<define name="body">
<element name="b:body">
<oneOrMore>
<element name="b:chapter">
<oneOrMore>
<element name="b:para">
<text/>
</element>
</oneOrMore>
</element>
</oneOrMore>
</element>
</define>
</grammar>
The content model which has been removed is simply commented out. For simpler schemas this approach is satisfactory. There are other occasions when it won't suffice. We'll address those later. For now note that the schema is simple, has one entry point for each namespace, and a matching series of patterns for each of those sections. Validation can now take place, once the script is written.
This way of generating a script is generally applicable, although it is easier to isolate the sections in your head.
Example 2.9 shows a suitable script.
Example 2.9. The NVDL script for Example 2.4
<?xml version="1.0" encoding="utf-8"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://document">
    <validate schema="routing.1.rng"/>
  </namespace>
  <namespace ns="http://head">
    <validate schema="routing.1.rng"/>
  </namespace>
  <namespace ns="http://body">
    <validate schema="routing.1.rng"/>
  </namespace>
</rules>
This class of script is the simplest. Each namespace is dispatched for validation against the schema named in its rule. Any child elements from another namespace are split off from the section, causing the dispatcher to look for another namespace rule against which validation can be actioned. You might try inserting foreign namespaced content into one of the sections to see this in action. What you will be seeing is the action of the two default rules: the attribute rule shown earlier, which attaches attribute sections, and the element rule shown below.
<anyNamespace>
<reject/>
</anyNamespace>
The result of this is that once content in a foreign namespace has been
found without a matching namespace rule, processing falls back to this action, which
rejects it. Another possibility is to accept anyNamespace, for which the rule is:
<anyNamespace>
<accept/>
</anyNamespace>
You may find a use for this, but you'll need to remember that the default rule (unwritten, which makes it harder to remember) is to reject foreign namespaces.
The downsides!
A word of warning before moving on. If you swap the
The use case for this class of dispatching is general mixed-namespace processing, where each namespaced section is valid to some schema. An example might be an XSLT stylesheet where the two namespaces are the XSLT namespace and the output file namespace. Just be aware of the limitations though.
The glossary shows some fairly precise definitions. For comprehension it may be easier to relate those terms to the processing model and some sort of mental image of what is happening.
Example 2.4 shows an XML instance with a number of sections identified. Sections are combined by the attach action. The net result is an element, which may have descendant elements and slot nodes. Since it is typically validated by some validate action, it is called a "validation candidate". Sections are formed by analysis of the namespaced XML elements. Validation candidates are developed by analysis of the NVDL script. After attachment, these candidates are dispatched for validation.
An NVDL script may be divided up by namespaces, or by modes. A single mode wraps selection of a single namespace (or anyNamespace) and associates it with an action. The sequence of modes may be used to validate a hierarchical requirement.
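As an illustrative sketch of a script divided by modes, the following uses two modes to express a simple hierarchy: the root namespace is handled first, and its foreign children are then processed in a second mode. The mode names `root` and `content` are hypothetical, and the namespaces and schema name are borrowed from the three-namespace example purely as assumptions.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">
  <!-- Starting mode: only the document namespace is expected at the top -->
  <mode name="root">
    <namespace ns="http://document">
      <!-- After validating this section, handle its child sections in "content" -->
      <validate schema="routing.1.rng" useMode="content"/>
    </namespace>
  </mode>
  <!-- Second mode: head and body sections are expected below the root -->
  <mode name="content">
    <namespace ns="http://head">
      <validate schema="routing.1.rng"/>
    </namespace>
    <namespace ns="http://body">
      <validate schema="routing.1.rng"/>
    </namespace>
  </mode>
</rules>
```

Under this arrangement, a head or body element appearing as the document element would find no matching rule in the starting mode, which is one way a sequence of modes can enforce where each namespace is allowed to occur.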
A context consists of a base URI, a namespace map which maps prefixes to namespace URIs, and also may specify a default namespace URI (as declared by the xmlns attribute). Every element has a context.