My thanks to George Cristian Bina (from oXygen XML Editor) for this explanation.
The basic units on which an NVDL script acts are the sections of the source XML instance. Broadly, a section is a fragment of the document whose elements are all in the same namespace; the two rules below make this precise.
There are element sections and attribute sections. We will focus on element sections initially to avoid complexity, and we will ignore triggers for the same reason. In this simplified context, sections represent XML document fragments.
| A section starts with one element, which is either the document element or an element whose namespace differs from that of its parent. |
| All the children of an element belonging to a section that are in the same namespace as that element also belong to that section. |
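As an illustration of these two rules, here is a small hypothetical instance with comments marking where each section would start. The element names and the http://example.com/a and http://example.com/b namespaces are made up for this sketch.

```xml
<!-- S1 starts here: a:doc is the document element -->
<a:doc xmlns:a="http://example.com/a" xmlns:b="http://example.com/b">
  <!-- same namespace as its parent a:doc, so still part of S1 -->
  <a:title>Sample</a:title>
  <!-- namespace differs from the parent's, so b:note starts S2 -->
  <b:note>
    <!-- same namespace as its parent b:note, so still part of S2 -->
    <b:para>Some text</b:para>
    <!-- namespace differs from that of its parent b:note, so a:ref starts S3 -->
    <a:ref/>
  </b:note>
</a:doc>
```

Here S1 is the parent section of S2, and S2 is the parent section of S3, even though S1 and S3 are in the same namespace.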
Sections are defined by the source document itself; they do not depend on the NVDL script (remember that we are ignoring triggers).
We use the term Validation candidate to define a combination of sections attached for dispatching to the validator.
Because a section starts with one element, a validation candidate also starts with one element, so if we serialize it we again get a well-formed XML document.
Suppose we have a document D that is decomposed into sections S1, S2,... SN. A validation candidate represents the combination of sections for validation. We assume that section S1 is an ancestor of S2, S2 is an ancestor of S3, S3 is an ancestor of S4, and S4 is an ancestor of S5. Si1 is the section that starts the validation candidate.
The NVDL processing model - how does it work? First it splits the document into sections - that's clear. Let's add another constraint to simplify the explanations: let's require that each mode will contain rules with only one action each.
Processing starts with an initial mode, which we will call M1, and with the section that contains the root element, which we will call the root section, S1.
Based on the section namespace we identify a condition in a rule in the current mode (a namespace rule or the anyNamespace rule). That rule contains one action A1 so we identified that section S1 will be processed with the action A1 - we do not do anything yet, we just note that section S1 will be processed with action A1.
This is the interesting part: the processing does not end, NVDL will also process the child section(s) of S1, if any, no matter what the action A1 is.
Suppose S1 has two child sections, S2 and S3. The action A1 has a useMode property: it is the mode named by the useMode attribute or given by the mode element of that action, or the current mode M1 if neither is specified. This mode, which we will call M2, determines which rules we use to process the children S2 and S3.
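The three ways of determining the useMode property can be sketched in one script. This is an illustration only: the namespaces and the schema names a.rng and b.rng are hypothetical, and the nested mode form is written as I understand the NVDL schema to allow it.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="M1">
  <mode name="M1">
    <namespace ns="http://example.com/a">
      <!-- useMode attribute: child sections are processed in the named mode M2 -->
      <validate schema="a.rng" useMode="M2"/>
    </namespace>
    <namespace ns="http://example.com/b">
      <!-- nested mode element: child sections are processed with the rules given here -->
      <validate schema="b.rng">
        <mode>
          <anyNamespace>
            <attach/>
          </anyNamespace>
        </mode>
      </validate>
    </namespace>
    <anyNamespace>
      <!-- neither is given: child sections stay in the current mode, M1 -->
      <unwrap/>
    </anyNamespace>
  </mode>
  <mode name="M2">
    <anyNamespace>
      <attach/>
    </anyNamespace>
  </mode>
</rules>
```

Each rule still carries exactly one action, in line with the simplification we adopted above.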
For section S2 we identify in mode M2 one rule that matches the namespace of section S2 and thus one action A2. So we now know that S2 will be processed with A2 and we just make a note of that for now, we do not do anything yet. Similarly for S3 we identify an action A3 and we make a note of that.
Again, the child sections of S2 and S3 will be processed further, using the mode specified by the useMode property of the actions A2 and A3 respectively and in the end what we will get will be an action associated with each section, so the result might be:
(S1,A1), (S2, A2), (S3, A3),... (SN, AN)
This may be interpreted as above: section S1 will be processed by action A1, and so on. It is possible for one action to be paired with more than one section. This list could be named an interpretation, or simply a list of sections and actions.
To recap, so far we have:
| Split the document into sections |
| Started to process the sections from the root section using the initial mode, and determined as a result a list of the form (S1, A1), (S2, A2),..., (SN, AN) |
These sections form a tree, since one section can have multiple children. In the simple case where each section has exactly one child section, S1 is the parent of S2, S2 is the parent of S3 and so on, up to SN-1, which is the parent of SN.
Now take 5 sections and a few possible cases for actions and see what happens.
Case 1
(S1, validate[X]),
(S2, validate[X]),
(S3, validate[X]),
(S4, validate[X]),
(S5, validate[X])
where validate[X] means validate with schema X.
In this case no validation candidates combining several sections will be created; instead, each section will be validated with schema X independently of the other sections. There will be 5 independent validate operations.
A script that generates this situation is, for instance:
<rules ... startMode="validateWithX">
  <mode name="validateWithX">
    <anyNamespace>
      <validate schema="X" useMode="validateWithX"/>
    </anyNamespace>
  </mode>
</rules>
Note that the useMode attribute can be omitted: the default useMode is the current mode, which is validateWithX.
We start processing section S1 in the initial mode validateWithX. The rule that matches is the anyNamespace rule, and we get the action validate with schema X. The validate action will initiate the processing of child sections. For processing section S2 (S1 has only one child) we will use the mode validateWithX, since it is the value of the useMode attribute of the validate action. Again the anyNamespace rule will match, and for section S2 we get the validate with schema X action, and so on.
Case 2
(S1, validate[X]), (S2, attach), (S3, attach), (S4, attach), (S5, attach)
In this case we see that we have one validate action on section S1 and attach actions on all the other sections. A validation candidate will be created by combining the section S1 with all sections that have an attach action, that is S2, S3, S4 and S5. In this case the validation candidate is equivalent to the entire document as it contains all the sections. The validate action will be applied on this validation candidate, which means we apply the validation to the entire document, as normal validation of an entire document against a schema.
A script that generates this situation is shown in Example 10.1
Example 10.1. The NVDL script demonstrating attach processing
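A minimal sketch of such a script, assuming the mode names validateWithX and attachEverything used in the walkthrough that follows and the placeholder schema X, might be:

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="validateWithX">
  <!-- initial mode: whatever the namespace of the root section, validate it with X -->
  <mode name="validateWithX">
    <anyNamespace>
      <validate schema="X" useMode="attachEverything"/>
    </anyNamespace>
  </mode>
  <!-- mode for all descendant sections: attach each one to its parent section -->
  <mode name="attachEverything">
    <anyNamespace>
      <attach useMode="attachEverything"/>
    </anyNamespace>
  </mode>
</rules>
```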
We start processing section S1 in the initial mode validateWithX, the rule that matches is the
anyNamespace rule and we get the action
validate with schema X. The validate action
will start the processing of child sections. For processing section
S2 (we have only one child of S1) we will use the mode attachEverything, which is the useMode property of the validate action. The anyNamespace rule from the attachEverything mode will match, and for
section S2 we have the attach action. The attach action on S2 will
start the processing of S2 child sections using the mode attachEverything. Thus we will process section
S3 in the same mode and will match again the anyNamespace rule from the
attachEverything mode and therefore we will get the attach action for
S3, and so on for sections S4 and S5 we will get the same attach action.
Case 3
With the section list as follows:
(S1, validate[X]),
(S2, attach),
(S3, validate[X]),
(S4, attach),
(S5, attach)
In this case two document validation candidates will be created, F1 and F2. F1 will combine sections S1 and S2. F2 will combine sections S3, S4 and S5.
There will be two validate operations, validation candidate F1 will be validated with schema X by the validate action on S1 and F2 will be validated also with schema X by the validate action on S3.
Let's suppose sections S1 and S3 have the same namespace, different from the namespace of S2, S4 and S5. Let that namespace be http://mainNs. Then a script that generates the above interpretation (list of section-action pairs) would be as shown in Example 10.2.
Example 10.2. Mixing validate and attach
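A minimal sketch of such a script, assuming the single mode validateMainNs named in the walkthrough that follows and the placeholder schema X, might be as below. The namespace is given with the ns attribute, as in Example 10.9.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="validateMainNs">
  <mode name="validateMainNs">
    <!-- sections in http://mainNs each start a new validation candidate -->
    <namespace ns="http://mainNs">
      <validate schema="X" useMode="validateMainNs"/>
    </namespace>
    <!-- sections in any other namespace are attached to their parent section -->
    <anyNamespace>
      <attach useMode="validateMainNs"/>
    </anyNamespace>
  </mode>
</rules>
```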
Note that the useMode attribute can
be omitted for both actions as the same mode will be automatically
used.
We start with section S1 that has the http://mainNs namespace and with the mode
validateMainNs. The namespace rule will match and we will get the
validate action with schema X for section S1.
The action will initiate the processing of the child section S2,
which will be processed in the same mode, validateMainNs. In this case the anyNamespace
rule will match and we will get for S2 the attach action. The action
again will start the processing of S2 child sections and S3 will be
processed. S3 has the namespace http://mainNs so the namespace rule will
match and section S3 will be validated with schema X. Next, section S4
will be processed: the anyNamespace rule will match, resulting in the attach
action. Similarly, for S5 the anyNamespace rule
will match, giving the attach action.
Here is what we have done so far. In part 1:
| we split the document into sections |
| we started to process the sections from the root section using the initial mode and we determined as a result a list of the form (S1, A1), (S2, A2),..., (SN, AN) |
In part 2:
| we presented 3 examples with 5 sections showing validate and attach actions, and how attach creates validation candidates by combining sections and has the validate action applied to those validation candidates. |
Let's continue.
Let's replace attach with unwrap in Case 2 and Case 3 that we presented earlier. The NVDL scripts will be similar, with attach replaced by unwrap, so I will not explain again how the actions for sections are computed; you can just replace attach with unwrap in those descriptions.
Case 4
(S1, validate[X]),
(S2, unwrap),
(S3, unwrap),
(S4, unwrap),
(S5, unwrap)
In this case we have one validate action on section S1 and unwrap actions on the other sections. The unwrap action says just this: ignore this section. So the above interpretation is equivalent to (S1, validate[X]). That is, we have one validate action applied to section S1.
A script that generates this situation is shown in Example 10.3
Example 10.3. Script using unwrap
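A minimal sketch of such a script, obtained from the Case 2 script by replacing attach with unwrap, might be as follows. The mode name unwrapEverything is a hypothetical one chosen by analogy with attachEverything.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="validateWithX">
  <!-- initial mode: validate the root section with X -->
  <mode name="validateWithX">
    <anyNamespace>
      <validate schema="X" useMode="unwrapEverything"/>
    </anyNamespace>
  </mode>
  <!-- mode for all descendant sections: ignore each one -->
  <mode name="unwrapEverything">
    <anyNamespace>
      <unwrap useMode="unwrapEverything"/>
    </anyNamespace>
  </mode>
</rules>
```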
Case 5
(S1, validate[X]),
(S2, unwrap),
(S3, validate[X]),
(S4, unwrap),
(S5, unwrap)
The unwrap actions say ignore sections S2, S4 and S5, so we have the equivalent interpretation (S1, validate[X]), (S3, validate[X]).
We have two validate actions, one on section S1 and one on section S3.
For the NVDL example script let's suppose, as in Case 3, that sections S1 and S3 have the same namespace, different from the namespace of S2, S4 and S5. Let that namespace be http://mainNs. Then a script that generates the above interpretation (list of section-action pairs) will be as shown in Example 10.4.
Example 10.4. Script for case 5, mixed validate and unwrap
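A minimal sketch of such a script, obtained from the Case 3 script by replacing attach with unwrap, might be as follows. The mode name validateMainNs is reused from Case 3 as an assumption, and the useMode attributes are omitted so that the current mode is used.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="validateMainNs">
  <mode name="validateMainNs">
    <!-- sections in http://mainNs are each validated with X -->
    <namespace ns="http://mainNs">
      <validate schema="X"/>
    </namespace>
    <!-- sections in any other namespace are ignored -->
    <anyNamespace>
      <unwrap/>
    </anyNamespace>
  </mode>
</rules>
```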
Note that the namespace and anyNamespace rules are all in the same mode.
You may already have guessed what happens if we have both attach and unwrap.
Case 6
(S1, validate[X]),
(S2, unwrap),
(S3, attach),
(S4, unwrap),
(S5, attach)
Recall the unwrap actions say ignore sections S2 and S4 so we have the equivalent interpretation
(S1, validate[X]),
(S3, attach),
(S5, attach)
This takes us back to Case 2: we have a validation candidate that combines S1, S3 and S5, and there is one validate action applied to this validation candidate.
For an NVDL script let's assume sections S1, S3 and S5 have the
same namespace http://mainNs and the sections S2 and S4 have a
different namespace. Example 10.5 shows this.
Example 10.5. Script using validate, unwrap and attach
<rules ... startMode="validateMainNsWithX">
  <mode name="validateMainNsWithX">
    <!-- the namespace to match is given by the ns attribute -->
    <namespace ns="http://mainNs">
      <validate schema="X" useMode="attachMainNsIgnoreOthers"/>
    </namespace>
  </mode>
  <mode name="attachMainNsIgnoreOthers">
    <!-- descendant sections in http://mainNs are attached -->
    <namespace ns="http://mainNs">
      <attach useMode="attachMainNsIgnoreOthers"/>
    </namespace>
    <!-- all other descendant sections are ignored -->
    <anyNamespace>
      <unwrap useMode="attachMainNsIgnoreOthers"/>
    </anyNamespace>
  </mode>
</rules>
We start with the initial section S1 and the mode validateMainNsWithX. The namespace rule matches and the action is to validate with schema X. The action starts the processing of S1's child sections, and the mode we use to process S2 is attachMainNsIgnoreOthers. S2 will be matched by the anyNamespace rule with the unwrap action. The action specifies the same attachMainNsIgnoreOthers mode to be used. Next, section S3 will be matched by the namespace rule (http://mainNs) and the attach action is selected. Section S4 matches the anyNamespace rule with its unwrap action, and section S5 matches the namespace rule with its attach action.
Well, is there any difficulty with unwrap? Can we agree that it
is the simplest action - compared with validate for which we need to determine a
validation candidate it applies to or with attach for
which we need to see to what validation candidate we attach the section? unwrap
just says ignore this section, nothing more.
Case 7
(S1, attach),
(S2, attach),
(S3, attach),
(S4, attach),
(S5, attach)
This will not cause any validation to be carried out. In fact, it will not cause any processing at all, as there is no action for the sections to be attached to. The script for this is Example 10.6.
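A minimal sketch of such a script, reusing the attachEverything mode name from Case 2 as an assumption, might be:

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="attachEverything">
  <!-- every section, including the root section, gets the attach action -->
  <mode name="attachEverything">
    <anyNamespace>
      <attach useMode="attachEverything"/>
    </anyNamespace>
  </mode>
</rules>
```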
It may be a little pointless, but it does clearly demonstrate that an attach action needs a parent to attach to.
Case 8
(S1, unwrap),
(S2, unwrap),
(S3, unwrap),
(S4, unwrap),
(S5, unwrap)
The unwrap actions say ignore sections S1, S2, S3, S4 and S5, so this is equivalent to an empty interpretation: no validation, no processing on any section. The script for this is shown in Example 10.7.
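A minimal sketch of such a script might be as follows. The mode name unwrapEverything is a hypothetical one chosen for this illustration.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="unwrapEverything">
  <!-- every section, including the root section, gets the unwrap action -->
  <mode name="unwrapEverything">
    <anyNamespace>
      <unwrap useMode="unwrapEverything"/>
    </anyNamespace>
  </mode>
</rules>
```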
Case 9
(S1, unwrap),
(S2, validate[X]),
(S3, unwrap),
(S4, attach),
(S5, unwrap)
The unwrap actions say ignore sections S1, S3 and S5, so we have the equivalent interpretation (S2, validate[X]), (S4, attach).
This takes us back to Case 2: we have a validation candidate that combines S2 and S4, and there is one validate action applied to this validation candidate.
For an NVDL script let's assume sections S2 and S4 have the same namespace http://mainNs and the sections S1, S3 and S5 have a different namespace. Example 10.8 shows this.
Example 10.8. validate, unwrap and attach example
<rules ... startMode="validateMainNsWithX">
  <mode name="validateMainNsWithX">
    <!-- the namespace to match is given by the ns attribute -->
    <namespace ns="http://mainNs">
      <validate schema="X" useMode="attachMainNsIgnoreOthers"/>
    </namespace>
    <!-- other sections are ignored and we stay in this mode -->
    <anyNamespace>
      <unwrap useMode="validateMainNsWithX"/>
    </anyNamespace>
  </mode>
  <mode name="attachMainNsIgnoreOthers">
    <namespace ns="http://mainNs">
      <attach useMode="attachMainNsIgnoreOthers"/>
    </namespace>
    <anyNamespace>
      <unwrap useMode="attachMainNsIgnoreOthers"/>
    </anyNamespace>
  </mode>
</rules>
This warrants a state diagram! Imagine each mode as a state. There are two for this script.
![A state diagram for example 16 - two states, following each section. The diagram follows the sequence (S1, unwrap), (S2, validate[X]), (S3, unwrap), (S4, attach), (S5, unwrap). From the initial state (validateMainNsWithX), S1 links back to the same state. S2 transitions to attachMainNsIgnoreOthers. S3, S4 and S5 remain in the same state.](processingModel.ex16.png)
For the interpretation we start with the initial section S1 and the mode validateMainNsWithX. The anyNamespace rule matches and we get the unwrap action for S1. The action starts the processing of S1's child sections, and the mode we use to process S2 is validateMainNsWithX. S2 will be matched by the namespace rule and the action is to validate with schema X. The action specifies the attachMainNsIgnoreOthers mode to be used (mode 2 in the diagram). Section S3 will be matched by the anyNamespace rule and gets the unwrap action. The processing of S4 is done in the same mode and matches the namespace rule with its attach action. For section S5 the unwrap action occurs again.
unwrap is the simplest action: it just says ignore this section.
From Case 4 to Case 9 we can see that we just ignore the section if the action to be executed on that section is unwrap.
attach has no effect unless an ancestor section has an action such as validate: otherwise there is no validation candidate to which the current section can be attached.
This is a conclusion from Case 7: (S1, attach), (S2, attach), (S3, attach), (S4, attach), (S5, attach). As there is no validate action among the ancestors of the attached sections to define a section that starts a validation candidate, the attach says attach this section to... no section; that is, attach this section to nothing.
The validation candidate a validate action applies to is formed by the
section with the validate action and descendant sections that have
an attach action applied to them until we reach another action such as
validate, see case 3.
(S1, validate[X]),
(S2, attach),
(S3, validate[X]),
(S4, attach),
(S5, attach)
F1 will combine S1 with S2.
F2 will combine S3 with S4 and S5.
A validate action will collect into a validation candidate its section and all the descendant sections associated with an attach action until the first validate action. This validation candidate represents the XML that is validated by the validate action.
From the attach perspective (equivalent to the above), the attach action attaches the current section to the validation candidate that starts with the first ancestor section that has an action such as validate.
You can think of the process in the following way: A validate action starts a validation candidate and puts the associated section in that candidate. This candidate then becomes the current one. Then the attach actions attach the associated section to the current validation candidate.
Thus for (S1, validate[X]), (S2, attach), (S3, validate[X]), (S4, attach), (S5, attach) we iterate over the interpretation:
Step 1: (S1, validate[X]) has a validate action, so we create a new validation candidate and add S1 to it. The current candidate is set to the new candidate.
Step 2: (S2, attach) has an attach action, so we add S2 to the current candidate.
Step 3: (S3, validate[X]) has a validate action, so we create a new validation candidate and add S3 to it. The current candidate is set to the new candidate.
Step 4: (S4, attach) has an attach action, so we add S4 to the current candidate.
Step 5: (S5, attach) has an attach action, so we add S5 to the current candidate.
Thus we finish with the two validation candidates for the two validate actions, the first combining S1 and S2 and the second combining S3, S4 and S5.
Now we quickly cover the remaining NVDL actions.
reject - equivalent to a script with a validate action that uses a schema that does not allow anything.
allow - equivalent to a script with a validate action that uses a schema that allows anything.
attachPlaceholder - instead of attaching the section associated with the action, we attach the special placeHolder element as defined in the specification.
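As an illustration only, these three actions could appear in a script like the following sketch. The mode names root and children and the two http://example.com namespaces are hypothetical, and X is the placeholder schema used throughout.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">
  <mode name="root">
    <namespace ns="http://mainNs">
      <validate schema="X" useMode="children"/>
    </namespace>
    <!-- a root section in any other namespace is an error -->
    <anyNamespace>
      <reject/>
    </anyNamespace>
  </mode>
  <mode name="children">
    <!-- attach a placeholder element in place of the section itself -->
    <namespace ns="http://example.com/ext">
      <attachPlaceholder/>
    </namespace>
    <!-- sections in this namespace are accepted whatever they contain -->
    <namespace ns="http://example.com/notes">
      <allow/>
    </namespace>
    <!-- anything else below the root section is an error -->
    <anyNamespace>
      <reject/>
    </anyNamespace>
  </mode>
</rules>
```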
If we remove the limitation and we have a true tree formed by sections the logic is similar.
| The sections with attach will be attached to the validation candidate that starts with the first ancestor section that has a validate action (or a reject or allow action, which we consider equivalent to validate). |
| The sections with unwrap are ignored |
| The sections with attachPlaceholder will execute an attach on the placeHolder element instead of attaching the section. |
| The sections with validate, reject or allow will apply the respective validate action on the validation candidate that starts with that section to which we may have other descendant sections attached. |
If we allow multiple actions for a rule then we will obtain multiple interpretations. All those interpretations will be processed as in our examples.
Bringing triggers into play: they will split some of the sections into more sections.
That's about everything. Well, that wasn't hard, was it?
In this section we explain how sections are joined prior to being sent for validation. As an example, consider the instance and script in Example 10.9.
Example 10.9. Combining sections
file pm.ex17.xml
<test xmlns="http://example.com">
  <a>
    <b/>
  </a>
</test>
file pm.ex17.nvdl
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <trigger ns="http://example.com" nameList="a"/><!-- s2 -->
  <trigger ns="http://example.com" nameList="b"/><!-- s3 -->
  <namespace ns="http://example.com">
    <validate schema="pr.ex17.rng"/><!-- s1 -->
    <attach/>
  </namespace>
</rules>
There are 3 sections, obtained using the triggers.
So we have sections S1, S2 and S3, where S1 is the parent of S2, which is the parent of S3. Remember that an interpretation associates a section with an action and a mode. For this example the modes are discarded.
For section S1 there are 2 actions, so we have 2
interpretations. These are (S1, validate[X]) and (S1, attach) where X
is the schema pr.ex17.rng
Section S2 again has two actions so for both the above interpretations there are two more partial interpretations:
(S1, validate[X])(S2, validate[X])
(S1, attach)(S2, validate[X])
(S1, validate[X])(S2, attach)
(S1, attach)(S2, attach)
Why? All three elements are in the same namespace, so the same rule, with its validate and attach actions, matches every section, and both options must be considered each time.
In these 4 cases we still have to process S3, and for each case two possible actions are considered, thus there are 8 interpretations as below:
I1: (S1, validate[X])(S2, validate[X])(S3, validate[X])
I2: (S1, attach)(S2, validate[X])(S3, validate[X])
I3: (S1, validate[X])(S2, attach)(S3, validate[X])
I4: (S1, attach)(S2, attach)(S3, validate[X])
I5: (S1, validate[X])(S2, validate[X])(S3, attach)
I6: (S1, attach)(S2, validate[X])(S3, attach)
I7: (S1, validate[X])(S2, attach)(S3, attach)
I8: (S1, attach)(S2, attach)(S3, attach)
Note that the validate action applied to S1 appears in I1, I3, I5 and I7.
In I1 we have:
I1: (S1, validate[X])(S2, validate[X])(S3, validate[X])
which is interpreted as
Apply the validation with X on S1 only
Apply the validation with X on S2 only
Apply the validation with X on S3 only
We are interested in what happens with the validation that starts on S1. Note that I1 requires us to apply the validation with X on S1 only
Moving on:
I3: (S1, validate[X])(S2, attach)(S3, validate[X])
This implies
validate with X the validation candidate obtained by attaching S2 to S1.
validate with X the section S3 only.
Remember: I3 applies validation with X on S1 plus S2.
Next I5
I5: (S1, validate[X])(S2, validate[X])(S3, attach)
This implies
validate with X S1 only
validate with X the validation candidate obtained by attaching S3 to S2
I5 requires validation of S1 only, which is the same as I1.
Now I7
I7: (S1, validate[X])(S2, attach)(S3, attach)
This requires validation with X of the validation candidate obtained by attaching S2 and S3 to S1
Note that I7 requires validation with X of the validation candidate obtained by attaching S1, S2 and S3
To summarize
I1 says apply the validation with X on S1 only
I3 says apply the validation with X on S1 plus S2.
I5 says validate with X S1 only
I7 says validate with X the validation candidate obtained by attaching S1, S2 and S3
Stage 4 in the standard (section 8.6) says that if we have multiple interpretations that apply some action starting with the same node (in this case the start element of S1), then we execute the action on the largest validation candidate and ignore the others. In the above case that means that we should execute only I7, validating with X the validation candidate obtained by attaching S2 and S3 to S1, and ignore the other 3 interpretations.
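To make the candidates concrete, here is a sketch of the XML that each of these interpretations would present to the validator for the validation that starts on S1. It assumes that child sections which are not attached are simply left out of the candidate. The three fragments are separate candidates, not one document.

```xml
<!-- I1 and I5: S1 only -->
<test xmlns="http://example.com"/>

<!-- I3: S1 with S2 attached -->
<test xmlns="http://example.com">
  <a/>
</test>

<!-- I7: S1 with S2 and S3 attached, that is, the whole document -->
<test xmlns="http://example.com">
  <a>
    <b/>
  </a>
</test>
```

The I7 candidate contains each of the others, which is why it is the one that is kept.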