Schema association with an XML instance
I'm a relatively new "convert" (from XML Schema) to RELAX NG. I understand that there is no standard way to associate a RELAX NG schema with a document. I'm just wondering if there is any plan to make this possible.
I have received the following Request For Enhancement: http://bugzilla.gnome.org/show_bug.cgi?id=125266. The idea that is emerging is to do the association at the toolkit level instead of putting it in the instance. IMHO that's much cleaner: the author can't really know which acceptance rules a given instance will go through, whereas automating the association between instances and schemas at the tool level simplifies things a lot, with the associated benefits (reliability, cost of maintenance, etc.).
Ideally, having a predefined (standardized) vocabulary to match instances to schemas (or processing rules, but this is yet another can of worms) would be the best way to attack this problem. It's unclear to me whether the approach taken by nXML can really be extended to other frameworks; I need to read and digest the proposal first.
What's the problem?
John Cowan answers,
It's part of the general problem of specifying appropriate XML processing; an RNG-specific solution is neither particularly general nor, IMHO, particularly useful.
James Clark responds,
I would divide the problem of specifying appropriate XML processing for a document into:
I see (b) as a special case of the problem of how to specify rules that, given an XML document, find a related resource. This is the problem that the XML vocabulary that I've designed for nXML mode is intended to solve. It's not specific to RELAX NG or, for that matter, to schemas. You could use the same vocabulary to describe how to find the XSLT stylesheet to use to display an XML document.
Although it's important to be able to individually specify the schema to use for a particular document, it's also convenient to be able to specify rules that apply to classes of document. For example, on my system I have a rule that says when the namespace URI of the document
James Clark picks up another thread
Right. I believe it's quite wrong for a schema language to dictate that a document type designer must use an element name or an attribute for a particular purpose (as W3C XML Schema does with xsi:type or xsi:schemaLocation). The document type designer should be the one who decides what element names and attribute names to use, and the schema language should allow the document type designer to write a schema to reflect their decision. I think this extends to any association mechanism and it also applies to processing instruction target names.
This does not prevent your having something in the instance that influences the association but it does imply that the ultimate authority must be outside the instance. If, for example, you want a processing instruction in the instance to point to the schema, then there must be something outside the instance that says that this is what you want and that specifies the processing instruction target to use for this purpose.
Once you decide that you want something outside the instance to control the association, I think it's an obvious decision to express that something in XML and it's highly desirable to make it application-independent.
If it can't, then it's a bug. It was a fundamental design goal to express the association rules in an application-independent way. Eventually, I hope to implement this for Jing as well.
In response to a question,
James Clark replied,
That's one policy that you might want, but I don't think it's the only one. You can get it by explicitly adding a <uri resource="..." ...> element to a schema locating file.
> Thus, for me the only reasonable choice is
If you want to use DOCTYPEs, the nXML method can accommodate you (by doctypePublicId rules). However, I find the problems of using DOCTYPEs worse by far than the problem of associations disappearing on a rename. And even with DOCTYPEs, you can still get the problem of the association changing; you still have to associate your DOCTYPEs with schemas. If you force me to put something in the instance, I would much prefer a processing instruction.
There's no single right way to do the association. Different users will legitimately prefer different approaches. A solution needs to be flexible enough to accommodate them.
Another concern expressed as
> My opinion is that this association should be
brought this response from James
It's a basic tenet of RELAX NG that the schema is not inherent in the document and that validation is a process that has two independently-specifiable inputs. Section 8 of the spec says: "A conforming RELAX NG validator must be able to determine for any XML document and for any correct RELAX NG schema whether the document is valid with respect to the schema." If I understand you correctly, you're saying that if the document contains a particular processing instruction, then there should not be a way to validate it against a schema other than that specified in the processing instruction. That's clearly non-conformant. A conforming RELAX NG validator must allow you to use any schema to validate a document, no matter what processing instruction the document contains.
With regard to document management over time, James responds to this question
> Documents have very long
Which is exactly why your documents should not contain anything specific to a particular schema language. Who knows what schema language we'll all be using in 20 years?
The assertion shouldn't be specific to a particular schema language. The assertion should be an assertion that the document belongs to a particular abstract type; an abstract document type involves more than just the (usually infinite) set of documents belonging to the type; there's also semantics, whether formal or informal.
There is no standardized way to make such an assertion. It's not the job of RELAX NG (or indeed of any particular schema language) to standardize such a mechanism. If you want there to be a standard way, I suggest you take it up with the W3C or some other standards body.
I agree that it's often desirable to have a document include information about the abstract type to which it belongs. But it's up to you to decide how your documents should represent this information, just like it's up to you to decide how they should represent any other information. If namespaces aren't enough, then use a PI or use an attribute on the document element. The choice is yours. A schema association mechanism should be able to make use of whatever reasonable way you've chosen rather than mandate a particular way.
Locating a schema in nxml-mode
Unlike DTDs, RELAX NG does not specify a way to locate the schema for a document. nXML mode's way is to use a list of schema locating files. A schema locating file is an XML document specifying rules for locating a schema. It must be valid with respect to the schema locate.rnc. Each file specifies a list of rules. The rules from each file are appended in order. To locate a schema each rule is applied in turn until a rule matches. The matching rule is then used to determine the schema.
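The first-match behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not nXML mode's implementation: rules are modelled as plain functions, and the names locate_schema, namespace_rule and default_rule are hypothetical.

```python
from typing import Callable, Iterable, List, Optional

# Assumption: a rule is a function that returns a schema URI on a match, else None.
Rule = Callable[[dict], Optional[str]]


def locate_schema(rule_files: Iterable[List[Rule]], document: dict) -> Optional[str]:
    """Append the rules of each file in order, then return the first match."""
    all_rules = [rule for rules in rule_files for rule in rules]
    for rule in all_rules:
        schema = rule(document)
        if schema is not None:
            return schema  # the first matching rule determines the schema
    return None


def namespace_rule(ns: str, uri: str) -> Rule:
    """Hypothetical stand-in for a namespace rule."""
    return lambda doc: uri if doc.get("namespace") == ns else None


def default_rule(uri: str) -> Rule:
    """Hypothetical stand-in for a rule that always matches."""
    return lambda doc: uri
```

With a first file holding namespace_rule("http://www.w3.org/1999/xhtml", "xhtml.rnc") and a second holding default_rule("docbook.rnc"), an XHTML document gets xhtml.rnc and every other document falls through to docbook.rnc.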
The variable rng-schema-locating-files specifies the list of schema locating files that nXML mode should use. It is not an error if some of the files do not exist. If a file-name is relative, it will be resolved relative to the document for which a schema is being located.
You can, of course, use nXML mode itself to edit schema locating files.
You can use the command C-c C-s to manually select the schema for the document in current buffer. Emacs will read the file-name of the schema from the minibuffer. After reading the file-name, Emacs will ask whether you wish to add a rule to a schema locating file that persistently associates the document with the selected schema.
Schema locating file basics
The document element of a schema locating file must be locatingRules and the namespace URI must be http://thaiopensource.com/ns/locating-rules/1.0. The children of the document element specify rules. The order of the children is the same as the order of the rules. Here's a complete example of a schema locating file:
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <documentElement localName="book" uri="docbook.rnc"/>
</locatingRules>
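This says to use the schema xhtml.rnc for a document with namespace http://www.w3.org/1999/xhtml, and to use the schema docbook.rnc for a document whose document element has the local name book.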
As usual with XML-related technologies, resources are identified by URIs. The uri attribute identifies the schema by specifying the URI. The URI may be relative. If so, it is resolved relative to the URI of the schema locating file that contains the attribute. This means that if the value of the uri attribute does not contain a /, then it will refer to a filename in the same directory as the schema locating file. The xml:base attribute may be used to change the base URI used for resolving relative URIs.
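As a rough illustration of this resolution, here is a sketch using Python's standard urllib.parse.urljoin. The function name resolve_schema_uri and the file: URIs are assumptions made for the example; how nXML mode performs the resolution internally is not shown here.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_schema_uri(locating_file_uri: str, uri_attr: str,
                       xml_base: Optional[str] = None) -> str:
    """Resolve a rule's uri attribute against the schema locating file's URI.

    An xml:base value, if given, is itself resolved against the locating
    file's URI and then replaces it as the base.
    """
    base = urljoin(locating_file_uri, xml_base) if xml_base else locating_file_uri
    return urljoin(base, uri_attr)
```

For a locating file at file:///home/jjc/schema/schemas.xml, the value xhtml.rnc resolves to file:///home/jjc/schema/xhtml.rnc, that is, a file in the same directory.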
Schema locating files are designed to be useful for other applications that need to locate a schema for a document. In fact, there is nothing specific to locating schemas in the design of the schema for schema locating files; it could equally well be used for locating a stylesheet.
Using the document's URI to locate a schema
A uri rule locates a schema based on the URI of the document. The resource attribute locates a schema for a particular resource. For example,
<uri resource="spec.xml" uri="docbook.rnc"/>
specifies that the schema for spec.xml is docbook.rnc. The pathSuffix attribute locates a schema based on the suffix of the URI. It considers only the path component of the URI. In terms of files, it is equivalent to matching on the file extension. For example,
<uri pathSuffix=".xsl" uri="xslt.rnc"/>
specifies that the schema for documents whose URI ends with .xsl is xslt.rnc.
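The point that only the path component is considered can be made concrete with a small sketch. The function name path_suffix_matches is hypothetical; the splitting is done with Python's standard urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit


def path_suffix_matches(document_uri: str, path_suffix: str) -> bool:
    """Test the suffix against the path component only, ignoring query and fragment."""
    return urlsplit(document_uri).path.endswith(path_suffix)
```

Under this sketch, http://example.com/style/main.xsl?version=2 matches the suffix .xsl, while http://example.com/main.xml?format=.xsl does not, because the query string is not part of the path.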
A transformURI rule locates a schema by transforming the URI of the document. If there is a pathSuffix attribute, then the path component of the document's URI must end with the specified suffix. If there is a pathAppend attribute, then the URI is transformed by appending the specified string to the URI's path component. If there is a replacePathSuffix attribute, then the URI is transformed by replacing the suffix matched by the pathSuffix attribute by the value of the replacePathSuffix attribute. A transformURI rule matches only if the transformed URI is a valid URI that identifies an existing resource. For example,
<transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
specifies that to locate a schema for a document foo.xml, Emacs should test whether a file foo.rnc exists in the same directory as foo.xml, and, if so, should use it as the schema.
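The transformation steps of a transformURI rule can be sketched as follows. This is an illustration based on the description above, not the actual implementation; the function names are hypothetical, and the existence test is passed in as a predicate so that the sketch does not assume how resources are checked.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def transform_uri(document_uri: str,
                  path_suffix: Optional[str] = None,
                  replace_path_suffix: Optional[str] = None,
                  path_append: Optional[str] = None) -> Optional[str]:
    """Apply the pathSuffix / replacePathSuffix / pathAppend steps to the path."""
    parts = urlsplit(document_uri)
    path = parts.path
    if path_suffix is not None:
        if not path.endswith(path_suffix):
            return None  # the rule does not apply to this document
        if replace_path_suffix is not None:
            path = path[:len(path) - len(path_suffix)] + replace_path_suffix
    if path_append is not None:
        path += path_append
    return urlunsplit(parts._replace(path=path))


def locate_by_transform(document_uri: str,
                        resource_exists: Callable[[str], bool],
                        **attributes) -> Optional[str]:
    """The rule matches only if the transformed URI identifies an existing resource."""
    candidate = transform_uri(document_uri, **attributes)
    if candidate is not None and resource_exists(candidate):
        return candidate
    return None
```

With path_suffix=".xml" and replace_path_suffix=".rnc", foo.xml is transformed to foo.rnc, and the rule matches only when the supplied predicate reports that foo.rnc exists.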
Using the document element to locate a schema
A documentElement rule locates a schema based on the local name and prefix of the document element. For example, a rule
<documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
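specifies that when the name of the document element is xsl:stylesheet, then xslt.rnc should be used as the schema. Either the prefix or the localName attribute may be omitted to allow any prefix or local name.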
A namespace rule locates a schema based on the namespace URI of the document element. For example, a rule
<namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
specifies that when the namespace URI of the document element is http://www.w3.org/1999/XSL/Transform, then xslt.rnc should be used as the schema.
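Both kinds of rule need three facts about the document element: its prefix, its local name and its namespace URI. The following sketch shows one way to obtain them with Python's standard xml.parsers.expat module. The function name and the shape of the returned dictionary are assumptions for illustration; the namespace is read from the declarations on the document element itself, which is where any declaration that applies to it must appear.

```python
import xml.parsers.expat


def document_element_info(xml_text: str) -> dict:
    """Return the prefix, local name and namespace URI of the document element."""
    info = {}

    def on_start(qualified_name, attributes):
        if info:  # only the first start tag belongs to the document element
            return
        prefix, _, local_name = qualified_name.rpartition(":")
        declaration = "xmlns:" + prefix if prefix else "xmlns"
        info.update(prefix=prefix, localName=local_name,
                    namespace=attributes.get(declaration))

    # Without a namespace separator, expat reports raw qualified names.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return info
```

A documentElement rule would then compare the prefix and localName entries with its attributes, and a namespace rule would compare the namespace entry with its ns attribute.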
Using the DOCTYPE declaration to locate a schema
A doctypePublicId rule locates a schema based on the public identifier specified in the DOCTYPE declaration. For example, a rule
<doctypePublicId publicId="-//W3C//DTD XHTML 1.0 Transitional//EN" uri="xhtml1-transitional.rnc"/>
specifies that when the document has a DOCTYPE declaration with a public identifier -//W3C//DTD XHTML 1.0 Transitional//EN, then xhtml1-transitional.rnc should be used as the schema.
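To show where the public identifier comes from, here is a sketch that reads it from the DOCTYPE declaration using the StartDoctypeDeclHandler callback of Python's standard xml.parsers.expat module. The function name doctype_public_id is hypothetical, and the sketch only extracts the identifier; comparing it with the publicId attribute of a rule is left to the caller.

```python
import xml.parsers.expat
from typing import Optional


def doctype_public_id(xml_text: str) -> Optional[str]:
    """Return the public identifier from the DOCTYPE declaration, if there is one."""
    found = []

    def on_doctype(doctype_name, system_id, public_id, has_internal_subset):
        found.append(public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None
```

A document without a DOCTYPE declaration, or with a DOCTYPE that has no public identifier, yields None, so a doctypePublicId rule would not match it.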
Specifying a default schema
A default rule specifies a default schema. This rule always matches. For example,
<default uri="docbook.rnc"/>
says to use the schema docbook.rnc.
Type identifiers for documents
Type identifiers allow a level of indirection in locating the schema for a document. Instead of associating the document directly with a schema URI, the document is associated with a type identifier, which is in turn associated with a schema URI. nXML mode does not constrain the format of type identifiers. They can be simply strings without any formal structure or they can be public identifiers or URIs. Note that these type identifiers have nothing to do with the DOCTYPE declaration. When comparing type identifiers, whitespace is normalized in the same way as with the xsd:token datatype. Using type identifiers makes it easy for users to select from a set of known schemas using C-c C-t.
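The whitespace normalization used when comparing type identifiers can be sketched as below, assuming the usual xsd:token behavior: tabs, line feeds and carriage returns are treated as spaces, runs of spaces are collapsed to one, and leading and trailing spaces are removed. The function names are hypothetical.

```python
import re


def normalize_type_id(type_id: str) -> str:
    """Collapse whitespace in the manner of the xsd:token datatype."""
    return re.sub(r"[ \t\r\n]+", " ", type_id).strip(" ")


def same_type_id(left: str, right: str) -> bool:
    """Compare two type identifiers after normalization."""
    return normalize_type_id(left) == normalize_type_id(right)
```

Under this sketch, "XHTML   Strict" and " XHTML Strict " compare equal, while "XHTML Strict" and "XHTMLStrict" do not.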
Each of the rules described in previous sections that uses a uri attribute to specify a schema, can instead use a typeId attribute to specify a type identifier. The type identifier can be associated with a URI using a typeId element. For example,
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
  <typeId id="XHTML" typeId="XHTML Strict"/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
  <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
</locatingRules>
declares three type identifiers XHTML (representing the default variant of XHTML to be used), XHTML Strict and XHTML Transitional. Such a schema locating file would use xhtml-strict.rnc for a document whose namespace is http://www.w3.org/1999/xhtml. But it is considerably more flexible than a schema locating file that simply specified
<namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
A user can easily use C-c C-t to select between XHTML Strict and XHTML Transitional. Also, a user can easily add a catalog
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <typeId id="XHTML" typeId="XHTML Transitional"/>
</locatingRules>
that makes the default variant of XHTML be XHTML Transitional.
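The indirection and the override can be sketched together. The sketch assumes that typeId elements are consulted in rule order with the first match winning, and that the user's extra file comes earlier in the list of schema locating files; the tuple representation and the function name resolve_type_id are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: each typeId element is modelled as (id, kind, value),
# where kind is "typeId" (points to another identifier) or "uri" (a schema URI).
TypeIdRule = Tuple[str, str, str]


def resolve_type_id(type_id: str, rules: List[TypeIdRule]) -> Optional[str]:
    """Follow typeId -> typeId links until a schema URI is reached."""
    seen = set()
    while type_id not in seen:
        seen.add(type_id)
        for rule_id, kind, value in rules:
            if rule_id == type_id:
                if kind == "uri":
                    return value
                type_id = value  # one more level of indirection
                break
        else:
            return None  # no rule defines this type identifier
    return None  # a cycle of type identifiers never reaches a URI
```

With the three typeId elements of the example, XHTML resolves to xhtml-strict.rnc. Placing the single override rule in front of them makes XHTML resolve to xhtml-transitional.rnc instead.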
A typeIdProcessingInstruction rule allows a document to specify its own typeId with a processing instruction. The target attribute specifies the processing instruction target that should be recognized as specifying a typeId in its value. For example, with an additional rule
<typeIdProcessingInstruction target="my-doctype"/>
a document that started
<?my-doctype XHTML Transitional?>
<html xmlns="http://www.w3.org/1999/xhtml">
would be validated against xhtml-transitional.rnc.
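Reading the type identifier from such a processing instruction can be sketched with Python's standard xml.parsers.expat module. The function name is hypothetical, and restricting the search to processing instructions that appear before the document element is an assumption of this sketch, not something stated above.

```python
import xml.parsers.expat
from typing import Optional


def type_id_from_processing_instruction(xml_text: str, target: str) -> Optional[str]:
    """Return the data of the first matching processing instruction in the prolog."""
    found = []
    state = {"in_prolog": True}

    def on_processing_instruction(pi_target, data):
        if state["in_prolog"] and pi_target == target and not found:
            found.append(data)

    def on_start(name, attributes):
        state["in_prolog"] = False  # assumption: later instructions are ignored

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_processing_instruction
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found[0] if found else None
```

The returned string would then be looked up as a type identifier in the same way as a typeId attribute on any other rule.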
A typeIdBase rule makes it possible to avoid having to add an explicit rule for every typeId. For example, a rule
<typeIdBase append=".rnc"/>
occurring in a schema locating file /home/jjc/schema/schemas.xml would make Emacs try to use the file /home/jjc/schema/DocBook.rnc for a type identifier of DocBook; it would test whether that file existed, and if it did, it would use it. In terms of URIs, Emacs appends the value of the append attribute to the typeId; it then %-escapes all URI-significant characters; this is then treated as a relative URI and resolved relative to the base URI applicable to the typeIdBase element. The typeId will be mapped to this URI, provided that the URI identifies an existing resource.
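The URI construction described here (append, then %-escape, then resolve) can be sketched with Python's standard urllib.parse functions. The function name type_id_base_uri is hypothetical, escaping every reserved character is an assumption about what counts as URI-significant, and the final existence test is left out of the sketch.

```python
from urllib.parse import quote, urljoin


def type_id_base_uri(type_id: str, append: str, base_uri: str) -> str:
    """Build the candidate schema URI for a type identifier."""
    relative = quote(type_id + append, safe="")  # append first, then %-escape
    return urljoin(base_uri, relative)           # resolve against the base URI
```

For the base URI file:///home/jjc/schema/schemas.xml and append value .rnc, the type identifier DocBook yields file:///home/jjc/schema/DocBook.rnc, and XHTML Strict yields file:///home/jjc/schema/XHTML%20Strict.rnc.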
Using multiple schema locating files
The include element includes rules from another schema locating file. The behavior is exactly as if the rules from that file were included in place of the include element. Relative URIs are resolved into absolute URIs before the inclusion is performed. For example,
<include rules="../rules.xml"/>
includes the rules from rules.xml.
The process of locating a schema takes as input a list of schema locating files. The rules in all these files and in the files they include are resolved into a single list of rules, which are applied strictly in order. Sometimes this order is not what is needed. For example, suppose you have two schema locating files, a private file
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
followed by a public file
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
The effect of these two files is that the XHTML namespace rule takes precedence over the transformURI rule, which is almost certainly not what is needed. This can be solved by adding an applyFollowingRules to the private file.
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <applyFollowingRules ruleType="transformURI"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
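One plausible reading of applyFollowingRules, sketched below, is that when it is reached, every later rule of the named type is tried first, and normal in-order processing continues only if none of them matches. This semantics is an assumption made for illustration; the text above states only that the element solves the precedence problem. The pair representation of rules and the function name locate are hypothetical.

```python
from typing import Callable, List, Optional, Tuple, Union

# Assumption: a rule is (type, matcher); for applyFollowingRules the second
# slot holds the ruleType name instead of a matcher function.
Matcher = Callable[[dict], Optional[str]]
Rule = Tuple[str, Union[Matcher, str]]


def locate(rules: List[Rule], document: dict) -> Optional[str]:
    """Apply rules in order, letting applyFollowingRules look ahead."""
    for index, (rule_type, payload) in enumerate(rules):
        if rule_type == "applyFollowingRules":
            for later_type, later_matcher in rules[index + 1:]:
                if later_type == payload:
                    schema = later_matcher(document)
                    if schema is not None:
                        return schema
            continue  # nothing matched ahead: carry on with the next rule
        schema = payload(document)
        if schema is not None:
            return schema
    return None
```

Under this reading, an XHTML document that has a sibling .rnc file gets that file as its schema, and an XHTML document without one still falls back to xhtml.rnc.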