1. | The tools | |||
Provisional start on tools. Please help me fill this out. James Clark's tool suite meets most requirements.
See the relax-ng home page | ||||
2. | Sun Multi-Schema XML Validator | |||
sun.com, The Sun Multi-Schema XML Validator (MSV) is a Java technology tool to validate XML documents against several kinds of XML schemata. It supports RELAX NG, RELAX Namespace, RELAX Core, TREX, XML DTDs, and a subset of XML Schema Part 1. This latest (version 1.2) release includes several bug fixes and adds better conformance to RELAX NG/W3C XML standards and JAXP masquerading. This download package includes binaries, source code, and detailed documentation. | ||||
3. | Python relax-ng parser | |||
I've implemented (most of) a parser for the RELAX NG compact syntax. It is written in Python and uses PLY (yet another implementation of lex and yacc for Python). The parser produces a parse tree whose nodes are instances of a class ASTNode, which is defined in the parser module. It recognizes most but *not* all of the compact syntax; hopefully it recognizes enough to be useful, and it can be extended when necessary. You can find documentation on the parser here: rexx.com And you can find a distribution file here: rexx.com If I can be of help with this, please let me know. | ||||
4. | Relax-ng compact syntax validator with a pipeline wrapper | |||
RNV is a Relax NG (Compact Syntax) XML validator. It is written in ANSI C and distributed under the BSD License. RNV supports XML Schema Part 2: Datatypes; the command-line utility uses Expat by James Clark. Releases and updates are announced at http://davidashen.net/.
RNV is a command-line utility for validating XML instances against a Relax NG Compact schema. It has been tested by validating DocBook documents as well as many other grammars. Available at davidashen.net
I have implemented XML Schema datatypes, with some limitations, which are listed in the documentation. The support for datatypes includes an implementation of Unicode regular expressions (in XML Schema syntax). I've run it through 1117 NIST datatype tests, and the only tests which pass when they should not are due to double overflow and underflow (double is not 64-bit IEEE on my computer). I've also tested the validator itself with everything I could find (that is, I ran it on many Relax NG and XML Schema files).
The program is available with full source code and should be easy to build in any Unix-like environment. A Windows executable (compiled with Borland C/C++ Builder) is included with the distribution. I've also successfully compiled and tested it in the cygwin environment.
A user comment: I'm impressed! It ate all my test files at first go, at lightning speed, and gave a good error report when I introduced a mistake. 65000 lines of the TEI Guidelines in 1.9 seconds, 5 times faster than jing.
To facilitate the embedding of RNV into heterogeneous environments, I have developed RVP, a pipe that expects validation primitives on one end and emits validation diagnostics from the other. Embedding examples in Perl and Python are provided; I believe that, at the time of writing, these are the fastest and most conformant (if not the only) Relax NG validation solutions for these languages. Several changes have been made to the core modules, mostly to provide better separation of layers.
This solution will work wherever the pipe() call is available in C; that is, on most modern Unix systems as well as cygwin. I have tested it with perl 5.005 and python 2.2 under FreeBSD, Linux and Win32/cygwin. I am willing to provide a sample for Ruby too.
| ||||
5. | Additional regexp support with RNV | |||
I have built RNV for Win32 (with cygwin tools) and put it on the server for download. This build of rnv has pluggable datatype support turned on and allows a regular expression to be specified as a set of named parts. davidashen.net
The readme is cygwin\readme-rnv.txt. To run the sample:
cd cygwin\usr\local\lib\rnv\samples
rnv -e ../scm/dsl.scm addr-spec-dsl.rnc addr-spec.xml
And the regexp looks like:
s-pattern="""
comment = "\(([^\(\)\\]|\\.)*\)"
atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
atoms = atom "(\." atom ")*"
person = "\"([^\"\\]|\\.)*\""
location = "\[([^\[\]\\]|\\.)*\]"
local-part = "(" atom "|" person ")"
domain = "(" atoms "|" location ")"
start = "(" comment " )?" local-part "@" domain "( " comment ")?"
"""
It helps me debug long regular expressions; hopefully, it will also be useful to others. A Scheme implementation of XML Schema regular expressions is part of the RNV distribution; it makes it possible to debug regular expressions conveniently by writing:
(define addr-spec-regex
(let* (
(atom "[a-zA-Z0-9!#$%&'*+\\-/=?\\^_`{|}~]+")
(person "\"([^\"\\\\]|\\\\.)*\"")
(location "\\[([^\\[\\]\\\\]|\\\\.)*\\]")
(domain (string-append atom "(\\." atom ")*")))
(string-append
"(" domain "|" person ")"
"@"
"(" domain "|" location ")")))
...
(rx-match (rx-compile addr-spec-regex) s)
instead of: pattern=
"(\(([^\(\)\\]|\\.)*\) )?"
~ "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
~ "(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*"
~ """|"([^"\\]|\\.)*")"""
~ "@"
~ "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
~ "(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*"
~ "|\[([^\[\]\\]|\\.)*\])"
~ "( \(([^\(\)\\]|\\.)*\))?"
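The same idea of building a long pattern from named parts can be sketched in Python. This is only an illustration, not part of RNV: it uses Python's `re` module, whose syntax is assumed to be close enough to XML Schema regular expressions for these particular fragments, and the names (`ATOM`, `is_addr_spec`, and so on) are invented here. The composition follows the long-hand pattern above.

```python
import re

# Named fragments, following the long-hand pattern above.
COMMENT = r"\(([^()\\]|\\.)*\)"
ATOM = r"[a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~]+"
ATOMS = ATOM + r"(\." + ATOM + r")*"
PERSON = r'"([^"\\]|\\.)*"'
LOCATION = r"\[([^\[\]\\]|\\.)*\]"

LOCAL_PART = "(" + ATOMS + "|" + PERSON + ")"
DOMAIN = "(" + ATOMS + "|" + LOCATION + ")"
ADDR_SPEC = "(" + COMMENT + " )?" + LOCAL_PART + "@" + DOMAIN + "( " + COMMENT + ")?"

ADDR_SPEC_RE = re.compile(ADDR_SPEC)


def is_addr_spec(s):
    """Return True if the whole string matches the composed pattern."""
    # XML Schema patterns are implicitly anchored, hence fullmatch().
    return ADDR_SPEC_RE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ("john.doe@example.org", "john..doe@example.org"):
        print(candidate, is_addr_spec(candidate))
```

Each fragment can be compiled and tested on its own before it is combined, which is the debugging benefit described above.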
| ||||
6. | XML Schema datatype testing | |||
There is a NIST test suite that checks XML Schema datatypes for conformance: nist.gov Below are two simple transformations that convert the XML Schema test files (there are 1117 tiny schema files) into Relax NG (or Relax NG Compact). They could be useful for implementors and users.
<!-- XML -->
<xsl:transform xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<grammar
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
ns='{concat(/xsd:schema/xsd:element/@name,"-NS")}'>
<start>
<element name="{/xsd:schema/xsd:element/@name}">
<optional>
<attribute name="xsi:schemaLocation"/>
</optional>
<data>
<xsl:apply-templates
select="/xsd:schema/xsd:simpleType[1]/xsd:restriction"/>
</data>
</element>
</start>
</grammar>
</xsl:template>
<xsl:template match="xsd:restriction">
<xsl:attribute name="type"><xsl:value-of
select="@base"/></xsl:attribute>
<xsl:for-each select="*">
<param>
<xsl:attribute name="name"><xsl:value-of
select="name()"/></xsl:attribute>
<xsl:value-of select="@value"/>
</param>
</xsl:for-each>
</xsl:template>
</xsl:transform>
<!-- Compact -->
<xsl:transform xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
default namespace = "<xsl:value-of
select="/xsd:schema/xsd:element/@name"/>-NS"
namespace xsi = "http://www.w3.org/2001/XMLSchema-instance"
start = element <xsl:value-of
select="/xsd:schema/xsd:element/@name"/> {
attribute xsi:schemaLocation { text } ? ,
<xsl:apply-templates
select="/xsd:schema/xsd:simpleType[1]/xsd:restriction"/>
}
</xsl:template>
<xsl:template match="xsd:restriction">
xsd:<xsl:value-of select="@base"/> {
<xsl:for-each select="*">
<xsl:value-of
select="name()"/>="<xsl:value-of select="@value"/>"
</xsl:for-each>
}
</xsl:template>
</xsl:transform>
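For readers who would rather script the conversion, here is a rough Python counterpart of the compact-syntax stylesheet, using only the standard library. It is a sketch under assumptions: the input has the same shape the stylesheets expect (one `xsd:element` and a first `xsd:simpleType` with an `xsd:restriction`), `SAMPLE_XSD` is an invented input rather than one of the NIST files, and, like the stylesheet, it does not escape quotes inside facet values. Unlike the stylesheet, it strips any prefix from `@base` before prepending `xsd:`.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative input only; not taken from the NIST test suite.
SAMPLE_XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="sample" type="sampleType"/>
  <xsd:simpleType name="sampleType">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="5"/>
      <xsd:pattern value="[a-z]*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""


def local_name(name):
    """Strip a namespace part: '{uri}x' or 'prefix:x' becomes 'x'."""
    return name.rsplit("}", 1)[-1].rsplit(":", 1)[-1]


def xsd_to_rnc(xsd_text):
    """Turn a one-element, one-restriction schema into RELAX NG compact syntax."""
    schema = ET.fromstring(xsd_text)
    name = schema.find(XSD + "element").get("name")
    restriction = schema.find(XSD + "simpleType/" + XSD + "restriction")
    # Every child of the restriction becomes a datatype parameter.
    params = " ".join(
        '{} = "{}"'.format(local_name(facet.tag), facet.get("value"))
        for facet in restriction
    )
    datatype = "xsd:" + local_name(restriction.get("base"))
    if params:
        datatype += " { " + params + " }"
    return "\n".join([
        'default namespace = "{}-NS"'.format(name),
        'namespace xsi = "http://www.w3.org/2001/XMLSchema-instance"',
        "start = element {} {{".format(name),
        "  attribute xsi:schemaLocation { text }?,",
        "  " + datatype,
        "}",
    ])


if __name__ == "__main__":
    print(xsd_to_rnc(SAMPLE_XSD))
```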
| ||||
7. | Can Jing simply ignore <!DOCTYPE ..> ? | |||
I believe the answer is "no".
But not because of its content model constraint declarations ... those are being ignored. You still need it there in case there are any entities that are needed by the document.
As with MSV, Jing needs to follow the SYSTEM identifier, but it will ignore any content model constraints ... so you can even put an empty file at the DTD location and things should proceed. Kohsuke Kawaguchi adds: Actually, MSV has an option to completely ignore an external DTD. In this mode, the parser will behave as if it didn't see the DOCTYPE declaration. | ||||
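The point about entities can be seen with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` (not Jing or MSV) and, to stay self-contained, declares the entity in an internal subset rather than in an external DTD; the document strings and the helper name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The entity 'co' is declared in the DOCTYPE and used in the content.
WITH_DOCTYPE = '<!DOCTYPE doc [<!ENTITY co "Example Corp">]><doc>&co;</doc>'
# The same document with the DOCTYPE stripped out.
WITHOUT_DOCTYPE = "<doc>&co;</doc>"


def text_or_error(xml_text):
    """Parse the document and return the root text, or a short error string."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as exc:
        return "error: {}".format(exc)


if __name__ == "__main__":
    print(text_or_error(WITH_DOCTYPE))
    print(text_or_error(WITHOUT_DOCTYPE))
```

With the declaration present the entity reference is expanded; without it the document is not even well-formed, so no validator could proceed.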
8. | A mode that displays ".rnc" files in color? | |||
See "RNC Emacs Mode" at pantor.com | ||||
9. | RELAX-NG Documentation Tools | |||
I have been using RELAX-NG (very successfully, thank you ;-) on a project in the course of which I got the itch to produce documentation and diagrams from the RNG sources. The result is a stylesheet for creating rudimentary Docbook documentation from the documentation annotations of a RELAX-NG schema and a set of stylesheets for creating a variety of SVG diagrams from a schema. Details and download links are at techquila.com The stylesheets presented here are geared towards the production of documentation from a RELAX-NG schema. The RELAX-NG syntax allows schemas to be annotated with documentation strings. The stylesheets use the structure of the RELAX-NG schema instance and the documentation strings to produce Docbook XML and SVG diagrams. | ||||
10. | Relaxer schema compiler and eclipse plugin | |||
I am happy to announce that Relaxer version 1.0 is available for download. (Home page; download: relaxer zip file.) Relaxer is a schema compiler for RELAX NG. Relaxer generates Java sources, DTDs, XSLT scripts, HTML pages with FORMs, and more from RELAX NG schemas. Relaxer supports almost the full RELAX NG specification. For example, Relaxer can handle James Clark's RELAX NG schema for XHTML Modularization: <http://thaiopensource.com/relaxng/xhtml/> Reading the tutorial will show you what Relaxer can do. (Tutorial) Also, the Relaxer Eclipse Plugin version 0.1.0 is available for download (jar file). Relaxer Eclipse Plugin is an Eclipse plugin for the Relaxer schema compiler. Because it is at an early stage, documentation has not been prepared yet. However, I expect that average Eclipse users will easily work out how to use it.
| ||||
11. | Simplified? Simple syntax? | |||
> What's the definition of 'simplified' please?
Simplification is a transformation defined in the RELAX NG specification: relaxng.org The simple syntax (not to be confused with the compact syntax) is a subset of the full syntax in which you normally author schemas. The semantics of RELAX NG validation are described entirely in terms of the simple syntax.
Simply put, the simplification transformation flattens a schema into a more DTD-like structure where the only remaining top-level constructs are definitions, each containing a single element pattern. Additionally, there is a single start pattern. All other named patterns, includes, overrides, combine attributes and nested grammars go away. Also, inherited attributes are resolved, so you don't have to look upwards to determine the value of, for example, an ns attribute. So, if you want to process a schema and don't care how the original schema was structured and modularized, the simplified version is much simpler to use.
>> relaxng.org has links to tools that can perform the simplification
rng2srng is a tool which will perform the simplification transformation described above.
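To make one of these steps concrete, here is a minimal Python sketch of resolving inherited ns attributes on a schema in XML syntax. It is not rng2srng and covers only this single step; it assumes that name attributes have already been turned into `<name>` child elements, and the function name and `SAMPLE` schema are invented for illustration.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
# Elements that should end up carrying an explicit ns attribute.
NEEDS_NS = {RNG + "name", RNG + "nsName", RNG + "value"}

# Illustrative schema: ns is set on the grammar and overridden further down.
SAMPLE = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0" ns="urn:a">
  <start>
    <element><name>doc</name>
      <element ns="urn:b"><name>item</name><text/></element>
    </element>
  </start>
</grammar>
"""


def resolve_ns(elem, inherited=""):
    """Copy the nearest inherited ns value onto name, nsName and value elements."""
    current = elem.get("ns", inherited)
    if elem.tag in NEEDS_NS and "ns" not in elem.attrib:
        elem.set("ns", current)
    for child in elem:
        resolve_ns(child, current)
    return elem


if __name__ == "__main__":
    schema = resolve_ns(ET.fromstring(SAMPLE))
    for name in schema.iter(RNG + "name"):
        print(name.text, name.get("ns"))
```

After this pass a consumer can read the namespace directly from each name element instead of walking up the tree.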
| ||||
12. | How to resolve external references | |||
See this page. Abstract: incelim takes a Relax NG grammar in XML syntax, expands all includes and externalRefs, and optionally replaces references to text, empty, or notAllowed with the patterns themselves. The result is a 'compiled' schema convenient for distribution. The package includes stylesheets for each of the transformation steps, and two kinds of glue: an XSLT stylesheet, incelim.xsl, which chains the transformations using exsl:node-set(), and a shell script, incelim, which applies each of the stylesheets to the serialized result of the previous one. |
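As a rough illustration of what expanding an externalRef involves, here is a small Python sketch. It is not incelim (which is implemented as XSLT stylesheets) and handles externalRef only, not include; the `load` callback, the function name and the in-memory documents are assumptions made for the example, and there is no protection against circular references.

```python
import copy
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical in-memory "files" standing in for documents on disk.
DOCUMENTS = {
    "inline.rng": '<element xmlns="http://relaxng.org/ns/structure/1.0" '
                  'name="em"><text/></element>',
}
MAIN = ('<element xmlns="http://relaxng.org/ns/structure/1.0" name="p">'
        '<externalRef href="inline.rng" ns="urn:x"/></element>')


def load_document(href):
    """Return the root element of the referenced schema (assumed loader)."""
    return ET.fromstring(DOCUMENTS[href])


def expand_external_refs(elem, load):
    """Replace every externalRef below elem with the pattern it points to."""
    for index, child in enumerate(list(elem)):
        if child.tag == RNG + "externalRef":
            target = copy.deepcopy(load(child.get("href")))
            # The ns attribute of the externalRef carries over to the
            # referenced pattern when that pattern has none of its own.
            if "ns" in child.attrib and "ns" not in target.attrib:
                target.set("ns", child.get("ns"))
            target.tail = child.tail
            elem[index] = target
            child = target
        expand_external_refs(child, load)
    return elem


if __name__ == "__main__":
    expanded = expand_external_refs(ET.fromstring(MAIN), load_document)
    print(ET.tostring(expanded, encoding="unicode"))
```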