Chapter 1. Background

I asked on the xml-dev list about validation and its history. Thanks to those who replied. What follows is some of the information that came out.

The Roots of SGML, by Charles Goldfarb, provides some history. From it we learn that “GML finally saw the light of day under its own name in 1973”.

Goldfarb states: “The DCF GML User's Guide (IBM SH20-9160), which I wrote in 1978, includes the first published formal document type 'descriptions' (DTDs)”, which puts validation back nearly 30 years. He continues: “After the completion of GML, I continued my research on document structures, creating additional concepts, such as short references, link processes, and concurrent document types, that were not part of GML. By far the most important of these was the concept of a validating parser that could read a document type definition and check the accuracy of markup, without going to the expense of actually processing a document. At that point SGML was born -- although it still had a lot of growing up to do.” This seems a good baseline for the history of validation.

Between 1996 and 1998, early XML took the hassle out of DTDs, making validators easier to implement. It also loosened the tie between instance and schema by allowing documents without a schema. The first XML working draft shows the starting point. Jon Bosak, who was there, tells the tale of offering instances without a DTD; the debate had something of the 'two legs bad, four legs good' about it! Reality seems to have won out, and this separation of instance from schema may well have sown the seeds of the variety of validation needs now being addressed by DSDL. As Jon puts it: “So without thinking about it very hard, or in the terms that I've just laid out for you, some of us working in the trenches had gotten used to the thought that DTDs were something separate from documents and that documents had a goodness or badness independent of any particular DTD.” That view grew out of experience outside the working group at the time. The same document also discusses UBL's need for phased validation.

Robin Cover's Cover Pages record that DSDL was proposed in May 2001. The proposal reads more like a problem statement than a proposed solution! NVDL is Part 4 of that work. It has taken a long time, but it is getting there.

Murata Makoto emailed me the following background:

The origin of NVDL is RELAX Namespace (ISO/IEC DTR 22250-2:2001), which was submitted by Japan to JTC1 in early 2001. RELAX Namespace is quite different from NVDL, but it introduced three ideas: (1) namespace-based dispatching, (2) decomposition of documents into pieces, and (3) validation of different pieces against different schemas. In other words, RELAX Namespace is basically NVDL minus modes, attach, unwrap, and some other mechanisms. I know that James was aware of RELAX Namespace and its basic ideas, since we were working together on RNG at that time. (I did explain the basic ideas of RELAX Namespace to James.) But I do not think he studied RELAX Namespace carefully. He became really interested only after the first committee draft of NVDL was created in SC34.

RELAX Core is restricted to single-namespace documents, since my view on namespaces is radical. (I would say that the right way to use namespaces in schemas is to use NVDL rather than RNG.) But James insisted that RNG should provide some mechanisms for multi-namespace documents, and I agreed.

By the way, Kohsuke Kawaguchi implemented RELAX Namespace even before he implemented RELAX NG.

On the basis of RELAX Namespace, the first working draft of NVDL was created in 2002 (ISO/IEC JTC 1/SC34 N363). This working draft was sketchier than RELAX Namespace, so that it clearly fitted into the overall framework of NVDL. James Clark then created MNS in 2003, followed by NRL later the same year. My original work, RELAX Namespace, was done without any influence from James. After the first committee draft of NVDL was written, we sometimes compared notes.

On the basis of NRL, I finished NVDL at ISO/IEC JTC 1/SC34. I did all the editorial work.

NVDL would not have been completed without significant contributions by James Clark. In particular, he introduced modes, unwrap, and attach. My biggest contributions were the idea of namespace-based validation dispatching and the editing of the 19757-4 text. This story is very similar to that of RELAX NG, where I started RELAX Core and James Clark designed TREX; RELAX NG is almost identical to TREX. My biggest technical contributions there were the use of tree regular grammars for XML schemas and PSVI-free validation.

Thanks, Makoto.

The need to validate multiple namespaces grew out of the need to mix vocabularies from different schemas. With the advent of namespaces, around 1998, it became possible to mix different vocabularies in the same XML instance. In 2003 James Clark started developing a method of routing parts of an XML instance to different validators. Initially called Modular Namespaces (MNS), then Namespace Routing Language (NRL), this work was integrated into DSDL (Document Schema Definition Languages, ISO/IEC 19757) and published in 2006 as Part 4, NVDL.

The possibilities of this approach to validation are still being explored. The underlying principle is that validation requirements vary from user to user; since those needs vary widely, there must be many approaches to validation. DSDL is designed to support that diversity.

I do not claim to understand every way in which NVDL is used. I shall present what I know and, with help from other users, make this as correct as I can. Any errors are my own; please let me know of them.

Although NVDL does not mandate any particular schema language, throughout this document I shall be using RELAX NG.