Catalogs and DocBook

1. What's a catalog
2. What use are catalogs
3. Where can I get software to manage catalogs
4. What's an fpi or Formal Public Identifier.
5. Sample setup for entity resolution catalog using saxon
6. Resources.
7. Catalogs on libxml
8. Debugging catalog operation (Sun Resolver)
9. Hardcoded import paths
10. Multiple catalog problems
11. New catalog resolver
12. Order of catalog files
13. Catalog resolution
14. Using catalogs with Xalan
15. catalogs in XSLTC
16. Opening a DTD through a firewall!
17. Catalogs with Saxon and Sun resolver.
18. Sun resolver clarifications
19. Command line option (variable with import) via catalog
20. Override
21. Where to find the style-sheet fpi
22. Catalog usage
23. Use of public id's in a catalog?
24. next catalog question
25. Catalog problems with XEP

1.

What's a catalog

DaveP

In simple English, a catalog is a means of linking names with addresses. In its older, simple form, as used in SGML, it links names, such as the DocBook formal public identifier, with addresses, i.e. the location on your disk or on the web where the named resource may be found.

Put simply, for SGML the catalog support software takes the fpi and finds the system location for that fpi.

For XML it gets really clever. The document type declaration, with its fpi and external DTD reference, can be used by the catalog resolver to redirect the application to other files. See Norm's software for complete details, or the formal definition of XML Catalogs at the OASIS site.

2.

What use are catalogs

DaveP

Well, strictly speaking you don't need them. I found that I became annoyed by the problem that catalogs solve. To quote Norm Walsh:

I have an XML document that I want to publish on the web or include in the distribution of some piece of software. On my system, I keep the doctype of the document in some local directory, so my doctype declaration reads:

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.0//EN"
  "file:///n:/share/doctypes/docbook/xml/docbookx.dtd">

As soon as I distribute this document, I immediately begin getting error reports from customers who can't read the document because they don't have DocBook installed at the location identified by the URI in my document. Drat!

The same happens when you try to directly use a document that refers to a schema residing in a different directory on someone else's machine.

Another reason to have them in place is if you use emacs+psgml as your SGML/XML editor. Emacs supports them naturally. Other editors do also, although I don't have a complete list. Do you? Let me know.

Norm Walsh wrote a whole exposition on them, and has worked for a goodly time to get them supported for XML and SGML, via the Oasis group. XML Catalogs is, at the time of writing, the latest work on catalogs.

3.

Where can I get software to manage catalogs

DaveP

Norm Walsh has written a piece of Java software that provides catalog support for XML. See the Sun web site for details and download; the distribution includes examples as well.

4.

What's an fpi or Formal Public Identifier.

DaveP + web sources

It's a string found in the document type declaration that identifies, either informally or formally, what the document is, without specifying where the document type may be found. DocBook has a string such as

PUBLIC "-//OASIS//DTD DocBook XML V4.0//EN"

which is one of the fpis for DocBook.

Murray explains it as well as I've seen it.

5.

Sample setup for entity resolution catalog using saxon

Denis Bradford

In my doc group's authoring environment, we use a catalog to point to the DTD and other resources. In order to do XSL transformations in the same environment, we needed to enable Saxon to use a catalog too. Norm Walsh's recently published XML Entity and URI Resolvers was just what the doctor ordered.

The instructions in this publication were indispensable. However, the real XML world is full of variables that make even simple instructions difficult to implement, especially for non-programmers like us. In the hope that concrete examples might help someone else, this describes how we configured a catalog for our toolchain on a Windows 2000 system; we expect the UNIX implementation to be similar when we get around to it. We may not have done it the best way, but it seems to work.

To illustrate how our system works, we will step through the components that relate to the catalog implementation: the software and documentation that we assembled and installed, and the configuration files and settings we created.

Installed components

The following table lists required software and essential documentation. Version information is shown just in case it's useful; the versions shown are not necessarily required.

Component                                          Version used
XSLT engine                                        Saxon version 6.4.3 (2001-07-13)
Java 2 Runtime Environment                         Version 1.3.1
Parser                                             Xerces Java Parser 1.4.3
Sun entity resolver class                          Preview Version 0.2. See resolver.html in the distribution for important installation and configuration information.
Norm's instructions for changing the Saxon parser  http://lists.oasis-open.org/archives/docbook-apps/200108/msg00148.html

We downloaded all of these components from their web sites and installed them according to the instructions provided.

We installed the Xerces parser because in Saxon 6.4.3, the native XML parser (Microstar's AElfred) has a bug that breaks the entity resolver. In this regard, Norm's message to the docbook-apps list about using Saxon with the Sun resolver classes was critical to our implementation. When and if that bug is fixed, the catalog implementation for Saxon should be a lot prettier.

Our DTD, ratl.dtd, is a customization of docbookx.dtd (with very few changes); likewise, our stylesheets are a customization of docbook.xsl.

Environment variables

We defined the following environment variables that directly and indirectly support our catalogs. In our Windows 2000 system, these are all defined at the user level (that is, as user variables).

  • CLASSPATH = .;C:\saxon\saxon.jar;C:\xerces-1_4_3\xerces.jar;c:\xmlentres\resolver.jar;d:\cfg

    The paths in our classpath are all required for our Java installations. The three JAR entries are where we installed Saxon, Xerces, and the resolver classes. The last item, d:\cfg, is the directory where we maintain our CatalogManager.properties file.

  • DBFACTORY = org.apache.xerces.jaxp.DocumentBuilderFactoryImpl

  • SPFACTORY = org.apache.xerces.jaxp.SAXParserFactoryImpl

The DBFACTORY and SPFACTORY environment variables name JAXP factory classes in Xerces; we pass them on the Saxon command line (see Saxon invocation) to force Saxon to use the Xerces parser.

Catalog file

The catalog is the file that resolves external entity names (used in your documents) to some physical location. As I mentioned, we had an existing catalog (named catalog) that is used by our editor. It's not in the OASIS format required by the entity resolver. So for the new file, we basically transcribed the old pointers to the DTD and other external entities to public ids in the new format — somewhat simpler than Norm's example. We also gave it a different name: ratldocbook.cat. I'm sure you'll want to change that.

Example 1. Sample catalog: ratldocbook.cat

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

<public 
publicId="-//Rational//DTD DocBook Customization for Rational CMBU//EN" 
        uri="ratl.dtd"/>

<public 
publicId="-//Arbortext//ELEMENTS DocBook XML Information Pool V4.0//EN"
       uri="dbpoolx.mod"/>

<public 
publicId="-//Arbortext//ELEMENTS DocBook XML Document Hierarchy V4.0//EN" 
       uri="dbhierx.mod"/>

	.
	.
	.
</catalog>
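
To make the lookup concrete, here is a minimal Python sketch of what a resolver does with the <public> entries of a catalog like the one above. It is only an illustration, not how the Sun resolver is implemented; the helper name load_public_entries and the cut-down two-entry catalog text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs, as used in the sample catalog above.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A cut-down copy of ratldocbook.cat; the elided entries are simply left out.
SAMPLE_CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Rational//DTD DocBook Customization for Rational CMBU//EN"
          uri="ratl.dtd"/>
  <public publicId="-//Arbortext//ELEMENTS DocBook XML Information Pool V4.0//EN"
          uri="dbpoolx.mod"/>
</catalog>
"""


def load_public_entries(catalog_text):
    """Return a dict mapping each publicId to its uri (hypothetical helper)."""
    root = ET.fromstring(catalog_text)
    entries = {}
    for element in root.iter("{%s}public" % CATALOG_NS):
        entries[element.get("publicId")] = element.get("uri")
    return entries


public_entries = load_public_entries(SAMPLE_CATALOG)

# Look up the customized DTD by its public identifier.
print(public_entries["-//Rational//DTD DocBook Customization for Rational CMBU//EN"])
```

A real resolver would additionally turn the relative uri values into locations relative to the catalog file; that step is omitted here.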

Catalog resolver

In addition to your catalog, you must create the resolver's configuration file, CatalogManager.properties. Adding the directory that contains this file to your classpath enables the entity resolver to find it; the file defines the location and certain other properties of your catalog. See resolver.html for details. Ours looks like this:

Example 2. Sample catalog resolver

#CatalogManager.properties

verbosity=1

catalogs=file:///d:/denisb_view/doc/tools/xml/doctypes/ratldocbook.cat

prefer=public

static-catalog=yes

allow-oasis-xml-catalog-pi=yes

catalog-class-name=com.sun.resolver.Resolver

For debugging purposes, you can set verbosity as high as 4 to generate lots of messages.

The catalogs URI specifies the location of your catalog. How you specify the location determines your catalog's availability, such as through a network resource or Web server. The catalogs path shown in the example reflects our use of ClearCase to access shared elements. I wanted to use an environment variable here, but couldn't get that syntax to work. So for now, each user has to hard code the path to his or her local view directory.

I'm not 100% sure the prefer=public option is absolutely necessary, but it seems to work for us. Without having tested it, I assume that it ensures that the public id will always be used, even if a document specifies a system id that is correct. I think this is the behavior we want - otherwise, why use a catalog?
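
For what it's worth, the OASIS XML Catalogs specification describes prefer roughly as follows: a catalog entry that matches the system identifier always wins; the public entries are consulted only when the document supplies no system identifier or when prefer is "public". The Python sketch below is a simplified reading of that rule, not the Sun resolver's code; the function name, the dictionaries, and the sample system identifier are all made up for illustration.

```python
def resolve_external(public_id, system_id, system_entries, public_entries,
                     prefer="public"):
    """Simplified model of the 'prefer' rule (illustrative only)."""
    # 1. A catalog entry matching the system identifier always wins.
    if system_id is not None and system_id in system_entries:
        return system_entries[system_id]
    # 2. Public entries are consulted if the document gave no system
    #    identifier, or if the catalog says prefer="public".
    if public_id is not None and (system_id is None or prefer == "public"):
        if public_id in public_entries:
            return public_entries[public_id]
    # 3. Nothing matched: fall back to the system identifier as written.
    return system_id


PUBLIC_ID = "-//Rational//DTD DocBook Customization for Rational CMBU//EN"
SYSTEM_ID = "file:///n:/somewhere/else/ratl.dtd"  # hypothetical
public_map = {PUBLIC_ID: "ratl.dtd"}

with_public = resolve_external(PUBLIC_ID, SYSTEM_ID, {}, public_map, "public")
with_system = resolve_external(PUBLIC_ID, SYSTEM_ID, {}, public_map, "system")
print(with_public)
print(with_system)
```

Under this reading, prefer=public is what lets the catalog override a system identifier that points somewhere unreachable, which matches the behavior wanted above.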

Saxon invocation

The command line for invoking Saxon using the Xerces parser and the entity resolver required some trial and error. Here's what eventually worked for us:

Note that the line breaks in the following are there only for readability; type the command on a single line.

java -Djavax.xml.parsers.DocumentBuilderFactory=%DBFACTORY% \
-Djavax.xml.parsers.SAXParserFactory=%SPFACTORY% com.icl.saxon.StyleSheet \
-r com.sun.resolver.tools.CatalogResolver -x com.sun.resolver.tools.ResolvingXMLReader \
..\xmldiffmrg.xml D:\denisb_view\doc\tools\xsl\cmbuxsl\xdm-htmlhelp.xsl

6.

Resources.

DaveP

The following resources are available if you want to read up on catalogs in general.

Daniel Veillard has a good write-up on catalogs, and provides a tarball of an example, at xmlsoft.org.

The XML Catalog specification is relatively recent, so there isn't much literature to point at:

Norm Walsh has written a good rant about the need for catalogs; it provides a lot of context.

The page of the OASIS Technical Committee on Entity Resolution, which maintains XML Catalogs, has pointers to the specification update, some background, and pointers to other tools providing XML Catalog support.

7.

Catalogs on libxml

Daniel Veillard

Have a look at xmlsoft.org

> But it's frustrating when you try to use catalogs
> and they don't work.  It is hard to figure out *why*
> they don't work.  Did I give it the right syntax?
> Did it find my catalog? Which relative path didn't
> it get?  Was there something wrong
> with the catalog file? Or the catalog entry?
> Is there a debug option that traces the catalog resolution
> process?  The latter would be especially nice.  8^)

Hum, export XML_DEBUG_CATALOG=1 should do this. This may be incomplete or too verbose but the facility is there.

I think the page addresses most of those points. Using the xmlcatalog command with the --shell option is IMHO a good way to learn and debug XML catalog construction. It is supposed to work with SGML catalogs too, but I don't claim the same level of support (for example, libxml2 2.4.7 broke the SGML catalog handling; I got the bug report only yesterday, and it's now fixed in 2.4.8).

Since XML Catalogs are a bit new I have put an example "installation ready" (at least for unices) of DocBook XML 4.1.2 + XML Catalogs at: xmlsoft.org

8.

Debugging catalog operation (Sun Resolver)

Norm Walsh

In order to see what's happening, you can pass a command line parameter to cause debug output.

In the Java resolver code that Sun published, set 'verbosity' to 4 or greater and you'll see darn near everything that happens.

9.

Hardcoded import paths

Norm Walsh


| I created some custom xsl files which refer to the docbook
| ones via <xsl:import href="/usr/share/sgml/docbook/../filename">
| statements. Would it be possible to avoid this hardcoding of pathnames?

No, but you can use XML Catalogs with most processors to avoid the problems that these URIs cause.

10.

Multiple catalog problems

Sasha Zucker

In brief, once I realized that the SGMLDECL directives were a problem (which translates into hours of reading list archives and online manuals), I hunted them down and pared down my catalog as appropriate.

The not-so-brief: unaware of the chaos that would result, I had set up a catalog file in share/sgml that pointed to every catalog installed by docbook (v1.0 - XML 4.1.2), openjade, and your modular stylesheets. After a couple of hours of searching list archives, reading SGML manuals, and fiddling with catalog files, I realized that including certain catalog files (such as pre-docbook 3.x and certain catalogs included with openjade) would start breaking things. So, I distilled my catalog down to the following:

CATALOG "/sw/share/sgml/dsssl/docbook-dsssl-nwalsh/catalog"
CATALOG "/sw/share/sgml/dtd/docbook/catalog"
CATALOG "/sw/share/sgml/entities/iso8879/catalog"
CATALOG "/sw/share/sgml/openjade-1.3/dsssl/catalog"
CATALOG "/sw/share/xml/dtd/docbookx/catalog"

The docbook catalogs only point to docbook >= v3.0.

Note: I use /sw as the base directory because I am working on fink (http://fink.sf.net) packages to install docbook dtds/dsssl and sgml entities correctly on Mac OS X.

On the bright side, the package finally installs correctly. All I have to do post install is comment out the DTDDECL directives. When the package is finalized, Mac OS X fink users will be able to set up a complete docbook authoring environment with a few commands, which will hopefully mean fewer newbie questions from the likes of me. ;)

11.

New catalog resolver

Jirka Kosek

For those of you who are using the "old" Saxon catalog stuff (using the "new" method with Norm's new resolver is encouraged -- see http://www.sun.com/software/xml/developers/resolver/), I have updated the information page on my web site.

Starting with Saxon 6.5.1, catalog support can use Saxon's internal parser; the dependency on Crimson was removed. Of course, you can still use Crimson with the new version of Saxon.

12.

Order of catalog files

Matt Gruenke

The ORDER in which you specify catalog files (i.e. in SGML_CATALOG_FILES) DOES MATTER!! I have them listed in this order:

* docbook.cat (from the XML docbook DTD)
* catalog, from the DSSSL distribution
* catalog, from the OpenJade distribution, in the dsssl dir
* my own catalog that maps the PUBLIC and SYSTEM identifiers for the XML DocBook DTD to the location where I installed it
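
Why the order matters can be sketched in a few lines of Python, assuming the usual first-match-wins behavior when several catalogs map the same identifier. The catalog names and paths below are made up for illustration and do not describe any particular installation.

```python
def first_match(public_id, catalogs):
    """Return (catalog name, location) from the first catalog mapping public_id."""
    for catalog_name, entries in catalogs:
        if public_id in entries:
            return catalog_name, entries[public_id]
    return None


DOCBOOK_ID = "-//OASIS//DTD DocBook XML V4.1.2//EN"

# Two hypothetical catalogs that both map the same public identifier.
distribution_catalog = ("docbook.cat", {DOCBOOK_ID: "docbookx.dtd"})
local_catalog = ("my-catalog", {DOCBOOK_ID: "/opt/docbook/docbookx.dtd"})

# The same lookup gives a different answer depending on the search order.
distribution_first = first_match(DOCBOOK_ID, [distribution_catalog, local_catalog])
local_first = first_match(DOCBOOK_ID, [local_catalog, distribution_catalog])
print(distribution_first)
print(local_first)
```

Whichever catalog is listed first answers the lookup, so a later catalog can never correct an earlier, wrong entry.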

Nick Hunt adds

For those who may be interested, my setup is this:

1. SGML_CATALOG_FILES env variable points to my "super-catalog".

2. This in turn contains the following elements:

OVERRIDE YES

A SYSTEM identifier that can be put into the doctype declaration of the xml files:

SYSTEM "urn:x-oasis:docbook-xml-v4.1.2"
"wherever/the/docbook/dtd/is/docbookx.dtd"

Translation of the PUBLIC to SYSTEM identifiers for the ISO entities, in the form:

PUBLIC "ISO 8879:1986//ENTITIES Diacritical Marks//EN"
"/users/shared/isoents/isodia"

Note that these point to the SGML version of the entity files, not the XML ones that come with DocBook.

The docbook catalog:

CATALOG /users/shared/docbook/lib/docbook/docbook-dtd-412/docbook.cat

The dsssl catalog:

CATALOG /users/shared/dsssl/docbook/catalog

The jade catalog:

CATALOG /users/shared/dsssl/openjade/catalog

Note that the three included catalogs are unmodified, so updating should be easy.

13.

Catalog resolution

Steffen Maier


> Now I have the following in the driver which I want to be located via a 
> SYSTEM entry in the catalog.

> <xsl:import href="docbook.xsl"/>

> I have a catalog that points to the docbook dtds which is working and I 
> am using the older Java catalog resolution software.

> CATALOG "docbook-dtd/docbook.cat"
> SYSTEM  "urn:?:?"  "docbook-xsl/xhtml/docbook.xsl"

It seems as if you tried SGML catalogs, which are "only" capable of resolving PUBLIC or SYSTEM identifiers. But the href attribute in xsl:import is a URI, neither a public identifier nor a system identifier. To rewrite that stuff you need XML Catalogs. In case you want Norm's latest resolver implementation, try sun.com. The description of XML Catalogs is at the OASIS web site.

Watch out for an Oasis update. (April 2002)

14.

Using catalogs with Xalan

Eric Richardson



The following is the layout for the demonstration of catalog support (resolver-1.1) using Xalan.

jdocbook
       docbook-xsl
       docbook-dtd
       catalog.xml
       

The layout above is shown so that the relationship between the components can be understood. The docbook-xsl directory contains the XSLT stylesheets. The docbook-dtd directory contains the DocBook DTDs, entities, etc. used for validation. The catalog.xml file is the catalog used by the resolver to map local resources to the ones referenced in stylesheet customization layers and in the DocBook documents. The following is an example catalog:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" xml:base=".">

   <!-- for import driver files, one for each stylesheet used -->
   <uri name="xhtml/docbook.xsl" uri="docbook-xsl/xhtml/docbook.xsl"/>
   <uri name="xhtml/chunk.xsl" uri="docbook-xsl/xhtml/chunk.xsl"/>
   <uri name="fo/docbook.xsl" uri="docbook-xsl/fo/docbook.xsl"/>

   <!-- the docbook catalog -->
   <nextCatalog catalog="docbook-dtd/docbook.cat" prefer="public"/>
</catalog>

The resolver works by first finding the catalog above and then reading in its information to resolve other resources. The file has one entry for each stylesheet we would like to use. The uri attribute of the <uri> element is the relative path from the catalog.xml file to the stylesheet. The name attribute is the name we use in our stylesheet customization file, commonly called a driver. The following shows the entry in the driver file that allows it to find the imported stylesheet independent of the driver file's location.

<xsl:import 
href="xhtml/docbook.xsl"/>

The catalog.xml file must be referenced either from the CatalogManager.properties file, which needs to be located on the classpath, or via a system property passed to the Java VM. The system property is -Dxml.catalog.files=path/to/catalog.xml, where the name of the catalog file and the path to it can be changed to suit your setup. The following shows an Ant target that uses the system property approach. Note that resolver.jar needs to be included in the classpath.

<target name="html" depends="prepare">

   <echo message="Transforming: ${input.file}"/>
   <echo message="Using Driver: ${xhtml.driver}"/>
   <echo message="Into        : ${output.file}"/>

   <!-- Transform document into HTML: fork false doesn't work -->
   <java fork="true"
classname="${xslt.processor}"
classpathref="xalan.classpath">
     <sysproperty key="xml.catalog.verbosity" value="3"/>
     <sysproperty key="xml.catalog.prefer" value="public"/>
     <sysproperty key="xml.catalog.files" value="${catalog.file}"/>
     <arg line="-IN ${input.file}
-XSL ${xhtml.driver}
-OUT ${output.file}
-entityresolver com.sun.resolver.tools.CatalogResolver
-uriresolver com.sun.resolver.tools.CatalogResolver"/>
   </java>
</target>
       

The above translates to the following for use directly on the command line. Of course, the paths to the jar files and text files referenced would need to be adjusted. Note that FO processing using FOP is very similar to this approach.

java \
   -classpath xalan.jar:xerces.jar:bsf.jar:resolver.jar \
   -Dxml.catalog.verbosity=3 \
   -Dxml.catalog.prefer=public \
   -Dxml.catalog.files=catalog.xml \
   org.apache.xalan.xslt.Process \
   -IN example.xml \
   -XSL myhtmldriver.xsl \
   -OUT example.html \
   -entityresolver com.sun.resolver.tools.CatalogResolver \
   -uriresolver com.sun.resolver.tools.CatalogResolver

An article on the resolver, and the software itself, can be found at the Sun web site.

15.

catalogs in XSLTC

Daniel Veillard



> The best solution is to use catalog files within your XML/SGML system.
> One way to add them to Saxon is described at
> http://www.kosek.cz/xml/saxon/. Another way is to use the XML resolver classes
> from Norm; I don't have the URL at hand, so try Google or search the archives.

I have put some explanations online too, and a specific page for xsltproc users. This is becoming a real FAQ as we are deploying catalogs on Linux/Unix to handle the DocBook docs of the Gnome project; see catalog.html and docbook.html.

16.

Opening a DTD through a firewall!

Bob Stayton



> The problem is the <!DOCTYPE> header, created by profile.xsl:

> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> "http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">

> saxon hits the firewall when it's trying to open the DTD from the
> URL of the SYSTEM identifier.

> Apparently it doesn't pick up the http_proxy setting from the
> environment, or it is unable to use it.

> So the question is: do I massage the <!DOCTYPE>?  Or do I try to make
> saxon use the proxy server?

It might be easier to resolve the DTD reference locally. You can use an XML catalog to map the PUBLIC ID to a local filesystem DTD, like this:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group xml:base="/usr/share/xml/" prefer="public" >
       <public 
         publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
         uri="docbook412/docbookx.dtd"/>
  </group>
</catalog>

The <group> element provides the base path and the preference for using the public id. So this would resolve the PUBLIC ID to the local file "/usr/share/xml/docbook412/docbookx.dtd".
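
The way xml:base and uri combine can be checked with a short Python sketch. It only mimics the path arithmetic, using the standard library's urljoin as a stand-in for whatever a real resolver does internally; the function name resolve_public is an assumption for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

CATALOG_TEXT = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group xml:base="/usr/share/xml/" prefer="public">
    <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
            uri="docbook412/docbookx.dtd"/>
  </group>
</catalog>
"""


def resolve_public(catalog_text, public_id):
    """Return the uri of a matching <public> entry, joined to its group's xml:base."""
    root = ET.fromstring(catalog_text)
    for group in root.iter("{%s}group" % CATALOG_NS):
        base = group.get(XML_BASE, "")
        for entry in group.iter("{%s}public" % CATALOG_NS):
            if entry.get("publicId") == public_id:
                return urljoin(base, entry.get("uri"))
    return None


dtd_location = resolve_public(CATALOG_TEXT, "-//OASIS//DTD DocBook XML V4.1.2//EN")
print(dtd_location)
```

Note that the trailing slash on xml:base matters: without it, the last path segment of the base would be replaced rather than extended.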

17.

Catalogs with Saxon and Sun resolver.

Steffen Maier



> > I'm trying to get Ant to use the catalog code so I'm trying to pass a 
> > -D<resolver property-unknown>=com.sun.resolver.tools.CatalogResolver to 
> > ant. I traced the ant code and it passes the -D into the system 
> > properties so it should work if Xerces2 will use it internally.
> >
> > Is there a property I can set that will get the SAXReader to use the 
> > CatalogResolver in Xerces2?
> > 
> > The Sun resolver has the wrapper for xp and xt for the parser. I see 
> > there is a org.xml.sax.driver property, so is there a 
> > org.xml.sax.resolver property or a driver or something for xerces?

> I dug into this pretty extensively but still don't understand it all. It 
> seems that there are three -D<name>=<value> that java understands. The 
> names are as follows.
> javax.xml.parsers.SAXParserFactory
> javax.xml.parsers.DocumentBuilderFactory
> org.xml.sax.parser

> Passing one of these with the value of an appropriate class file should 
> force JAXP in the first two to use the passed in class file. Likewise 
> for the SAX parser. I have not tried this out yet.

> I looked at the source code for ant and they do the following. First 
> they do a class.forName on the javax.xml.parsers.SAXParserFactory which 
> forces a SAXParserFactory class into memory if it is found. Later they 
> do SAXParserFactory.newInstance(). Later they use their own version of 
> an entity resolver so I don't think any attempts to use the resolver 
> will work with at least ant 1.4.1. I wanted internal subsets to use 
> Norm's entity resolver so shared properties and targets in ant could be 
> held in one location for the ease of maintainability.

Good news: it does work... somehow at least ;-)

> I still would like to know which class of the resolver would be 
> appropriate for the SAXParserFactory or if the new resolver code has to 
> be used at a lower level as used in Xalan. This question is for the 
> xerces2 parser as the documentation shows how for xp and xt.

Sorry for quoting your whole message, but it's been a while since I worked through all that mess and I forgot a lot. So I base my reply on a private mail I sent to one of the list members earlier this year.

At the moment it doesn't seem to be possible to force saxon to use the catalog resolver when using ant's style task. Saxon provides command line parameters but no java properties to configure a URIResolver, neither as a saxon extension nor through JAXP (which supports selecting parsers and transformers through java properties, but not a URIResolver; the latter would have to be set through direct use of JAXP methods, if I understood the spec correctly).

Furthermore, there were some issues concerning a failing classpath and dysfunctional system properties within the style task:

search result:

http://marc.theaimsgroup.com/?l=ant-dev&w=2&r=1&s=saxon&q=b

selected postings:

http://marc.theaimsgroup.com/?l=ant-dev&m=100564428929226&w=2
http://marc.theaimsgroup.com/?l=ant-dev&m=100015372026767&w=2

(http://marc.theaimsgroup.com/?l=ant-user&w=2&r=1&s=docbook&q=b)

search result:

http://marc.theaimsgroup.com/?l=ant-user&w=2&r=1&s=saxon&q=b

selected postings:

http://marc.theaimsgroup.com/?l=ant-user&m=100478289007456&w=2
http://marc.theaimsgroup.com/?l=ant-user&m=99309341227134&w=2
http://marc.theaimsgroup.com/?l=ant-user&m=99245693820222&w=2

To overcome these problems I used a script to call ant. Theoretically not all the parameters in there would be necessary, especially if one does without the style task and only uses java tasks (where classpath and systemproperties can be specified). The "." in classpath is for the file CatalogManager.properties. Crimson.jar is not in the classpath because ant.jar contains "Class-Path: jaxp.jar parser.jar crimson.jar optional.jar xalan.jar", so crimson can be used for ...SAXParserFactory right away.

BTW, I just used crimson because saxon's parser aelfred had a bug when resolving relative URIs, which showed up as soon as one used the catalog resolver instead of aelfred's default resolver. This was true for saxon 6.4.4 and might be fixed as of 6.5.

The other software I used was ant-1.4.1 as well as crimson and jaxp packaged in Sun's java_xml_pack-fall01 and Norm's catalog resolver, of course.

file b.bat - all on one line of course

@REM     catalog       ;website-ext;db-extensions;saxon-itself;...
java -cp \
resolver.jar;.;saxon64.jar;saxon644.jar;saxon.jar;ant.jar;jaxp.jar \
 -Djavax.xml.transform.TransformerFactory=\
   com.icl.saxon.TransformerFactoryImpl \
 -Djavax.xml.parsers.SAXParserFactory=\
  org.apache.crimson.jaxp.SAXParserFactoryImpl \
 org.apache.tools.ant.Main %1 %2 %3 %4 %5 %6 %7 %8 %9

file build.xml

<project name="foo" default="test" basedir=".">
    ...
    <path id="classpath.saxon">
      <pathelement path="${classpath}"/><!-- is empty at this point? -->
      <pathelement location="saxon.jar"/><!-- saxon 6.4.4 -->
    </path>
    <path id="classpath.saxon.ext">
      <pathelement location="saxon644.jar"/><!-- db-xsl-1.45 -->
      <pathelement location="saxon64.jar"/><!-- db-website-2.0b1 -->
    </path>
    <path id="classpath.resolver">
      <pathelement location="resolver.jar"/>
      <pathelement location="."/><!-- for CatalogManager.properties -->
    </path>
    <path id="classpath.crimson">
      <pathelement location="crimson.jar"/>
    </path>
    <property name="html.generator" value="com.icl.saxon.StyleSheet"/>
    <property name="generator.args" value="-x
com.sun.resolver.tools.ResolvingXMLReader -y
com.sun.resolver.tools.ResolvingXMLReader -r
com.sun.resolver.tools.CatalogResolver"/><!-- "-t" -->
    ...
  <!-- =================================================================== -->
  <!-- Generate layout for stylesheets only if necessary                   -->
  <!-- =================================================================== -->
  <target name="autolayout.xml" depends="init-src">
    <style basedir="${build.src}"
           extension="xml"
           style="${build.src}/${autolayout.xsl}"
           destdir="${build.dest}"
           processor="trax"
           in="${build.src}/${layout.xml}"
           out="${build.src}/${autolayout.xml}">
      <classpath refid="classpath.saxon"/>
      <classpath refid="classpath.saxon.ext"/>
    </style>
  </target>

  <!-- =================================================================== -->
  <!-- Compiles the source directory                                       -->
  <!-- =================================================================== -->
  <target name="pre-compile" depends="init-src,autolayout.xml,template"
          description="Generate webpages by transforming sources through an XSLT processor.">
    <!-- Somehow basedir gets ignored, so source-docs are not found! -->
<!-- THE FOLLOWING STYLE-TASK DOES NOT WORK !
    <style basedir="${build.src}"
           style="${build.src}/${stylesheet}"
           destdir="${build.dest}"
           processor="trax"
           out="${build.src}/foo"
           in="${build.src}/${autolayout.xml}">
      <classpath refid="classpath.saxon"/>
      <classpath refid="classpath.saxon.ext"/>
      <classpath refid="classpath.resolver"/>
      <classpath refid="classpath.crimson"/>
    </style>
-->
<!-- just in case style should not work, use following: -->
<!-- dir is ignored if fork is not enabled ! -->
<!-- classpath.crimson is really needed to find crimson, at least if fork=yes
-->
    <java fork="yes" classname="${html.generator}" dir="${build.src}">
      <classpath refid="classpath.saxon"/>
      <classpath refid="classpath.saxon.ext"/>
      <classpath refid="classpath.resolver"/>
      <classpath refid="classpath.crimson"/>
      <sysproperty key="javax.xml.parsers.SAXParserFactory"
value="org.apache.crimson.jaxp.SAXParserFactoryImpl"/>
      <sysproperty key="javax.xml.transform.TransformerFactory"
value="com.icl.saxon.TransformerFactoryImpl"/>
      <arg line="${generator.args} ../../${build.src}/${autolayout.xml} 
../../${build.src}/${stylesheet}"/>
    </java>
    <copy todir="${build.dest}">
      <fileset dir="${build.predest}" includes="**/*"/>
    </copy>
  </target>
  ...
</project>

File CatalogManager.properties

#CatalogManager.properties

# 0 is off, 1 seemed to be also off so far
# 2 is reasonable, 4 is very verbose (including catalog parse success!)
verbosity=1

# Always use semicolons in this list
catalogs=file:/d:/usr/share/sgml/CATALOG;file:/d:/usr/share/sgml/catalog.xml

prefer=public

static-catalog=yes

allow-oasis-xml-catalog-pi=yes

# catalog-class-name=com.sun.resolver.Resolver
#***EOF***
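
As a quick illustration of the semicolon-separated catalogs list, the Python sketch below reads a few of the settings above and splits the list into individual catalog locations. It is a deliberately simplified reader (plain key=value lines and # comments only), not a full implementation of the Java properties format, and parse_properties is a name made up for the example.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


SAMPLE_PROPERTIES = """\
#CatalogManager.properties
verbosity=1
# Always use semicolons in this list
catalogs=file:/d:/usr/share/sgml/CATALOG;file:/d:/usr/share/sgml/catalog.xml
prefer=public
"""

settings = parse_properties(SAMPLE_PROPERTIES)

# The catalogs are consulted in the order in which they are listed.
catalog_list = settings["catalogs"].split(";")
for location in catalog_list:
    print(location)
```

Splitting on semicolons rather than colons is what keeps the drive letter in file:/d:/... intact.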

18.

Sun resolver clarifications

Bob Stayton



> In the documentation for the catalog resolver, it does 
> not state if there are default values for verbosity etc. I also was 
> wondering if there was a resolution protocol for using a properties file 
> and a system property? I would think that the system one would override 
> the property file one. Thanks for adding the system properties Norm. It 
> made integration with Ant easier so I don't need to change properties in 
> two places.

> The second question I have is about the actual catalog. I have the 
> following based on previous usage of just pointing to the catalog.
> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

>    <public publicId="-//OASIS//DTD XML DocBook V4.1.2//EN"
>      uri="docbook-dtd/docbook.cat"/>
>    <!-- docbookx.dtd? -->
> </catalog>

Your comment is on the right track. You currently have the public ID for the Docbook DTD resolving to the SGML catalog file. That won't load the DTD. You want to resolve the public ID to the main dtd file:

    <public publicId="-//OASIS//DTD XML DocBook V4.1.2//EN"
      uri="docbook-dtd/docbookx.dtd"/>

You don't need the rest of the docbook.cat because the XML dtd has SYSTEM identifiers that resolve to the relative filenames dbpoolx.mod, etc. Those are resolved relative to docbookx.dtd, so all you have to do is locate docbookx.dtd with your XML catalog.
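
The "resolved relative to docbookx.dtd" step is ordinary relative-reference resolution, which can be sketched in Python with the standard library's urljoin. The absolute DTD location below is hypothetical; only the file names come from the discussion above.

```python
from urllib.parse import urljoin

# Hypothetical location that the catalog resolved docbookx.dtd to.
DTD_LOCATION = "/usr/share/xml/docbook-dtd/docbookx.dtd"


def resolve_relative(system_id, referring_file=DTD_LOCATION):
    """Resolve a relative system identifier against the file that refers to it."""
    return urljoin(referring_file, system_id)


# dbpoolx.mod is referenced from inside docbookx.dtd by a relative SYSTEM id.
module_location = resolve_relative("dbpoolx.mod")
print(module_location)
```

Once the catalog has located the main DTD, every module it pulls in by relative name lands in the same directory, which is why no further catalog entries are needed.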

> The old one pointed at docbook.cat so I'm not sure if I should use 
> following tag instead.
> <nextCatalog
>    id = id
>    catalog = uri-reference
>    xml:base = uri-reference />

You would use nextCatalog to include another XML Catalog, not the SGML catalog.

> Also about the import on the customization xsl which points to the 
> stylesheets for docbook; does somebody have an example. Should I use 
> rewrite or are there more than one way to do it?

There are many ways to do it, depending on what you want to do. Here is one way of doing it.

I keep my catalog in a location parallel to the stylesheets. I want to be able to refer to different stylesheets (chunk, fo, etc.):

/usr/share/xml/catalog/catalog.xml
/usr/share/xml/xsl/docbook-xsl-1.49/html/docbook.xsl
                                   /html/chunk.xsl
                                   /fo/docbook.xsl

and I want portable references in my Makefiles that work regardless of where the files are located or if the stylesheets are updated to a new release. So I use a stable "virtual" path in the Makefile command, which the catalog resolves to the actual location of the installed XSL files.

The catalog looks like this:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd" >

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group id="docbook-xsl" xml:base="../xsl/">

       <rewriteSystem id="docbook-xsl-file"
         systemIdStartString="/tools/xsl/docbook/"
         rewritePrefix="docbook-xsl-1.49/"/>

       <rewriteURI id="docbook-xsl-uri"
         uriStartString="/tools/xsl/docbook/"
         rewritePrefix="docbook-xsl-1.49/"/>

  </group>
</catalog>

I use both rewriteURI and rewriteSystem because some tools use URIs and some use system identifiers.

Then I point the XSL processor to the catalog location. The catalog can resolve pathnames relative to the catalog file location. The command in the Makefile looks like this:

XML_CATALOG_FILES=/usr/share/xml/catalog/catalog.xml \
  xsltproc  /tools/xsl/docbook/html/chunk.xsl  input.xml

The "/tools/xsl/docbook" directory doesn't actually exist. The catalog resolves the stylesheet reference as follows:

1. /tools/xsl/docbook/html/chunk.xsl matches the start string "/tools/xsl/docbook/" in the catalog.
2. That entry rewrites the matched prefix to "docbook-xsl-1.49/", so now I have "docbook-xsl-1.49/html/chunk.xsl".
3. The <group> element adds a base of "../xsl/", so now I have "../xsl/docbook-xsl-1.49/html/chunk.xsl".
4. This path resolves relative to the catalog file location, so now I have "/usr/share/xml/catalog/../xsl/docbook-xsl-1.49/html/chunk.xsl", which is really just "/usr/share/xml/xsl/docbook-xsl-1.49/html/chunk.xsl" once the "catalog/.." part is collapsed.

Voila, I've got a full path to the xsl file.

If I want the fo stylesheet, my stylesheet reference is /tools/xsl/docbook/fo/docbook.xsl, which resolves in a similar manner.

When the 1.50 stylesheets pass muster, I install them in /usr/share/xml/xsl/docbook-xsl-1.50.0 and change the rewritePrefix in the catalog entries to point to that directory (shown here for rewriteSystem; rewriteURI changes the same way):

       <rewriteSystem id="docbook-xsl-file"
         systemIdStartString="/tools/xsl/docbook/"
         rewritePrefix="docbook-xsl-1.50.0/"/>

I don't have to edit dozens of Makefiles.

XML catalogs are way cool for doing this kind of stuff.

BTW, if you are using xsltproc, you'll want to initially set the env variable XML_DEBUG_CATALOG=1 so you can see how each reference is being resolved. Very helpful for debugging unresolved references.

19.

Command line option (variable with import) via catalog

Bob Stayton

I've used another technique with XML catalogs that simulates the use of variables in xsl:import with a wrapper script:

1. Use a script whose first step is to take the desired path value and generate a temporary XML catalog file that maps the hardcoded xsl:import path to your new path value.

2. Have your script execute your XSL process using that XML catalog.

3. Then the script deletes the temporary catalog file.

XML catalogs can map any stylesheet path to another path. If your <xsl:import> looks like this:


  <xsl:import href="/usr/share/docbook/html/docbook.xsl"/>

And you want to instead use:

  /home/foo/docbook-xsl/html/docbook.xsl

Then this catalog entry in the temporary catalog file will map it:

       <rewriteSystem 
         systemIdStartString="/usr/share/docbook/html/"
         rewritePrefix="/home/foo/docbook-xsl/html/"/>

Since the generation of the catalog file can be controlled by any number of variables in the script, you have complete control over the rewritePrefix value.

BTW, if you are already using an XML catalog file for other purposes, then just add a line like this to your temporary catalog:

 <nextCatalog catalog = "/path/to/your/regular/catalog.xml"/>

Of course, this setup requires a processor and configuration that can use XML catalogs. But they are pretty much required when you need portability.
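
Put together, the temporary catalog that the script writes might look like the sketch below. It reuses the paths from the example above; the rewriteURI entry is an added assumption, included because some processors look up xsl:import references as URIs and others as system identifiers.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the hardcoded import path to the path chosen by the script -->
  <rewriteSystem
      systemIdStartString="/usr/share/docbook/html/"
      rewritePrefix="/home/foo/docbook-xsl/html/"/>
  <!-- Same mapping for processors that resolve imports as URIs (assumption) -->
  <rewriteURI
      uriStartString="/usr/share/docbook/html/"
      rewritePrefix="/home/foo/docbook-xsl/html/"/>
  <!-- Fall through to the regular catalog for everything else -->
  <nextCatalog catalog="/path/to/your/regular/catalog.xml"/>
</catalog>
```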

20.

Override

Norm Walsh


| I just figured this out: if I put the "OVERRIDE YES" option at the top of
| the catalog file (as it is in my catalog file for xhtml) everything works
| fine. If anyone could explain _why_ this works, I'd appreciate it.

OVERRIDE YES in OASIS Catalogs tells the processor that PUBLIC identifiers are to be preferred over SYSTEM identifiers. With OVERRIDE NO, a SYSTEM identifier (if specified) will be used in preference to the PUBLIC identifier.

Since XML *requires* a SYSTEM identifier, OVERRIDE YES is necessary in order for the OASIS Catalog to have any effect if you're using XML.
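
In XML catalogs the corresponding switch is the prefer attribute, where prefer="public" plays the role of OVERRIDE YES. A minimal sketch, borrowing the DocBook 4.2 entry from item 22 purely as an illustration:

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
  <!-- With prefer="public" this entry is used even though the
       document also supplies a SYSTEM identifier -->
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
</catalog>
```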

21.

Where to find the style-sheet fpi

Norm Walsh


| Okay.  According to that error message, that the file named
|
| /sw/share/sgml/dsssl/docbook-dsssl-nwalsh/print/docbook.dsl
|
| contains the following line
|
| <!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style
| Sheet//EN" [
|
| and that OpenJade wants to know how to resolve this public identifier
| into a SYSTEM file.  

Right. You need a catalog file for Jade to find its doctypes. But they're part of the Jade distribution, so adding a '-c' catalog to your Jade command line ought to fix it.

| Should I manually add a line to the catalog file that does a mapping
| here?  And if so, what file on my SYSTEM should I map this PUBLIC
| identifier to?

On my system, this catalog exists in the directory where Jade was installed. Make sure you've got a catalog like this on your SGML_CATALOG_FILES path or on a -c argument to Jade.

PUBLIC "-//James Clark//DTD DSSSL Flow Object Tree//EN" "fot.dtd"
PUBLIC "ISO/IEC 10179:1996//DTD DSSSL Architecture//EN" "dsssl.dtd"
PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" "style-sheet.dtd"

22.

Catalog usage

Norm Walsh


| 1. When should I use PUBLIC identifiers
|    respectively SYSTEM identifiers?

In your documents, you should always use both. In catalogs, it may be handy to map both. My catalog contains:


  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>

  <system systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
          uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>

  <?xml version='1.0'?>
  <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
           "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">

  <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <group prefer="public" xml:base="/usr/share/sgml/docbook/xml-dtd/">

    <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
            uri="docbookx.dtd"/>

    <system systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
            uri="docbookx.dtd"/>
  </group>
  </catalog>

|    Does the XML processor look in the catalog for an entry identified by
|    "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN", and then, if
|    that entry doesn't exist, look at "http://www.oasis-open.org
|    /committees/entity/release/1.0/catalog.dtd"?

Since you've said prefer=public, the resolver begins by looking for a matching public identifier. It finds one and therefore resolves the DTD for your document to "/usr/share/sgml/docbook/xml-dtd/docbookx.dtd".

Had you omitted the public identifier from the DocType declaration, it would have found the matching system identifier and done the same thing.

23.

Use of public id's in a catalog?

John Cowan



> According to Section 7.1.2 of the spec if a system identifier can be matched 
> in the catalog then that will always be used; an accompanying public 
> identifier is only used as a fallback if the system id doesn't match 
> (or is unavailable) and the prefer setting is "public".

So far so correct.


> So public ids are 
> only useful if the system identifier isn't available.

Not so. A system identifier matches only if there is a matching system, rewriteSystem, or delegateSystem entry in the catalog. So if none of these match, then as long as prefer="public" is set (which it should always be), the public and delegatePublic rules are applied.

The setting prefer="system" is really only useful for SGML backward compatibility. Historically, the SGML system id was generally a local one, whereas the public id was meant to be universal. XML has made the system id universal instead; nevertheless, XML Catalogs were meant to be semantically identical to TR9401 catalogs.


> Given that you 
> have to provide both in XML, public ids seem pretty redundant.

There are two cases in which a public id can exist without a system id: in a NOTATION declaration, and after delegating to a new catalog via delegatePublic (in which case the original system id is discarded).
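
A minimal sketch of such a delegation, assuming a hypothetical sub-catalog at docbook/catalog.xml that holds the DocBook entries:

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Any public ID starting with this string is looked up in the
       delegated catalog, using the public ID alone -->
  <delegatePublic publicIdStartString="-//OASIS//DTD DocBook"
                  catalog="docbook/catalog.xml"/>
</catalog>
```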


> Toying with the command-line resolver application shows behaviour consistent 
> with this reading.

> Everything I've read up to now set me up to expect the following: that 
> Public Ids would be preferred and that System Ids would be the fallback.

> What am I missing?


24.

next catalog question

Bob Stayton



> Is it possible to have a <nextCatalog> for both DocBook XML 4.1.2, and
> 4.2 in a single XML catalog?  Or do I need separate catalogs for this?

> I'm in the position where two new documents are using 4.2 (in an
> attempt at getting <olink> to work), while all production documents
> are using 4.1.2.

Yes, you can have a <nextCatalog> that points to an XML catalog for 4.1.2 and another <nextCatalog> that points to 4.2. If it is resolving the PUBLIC identifier in the document, then there is no ambiguity.
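
A sketch of such a top-level catalog, assuming each DTD version has its own XML catalog at the hypothetical paths shown:

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Hypothetical locations: one XML catalog per DocBook version -->
  <nextCatalog catalog="docbook-4.1.2/catalog.xml"/>
  <nextCatalog catalog="docbook-4.2/catalog.xml"/>
</catalog>
```

Each sub-catalog maps only its own version's public identifier, so a given document matches in at most one of them.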

25.

Catalog problems with XEP

Alexander Peshkov



> I have some trouble of the sort
> {?Could not retrieve image from 'http://docbook.sourceforge.net/release/images/draft.png': 
>   java.net.NoRouteToHostException: No route to host}
> when trying to render a docbook document translated to fo with XEP. (this is 
> due to the fact that we are now firewalled and use a proxy). But more 
> importantly, I would like to avoid fetching things like that, as they are
> locally installed in my case (eg 
> /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/images/draft.png).
> I thought this would be taken care of by catalog files, but I couldn't find
> any mapping of docbook.sourceforge.net to local files. Is there a reason ?

First of all, you can simply turn off DocBook draft mode using the "draft.mode" parameter, or change the URL of the watermark image using "draft.watermark.image". You can set these parameters in your customization stylesheet or on the command line (please refer to the DocBook stylesheets documentation).
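
As a rough sketch, a customization layer setting these parameters might look like the following. The import path is an assumption (adjust it to wherever your fo stylesheet lives), and the image location is the local path mentioned in the question, written as a file: URL.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Assumed location of the stock DocBook fo stylesheet -->
  <xsl:import href="/tools/xsl/docbook/fo/docbook.xsl"/>

  <!-- Option 1: switch draft mode off, so no watermark is requested -->
  <xsl:param name="draft.mode" select="'no'"/>

  <!-- Option 2: keep draft mode but point at the local copy of the image -->
  <xsl:param name="draft.watermark.image"
             select="'file:///usr/share/sgml/docbook/stylesheet/xsl/nwalsh/images/draft.png'"/>
</xsl:stylesheet>
```

Either parameter alone is enough to stop the remote fetch; both are shown only for illustration.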

The more general approach is to set up XML catalogs properly. Very helpful information about creating XML catalogs can be found here: sun. Note that starting from version 3.5, XEP provides support for XML catalogs (so there is no real need for an extension any more). The use of XML catalogs with XEP is described in section "3.5. Resolution of External Entities and URIs" of the "XEP 3.5 User Guide" (userguide.pdf, included in the XEP distribution).
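
A minimal sketch of a catalog entry for the case in the question, mapping the remote image directory to the local copy. Whether a given XEP setup passes image URIs through the catalog is an assumption to check against the user guide.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Redirect the stylesheet image URIs to the locally installed files -->
  <rewriteURI
      uriStartString="http://docbook.sourceforge.net/release/images/"
      rewritePrefix="file:///usr/share/sgml/docbook/stylesheet/xsl/nwalsh/images/"/>
</catalog>
```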