Oasis catalogs

2009-08-11T12:01:33
Dave Pawson

catalogs

Yesterday I was playing with a large number of files, all of which were headed something like:


<!DOCTYPE xxx SYSTEM "filename.dtd">


With the minor difficulty that none of the DTDs were in the same directory as the XML instance... and there were about 8 directories containing the files of interest... and many of the DTDs referenced standard W3C entity sets. I immediately wondered about using an Oasis catalog resolver to resolve this issue. (Sorry, couldn't resist). First I had to pull all the DTDs and entity sets together. Bash did this for me. They were scattered all over one directory tree.

find /installation/erlang/otp_src_R13B01 -name "*.dtd" -exec cp {} dtds \;
find /installation/erlang/otp_src_R13B01 -name "*.ent" -exec cp {} dtds \;
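
The two find commands can be folded into a single pass. A minimal sketch, assuming the destination directory sits outside the tree being searched and that file names are unique across it (cp silently overwrites same-named files); collect_dtds is a name made up for this example, not part of any tool:

```shell
#!/bin/bash
# collect_dtds SRC DEST: copy every *.dtd and *.ent file found under SRC
# into the single directory DEST (hypothetical helper name).
collect_dtds() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # \( ... -o ... \) groups the two name tests so -exec applies to both
    find "$src" -type f \( -name '*.dtd' -o -name '*.ent' \) -exec cp {} "$dest" \;
}

# Usage, with the paths from this post:
# collect_dtds /installation/erlang/otp_src_R13B01 dtds
```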

So that collected them in the ./dtds directory. Fine. Next, a catalog. I was using XSLT 2.0 with Saxon 9, so my resolver had to be Java. The choice is between the Apache resolver, which is Norm Walsh's initial work, and xmlresolver, which is Norm's more recent venture into catalogs. Find that one at ... java.net - Pick up both jars, you'll need them. This one is just a little different from the Apache one. The properties file is named XMLResolver.properties instead of CatalogManager.properties, though the actual catalog is the same, having the extensions mentioned at Norm's blog entry above. This is the properties file:

$ more XMLResolver.properties                                                  
#Controls the operation of the xml catalog manager
verbosity=6
relative-catalogs=yes
# Always use semicolons in this list
catalogs=erlang.catalog.xml
prefer=public
static-catalog=yes
allow-oasis-xml-catalog-pi=yes

I've got the debug output turned up high (verbosity=6), to show what's happening. For the actual catalog, I'm using the model presented in version 1.1 (thanks George and Jirka) of Oasis catalogs. Don't forget the version... as I did. My catalog, showing a couple of the entries, reads as follows:


<?xml version="1.0" encoding="utf-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <systemSuffix
      systemIdSuffix="erlref.dtd"
      uri="/files/erlang/dtds/erlref.dtd"/>
..

 <systemSuffix
      systemIdSuffix="xhtml-lat1.ent"
      uri="/files/erlang/dtds/xhtml-lat1.ent"/>

</catalog>


which maps a couple of the files from a 'loose' DTD reference to a specific file location. I had about 50 of these. Now to use it. First, if you want to test it out (not very reliable, but all I had available at the time) then you can try xmlcatalog from Daniel Veillard's libxml2, though note it implements catalogs version 1.0! My use of systemSuffix is from 1.1, so it doesn't help much.
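
With about 50 entries, the systemSuffix elements are a candidate for generating rather than typing. A rough sketch, assuming the DTDs and entity sets have already been collected into one directory and that no file name needs XML escaping; make_catalog is a hypothetical helper name, and the URI base is whatever location the files will finally live at:

```shell
#!/bin/bash
# make_catalog DTDDIR URIBASE: print an OASIS catalog with one systemSuffix
# entry per *.dtd / *.ent file in DTDDIR (hypothetical helper name).
# Assumes file names contain no characters that need XML escaping.
make_catalog() {
    local dir="$1" uribase="$2" f name
    echo '<?xml version="1.0" encoding="utf-8"?>'
    echo '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">'
    for f in "$dir"/*.dtd "$dir"/*.ent; do
        [ -e "$f" ] || continue          # pattern matched nothing
        name=${f##*/}                    # strip the directory part
        printf '  <systemSuffix\n      systemIdSuffix="%s"\n      uri="%s/%s"/>\n' \
            "$name" "$uribase" "$name"
    done
    echo '</catalog>'
}

# Usage, with the paths from this post:
# make_catalog dtds /files/erlang/dtds > erlang.catalog.xml
```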

Just to show it working though... If I add an entry into the catalog


<system
  systemId = "chapter"
  uri = "/files/erlang/dtds/chapter.dtd"
  />

I can now use the xmlcatalog application to see how it resolves


$ xmlcatalog -v erlang.catalog.xml chapter.dtd
Resolve sysID chapter.dtd
0 Parsing catalog erlang.catalog.xml
erlang.catalog.xml added to file hash
No entry for SYSTEM chapter.dtd
Resolve URI chapter.dtd
No entry for URI chapter.dtd
Catalogs cleanup
Free catalog entry chapter
Free catalog entry erlang.catalog.xml
Free catalog entry


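Which shows xmlcatalog trying to resolve the entity chapter.dtd. Note that it reports no entry in this run: the system entry above has systemId "chapter", which is not the same string as chapter.dtd.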
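
The difference between the two entry types is easy to miss. In the OASIS spec a system entry matches only when its systemId equals the system identifier being resolved, while a systemSuffix entry matches when the identifier ends with systemIdSuffix. A rough sketch of just that comparison in shell; the function names are invented for illustration, and a real resolver also normalises the identifiers and prefers the longest matching suffix:

```shell
#!/bin/bash
# matches_system ID ENTRY: succeed only on an exact match, as <system> does.
matches_system() {
    [ "$1" = "$2" ]
}

# matches_suffix ID SUFFIX: succeed when ID ends with SUFFIX,
# as <systemSuffix> does.
matches_suffix() {
    case "$1" in
        *"$2") return 0 ;;
        *)     return 1 ;;
    esac
}

# matches_system chapter.dtd chapter             fails (strings differ)
# matches_suffix ../dtd/chapter.dtd chapter.dtd  succeeds
```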

So, on to using XML catalogs with the resolver from ndw (Norm Walsh)! I left out the parser when I first used it, which blew it up quite badly. Eventually, I finished up with a script, ncats.sh, which looked like:

java -cp .:saxon9.jar:xercesImpl.jar:xmlresolver.jar:xmlresolver-sunjaxp.jar \
     -Djavax.xml.parsers.SAXParserFactory=org.xmlresolver.sunjaxp.jaxp.SAXParserFactoryImpl \
     -Djavax.xml.stream.XMLInputFactory=org.xmlresolver.sunxml.stream.XMLInputFactoryImpl \
     -Djavax.xml.parsers.DocumentBuilderFactory=org.xmlresolver.sunjaxp.jaxp.DocumentBuilderFactoryImpl \
     net.sf.saxon.Transform -o "$3" -x org.xmlresolver.tools.ResolvingXMLReader \
                            -y org.xmlresolver.tools.ResolvingXMLReader \
                            -r org.xmlresolver.Resolver \
     -w1 "$1" "$2" "saxon.extensions=1" $4 $5 $6

Note you'll have to change the location of the jar files to match your installation! xercesImpl.jar is from Apache, see apache project. xmlresolver.jar and xmlresolver-sunjaxp.jar are from java.net. That's it. Use? Remember I'm doing an XSLT 2 transform, so it is:

$ ncats.sh xmlfile xslfile outputfile

Which transforms the xml file into the output file using the xslfile! The unsubtle difference being that for any system or web references, the resolver is used.

The next issue was processing a whole list of files together. My exemplar missed it, so I clearly didn't understand the spec. Looking at Mike Kay's documentation I see I can use a file listing, in XML. This tells you about it. There is an alternative, but I wanted to filter only XML files, so I used bash first, collecting the file listing into XML.

#!/bin/bash
#
# Collect a list of all the modules in a source package
# 2009-08-10T14:10:32Z. Dave Pawson

base=/installation/erlang/otp_src_R13B01/lib/
opfile=collection.catalog.xml

echo "listing dirs in $base"

echo '<collection stable="true">' > "$opfile"
# One <doc> entry per XML file in each library's doc/src directory.
# Globbing replaces parsing the output of ls, which depends on its column layout.
for dir in "$base"*/
   do
     for f in "$dir"doc/src/*.xml
        do
            # Skip the unexpanded pattern when a directory has no XML files
            [ -e "$f" ] || continue
            echo "<doc href=\"$f\"/>" >> "$opfile"
        done
   done
echo "</collection>" >> "$opfile"


This collects all the XML files. The root element (in my case) is sufficient to filter out only those XML files I want. This resulted in (truncated):


<collection stable="true">
<doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/appmon_chapter.xml"/>
<doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/appmon.xml"/>
<doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/book.xml"/>
<doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/fascicules.xml"/>
<doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/notes.xml"/>
...
</collection>
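
The listing script writes each path straight into an href attribute, which is fine as long as no path contains &, < or a double quote. A minimal sketch of an escaping step for the day one does, assuming sed is available; xml_escape is a name made up for this example:

```shell
#!/bin/bash
# xml_escape STRING: print STRING with the characters that are special
# inside a double-quoted XML attribute replaced by entity references.
xml_escape() {
    # & must be handled first, otherwise the other entities get re-escaped
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/"/\&quot;/g'
}

# Inside the loop of the listing script this would become:
# echo "<doc href=\"$(xml_escape "$f")\"/>" >>$opfile
```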

This collection file is used by XSLT 2, as follows:


  <xsl:template match="/">
    <xsl:text>Generated: Dave Pawson </xsl:text><xsl:value-of select="current-dateTime()"/> &nl;
    <xsl:for-each select="collection($catalog)/*">
      <xsl:apply-templates select="/erlref"/>

    </xsl:for-each> 
  </xsl:template>

Which runs through each of the files in the file list, and uses an imported XSLT file which processes a single file!

Easy... once you've all the pieces together.

Keywords: catalogs
