2009-08-11T12:01:33
Dave Pawson.
catalogs
Yesterday I was playing with a large number of files, all of which were headed something like:
<!DOCTYPE xxx SYSTEM "filename.dtd">
With the minor difficulty that none of the DTDs were in the same directory as the XML instance... and there were about 8 directories containing the files of interest... and many of the DTDs referenced standard W3C entity sets. I immediately wondered about using an OASIS catalog resolver to resolve this issue. (Sorry, couldn't resist). First I had to pull all the DTDs and entity sets together. Bash did this for me. They were scattered all over one directory tree:
find /installation/erlang/otp_src_R13B01 -name "*.dtd" -exec cp {} dtds \;
find /installation/erlang/otp_src_R13B01 -name "*.ent" -exec cp {} dtds \;
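The two find commands can also be folded into one pass. What follows is only a sketch of that idea: collect_dtds is a made-up helper name, and it assumes the destination directory sits outside the tree being searched.

```shell
#!/bin/bash
# Sketch: gather DTDs and entity sets from a source tree into one directory.
# collect_dtds is a hypothetical helper name, not part of any tool.
collect_dtds() {
    local src=$1 dest=$2
    mkdir -p "$dest"
    # Match both extensions in a single walk of the tree
    find "$src" -type f \( -name '*.dtd' -o -name '*.ent' \) -exec cp {} "$dest" \;
}

# Example call, using the paths from this post:
# collect_dtds /installation/erlang/otp_src_R13B01 dtds
```

Note that files with the same name in different directories overwrite each other in the destination, exactly as with the original commands.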
So that collected them in the ./dtds directory. Fine. Next, a catalog. I was using XSLT 2.0 with Saxon 9, so my resolver had to be Java. The choice is between Apache, which is Norm Walsh's initial work, and xmlresolver, which is Norm's more recent venture into catalogs. Find that one at ... java.net - pick up both jars, you'll need them. This one is just a little different from the Apache one. The properties file is named XMLResolver.properties instead of CatalogManager.properties, though the actual catalog is the same, having the extensions mentioned at Norm's blog entry above. This is the properties file:
$ more XMLResolver.properties
#Controls the operation of the xml catalog manager
verbosity=6
relative-catalogs=yes
# Always use semicolons in this list
catalogs=erlang.catalog.xml
prefer=public
static-catalog=yes
allow-oasis-xml-catalog-pi=yes
I've got the debug turned up high (verbosity), to show what's happening. For the actual catalog, I'm using the model presented in version 1.1 (thanks George and Jirka) of OASIS catalogs. Don't forget the version... as I did. My catalog, for one of these entries, reads as follows:
<?xml version="1.0" encoding="utf-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
>
<systemSuffix
systemIdSuffix="erlref.dtd"
uri="/files/erlang/dtds/erlref.dtd"/>
..
<systemSuffix
systemIdSuffix="xhtml-lat1.ent"
uri="/files/erlang/dtds/xhtml-lat1.ent"/>
</catalog>
which maps a couple of the files from a 'loose' DTD reference to a specific file location. I had about 50 of these. Now to use it. First, if you want to test it out (not very reliable, but all I had available at the time) then you can try xmlcatalog from Daniel, though note it implements catalogs version 1.0! My use of systemSuffix is from 1.1, so it doesn't help much.
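With about 50 entries of the same shape, the catalog could be generated from the collected directory rather than typed by hand. A rough sketch, assuming every .dtd and .ent file gets one systemSuffix entry; make_catalog and its two arguments are names invented for illustration.

```shell
#!/bin/bash
# Sketch: emit one systemSuffix entry per DTD or entity file in a directory.
# make_catalog, dir and uri_base are hypothetical names for illustration.
make_catalog() {
    local dir=$1 uri_base=$2 f name
    echo '<?xml version="1.0" encoding="utf-8"?>'
    echo '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">'
    for f in "$dir"/*.dtd "$dir"/*.ent
    do
        [ -e "$f" ] || continue   # skip a pattern that matched nothing
        name=${f##*/}             # strip the directory part
        echo "  <systemSuffix systemIdSuffix=\"$name\" uri=\"$uri_base/$name\"/>"
    done
    echo '</catalog>'
}

# Example call, using the paths from this post:
# make_catalog dtds /files/erlang/dtds > erlang.catalog.xml
```

File names containing characters that are special in XML (such as &) would need escaping, which this sketch does not attempt.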
Just to show it working though... If I add an entry into the catalog
<system systemId = "chapter" uri = "/files/erlang/dtds/chapter.dtd" />
I can now use the xmlcatalog application to see how it resolves
$ xmlcatalog -v erlang.catalog.xml chapter.dtd
Resolve sysID chapter.dtd
0 Parsing catalog erlang.catalog.xml
erlang.catalog.xml added to file hash
No entry for SYSTEM chapter.dtd
Resolve URI chapter.dtd
No entry for URI chapter.dtd
Catalogs cleanup
Free catalog entry chapter
Free catalog entry erlang.catalog.xml
Free catalog entry
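Which shows the attempted resolution of the entity chapter.dtd. Note that xmlcatalog reports "No entry for SYSTEM chapter.dtd": a system entry matches the whole system identifier, and the entry above has systemId "chapter" rather than "chapter.dtd".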
So, using XML catalogs, from ndw! I left out the parser when I first used it, which blew it up quite badly. Eventually, I finished up with a script, ncats.sh, which looked like:
java -cp .:saxon9.jar:xercesImpl.jar:xmlresolver.jar:xmlresolver-sunjaxp.jar \
  -Djavax.xml.parsers.SAXParserFactory=org.xmlresolver.sunjaxp.jaxp.SAXParserFactoryImpl \
  -Djavax.xml.stream.XMLInputFactory=org.xmlresolver.sunxml.stream.XMLInputFactoryImpl \
  -Djavax.xml.parsers.DocumentBuilderFactory=org.xmlresolver.sunjaxp.jaxp.DocumentBuilderFactoryImpl \
  net.sf.saxon.Transform -o $3 -x org.xmlresolver.tools.ResolvingXMLReader \
  -y org.xmlresolver.tools.ResolvingXMLReader \
  -r org.xmlresolver.Resolver \
  -w1 $1 $2 "saxon.extensions=1" $4 $5 $6
Note you'll have to change the location of the jar files to match your location! xercesImpl.jar is from Apache, see the Apache project. xmlresolver.jar and xmlresolver-sunjaxp.jar are from java.net. That's it. Use? Remember I'm doing an XSLT 2 transform, so it is:
$ ./ncats.sh xmlfile xslfile outputfile
Which transforms the xml file into the output file using the xslfile! The unsubtle difference being that for any system or web references, the resolver is used.
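Since the jar locations differ from machine to machine, and a missing parser jar is exactly what bit me, the classpath could be built and checked before java is started. This is only a sketch: build_classpath is a made-up helper, and it assumes all four jars live in a single directory.

```shell
#!/bin/bash
# Sketch: build the -cp value from one directory holding the four jars,
# failing early if any jar is missing.
# build_classpath and jar_dir are hypothetical names for illustration.
build_classpath() {
    local jar_dir=$1 path_list=. jar
    for jar in saxon9.jar xercesImpl.jar xmlresolver.jar xmlresolver-sunjaxp.jar
    do
        if [ ! -f "$jar_dir/$jar" ]; then
            echo "missing jar: $jar_dir/$jar" >&2
            return 1
        fi
        path_list="$path_list:$jar_dir/$jar"
    done
    echo "$path_list"
}

# Example use inside ncats.sh (the directory is a placeholder):
# java -cp "$(build_classpath /path/to/jars)" ... net.sf.saxon.Transform ...
```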
The next issue was processing a whole list of files together. My exemplar missed it, so I clearly didn't understand the spec. Looking at Mike Kay's documentation I see I can use a file listing, in XML. This tells you about it. There is an alternative, but I wanted to filter only XML files, so I used bash first, collecting the file listing into XML.
#!/bin/bash
#
# Collect a list of all the modules in a source package
# 2009-08-10T14:10:32Z. Dave Pawson
base=/installation/erlang/otp_src_R13B01/lib/
opfile=collection.catalog.xml
echo "listing dirs in $base"
echo "<collection stable=\"true\">" >"$opfile"
# Glob the directories directly rather than parsing ls output,
# whose column layout varies between systems
for dir in "$base"*/
do
  for f in "$dir"doc/src/*.xml
  do
    # Skip the unexpanded pattern when a directory has no XML files
    [ -e "$f" ] || continue
    echo "<doc href=\"$f\"/>" >>"$opfile"
  done
done
echo "</collection>" >>"$opfile"
This collects all the xml files. The root element (in my case) is sufficient to filter out only those xml files I want. This resulted in (truncated)
<collection stable="true">
  <doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/appmon_chapter.xml"/>
  <doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/appmon.xml"/>
  <doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/book.xml"/>
  <doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/fascicules.xml"/>
  <doc href="/installation/erlang/otp_src_R13B01/lib/appmon/doc/src/notes.xml"/>
  ...
</collection>
Which is used by XSLT 2, as follows
<xsl:template match="/">
<xsl:text>Generated: Dave Pawson </xsl:text><xsl:value-of select="current-dateTime()"/> &nl;
<xsl:for-each select="collection($catalog)/*">
<xsl:apply-templates select="/erlref"/>
</xsl:for-each>
</xsl:template>
Which runs through each of the files in the file list, and uses an imported xslt file which processes a single file!
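I let the stylesheet do the filtering on the root element, but the listing could instead be narrowed in bash before Saxon ever sees it. A sketch of that alternative, not what I actually ran: list_erlref_docs is an invented helper, and the test is a plain text match for an erlref start tag, not a real XML parse.

```shell
#!/bin/bash
# Sketch: print <doc> entries only for files containing an <erlref start tag.
# list_erlref_docs is a hypothetical helper; the grep is a textual heuristic
# and would also match the tag inside a comment or CDATA section.
list_erlref_docs() {
    local f
    for f in "$@"
    do
        # The tag name must be followed by whitespace, '>', '/' or end of line
        if grep -qE '<erlref([[:space:]>/]|$)' "$f"; then
            echo "<doc href=\"$f\"/>"
        fi
    done
}

# Example call, using the appmon directory from the listing:
# list_erlref_docs /installation/erlang/otp_src_R13B01/lib/appmon/doc/src/*.xml
```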
Easy... once you've all the pieces together
Keywords: catalogs