Docbook tools

1. GUI stylesheet configuration
2. Docbook to PDF alternatives
3. Windows and xsltproc
4. Using PassiveTeX with docbook
5. Editors for docbook (or XML)
6. Openjade DSSSL tools
7. troff to Docbook
8. nroff to docbook
9. Docbook toolchain setup
10. Installing docbook
11. Docbook installation (Windows)
12. Linux SGML to pdf toolchain, comments
13. Docbook on Mac OSX
14. MathML in XML source
15. Olink cross referencing
16. IDE mode for XSLT on emacs?
17. Docbook to plain text
18. Docbook to plain text
19. HTML to plain text
20. Viewing the DTD in HTML
21. Training package in docbook
22. Formatting DocBook bibliographies
23. Changebars
24. XML Editor for docbook
25. Which FO engine
26. Docbook to Latex using perl
27. Tex setup to run PassiveTex
28. MathML in docbook
29. Glossary tool
30. Validating with nsgmls
31. Enable extensions for Saxon in XSL?
32. The docbook2man utility does not handle XML properly.
33. PassiveTex Installation
34. Lyx 1.2 for docbook
35. Looking for a docbook Word template
36. XHTML to Docbook
37. Word to docbook
38. Word to xsl-fo
39. SGML toolset
40. Quark Avenue
41. FrameMaker 7
42. Docbook and Math
43. Javadoc to docbook
44. XML to text
45. xinclude in docbook
46. OpenOffice to docbook project
47. Docbook Slides, in SVG
48. X to docbook, a tool selection
49. Documenting a DTD
50. manpage, troff etc to docbook tool
51. Windows docbook IDE
52. RefDB and JReference comparison
53. Docbook ide for Windows
54. RefDB and JReference comparison
55. Docbook and texinfo
56. Difference markup for Docbook
57. RTF to XML Commercial Offerings.
58. Docbook from HTML?
59. RTF (Microsoft Word) and other to XML Conversion
60. OpenOffice WYSIWYG DocBook
61. Docbook to pdf alternatives
62. Docbook XML to Latex
63. XSL Processor
64. Word to docbook.
65. Slides, animated
66. Docbook to texinfo
67. DSSSL based styling
68. Unicode character normalization tools
69. Alternatives tools for print output
70. Apple keynote presentations from docbook
71. docbook editing on emacs
72. Docbook and Word, again
73. Permuted Index creation
74. DTD flatten
75. Dopus
76. Multi-language documentation solution. docbook.sml
77. Literate programming and docbook
78. Building collaborative docbook documents
79. How to use Ant with Docbook

1.

GUI stylesheet configuration

Steve Whitlatch

Several months back, I had some discussion on this list about creating a Java GUI that would allow users a mostly point-and-click interface to creating customization layers for use with the DocBook XSL stylesheets. Well, I wrote the application and here it is.

Announcing the first public release of DocBook XSL Configurator! Actually two releases, one for version 1.65.1 of the DocBook XSL stylesheets and one for version 1.66.1.

They are available for download from Sourceforge.

DocBook XSL Configurator versions 0.5.2_1651 and version 0.5.3_1661 are alpha, since they have not been widely tested, but I know of no bugs and consider the software fully operational.

You can get a good idea of what DocBook XSL Configurator is and what it does from the following:

DocBook XSL Configurator

DocBook XSL Configurator is a Java application used to create DocBook XSL FO customization layers. The application presents users with a tabbed pane containing several tables. Each row in each table contains several cells, one of which is editable and contains the text of the default setting for a specific DocBook XSL FO parameter. Users create projects containing paths to DocBook XML, common-customization XSL, an external XSLT processor, etc. Users then click through the tables, select DocBook XSL FO parameters they want to include in a customization layer, edit those parameters, include the customization layer in a project, write out the customization layer as an XSL file, and apply the XSL to the project's XML using the project's specified XSLT processor.

DocBook XSL Configurator version 0.5.3_1661 is an alpha release. It supports version 1.66.1 of the DocBook XSL FO parameter set. It does not yet support the DocBook XSL HTML parameter set.

Default FO parameter settings, help text, and guidelines for attribute sets ("property sets") are taken from the DocBook XSL package's FO documentation. Attribute set defaults are just guidelines.

DocBook XSL Configurator also includes a "From the Wild" table that provides users with nifty little snippets of XSL intended to help with formatting not implemented in the DocBook XSL FO parameter set. Currently, the number of these snippets is very small; however, the "From the Wild" snippet collection has the potential to grow very large and be very helpful.

Target Audience

If you are a beginner with DocBook XSL, DocBook XSL Configurator can help you a great deal by bringing all the DocBook XSL FO parameters together, with help, in a GUI. You don't have to switch windows seeking help, and you don't have to manually type out the file containing the XSL FO customization layer.

If you are an expert with DocBook XSL, this application may still be of use to you. You may benefit from the speed with which you can create and edit customization layers; you may find that DocBook XSL Configurator projects help you organize documentation sets; or, you may find the application useful for saving customization layers and associating them with specific DocBook XML instances.

Requirements and Use

DocBook XSL Configurator should work with any Java runtime environment compatible with Sun's Java Virtual Machine version 1.4.2 or later. However, each version of DocBook XSL Configurator needs a specific version of the DocBook XSL stylesheets. For example, DocBook XSL Configurator version 0.5.3_1661 needs DocBook XSL stylesheet version 1.66.1. Running a version of DocBook XSL Configurator with a version of the DocBook XSL stylesheets for which it was not intended could produce errors.

Running DocBook XSL Configurator requires no adjustments to your CLASSPATH, and the DocBookXSLConfigurator.jar file can be placed anywhere in the file system.

To use DocBook XSL Configurator, you first build a project. The project contains information DocBook XSL Configurator uses to help create a PDF or PS file from a valid DocBook XML file. The process would go something like this:

1) Select New Project from the File menu. A New Project dialog appears.

2) Navigate through the dialog, providing the following: the name of an XSLT processor, the entire option string to be passed to the XSLT processor, the location of the DocBook XSL stylesheet to use, the location of a common-customization XSL file, an FO processor command string, a PDF viewer command string, and a PS viewer command string.

DocBook XSL Configurator uses the information provided to run the programs in your tool chain as external subprocesses.

3) Click through the tables, selecting check boxes for the parameters you want included in your XSL customization layer, and edit the parameter settings as necessary.

4) Save the project.

5) Select parameters and edit them as you like. Then, select Write XSL from the Execute menu. DocBook XSL Configurator presents you with a dialog. Choose a name and a location for the file to save. This is your XSL customization layer. Make certain that the filename and path match with those used when you created the project. Keep the same name and path of this file when you update it. Whenever you wish to change this customization layer, adjust your selections and edited parameters in the GUI, and then overwrite this file by selecting Write XSL from the Execute menu.

6) Select Process XML from the Execute menu. DocBook XSL Configurator runs the XSLT processor specified in your project using the options supplied. You should probably make certain that your XML is actually a valid DocBook XML instance first. While the XSLT processor is running, you can continue with your work. When it's finished, DocBook XSL Configurator presents a dialog box containing any messages produced by the XSLT processor. The dialog box is presented regardless of errors detected.

7) Select Process FO from the Execute menu. DocBook XSL Configurator executes the entire FO processor command string from your project settings. While the FO processor is running, you can continue with your work. It typically takes several minutes to complete. When it's finished, DocBook XSL Configurator presents a dialog box containing any messages produced by the FO processor. The dialog box is presented regardless of errors detected.

8) Select 'Display PDF' or 'Display PS' from the View menu. As with 'Process XML' and 'Process FO', DocBook XSL Configurator runs the project's command string as an external subprocess. When running the PDF or PS viewer command strings, DocBook XSL Configurator presents a dialog box containing messages returned from the subprocess only if it detects something went wrong.
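
For readers who have not seen one, the customization layer written out in step 5 is an ordinary XSL file that imports the stock FO stylesheet and overrides selected parameters. The shell function below is a minimal sketch of writing such a file by hand; it is not output from DocBook XSL Configurator. The function name, the stylesheet path passed in, and the two parameters chosen (paper.type and body.font.master, both from the DocBook XSL FO parameter set) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write a minimal DocBook XSL FO customization layer by hand.
# write_custom_layer is a hypothetical helper, not part of any tool.
#   $1 = output file
#   $2 = path or URL of the stock fo/docbook.xsl
write_custom_layer() {
    out=$1
    stock=$2
    cat > "$out" <<EOF
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Pull in the stock FO stylesheet, then override parameters. -->
  <xsl:import href="$stock"/>
  <xsl:param name="paper.type">A4</xsl:param>
  <xsl:param name="body.font.master">11</xsl:param>
</xsl:stylesheet>
EOF
}
```

Calling write_custom_layer mylayer.xsl /path/to/docbook-xsl-1.66.1/fo/docbook.xsl and then handing mylayer.xsl to the XSLT processor in place of the stock stylesheet is roughly what steps 5 and 6 automate.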

Alone, DocBook XSL Configurator will work only partially. To make full use of it, you need the following:

- a valid DocBook XML instance. If you are new to DocBook XML, you should begin with Norman Walsh and Leonard Muellner's book, DocBook: The Definitive Guide, available online at: oasis open

- some understanding of how the DocBook XSL package works with DocBook XML. Bob Stayton's book, DocBook XSL: The Complete Guide, available online at http://www.sagehill.net/docbookxsl/index.html, is probably the best resource available for learning DocBook XSL.

The project is GPLed and open to additional developers. I'm currently the only developer. Send any bug reports to me, via the project.

2.

Docbook to PDF alternatives

Steinar Bang


> If FOP is still too immature, can anyone recommend another
> Docbook->something->PDF generator that is redistributable?  (We're
> hoping to avoid TeX and C++.)

Off the top of my head, I know of the following ways to transform DocBook XML into PDF, with open source/free/semi-free software:

Transform into XSL:FO, and then use one of the following to create PDF from the XSL:FO

FOP apache.org
PassiveTeX (TeX based) tei-c.org
xmlroff (written in C. Based on Gnome libraries) sourceforge

Use the DocBook SGML DSSSL style sheets, and jade/openjade, followed by TeX

Transform into RTF, by one of the following, and then use AbiWord or OpenOffice to create the PDF:

JFOR (Java) http://www.jfor.org/

XmlMind FO Converter (Java, free for personal use) http://www.xmlmind.com/foconverter/

DocBook SGML DSSSL style sheets and jade/openjade

Use DocBookInConTeXt (TeX. Takes DocBook XML source as input) http://www.hobby.nl/~scaprea/context/

Use the DB2LaTeX XSL stylesheets, and process with LaTeX http://db2latex.sourceforge.net/

Use doctranformer (written in Java) to transform DocBook XML into LaTeX or lout for processing into PDF sourceforge

Did I miss any?
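
As an illustration of the first route (XSL-FO, then FOP), here is a minimal sketch in shell. It assumes xsltproc and a FOP launcher named fop are on the PATH and that you pass in the location of the stock fo/docbook.xsl; the function name docbook_fo_pdf and the DRY_RUN switch are made up for this example.

```shell
#!/bin/sh
# Sketch of the XSL-FO route: DocBook XML -> FO (xsltproc) -> PDF (FOP).
# docbook_fo_pdf and DRY_RUN are hypothetical names for this example.
#   $1 = DocBook XML file
#   $2 = path to the stock fo/docbook.xsl
docbook_fo_pdf() {
    xml=$1
    fo_xsl=$2
    base=${xml%.xml}
    # With DRY_RUN set, the commands are printed instead of executed.
    run=${DRY_RUN:+echo}
    $run xsltproc -o "$base.fo" "$fo_xsl" "$xml" &&
    $run fop -fo "$base.fo" -pdf "$base.pdf"
}
```

Setting DRY_RUN=1 before calling docbook_fo_pdf guide.xml /path/to/fo/docbook.xsl shows the two commands without running them, which is handy for checking paths first. Swapping the second command for another FO processor from the list above changes only that one line.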

3.

Windows and xsltproc

Igor Zlatkovic

Sorry if these instructions are a bit verbose; they are straight out of my tutorial (which is not available on the Internet yet).

(DaveP. They are just right!, just that I didn't have the courage to tell Daniel :-)

btw, the manual for using xsltproc can be found at xmlsoft.org

And the Windows information is at Igor's homepage

For windows:

The files you will need are available from fh-frankfurt.de

You will need to download:

The latest libxml download, at the time of writing: fh-frankfurt.de

The latest libxslt download, at the time of writing: fh-frankfurt.de

The latest iconv download, at the time of writing: fh-frankfurt.de

Extract the following files from the libxml archive and place in

C:\docbook\xsltproc\ (or whatever)


libxml2.dll
xmllint.exe

Extract the following files from the libxslt archive and place in


C:\docbook\xsltproc\ (or whatever)

libexslt.dll
libxslt.dll
xsltproc.exe

Extract the following files from the iconv archive and place in

C:\docbook\xsltproc\ (or whatever)

iconv.dll

Finally, append C:\docbook\xsltproc\ (or whatever) to the PATH environment variable.

For solaris you can get the files here: http://garypennington.net/libxml2/

For other Unices: ftp://xmlsoft.org/

The official download page: garypennington.net

Ok, I think that's everything.

It was for me. Many thanks. xsltproc is getting a very good reputation! If you aren't a java fan it's the processor to have.

4.

Using PassiveTeX with docbook

Bob Stayton

Most problems with print output are in the FO processor, not the DocBook XSL stylesheets. PassiveTeX is another FO processor that formats verbatim elements properly, and it runs on Windows as well as Linux. I've written a draft of a short PassiveTeX Howto article that I've included below. If you would be willing to try it out and give me feedback, I'd appreciate it.

Getting started with PassiveTeX

This article describes how to install and run PassiveTeX.

Installing PassiveTeX

PassiveTeX is an extension of the TeX typesetting system that permits conversion of XSLFO files to PDF. More accurately, it is an extension of other extensions such as pdftex, pdflatex, and xmltex. For this reason, trying to add the necessary packages to an existing TeX installation can sometimes lead to mismatched components that don't work together. Because of the large number of files involved in a complete TeX installation, it can be difficult to sort out the problems.

Currently the easiest way to get PassiveTeX working is to install a new TeX system that supports it. Sebastian Rahtz, the author of PassiveTeX, has assembled such a system and made it available as part of the TeX Live CD distribution. Version 7 of TeX Live CD defines several "schemes", which are selections of TeX packages designed for particular purposes. One of those schemes is called "XML Typesetting". That scheme includes the PassiveTeX pieces needed to process XSLFO files generated by the DocBook XSL stylesheets.

The TeX Live CD has support for Windows, Linux, UNIX, and MacOSX systems. It also permits you to run your TeX processing almost entirely from the CD, installing just a minimum number of files. Since the size of a TeX installation for DocBook is over 200 MB, that may be a necessary option. But it does make the processing run more slowly because it has to read many files from the CD each time.

To install PassiveTeX from the TeX Live 7 CD:

1. Obtain a TeX Live 7 CD. Instructions for obtaining a TeX Live 7 CD are available at: tug.org. You can obtain the CD by joining the TeX Users Group (TUG), or you can download the CD image and burn your own CD. It is a huge download and requires a fast Internet connection.

2. Start the installation program. On Windows, the CD may automatically start when you put it in the drive. If not, run \bin\win32\TeXLive.exe to start it manually. Then select Install->TeXLive. That will start the setup wizard. On Linux or UNIX, run sh install-cd.sh in the top directory of the CD. The file itself is not executable, which is why you need to precede it with sh. That command presents a text interface for selecting installation options.

3. Select the XML Typesetting scheme. As you step through the installation interface, you can select where TeX installs, what languages you want to support, and other options. The important step for DocBook is to select the "XML Typesetting" scheme. If you aren't offered a scheme selection as part of the options, then your TeX Live CD is older than version 7.

4. Run the installation. After you have selected all the options and the scheme, run the Install and sit back until it completes.

5. Adjust some TeX settings. You need to increase the values of certain TeX parameters to process DocBook files. The settings are located in the following file:

Windows: \Program Files\TeXLive\texmf-var\web2c\texmf.cnf

Linux or UNIX: /usr/TeX/texmf-var/web2c/texmf.cnf

Make a backup copy of the original file, and then use a text editor to locate and change the following values to be at least these sizes:


    main_memory =    2500000 (that's 2,500,000)
    hash_extra =     50000
    pool_size =      500000
    save_size =      50000

    Confirm that the following values are set:
    param_size =     1500
    stack_size =     1500
    string_vacancies = 45000
    pool_free =      47500
    nest_size =      500
    max_strings =    65000
    buf_size =       200000

6. Set environment variables. A TeX system needs to know where its commands are located (PATH) and where the TeX configuration files are installed (TEXMFCNF), so you need to set these two environment variables. On Windows they should be set automatically, but you should check them anyway.


    Windows:
    PATH=C:\Program Files\TeXLive\bin\win32;etc.
    TEXMFCNF=C:\Program Files\TeXLive\texmf-var\web2c

    Linux and UNIX:
    PATH=/usr/TeX/bin:$PATH    # check this path
    TEXMFCNF=/usr/TeX/texmf-var/web2c
    export PATH TEXMFCNF

You can test to see if your installation is working by typing at a command prompt:

pdftex -ini "&pdflatex" pdfxmltex.ini

That will generate a lot of status messages, but should result in a file named pdfxmltex.fmt.
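
If you want to double-check the memory settings from step 5 without opening an editor, something like the following shell function may help. It is only a sketch: check_texmf_setting is a made-up name, it reads just the first uncommented "name = number" line for a setting, and it does not try to reproduce the full lookup rules TeX itself applies to texmf.cnf.

```shell
#!/bin/sh
# Sketch: report whether a texmf.cnf setting meets a minimum value.
# check_texmf_setting is a hypothetical helper for this example.
#   $1 = texmf.cnf path, $2 = setting name, $3 = minimum value
check_texmf_setting() {
    cnf=$1
    key=$2
    min=$3
    # Lines starting with % are comments and are skipped because the
    # pattern is anchored to the setting name at the start of the line.
    val=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$cnf" | head -n 1)
    if [ -z "$val" ]; then
        echo "$key: not set"
        return 1
    elif [ "$val" -lt "$min" ]; then
        echo "$key: $val is below $min"
        return 1
    fi
    echo "$key: $val ok"
}
```

For example, check_texmf_setting /usr/TeX/texmf-var/web2c/texmf.cnf main_memory 2500000 reports whether main_memory is at least the size recommended above; repeat for hash_extra, pool_size and save_size.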

Using PassiveTeX

PassiveTeX will convert a .fo file created using one of the XSLT processors into a .pdf file. Once the system is set up and working, you should be able to execute the following command on your .fo files:

pdftex --interaction nonstopmode "&pdfxmltex" myfile.fo
pdftex --interaction nonstopmode "&pdfxmltex" myfile.fo

You want to run the same command twice in order to resolve page numbers in the table of contents. You'll notice that it generates a lot of status messages. If you don't end up with a myfile.pdf file, then something went wrong. All the noise is recorded in a log file named myfile.log, which you can scan for clues as to what went wrong.

Dissecting this command:

* The pdftex command runs the version of the TeX formatting engine that generates PDF output instead of the original TeX DVI output.

* The --interaction nonstopmode option forces it to keep going even if it comes across something questionable. Generally the questions it asks can only be answered by a TeX expert.

* The "&pdfxmltex" argument tells the formatter to use the TeX format file named pdfxmltex.fmt that was installed with the XML Typesetting scheme. This is the PassiveTeX extension that parses the XSLFO file and applies TeX formatting.
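
The two-pass run described above can be wrapped in a small shell function. This is a sketch under a few assumptions: fo_to_pdf and the PDFTEX override variable are names invented here, the FO file is in the current directory, and the only success check is whether the PDF file exists afterwards.

```shell
#!/bin/sh
# Sketch: run PassiveTeX twice on an FO file and confirm a PDF appeared.
# fo_to_pdf and the PDFTEX variable are hypothetical names.
#   $1 = FO file in the current directory, for example myfile.fo
fo_to_pdf() {
    fo=$1
    base=$(basename "$fo" .fo)
    pdftex_cmd=${PDFTEX:-pdftex}
    # Two passes so the second can pick up page numbers for the TOC.
    for pass in 1 2; do
        "$pdftex_cmd" --interaction nonstopmode "&pdfxmltex" "$fo" > /dev/null
    done
    if [ -f "$base.pdf" ]; then
        echo "wrote $base.pdf"
    else
        echo "no $base.pdf; see $base.log" >&2
        return 1
    fi
}
```

The status messages are sent to /dev/null here because the same information ends up in the log file; drop the redirection if you would rather watch them scroll by.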

Note

Tables processed by PassiveTeX currently require width specifications with units in colspec elements. Proportional width specifications do not work.

5.

Editors for docbook (or XML)

Gary Lawrence Murphy et al

For a collection of mini-reviews and links to various editors, try teledyn.com and click on the Editors subject link

Also added:

The free (or that can be used for a limited amount of time) tools I know:

- xemacs + psgml, if you are fond of emacs style editing
- gvim, that supports SGML highlighting
- XMLMind XML Editor (XXE), a Java tool that is quite interesting; the rendering is based on CSS stylesheets, the doc validity is checked, etc. The main drawback I see is that multi-part docs are not supported, and changing the doc structure is not so easy.
- Morphon editor, a Java tool quite similar to XXE, though really slower (at least for me).
- epcEdit, that is interesting because you control exactly the file content. The rendering is based on stylesheets too, but it seems to be a specific stylesheet format. Pros: modifying the doc is easy. Cons: multi-parts doc is not supported.
- qemacs, a small editor a la emacs, not really mature but promising.
- LyX supports a DocBook mode; it's easy to use, but not all the tags are supported.

6.

Openjade DSSSL tools

Ian Castle

For Windows users without a suitable compiler, pre-compiled binaries of the OpenJade 1.3.1 tools and libraries are available.

Please note that these binaries are provided as a simple courtesy: no warranties, no support etc. (but I would be interested to know if they don't work! Report problems on the sourceforge page). It's available on Sourceforge.

Windows Source: openjade-1_3_1.zip Windows Binaries: openjade-1_3_1-bin.zip

The most recent release of Openjade, OpenJade 1.3.1 is now available to download at Sourceforge

OpenJade 1.3.1 is very much a maintenance release. It is fully described in the release notes which are available from the location above (and included in the release).

OpenJade 1.3.1 pulls together the various patches and fixes that have been made and tested since the previous release of OpenJade, back in 1999. In addition, the new release supports new platforms and the latest GNU and Microsoft compilers "out of the box".

7.

troff to Docbook

Eric Raymond

Eric S. Raymond has announced the release of a new utility, called "doclifter" that "lifts documents from troff source form (using the man, mdoc, ms, or me macros) into DocBook."

8.

nroff to docbook

Bob Stayton


> We want to translate the man pages (into docbook).
> Don't ask me why :-). The problem is that they are in nroff. Is there
> something out there that translates nroff?

They might try: sourceforge.net. Haven't tried it myself, though.

9.

Docbook toolchain setup

Various

For win32, Markus Hoenicka offers

SGML for NT describes the setup of a SGML and XML editing and publishing system on Windows. You'll get step-by-step instructions to install and configure Emacs, PSGML, (Open)Jade, TeX, JadeTeX, xsltproc, Xalan, XT, Saxon, FOP, PassiveTeX, DocBook SGML and XML DTDs, DocBook DSSSL and XSL stylesheets. In brief, the whole toolchain to work with DocBook on either the SGML or XML side. You can choose between a plain Win32 setup or a setup using Cygwin, allowing you to work in a unix-like fashion.

The name for the tutorial is historical, as it started with a few simple HTML pages describing my setup of the SGML toolchain on Windows NT (XML was not yet an issue back then). I still think of XML being a part of SGML (not entirely true, but sort of). If someone can come up with a fresh catchy name for the tutorial *without slashes* to make clear it embraces both SGML and XML, I'll be happy to use it.

The focus on the Windows platform is not due to the fact that I'd love it so much but due to my personal experience that the SGML/XML stuff is so much more cumbersome to piece together on Windows as compared to e.g. the major Linux distributions (I guess my installation instructions could be compressed to one shell command on Debian with approximately the same result).

It is available at my home page

> >Really, the linux tools (RPMs from say, redhat, Mandrake et. al) are
> >getting to the point where they provide a good environment for docbook
> >processing "out of the box".

> Anyone know where these might be found please, 

For Debian  :
- docbook_4.1-6            SGML DTD
- docbook-xml_4.1.2-8      XML DTD
- docbook-dsssl_1.74b-1    DSSSL stylesheets
- docbook-xsl_1.45-1       XSL stylesheets (maybe newer version 
                           already available?)
- jadetex_3.11-2           JadeTeX
- jade_1.2.1-23            Jade
- psgml_1.2.2-3            psgml mode for (X)Emacs
- sgmltools-lite_3.0.3.0.cvs.20010909-6
- lib-saxon-java_6.4.4-1   Saxon                        
- xsltproc_1.0.4-1         xsltproc from libxml2/libxslt

David offers other sources:

Here are a few links to get you started (compiled from a recent thread on the docbook-apps list):

Linuxdoc For FreeBSD. A personal one, prepdoc

Julien LETESSIER adds

The fact is I've written a DocBook-app howto, er... Tutorial... Whatever, originally with OS X in mind, but in the end it's not at all OS X specific.

It describes a classic DB/XML processing chain using the Apache tools and gives lots of doc pointers.

It's a good introduction before using the comprehensive DB package I developed for projectomega.org.

Those interested can download it as PDF, or browse it at this URL (for now).

It will be available on projectomega.org as soon as it's completely finished; I'll repost then.

Sorry for the probable errors, I've only been speaking English for 10 years. Feedback welcome :)

10.

Installing docbook

Jonathan Marks

I wrote a Q&D Guide to install docbook on RH 7.1. I've looked it over again. It looks simple to convert this process to Win$$. I'll include it below and annotate (>>>Win$$) suggestions for Win$$ conversion. Let me know how you fare.

This is a hybrid installation where the programs tetex and openjade are installed from RPM, and the remaining definitions are installed from their source.

 >>>Win$$: Just get the openjade binaries for Windows from
 >>>Win$$: openjade.sourceforge.net. Do not worry about Tetex.

References: http://www.linuxdoc.org/authors/index.html#resources

 >>>Win$$:  Look over this page it is very useful.

1. INSTALLATION

1.1. Install Tetex. Installed the latest Redhat version, 1.07-27 rpm. Openjade uses tetex macros. Installed all the tetex packages.

 >>>Win$$: Ignore Tetex.

1.2. Install Openjade. Installed the latest Redhat version, 1.3.-17 rpm.

1.3. Install DocBook DTDs. Got DocBook SGML 3.1 and DocBook SGML 4.1 from oasis.

Get DocBook XML 4.1.2.

These arrived as zip archives. Unzipped them into
/usr/local/share/docbook/sgml-3.1/
/usr/local/share/docbook/sgml-4.1/
/usr/local/share/docbook/xml-4.1.2/
respectively.
 >>>Win$$:  substitute C:\docbook for /usr/local/share/docbook
 >>>Win$$:  If you are only interested in sgml, just get sgml-4.1

1.4. Install DocBook Entity Definitions. Got the entities and un-tgz'd them repeatedly into each of the above DTD directories.

1.5. Install DSSSL Stylesheets. Got the latest (1.72) style sheets and documentation. The latest version is now 1.73.

Un-tgz'd the docs in /usr/share/doc/
 >>>Win$$: Put the documentation where you like.

Un-tgz'd the style sheets in /usr/local/share/docbook/,
it creates a sub-dir /docbook-dsssl-1.72 in both cases
 >>>Win$$: Put in C:\docbook. (version is 1.73 not 1.72).

1.6. Install the LDP Customizations

Copied this file into /usr/local/share/docbook/ldp
 >>>Win$$: copy ldp.dsl to C:\docbook\docbook-dsssl-1.73\html\. and
 >>>Win$$: C:\docbook\docbook-dsssl-1.73\print\.

2. CONFIGURATION

2.1 Create the following symlinks

 >>>Win$$ Do not worry about creating the symlinks.

cd /usr/local/share/docbook/
ln -s sgml-4.1 sgml
ln -s xml-4.1.2 xml
ln -s docbook-dsssl-1.72 dsssl
cd sgml-3.1
ln -s docbook.cat catalog
cd ../sgml
ln -s docbook.cat catalog
cd ../xml
ln -s docbook.cat catalog
cd /usr/share/sgml
ln -s openjade-1.3 openjade

2.2 Create the following file:

 >>>>>>>>>/usr/local/share/docbook/catalog
CATALOG "/usr/share/sgml/openjade/catalog"
CATALOG "/usr/local/share/docbook/dsssl/catalog"
CATALOG "/usr/local/share/docbook/sgml/catalog"
CATALOG "/usr/local/share/docbook/xml/catalog"
<<<<<<<</usr/local/share/docbook/catalog
 >>>Win$$: substitute C:\docbook for /usr/local/share/docbook
 >>>Win$$: if not interested in xml delete last line

2.3 Add the following env variables to your startup file. I use bash, so I add the following to /etc/profile:

export JADE_HOME=/usr/share/sgml/openjade
export SGML_SHARE=/usr/local/share/docbook
export SGML_CATALOG_FILES=$SGML_SHARE/catalog

 >>>Win$$: use SET in autoexec.bat - fix the JADE_HOME
 >>>Win$$: and SGML_SHARE=C:\docbook.

2.4 Apply the LDP customizations

 >>>Win$$ Ignore this step.


cd /usr/local/share/docbook/dsssl/html
ln -s ../../ldp/ldp.dsl .
cd ../print
ln -s ../../ldp/ldp.dsl .

Log out and in again for env variables to take effect.

 >>>Win$$:  The following applies until the next >>>Win$$ tag.

Get a docbook sgml template to verify the installation.

For multifile html (on one line):

openjade -t sgml -d c:\docbook\docbook-dsssl-1.73\html\ldp.dsl#html <sgml-filename>

For single file html (on one line):

openjade -t sgml -V nochunks -V rootchunk -d c:\docbook\docbook-dsssl-1.73\html\ldp.dsl#html <sgml-filename>

For a single rtf file (on one line):

openjade -t rtf -d c:\docbook\docbook-dsssl-1.73\print\ldp.dsl#print -o <rtf outfile> <sgml file>

To get a PDF from the rtf, open it in Win$$'s WordPad and print using Adobe's Acrobat Distiller.

 >>>Win$$$

3 Validating and Publishing xml or sgml

To validate sgml:

nsgmls -sv -c $SGML_CATALOG_FILES <file>.sgml

To validate xml:

nsgmls -sv -c $SGML_CATALOG_FILES $SGML_SHARE/dsssl/dtds/decls/xml.dcl <file>.xml

The following script creates a <filename>.output subdirectory, into which it creates a dvi, tex, single html, chapterised html, pdf, ps, and txt files.

 >>>>>>>>>>/usr/local/bin/publish
#!/bin/sh

FILE=$1
IMAGES=$2
DSL_PRINT="${SGML_SHARE}/dsssl/print/ldp.dsl#print"
DSL_HTML="${SGML_SHARE}/dsssl/html/ldp.dsl#html"

if [ "`echo $FILE | grep \"\.xml$\"`" != "" ] ; then
EXT=xml
XTRA="${SGML_SHARE}/dsssl/dtds/decls/xml.dcl"
elif [ "`echo $FILE | grep \"\.sgml$\"`" != "" ] ; then
EXT=sgml
XTRA=""
else
echo "To publish docbook formatted sgml or xml."
echo "Usage: publish <filename>.{sgml,xml} [image-dir]"
echo "   where:"
echo " filename is the file to publish"
echo " image-dir is the relative dir path containing images"
echo ""
exit 1
fi

if [ ! -f $FILE ] ; then
echo "File: $FILE not found"
exit 2
fi
BASE=`basename $FILE $EXT`
mkdir ${BASE}output > /dev/null 2>&1
cd ${BASE}output
if [ "$?" != "0" ] ; then
echo "Cannot mkdir or cd to ${BASE}output"
exit 3
fi

rm -fr *
mkdir chaps
if [ -n "$IMAGES" ] ; then
mkdir images
mkdir chaps/images
cp ../${IMAGES}/* images/. > /dev/null 2>&1
cp ../${IMAGES}/* chaps/images/. > /dev/null 2>&1
fi

cd chaps

openjade -t $EXT-raw -E 0 -d $DSL_HTML $XTRA ../../$FILE
mkdir patch
for f in `\ls *.html` ; do
sed -e 's,<br/>,<br>,g' $f > patch/$f
done
cd ..
openjade -t $EXT-raw -E 0 -V nochunks -V rootchunk -d $DSL_HTML $XTRA \
../$FILE
sed -e 's,<br/>,<br>,g' index.html > ${BASE}html

lynx -dump -nolist ${BASE}html > ${BASE}txt

openjade -t rtf -E 0 -d $DSL_PRINT -o ${BASE}rtf $XTRA ../$FILE

openjade -t tex -E 0 -d $DSL_PRINT -o ${BASE}tex $XTRA ../$FILE

jadetex ${BASE}tex
jadetex ${BASE}tex
jadetex ${BASE}tex
dvips -o ${BASE}ps ${BASE}dvi

pdfjadetex ${BASE}tex
pdfjadetex ${BASE}tex
<<<<<<<</usr/local/bin/publish

Execute publish as follows to hide unnecessary warning clutter.

publish harold-config.xml 2>&1 | fgrep -v "DTDDECL catalog entries"

11.

Docbook installation (Windows)

Ian ?? and Bob Stayton


> Can someone please point me to a FAQ or page
> or something that will help me install DocBook,
> and the associated "stuff" 

Google suggested : this

I've not tried it ;-) but it looks like the answer to your prayers.

Bob Stayton provides the basics.

Here is one simple setup for DocBook on Windows. It has three pieces: the DocBook DTD, the DocBook XSL stylesheets, and the xsltproc stylesheet processor.

1. Download and unzip the DocBook XML 4.1.2 DTD from: http://www.oasis-open.org/docbook/xml/4.1.2/docbkx412.zip

2. Download and unzip Norm Walsh's DocBook XSL 1.45 stylesheets from: http://telia.dl.sourceforge.net/docbook/docbook-xsl-1.45.zip

3. Download the libxml kit to convert your XML to HTML from: http://www.fh-frankfurt.de/~igor/projects/libxml/index.html

You'll need both the "libxml binaries" and "libxslt binaries" zip files. The kit includes xsltproc, an executable that applies an XSL stylesheet to an XML file.

Then you can generate HTML using:

xsltproc /path/to/docbook-xsl-1.45/html/docbook.xsl \
     yourfile.xml > yourfile.html

Yes, xsltproc on Windows accepts forward slashes. You'll need to get xsltproc in your PATH, or give a full path to the command. And your xml file will need to include a <!DOCTYPE> that indicates the path to the docbook XML DTD.
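
If you have no DocBook file at hand, the following sketch writes a minimal one with the kind of DOCTYPE mentioned above. The public identifier is the standard one for DocBook XML 4.1.2; the function name write_sample_article is made up for this example, and the default system identifier is the OASIS URL, which you would normally replace with the path to your unzipped docbookx.dtd.

```shell
#!/bin/sh
# Sketch: write a minimal DocBook XML 4.1.2 article for a first test run.
# write_sample_article is a hypothetical helper.
#   $1 = output file
#   $2 = system identifier for the DTD (a local path to docbookx.dtd,
#        or the OASIS URL used as the default here)
write_sample_article() {
    out=$1
    dtd=${2:-http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd}
    cat > "$out" <<EOF
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "$dtd">
<article>
  <title>Test article</title>
  <para>Hello, DocBook.</para>
</article>
EOF
}
```

After write_sample_article yourfile.xml /path/to/docbkx412/docbookx.dtd, the xsltproc command shown above should produce yourfile.html. The function is written for a Unix-style shell; on plain Windows you would type the same XML into a text editor instead.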

The nice thing is that this same setup also works on Unix or Linux (except you'll probably be downloading and compiling source instead of binaries for libxml).

12.

Linux SGML to pdf toolchain, comments

Ian Castle

Currently I am very happy with the SGML/XML -> PDF via jade/DSSSL toolchain. We (my company) don't really have any open issues. We are using the toolchain to produce release notes, user guides and, more recently, course notes (which are mainly being converted from MS word documents).

However, you will need to apply patches to the basic tools to get them to this working point.

The problem is that openjade and jadetex are interdependent. Jadetex expects openjade to be producing output consisting of various "macros". Changes in jadetex over the years have added macros - particularly in the support of double sided printing. The released 1.3 of openjade only really supports jadetex 2.7 (as does jade 1.2.x).

You need openjade 1.3 (the released version) + a bunch of fixes from Francis J Lacoste (this fixes bugs and fits nicely with jadetex 2.20 - 3.3 ) + a small patch from me which ties openjade to the recent releases of jadetex ( 3.4 - 3.11)

You want recent DSSSL stylesheets (>= 1.70), which have some key fixes for double-sided printing and correct pagination of recto/verso pages; more specifically, they support the OpenJade extensions required for correct double-sided printing. Of course, the latest version will do.

You want the latest (3.11) version of jadetex because this simply has the fewest bugs.

The recent versions of jadetex have better control of whitespace/layout, orphans/widows, tables + support for working two side printing (added in 3.4) + support for roman numerals in "front matter".

However, the DSSSL stylesheets have some problems with deciding what is front matter and when the pages start esp. when you have a complex mix of "parts" etc. There isn't really a correct generic fix. The fix that murray was talking about works for the situation where your book consists of front matter, table of contents, preface, part I, chapter ...., Part II etc... It won't work if you decide to stick a preface before each part for example. This is why it isn't in the official stylesheets - because it isn't really a solution - just reduces the problem for the most common applications.

You can find the BSD style sheet which contains this fix at freebsd website

Search for "Castle"...

I've submitted a patch for OpenJade against what was in CVS when I made the patch... this hasn't been applied, and in either case there has been no recent release of openjade. There needs to be - as openjade is being done a great disservice - when the released version contains so many flaws which have been fixed.

So, in my opinion, the best Linux DocBook SGML -> PDF toolchain is mine ;-). Fortunately, Camille included the patches to openjade + jadetex 3.11 + DSSSL stylesheets 1.72 in Mandrake 8.1. So this has the best tool chain.

If you have rpm2cpio on your debian system you can download the mandrake 8.1 RPM at the sunsite website

And extract the source + the patches. The patches you want are

openjade-1.3-features.patch.bz2
openjade-1.3-twosidestartonright.patch.bz2

[I can email these to you if you like - I don't want to bombard the list with them]

The other patches are mainly to do with the RPM build environment, final file locations and some fixes for the 2.96 compiler.

Further praise for Mandrake... I bought the 8.0 boxed release and was pleased to see that the printed guide that comes with it was actually produced by openjade/jadetex, so they used a Linux toolchain all the way through the production process (as this predated the roman numerals patch and a few others, it looks a bit tatty). This compares with the Red Hat 7.1 manuals (the last version I looked at), which were authored in DocBook and had their HTML produced with openjade, but used Arbor Adept/Epic (or whatever it is) running on Solaris for the printed manuals.

I hope Mandrakesoft have carried on this process in 8.1 as it is a good example of the "Dog Food" principle....

I also looked at using htmldoc, but it is inferior to the results that you get with a "fixed" openjade/jadetex toolset.

13.

Docbook on Mac OSX

Sasha Zucker

I am using OpenJade and DocBook XML 4.1.2 to develop documentation on my PowerBook running MacOS X 10.1. OpenJade is easily installed with Fink, a Unix software package manager for OS X that is a Perl front-end to ported Debian package utilities (e.g., apt-get, dselect). I highly recommend fink to all MacOS X users. For more information, visit sourceforge.

Note that the OpenJade package is in fink's unstable branch. The only problem with the package is that it throws away files included with the source (e.g., xml.dcl) that you need to use OpenJade. This problem is easily fixed by untarring and manually installing the source files after installation. I am going to work with the package maintainer to fix this problem.

14.

MathML in XML source

Allin Cottrell

This is a small set of files enabling literal pass-through of TeX math to jadetex, in the context of DocBook -- i.e. lets you use TeX math rather than MathML in the SGML/XML source file. Includes a utility to auto-generate PNG images from the texmath elements for use in HTML. Small addition to the DTD, a few DSSSL bits and pieces, plus some perl hackery.

Comments welcome.

15.

Olink cross referencing

Bob Stayton

I've been investigating external cross referencing with olink for some time, and have developed a new way of doing it using olink. I've assembled a customization kit for the new olink I've been working on. This kit permits anyone to form cross references between DocBook 4.1.2 documents and have them resolved by the XSL 1.45 stylesheets. Once it is set up, it is easy to use. Most of the effort is in setting it up.

I've put the kit (newolink.tar.gz) on my website if you want to pick it up and test drive it. It includes the customizations, example files, and a sample Makefile to process the files (using xsltproc). It includes a HOWTO file with instructions for using the tools on your own files.

If you want to browse the toolkit files, I've also checked them into the DocBook SourceForge CVS tree. They are in:

   contrib/xsl-custom/newolink/tools
   contrib/xsl-custom/newolink/examples
   contrib/xsl-custom/newolink/OlinkHOWTO.html

You should be able to run the examples out of your working directory if you fix a couple of paths.

One of the examples (modular.xml) illustrates how you can use XInclude and this new olink to create modular content files, where each module is a valid XML document instead of a system entity.

This is the first release of this kit, and it doesn't have all the features I would like to eventually see. I'm also pursuing this with the DocBook Technical Committee to see how it fits in with future plans for DocBook.

Feedback is welcome.

16.

IDE mode for XSLT on emacs?

Chris Maden

Try the xslt-process minor mode:

17.

Docbook to plain text

Kevin Conder

To convert docbook HTML output to plain text I use lynx in Windows at work. It works like a charm: Command is "lynx -dump -nolist in.html > out.txt"
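A minimal sketch of chaining the two steps, DocBook to HTML with xsltproc and then HTML to text with lynx. The function name docbook_to_text and the stylesheet path are made up for illustration; the lynx options are the ones quoted above.

```shell
# Placeholder stylesheet location (assumption): adjust to your install.
XSL="/path/to/docbook-xsl-1.45/html/docbook.xsl"

# docbook_to_text INPUT.xml OUTPUT.txt
# Writes an intermediate OUTPUT.html next to the text file.
docbook_to_text() {
  html=${2%.txt}.html
  xsltproc "$XSL" "$1" > "$html" &&
    lynx -dump -nolist "$html" > "$2"
}

# Run only when the tools and input files are actually present.
if command -v xsltproc >/dev/null 2>&1 &&
   command -v lynx >/dev/null 2>&1 &&
   [ -f "$XSL" ] && [ -f in.xml ]; then
  docbook_to_text in.xml out.txt
fi
```

The intermediate HTML file is kept, which is handy when the text output looks wrong and you want to see what lynx was given.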

Get your lynx for Windows here:

18.

Docbook to plain text

Sam Steingold




> Does anyone know of any stylesheets that will convert DocBook
> documents into a plain ASCII format?

A way to deal with this is to convert to texinfo first and then use makeinfo to produce info text files.

The reason I think this is a good way is that texinfo format is (just? almost?) as rich as DocBook, so there should be no information loss in that conversion, and the makeinfo process of texinfo->text conversion is very well thought through, widely tested and used, and known to work.

19.

HTML to plain text

Bob Stayton

Various documentation forms are based on plain text. In order to obtain good plain text output from html, I've found html2text to be the best tool for the job.

DaveP. I've hosted it here since the author seems to have disappeared.

20.

Viewing the DTD in HTML

Bob Stayton

I wrote a tool called LiveDTD to assist me in writing an extensive customization layer for DocBook. It converts a DTD into a browsable hypertext document, and it made navigating the maze of parameter entities much quicker. You can see a browsable version of DocBook 4.1.2 and a customized version of DocBook 3.0 at: Sagehill

(the page above that describes the LiveDTD interface). When I have a question that requires looking at the DTD, I always use the live version now.

Also from that website you can download the livedtd.pl Perl script that makes these things. It is very useful for developing and maintaining a customization layer for a DTD.

21.

Training package in docbook

Christopher R. Maden

The training package is currently available at my webpages, as a zip file; it will probably move when I get around to integrating it into the site. It is entirely undocumented right now.

That zip includes:

Archive:  dbtrain.zip
   Length     Date   Time    Name
  --------    ----   ----    ----
       949  11-10-01 17:25   train.dtd
       562  11-06-01 21:30   files.xsl
     13273  11-07-01 02:12   handout.xsl
     10486  11-07-01 02:36   slides.xsl
       738  11-10-01 17:25   slides.css
      1071  11-10-01 17:24   left.gif
      1073  11-10-01 17:24   right.gif
  --------                   -------
     28152                   7 files

The DTD has a <class> element which should be used as the root. Each <unit> can have <slide>s at the start; slides.xsl will create HTML slides from them (with slides.css and left and right.gif). The files.xsl stylesheet will extract any <programlisting> with a filename attribute for use as sample files with the class. Finally, handout.xsl will generate PDF output for distribution to the class.

The files.xsl and slides.xsl stylesheets use the XSLT 1.1 draft with SAXON to do separate file output; this could be tweaked for other processors. The handout.xsl stylesheet uses the pre-REC version of XSL for use with older FOP; that would need to be changed for the most recent version, as well.

Someday I'll document all of this.

22.

Formatting DocBook bibliographies

Markus Hoenicka

It is not my intention to bore anyone, but let me again tell you that the tool you wish to have already exists and is ready to use. Let's compare BibTeX and RefDB for the sake of clarity again:

- You enter your LaTeX references into a flat-file database in the BibTeX format

* You enter your references for a SGML or XML document into a SQL database. Input can be either RIS, DocBook, or BibTeX. As RefDB is Unix-style, you can write other import filters in any language that can send output to either stdout or to a file. Using a SQL database means better scalability for large collections and added benefit if you share your data with colleagues (think workgroups, departments, access control...)

- You use style files in the powerful but somewhat cryptic BibTeX format for your LaTeX documents.

* You specify the bibliography and citation styles for SGML and XML documents in XML files which are essentially templates for the sequence and appearance of bibliography and citation elements. These are also loaded into a database. This means they are pre-parsed which speeds up the formatting of bibliographies.

- In a LaTeX document, you specify with the \bibliography command which external bibliography file to use. You specify with \cite commands which references you want to cite (and appear in your bibliography). With the natbib package you gain other citation styles like textual citations: Miller et al. reported [4] that...

* In an SGML or XML document, you specify an external entity with the name of the SGML or XML file that will contain your bibliography. In DocBook documents you specify with <citation><xref..></citation> constructs which references you want to cite. Parenthetical and textual citations are supported.

- You run latex on your LaTeX document. This will create an .aux file which contains (among other stuff) a list of all citations.

* RefDB uses a DSSSL script to extract a list of all citations from SGML or XML documents into an XML document (which you can edit to add other, not cited references)

- Then you run bibtex on your LaTeX document. This will use the bibliography style you specified in the document and will create a cooked bibliography in a .bbl file

* Then you run a RefDB app on your SGML or XML document. This will use the bibliography style you specified on the command line and will create a cooked bibliography in a SGML or XML file. It will also create a small stylesheet driver file specific for your bibliography style.

- Finally you run latex once or twice again to finalize your document

* Finally you run Jade or an XSLT processor on your document to transform it to the final output. This step uses the RefDB-created driver files to format the RefDB bibliographies (leaving alone potential other bibliographies) and the RefDB citations (leaving alone potential other citations). The stylesheet driver files essentially take care of character properties like font weight, posture etc. for various parts of the citation or bibliography. The citations are neatly hyperlinked with the references in the bibliography in all output formats that support this.

Please note that neither BibTeX nor RefDB do any "search-and-replace"-style mangling of your sources. The cooked bibliography is kept in an external file in both cases. This way it is easy to reformat your document for a different bibliography style without touching the document source. And the whole thing works for DocBook, TEI, and any other reasonable DTD (with a little stylesheet tweaking, that is).

Once again, more info is available at sourceforge. Please visit this url to view example documents formatted for two different journals.

RefDB does use bibliomixed on purpose. I think the choice between raw and cooked is a compromise between philosophy and ease of implementation, speed of execution etc. I see the main purpose of auto-generating bibliographies not in creating beautiful and philosophically correct *source* documents, but to help users create correct *formatted* output. The intermediate bibliography element is a means to achieve this. The DocBook DTD explicitly defines the bibliomixed element to create bibliographic output that would be too tedious or complicated to create on the stylesheet level alone.

 > Has anybody created DocBook elements equivalent to BibTeX, and a 
 > transformation to DocBook biblioentry? And/or a conversion from 
 > BibTex DB files to DocBook? That would make it easier to maintain 
 > references as a BibTex-like SGML/XML collection.

RefDB can import BibTeX bibliographies and use these data to create DocBook bibliographies.

Taken together, RefDB seems to be the only available tool to actually format bibliographies and citations in the output created from SGML and XML documents. RefDB does this without modifications of the DocBook DTD, and without mangling the document source. RefDB is not limited to DocBook, but does TEI as well and can be extended to any other document type.

RefDB stores far more information for a bibliographic reference than just the bibliographic data. As RefDB was designed from the ground up as a collaborative tool, it is necessary to store a part of this information, like personal notes or availability information (i.e. a URL to an electronic offprint or a description where to find that paper copy), separately for each user. So the database is necessarily relational, which cannot be achieved with a flat file. Another reason for using a database server is of course the better scalability. A flat file is fine if you collect a few hundred references, but it is a bottleneck if your workgroup collects several thousands of references. Searching a flat file of that size with regular expressions and several search criteria is not very performant.

I don't know why you want to put your database under CVS control, but if you have to you can do similar things with RefDB. You can either dump your databases to a SQL script (which is a plain text file) or you can retrieve all references in RIS format (which again is plain text). Both formats would be suitable for CVS and can re-create the SQL database.

Future versions of RefDB will support different database servers (so from your point of view you can choose your poison), including an embedded SQL server which does not need an external database server.

23.

Changebars

Norm Walsh

Several people on this list have asked about DiffMk, my tool for generating changebars automatically for DocBook (and XHTML and XML Spec) documents.

The brave at heart will find a first beta of my Java rewrite of this program at my web site

It works better, including support for a UI and word-level diffing, but it's a lot less mature and stable than the Perl version. Still, I've been using it successfully for a while so...share and enjoy!

P.S. Uh, except for the share part. I'd be just as happy if it didn't get mirrored around the globe in its current crude state.

24.

XML Editor for docbook

Bob Stayton


> I was wondering what XML editors/suites are best suited to edit and "manage"
> a set of XML/XSL/... files that make up a "DocBook project".

In XMetal, you can do two things:

1. Create standard template files for each of the kinds of documents you want to create. Then you can add those templates to the File->New interface.

2. You can customize each element when it is inserted to include any children or structure you want to add. See Tools->Customizations->General->On Insert. The customizations are associated with the current DTD.

25.

Which FO engine

Bob Stayton




> I tried an evaluation version XEP from RenderX and it did a very good job
> converting XSL FO to PDF.

> It's a bit too expensive for me  so I'm looking for
> another good XSL FO to PDF engine.

You should try PassiveTex: Oxford. I find it works better than FOP.

But getting it working can be a bear. I've never gotten PassiveTex working over an existing TeX installation. There were always fatal errors, of a type that didn't make enough sense to me to fix.

I have gotten PassiveTex working on both Win2K and Linux by doing the following:

1. Download the TexLive CD image and cut a CD.

2. Install the entire TeX distribution from the TexLive CD.

3. Follow Sebastian Rahtz's directions from the above webpage *to the letter* to install PassiveTex in the new TeX installation.

Like FOP, it isn't perfect, but it is useful.

26.

Docbook to Latex using perl

Michael Wiedmann

Jon Grov announced earlier today on Freshmeat his package "dblup" (DocBook to LaTeX using Perl) which might be of interest for some people:

"dblup (Docbook to LaTeX using Perl) enables easy and quick conversion from DocBook documents to valid, human-readable LaTeX markup. It only supports a very small subset of the DocBook-DTD."

See: sourceforge

27.

Tex setup to run PassiveTex

Holger R



>  The
> Win* boxen configured like a charm thanks to MiKTeX.  My FreeBSD box,
> however, has been no end of headaches.

I recommend that you use TeXLive instead of the teTeX that comes with FreeBSD. It's available from dante

(That's the German TeX Users' Group, you should probably pick a site closer to you).

Grab the file texlive-final-20010730.iso.bz2

This ISO image contains TeX setups for various Unix derivates (including FreeBSD) and for Win*. In my opinion, it's got the best setup for TeX stuff related to SGML/XML processing.

28.

MathML in docbook

Norm Walsh


| How do I embed MathML into a DocBook XML ? is their an example that I
| could have ?

See the oasis website

29.

Glossary tool

Fabian Mandelbaum

There is anonymous CVS access and also a mailing list available!

Please, check mandrake website for details about all this.

30.

Validating with nsgmls

Jirka Kosek


> nsgmls -wxml -s C:/programs/jade/xml.dcl wrapper.xml
> nsgmls:C:/programs/jade/xml.dcl:1:W: SGML declaration was not implied
> nsgmls:ko/glossary.xml:3:10:E: non SGML character number 154
> nsgmls:ko/glossary.xml:3:13:E: non SGML character number 150
> nsgmls:ko/glossary.xml:9:15:E: non SGML character number 149
> <...>
> nsgmls:I: maximum number of errors (200) reached; change with -E option

> SGML validation exited abnormally with code 1 at Mon Apr 01 11:20:53


> Is there anything else I should check or something I can do to
> troubleshoot my setup?

You must set the environment variable SP_ENCODING to "XML" or "UTF-8". In the first case, nsgmls will use the encoding specified in the XML declaration; in the second, UTF-8 will be assumed.
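For a POSIX-style shell, setting the variable might look like the sketch below; the nsgmls command line is the one from the question, guarded so it only runs where the tool and file exist. On a Windows cmd.exe prompt the equivalent of the first two lines would be "set SP_ENCODING=XML".

```shell
# Tell nsgmls (SP) to take the encoding from each file's XML declaration.
# Use SP_ENCODING=UTF-8 instead to assume UTF-8 for every file.
SP_ENCODING=XML
export SP_ENCODING

# Same command line as in the question; skipped on machines that have
# no nsgmls or no wrapper.xml in the current directory.
if command -v nsgmls >/dev/null 2>&1 && [ -f wrapper.xml ]; then
  nsgmls -wxml -s C:/programs/jade/xml.dcl wrapper.xml
fi
```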

31.

Enable extensions for Saxon in XSL?

Togan Muftuoglu



>How do I enable saxon extensions for docbook? I've added the extensions to
>my classpath and seem to remember that there was either a param or variable
>to set for docbook to use the extensions. Am I too far off the mark?

In your customization add the following

<xsl:param name="use.extensions" select="'1'"/>
<xsl:param name="saxon.extensions" select="1"/>
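For context, a complete customization layer holding those two parameters could look roughly like this. The file name custom.xsl and the path in STOCK_XSL are assumptions for illustration; the layer imports the stock HTML stylesheet and is then passed to Saxon in its place.

```shell
# Assumed location of the stock stylesheet; adjust to your install.
STOCK_XSL="/path/to/docbook-xsl-1.45/html/docbook.xsl"

# Write a customization layer that imports the stock stylesheet
# and switches on the extension parameters.
cat > custom.xsl <<EOF
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href="$STOCK_XSL"/>
  <xsl:param name="use.extensions" select="'1'"/>
  <xsl:param name="saxon.extensions" select="1"/>
</xsl:stylesheet>
EOF
```

The xsl:import has to come before the xsl:param elements, since XSLT requires imports to be the first children of the stylesheet element.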

32.

The docbook2man utility does not handle XML properly.

Tim Waugh

For nroff output from XML, better to use db2man, an improved version of which is included in the xmlto RPM package. See http://cyberelk.net/tim/docbook/ for pointers to the packages you need.

You can get xmlto from here

33.

PassiveTex Installation

Tim Waugh



> Does anyone have step by step install instructions for PassiveTex. I've
> tried following the instructions on the website but got confused by the
> locations and instructions.

cyberelk has information about which RPM packages you need to install for Red Hat Linux.

34.

Lyx 1.2 for docbook

Gary Lawrence Murphy

Abstract

A quick 5-minute review of using LyX 1.2 for DocBook authoring. Keep in mind this is not meant to be an exhaustive review or be any sort of evaluation of LyX on its own merits, this is only a quick review of LyX with regard to editing DocBook articles in relation to the other free XML editing tools such as XXE and Emacs psgml-mode.

First Impressions

The first item that appears when trying to use the new experimental branch of the LyX word processor for DocBook is the appearance of the same familiar LyX interface; it's not totally obvious that this thing does DocBook at all. The import choices list the long defunct LinuxDoc, along with MsWord, but no DocBook. On closer examination, though, there is a DocBook Article template option.

Selecting that option, it's still not totally obvious. The filename is still a .lyx file. Highlighting text with the intention to mark it as a literal or some other DocBook tag is also not apparently obvious, and there are no interface objects which are obvious as DocBook markup.

First Commands

The first thing that does work as expected is to select a new section. This produces a proper section heading, numbered as one would expect.

Selecting a subsection, though, immediately makes the current paragraph into a subsection heading rather than opening a new heading; LyX apparently does not use a "mark and select" editing model, but instead uses the more confusing "set mode and proceed" model.

The issue remains, then, of how to introduce docbook markup.

Basic Markup

Adding basic markup like subsections follows as the usual LyX sequence of opening a new line, selecting the mode, and continuing to type, with a carriage return to end that mode and return to standard text. The other buttons, on the other hand, follow a select/op mode where text can be selected, and then the font changed via ! or the little man. The semantic meanings of these markup elements is not explained.

Entering tags

This part is pretty elusive; I'm not going to go to the manual for this because an interface should be self-revealing, but I expect this may be a place where you need to actually educate yourself as to the designer's intentions. Ditto for the buttons which, being xform based, have no tooltips --- it's anyone's guess what a ! or a little man might mean (unless you're an old hand at LyX). The basic philosophy, then, is "to do this, you got to know how" but I always wonder if it is then true that since you had to learn something to use it anyway, why not simply learn the markup?

There's another button marked "font" with an arrow below it, but clicking this only fills the status line with "(Changed) (font-free)" ... whatever that means.

You do get itemized lists which repeat for each item until you back-space over the bullet

and then you can't type the next paragraph until you enter a bullet item and use the style-set to change it to standard. Most of the actions of the interface will happen from the style-select box, and this has no concept of what is valid and what is not; you can insert any tag inside any other regardless of the legality of it within the current DTD. Many of these options also have no real meaning for DocBook (such as "paragraph" which changes the font rather than introducing a paragraph break or something tangible like that).

I do see GUI buttons in the stylesheet, but I've been unable to discover how to add my own; IIRC, LyX lets you insert LaTeX commands inline in the text, and I expect the DocBook way is to use this facility to insert the special elements.

Special effects

There are some special effects in the style menu which may be useful, but it's not obvious what these do or how to use them; you just have to try them and see.

Description

this appears to be a description line which probably does the same as an itemized or enumerated list in creating a style that is

repeated

for subsequent lines and continued until you enter a new line and switch it back to the standard markup.

If there are hotkeys for these mode changes, there is no indication in the interface.

In addition to the style settings, LyX has a menu item called "insert" which contains some special objects to put into the text. [1] To return from these objects, you have to move the cursor outside of the bounding box on the GUI display.

A hyperlink insert is also a bit of a break from the flow; my biggest complaint about this mode of working is that it upsets the flow of the writing as you have to stop, fill out some sort of box (sometimes a modal dialog, sometimes an inline box) and then jump out of it to return to your prior mode.

While you do this in the flow of the text, the formatting given by LyX in the GUI display will become skewed as the new element will be inserted inline across the full width of the page; with the note attached to this paragraph, the line of text just prior to the note becomes justified to show only four words on the line equally spaced. This makes proof-reading a little difficult; I can click on the "note" icon to collapse it, but that also moves my cursor to the point before the icon, so I have to re-navigate back to the end of the sentence to continue. Again, there may be hotkeys for all this, and I do hope there are, but there are no clues as to what keys these might be. [2] Arrow keys appear to get you out of the boxes if you happen to be at the end of the box text. Arrows will also freely navigate within these displayed inline boxes as if they were part of the page.

First DocBook

With a bit of experimentation, it seems LyX saves its document in its native format, but offers an export function to save the current document as SGML DocBook; selecting this option flashes a brief message to say the file has been saved with the same name but with the .sgml suffix.

Working with Structured Docs

This is where LyX has always shined, and 1.2 does not disappoint: The Navigate menu shows the structure in an outline format allowing you to quickly move from section to section.

Printing DocBook

Producing the printout proved to be painless; the SGML produced is a little ugly but not unworkable (and you could probably use tidy to fix it); passing the exported SGML into xsltproc --html with the xsl-stylesheets produced this file as a beautifully formatted HTML rendering of a DocBook article.

Conclusions

For those who like LyX, I think you're going to be very happy, but for those who don't, be prepared for some oddities and frustrations. This is not an XML editor, it isn't a DocBook editor, it is a structured text editor that just happens to offer a DocBook export format (no idea if it imports DocBook).

For my own use, I don't think I will switch from psgml-mode, but I may use it as a means to produce simple documents or to rapidly flesh out a document which I will later massage with Emacs psgml-mode. Just from this cursory look at the interface, it is apparent you can customize a lot of it; for example, it is preset with CUA keybindings which I expect can be changed to VI or Emacs (although, again, if the user can do this, the user could handle using Emacs in the first place).

35.

Looking for a docbook Word template

Ed Dixon


  > Take a look at YAWC (Yet Another Word Converter), www.yawcpro.com

Interesting! Does anyone have experience with this? Is the free lite version good enough for DocBook use? Or do you need the pro version?

36.

XHTML to Docbook

Mike



> has anyone developed any XSLT stylesheets to convert XHTML documents
> to Docbook or Simplified Docbook? I'd prefer to re-use, rather than
> re-invent such a tool...

There's DocParse: commandprompt.com

It's not free software, no source code is provided, and it doesn't seem to be very configurable (according to the docs at least, it has exactly three configuration options -- to set the DOCTYPE on the resulting DocBook document). But it looks like it's (relatively) inexpensive at least.

37.

Word to docbook

Unknown

Take a look at the following articles to learn about converting Word 2000 to sDocBook (simple Docbook):

Converting Word documents into Simple Docbook XML ibm.com

Converting RTF documents into HTML documents ibm.com

Converting RTF documents with graphics into HTML documents ibm.com

XML zone tips ibm.com

Why don't you download the free or evaluation version of YAWC? This is a VBMacro template add-in for Word that will output Simple Docbook or XHTML, provided you use the YAWC-defined named styles. The YAWC site is yawcpro.com

38.

Word to xsl-fo

Chris Scott

There is Rtf2fo (RTF being word's ASCII format) at rtf2fo.novosoft.com. I've only used it a couple times but it seems like it should solve what you are trying to do. You'll have to pay for it though. I've also started a project merging JFOR (www.jfor.org) with FOP (http://xml.apache.org/fop/index.html) so that one can go back from FO and output documents in RTF (or pdf). It should make an appearance in the next few months. The best thing about JFOR and FOP is they are open source.

39.

SGML toolset

Yann Dirson

sgml2x 0.99.4 is out. This release adds more comfort:

- no more pollution with temporary files,
- automatic production of PDF bookmarks,
- full usage of DocBook stylesheets without passing extra flags,
- catches errors not completely reported by (open)jade,
- uses openjade by default,
- symbolic names for verbosity levels,
- renaming of HTML dirs as *-html instead of *.html.

Yes, this is not yet 1.0. I may add support for index generation before that, and even a couple of other things listed in the TODO list.

Homepage: alcove-labs.org Savannah: savannah.gnu.org Download:savannah.gnu.or

sgml2x makes it easy to format an SGML or XML document using DSSSL style-sheets, and provides the following features:

- Multiple possible style-sheets per document class
- Easy specification of style-sheets using aliases
- Easy integration of new style-sheets by adding a simple new definition file in a configuration directory
- The caller can specify a PATH-like list of configuration directories, defaulting to a system-wide, a per-user, and a per-project configuration directory
- Automatic selection of a default style-sheet to be used, based on assigned priorities
- Pass arbitrary options to jade(1)
- Works in a temporary location so as not to pollute the working directory with temporary files

The document-class used to look for the style-sheets, and the output format, is for now only derived from the name with which the program is called, so you will want to call this program through symbolic links like docbook-2-pdf.
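To illustrate the naming convention, here is a small sketch of how a wrapper can derive both values from a name like docbook-2-pdf. It is not taken from the sgml2x source; the function name parse_invocation_name and the install path in the ln comment are hypothetical.

```shell
# Illustration only: NOT the sgml2x source, just a sketch of splitting an
# invocation name of the form <class>-2-<format>.
parse_invocation_name() {
  name=${1##*/}             # strip any leading directory
  doc_class=${name%%-2-*}   # text before "-2-"
  out_format=${name##*-2-}  # text after "-2-"
  printf '%s %s\n' "$doc_class" "$out_format"
}

parse_invocation_name /usr/local/bin/docbook-2-pdf   # prints "docbook pdf"

# Creating such a link (the sgml2x install path is an assumption):
#   ln -s /usr/bin/sgml2x "$HOME/bin/docbook-2-pdf"
```

A real wrapper would pass "$0" to such a function, so one script can serve every class and format pair simply by being linked under a new name.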

sgml2x is implemented as a shell wrapper around jade(1) (or, preferably, openjade(1), although we use the generic name jade throughout this documentation), jadetex(1) and other tools.

40.

Quark Avenue

Marc Wiener

I've been using Avenue for about 2 weeks now with a pseudo docbook DTD. I just tried loading docbook (V4.1.2) and have been successful with making the root element book. On the other hand, I had nothing but trouble with my small docbook-based DTD. Avenue kept crashing on me consistently. I have a call open with Quark, who were able to reproduce it, but I haven't heard from them in a week. My impression of the software is not favorable. I've gotten it to work for us by making a DTD that has every Quark style as a child of book and then transforming it to my DTD using Omnimark (and finally to the full docbook). It's far from elegant, but it works. If it were my choice and deadline pressure didn't loom, I'd drop Avenue and use Noonetime.

I'm using Quark 4.1 and AvenueQuark 1.01

41.

FrameMaker 7

Bob Stayton

For those who were interested in using FrameMaker with DocBook XML, it looks like FrameMaker 7.0 will do that. I just received my copy of FrameMaker 7.0, which combines the standard FrameMaker and FrameMaker+SGML into a single product that also does XML. It comes with a DocBook XML 4.1.2 application, which means an EDD (Frame's version of a DTD), read/write rules for moving between XML and Frame's .fm file format, and a set of paragraph and inline formats.

I was able to do File->Open on DocBook 4.1.2 XML documents, if they had the correct DocBook PUBLIC identifier in the DOCTYPE. The document becomes a Frame structured .fm file, which means you are working in a WYSIWYG formatted view. You can turn on the tag boundaries for editing elements, but be sure to turn them off before saving as PDF because they show up there. The only problem I've seen so far is that literallayout line breaks aren't always preserved.

When you open a <book> document, Frame creates a standard Frame book file and divides the chapters into individual Frame documents.

I think the formatting styles in the app need some work, but Adobe says that the app they provide is just a starting point for the user to improve upon. I can't comment on its behavior as an XML editor yet. I see a couple of oddities when I round-trip XML files through it, but that may just need some tweaking of the read/write rules.

But at least FM7 loads DocBook XML 4.1.2, which is a big improvement over FM6+SGML. If nothing else, there is now a new hardcopy formatting engine for DocBook, if you can afford it. More importantly, if it works as advertised, it will open up DocBook authoring to a large class of users who prefer a word processing environment.

Ed Nixon adds

Here's a link to the thread where this was discussed on the docbook-apps mailing list: redhat.com. Bob Stayton and I both posted details there about bugs/deficiencies in the current DocBook support in Frame 7. Here's a summary of a few things we've noticed so far:

From Bob:

1. The DOCTYPE declaration is changed from one with a PUBLIC identifier to one with a SYSTEM identifier, pointing to the FrameMaker copy of the DocBook DTD. The PUBLIC identifier is lost.

2. Default attributes are actually filled in on every element that has them. So every <filename> becomes <filename moreinfo="none">, etc. It's certainly valid, but mostly a lot of clutter since the DTD provides the same information.

3. Many ASCII characters are inexplicably converted to character entities. "C++" becomes "C&plus;&plus;", and http://localhost becomes "http:&sol;&sol;localhost".

4. When Saving As XML, I got a long error report about the xml:space attribute not being declared for <literallayout> and other elements. Turns out that the export rules add that attribute. It appears the left hand is complaining about the right hand.

5. My <ulink> elements inside my <literallayout> elements disappeared.

6. Blank lines at the end of a <literallayout> or <programlisting> disappeared.

From me:

7. I get an error for any footnotes I insert within Frame 7. It exports them as invalid <Footnote> elements -- uppercase, instead of the lowercase they should be. Also, it doesn't display real lowercase DocBook <footnote> content as footnotes -- only the invalid uppercase <Footnote> content is displayed correctly.

8. It also seems to add invalid 'align = "acenter"' attributes/values on inserted <graphic> and <imagedata> elements.

9. I found that all of my <ulink> elements disappeared on import into Frame 7 -- not just the ones inside <literallayout>.

It does allow you to insert <ulink> content into documents, but doesn't actually turn that content into hyperlinks (it doesn't turn <link> content into hyperlinks either).

So as far as I can tell, the only mechanism for inserting URL hyperlinks is to use the native non-XML FrameMaker hyperlink markup (i.e., type "message URL http://foo" in the Special>Hypertext dialog, just as you would in Frame 6). On export to XML, these native Frame hyperlinks show up as processing instructions:

       <?FM MARKER [Hypertext] message URL http://foo?>

that aren't going to be useful to any applications other than Frame.

There is an Element Tag submenu under Special>Hypertext, but <ulink> and <link> are not on it.

10. On the upside, it does let you use the native DocBook <xref> element for inserting cross references, and exports those to XML correctly.
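Items 2 and 3 above (filled-in default attributes and ASCII characters turned into entities) are mechanical enough that a post-processing pass over the exported XML might undo them. The following is only a sketch under stated assumptions: it handles just the two entities and the one default attribute named in the report, works on the raw text rather than a parsed tree, and the function name clean_frame_export is hypothetical.

```python
import re

# Entities seen in the exported XML, mapped back to plain ASCII.
# Only the two named in the report are listed; extend as needed.
ENTITY_TO_CHAR = {"&plus;": "+", "&sol;": "/"}

# Attribute defaults that the DTD already supplies. Only moreinfo="none"
# is taken from the report; any other pair would be an assumption.
DEFAULT_ATTRS = [("moreinfo", "none")]


def clean_frame_export(xml_text):
    """Undo entity clutter and drop redundant default attributes."""
    for entity, char in ENTITY_TO_CHAR.items():
        xml_text = xml_text.replace(entity, char)
    for name, value in DEFAULT_ATTRS:
        pattern = r'\s+%s\s*=\s*"%s"' % (re.escape(name), re.escape(value))
        xml_text = re.sub(pattern, "", xml_text)
    return xml_text


sample = '<filename moreinfo="none">C&plus;&plus;</filename> at http:&sol;&sol;localhost'
print(clean_frame_export(sample))
# <filename>C++</filename> at http://localhost
```

Because this is plain text substitution, it would also rewrite matching strings inside CDATA sections or comments, so it is best treated as a starting point rather than a safe general filter.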

42.

Docbook and Math

???



> Will docbook provide solid math-rendering + pdf output?

A nice way to integrate DocBook and math rendering via TeX is DBTeXMath (ricardo.edu); maybe you should give it a try.

43.

Javadoc to docbook

Michael Fuchs

I want to announce DocBook Doclet. The DocBookDoclet creates DocBook SGML 3.1 or 4.1 and XML 4.1.2 code from Java source documentation. It is helpful if you want to print reference handbooks of your API.

About: The DocBook Doclet creates DocBook SGML or XML from Java source documentation or HTML files. It is helpful if you want to print reference handbooks of your API. Normally it is used with the Javadoc tool but it can also be used as a standalone application to convert HTML to DocBook. Additionally it comes with a Swing application to manage documentation projects and to transform the resulting DocBook files to PDF.

Changes: This version is of special interest to those who want to convert HTML to DocBook XML 4.2. There was a total rewrite of the tokenizer, the parser and the transformation engine. The rewrite is not finished yet and invalid DocBook code might still be created, but the transformation should be much more robust than in prior versions. The HTML to DocBook converter is now deployed as an independent jar archive (html2db.jar). A solution for deeply nested tables is also provided: tables which are nested more than one level deep are swapped out to the parent of the topmost table.

For more information, see freshmeat.net

44.

XML to text

Bob Stayton



> Is there an XSL/DSSSL stylesheet available for direct XML->TXT conversion,
> so I could possibly modify it for my needs?

I don't know of a stylesheet for text output, but you might look at html2text. It gives you control over indents and such. It is available from: ~mbayer

45.

xinclude in docbook

Bob Stayton



>I would like to be able to include fragments of other DocBook
>documents in the document I'm writing (to save maintaining
>information in two places, etc.).

So far I've managed to use XInclude and do things along the lines of:

<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "docbookx.dtd" [
<!ELEMENT xi:include (xi:fallback?)>
<!ATTLIST xi:include
    xmlns:xi   CDATA       #FIXED    "http://www.w3.org/2001/XInclude"
    href       CDATA       #REQUIRED
    parse      (xml|text)  "xml"
    encoding   CDATA       #IMPLIED
>
<!ELEMENT xi:fallback ANY>
<!ATTLIST xi:fallback
    xmlns:xi   CDATA       #FIXED    "http://www.w3.org/2001/XInclude"
>
]>

<article>
  <articleinfo>
    <title>Test</title>
    <author>
      <firstname>Sagar</firstname>
      <surname>Shah</surname>
    </author>

  </articleinfo>
  <section id="MyFirstSection">
    <title>foo</title>
    <para>la la</para>
    <xi:include href="docbook-tech-support-1.dbk#xpointer(id('OtherIssues'))"/>
  </section>
</article>

This all works fine :-)

>
>This is interesting to me.  Can you please tell us what tool(s) you
>are using that have implemented XInclude?  

I don't know how mature it is, but the XInclude support in libxslt and libxml2 works great for me :-)

I'm also using xincludes with xsltproc with good success. To do so, you just add --xinclude to your xsltproc command line. I understand that Cocoon implements xincludes as well, but I've not tried it. I've looked but not found support in Saxon for xinclude.

I've kept my xinclude usage simple: a master book document that contains just a sequence of <xi:include> elements to pull in the chapter files. Many people set up their books using external entities this way. The advantage of xincludes is that each chapter file can be a complete XML document with DOCTYPE declaration. That means chapter files can be loaded into an XML editor and validated individually without doing DOCTYPE tricks.

With a simple book file like this, I never have to load my book document into my XML editor, where it would complain about the DTD not declaring <xi:include> elements. I did try creating a DTD customization, but decided I didn't need it with this simple setup. I just make sure my individual chapter files are valid.

xsltproc supports xincluding part of a document using the #id syntax to locate an element by id. I tried some other XPointer syntax, but couldn't get it to work. I also tried using nested xincludes, but the internal ones were not 'included'.

To validate my book, I use xmllint, which is included with xsltproc:

xmllint --xinclude --postvalid --noout book.xml

I've run into situations where I wanted to process my content with tools that don't support xinclude. For those situations, I preprocess my book file to resolve the xincludes first. I use xsltproc --xinclude and a trivial XSL stylesheet that outputs all elements and attributes. Then I run the results through the other processor.
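The same preprocessing idea (resolve the xincludes first, then hand a single merged document to the other tool) can also be sketched without xsltproc. The example below swaps in Python's standard xml.etree.ElementInclude module, which is a different implementation from libxml2 and, as far as I know, only handles whole-file includes, not the #id or XPointer fragment syntax. The function name resolve_xincludes is hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def resolve_xincludes(book_path):
    """Return the root element of book_path with xi:include elements expanded.

    Sketch only: whole-file includes, resolved relative to the master file.
    """
    base_dir = os.path.dirname(os.path.abspath(book_path))

    def loader(href, parse, encoding=None):
        # Resolve hrefs relative to the master document, not the
        # current working directory.
        path = os.path.join(base_dir, href)
        if parse == "xml":
            return ET.parse(path).getroot()
        with open(path, encoding=encoding or "utf-8") as handle:
            return handle.read()

    root = ET.parse(book_path).getroot()
    ElementInclude.include(root, loader=loader)
    return root
```

The returned element can be written out with ET.ElementTree(root).write("book-resolved.xml") (the output file name is just an example) and then passed to a processor that knows nothing about XInclude.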

46.

OpenOffice to docbook project

Unknown

There is an OpenOffice to DocBook converter (in French and under development): chez.com

As soon as version 0.3 is released, it may be translated into English and posted to this list.

The source code may still be useful for a Docbook to OpenOffice converter.

47.

Docbook Slides, in SVG

Steve Ball

In an effort to live in a M$ free world, I've developed an XSL stylesheet for the DocBook Slides DTD v3.0b1 that produces an SVG rendition of a set of slides. The idea is to have a slideshow that mimics PowerPoint without having to use PowerPoint (IOW, it's "pointless").

Until recently I used HTML for slides. The problem with that is that the navigation elements are always visible and that the presenter must use the mouse to locate and click on the 'next' link. I always found that to be awkward.

Instead, these SVG slides use a mouse click to navigate to the next slide. At the end of the slideshow, the last mouse click returns to the title foil. On all foils except the title foil there are hidden navigation 'buttons' for the previous foil and to return to the TOC foil. Moving the mouse over these buttons reveals them (navigating backward is an infrequent task, so having to use the mouse to do that is OK for now, but see below).

Please note that SVG doesn't readily support scrolling, so it is up to the user to ensure that foil content will fit within the slide boundaries. Most SVG viewers will allow panning, but that's a poor substitute. Also, SVG will not wrap long lines. Again, it is up to the user to break up long lines into smaller, digestible lines.
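Since the line breaking has to happen before the SVG is produced, one way to do it is to wrap the text in a preprocessing step and emit one tspan per line. The sketch below is not part of the stylesheet described here; it is a Python illustration with a hypothetical function name, and it wraps by character count, which only approximates the rendered width because that depends on the font.

```python
import textwrap
from xml.sax.saxutils import escape


def svg_text_block(text, x=40, y=80, max_chars=40, line_height=28):
    """Return an SVG <text> element with one <tspan> per wrapped line.

    The default coordinates, width and line height are arbitrary
    illustration values, not taken from the slides stylesheet.
    """
    lines = textwrap.wrap(text, width=max_chars)
    spans = []
    for index, line in enumerate(lines):
        # The first line sits on the <text> baseline; later lines move down.
        dy = 0 if index == 0 else line_height
        spans.append('<tspan x="%d" dy="%d">%s</tspan>' % (x, dy, escape(line)))
    return '<text x="%d" y="%d">%s</text>' % (x, y, "".join(spans))


print(svg_text_block("SVG will not wrap long lines, so break them up first.", max_chars=24))
```

Whether the wrapped block still fits inside the slide boundaries remains the author's responsibility, as noted above.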

The stylesheet hasn't been tested on foilgroups; I'll be implementing and testing them soon. Also, very few of the DocBook elements are supported - only para, itemizedlist and imageobject. However, I thought I'd share this early work ASAP to get feedback and suggestions.

I have put up a page on our website with the download and usage instructions: zveno.com

Future work (apart from handling foilgroups and more elements) will include:

* snazzy slide transitions (swipes, fades, etc)
* key navigation. I use a Keyspan IR USB remote control that generates key events, so the slides have to respond to those events.

48.

X to docbook, a tool selection

Pradeep Padala


>  Can anybody point me to something that will easily convert
>  these to docbook, and preserve some/most of their current
>  formatting?

For conversion from TeX and wiki to DocBook, TLDP has some tools here: http://tldp.org/downloads/ . wt2db can also handle text to DocBook conversion to some extent.

For conversion from man pages to DocBook, you can use this tool: http://www.tuxedo.org/~esr/doclifter/ . That page links to a somewhat stable but old distribution. I wrote a patch for it that adds some more functionality. You can get the patch and the devel version at http://www.cise.ufl.edu/~ppadala/projects/doclifter . I am not distributing doclifter; the copy is there for your convenience only. You should ask Eric for proper release info.

I also wrote a patch to tidy that converts HTML to DocBook. Details are here: http://www.cise.ufl.edu/~ppadala/tidy . Let me know if you need help with this. I have successfully used it on HTML documents submitted to TLDP.

49.

Documenting a DTD

Yann Dirson


> > Could anyone please tell me, how created documentation (The
> > Definitive Guide) for DocBook DTD ?

> This knows probably just Norm :-)
>  
> > I want to create documentation for some DTD, but not want
> > manually describe links between parent/child elements.

> There are tools usually named like DTDParse (I know Perl and Java one)
> which can take any DTD, parse it and expose it to you via some API. With
> this you can quite easily walk through DTD and inspect parent and child
> elements defined in content models.

There is also perlsgml. The code's quite old, and there are useful patches around, but it usually works fine, and gives a result similar to TDG. patches: utoronto.ca and ftp, debian site

50.

manpage, troff etc to docbook tool

Michael Smith

Eric S. Raymond recently released v1.0.0 of doclifter, a Python 2.2 utility that converts man pages and other troff/nroff/groff documents to DocBook XML and SGML. I wrote up a short news item about it:

Here's a summary of the main features:

* converts man, mandoc, ms, me, and TkMan page sources
* parses command/function synopses and converts them into DocBook markup (using Cmdsynopsis, Arg, Replaceable etc. elements)
* recognizes 'stereotyped' patterns of markup and content (such as the use of italics in a FILES section to mark filenames) and 'lifts' them into DocBook markup
* recognizes things such as URLs, email addresses, man page references, and C program listings, and lifts them into DocBook
* maintains a record of semantic 'hints' it picks up from analyzing source docs (especially from command/function synopses), and provides a means to edit, add to, and save that record
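To make the idea of 'lifting' concrete, here is a small sketch of one such recognizer: turning man page references like ls(1) into DocBook citerefentry markup. This is not doclifter's code and makes no claim about how doclifter implements it; the regular expression and the function name lift_man_refs are assumptions chosen for illustration.

```python
import re
from xml.sax.saxutils import escape

# Heuristic: a name followed by a section number in parentheses, e.g. ls(1).
# This pattern is an assumption for illustration; it will also match
# things like f(1) in C code.
MANREF = re.compile(r"\b([A-Za-z_][\w.+-]*)\((\d[a-z]?)\)")


def lift_man_refs(text):
    """Wrap man page references in <citerefentry>, escaping everything else."""
    out = []
    pos = 0
    for match in MANREF.finditer(text):
        out.append(escape(text[pos:match.start()]))
        out.append(
            "<citerefentry><refentrytitle>%s</refentrytitle>"
            "<manvolnum>%s</manvolnum></citerefentry>"
            % (escape(match.group(1)), match.group(2))
        )
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(lift_man_refs("See ls(1) for details."))
```

A real converter needs context to avoid false positives, which is presumably why the tool also keeps a record of semantic hints.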

51.

Windows docbook IDE

Stefan Priebsch

Today, we released the e-novative DocBook Environment (eDE) 0.5.3, a ready-to-go environment to work with DocBook on Windows. You are welcome to download it for free at docbook.e-novative.de.

Any questions, comments and suggestions are most welcome.

52.

RefDB and JReference comparison

Markus Hoenicka and Egon Willighagen

For people interested in a comparison between JReferences and RefDB, I'd suggest going back in the list archives about one year. We had a funny discussion back then.

I haven't tested any recent releases of JReferences, but from what I've gathered from the current discussion I see the following main issues that set RefDB apart from JReferences:

- RefDB works with DocBook XML and SGML documents, TEI XML documents, and LaTeX documents (with somewhat limited capabilities). Other SGML/XML document types can be added without modification of RefDB itself (just some stylesheet hacking).

- RefDB does *not* require any changes in the DTDs. Documents remain valid against the stock DTDs, so using RefDB does not create an interchange issue.

In the next version (October 2002), JReferences will not require that either.

- RefDB does not modify the SGML/XML source document. The bibliography is created as an external entity. The citations (which are actually xref elements) are rendered by pulling in the appropriate information from the bibliography element and are hyperlinked (in HTML and PDF output) to the corresponding references in the bibliography. The advantage of not modifying the source document is that reformatting the document for a different bibliography style is a snap.

JReferences does change the doc, but that is easy to change... Might be an advantage too sometimes...

- RefDB uses a bibliography style database to render citations and bibliographies according to a specific style of a publisher or of a publication. This includes aspects like the sequence of the elements (authors, year, title, journal, volume, issue, pages...), the rendering of the author names (FM Last, F.M. Last, F. M. Last, Last,F.M. and all other permutations), as well as the rendering in the output formats (volume bold or italics, journal names regular or italics etc). Both author/year and numeric citation styles are supported. The styles are defined as XML documents.

A tie?

(Markus comes back with)

I wouldn't say so. One important difference is that JReferences uses Java classes to implement bibliography styles (it apparently does not know citation styles at all), whereas RefDB uses XML documents that define both the citation and bibliography style of a particular journal or publisher for up to 26 publication types (article, book, chapter, unpublished, thesis etc). I can expect anyone writing DocBook documents to be able to create an XML style specification without knowing how the RefDB C code works, but I wouldn't bet that all DocBook authors know enough Java to create a new JReferences style.

- In addition to the document-based bibliography output, RefDB can generate raw bibliographies in DocBook SGML/XML, TEI XML, RIS, BibTeX and a few other formats.

JReferences supports BibTeXML, DocBook, BibTeX at this moment only.

- RefDB is a multi-user system. Users can share a common reference database and still maintain their personal info (notes, reprint status, availability etc).

- RefDB uses a SQL database to store the references. The current stable branch uses MySQL, the development branch in CVS uses MySQL or PostgreSQL. Support for an embedded SQL library is in preparation.

These two are the reason why I continued JReferences after that discussion about a year ago. I still think RefDB is an excellent program, but not if you do not have a MySQL server around...

- RefDB uses scriptable command-line clients as well as a web interface. This offers both a convenient graphical interface and the power of unix plumbing.

True. JReferences will have an editor in the future.

- RefDB can store far more information per reference than actually is used to display the reference. Additional information includes an unlimited number of keywords, a URL to an electronic offprint, personal notes, availability information (where is that paper copy?), abstracts etc. This greatly simplifies retrieving the proper references and maintaining a large collection of paper or electronic offprints.

JReferences supports this too by using BibTeXML file database... Many ways to do it...

- RefDB can directly import RIS (the lingua franca of Windows reference databases), BibTeX and DocBook (with a little stylesheet tweaking) references. TEI import is in preparation.

JReferences supports import of RIS, BibTeX, DocBook and BibTeXML.

Therefore I wouldn't support the notion that RefDB and JReferences are "similar" but the decision is left to those who actually use and compare both apps.

To see the RefDB bibliography capabilities at work, I suggest visiting: sourceforge.net

You'll find links to a DocBook SGML document transformed to HTML for two different biomedical journals, as well as a link to a TEI XML document transformed to PDF.

More info: sourceforge

jreference sourceforge with an intro at sourceforge.net

Markus's final response is:

Another issue is multiple citations containing consecutive reference numbers. Some styles just list them, others convert them to ranges, i.e. (1,2,3) vs. (1-3). I couldn't find any support in JReferences for this.
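The difference between the two renderings can be sketched in a few lines. This is neither RefDB's nor JReferences' code; it is a hypothetical Python illustration, and the rule that only runs of three or more numbers become a range is an assumption, since styles differ on that point.

```python
def format_citations(numbers, collapse=True):
    """Render reference numbers as '(1,2,3)' or, when collapsing, '(1-3)'."""
    ordered = sorted(set(numbers))
    if not collapse:
        return "(" + ",".join(str(n) for n in ordered) + ")"

    # Group the sorted numbers into runs of consecutive values.
    runs = []
    for n in ordered:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n
        else:
            runs.append([n, n])

    rendered = []
    for first, last in runs:
        if last - first >= 2:
            rendered.append("%d-%d" % (first, last))  # three or more: a range
        elif last > first:
            rendered.extend([str(first), str(last)])  # a pair stays listed
        else:
            rendered.append(str(first))
    return "(" + ",".join(rendered) + ")"


print(format_citations([1, 2, 3], collapse=False))  # (1,2,3)
print(format_citations([1, 2, 3]))                  # (1-3)
print(format_citations([1, 2, 3, 5, 7, 8]))         # (1-3,5,7,8)
```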

Many journals use some sort of author/year citation style with all its intricacies. This includes the same author name formatting issues as in the reference styles, plus the fact that in citations the number of cited authors usually is limited and that the first and subsequent citations of the same reference may be formatted differently.

Another helpful feature is in-text citations, like "Jones et al. wrote recently (2001)". Again, I don't see JReferences supporting this.

The main weakness of JReferences appears to be the missing support for document transformations. As far as I can interpret the JReferences sources (some sort of documentation would be helpful!), the modified documents are rendered by the default stylesheets. This is clearly insufficient for all styles that require parts of the citations or of the references to be rendered e.g. in boldface or in italics. This can't be (and should not be) encoded in the SGML document. RefDB creates matching driver files (DSSSL or XSL for DocBook, XSL for TEI) based on the citation and reference style which helps to render the document properly in any of the available output formats. This is essential if e.g. your publisher accepts only formatted Word documents, which is quite common in the life sciences these days.

All this doesn't mean that JReferences is bad software, but you should be aware that a large number of the bibliography styles used in life sciences is currently beyond the scope of that software, whereas RefDB handles them gracefully.

53.

Docbook ide for Windows

Stefan Priebsch

I have created a ready-to-go DocBook Environment for Win32 systems that installs in a few seconds, does not put any rubbish into the registry or Windows directory, and makes single HTML, chunked HTML and PDF generation from DocBook XML a snap. HTML Help support will be available soon, hopefully. I've put it up for download along with some documentation at docbook.e-novative.de

I am looking forward to your feedback, ideas, criticism. Anybody interested in participating please let me know.

54.

RefDB and JReference comparison

Markus Hoenicka and Egon Willighagen

For people interested in a comparison between JReferences and RefDB, I'd suggest going back in the list archives about one year. We had a funny discussion back then.

I haven't tested any recent releases of JReferences, but from what I've gathered from the current discussion I see the following main issues that set RefDB apart from JReferences:

- RefDB works with DocBook XML and SGML documents, TEI XML documents, and LaTeX documents (with somewhat limited capabilities). Other SGML/XML document types can be added without modification of RefDB itself (just some stylesheet hacking).

- RefDB does *not* require any changes in the DTDs. Documents remain valid against the stock DTDs, so using RefDB does not create an interchange issue.

In the next version (October 2002), JReferences will not require that either.

- RefDB does not modify the SGML/XML source document. The bibliography is created as an external entity. The citations (which are actually xref elements) are rendered by pulling in the appropriate information from the bibliography element and are hyperlinked (in HTML and PDF output) to the corresponding references in the bibliography. The advantage of not modifying the source document is that reformatting the document for a different bibliography style is a snap.

JReferences does change the doc, but that is easy to change... Might be an advantage too sometimes...

- RefDB uses a bibliography style database to render citations and bibliographies according to a specific style of a publisher or of a publication. This includes aspects like the sequence of the elements (authors, year, title, journal, volume, issue, pages...), the rendering of the author names (FM Last, F.M. Last, F. M. Last, Last,F.M. and all other permutations), as well as the rendering in the output formats (volume bold or italics, journal names regular or italics etc). Both author/year and numeric citation styles are supported. The styles are defined as XML documents.

A tie?

(Markus comes back with)

I wouldn't say so. One important difference is that JReferences uses Java classes to implement bibliography styles (it apparently does not know citation styles at all), whereas RefDB uses XML documents that define both the citation and bibliography style of a particular journal or publisher for up to 26 publication types (article, book, chapter, unpublished, thesis etc). I can expect anyone writing DocBook documents to be able to create an XML style specification without knowing how the RefDB C code works, but I wouldn't bet that all DocBook authors know enough Java to create a new JReferences style.

- In addition to the document-based bibliography output, RefDB can generate raw bibliographies in DocBook SGML/XML, TEI XML, RIS, BibTeX and a few other formats.

JReferences supports BibTeXML, DocBook, BibTeX at this moment only.

- RefDB is a multi-user system. Users can share a common reference database and still maintain their personal info (notes, reprint status, availability etc).

- RefDB uses a SQL database to store the references. The current stable branch uses MySQL, the development branch in CVS uses MySQL or PostgreSQL. Support for an embedded SQL library is in preparation.

These two are the reason why I continued JReferences after that discussion about a year ago. I still think RefDB is an excellent program, but not if you do not have a MySQL server around...

- RefDB uses scriptable command-line clients as well as a web interface. This offers both a convenient graphical interface and the power of unix plumbing.

True. JReferences will have an editor in the future.

- RefDB can store far more information per reference than actually is used to display the reference. Additional information includes an unlimited number of keywords, a URL to an electronic offprint, personal notes, availability information (where is that paper copy?), abstracts etc. This greatly simplifies retrieving the proper references and maintaining a large collection of paper or electronic offprints.

JReferences supports this too by using BibTeXML file database... Many ways to do it...

- RefDB can directly import RIS (the lingua franca of Windows reference databases), BibTeX and DocBook (with a little stylesheet tweaking) references. TEI import is in preparation.

JReferences supports import of RIS, BibTeX, DocBook and BibTeXML.

Therefore I wouldn't support the notion that RefDB and JReferences are "similar" but the decision is left to those who actually use and compare both apps.

To see the RefDB bibliography capabilities at work, I suggest visiting: http://refdb.sourceforge.net/examples.html

You'll find links to a DocBook SGML document transformed to HTML for two different biomedical journals, as well as a link to a TEI XML document transformed to PDF.

More info: sourceforge

jreference sourceforge with an intro at sourceforge.net

Markus's final response is:

Another issue is multiple citations containing consecutive reference numbers. Some styles just list them, others convert them to ranges, i.e. (1,2,3) vs. (1-3). I couldn't find any support in JReferences for this.

Many journals use some sort of author/year citation style with all its intricacies. This includes the same author name formatting issues as in the reference styles, plus the fact that in citations the number of cited authors usually is limited and that the first and subsequent citations of the same reference may be formatted differently.

Another helpful feature is in-text citations, like "Jones et al. wrote recently (2001)". Again, I don't see JReferences supporting this.

The main weakness of JReferences appears to be the missing support for document transformations. As far as I can interpret the JReferences sources (some sort of documentation would be helpful!), the modified documents are rendered by the default stylesheets. This is clearly insufficient for all styles that require parts of the citations or of the references to be rendered e.g. in boldface or in italics. This can't be (and should not be) encoded in the SGML document. RefDB creates matching driver files (DSSSL or XSL for DocBook, XSL for TEI) based on the citation and reference style which helps to render the document properly in any of the available output formats. This is essential if e.g. your publisher accepts only formatted Word documents, which is quite common in the life sciences these days.

All this doesn't mean that JReferences is bad software, but you should be aware that a large number of the bibliography styles used in life sciences is currently beyond the scope of that software, whereas RefDB handles them gracefully.

55.

Docbook and texinfo

Mike Smith


> I'm working on getting some documentation for Cygwin into texinfo format.
> It is currently (I think) in DocBook 3.0. I am quite a beginner with db,

[...]

> Now, we would like users to be able to read the User's Guide online with
> info, pinfo, emacs, etc. and so want to get some texinfo or at least info
> files out of our sgml. I've been trying for the last couple of weeks to get
> docbook2x to work, but it doesn't look like this is going to be easy. I don't
> think some of the perl modules it requires will work on Cygwin without major
> overhaul. 

> Anyone have any other suggestions?

Instead of converting your DocBook source to texinfo, you could try converting your generated HTML to texinfo instead. There are at least a couple of applications designed to do HTML-to-texinfo conversion:

Will Estes' html2texi (written in C) web

Michael Ernst's html2texi (Perl script) web

Will Estes' C app compiles and seems to run OK on Cygwin.

Michael Ernst's Perl script has a couple dependencies, but will work on Cygwin if you get those met. One is to install the HTML::TreeBuilder Perl module, which is in CPAN -- so you should be able to install it by:

1. Starting up CPAN shell: perl -MCPAN -e shell

2. At the cpan> prompt, type: install HTML::TreeBuilder

It also depends on the "checkargs.pm" module, which is not in CPAN; instead you need to grab it from: web ...and then manually install it somewhere in your Perl @INC path, e.g., /usr/lib/perl5/site_perl

Another solution might be to give up on trying to run the conversion on your local Cygwin system, and run it on a local or remote Linux system that you can get docbook2x installed on or that has docbook2x already installed. If you don't have a local Linux system, maybe you can get shell access to a remote one. For example, if you have a Sourceforge account, I think you can get SSH access to the Sourceforge "compile farm" (cf.sourceforge.net), which includes at least one machine (running Red Hat 7.3) that has the docbook2x stuff installed.

56.

Difference markup for Docbook

Norman Walsh

Several people on this list have asked about DiffMk, my tool for generating changebars automatically for DocBook (and XHTML and XML Spec) documents.

The brave at heart will find a first beta of my Java rewrite of this program at my website

It works better, including support for a UI and word-level diffing, but it's a lot less mature and stable than the Perl version. Still, I've been using it successfully for a while so...share and enjoy!

P.S. Uh, except for the share part. I'd be just as happy if it didn't get mirrored around the globe in its current state. (Dec 02)

57.

RTF to XML Commercial Offerings.

Peter Ring

There are some more or less commercial offerings that one might learn something from. In no particular order:

Logictran RTF Converter logictran.com

X-ICE turnkey.com

YAWC Pro yawcpro.com/

Word2XML itnisk.dk

Epic E-3 arbortext (But don't email them; in my experience they can't be bothered to reply.)

CambridgeDocs xDoc XML Converter xyztechnologies.com

RTF2XML xmeta.com

eXportXML schultz.dk

i4i Tagless Editor schema.de

B-Bop Xfinity Author wX b-bop.com

infinity-loop upCast infinity-loop.com (They have my vote!)

SEMA rtf2rdc be.sema.com

Liquent Xtent liquent.com

XyEnterprise WorX SE xyvision.com

DynaTag seems to have vanished, but good ol' RainBow can still be found at rainbow

Automated markup of loosely structured documents is a hard problem. Sometimes, the most effective solution is to let a human being make the decisions, augmented by a system. Some pointers to interesting papers and tools:

Bayesian Belief Networks 1

Nested Text-Region Algebra 2

Lightweight Structure in Text 3

LAPIS 4

58.

Docbook from HTML?

Peter Ring

There's a patch to tidy that will produce DocBook SGML or XML cise.ufl.edu The last time I tried it, it was actually rather nice, but needed a bit more patching.

59.

RTF (Microsoft Word) and other to XML Conversion

Peter Ring

The open source community has no good RTF-to-XML converter. The only one that I found (and I searched and searched and asked questions) was Majix, a Java utility. This utility was very limited. It lost footnotes and missed basic markup such as italics.

There are some more or less commercial offerings that one might learn something from. In no particular order:

Logictran RTF Converter

X-ICE

YAWC Pro

Word2XML

Epic E-3

CambridgeDocs xDoc XML Converter

RTF2XML

eXportXML

Stellent Outside In XML Export

SCHEMA MarkupKit

i4i Tagless Editor

B-Bop Xfinity Author wX

infinity-loop upCast - My favourite (DaveP)

SEMA rtf2rdc

Liquent Xtent

XyEnterprise WorX SE

DynaTag seems to have vanished, but good ol' RainBow can still be found at Rainbow

Automated markup of loosely structured documents is a hard problem. Sometimes, the most effective solution is to let a human being make the decisions, augmented by a system. Some pointers to interesting papers and tools:

Bayesian Belief Networks

Nested Text-Region Algebra

Lightweight Structure in Text

LAPIS

Topologi Collaborative Markup Editor - I have a copy; still waiting for some decent documentation. Handles partly marked up text well.

Exegenix Conversion System

60.

OpenOffice WYSIWYG DocBook

Bob Stayton

You might be interested to know that OpenOffice Writer can serve as a WYSIWYG DocBook editor. The support at this stage is somewhat preliminary, but worth exploring (see below). OpenOffice.org can always use more help if you care to participate in the project.

Native OpenOffice files are themselves XML, but the suite also integrates support for DocBook XML. There is a mailing list, xml.openoffice.org

Native DocBook and PDF import/export will feature in the next major release of OpenOffice (a few months away).

DocBook is already possible in the current release 1.0.1, albeit not 'out of the box.' You must install add-on components.

OpenOffice is fully cross-platform and even runs on Mac. porting.openoffice.org

To a large extent OpenOffice can import legacy Word files. (Worst case: RTF can bridge from Word to OpenOffice.)

Since OpenOffice itself uses XML file formats, the conversion to and from DocBook involves XSL transformations. These we can inspect and change if need be.

OpenOffice is free and open-source.

Here is a link to the UserGuide info on the DocBook xml.openoffice.org

61.

Docbook to pdf alternatives

Steinar Bang


> If FOP is still too immature, can anyone recommend another
> Docbook->something->PDF generator that is redistributable?  (We're
> hoping to avoid TeX and C++.)

Off the top of my head, I know of the following ways to transform DocBook XML into PDF, with open source/free/semi-free software:

- Transform into XSL:FO, and then use one of the following to create PDF from the XSL:FO:

  - FOP apache
  - PassiveTeX (TeX based) tei-c
  - xmlroff (written in C. Based on Gnome libraries) sourceforge

- Use the DocBook SGML DSSSL style sheets, and jade/openjade, followed by TeX

- Transform into RTF, by one of the following, and then use AbiWord or OpenOffice to create the PDF:

  - JFOR (Java) jfor
  - XmlMind FO Converter (Java, free for personal use) xmlmind
  - DocBook SGML DSSSL style sheets and jade/openjade

- Use DocBookInConTeXt (TeX. Takes DocBook XML source as input) hobby.nl
- Use the DB2LaTeX XSL stylesheets, and process with LaTeX sourceforge
- Use doctransformer (written in Java) to transform DocBook XML into LaTeX or lout for processing into PDF sourceforge
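
As a rough illustration of the first route (XSL-FO, then a formatter), here is a Python sketch that only builds the two command lines and does not run them. It assumes xsltproc and the fop launcher script are on the PATH; the function name, the stylesheet path and the file names are placeholders, not part of any of the tools listed.

```python
from pathlib import Path


def fo_pipeline(source, stylesheet, out_dir="."):
    """Return the argv lists for a DocBook XML -> XSL-FO -> PDF run.

    Nothing is executed here; pass each list to subprocess.run() yourself.
    The stylesheet argument is assumed to point at the fo/docbook.xsl
    file of a local DocBook XSL stylesheet installation.
    """
    stem = Path(source).stem
    fo_file = str(Path(out_dir) / (stem + ".fo"))
    pdf_file = str(Path(out_dir) / (stem + ".pdf"))
    return [
        # Step 1: DocBook XML -> XSL-FO with xsltproc
        ["xsltproc", "--output", fo_file, stylesheet, source],
        # Step 2: XSL-FO -> PDF with FOP
        ["fop", "-fo", fo_file, "-pdf", pdf_file],
    ]
```

Swapping the second command for another FO processor (PassiveTeX, xmlroff) changes only step 2; the FO file produced in step 1 stays the same.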

Did I miss any?

62.

Docbook XML to Latex

DB2LaTeX team.

We are pleased to announce the immediate availability of DB2LaTeX 0.7.

DB2LaTeX is a set of XSL stylesheets that transforms DocBook XML documents into LaTeX 2e sources, which can be compiled into PDF or PostScript using a standard LaTeX distribution such as MiKTeX or teTeX.

DB2LaTeX 0.7 contains extensive revisions since previous DB2LaTeX releases. Due to these changes, users of previous DB2LaTeX releases may wish to revise any custom driver files. DB2LaTeX 0.7 is a beta release and further work remains to be done before the release of DB2LaTeX 1.0. Comments and bugfixes/patches are always welcome and in such cases, we encourage users to send in specific DocBook XML examples.

Ramon Casellas James Devenish DB2LaTeX team.

Sourceforge URL sourceforge

63.

XSL Processor

Robin Cover

So, xmlroff is available at SourceForge now.

See: Cover pages or sourceforge

xmlroff is an XSL formatter. That is, it creates formatted output -- pages containing text in a variety of type styles and sizes -- from an input XML document and an XSL stylesheet. This processing model is defined in the XSL Recommendation [XSL] that was developed by the W3C.

xmlroff is written in C, and it uses libxml2 and libxslt plus the GLib, GObject and Pango libraries that underlie GTK+ and GNOME (although it does not require either GTK+ or GNOME). GLib is a general-purpose utility library, GObject is a flexible extensible object-oriented framework for C, and Pango is a framework for the layout and rendering of internationalized text. This combination made it easier to develop the formatter, makes it easier for current GTK+ and GNOME developers to also work on the formatter, and allows the formatter to use the internationalization support of Pango.

xmlroff currently produces PDF output using the PDFlib library. Other output formats can be added.

xmlroff is a command line program, but the bulk of the XSL formatting is implemented as a 'libfo' library that can be linked with any program that requires XSL formatting capability.

64.

Word to docbook.

Johann Richard

The stylesheet is already available from our website at <infinity-loop> and yes, it is perfectly hidden, I must confess. This is the latest stylesheet we have and should work with upCast 3.0.9 (it does not work with earlier versions).

65.

Slides, animated

Lars Trieloff

Some time ago someone on the list asked for a DHTML-enabled version of the XSL stylesheets for the DocBook Slides DTD. For a presentation I created a new XSL stylesheet for the Slides DTD with animated transitions between foils, a table of contents and an autogenerated overview for foilgroups, navigable via keypress, and CSS for print (gives you nice handouts).

It currently works in Mozilla and other Gecko-based browsers and in Internet Explorer.

You can find an example and the needed code at: this URL

66.

Docbook to texinfo

Steve Cheng

I have just released docbook2X, a package to convert to man pages and Texinfo.

docbook2X is a software package that converts DocBook documents into the traditional Unix man page format and the GNU Texinfo format.

It is free software under a MIT-style license.

Notable features include table support for man pages, internationalization support, and easy customization of the output using XSLT. (Easy, because unlike other converters, the docbook2X stylesheets are written in a modular way, and the character escaping and whitespace issues with the man-page and Texinfo formats are encapsulated away from the user.)

This release brings one major feature: support for table markup for man pages. For a list of the other changes, please see its home page: <URL:sourceforge>

Now, it seems that a lot of people here use Windows for their DocBook work, and I have a request to get docbook2X to work on cygwin. Unfortunately I haven't been able to; so far libxslt refuses to build. Does anyone have any pointers? Thanks.

67.

DSSSL based styling

Yann Dirson

final sgml2x 1.0.0 is out.

This is a minor update which adds missing manpages, and adds a better --dssslproc synonym for --jade, which is now deprecated.

Homepage
Savannah
Download

sgml2x makes it easy to format an SGML or XML document using DSSSL style-sheets, and provides the following features:

- Multiple possible style-sheets per document class

- Easy specification of style-sheets using aliases

- Easy integration of new style-sheets by adding a simple new definition file in a configuration directory

- The caller can specify a PATH-like list of configuration directories, defaulting to a system-wide, a per-user, and a per-project configuration directory

- Automatic selection of a default style-sheet to be used, based on assigned priorities

- Pass arbitrary options to jade(1)

The document-class used to look for the style-sheets, and the output format, are for now derived only from the name with which the program is called, so you will want to call this program through symbolic links like docbook-2-pdf.

sgml2x is implemented as a shell wrapper around jade(1) (or, preferably, openjade(1), although we use the generic name jade throughout this documentation), jadetex(1) and other tools.
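
The idea of deriving the document class and output format from the invoked name can be sketched as follows. This is only an illustration in Python, not sgml2x's own shell code; the function name is hypothetical and the `<class>-2-<format>` pattern is assumed from the docbook-2-pdf example above.

```python
import os


def parse_invocation(program_name):
    """Split a name like 'docbook-2-pdf' into (document class, output format).

    The '<class>-2-<format>' pattern follows the docbook-2-pdf example;
    this is an illustration, not sgml2x's actual code.
    """
    base = os.path.basename(program_name)
    doc_class, sep, out_format = base.partition("-2-")
    if not sep or not doc_class or not out_format:
        raise ValueError("expected <class>-2-<format>, got %r" % base)
    return doc_class, out_format
```

A wrapper built this way would typically pass sys.argv[0] to the function, so that each symbolic link selects its own style-sheet set and backend.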

68.

Unicode character normalization tools

Simon St.Laurent

I'm looking around for Unicode character normalization tools, preferably with a command-line interface.

So far I have: Normalization Demo unicode.org

Charlint (Perl, UTF-8 only) Charlint

Normalizer (part of ICU, don't see a command line) oss.software.ibm

Any others?

This seems to be the primary description of what's involved: unicode.org
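
For what it is worth, Python's standard unicodedata module also exposes the normalization forms, so a small filter is easy to sketch. The function names below are made up for this example, and the choice of NFC as the default is an assumption, not a recommendation from the thread.

```python
import unicodedata


def normalize_text(text, form="NFC"):
    """Return text converted to the given Unicode normalization form.

    form is one of "NFC", "NFD", "NFKC" or "NFKD".
    """
    return unicodedata.normalize(form, text)


def normalize_stream(infile, outfile, form="NFC"):
    """Copy a text stream to another, normalizing it line by line.

    Line-by-line processing assumes no combining sequence spans a line
    break, which holds for ordinary text files.
    """
    for line in infile:
        outfile.write(normalize_text(line, form))
```

Calling normalize_stream(sys.stdin, sys.stdout) from a short script gives a command-line filter, provided both streams are opened as UTF-8.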

69.

Alternatives tools for print output

Steinar Bang


> If FOP is still too immature, can anyone recommend another
> Docbook->something->PDF generator that is redistributable?  (We're
> hoping to avoid TeX and C++.)

Off the top of my head, I know of the following ways to transform DocBook XML into PDF, with open source/free/semi-free software:

Transform into XSL:FO, and then use one of the following to create PDF from the XSL:FO

FOP http://xml.apache.org/fop/

PassiveTeX (TeX based) http://www.tei-c.org.uk/Software/passivetex/

xmlroff (written in C. Based on Gnome libraries) http://xmlroff.sourceforge.net/

Use the DocBook SGML DSSSL style sheets, and jade/openjade, followed by TeX

Transform into RTF, by one of the following, and then use AbiWord or OpenOffice to create the PDF:

JFOR (Java) http://www.jfor.org/

XmlMind FO Converter (Java, free for personal use) http://www.xmlmind.com/foconverter/

DocBook SGML DSSSL style sheets and jade/openjade

Use DocBookInConTeXt (TeX. Takes DocBook XML source as input) http://www.hobby.nl/~scaprea/context/

Use the DB2LaTeX XSL stylesheets, and process with LaTeX http://db2latex.sourceforge.net/

Use doctransformer (written in Java) to transform DocBook XML into LaTeX or lout for processing into PDF http://doctransformer.sourceforge.net/

Did I miss any?

70.

Apple keynote presentations from docbook

Steve Ball

An XSL stylesheet to produce an Apple Keynote presentation from a DocBook slide document is now available from Zveno; go to: zveno.com to download the stylesheet and read instructions on how to generate a slide presentation.

71.

docbook editing on emacs

Mike Smith

I've put together a sort of convenience menu intended for people who work with DocBook documents and the DocBook XSLT stylesheets from within GNU Emacs 20.7 or later.

It's available from the DocBook Open Repository download page: http://sourceforge.net/project/showfiles.php?group_id=21935

What it does, basically, is add a hierarchical, customizable "ready reference" sort of menu designed to provide quick access to a variety of DocBook-related documentation directly from within Emacs, and to files in the DocBook XSLT stylesheets distribution.

There are some screenshots of it at: logopoeia.com

By default, it provides the following submenus:


  DocBook: The Definitive Guide
  DocBook: The Definitive Guide (HTML Help)
  DocBook: Element Reference (Alphabetical)
  DocBook: Element Reference (Logical)
  ----
  DocBook XSL: The Complete Guide
  DocBook XSL: Parameter Reference - FO
  DocBook XSL: Parameter Reference - HTML
  ----
  DocBook XSL: Stylesheet Distribution
  DocBook XSL: Stylesheet Documentation
  ----
  DocBook FAQ
  DocBook Wiki
  DocBook Mailing List Search Form
  ----
  Customize DocBook Menu

Please give it a try and let me know if you run into any problems with it or have suggestions for improvement.

72.

Docbook and Word, again

Christian Roth



>A direct XSL transformation from Docbook to WordML must (in theory) be a
>more reliable process that relying on XML-->FO-->RTF-->.doc given that
>(in my experience) the RTF conversion doesn't support all features of RTF
>and RTF in itself doesn't support all the features you need to build a
>decent, structured Word Doc.

RTF includes essentially all the structuring possibilities you also have in a .doc file, though honestly, I don't know which additional structuring layers Word 2003 or WordML introduce. What RTF is missing is some meta-info, password protection and macros.

That said, XML-->FO-->RTF-->.doc is a problematic workflow since semantic information (essentially style names in Word) is lost in the XML->FO step. This is why we have created a DTD for our products (the upCast DTD) that is similar to WordML, in that it expresses the structural capabilities of the Word application and of RTF, but in a very concise, non-verbose way: on our website

It does not include layout properties as elements, as these are attached to the basic, structuring elements using plain CSS if desired.

Our product downCast is an upCast-DTD+CSS to RTF converter. The advantage in an

  arbitrary XML-->upCast-DTD (or similar)-->RTF 

workflow (as opposed to FO as intermediate format) is that:

(1) style names (and therefore, semantic information) are preserved in the resulting Word document, which makes converting it back to some reasonable XML - once edited - a lot easier,

(2) it still suffices to write the RTF generator only for one DTD

(3) you don't need to change the XML-->upCast DTD conversion step (i.e. in general the XSLT processing sheet) to change basic layout, since that comes from an external CSS stylesheet most users are familiar with and confident in editing.

Another advantage is that with RTF, in contrast to WordML, you are targeting not only Word 2003, but also legacy applications like Word 97, Word 2000, and many custom components using the reasonably documented RTF format as their storage/interchange format (like Mac OS X's TextEdit and the underlying engine).

73.

Permuted Index creation

John Shipman

I have a working permuted index, but it's not related to DocBook. Documentation is at: nmt.edu Follow the link for `StylIndex' near the bottom of the page. An example of an index built with this system is at: nmt.edu Click on the `Index' link on the top line of the page.

I have attached Python source code for the module that does permuted indexing. Feel free to borrow or adapt it.
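
Before the full module, here is a much smaller, self-contained Python 3 sketch of the unpermuted form only, to show the core idea. It is not part of the attached module: it uses a regular expression where the module uses a Cset object, and the function and parameter names are invented for this example.

```python
import re


def kwic_entries(text, exclusions=(), key_pattern=r"[A-Za-z0-9_]+"):
    """Return (prefix, keyword, suffix) triples for each keyword in text.

    Words in exclusions are skipped, ignoring case. Entries are sorted by
    keyword, then prefix, then suffix, all compared case-insensitively.
    """
    excluded = {word.upper() for word in exclusions}
    entries = []
    for match in re.finditer(key_pattern, text):
        keyword = match.group()
        if keyword.upper() in excluded:
            continue
        # Everything before the keyword is the prefix, everything after
        # it is the suffix.
        entries.append((text[:match.start()], keyword, text[match.end():]))
    entries.sort(key=lambda e: (e[1].upper(), e[0].upper(), e[2].upper()))
    return entries
```

With the title `Driving Miss Daisy` and no exclusions, this yields one entry each for Daisy, Driving and Miss, in that order, matching the three entries described in the module's documentation string.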



================================================================
"""kwic.py:  Module for Key Word In Context (KWIC) indices.
    $Revision: 1.4 $  $Date: 2003/08/06 20:28:39 $

    KWIC indexing is a technique for finding occurrences of keywords
    in a set of strings.  It is a venerable technique dating back to
    the 1960s.  Definitions:

        KEYWORD:  A contiguous string of keyword characters that
            is not in the exclusion list.  Typically, keyword
            characters include letters, digits, and possibly
            pseudo-letters such as "_" and "-".

        EXCLUSION LIST:  A set of words that should be omitted
            from the index, such as `a', `and', and `the'.

        EXCLUSION FILE:  The file containing the exclusion list.
            The format is free: each contiguous clump of
            keyword characters is considered an excluded word.

    The retrieved strings may be presented in either of two forms:

      - UNPERMUTED FORM:  The string is divided into the PREFIX
        (everything up to the keyword), the keyword, and the
        SUFFIX (everything after the keyword).  Suppose the string
        is `Driving Miss Daisy'. Assuming none of those words are
        in the exclusion list, this string would appear in three
        index entries:
            [ "Driving Miss ", "Daisy", "" ]    # Keyword "Daisy"
            [ "", "Driving", " Miss Daisy" ]    # Keyword "Driving"
            [ "Driving ", "Miss", " Daisy" ]    # Keyword "Miss"

      - PERMUTED FORM:  All strings are padded to the maximum length
        of any string in the index, plus a GUTTER consisting of
        a few spaces, and then rotated so that the keyword always
        starts at the same point, the PERMUTE POINT.

        Here is an example of how the above three index entries
        might appear in a permuted form if the longest string in
        the index, plus the gutter, is 40 characters, and the
        character "=" is inserted before the beginning of each
        string to show its original beginning.  Also suppose
        that the `permute point' is 0.25, meaning that the
        keywords line up on column 10.  Further suppose that
        the string `Harold and Maude' has also been indexed,
        and the word `and' is in the exclusion list.  Here is
        how this index would look in permuted form:

            |          +-- Permute point              |
            |          v                              |
            |0         1         2         3         4|
            |01234567890123456789012345678901234567890|
            |-----------------------------------------|
            |ving Miss Daisy                      =Dri|
            |         =Driving Miss Daisy             |
            |         =Harold and Maude               |
            |arold and Maude                        =H|
            | =Driving Miss Daisy                     |

  Exports:
    class KwicIndex:  Represents a set of strings.
        KwicIndex(keyCset=None, exclusionFileName=None, permutePoint=None,
                  gutterSize=None, startMark=None, endMark=None,
                  keyPrefix=None, keySuffix=None):
          [ (keyCset is a Cset defining the keyword characters, defaulting
              to DEFAULT_KEY_CSET) and
            (exclusionFileName is the name of an exclusion file, defaulting
              to no exclusions) and
            (permutePoint is the permute point in [0.0,1.0), defaulting
              to DEFAULT_PERMUTE_POINT) and
            (gutterSize is the gutter size as a positive integer,
              defaulting to DEFAULT_GUTTER_SIZE) and
            (startMark is a string to be inserted before the index entry,
              defaulting to DEFAULT_START_MARK) and
            (endMark is a string to be inserted after the index entry,
              defaulting to "") and
            (keyPrefix is a string to be inserted before the keyword,
              defaulting to "") and
            (keySuffix is a string to be inserted after the keyword,
              defaulting to "") ->
                if exclusionFileName is not None and does not name a
                readable file ->
                  raise IOError
                else ->
                  return a new, empty KwicIndex object with those values ]
        .keyCset:           [ as passed to constructor, read-only ]
        .exclusionFileName: [ as passed to constructor, R-O ]
        .permutePoint:      [ as passed to constructor with defaulting, R-O ]
        .gutterSize:        [ as passed to constructor with defaulting, R-O ]
        .startMark:         [ as passed to constructor with defaulting, R-O ]
        .endMark:           [ as passed to constructor with defaulting, R-O ]
        .keyPrefix:         [ as passed to constructor with defaulting, R-O ]
        .keySuffix:         [ as passed to constructor with defaulting, R-O ]
        .add ( s, value=None ):
          [ if s is a nonempty string ->
              self  :=  self with s added to its set of strings and
                value associated with it ]
        .genRefs ( startKey=None, stopKey=None ):
          [ (startKey is a string or None) and
            (stopKey is a string or None) ->
              generate a sequence of KwicRef objects in ascending order
              by (keyword+prefix+suffix), ignoring case; if startKey
              is provided, all keywords < startKey are omitted; if
              stopKey is provided, all keywords >= stopKey are omitted ]
        .maxLength():
          [ returns the length of the longest string in self ]
        .permute(ref):
          [ ref is a KwicRef ->
              return ref permuted according to self's parameters ]
        .permuteLink(ref, linkifier):
          [ (ref is a KwicRef) and
            (linkifier is a procedure with calling sequence
                linkifier(ref, linkText)
              and intended function
                [ (ref is a KwicRef) and (linkText is a string) ->
                    return a string containing an HTML hyperlink
                    whose link text is (linkText) and whose href
                    attribute is derived from ref ]
              then ->
                return a string consisting of the ref, permuted
                according to self's parameters, and with 
                (self.keyPrefix+keyword+self.keySuffix) turned into
                a link by linkifier (or as much of that string as
                fits between the permute point and the end of the
                permuted line) ]
    class KwicRef:      Represents an occurrence of a keyword in a string.
        KwicRef(s, startPos, endPos, value):
          [ (s is a nonempty string) and
            (startPos and endPos define the position of the keyword
              as a nonempty slice of s in the usual Python convention
              of s[startPos:endPos]) and
            (value is any object) ->
              return a new KwicRef object with those values ]
        .s:             [ as passed to constructor, read-only ]
        .startPos:      [ as passed to constructor, read-only ]
        .endPos:        [ as passed to constructor, read-only ]
        .value:         [ as passed to constructor, read-only ]
        .show():
          [ return a triple (prefix, keyword, suffix) where each
            element is a string, prefix may be empty, and suffix may
            be empty ]
        .prefix():      [ return the prefix from self ]
        .keyword():     [ return the keyword from self ]
        .suffix():      [ return the suffix from self ]
        .__str__(self): [ return self.s ]
        .__cmp__(self,other):
          [ other is a KwicRef ->
              if self.value is lexically before other.value ->
                return -1
              else if self.value is lexically after other.value ->
                return 1
              else -> return 0 ]
    class ExclusionList:  Represents the exclusion file.
        ExclusionList ( exclusionFileName=None, keyCset=None ):
          [ ( exclusionFileName names the exclusion file, or None
              to start with an empty exclusion set ) and
            ( keyCset is a Cset enumerating the keyword characters,
              defaulting to DEFAULT_KEY_CSET ) ->
                if  exclusionFileName names a readable file ->
                  return a new ExclusionList object containing all
                  the unique keywords in that file
                else -> raise IOError ]
        .__contains__(self, x):     # `x in self' Python operator
          [ x is a string ->
              if self contains x, case-insensitive ->
                return 1
              else -> return 0 ]
"""

#================================================================
# IMPORTS
#----------------------------------------------------------------

from __future__ import generators       # Allow 2.2 generators
import string                           # Standard Python string library

#--
# Library routines from /u/john/tcc/python/lib 
#--
    
from set import *           # String set object
from cset import *          # Character set object
from log import *           # Error logging object
from scan import *          # Stream scanning object
from skiplist import *      # SkipList ordered container class


# - - -   f i n d K e y w o r d s   - - -

def findKeywords ( s, cset ):
    """Generates slice indices of all contiguous strings in cset.

      [ (s is a string) and
        (cset is a Cset object defining all the keyword characters) ->
          generate (startx, endx) tuples such that each slice
          s[startx:endx] is a contiguous clump of keyword characters in s ]
    """

    #-- 1 --
    startx  =  None

    #-- 2 --
    # [ startx  :=  index of the last keyword character not
    #               preceded by a keyword character
    #   generate (startx, endx) slice indices defining all keywords
    #   that have non-keyword characters following ]
    for  x in range ( len ( s ) ):
        #-- 2 body --
        # [ if (startx is None) and (s[x] is a keyword character) ->
        #     startx  :=  x
        #   if (startx is not None) and (s[x] is a keyword character) ->
        #     I
        #   if (startx is None) and (s[x] is not a keyword character) ->
        #     I
        #   if (startx is not None) and (s[x] is not a keyword character) ->
        #     yield (startx,x)
        #     startx  :=  None ]
        if  cset.has(s[x]):     # s[x] is a keyword character
            if  startx is None:
                startx  =  x
        else:                   # s[x] is not a keyword character
            if  startx is not None:
                yield (startx, x)
                startx  =  None

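    #-- 3 --
    # [ if a keyword runs to the end of s -> yield its slice ]
    # Test against None explicitly: startx may legitimately be 0.
    if  startx is not None:
        yield ( startx, len(s) )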


# - - -   d e f a u l t e r   - - -

def defaulter ( value, defaultValue ):
    """Implements the "?:" operator

      [ if value is None ->
          return defaultValue
        else -> return value ]
    """
    if value is None:
        return defaultValue
    else:
        return value


# - - - - -   c l a s s   K w i c I n d e x   - - - - -

class KwicIndex:
    """Object to represent a complete index.

      State/Invariants:
        .__breakPoint:  [ 1.0 - self.permutePoint ]
        .__exclusionList:
          [ an ExclusionList object containing the set
            of words from self.exclusionFile, if any ]
        .__keyList:
          [ a SkipList object containing all references as KwicRef objects ]
        .__maxLength:
          [ if self.__keyList is empty -> 0
            else -> length of the longest string in self.__keyList ]
    """
    DEFAULT_KEY_CSET        =  Cset ( string.letters + string.digits + "_" )
    DEFAULT_PERMUTE_POINT   =  0.3      # Default for permutePoint
    DEFAULT_GUTTER_SIZE     =  2        # Default for gutterSize
    DEFAULT_START_MARK      =  "="      # Default for startMark


# - - -   K w i c I n d e x . _ _ i n i t _ _   - - -

    def __init__ ( self, keyCset=None, exclusionFileName=None,
                   permutePoint=None, gutterSize=None,
                   startMark=None, endMark=None,
                   keyPrefix=None, keySuffix=None):
        "Constructor for KwicIndex"

        #-- 1 --
        # [ self  :=  self with all parameter invariants in place ]
        self.keyCset            =  defaulter ( keyCset,
                                               self.DEFAULT_KEY_CSET )
        self.exclusionFileName  =  exclusionFileName
        self.permutePoint       =  defaulter ( permutePoint,
                                               self.DEFAULT_PERMUTE_POINT )
        self.gutterSize         =  defaulter ( gutterSize,
                                               self.DEFAULT_GUTTER_SIZE )
        self.startMark          =  defaulter ( startMark,
                                               self.DEFAULT_START_MARK )
        self.endMark            =  defaulter ( endMark,   "" )
        self.keyPrefix          =  defaulter ( keyPrefix, "" )
        self.keySuffix          =  defaulter ( keySuffix, "" )

        #-- 2 --
        # [ self.__breakPoint    :=  as invariant
        #   self.__keyList       :=  an empty SkipList object that allows
        #                            duplicates
        #   self.__maxLength     :=  0 ]
        self.__breakPoint   =  1.0 - self.permutePoint
        self.__keyList      =  SkipList ( allowDups=1 )
        self.__maxLength    =  0

        #-- 3 --
        # [ if self.exclusionFileName is None ->
        #     self.__exclusionList  :=  a new, empty ExclusionList
        #   else if self.exclusionFileName names a readable file ->
        #     self.__exclusionList  :=  a new ExclusionList object
        #         containing all keywords in that file
        #   else ->
        #     raise IOError ]
        self.__exclusionList  =  ExclusionList ( exclusionFileName,
            self.keyCset )


# - - -   K w i c I n d e x . a d d   - - -

    def add ( self, s, value=None ):
        "Add string s to self"

        #-- 1 --
        # [ self.__keyList  +:=  KwicRef objects for all occurrences
        #     of keywords in s ]

        #-- 1 head --
        # [ for every keyword character that is not preceded by a
        #   keyword character ->
        #     startx  :=  index of that keyword character
        #     endx    :=  index of the first character after startx
        #                 that is not a keyword character
        #     <loop body> ]
        for  startx, endx in findKeywords(s, self.keyCset):
            #-- 1 body --
            # [ self.__keyList  +:=  a KwicRef object for string s
            #       and slice [startx:endx] for value=value ]
            self.__addWord ( s, startx, endx, value )

        #-- 2 --
        # [ self.__maxLength  :=  as invariant ]
        self.__maxLength  =  max ( self.__maxLength, len(s) )



# - - -   K w i c I n d e x . _ _ a d d W o r d   - - -

    def __addWord ( self, s, startx, endx, value ):
        """Add one entry in self, for one occurrence of a keyword.

          [ (s is a string) and
            (0 <= startx < endx <= len(s)) ->
              if  uppercased s[startx:endx] is in self.__exclusionList ->
                I
              else ->
                self.__keyList  +:=  a KwicRef object for string s
                                     and slice [startx:endx] and
                                     value=value ]
        """

        #-- 1 --
        # [ keyword  :=  uppercased s[startx:endx] ]
        keyword  =  s[startx:endx].upper()

        #-- 2 --
        # [ if keyword is in self.__exclusionList ->
        #     return
        #   else -> I ]
        if  keyword in self.__exclusionList:
            return

        #-- 3 --
        # [ self.__keyList  +:=  a KwicRef object for string s and
        #                        slice [startx:endx] ]
        ref  =  KwicRef ( s, startx, endx, value )
        self.__keyList.insert ( ref )


# - - -   K w i c I n d e x . g e n R e f s   - - -

    def genRefs ( self, startKey=None, stopKey=None ):
        "Generate all the references in self, in index order."

        #-- 1 --
        # [ if self.__keyList is empty ->
        #     ref  :=  None
        #   else ->
        #     ref  :=  first element of self.__keyList ]
        ref  =  self.__keyList.first()

        #-- 2 --
        # [ ref             :=  None
        #   self.__keyList  :=  self.__keyList advanced to end
        #   yield all elements of (ref + self.__keyList) that
        #   are in the range [startKey,stopKey), case-insensitive, in order ]
        while  ref is not None:
            #-- 2 loop --
            # [ if ref is in the range [startKey,stopKey) ->
            #     yield ref
            #   else -> I
            #   In any case ->
            #     ref, self.__keyList  :=  first(self.__keyList),
            #                             last(self.__keyList) ]

            #-- 2.1 --
            # [ if ref is in the range [startKey,stopKey) ->
            #     yield ref
            #   else -> I ]
            key  =  ref.keyword().upper()
            if ( ( ( startKey is None ) or ( startKey <= key ) ) and
                 ( ( stopKey is None ) or ( key < stopKey ) ) ):
                yield ref

            #-- 2.2 --
            # [ if  self.__keyList is at end ->
            #     ref  :=  None
            #   else ->
            #     ref  :=  next element of self.__keyList
            #     self.__keyList  :=  self.__keyList advanced one ]
            ref  =  self.__keyList.next()


# - - -   K w i c I n d e x . m a x L e n g t h   - - -

    def maxLength ( self ):
        "Return the maximum length of self's strings."
        return self.__maxLength


# - - -   K w i c I n d e x . p e r m u t e   - - -

    def permute ( self, ref ):
        "Produce the permuted form of a reference."

        #-- 1 --
        # [ padLen  :=  self.__maxLength + self.gutterSize -
        #       (length of ref.s) ]
        padLen  =  ( self.__maxLength + self.gutterSize -
                     len ( ref.s ) )

        #-- 2 --
        # [ text  :=  self.keyPrefix + ref's keyword + self.keySuffix +
        #             ref's suffix + self.endMark + (padLen spaces) +
        #             self.startMark + ref's prefix ]
        text  =  "".join ( [ self.keyPrefix, ref.keyword(), self.keySuffix, 
                             ref.suffix(), self.endMark, " "*padLen,
                             self.startMark, ref.prefix() ] )

        #-- 3 --
        # [ return text, broken into two pieces at a position dictated
        #       by self.__breakPoint, and those pieces swapped ]
        breakPos  =  1 + int ( self.__breakPoint * len(text) )
        return text[breakPos:] + text[:breakPos]


# - - -   K w i c I n d e x . p e r m u t e L i n k   - - -

    def permuteLink ( self, ref, linkifier ):
        "Produce the permuted form of a reference, with its keyword hyperlinked."

        #-- 1 --
        # [ padLen  :=  self.__maxLength + self.gutterSize -
        #               (length of ref.s) ]
        padLen  =  self.__maxLength + self.gutterSize - len(ref.s)

        #-- 2 --
        # [ head  :=  ref's keyword decorated with keyword prefix & suffix
        #   tail  :=  ref's suffix followed by ref's prefix, decorated
        #             with end and start marks, with padLen spaces
        #             inserted at the wraparound point ]
        head  =  "".join ( [ self.keyPrefix, ref.keyword(), self.keySuffix ] )
        tail  =  "".join ( [ ref.suffix(), self.endMark, " "*padLen,
                             self.startMark, ref.prefix() ] )

        #-- 3 --
        # [ breakPos  :=  the position where the breakpoint would fall ]
        breakPos  =  1 + int ( self.__breakPoint *
                               ( len(head) + len(tail) ) )

        #-- 4 --
        # [ if breakPos > (size of head) ->
        #     I
        #   else ->
        #     head  :=  head with characters past breakPos removed
        #     tail  :=  (characters from head past breakPos) + tail ]
        #--
        # Note: This step handles the case where the keyword is
        # too long to fit between the permute point and the end of the
        # permuted line.  When this happens, we move characters from
        # the head to the tail so that linkifier() only wraps a link
        # around the part that will fit.  Without this precaution,
        # we run the risk of placing the closing </a> tag before its
        # corresponding <a href=...> tag.
        #--
        if  breakPos <= len(head):
            tail  =  head[breakPos:] + tail
            head  =  head[:breakPos]

        #-- 5 --
        # [ return (characters from tail past breakPos) +
        #       (head with a link wrapped around it) +
        #       (characters from tail up to breakPos) ]
        return "".join ( [ tail[breakPos:],
                           linkifier ( ref, head ),
                           tail[:breakPos] ] )



# - - - - -   c l a s s   K w i c R e f   - - - - -

class KwicRef:
    "Represents one occurrence of a keyword in its context."


# - - -   K w i c R e f . _ _ i n i t _ _   - - -

    def __init__ ( self, s, startPos, endPos, value ):
        "Constructor for KwicRef"

        self.s         =  s
        self.startPos  =  startPos
        self.endPos    =  endPos
        self.value     =  value


# - - -   K w i c R e f . s h o w   - - -

    def show ( self ):
        "Return a (prefix, keyword, suffix) triple"
        return ( self.prefix(), self.keyword(), self.suffix() )


# - - -   K w i c R e f . p r e f i x   - - -

    def prefix ( self ):
        "Return self's prefix."
        return self.s [ : self.startPos ]


# - - -   K w i c R e f . k e y w o r d   - - -

    def keyword ( self ):
        "Return self's keyword."
        return self.s [ self.startPos : self.endPos ]


# - - -   K w i c R e f . s u f f i x   - - -

    def suffix ( self ):
        "Return self's suffix."
        return self.s [ self.endPos : ]


# - - -   K w i c R e f . _ _ s t r _ _   - - -

    def __str__ ( self ):
        "Return self's entire context string."
        return self.s


# - - -   K w i c R e f . _ _ c m p _ _   - - -

    def __cmp__ ( self, other ):
        """Compare two KwicRef objects lexically.

          The effective key value of an entry is the keyword,
          followed by one space, followed by the suffix and then
          the prefix as tiebreakers.  The space causes all the
          lines with the same keyword to group together; this
          was added pursuant to a bug found 1996-10-13 in the
          Icon version.

          Example: the left-hand column shows the (keyword, suffix)
          and the right-hand column shows how they'd sort without
          the extra space:
                ("icon", "setting",...)     ICONSETTING...
                ("icons", "ileaf",...)      ICONSILEAF...
                ("icon", "text",...)        ICONTEXT...
          but that second line should be with "icons", not with "icon".
        """

        #-- 1 --
        # [ keyA  :=  self's keyword + " " + self's suffix + self's prefix,
        #             upshifted
        #   keyB  :=  other's keyword + " " + other's suffix + other's
        #             prefix, upshifted ]
        format  =  "%s %s%s"
        keyA  =  format % (self.keyword(), self.suffix(), self.prefix())
        keyB  =  format % (other.keyword(), other.suffix(), other.prefix())

        #-- 2 --
        return  cmp ( keyA.upper(), keyB.upper() )



# - - - - -   c l a s s   E x c l u s i o n L i s t   - - - - -

class ExclusionList:
    """Represents the exclusion file, listing keywords not to be indexed.

      State/Invariants:
        self.__set:     [ a Set object containing self's keywords ]
    """

# - - -   E x c l u s i o n L i s t . _ _ i n i t _ _   - - -

    def __init__ ( self, exclusionFileName=None, keyCset=None ):
        "Constructor for ExclusionList.  Reads the exclusion file."

        #-- 1 --
        # [ self.__set      :=  a new, empty Set object ]
        self.__set      =  Set()

        #-- 2 --
        if  exclusionFileName is None:
            return

        #-- 3 --
        # [ if exclusionFileName names a readable file ->
        #     exclusionFile  :=  that file
        #   else -> raise IOError ]
        exclusionFile  =  open ( exclusionFileName )

        #-- 4 --
        # [ self.__set  +:=  clumps of characters in keyCset
        #       found in exclusionFile, upshifted ]
        for  line in exclusionFile:
            #-- 4 body --
            # [ self.__set  +:=  clumps of characters in keyCset
            #       found in line ]
            self.__readLine ( line, keyCset )

        #-- 5 --
        exclusionFile.close()


# - - -   E x c l u s i o n L i s t . _ _ r e a d L i n e   - - -


    def __readLine ( self, line, keyCset ):
        """Extract the words from one line of the exclusion file.

          [ line is a string ->
              self.__set  +:=  elements from keywords found in line,
                               upshifted ]
        """

        #-- 1 iteration --
        # [ startx,endx  :=  the starting and ending slice indices of
        #       each clump of characters in keyCset found in line,
        #       upshifted, in turn ]
        for  startx, endx in findKeywords ( line, keyCset ):
            #-- 1 body --
            # [ self.__set  +:=  line[startx:endx], upshifted ]
            self.__set.add ( line[startx:endx].upper() )


# - - -   E x c l u s i o n L i s t . _ _ c o n t a i n s _ _   - - -

    def __contains__ ( self, x ):
        "Does self contain x, case-insensitive?"
        return  x.upper() in self.__set
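
The grouping rule described in the KwicRef.__cmp__ docstring above can be tried out in isolation. The following is a minimal sketch, not part of the original program: the helper name sort_key and the sample entries are assumptions made for illustration. It uses a key function rather than __cmp__, since Python 3 no longer supports __cmp__ or cmp().

```python
def sort_key(keyword, suffix, prefix, separator=" "):
    """Build the upshifted comparison key: keyword, separator, suffix, prefix."""
    return (keyword + separator + suffix + prefix).upper()

# (keyword, suffix, prefix) triples taken from the docstring example.
entries = [
    ("icon", "setting", ""),
    ("icons", "ileaf", ""),
    ("icon", "text", ""),
]

# With the separating space, both "icon" lines sort ahead of "icons".
with_space = sorted(entries, key=lambda e: sort_key(*e))

# Without it, "icons" lands between the two "icon" lines.
without_space = sorted(entries, key=lambda e: sort_key(*e, separator=""))

print([e[0] for e in with_space])     # ['icon', 'icon', 'icons']
print([e[0] for e in without_space])  # ['icon', 'icons', 'icon']
```

The space works because it sorts before every letter, so a shorter keyword always precedes any longer keyword that starts with it.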

74.

DTD flatten

Mauritz Jeanson, Michael Smith


> Is there a simple way to produce a flattened DTD from a modular one
> using commonly available tools?

You can use the dtdflatten script in the DTDParse package (sourceforge).

Along with the ones already mentioned, there's one that Scott Hudson posted a note about to the list. It's a Java app, available from woodstox.codehaus.org. It seems to work well and provides some useful options:

  $ java -jar  dtd-flatten.jar
  Usage: class com.ctc.wstx.tools.DTDFlatten[flags] [DTD file]
   flags:
     --output-comments (default)
     --strip-comments
     --output-conditional-sections
     --strip-conditional-sections (default)
     --output-pe-decls
     --strip-pe-decls (default)
     --output-whitespace:<mode> (mode: all/compact/minimum; default 'compact'
     --help [displays full help]

It can also handle entities whose system IDs are remote URIs (though I think it just fetches those from the net rather than attempting catalog resolution).
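
For scripted use, the flags from the usage text can be assembled programmatically. This is only a sketch under stated assumptions: the function names, the jar location, and the file names are hypothetical, and it assumes the tool writes the flattened DTD to standard output, which the usage text does not confirm.

```python
import subprocess

WHITESPACE_MODES = ("all", "compact", "minimum")

def dtd_flatten_command(dtd_path, jar="dtd-flatten.jar",
                        strip_comments=False, whitespace="compact"):
    """Build the argument list for one run, using only flags
    that appear in the usage text."""
    if whitespace not in WHITESPACE_MODES:
        raise ValueError("unknown whitespace mode: %r" % whitespace)
    command = ["java", "-jar", jar]
    command.append("--strip-comments" if strip_comments
                   else "--output-comments")
    command.append("--output-whitespace:" + whitespace)
    command.append(dtd_path)
    return command

def flatten(dtd_path, out_path, **options):
    """Run the tool and save whatever it prints.

    Assumption: the flattened DTD is written to standard output.
    """
    with open(out_path, "wb") as out:
        subprocess.check_call(dtd_flatten_command(dtd_path, **options),
                              stdout=out)

# Hypothetical input file name, for illustration only.
print(" ".join(dtd_flatten_command("docbookx.dtd", strip_comments=True)))
```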

75.

Dopus

Torsten Uhlmann

I'd like to introduce our new Docbook framework to you: Dopus. Dopus is a Java based framework which combines the various available Docbook tools into one easy to use toolchain.

Features:

- Java based; can be used on Windows and Unix flavors.
- Supports catalog resolution of XML system IDs.
- Supports XInclude for modularized documents.
- Contains a flexible customization layer that can be adapted globally or per document.
- And more.

The Dopus framework (Ant scripts and the customization layer) is distributed under the GNU GPL. You can download Dopus in the freeware section at cms.agynamix.de

Dopus is built upon Java and Apache Ant and uses freely available components such as Apache FOP, Saxon and Apache Xerces. The components are tied together by Apache Ant and a generator.[bat|sh] script, which makes generating output a snap.

Dopus can be run on Windows (the primary testing environment) or on Unix systems that support Java. The download archive includes a Windows JRE to make installation as painless as possible. Just copy it to a directory and you're set!

Dopus supports the following features (amongst others):

- Resolving of XIncluded documents.
- Catalog resolving: the SYSTEM ID in an XML file can point to the URI, yet it is resolved against the locally stored DocBook DTD.
- Very flexible customization layer: can customize globally (all documents) or on a per-document basis.
- Flexible build mechanism: can plug in your own Ant tasks that do some work on a per-document basis.
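
To illustrate what XInclude resolution does to a modular document, here is a small sketch using Python's standard library. It is not how Dopus does it (Dopus relies on its Java components); the file names, the in-memory FILES table, and the loader function are assumptions made so the example is self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical modular document: a book that pulls in one chapter.
FILES = {
    "book.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Manual</title>"
        '<xi:include href="chapter1.xml"/>'
        "</book>"
    ),
    "chapter1.xml": "<chapter><title>Introduction</title></chapter>",
}

def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude that reads from FILES instead of disk."""
    text = FILES[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text

book = ET.fromstring(FILES["book.xml"])

# Replace every xi:include element with the content it points at.
ElementInclude.include(book, loader=load_from_memory)

titles = [t.text for t in book.iter("title")]
print(titles)  # ['Manual', 'Introduction']
```

After the call, the tree holds the chapter itself in place of the xi:include element, which is the form a stylesheet or validator would then work on.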

It can create the following document types:

- html: create chunked HTML.
- singlehtml: create a single-page HTML file.
- htmlhelp: create a Windows Help (CHM) file.
- eclipse: create an Eclipse Help plugin (together with toc.xml and plugin.xml).
- javahelp: create a Jar file containing the help, suitable for viewing with JavaHelp.
- pdf: create a PDF document.
- validate: validate the input DocBook file.
- distribute: export the DocBook input files into a ZIP archive.

76.

Multi-language documentation solution. docbook.sml

Joerg Moebius

I have posted my multi-language documentation solution on SourceForge.

What is DocBook.sml?

sml stands for single-source multi-language. DocBook.sml provides an easy and secure way to maintain multilingual documentation. The core idea is to keep all documentation content that has the same meaning but is written in different languages together in one document, and to derive all desired documentation artefacts from that 'documentation repository'.

Features

 
- Consistent maintenance of multilingual documentation
- Unified documentation structures for all projects
- Customizable input and output templates and documentation procedures
- CAT (computer-aided translation) support
- Runs as part of an IDE and/or standalone
- Concentrates all necessary resources at one location
- Decouples the functional resources from subsystem versions
- Comprehensive example for the usage of DocBook XSL
- Free of charge

77.

Literate programming and docbook

Dave Pawson

I've used xweb, from Norm Walsh, in the past and was quite impressed by it. It's an example of literate programming. The result is linked from here as a separate file, since it's written using xweb! Comments appreciated, perhaps via the docbook-apps mailing list.

78.

Building collaborative docbook documents

Camille Bégnis


In short, Calenco lets you:
- Store modular DocBook 5 files (book, chapter, section, etc.) using XInclude, and images, in a central repository
- Store translations in language-specific directories
- Let authorized people browse the repository through a Web browser and upload or update files
- Store XSL customization layers
- Run a compilation (PDF, HTML, chunked HTML; for now) using standard or customized XSL, all through the Web interface


You will find more information at: calenco.com Calenco is Free Software released under the AGPL license.

If you want to test it live, connect to calenco.com with login guest@calenco.com and password invite.

79.

How to use Ant with Docbook

Benjamin de Dardel

A project to process XML DocBook to HTML and PDF using Ant scripts. See SourceForge.