1. | Which stylesheets do what? | ||||||||||||||||
There are now only two stylesheets for HTML: docbook.xsl and chunk.xsl. Autoindexing is now an integral part of both of them, so there is no need to use a separate autoidx.xsl as in older releases. onechunk.xsl is an anomaly that was added at one user's request, and it looks like this stylesheet is broken now.
These files are the main entry points of the DocBook XSL stylesheets. A normal user doesn't need to call any other stylesheet directly. I hope that I have listed all the top-level stylesheets. | |||||||||||||||||
2. | How to use a CSS stylesheet with docbook | ||||||||||||||||
There is no need to edit the output files. You can create a separate CSS stylesheet file that contains all the style information, and then associate that stylesheet with all of your HTML files. If you look at the HTML output from DocBook, you'll see a lot of <div class="element"> tags, where "element" is the DocBook element that produced that <div>. You write your CSS stylesheet to associate CSS styles with those div classes (see a good CSS reference to learn how to do that). To connect it with your files, set the 'html.stylesheet' parameter in either DSSSL or XSL to the name of your stylesheet file. You do that in a stylesheet customization, which is described in the FAQ. That parameter causes an HTML element to be inserted into each generated HTML file that associates your CSS stylesheet with that HTML file. Then just make sure the stylesheet file gets copied to each HTML output directory. It's a nice system because you can control all the formatting for all of your output from a single CSS file. | |||||||||||||||||
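For the XSL stylesheets, a minimal customization layer along these lines might be enough. This is only a sketch: the import path and the file name mystyle.css are placeholders, not values taken from the answer above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Placeholder path: point this at your own copy of the stylesheets -->
  <xsl:import href="/path/to/docbook-xsl/html/chunk.xsl"/>

  <!-- Hypothetical file name; the value is an XPath string, hence the inner quotes -->
  <xsl:param name="html.stylesheet" select="'mystyle.css'"/>

</xsl:stylesheet>
```

You would then process your document with this file instead of chunk.xsl, and copy mystyle.css next to the generated HTML.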
3. | Produce html and pdf via fop. | ||||||||||||||||
Here is a *nix script to produce both html and pdf using fop.
| |||||||||||||||||
4. | Changing the output encoding for website | ||||||||||||||||
First, you should upgrade to at least Saxon 6.4.3. Earlier versions contained some serious bugs that made DocBook chunking unusable. What to do next depends on how you use Website. If you generate the website using a Makefile and the make program, you must create a customization stylesheet like this:
<?xml version="1.0" encoding="iso-8859-2"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"
                xmlns:saxon="http://icl.com/saxon"
                extension-element-prefixes="saxon">
  <xsl:import href="file:///path/to/website/xsl/tabular.xsl"/>
  <xsl:output method="html" encoding="iso-8859-2" saxon:character-representation="native"/>
</xsl:stylesheet>
Now use this stylesheet instead of the supplied one. If you are not using makefiles, and dependency checking and multiple file generation are left to XSLT extensions, you should create the following customization:
<?xml version="1.0" encoding="iso-8859-2"?>
The output encoding in the meta tag of the generated HTML files is inserted automatically by the XSLT processor. This encoding can be changed by the mechanism described above. | |||||||||||||||||
5. | literallayout and programlisting | ||||||||||||||||
Here is what I found about
1. If the markup includes only text inside of
2. If the elements have a
There are three ways it works okay from my perspective.
1. If there is no space between the elements then it works but the source looks ugly.
2. If you put the
3. Leave an extra empty line in the file. This makes extra space in the result and is really undesirable.
I really don't understand the stylesheets well enough to play around for
different approaches but I feel this is a bit confusing. Most people
understand the idea of literal text like | |||||||||||||||||
6. | Using Namespace in XSL to generate XSL from Docbook | ||||||||||||||||
If you want to limit the namespace declarations in the output, put the xmlns: declaration on the xsl:template that processes xrefs, not at the top level: <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> That usually limits the placement of namespace declarations in the output to appropriate places. Note, however, that the XSLT Recommendation requires only that processors output sufficient namespace declarations to form a legal XML+Namespaces document. | |||||||||||||||||
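As an illustration of the placement described above, here is a small sketch. The ex prefix, its namespace URI, and the ex:link element are invented for the example; only the position of the xmlns: declaration is the point.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- The hypothetical "ex" namespace is declared here, on the template,
       so it is in scope only for literal result elements inside it -->
  <xsl:template match="xref" xmlns:ex="http://example.org/ns">
    <ex:link target="{@linkend}"/>
  </xsl:template>

</xsl:stylesheet>
```

Literal result elements in other templates do not carry the ex declaration, so a processor has no reason to emit it on their output.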
7. | Grouping Glossary entries. | ||||||||||||||||
Off the shelf, DocBook provides a "subject" attribute on Glossdef, so you could use that to associate each Glossdef with a product. But it seems that a flag on each Glossentry container would be more useful, so maybe something like this: <glossentry id="gloss.foo" role="product_one"> That way, you could then use Jirka Kosek's profiling stylesheet in the XSLT stylesheet distribution (tools/profile/profile.xsl) to generate product-specific versions of the glossary. Just set the value of the XSLT "attr" parameter to "role" and the value of the "val" parameter to the product categories you want. For example, using Saxon, to generate a glossary containing just the entries you've flagged with "product_one" and "product_three", you'd do something like this: java com.icl.saxon.StyleSheet \ And to make things easier on yourself, I think you could save the trouble of typing your product names each time if you worked from a DTD customization layer in which you set the value of the "role" attribute for Glossentry to an enumerated list of your product categories, which you could do by redefining the glossentry.role.attrib parameter entity like this: <!ENTITY % glossentry.role.attrib | |||||||||||||||||
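For illustration, a glossary flagged this way could look like the fragment below. The ids, terms, and definitions are made up; only the role attribute on glossentry comes from the answer above.

```xml
<glossary>
  <title>Glossary</title>

  <!-- Flagged for the first product -->
  <glossentry id="gloss.foo" role="product_one">
    <glossterm>foo</glossterm>
    <glossdef><para>A term that matters only for product one.</para></glossdef>
  </glossentry>

  <!-- Flagged for the third product -->
  <glossentry id="gloss.bar" role="product_three">
    <glossterm>bar</glossterm>
    <glossdef><para>A term that matters only for product three.</para></glossdef>
  </glossentry>
</glossary>
```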
8. | Image sizing | ||||||||||||||||
I think it breaks down like this. Imagine that "image.eps" is 6in wide and 10in deep.
<imagedata
Renders the image at a size of 3in by 5in. (I think specifying only the width or depth should have the same effect; specifying width="3in" depth="3in" should scale the figure anamorphically.)
<imagedata
Renders the 6in x 10in image in a 3in x 5in space (we don't have an attribute to control whether the image would be cropped or overprinted, so you'd get whatever the stylesheets say).
<imagedata
Renders a 7.2in x 12in image.
<imagedata
Renders an image scaled to a width (with properly scaled depth) of 100% of the available area. (What constitutes the available area depends on the context, of course. In print, it would usually be the column width.)
<imagedata fileref="image.eps" width="100%"/>
Renders the 6in x 10in image in space equal to the width of the available area (we don't have an attribute to control whether the image would be cropped or overprinted, so you'd get whatever the stylesheets say). | |||||||||||||||||
9. | Styling html output | ||||||||||||||||
This is most easily done with a CSS stylesheet associated with your HTML output. You can do that with the 'html.stylesheet' parameter. If you look in the generated HTML output, you'll see each element is output in a <div class="elementname">, which makes writing css stylesheets that style specific elements easy. If you put your source code in <programlisting> elements, and create a CSS stylesheet that styles div.programlisting, then you can get what you want. | |||||||||||||||||
10. | Implementing Cals Tables | ||||||||||||||||
For anybody who's interested, this is an implementation of Norman Walsh's DocBook tables that handles <spanspec> (using some code from David McNally). I have stripped out as much non-essential code as possible, so if anyone needs to add functionality they shouldn't have too much trouble following the code (well, less trouble than I had ;). If anybody adds to this to make it closer to the CALS spec, please let me know so I can update mine.
| |||||||||||||||||
11. | Graphic file format in XSL stylesheets | ||||||||||||||||
It appears that TIFF files are not supported in the current DocBook XSL stylesheets. The test for graphics format looks like this in graphics.xsl:
<xsl:template name="is.graphic.format">
  <xsl:param name="format"></xsl:param>
  <xsl:if test="$format = 'PNG'
                or $format = 'JPG'
                or $format = 'JPEG'
                or $format = 'linespecific'
                or $format = 'GIF'
                or $format = 'GIF87a'
                or $format = 'GIF89a'
                or $format = 'BMP'">1</xsl:if>
</xsl:template>
So these are the file formats presently supported. | |||||||||||||||||
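If you need another format to be accepted, one option is to override this named template in a customization layer. The sketch below assumes the template looks as quoted above and that your images use format="TIFF"; the import path is a placeholder. It only makes the stylesheets accept the format. Whether your browser or formatter can actually render TIFF is a separate question.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Placeholder path: point this at your own copy of the stylesheets -->
  <xsl:import href="/path/to/docbook-xsl/html/docbook.xsl"/>

  <!-- Same test as in graphics.xsl, with 'TIFF' added at the end.
       This definition wins because of import precedence. -->
  <xsl:template name="is.graphic.format">
    <xsl:param name="format"/>
    <xsl:if test="$format = 'PNG'
                  or $format = 'JPG'
                  or $format = 'JPEG'
                  or $format = 'linespecific'
                  or $format = 'GIF'
                  or $format = 'GIF87a'
                  or $format = 'GIF89a'
                  or $format = 'BMP'
                  or $format = 'TIFF'">1</xsl:if>
  </xsl:template>

</xsl:stylesheet>
```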
12. | No en localization errors | ||||||||||||||||
> No "en" localization of "TableofContents" exists; using "en". This message comes from the system of text templates for localizing generated text, such as "Table of Contents". It comes from common/l10n.xsl, based on the text templates in common/en.xml and such. For some reason, your setup isn't finding the right text templates. Are you sure you are using the 1.48 or later stylesheets? That redundant part of the message was fixed in version 1.48, so it should read: No "en" localization of "TableofContents" exists. | |||||||||||||||||
13. | Breaking long strings such as url's | ||||||||||||||||
One should also be able to insert &shy; (which is defined in the character entity files that are part of the DocBook DTD, or you can just use &#xAD;) to insert a discretionary hyphen. This will cause a hyphen (dash) character to be printed at the end of the line if a break is taken here. One can also insert &#x200B;, which is the "zero width space" character, to allow a line break at that point without the insertion of a dash. (This character may not be implemented as such in all formatters.) In both cases, if the line isn't broken at that spot, your output doesn't have an introduced space as would be the case with solution 1. | |||||||||||||||||
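As a small illustration, the fragment below uses numeric character references for both characters: &#x200B; (zero width space) in the displayed text of a link, and &#xAD; (soft hyphen) inside a long word. The URL and the wording are made up. Note that the break opportunities go only into the displayed text, not into the url attribute.

```xml
<para>The file is available from
<ulink url="http://www.example.com/a/very/long/path/file.html">http://www.example.com/&#x200B;a/&#x200B;very/&#x200B;long/&#x200B;path/&#x200B;file.html</ulink>,
and the parameter is called super&#xAD;cali&#xAD;fragilistic.</para>
```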
14. | Using changebars and chunk.xsl together | ||||||||||||||||
Make your own copy of changebars.xsl (e.g. changebars-chunk.xsl) and change <xsl:import href="docbook.xsl"/> to <xsl:import href="chunk.xsl"/>. It should (could;-) work. | |||||||||||||||||
15. | Does stylesheet processing validate the source document? | ||||||||||||||||
That's correct, xsltproc does not validate. It only uses the DOCTYPE to resolve any entities it finds in the content. I don't think Saxon validates either. XSLT processors are designed to handle well-formed documents in addition to valid documents. So it really isn't their job to determine if a document is valid. | |||||||||||||||||
16. | The Unicode Bidirectional Algorithm | ||||||||||||||||
The Unicode Bidirectional Algorithm in a nutshell...
Bidirectional types
Unicode characters have a "bidirectional type". There are lots of types, but they're divided into three categories: strong, weak, and neutral. Characters with a strong bidirectional type really know their directionality. For example, the characters in most alphabets are "strongly" left-to-right, and the characters in the Hebrew and Arabic alphabets (and some others) are "strongly" right-to-left. Characters with a weak bidirectional type determine their directionality according to their proximity to other characters with strong directionality. Characters with a neutral bidirectional type determine their directionality from either the surrounding strong text or the embedding level.
Embedding levels
The Unicode Bidirectional Algorithm works in terms of "levels" of right-to-left text embedded with left-to-right text, and vice versa. Even levels (0, 2, 4...60) are left-to-right. Odd levels (1, 3..61) are right-to-left. Text at an even level is rendered left-to-right. Text at an odd level is rendered right-to-left. The Unicode Bidirectional Algorithm works on paragraphs, so the first step is dividing text into paragraphs. You determine the "paragraph embedding level" by finding the first character in the paragraph with a strong bidirectional category. If the character is strongly left-to-right, the paragraph embedding level is 0, otherwise (i.e. if the character is strongly right-to-left), the embedding level is 1. Embedding goes on from there: contained text with the opposite directionality is at the next embedding level, and text with the original directionality that is contained by the text with the opposite directionality is at the next lowest embedding level.
Explicit bidirectional formatting
Unicode includes characters for fudging the embedding level:
The example in the Unicode Standard shows RLM being used with an exclamation mark (i.e., '!') that is between some left-to-right text and some neutral text, all of which is within some right-to-left text. Without the RLM, the ! is treated as part of the span of left-to-right text. With the RLM between the left-to-right text and the !, the ! is treated as part of the right-to-left text, which changes on which end of the left-to-right text it is rendered.
- LRM, Left-to-Right Mark, is a zero-width (i.e. it doesn't print) character that is used as an invisible spot of strong left-to-right directionality to coerce neighbouring weak and neutral characters into behaving the way you want. This doesn't change the embedding level.
RLM and LRM are good if you know what you're doing, you probably have an editor that lets you represent them, and you're worried about conserving embedding levels. For the rest of us, the other five characters represent the brute-strength and ignorance approach that we're more comfortable with.
Bidirectional conformance
Systems do not need to support any explicit directional formatting codes. The "implicit bidirectional algorithm" can be taken as handling bidirectionality based solely on embedding levels and the characters' bidirectionality types and without any overrides. There isn't an "explicit bidirectional algorithm" as such. The explicit codes distort the embedding levels compared to what they would ordinarily be, but after they've been taken into account, the "implicit" algorithm, based on embedding levels and characters' types, is what finally determines which text is rendered in which direction.
Higher-level protocols
The "permissible ways for systems to apply higher-level protocols to the ordering of bidirectional text" are:
HTML, CSS, and XSL do the first and third only.
HTML
HTML has a "dir" attribute for indicating the directionality of text. The allowed values are RTL and LTR. I.e., "dir" overrides the paragraph embedding level and replaces the embedding codes. HTML also has a <bdo> element that is used for overriding the effects of the bidirectional algorithm on a span of text. I.e., it replaces the override codes. The HTML Recommendation warns against mixing its controls with explicit bidirectional override characters. Hardly surprising.
CSS2
As Paul noted, CSS has a "direction" property with values "ltr" and "rtl" (and "inherit"). It specifies "the base writing direction of blocks and the direction of embeddings and overrides for the Unicode BIDI algorithm." I.e., it overrides the paragraph embedding level for blocks (i.e., for what Unicode considers paragraphs) and it's also used for replacing the bidirectional overrides and embedding codes. The "unicode-bidi" property is the other half of how CSS2 replaces the bidirectional overrides and embedding codes. The allowed values are "normal", "embed", "override", and "inherit". 'unicode-bidi: normal' doesn't do anything, which is why 'normal' is the default value. 'unicode-bidi: embed' is equivalent to RLE (when 'direction: rtl') or LRE (when 'direction: ltr') at one end of a span of text and a PDF at the other. 'unicode-bidi: override' is equivalent to RLO (when 'direction: rtl') or LRO (when 'direction: ltr') at one end of a span of text and a PDF at the other.
XSL
XSL has "direction", "unicode-bidi" and "writing-mode" properties, although they don't all apply to all the same formatting objects. "writing-mode" applies to the formatting objects that set up a "reference-area", i.e., to the big-picture formatting objects that specify the page, the regions within the page, to tables, and to table cells. It affects how you sequence blocks of text, but it also overrides the "paragraph embedding level."
"direction" and "unicode-bidi" apply only to the "bidi-override" formatting object. They behave pretty much like in CSS2, except that the initial value of "direction" is derived from the current "writing-mode" value rather than being explicitly "ltr". (Determining the initial value of "direction" this way probably means fewer surprises when formatting a purely right-to-left document, but the "direction" description does read like it was written for "direction" to apply to more formatting objects than just "bidi-override".)
Conclusion
1. If using markup to control bidirectionality, you need a way to set the paragraph embedding level (i.e., set whether the paragraph starts out right-to-left or left-to-right) as well as a way to override the implicit bidirectional algorithm (the algorithm that works w.r.t. the characters' bidirectional types).
2. Markup that overrides the implicit bidirectional algorithm should support both overrides (RLO and LRO equivalent) and embeds (RLE and LRE equivalent).
3. Include strong words against mixing markup-based bidirectionality controls and the explicit bidirectionality characters.
4. Consistency with existing standards is a GOOD THING. Compatibility with the Unicode Bidirectional Algorithm is essential.
5. Work out whether every inline can affect bidirectionality (CSS style) or whether there's one special-purpose element (HTML and XSL style, although I don't expect XHTML to stick to that and it doesn't matter for HTML anyway if you're also using CSS).
6. A politically correct default direction is hard to determine. CSS2 uses 'ltr', and XSL lets the XSL processor have a default. | |||||||||||||||||
17. | Separating footnotes from body text | ||||||||||||||||
This is simple and standard.
<fo:static-content flow-name="xsl-footnote-separator">
  <fo:block>
    <fo:leader leader-pattern="rule"
               leader-length="100%" rule-thickness="0.5pt"
               rule-style="solid" color="black"/>
  </fo:block>
</fo:static-content>
| |||||||||||||||||
18. | XHTML table output | ||||||||||||||||
There are two XHTML 1.0 DTDs, Transitional XHTML and Strict XHTML. By default, the DocBook stylesheets produce Transitional XHTML, and it should be valid if you supply alt attributes on all your images. Instances of output that is not valid Transitional XHTML should be reported as bugs. Strict XHTML is harder. There is currently no parameter to turn on Strict XHTML output from the DocBook XSL stylesheets. It is possible to specify the Strict DTD in the output DOCTYPE, but the output may not be valid. It somewhat depends on what attributes you use in your XML files, and what parameters you use during processing. | |||||||||||||||||
19. | Insert date into output | ||||||||||||||||
Add the following code to your customized user.head.content template:
<meta name="date">
  <xsl:attribute name="content">
    <xsl:call-template name="datetime.format">
      <xsl:with-param name="date" select="date:date-time()"/>
      <xsl:with-param name="format" select="'m/d/Y'"/>
    </xsl:call-template>
  </xsl:attribute>
</meta>
The format value is a string literal, so it needs its own quotes inside the select attribute. Don't forget to declare the date namespace as xmlns:date="http://exslt.org/dates-and-times" and to either choose a processor that has EXSLT built in, or get the code from exslt.org. | |||||||||||||||||
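Put together, a customization layer for this could look roughly like the sketch below. The import path is a placeholder, and exclude-result-prefixes is an addition of mine to keep the date namespace declaration out of the generated HTML. It still assumes a processor with EXSLT dates-and-times support.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:date="http://exslt.org/dates-and-times"
                exclude-result-prefixes="date"
                version="1.0">

  <!-- Placeholder path: point this at your own copy of the stylesheets -->
  <xsl:import href="/path/to/docbook-xsl/html/docbook.xsl"/>

  <!-- Called by the stylesheets while building the HTML head -->
  <xsl:template name="user.head.content">
    <meta name="date">
      <xsl:attribute name="content">
        <xsl:call-template name="datetime.format">
          <xsl:with-param name="date" select="date:date-time()"/>
          <!-- A string literal, hence the inner single quotes -->
          <xsl:with-param name="format" select="'m/d/Y'"/>
        </xsl:call-template>
      </xsl:attribute>
    </meta>
  </xsl:template>

</xsl:stylesheet>
```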
20. | Admonition styling | ||||||||||||||||
I asked Norm about this a couple of months ago. The DSSSL stylesheets always did one or the other. But the XSL stylesheets have had both the word and the graphic (when turned on) from their first release. He thought it was probably a bug, but since it has been in use for so long, some people expect the word to be there too, so just turning it off seems like a bad idea. That's as far as we got, and no changes were made. Perhaps there should be a new parameter that would turn off the text label. Maybe 'admon.textlabel' with 1 (on) as the default to retain the current behavior. In the meantime, you could try the following crude workaround in a customization layer:
<xsl:template match="note|important|warning|caution|tip"
mode="object.title.markup">
</xsl:template>
The 'graphical.admonition' template processes the note element in object.title.markup mode to get either the explicit <title> element if it has one or the generated text label "Note". Making this an empty template like this will deliver no text at that point. It is kind of a crude solution, and I don't know if there would be side effects. I don't think that mode is used in other places for admonitions because their titles don't appear in the table of contents, and it would be rare to xref to one. | |||||||||||||||||
21. | CSS styling on tables | ||||||||||||||||
Put this in your css to make tables inherit your font settings:
/* Rule to fix quirks-mode inheritance behavior */
table, caption {
font-size: inherit;
font-weight: inherit;
font-style: inherit;
font-variant: inherit;
}
For the explanation see: devedge.netscape.com
Off topic, but I think setting font size absolutely is a bad idea. It's not nice for unfortunate souls who use IE and want to set their screen resolution to 1600x1200.
Well, that trick works for Opera and Mozilla, but not for IE. Still, the difference you're seeing is the difference between compliance mode and compatibility/quirks mode. The website output is XHTML and the presence of the doctype puts newer browsers in compliance mode, so font-size inherits to table. The other output is just HTML, so newer browsers go into quirks/compatibility mode and font-size isn't inherited by table. Lots of stuff has been written about it: article
What you do depends on which browsers you need to support. | |||||||||||||||||
22. | Including SVG in docbook | ||||||||||||||||
Really simple, if you use external files: just do exactly what you would do with a PNG file. If, however, you want to embed the SVG into a DocBook document, you need to use Norm's SVG extension (see http://docbook.org). Note, however, that although FOP and Batik work quite nicely together, you will not yet be able to use all SVG elements, partly because the integration isn't perfect yet, and partly due to limitations in the PDF format. I have no experience with XEP or Antenna House and SVG yet. (ed. SVG is supported in XEP.) As always, check Bob Stayton's book (Bob's pages), chapter 15, which normally has an answer to everything you ever want to ask about DocBook. | |||||||||||||||||
23. | Segmented list decoration | ||||||||||||||||
You want to customize this template in lists.xsl:
<xsl:template match="seglistitem">
  <xsl:apply-templates/>
</xsl:template>
For HTML output, it could be:
<xsl:template match="seglistitem">
  <!-- rule above for first one only -->
  <xsl:if test="not(preceding-sibling::seglistitem)">
    <hr/>
  </xsl:if>
  <xsl:apply-templates/>
  <hr/>
</xsl:template>
| |||||||||||||||||
24. | Processing docbook, catalogs, Xinclude and XSLT processor | ||||||||||||||||
Actually, this bit made it into my book at the last minute. See: sagehill.net | |||||||||||||||||
25. | Localisation in separate files | ||||||||||||||||
You can't. A better way to approach this problem is probably to use a separate XML document with the translations that you then get as needed, rather than using statically-defined variables. For example, if you name your files with the language code, then you can do something like this:
<xsl:variable name="langCode" select="/*/@xml:lang"/>
<xsl:variable name="boiler-plate-item-one"
              select="document(concat('trans/boilerplate_',
                                      $langCode,
                                      '.xml'), .)/boilerplate[@id = 'item-01']"/>
This will select the <boilerplate> element with the ID "item-01" from the file "./trans/boilerplate_zh-CN.xml" if the value of the root element's xml:lang attribute is "zh-CN". The path is relative to the current document; leave out the ", ." argument of document() if you want the URL to be relative to the stylesheet. | |||||||||||||||||
26. | Docbook to text. Method 1. | ||||||||||||||||
Btw., here's my version of that. It currently relies on a Saxon extension to perform a second pass on the output and clean up a problem with extra lines after bullets and numbers in lists.
1. Use Saxon to run html2txt.xsl (cleanup.xsl must be in the same directory) on the DocBook file.
2. Run "links -dump filename.html | tr -d '\000' > filename.txt" (or links -dump filename.html | tr -d '\000' | unix2dos > filename.txt if you expect people to use Notepad to open the file.)
The main annoyance left is with programlistings. If you have a programlisting in a listitem, the programlisting is flush left even though the listitem is indented. I don't do anything with formatting of inlines and can't remember what happens with ulinks, but maybe this can help you get started.
html2txt.xsl:
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon"
version="1.0">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/1.60.1/xhtml/docbook.xsl"/>
<xsl:output method="xml" encoding="us-ascii" saxon:next-in-chain="cleanup.xsl"/>
<xsl:param name="appendix.autolabel" select="1"/>
<xsl:param name="chapter.autolabel" select="1"/>
<xsl:param name="part.autolabel" select="1"/>
<xsl:param name="preface.autolabel" select="1"/>
<xsl:param name="section.autolabel" select="1"/>
<xsl:param name="admon.graphics" select="0"/>
<xsl:param name="callout.graphics" select="'0'"/>
<xsl:template match="index|figure|informalfigure|mediaobject" priority="1000"></xsl:template>
<xsl:param name="generate.toc">
appendix toc
article toc
book toc
chapter toc
part toc
preface toc
qandadiv toc
qandaset toc
reference toc
section toc
set toc
</xsl:param>
</xsl:stylesheet>
cleanup.xsl:
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
version="1.0">
<xsl:output
encoding="us-ascii"
method="xml"
indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="xhtml:li/xhtml:p[1]">
<xsl:apply-templates select="node()"/>
</xsl:template>
</xsl:stylesheet>
| |||||||||||||||||
27. | Table of Content | ||||||||||||||||
Here is a basic outline of how a book TOC works, and I hope it helps you see how you might customize the process to add a new element.
1. The template with match="book" calls a template named 'division.toc' to start the process.
2. The division.toc template in autotoc.xsl selects into a variable named 'nodes' which elements are to appear at the top level of the book toc. These include chapter, appendix, glossary, index, etc. Then it generates the TOC titlepage (basically just the "Table of Contents" title at the top of the page).
3. Then it applies templates to the selected elements using mode="toc". So you need to find how each element is processed in mode="toc" to see what happens next.
4. Also in autotoc.xsl, there is a template with match="preface|chapter|appendix|article" with mode="toc". It processes the current element by calling the template named 'toc.line', which generates a single line in the TOC.
5. Then it selects which of its child elements are to be subheadings under the current element and stores them in a 'nodes' variable. If there are such nodes to be processed, it sets an indent and applies templates to those nodes, also in mode="toc".
This process continues to the next level and then the next, to the limit of the sections that are designated to be included in the TOC. As you can see, it is a recursive process that generates the TOC, descending down through the hierarchy of elements to create a similar hierarchy in the TOC.
To include other elements not already accounted for in the TOC, you need to add them to the 'nodes' selection at the appropriate point. This would most likely be in the templates that match on section elements in mode="toc". So you would copy that template to your customization layer and change which nodes are selected. Then you need to supply a new template that matches that element name and in mode="toc". It should be similar to the others in that mode.
Since your para element doesn't seem to have children that should appear in the toc, calling the toc.line template may be enough. To actually generate the text for the element in the toc, you should look at the toc.line template. It works by applying templates to the current element in mode="titleabbrev.markup". Such templates already exist for any titled elements in Docbook, but your paragraph elements wouldn't have titles. So you need to add a template in that mode that matches your selected para elements. You are correct that the TOC machinery calls a lot of templates, but it does that because it is designed to be modular and recursive, instead of a single monolithic procedure. The key is to find the appropriate entry points and modular templates to modify, and that lets you minimize the amount of code in your customization layer. |
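The two new templates described above might be sketched as follows. This is an untested outline rather than a finished customization: the role value 'toc-entry' and the 40-character cutoff are invented for the example, the import path is a placeholder, and the copied section template with its changed 'nodes' selection is not shown. Depending on the stylesheet version, the generated line may also need to be wrapped in the same list markup that the other mode="toc" templates produce.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Placeholder path: point this at your own copy of the stylesheets -->
  <xsl:import href="/path/to/docbook-xsl/html/docbook.xsl"/>

  <!-- Generate one TOC line for each flagged para (hypothetical role value) -->
  <xsl:template match="para[@role = 'toc-entry']" mode="toc">
    <xsl:call-template name="toc.line"/>
  </xsl:template>

  <!-- Supply the text that toc.line asks for: here, the first 40
       characters of the paragraph, with whitespace normalized -->
  <xsl:template match="para[@role = 'toc-entry']" mode="titleabbrev.markup">
    <xsl:value-of select="normalize-space(substring(., 1, 40))"/>
  </xsl:template>

</xsl:stylesheet>
```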