XML 2002 paper on XSL-FO
This paper reflects the state of the world as of late October 2002. Since I wrote it, the following FO implementations have been released or announced:
- Adobe Document Server. Includes an FO-to-PDF component built on top of FrameMaker. Doesn't implement as many features as XEP or XSL Formatter, but lets you use Frame-specific functionality from the FO, such as Frame templates. Provides some extensions, including a revision bar mechanism.
- Sun's xmlroff open-source FO implementation. Written in C. Provides internationalization features, including support for right-to-left writing modes. Feature support is "almost basic conformance". Should be released Real Soon Now (just waiting for final approvals).
- 3B2's to-be-named "free or almost-free" FO implementation, built on top of the 3B2 formatting engine.
- Antenna House announced Unix and Linux support for 1Q2003 (they are doing final release testing now).
- IBM's XFC FO implementation was released as part of a larger AFP product. IBM assures me that it is under active development.
It was standing room only by the time I was done presenting my talk, which is pretty good considering I was up against Norm Walsh. From the questions I got asked, it was clear that a lot of enterprises are starting to investigate the use of XSL-FO to do sophisticated page composition.
What does fo: stand for
The abbreviation "fo" (or "FO") refers to "formatting object", which is the general case: it includes all objects defined in the XSL specification (as distinct from the XSLT specification). Flow objects are the subset of FOs that can appear within an fo:flow or fo:static-content object.
Bookmarks in pdf
> is there a possibility to generate pdf-bookmarks
There is no standard way that I know of. However, the XSL Formatter product from Antenna House (www.antennahouse.com) provides an extension element that lets you generate bookmarks. I would be surprised if RenderX's XEP product did not have a similar feature. I don't know about FOP.
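For reference, XSL 1.1 (a W3C Recommendation published after this answer was written) defines a standard bookmark mechanism: an fo:bookmark-tree placed between the layout master set and the first page sequence. The following is only a minimal sketch; the ids, titles, and page master name are made up for illustration, and whether a given formatter turns the tree into PDF bookmarks depends on its XSL 1.1 support.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- XSL 1.1 only: one fo:bookmark per outline entry, nested for hierarchy -->
  <fo:bookmark-tree>
    <fo:bookmark internal-destination="ch1">
      <fo:bookmark-title>Chapter 1</fo:bookmark-title>
      <fo:bookmark internal-destination="ch1-s1">
        <fo:bookmark-title>Section 1.1</fo:bookmark-title>
      </fo:bookmark>
    </fo:bookmark>
  </fo:bookmark-tree>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- the id values are the targets named by internal-destination -->
      <fo:block id="ch1">Chapter 1</fo:block>
      <fo:block id="ch1-s1">Section 1.1</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```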
Font size depending on section level
Nikolai at RenderX once helped me with this smart template for choosing font sizes on headers at different levels:
<xsl:template name="header-font-size">
  <xsl:variable name="level" select="count(ancestor::section)"/>
  <xsl:attribute name="font-size">
    <xsl:choose>
      <xsl:when test="$level=1">18pt</xsl:when>
      <xsl:when test="$level=2">14pt</xsl:when>
      <xsl:when test="$level=3">10pt</xsl:when>
      <xsl:otherwise>10pt</xsl:otherwise>
    </xsl:choose>
  </xsl:attribute>
</xsl:template>
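A sketch of how such a template might be called, assuming the headers are title children of nested section elements (the element names are a guess at the source vocabulary). Because the named template emits an xsl:attribute, it has to be called before any child content is written to the fo:block.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumes the header-font-size template shown above is part of this stylesheet. -->
  <xsl:template match="section/title">
    <fo:block font-weight="bold">
      <!-- Adds font-size based on how many section ancestors the title has -->
      <xsl:call-template name="header-font-size"/>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```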
XHTML to XSL-FO Stylesheets
For XHTML -> XSL-FO: Antenna House
For XSL-FO -> HTML: RenderX
mixed page layout and column orphans/widows
>mixed page layout:
In XSL-FO 1.0 a block can span only one column or all columns, never just some of them. And the act of spanning all columns reflows and rebalances previous page content across all the columns in the area before the span.
Orphans and widows are only triggered when you have more than one line in a block. If you know at XSLT time the line is too short (which you probably don't if it is typical prose text, but you probably do if it is just a heading line), you could play with the keep-with-next.within-column property.
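As a rough illustration of the heading case, a single-line heading can be tied to the paragraph that follows it so it is never left alone at the bottom of a column. The text and font values here are placeholders.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- A one-line block never triggers orphans/widows, so use a keep instead -->
  <fo:block font-size="14pt" font-weight="bold"
      keep-with-next.within-column="always">A heading line</fo:block>
  <!-- Multi-line prose: orphans and widows apply here -->
  <fo:block orphans="2" widows="2">The paragraph that should stay with
    the heading when the column breaks.</fo:block>
</fo:flow>
```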
>vertical column balancing/distribution:
Not with XSL-FO ... you can't fill the last column to the length of the page. However, you *can* balance all the columns on the last page to be close to the same length (just not the full length of the page) by flowing an empty block at the end of your content with span="all". This will trigger the reflowing and rebalancing I describe above. Your columns might end up being 80%, 80%, 79%, instead of 90%, 90%, 70% ... but I can't help you get 90%, 90%, 90%.
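The empty spanning block trick might look like the sketch below. It assumes a three-column region-body (the column count, gap, and master name are arbitrary), and places the empty block as a direct child of fo:flow, which as far as I know is the only place span is honored.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="three-col">
      <fo:region-body margin="1in" column-count="3" column-gap="12pt"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="three-col">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>All of the real content goes here ...</fo:block>
      <!-- Empty block spanning all columns: forces the preceding
           content on the last page to be rebalanced -->
      <fo:block span="all"/>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```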
&#8804; and &#x2264; refer to the same character: 8804 is in decimal notation, 2264 is in hexadecimal. That aside, the problem is that the font you are using does not include a ≤ character. Neither Helvetica nor Times does. Of the standard Adobe fonts, I believe only the Symbol font has the appropriate character.
With XEP, this is relatively easy to fix. If you specify multiple font families in the font-family property (as in font-family="Helvetica, Symbol") XEP will look down the list when it encounters a character that doesn't exist in the first font. (This is a feature they added in version 3.1, if I remember correctly.) If you do this, you'll probably also want to set the font-selection-strategy to "character-by-character" on your fo:root element. Otherwise, XEP will switch to the Symbol font for that character and all following characters (until it comes to a character not in the Symbol font).
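Put together, the two settings could look like this minimal sketch. The page master name and sample text are placeholders; both properties are standard XSL-FO, but how the fallback actually behaves depends on the formatter and the fonts it has configured.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
    font-selection-strategy="character-by-character">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- Helvetica is tried first; Symbol is the fallback for
           characters such as U+2264 that Helvetica lacks -->
      <fo:block font-family="Helvetica, Symbol">x &#x2264; y</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```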
AFAIK, FOP does not support multiple font families in this manner. I think it just uses the first font it can find for all characters in the document.
Putting a border round a title
With XSL Formatting Objects (http://www.w3.org/TR/xsl), this is easy.
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="my-page">
      <fo:region-body border="thin solid blue" margin-left="1in" margin-right="1in" margin-top="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="my-page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block space-before="48pt" start-indent="24pt" end-indent="24pt" space-before.conditionality="retain" border="thick solid black">
        <fo:block padding-before="-24pt" font-size="32pt" font-weight="bold" text-indent="48pt">
          <fo:inline background-color="white">Border Title</fo:inline>
        </fo:block>
        <fo:block font-size="24pt">This text has a title, and the title is placed on the border of the framed block that surrounds the text. </fo:block>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Tested with RenderX XEP (http://xep.xattic.com/).
How can I give a mailto: link in XSL FO.
It is the same as with any external link, but use the mailto protocol:
...
<fo:block>Please contact us at</fo:block>
<fo:block>
  <fo:basic-link external-destination='url("mailto:email@example.com")'>
    firstname.lastname@example.org
  </fo:basic-link>
</fo:block>
<fo:block>for instructions on</fo:block>
...
A single click on my Windows system first brings up a browser window and the browser sees the "mailto" and brings up the mail application.
What is preflight
"preflight" is a general term of art refering to any processing or preparation need to make the files you deliver to the print house ready for printing. Most print workflows are PDF based, although some are still PostScript based.
Preflight actions can be many and varied, but include:
See Web sites like PDFZone and PlanetPDF for more information about PDF-based preflight workflows and tools.
Most preflight processing is necessitated by the use of various word processing and desktop publishing tools that give wide latitude in the details of how they produce PDFs. With a more controlled system like XSL-FO, it should, in theory, be possible for an XSL-FO implementation to produce PDFs that require no additional preflighting in order to be printer ready. However, that is not the case today, at least with the FO implementations I've worked with, and also varies depending on what your print vendor's requirements are. For example, many print vendors prefer to do color conversion and separations themselves, others do not.
Unless there is a Java-based preflight tool that I am unaware of, I don't think there is a pure-Java solution to this problem at this time (my clients have essentially the same requirements).
XEP with CMYK support comes close but you also need spot color support.
Unfortunately, EnFocus PitStop, which is the clear leader in PDF preflight tools, only provides a C++ API, so getting to it from Java would be a bit of a chore.
I have successfully integrated PitStopServer into an otherwise Java-based XML-to-PDF process, but it is a very loose integration--essentially my Java code just puts things into a PitStop hot folder and then waits for the processed PDF to show up in the output folder. Crude, but it works. I'm using PitStop to do RGB-to-CMYK color conversion for PDFs generated by XSL Formatter (I have a Thai language requirement which XEP cannot yet satisfy, otherwise I would have explored XEP for this customer).
Right to left or left to right? Text directionality
This should not be your problem, but the problem of the browser used to render the generated HTML documents. Modern browsers handle bidirectionality automatically; Unicode contains all data necessary to correctly display Hebrew, Arabic and other scripts.
The only case when you would want explicit direction marks is when you want something to be transcribed in an unusual direction, that is, English or Georgian in RTL, or Arabic in LTR. But it does not seem to be your case.
So just output your text without thinking about writing modes; a good browser (Mozilla, for example) will handle it just right. The same holds for XSL FO output.
Convert XSL-FO to HTML
The need for transforming from XSL-FO to XHTML means that your system's data flows are not properly designed. Both of these formats are presentations of data, so they should both be generated from the original data (XML). Otherwise the conversion might distort the data. XHTML is a web presentation while XSL-FO is a print one. There are no page headers, footnotes, even and odd page layouts, page numbers, retrieved markers, and so on in XHTML. Thus the conversion will lose the information stored in these parts of the stylesheet and you wouldn't even know about it.
I recommend writing an additional stylesheet for converting your original XML data directly to XHTML. Do not convert XSL-FO to XHTML. This is a bad design and will cause you a lot of problems.
A comparative Feature list for XSL-FO
Drawing horizontal or vertical rules can be done in several ways with FO. All the useful FO implementations also support the use of SVG to draw anything SVG can draw.
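A few of the standard ways to draw rules, as a sketch: an fo:leader with a rule pattern for a stand-alone horizontal rule, and border properties on a block for rules attached to text. The thicknesses and padding values are arbitrary.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Stand-alone horizontal rule across the full line width -->
  <fo:block>
    <fo:leader leader-pattern="rule" leader-length="100%"
        rule-style="solid" rule-thickness="1pt"/>
  </fo:block>
  <!-- Horizontal rule below a paragraph, drawn as a border -->
  <fo:block border-after-style="solid" border-after-width="0.5pt"
      padding-after="3pt">Text with a rule underneath.</fo:block>
  <!-- Vertical rule beside a paragraph, drawn as a start-edge border -->
  <fo:block border-start-style="solid" border-start-width="2pt"
      padding-start="6pt">Text with a vertical rule at its start edge.</fo:block>
</fo:flow>
```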
Antenna House XSL Formatter includes extensions for doing rounded box corners. These are not quite as complete as 3B2's but close.
> b). adjusting left, right, top and bottom margins.
Not sure what you mean by this. Page margins can be varied on a per-page basis by using different page masters, although they cannot be arbitrarily changed outside of the definition of a page sequence (that is, there's nothing you can put in a flow that would directly affect the page margins on a given page).
> c). auto generating footnotes.
XSL-FO has a footnote feature (automatic placement) and both XSL Formatter and XEP provide extensions for doing column footnotes (standard FO only provides full-page footnotes).
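The standard (full-page) footnote construct looks roughly like this; the formatter decides where the footnote body lands at the bottom of the page. The numbering is hard-coded here for brevity, whereas a real stylesheet would normally generate it.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block>Some running text<fo:footnote>
      <!-- The inline is the callout left in the running text -->
      <fo:inline baseline-shift="super" font-size="70%">1</fo:inline>
      <!-- The body is placed automatically in the footnote area -->
      <fo:footnote-body>
        <fo:block font-size="8pt">1. The text of the footnote.</fo:block>
      </fo:footnote-body>
    </fo:footnote> that continues after the callout.</fo:block>
</fo:flow>
```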
However, neither FO nor the commercial tools provide the full range of footnote placement features that 3B2 provides (especially through things like footnote control streams and rendition-time macros).
> d). placing figures and table near to their callouts in the running text.
FO has some features for this, including top and side floats (but not bottom floats unless you use footnotes (and don't have any real footnotes on that page)).
Neither FO nor any implementations provide the equivalent of 3B2's anchor control streams, which let you do very sophisticated automated placement of floats.
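For comparison, a standard top float is sketched below: the figure is anchored next to its callout and the formatter moves it to the top of the current or a following page. The graphic file name is hypothetical, and support for floats varies between implementations.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block>As Figure 1 shows, ...
    <!-- float="before" requests placement at the top of a page -->
    <fo:float float="before">
      <fo:block>
        <fo:external-graphic src="url(figure1.png)"/>
      </fo:block>
      <fo:block font-style="italic">Figure 1. A caption.</fo:block>
    </fo:float>
  </fo:block>
</fo:flow>
```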
> e). frames with unequal column widths.
Not directly. In FO columns must be of equal width.
However, in FO 1.1 you can simulate this with multiple flows. Also, both XEP and XSL Formatter provide extensions that make it possible to achieve major/minor type layouts where the minor column only holds single-page marginalia items.
> f). colour-gradient styles.
You can use any background graphic, including gradients. You can also use inline or external SVG to do gradients (assuming your FO implementation supports SVG gradients).
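An inline SVG gradient could be embedded along these lines. This is a sketch only: the sizes, colors, and gradient id are arbitrary, and it assumes the formatter renders SVG gradients inside fo:instream-foreign-object.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block>
    <fo:instream-foreign-object>
      <svg xmlns="http://www.w3.org/2000/svg" width="200" height="40">
        <defs>
          <!-- Left-to-right fade from navy to white -->
          <linearGradient id="fade" x1="0" y1="0" x2="1" y2="0">
            <stop offset="0" stop-color="navy"/>
            <stop offset="1" stop-color="white"/>
          </linearGradient>
        </defs>
        <rect width="200" height="40" fill="url(#fade)"/>
      </svg>
    </fo:instream-foreign-object>
  </fo:block>
</fo:flow>
```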
> g). recto-verso style variation etc.
Yes, you can have different page masters for even and odd pages, and so on. FO 1.0 fails to provide inside/outside options for some items, but this has been corrected in FO 1.1. Both XEP and XSL Formatter provide extensions for inside/outside placement.
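A minimal sketch of recto/verso page masters with mirrored margins, selected through a page-sequence-master. The master names and margin values are invented; a page sequence would then refer to master-reference="alternating".

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Odd (recto) pages: wider margin on the binding (left) side -->
  <fo:simple-page-master master-name="recto"
      margin-top="1in" margin-bottom="1in"
      margin-left="1.25in" margin-right="0.75in">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- Even (verso) pages: margins mirrored -->
  <fo:simple-page-master master-name="verso"
      margin-top="1in" margin-bottom="1in"
      margin-left="0.75in" margin-right="1.25in">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- Picks a page master for each page by its odd/even position -->
  <fo:page-sequence-master master-name="alternating">
    <fo:repeatable-page-master-alternatives>
      <fo:conditional-page-master-reference
          master-reference="recto" odd-or-even="odd"/>
      <fo:conditional-page-master-reference
          master-reference="verso" odd-or-even="even"/>
    </fo:repeatable-page-master-alternatives>
  </fo:page-sequence-master>
</fo:layout-master-set>
```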
> Does XSL:FO fullfil such requirements in electronic typesetting.
This is always a difficult question to answer because it is highly dependent on the details of the requirements for a particular publication or class of publication.
Because XSL-FO systems tend to be so much less expensive than systems built using proprietary composition systems, the better question is often "can I reduce my requirements to the set supported by XSL-FO and its implementations in order to realize the savings and benefits of using XSL-FO?" If the answer is "yes" then FO is the answer. If the answer is "no" then it's not. Of course that still leaves open the question of what XSL-FO can or can't do. This is an easy question to answer in the context of a specific layout sample, but hard to answer generically.
That all said, FO was always targeted at the requirements of technical documentation and not high-end documents like textbooks, magazines, and the like.
In Innodata Isogen's practice as a developer of publishing systems and provider of composition services, we have found that the FO standard and current FO implementations, while quite powerful and clearly appropriate for most technical documentation applications, are simply not up to the task of rendering more demanding publications such as textbooks and magazines, what I tend to refer to as "highly-designed documents".
Also, all XSL-FO implementations are geared for lights-out, batch operation. This means that there is really no opportunity for interactive modification of the rendered result the way there is in 3B2, Quark, InDesign, or XPP. While you could, in theory, generate XSL-FO instances and then tweak them, there are no user interfaces for doing this.
Also, the abstract XSL-FO processing model explicitly lacks feedback from the pagination stage back to the FO generation stage, which means that there are no features in XSL-FO for doing what FO calls "layout-driven" formatting, that is, formatting that depends on knowledge of where a given object falls on a page relative to other objects. This kind of feedback can be implemented using a two-pass process, but I don't know of anyone who's done this in any general way (Ken Holman has published some work he's done on a sort of half second pass for index generation).
So if you have requirements that are defined in terms of where something falls in the pagination stream or in terms of how much space it takes relative to other things, then XSL-FO is probably not going to work, at least not without significant additional implementation effort.
Also, XSL-FO provides few features for automatic copyfitting, which is something that tools like 3B2 can do reasonably well. XSL Formatter provides some "make it fit" features but they are limited compared to 3B2's copyfitting features.
In the interest of full disclosure, I should mention that Innodata Isogen is currently developing and marketing a new composition offering built around what we are calling the "tool-agnostic layout system" (TALS), which uses a generic style sheet mechanism to generate renderers that in turn generate the input for different composition systems, including 3B2 and similar systems.
The style sheet mechanism is proprietary to Innodata Isogen but strongly informed by XSL-FO and intended to be a completely neutral repository of all the formatting information for a given schema-to-layout-design binding. While the style sheet design is proprietary we are treating it as though it were a standard--that is, our business intent is not to achieve proprietary lock-in of our customers but to provide them with a system that has the characteristics of a standard, namely providing a neutral data format that protects them from the downstream tools as much as possible and lowers the overall cost of developing XML-based publishing systems (by reducing engineering costs, enabling practical re-use of style specifications, and automating composition with high-end composition tools as much as possible, which reduces the amount of hand work needed to paginate documents).
Clients of this system own their style definitions and therefore have the right to use a different implementation (we see our secret sauce being the implementation of the renderer generators, not the style sheet language itself). As in the XSL-FO market, we should be competing on value, not trying to develop a proprietary monopoly.
We originally developed this approach in order to serve the needs of our own professional services practice--we wanted to eliminate duplication and redundancy in the development of format analysis reports and the XSLT transforms that come out of them, but quickly realized that there was a large opportunity to serve additional needs of publishers. Our original plan was to generate FO renderers (that is, renderers that generate XSL-FO output, which is the bulk of what we develop in our professional services practice). However, we got early interest from publishers that were trying to get control over their use of 3B2-based composition vendors to compose publications from XML source. We realized that if we could better automate the generation of the input into tools like 3B2, we could help publishers get more control over their composition process, get more consistency in their results, and, hopefully, lower the overall cost of publishing a given title, and, possibly, reduce the time it takes to produce a publication (by significantly reducing the time required to go from manuscript to final pages).
Note that the intent of this system is absolutely *not* to compete with existing composition tools, whether XSL-FO-based or proprietary. Rather, we are trying to make it easier for clients to use different composition tools and lower the cost of getting from XML to published pages, which, we hope, will increase the market for composition tools (and, as a side effect, encourage vendors of proprietary composition systems to compete more on value, which they already do of course, but there is significant proprietary lock-in for a tool like 3B2 or InDesign or XPP once you've made the investment in skills and tools and data for using it).
In the context of XSL-FO systems, the only potential downside from our system is that we might lower the cost (both in dollars and in risk of proprietary lock-in) of using high-end composition systems to the point where they become competitive with XSL-FO-based systems where the XSL-FO system would otherwise satisfy the composition requirements. It's certainly not our intent to develop a technology disruptive to the XSL-FO market but it may be an unavoidable consequence. Of course, this might also drive the FO development community to work harder at extending the FO specification so that it is more applicable to high-end composition needs....
Our initial implementation efforts have been on generating 3B2 input, which is why I happen to know pretty much precisely how XSL-FO and its implementations relate to the specific features of 3B2.
Changing font size
<xsl:template match="para">
  <fo:block xsl:use-attribute-sets="normal.para.spacing">
    <xsl:attribute name="font-size">
      <xsl:choose>
        <xsl:when test="@size = 'Less'">85%</xsl:when>
        <xsl:when test="@size = 'Small'">75%</xsl:when>
        <xsl:when test="@size = 'Smaller'">65%</xsl:when>
        <xsl:otherwise>100%</xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:call-template name="anchor"/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>