Pretty Printing XML

1. Indent XML doc or Pretty Printing XML
2. Pretty printing, second version
3. XML Pretty Printer
4. How can I reindent an XML-file for readability?
5. DTD pretty printing

1.

Indent XML doc or Pretty Printing XML

Joshua Allen and Eric van der Vlist

Has anyone built an XSLT transform that indents an XML file with spaces (e.g. 3 spaces per level)? I'm just confusing myself here. The closest I have come is the modified identity transform included below.

The problem is that this indents only opening tags, not closing tags, and it doesn't remove any indentation that might already be in the source document.

P.S. The reason I'm doing this is so I can include a file (pretty.xslt) with my XSLT test tool that pretty-prints the output (indented, but otherwise the same infoset). I could write some code to do it, but doing it as an external XSLT file makes it easier to customize.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
<xsl:variable name="indent" select="3" />
<xsl:variable name="spaces" select="'                 '" />


<xsl:template match="*">
   <xsl:variable name="varspaces"
                 select="substring($spaces,1,count(ancestor::node()) *$indent)"/>
   <xsl:value-of select="$varspaces" />
   <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:choose>
         <xsl:when test="*">
            <xsl:apply-templates />
            <xsl:value-of select="concat('&#x0A;   ', $varspaces)" />
         </xsl:when>
         <xsl:otherwise>
            <xsl:value-of select="normalize-space()"/>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:copy>
</xsl:template>

<xsl:template match="comment()|processing-instruction()">
   <xsl:copy />
</xsl:template>

</xsl:stylesheet>

Nikolai Grigoriev offers:

Try the following code. It differs only slightly from yours, yet it indents both opening and closing tags and has no limit on element nesting depth (yours is capped by the length of the $spaces string).

  
  <xsl:stylesheet version="1.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:param name="indent-increment" select="'   '" />
  
  <xsl:template match="*">
     <xsl:param name="indent" select="'&#xA;'"/>
  
     <xsl:value-of select="$indent"/>
     <xsl:copy>
       <xsl:copy-of select="@*" />
       <xsl:apply-templates>
         <xsl:with-param name="indent"
              select="concat($indent, $indent-increment)"/>
       </xsl:apply-templates>
       <xsl:value-of select="$indent"/>
     </xsl:copy>
  </xsl:template>
  
  <xsl:template match="comment()|processing-instruction()">
     <xsl:copy />
  </xsl:template>
  
  <!-- WARNING: this is dangerous. It strips every whitespace-only text node,
       including significant spaces in mixed content. Handle with care -->
  <xsl:template match="text()[normalize-space(.)='']"/>
  
</xsl:stylesheet>
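To see why the last template carries a warning, here is a hand-traced example (the input is made up for illustration and was traced against the stylesheet's logic, not run through a particular processor). Given this mixed-content input:

      <p>Hello <b>big</b> <i>world</i></p>

the stylesheet discards the single space between </b> and <i>, because it is a whitespace-only text node, and it writes a newline plus indentation inside b and i just before their end tags. Apart from the XML declaration and a leading newline, the output is:

      <p>Hello 
         <b>big
         </b>
         <i>world
         </i>
      </p>

"big" and "world" are no longer separated by a space, and b and i now contain extra whitespace. Stripping whitespace-only text nodes is harmless where whitespace is insignificant, as between purely structural elements, but in mixed content like this it changes the text.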

2.

Pretty printing, second version

John Mongan

I made a minor improvement to one of the examples here, and I'd like to contribute it as thanks for all the useful information I've found. Nikolai Grigoriev's pretty printer above works quite well, with one flaw: it turns single-tag empty elements into start-tag/end-tag pairs. For instance:

      <foo/>
becomes
      <foo>
      </foo>

This is easily solved by wrapping the xsl:value-of that indents the closing tag in an xsl:if, so that the indentation is written only when the element has element children. Here is my modified version:


   <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml"/>
   <xsl:param name="indent-increment" select="'   '" />

   <xsl:template match="*">
      <xsl:param name="indent" select="'&#xA;'"/>

      <xsl:value-of select="$indent"/>
      <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:apply-templates>
          <xsl:with-param name="indent"
               select="concat($indent, $indent-increment)"/>
        </xsl:apply-templates>
        <xsl:if test="*">
          <xsl:value-of select="$indent"/>
        </xsl:if>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="comment()|processing-instruction()">
      <xsl:copy />
   </xsl:template>

   <!-- WARNING: this is dangerous. It strips every whitespace-only text node,
        including significant spaces in mixed content. Handle with care -->
   <xsl:template match="text()[normalize-space(.)='']"/>

</xsl:stylesheet>
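A hand trace shows what the xsl:if buys. The input below is a made-up example, and the result assumes a serializer that writes childless elements in the short <foo/> form, which is typical. Given:

      <root><foo/><bar>text</bar><baz><qux/></baz></root>

the modified stylesheet produces, after a leading newline:

      <root>
         <foo/>
         <bar>text</bar>
         <baz>
            <qux/>
         </baz>
      </root>

Because the test is on element children (*), a text-only element such as bar also stays on one line. The earlier version would have written a newline and indentation before </bar>, changing its text content.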

3.

XML Pretty Printer

Etienne Posthumus, Oliver Becker

I use Tidy, a great piece of software. It can be found at: tidy

Oliver Becker adds, grinning:

It's not Tidy, it's XSLT :-) Take an identity (copy) stylesheet and set indent="yes" on xsl:output.

<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" /> <!-- here's the trick -->

<xsl:template match="*">
   <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
   </xsl:copy>
</xsl:template>

<xsl:template match="comment()|processing-instruction()">
   <xsl:copy />
</xsl:template>

</xsl:stylesheet>

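Saxon, LotusXSL and Oracle's XSLT processor do the job quite well; XT doesn't indent. (In XSLT 1.0, indent="yes" only permits the processor to add whitespace; it does not require it, so results vary between processors.)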

4.

How can I reindent an XML-file for readability?

Michael Kay


> I think that there must be a simple XSL to reindent an XML 
> for readability
<xsl:stylesheet ...>
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
  <xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Thomas B. Passin suggests:

Use HTML Tidy. When set up for XML input and output, it does a nice job of pretty-printing; you would run Tidy over the output of your transform as a post-processing step.

Tidy is available on the W3C web site, www.w3c.org. It is used in several very good editing applications, including HTML-kit and XML Cooktop. With these, you can run it in an editing environment instead of from a command line if that's easier for you.

5.

DTD pretty printing

Warren Hedley

Here's my entry to the (unofficial) XSLT Hall of Shame, for posterity: a set of stylesheets that can "pretty print" the text of a DTD that has been preprocessed by escaping every less-than sign. I'm confident this must rank pretty highly among the most inappropriate uses of XSLT out there.

The documentation and download are here:

If you want to skip straight to some examples, here's the XHTML 1.0 Transitional DTD as HTML and as PDF (the latter generated from LaTeX):

http://www.physiome.org.nz/xslt_tools/dtd_pretty_printer/xhtml_1_transitional_dtd.html

http://www.physiome.org.nz/xslt_tools/dtd_pretty_printer/xhtml_1_transitional_dtd.pdf

The documentation (particularly the comments in the stylesheets themselves) is somewhat sparse; someday I may get around to adding more.

Note that an XML pretty printer is also available on this website. It's similar to Oliver Becker's (I think we were developing them at the same time), but mine has some additional formatting features.

The terms of use are completely open, so feel free to go ahead and use these yourself.

If anyone's wondering "why?", these tools were developed as a means to decently document another language I'm working on: CellML. The final version of its specification will be out shortly. The current spec, from 18 May, is available in its entirety here: