Chapter 2. Xproc basics

Revision History
Revision 0.1, 2008-12-05T16:31:22Z, Dave Pawson
Initial Issue

Table of Contents

Basics

Basics

Hello World

So what is the hello world of Xproc? Well, ndw (Norman Walsh) provides a very simple pipeline which reads an inline document and copies it to its output:

Example 2.1. Hello World

<?xml version="1.0"?>
<p:declare-step
    xmlns:p="http://www.w3.org/ns/xproc">         1
  <p:input port="source">                         2
    <p:inline>
     <doc>                                        3
       Congratulations! You've run your first pipeline!
     </doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>                       4

  <p:identity/>                                   5
</p:declare-step>

1

The pipeline is itself a step, declared with p:declare-step in the Xproc namespace

2

The input specification, here supplied inline

3

The input content

4

The output specification

5

This is the identity step


And that's it! The input (to the pipeline) is ... nothing from outside: p:input both declares the pipeline's input port and provides the document for it. The output is that same simple piece of XML, copied across by the identity step, and p:output declares the port on which it appears. Except that it doesn't output 'Hello World'. Feel free to right that wrong if you so wish!
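Righting that wrong only needs a change to the inline document. The sketch below is my variant, not ndw's original. The file name hello.xpl is arbitrary, and the version="1.0" attribute is an addition: the final XProc 1.0 Recommendation requires it on the top-level step, but a CR-era Calabash may not expect it, in which case remove it.

```xml
<?xml version="1.0"?>
<!-- hello.xpl (hypothetical name): Example 2.1 with the inline document
     changed; version="1.0" follows the final XProc 1.0 Recommendation -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source">
    <p:inline>
      <doc>Hello World</doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- copy the inline document from source to result unchanged -->
  <p:identity/>
</p:declare-step>
```

Run it in the same way as pipe.xpl; the result should be the doc element containing Hello World.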

To use Calabash, Norman Walsh's XProc implementation, I downloaded it from here and then added saxon9-s9api.jar to the classpath. That gave me a script (calabash.sh) something like this:

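#!/bin/bash
# calabash.sh: run Calabash with Saxon's s9api jar on the classpath.
# The classpath must be one word: a space after the ':' would end the assignment.
cp=/files/xproc/calabash/lib/calabash.jar:/myjava/saxon9-s9api.jar

if [ $# -lt 1 ]
then
  echo "usage: $0 params"
  echo "Where params are the calabash parameters"
  exit 2
fi

# "$@" passes each argument through unchanged, even if it contains spaces
java -cp "$cp" com.xmlcalabash.drivers.Main "$@"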


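or, for Windows (calabash.bat):

@echo off
rem calabash.bat: the Windows equivalent of calabash.sh
rem cmd.exe has no backslash line continuation, so the classpath stays on one line
set cp=\files\xproc\calabash\lib\calabash.jar;\myjava\saxon9-s9api.jar

java -cp %cp% com.xmlcalabash.drivers.Main %*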

which works for me.

Save Example 2.1 as pipe.xpl and run it with

  $ calabash.sh pipe.xpl

(or, for Windows)

  > calabash.bat pipe.xpl

... and you should see an output of

<doc>
Congratulations! You've run your first pipeline!
</doc>

That is nearly as good as Hello World. The Calabash documentation is a little sparse; try

  $ calabash.sh --help

to see all the command line options. I'll leave it at that for the moment. Now back to Example 2.1.

Looking at the pipeline as a black box, there is no external input (the document is provided explicitly by p:inline). The output is a well-formed XML instance, and the step is the identity step, which simply makes a copy of its input available on its output. Those three things form the basis of Xproc: an input, a step and an output. Now to take this just a little further.

The input in Example 2.1 is defined inside the pipeline. The next obvious step is to take the input from an external file, as Example 2.2 (pipe2.xpl) shows.

Example 2.2. A pipeline with input from an external entity

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source" primary='true'/>  1
  <p:output port='result'/>
  <p:identity/>
</p:declare-step>  

1

Source specification


The first difference is that the simple XML instance is now an external file, with content similar to the p:inline content in Example 2.1. The step is the same identity step, copying the input to the output. The second difference is that Calabash is driven in a slightly different way, so that the output is directed to a new XML file. The command to do this is

  $ calabash.sh -i source=doc.xml -o result=op.xml pipe2.xpl

This connects the pipeline's source port to the external document doc.xml and connects its result port to the output file op.xml.

Running that command should produce an output file identical to the output of the earlier exercise, provided doc.xml has the same content as the inline document.
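The chapter does not show doc.xml itself. Assuming it simply mirrors the inline content of Example 2.1, a minimal version might be:

```xml
<?xml version="1.0"?>
<!-- doc.xml: assumed content, mirroring the p:inline document of Example 2.1 -->
<doc>
  Congratulations! You've run your first pipeline!
</doc>
```

Any well-formed XML document will do, since the identity step copies whatever arrives on the source port (including the comment, which is part of the document).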

Take a look at associating ports with documents in the XProc Candidate Recommendation (CR), which shows how the link is made. Example 2.2 shows one way to do it; Example 2.3 (pipe3.xpl) shows a variant.

Example 2.3. A pipeline that names its input document

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source" primary='true'>  1
   <p:document href="doc.xml"/>
  </p:input>

  <p:output port='result'/>
  <p:identity/>
</p:declare-step>  

1

Source specification now references a URI


In this example the source document is specified explicitly within the pipeline, so the -i option is no longer needed on the Calabash command line:

  $ calabash.sh -o result=op.xml pipe3.xpl

As before, the output is written to the file op.xml.

The source and result ports are the inputs and outputs of the pipeline itself. By naming other ports and associating them with URIs (or local files), further XML instances can be brought into play.
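As a sketch of bringing in a second document this way, the pipeline below connects the insertion port of the standard p:insert step to a second file. The file name extra.xml is hypothetical, and version="1.0" follows the final XProc 1.0 Recommendation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- p:insert's source port is connected implicitly to the pipeline's
       source port; its insertion port is named explicitly and tied to a
       second file (extra.xml is a hypothetical name) -->
  <p:insert match="/*" position="last-child">
    <p:input port="insertion">
      <p:document href="extra.xml"/>
    </p:input>
  </p:insert>
</p:declare-step>
```

Run with -i source=doc.xml, the root element of the result should hold the content of doc.xml followed by a copy of the root element of extra.xml.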

This introduces the vocabulary of Xproc. A step processes the documents arriving on its input ports and produces XML documents on its output ports (most commonly the result port).

Any step input can be connected in one of the four ways described in the CR: to a URI (or file) with p:document; to the output port of another step with p:pipe; to nothing at all with p:empty, an almost trivial but useful option; or to some XML written inline with p:inline, as in the first example, Example 2.1.
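The sketch below uses all four kinds of connection in one pipeline. The step names are made up for illustration, doc.xml is assumed to exist, and version="1.0" follows the final XProc 1.0 Recommendation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- sequence="true": the last step emits two documents -->
  <p:output port="result" sequence="true"/>

  <!-- 1. p:document: read a document from a URI (assumed to exist) -->
  <p:identity name="from-file">
    <p:input port="source">
      <p:document href="doc.xml"/>
    </p:input>
  </p:identity>

  <!-- 2. p:inline: supply the XML inside the pipeline itself -->
  <p:identity name="from-inline">
    <p:input port="source">
      <p:inline><doc>Hello World</doc></p:inline>
    </p:input>
  </p:identity>

  <!-- 3. p:empty: an empty sequence; this step's output is not used -->
  <p:identity name="nothing">
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:identity>

  <!-- 4. p:pipe: read the result ports of two named steps; both
       documents arrive on one port, as a sequence -->
  <p:identity>
    <p:input port="source">
      <p:pipe step="from-file" port="result"/>
      <p:pipe step="from-inline" port="result"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

p:identity accepts a sequence on its source port, which is why the empty and two-document connections are allowed here; a step whose input takes exactly one document would reject them.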

A step is named with its name attribute, so an identity step named 'identity' might look like this:

<p:identity name="identity">
  <p:input port="source">
    <p:document href="doc.xml"/>
  </p:input>
  <!-- no p:output here: the result port is already declared by p:identity -->
</p:identity>

This shows a step which may be referenced by its name in another step, as per Example 2.4.

Example 2.4. Utilising another step's name

<p:xinclude name="xinclude1">
  <p:input port="source">
    <p:pipe step="identity" port="result"/>
  </p:input>
</p:xinclude>


This shows a step identified as 'xinclude1' taking its input from the 'result' port of another step named 'identity'. This is the way in which steps are connected.
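Put together, the named identity step and Example 2.4 make a complete pipeline. This is a sketch: it assumes doc.xml exists (with or without XInclude elements in it), and adds version="1.0" as the final XProc 1.0 Recommendation requires.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- unconnected, so it receives the primary output of the last step -->
  <p:output port="result"/>

  <!-- first step: read doc.xml (assumed to exist) -->
  <p:identity name="identity">
    <p:input port="source">
      <p:document href="doc.xml"/>
    </p:input>
  </p:identity>

  <!-- second step: connected by name to the first step's result port -->
  <p:xinclude name="xinclude1">
    <p:input port="source">
      <p:pipe step="identity" port="result"/>
    </p:input>
  </p:xinclude>
</p:declare-step>
```

Run it with just -o result=op.xml, since the input is named inside the pipeline.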

A less obvious step that can be named is the pipeline itself. It too can take a name attribute, which can be referenced when connecting a step to the pipeline's top-level input.
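For example, a pipeline named main (an arbitrary name) can hand its own source port to a step with p:pipe. A sketch, with version="1.0" as in the final XProc 1.0 Recommendation:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- "main" names the pipeline, so its source port can be read by name -->
  <p:identity>
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

Run it as in Example 2.2, supplying the document with -i source=doc.xml.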

The remaining aspect of Xproc basics is connecting one step to another. In Example 2.5 a p:store step is added after the identity step; here the connection is implicit, made by the order of the steps rather than through a named port as above.

Example 2.5. Two steps

<p:declare-step
    xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:document href="doc4.xml"/>
  </p:input>
  <p:identity/>                  1
  <p:store href="op.xml"/>       2
</p:declare-step>

 

1

The identity step, reading the document on the pipeline's source port

2

The p:store step, which reads the identity step's result port and writes that document to the named file op.xml

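Note that p:input and p:output are asymmetrical! A p:input can read a document from a URI with p:document, but there is no matching way for p:output to write its document to a file. To write the output to a named file (or URI), the p:store step is used.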

Of note here is the sequential ordering of the contents of the pipeline. The actual process (the identity step) follows the input definition and precedes the p:store element. This ordering defines the implied flow.

Note

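Unless otherwise controlled, the default, implied flow follows the sequential order of the steps: each step's primary input is connected to the primary output of the step before it (or, for the first step, to the pipeline's primary input).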
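To see what the implied flow amounts to, here is a sketch of Example 2.5 with every connection written out using p:pipe. The names main and copy are arbitrary, and version="1.0" follows the final XProc 1.0 Recommendation; the resulting op.xml should be the same.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source">
    <p:document href="doc4.xml"/>
  </p:input>

  <!-- explicitly read the pipeline's own source port -->
  <p:identity name="copy">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:identity>

  <!-- explicitly read the identity step's result port -->
  <p:store href="op.xml">
    <p:input port="source">
      <p:pipe step="copy" port="result"/>
    </p:input>
  </p:store>
</p:declare-step>
```

Writing the connections out is more verbose, but it is what lets a step read from somewhere other than the step immediately before it.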

There is more on this subject in Connections.