Having established that XProc is all about steps and connecting them, it is worth spending time understanding how those connections are made, what defaults and implied connections exist, and how the implementor fits in with external connections.
James puts it like this:
For XProc, XML data "flows" into a pipeline and between its steps through a series of connected ports. The two most common and important ports are the primary input port (usually called "source") and the primary output port (usually called "result"). That sets up the model to use when thinking about XML data (not binary, not anything else; this is an XML pipeline) and its processing by the steps of your pipeline.
It is likely, though not required, that any implementor will enable the primary input and output to be specified via the interface to the program, whether this is a command line parameter or a GUI. That provides an input and an output connection at the two outer edges. Just be aware that these ports are named source and result.
Calabash does this using the -i and -o switches. For example, to pass the document input.xml in to the pipeline as primary input, the switch would be
$calabash.sh -i source=input.xml somepipeline.xpl
This would make an association between that input file and the input port whose name is source.
Similarly, if the final output of a pipeline is required to be written to some file named output.xml, then for Calabash the command line might be
$calabash.sh -i source=input.xml -o result=output.xml somepipeline.xpl
For those used to Unix terms, stdin becomes the default source for the main input. The final output is delivered to stdout. These are the defaults unless you mess with them. That seems to make sense and is implied, i.e. you don't need to specify it in your pipeline.
If the requirement is to specify absolutely, within the pipeline document, the external input and output documents, another technique must be used. For the primary input (the equivalent of stdin) use
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
<p:input port="source">
<p:document href="doc4.xml"/>
</p:input>
For this step, this declares that input will come from an external document (doc4.xml). However, don't be tempted to do the same for output! To specify op.xml as the output of the pipeline, use
<p:store href="op.xml" />
No, it isn't obvious, is it? It will become clearer as you gain more experience of using XProc, honest!
Important: p:input and p:output are not symmetrical! Just keep that in mind!
Example 3.1 shows this in use for an identity step.
Example 3.1. Explicit input and output
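A minimal sketch of such a pipeline, pieced together from the fragments above, might look like this. The file names doc4.xml and op.xml come from the text; the version attribute is my addition (the final Recommendation requires it on a top-level pipeline), so treat this as illustrative rather than as the original listing.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Default binding for the primary input: an external document -->
  <p:input port="source">
    <p:document href="doc4.xml"/>
  </p:input>

  <!-- Identity step: copies its source to its result unchanged -->
  <p:identity/>

  <!-- Output is written by a step, not declared on a p:output -->
  <p:store href="op.xml"/>
</p:declare-step>
```

Note that no p:output is declared at all. The p:store step consumes the result of p:identity through the default connection and writes it to op.xml.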
That is how connections are specified between the pipeline and the environment. Remember that the href attribute can take a URI, so you can use this to draw on data from the internet.
The CR says:
All p:pipeline pipelines have an implicit primary input port named “source” and an implicit primary output port named “result”. Any input or output ports that the p:pipeline declares explicitly are in addition to those ports and may not be declared primary.
Whereas, for a p:declare-step element,
A p:declare-step provides the type and signature of an atomic step or pipeline. It declares the inputs, outputs, and options for all steps of that type.
I found the implications of this rather subtle. Since, as a very general rule, we can use either, when should we choose one over the other?
Note that the former has implicit input and output ports, whereas p:declare-step has them only if you declare them. So, with p:pipeline, if you want to specify an input URI or file, you must use your implementer's way of getting XML to the source port of the pipeline. Likewise, if you want to specify a fixed output URI or file, you must use your implementer's method. Example 3.2 shows the use of a pipeline with Calabash to collect the system properties.
Example 3.2. Using a pipeline with parameters from the application
<?xml version="1.0"?>
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <!--<p:input port='mysource' primary="false"><p:document href="sysprop.xml"/> </p:input> -->
  <p:string-replace match="/doc/episode/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:episode'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/language/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:language'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/product-name/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:product-name'), ' &quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/product-version/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:product-version'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/vendor/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:vendor'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/vendor-uri/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:vendor-uri'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/version/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:version'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/xpath-version/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:xpath-version'), '&quot;')"/>
  </p:string-replace>
  <p:string-replace match="/doc/psvi-supported/@value">
    <p:with-option name="replace" select="concat('&quot;',p:system-property('p:psvi-supported'), '&quot;')"/>
  </p:string-replace>
</p:pipeline>
Norm suggests, if you happen to want a pipeline that:
1. Has a single non-sequence input
2. Has a single non-sequence output
3. And allows parameters
then p:pipeline is a convenient syntactic shorthand for the p:declare-step that would provide the same features.
We expect this to be a common case, so I'd probably suggest that most people start with p:pipeline most of the time.
If you want to be more explicit (have a little more control), then use p:declare-step.
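To make the shorthand concrete, here is a sketch of the p:declare-step that a bare p:pipeline stands for, assuming the rules of the final Recommendation (implicit source, parameters and result ports). The p:identity step is only a placeholder body, and the version attribute is my addition.

```xml
<!-- Roughly what <p:pipeline> ... </p:pipeline> expands to -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- 1. a single non-sequence primary input -->
  <p:input port="source"/>
  <!-- 3. a parameter input port, so parameters are allowed -->
  <p:input port="parameters" kind="parameter"/>
  <!-- 2. a single non-sequence primary output -->
  <p:output port="result"/>

  <!-- placeholder body; any subpipeline goes here -->
  <p:identity/>
</p:declare-step>
```

Writing it out this way is only worthwhile when you need to change one of those three declarations, for example to allow a sequence on the source port.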
That provides the external links to the pipeline. So how do we do the same from within a pipeline? The implication is that the pipeline will do a fixed job, and hence doesn't need any command line parameters. I think this needs explaining because of the defaults in place. They were put there for good reason, I'm sure, as time-saving devices, but I found them confusing at first.
As a good example of making good use of the defaults, take a look at example 1 in the CR/REC (duplicated here for convenience)
Example 3.3. A linear pipeline example
<p:declare-step
    xmlns:p="http://www.w3.org/ns/xproc"
    name="xinclude-and-validate">
  <p:input port="source" primary="true"/>
  <p:input port="schemas" sequence="true"/>
  <p:output port="result">
    <p:pipe step="validated" port="result"/>
  </p:output>
  <p:xinclude name="included">
    <p:input port="source">
      <p:pipe step="xinclude-and-validate"
              port="source"/>
    </p:input>
  </p:xinclude>
  <p:validate-with-xml-schema
      name="validated">
    <p:input port="source">
      <p:pipe step="included"
              port="result"/>
    </p:input>
    <p:input port="schema">
      <p:pipe step="xinclude-and-validate"
              port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:declare-step>
1. The source input is the main source, which defaults to stdin.
2. The schemas input is an ancillary input, for the XSD schema.
3. The result output is taken from the result port of a step called 'validated'.
4. The p:xinclude step takes its input from the source port of the pipeline (note that the step named in the pipe, xinclude-and-validate, is the pipeline itself), hence from stdin.
5. The validation step takes its input from the result port of the 'included' step (note this is a pipe connection between two steps).
6. The schema input of the validation step is an ancillary input, taking its input from the schemas port of the pipeline.
It's worth spending time getting your head round that one. Draw out the steps and the connections between them if it helps, or talk them through with yourself. You'll get the feel of it after a while. It's just strange at first.
Figure 3.1 shows this graphically.
Notice the explicit connections in the pipeline, shown in the diagram.
This pipeline could have been written less verbosely, but it is nice to see how explicit connections can be named and used in more complex pipelines. The XProc CR shows this in the abbreviated form, using all defaults.
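For comparison, the abbreviated form looks something like the sketch below. It relies on p:pipeline for the implicit source and result ports and on the default connections between adjacent steps. Only the schemas port, which has no default to fall back on, still needs an explicit pipe. The version attribute reflects the final Recommendation and may not appear in older drafts.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            name="xinclude-and-validate" version="1.0">
  <!-- Extra, non-primary input; source and result are implicit -->
  <p:input port="schemas" sequence="true"/>

  <!-- Reads the pipeline's source port by default -->
  <p:xinclude/>

  <!-- Reads the result of p:xinclude by default -->
  <p:validate-with-xml-schema>
    <!-- No default exists for a non-primary input, so pipe it in -->
    <p:input port="schema">
      <p:pipe step="xinclude-and-validate" port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>
```

Neither step needs a name here, because nothing refers back to them. The pipeline keeps its name only so that the schema input can point at its schemas port.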
James Sulak has a post on his site which gives another view of internal connections.
Given a pipeline where everything uses defaults, flowing one into another and finally to the default result port, it may become necessary to interrupt that flow for some reason. In order to 'bridge' the gap produced, it is necessary, on one step, to create a link back to a previous step manually.
This technique may also be used to link back to earlier steps. The general principle is shown below
pipeline
  step1
  step2
  (insert wanted here)
  step3
becomes
pipeline
  step1
  step2
  xxxxx - inserted step
  step3
    input port='source'
      pipe
        step='step2' (or some earlier step)
        port='result'
This creates a pipe back from within step3, over the inserted step, to step2. This is illustrated in Figure 3.2.
Syntactically this is shown in Example 3.4
Example 3.4. Bridging across steps
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="props">
  <p:input port="source" kind="document" >
    <p:document href="sysprop.xml"/>
  </p:input>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:variable name="product-name" select="'Fribble Widgets'" />
  <p:string-replace match="/doc/episode/@value">
    <p:with-option name="replace"
                   select="concat('&quot;',p:system-property('p:episode'),
                           '&quot;')"/>
  </p:string-replace>
  <p:string-replace name="sr3"
                    match="/doc/language/@value">
    <p:with-option name="replace"
                   select="concat('&quot;',
                           p:system-property('p:language'),
                           '&quot;')"/>
  </p:string-replace>
  <!-- This 'breaks' the default flow -->
  <p:identity/>
  <p:string-replace match="/doc/product-name/@value" >
    <p:with-option name="replace"
                   select="concat('&quot;',
                           p:system-property('p:product-name'),
                           '&quot;')"/>
    <!-- This re-joins the link,
         between step sr3 and this step -->
    <p:input port="source">
      <p:pipe
          step="sr3"
          port="result"/>
    </p:input>
  </p:string-replace>
  <p:store href="op.xml"/>
</p:declare-step>
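1. The earlier step needs an identifying name (here, name="sr3").
2. The p:identity step 'breaks' the default flow.
3. Within the succeeding step, add an input specification, using p:input.
4. Select the appropriate step as the input to this one, using the step attribute of p:pipe.
5. And select the port on that step, using the port attribute.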
This shows how a step is connected to one other than the immediate preceding-sibling step.
The basic statement for XProc is that inputs precede steps, and steps flow content sequentially from one to another in the order in which they are written. To model a pipeline of three steps:
pipeline
  step1
  step2
  step3
/pipeline
Without explicit connections, the input from outside the pipeline is connected to the pipeline's top-level input (the source port), which in turn is connected to the source port of the first step (step1). The result port of step1 is connected to the source port of step2 and so on, until the output (result port) of step3 is connected to the pipeline's top-level output, which then shows as stdout on the diagram.
Two terms are used here: the primary readable input (the source port), which is the pipeline's main input, and the top-level output port, which shows as stdout on the diagram above.
If this default action matches your needs, then you don't need to add explicit connections. Simply place the steps in an appropriate sequence within the pipeline.
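As a sketch of that default behaviour, the pipeline below has three steps and no explicit connections at all. The particular steps (p:xinclude, p:delete, p:identity) are my choice for illustration only; any steps with a primary source input and a primary result output would chain in the same way.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- step1: source defaults to the pipeline's source port -->
  <p:xinclude/>

  <!-- step2: source defaults to the result of step1 -->
  <p:delete match="comment()"/>

  <!-- step3: its result becomes the pipeline's result port -->
  <p:identity/>
</p:pipeline>
```

Reordering the three step elements reorders the processing, which is all the "wiring" this pipeline has.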
One final word on connections. It is an error to connect inputs and outputs such that a loop is created (see XProc 2.4).
James blogged about default and explicit links with a good example.
This is a collation of information I obtained when I tried to understand how input ports and parameter ports are used for p:xslt.
Of note is the fact that a parameter port is a special kind of port. It has different binding rules from input ports.
Firstly, don't mix up a "parameter port", which is like a pipe, and a "parameter", which is data travelling inside the pipe.
For example, taking the xslt 'step'
Example 3.5.
<p:declare-step type="p:xslt" xml:id="xslt">
<p:input port="source" sequence="true" primary="true"/>
<p:input port="stylesheet"/>
<p:input port="parameters" kind="parameter"/>
<p:output port="result" primary="true"/>
<p:output port="secondary" sequence="true"/>
<p:option name="initial-mode"/>
<p:option name="template-name"/>
<p:option name="output-base-uri"/>
<p:option name="version"/>
</p:declare-step>
All the inputs are 'required', although some can be defaulted. The conditions under which that happens aren't straightforward.
The primary data port is first. The stylesheet data port is next. The parameter port is third.
Note. The primary parameter input of the step, whilst not marked as primary, is the only parameter input port on the step, hence it becomes implicitly the primary parameter port!
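To make the pipe-versus-data distinction concrete, here is a hedged sketch in which the parameters port of p:xslt (the pipe) is bound to an inline document carrying one parameter (the data). The stylesheet name style.xsl and the parameter title are hypothetical, and the element names c:param-set and c:param are those of the final Recommendation; older drafts may differ.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:xslt>
    <!-- style.xsl is a hypothetical stylesheet -->
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- The port is the pipe ... -->
    <p:input port="parameters">
      <p:inline>
        <!-- ... and the parameters are what flows through it -->
        <c:param-set>
          <c:param name="title" value="Hello"/>
        </c:param-set>
      </p:inline>
    </p:input>
  </p:xslt>
</p:declare-step>
```

The stylesheet would see title as an ordinary xsl:param. The source port of p:xslt is left to its default, the source port of the pipeline.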
The XProc processor always attempts to find a default binding based on the so-called "default readable" port. For normal inputs, it is usually the output of the preceding step, but for parameter input ports it is the primary parameter input port of the containing pipeline. If that container is the top-level pipeline, then the association is implementor defined!
If the processor fails to find or manufacture a default binding, you will get an error.
Important: Even if you don't want to use parameters, you must satisfy the requirement for sourcing the parameter port!
Note, from the XProc specification:
If no binding is provided for a primary input port, the input will be bound to the default readable port. It is a static error (err:XS0032) if no binding is provided and the default readable port is undefined.
With regard to p:xslt, you don't technically have to have any child elements as long as the default readable port is defined and you don't mind if all your inputs are bound to that.
If you leave a parameter input port unbound, there are default rules for that too. And if there's nothing for the default to bind to, that's an error.
So, one of the following must be true:
1. You declared a parameter input port on your top-level pipeline.
2. You used p:pipeline to declare your top-level pipeline (this satisfies point 1 by default).
3. You provided an explicit binding for the 'parameter' input port on your p:xslt step.
Here's how it works for parameter ports.
If you don't specify a binding for the 'parameter' port, then it binds by default to the parameter port of the pipeline that contains it. This way, parameters you pass to the pipeline automatically get passed to the steps that can use them.
If there is no binding for a parameter input port on the top level pipeline (the one that you start executing first), then it effectively is bound to an empty sequence.
Parameter input ports always accept a sequence, so if you don't pass any documents to it, that's just an empty sequence. But that's not exactly the same as binding it to p:empty.
If you declare your pipeline with <p:pipeline>, you get a parameter input port by default and things "just work".
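A minimal sketch of the "just works" case follows. The stylesheet name style.xsl is hypothetical and the version attribute is my addition. Nothing is said about parameters anywhere, yet the parameters port of p:xslt is satisfied by the implicit parameter input port of the p:pipeline.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:xslt>
    <!-- style.xsl is a hypothetical stylesheet -->
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- source: defaults to the pipeline's source port -->
    <!-- parameters: defaults to the pipeline's implicit parameters port -->
  </p:xslt>
</p:pipeline>
```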
If you declare your pipeline with <p:declare-step>, then you have to either remember to provide a parameter input port explicitly:
<p:declare-step ...>
  <p:input port="parameters" kind="parameter"/>
  <p:input port="source"/>
  ...
Or you have to remember to explicitly provide a binding when you use the XSLT step:
<p:xslt>
  <p:input port="parameters">
    <p:empty/>
  </p:input>
  <!-- other inputs as required -->
</p:xslt>
A step could define more than one parameter input port (some standard steps do). The defaulting rules for the primary parameter input port (if there is one) and the non-primary ones are a little different. The primary one gets bound back to the pipeline parameters; the non-primary ones just get an empty sequence if left unbound.
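Putting the explicit-binding option into a complete pipeline might look like the sketch below. It assumes a hypothetical stylesheet style.xsl that needs no parameters, and the version attribute is my addition. Because the pipeline declares no parameter input port of its own, the p:empty binding is what keeps the processor from complaining.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- No parameter input port is declared on the pipeline -->
  <p:input port="source"/>
  <p:output port="result"/>

  <p:xslt>
    <!-- style.xsl is a hypothetical stylesheet -->
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- Explicitly say: no parameters, and don't look for a default -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
</p:declare-step>
```

Removing the parameters binding from this sketch should reproduce the static error described above, since there would then be no pipeline parameter port for the default to bind to.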
So what about the primary parameter input port of the containing pipeline? If there is no binding there, is it bound to an empty sequence, or not?
It's bound to whatever the implementation decides to bind it to. How inputs are connected to XML documents outside the pipeline is implementation-defined.
In Calabash, if you pass a binding for that port on the command line, that's what it gets bound to. If you pass parameters on the command line, Calabash manufactures a c:parameter-set with those parameters and that's what it gets bound to. If you do neither of those, it gets bound to an empty sequence.