For some uses, initial validation checks are required. Unless these checks pass, it is sometimes not worth carrying on with the remainder of the validation. Schematron addresses this requirement with phases. For instance, if our input document has no chapters, there is little value in checking all the (non-existent) chapters. So a simple early check would be to ensure that the root element has at least one chapter. That could be the first phase of validation. For very complex validation processes, this can save time by reducing the amount of output that has to be analysed. The standard describes this usage of phases as progressive validation, which is a good description of what is happening.
As an example of this, see Example 6.1, “A Schematron file showing two phases”, which shows a Schematron file with two phases. The first phase carries out document level checks, in this case checking that the document element has title and isbn elements as children. The second phase carries out the chapter level checks.
Example 6.1. A Schematron file showing two phases
<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
            xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            xmlns:sch="http://www.ascc.net/xml/schematron"
            queryBinding='xslt2'
            schemaVersion="ISO19757-3"
            defaultPhase='#ALL'>
  <iso:title>Test ISO schematron file. Introduction mode </iso:title>

  <!-- Each phase activates one pattern, referenced by the pattern's id -->
  <phase id="docs">
    <active pattern="doc.checks"/>
  </phase>
  <phase id="chaps">
    <active pattern="chap.checks"/>
  </phase>

  <iso:pattern id="doc.checks">
    <iso:title>checking an XXX document</iso:title>
    <iso:rule context="doc">
      <iso:report test="chapter">Report date.<iso:value-of select="current-dateTime()"/></iso:report>
      <iso:report test="title and isbn">Report for book with ISBN <iso:value-of select="isbn"/></iso:report>
    </iso:rule>
  </iso:pattern>

  <iso:pattern id="chap.checks">
    <iso:title>Basic Chapter checks</iso:title>
    <iso:p>All chapter level checks. </iso:p>
    <iso:rule context="chapter">
      <iso:assert test="title">Chapter should have a title</iso:assert>
      <iso:assert test="count(para) >= 1">A chapter must have one or more paragraphs</iso:assert>
      <iso:assert test="*[1][self::title]"><iso:name/> must have title as first child</iso:assert>
      <iso:assert test="@id">All chapters must have an ID attribute</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>
In the document element, the attribute defaultPhase is set to the (case sensitive) string #ALL. This ensures that if no phase is specified on the command line, all phases are run, which is generally a sensible fallback position. The overall objective is to enable runtime flexibility. When I want to select a particular phase, I can do so via a command line parameter, set as follows.
java -mx250m -ms250m -cp \
.;\myjava;\myjava\saxon8.jar;\myjava\xercesImpl.jar net.sf.saxon.Transform \
-x org.apache.xerces.parsers.SAXParser -w1 -o tmp.xsl \
%1.sch iso_svrl.xsl "generate-paths=yes" "phase=docs"
This passes the parameter 'phase' through to iso_svrl.xsl, which that stylesheet then uses to select one of the phases specified in the Schematron file. Just below the title element are the two phases in this example. The first does document level checks, the second does chapter level checks. Make sure you don't confuse the pattern id values with the phase id values when specifying the phase: it is the phase id that is being referenced, not the pattern id. The possible values (in this instance) are either docs or chaps. In the modified command line above, the docs phase is selected.
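To make the distinction between phase ids and pattern ids concrete, here is a minimal sketch of a further phase that activates both patterns at once. The phase id full is hypothetical and is not part of Example 6.1; the pattern ids are the ones already defined there. The fragment is assumed to sit next to the existing docs and chaps phases.

```xml
<!-- Hypothetical extra phase, placed alongside the docs and chaps phases.
     The id "full" is an assumption made for illustration only. -->
<iso:phase id="full" xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <!-- Each active element names a pattern id, never a phase id -->
  <iso:active pattern="doc.checks"/>
  <iso:active pattern="chap.checks"/>
</iso:phase>
```

With this in place, the value passed on the command line would be phase=full, not doc.checks or chap.checks.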
To exercise these tests, the input file has been extended and now looks like Example 6.2, “An input file to use. input.phases.xml”.
Example 6.2. An input file to use. input.phases.xml
<?xml version="1.0" encoding="utf-8" ?>
<doc>
  <title>Book title</title>
  <isbn>12345678901</isbn>
  <chapter id="c1">
    <title>chapter title</title>
    <para>Chapter content</para>
  </chapter>
  <chapter id="c2">
    <title>chapter title</title>
    <para>xx</para>
    <para>yy</para>
    <para>zz</para>
  </chapter>
  <chapter id="c3">
    <para>Invalid first child of chapter</para>
    <title>chapter title</title>
    <para>xx</para>
    <para>yy</para>
    <para>zz</para>
  </chapter>
</doc>
The only additions are the document title and an ISBN. These are used as part of the document level checks.
Running the schema against this input with the docs phase selected produces output like Example 6.3, “The output file running the docs phase”.
Example 6.3. The output file running the docs phase
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        title="Test ISO schematron file. Introduction mode "
                        schemaVersion="ISO19757-3"
                        phase="docs">
  <svrl:active-pattern name="checking an XXX document" id="doc.checks"/>
  <svrl:fired-rule context="doc"/>
  <svrl:successful-report test="chapter" location="/doc[1]">
    <svrl:text>Report date.2007-01-23T11:36:16.546Z</svrl:text>
  </svrl:successful-report>
  <svrl:successful-report test="title and isbn" location="/doc[1]">
    <svrl:text>Report for book with ISBN 12345678901</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
The second successful-report element in the output gives the ISBN of the document being tested. Note that the phase in use is reported in the output, in the phase attribute of the svrl:schematron-output element. In order to run all phases, simply omit the phase command line parameter.
So now you can add more phases, select them from the command line … and generally be more selective in your Schematron validation. With additional control (say from a script or a Java program), the phases could be progressively run to fully validate the input document.
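As an illustration of adding a phase, the following is a minimal sketch of the early check described at the start of this section: a first phase that asserts the document has at least one chapter before any chapter level checks are attempted. The phase id structure, the pattern id structure.checks and the message wording are assumptions made for this sketch; they do not appear in Example 6.1. The chapter pattern is cut down to a single assertion to keep the sketch short.

```xml
<?xml version="1.0" encoding="utf-8"?>
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2"
            defaultPhase="#ALL">
  <iso:title>Sketch: an early structural phase</iso:title>

  <!-- Hypothetical first phase: run this before the chapter checks -->
  <iso:phase id="structure">
    <iso:active pattern="structure.checks"/>
  </iso:phase>

  <!-- Second phase, as in Example 6.1 -->
  <iso:phase id="chaps">
    <iso:active pattern="chap.checks"/>
  </iso:phase>

  <iso:pattern id="structure.checks">
    <iso:title>Document must contain chapters</iso:title>
    <iso:rule context="doc">
      <!-- An assert fails when its test is false, i.e. when there is no chapter child -->
      <iso:assert test="chapter">The document must contain at least one chapter</iso:assert>
    </iso:rule>
  </iso:pattern>

  <iso:pattern id="chap.checks">
    <iso:title>Basic Chapter checks</iso:title>
    <iso:rule context="chapter">
      <iso:assert test="title">Chapter should have a title</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>
```

A controlling script could run with phase=structure first and only go on to phase=chaps if the first run produces no failed assertions.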