In order to start processing files using Schematron, you're going to need a few files on your system. You will find the XSLT files on Schematron.com. The ones you want are:
iso_svrl.xsl
iso_schematron_skeleton.xsl
The remaining files you can type in, adding to them as needed. Our file needing validation starts off very simply. It has no DTD or Schema (we have Schematron!). It represents a book. Quite boring and very simple. Example 2.1, “File input.xml, the simplest input document” shows this file. We can add complexity when we need it to show Schematron features. So let's look for the constraints we want to apply.
Example 2.1. File input.xml, the simplest input document
<?xml version="1.0" encoding="utf-8" ?>
<doc>
  <chapter id="c1">
    <title>chapter title</title>
    <para>Chapter content</para>
  </chapter>
  <chapter id="c2">
    <title>chapter 2 title</title>
    <para>Content</para>
  </chapter>
  <chapter id="c3">
    <title>Title</title>
    <para>Chapter 3 content</para>
  </chapter>
</doc>
Now for the constraints. What rules do we want to apply? As you may imagine, I'm going to pick some that may be odd, primarily to demonstrate the functionality of Schematron. I'll try and keep them reasonably sensible.
The first rule is to check that each chapter has a title. Before defining that rule in the Schematron file we need to know something of the outline Schematron file that will be used in all the examples.
Since this file is testing the file input.xml, I'm going to name it input.sch. Example 2.2, “File input.sch, an empty Schematron file.” shows this file. I'm using .sch as the filename extension simply as a reminder that it is a Schematron file.
Example 2.2. File input.sch, an empty Schematron file.
<?xml version="1.0" encoding="utf-8"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
            xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            xmlns:dp="http://www.dpawson.co.uk/ns#"
            queryBinding='xslt2'
            schemaVersion='ISO19757-3'>
  <iso:title>Test ISO schematron file. Introduction mode</iso:title>
  <iso:ns prefix='dp' uri='http://www.dpawson.co.uk/ns#'/>
  <!-- Your constraints go here -->
</iso:schema>
The general heading for a Schematron file. Note the namespaces in use. Add them as normal for any XML file.
The required constraints go in the body of the file.
Do you think Schematron knows all about your namespaces? No. For each one specific to you, that you need, add it here as an ns element.
Not much to look at. The document element is in the Schematron namespace. The Schematron namespace http://purl.oclc.org/dsdl/schematron is associated with the prefix iso, as it is in all these examples. Previous versions of Schematron used the sch prefix. You can choose whatever prefix you want; just make sure it is bound to the right namespace.
The queryBinding attribute
specifies which version of XSLT we are going to use to process the
rules. The title is used in the final output as …
surprisingly, a document title! The only other content is a foreign
namespace definition. I've included it here simply to show how it's
done. We'll use it later. If your input document is namespaced,
you'll need to add the namespace in two places: as a declaration on
the document element, and as an ns element. That's it!
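As a rough sketch of that two-place declaration, suppose the input document lived in a namespace. The URI http://example.org/ns/book and the prefix bk below are invented purely for illustration; input.xml itself is not namespaced.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Place 1: xmlns:bk is the ordinary XML declaration on the document element.
     The URI and the bk prefix are hypothetical. -->
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            xmlns:bk="http://example.org/ns/book"
            queryBinding="xslt2">
  <iso:title>Namespaced input, illustration only</iso:title>
  <!-- Place 2: the ns element tells Schematron about the same prefix and URI -->
  <iso:ns prefix="bk" uri="http://example.org/ns/book"/>
  <!-- Constraints can now use the prefix, for example context="bk:chapter" -->
</iso:schema>
```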
Now to add the constraints at the place marked in Example 2.2, “File input.sch, an empty Schematron file.”
Start with the pattern element. This is basically a grouping wrapper. For example, we may
choose to group all related chapter-level checks within one
pattern. In this first example the pattern element holds a single rule
element, although a pattern may contain many rules.
It is good practice to restrict the number of
rules so that the group is coherent and can be quickly
understood.
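A minimal sketch of how the first constraint might look at the marked place in input.sch. The id value chapter-checks and the exact message wording are my own choices for illustration, and the fragment assumes the iso prefix is declared on the enclosing schema element as in Example 2.2.

```xml
<!-- One pattern groups the chapter-level checks -->
<iso:pattern id="chapter-checks">
  <!-- The rule is applied to every chapter element in the input -->
  <iso:rule context="chapter">
    <!-- The message text is output only when the test is false -->
    <iso:assert test="title">A chapter should have a title.</iso:assert>
  </iso:rule>
</iso:pattern>
```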
The rule element is at the heart
of Schematron. It expresses a rule that you want to run against
the input document. Two points to note here. Firstly, the
context attribute. This may be viewed in the same way as the match attribute
on the xsl:template element in an XSLT
stylesheet. The key point is that it specifies the context (used in
just the same way as a context is used in XSLT) in which the tests
will be applied. So for this case, the rule will be applied where the
context is the chapter element in our input.xml document.
Secondly, note that the rule element has just one child, an assert
element. As with patterns, it may have many child elements;
the context for all of them remains that specified by
the context attribute.
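To illustrate several children sharing one context, here is a hedged sketch of a rule with three assertions. The checks on para and the id attribute are invented for this illustration; only the title check comes from the text.

```xml
<iso:rule context="chapter">
  <!-- Every test below is evaluated with a chapter element as the context node -->
  <iso:assert test="title">A chapter should have a title.</iso:assert>
  <!-- Hypothetical extra checks, not part of the original rule set -->
  <iso:assert test="para">A chapter should contain at least one para.</iso:assert>
  <iso:assert test="@id">A chapter should carry an id attribute.</iso:assert>
</iso:rule>
```

All three tests pass for every chapter in input.xml, so none of these messages would be output for that file.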
A word of caution. Some rules are said to be
abstract. This is the case when the
abstract attribute has a value of true. If a rule
has a context attribute, then it cannot
have its abstract attribute set to
true. More on this later, see Chapter 9, The extends
element. The grammar for the rule element is, using pseudo-DTD syntax:
element rule
Either
attributes: abstract[true], id
children: Let*, (Assert | Report | extends)+
or
attributes: abstract[false]?, context, id?
children: Let*, (Assert | Report | extends)+
So a rule is either abstract or has a context. The latter use is the more common one.
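The two forms can be sketched side by side. This is an illustration under assumptions: the id values are made up, and the extends element is only shown in passing since it is covered in its own chapter.

```xml
<iso:pattern id="title-checks">
  <!-- Abstract form: abstract="true" and an id, but no context -->
  <iso:rule abstract="true" id="has-title">
    <iso:assert test="title">This element should have a title.</iso:assert>
  </iso:rule>
  <!-- Concrete form: a context, no abstract attribute (it defaults to false).
       extends pulls in the assertions of the abstract rule named by its rule attribute. -->
  <iso:rule context="chapter">
    <iso:extends rule="has-title"/>
  </iso:rule>
</iso:pattern>
```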
Finally, the assert
element.
We need a clear understanding of this element, so please slow down a
little reading this paragraph! Two aspects are key. Firstly the test
attribute, which acts in just the same
way as the test
attribute on the xsl:when
element in XSLT. It's a boolean test
returning either true or false. It is executed within the context of
the parent rule
(the chapter element in
this case). So if we look at the input document for which we are
writing the rules, for each chapter element, we are making an
assertion that the chapter
element has a
title
element as a child. That can only be
either true or false. A chapter has a title element as a direct child,
or it doesn't. That's the syntax. Now the semantics.
This is an assert statement. See section 5.4.2 in [2]. An assert statement is (sort of)
negative. What I mean by that is that if the test passes, the
assertion is said to succeed. The text content of the assert statement
(A chapter should.....) is the message you want to be output if the
assertion fails. What this means in fact is, if the test passes the
associated message is not output. If
the test fails, the message will be output! Now re-read this
paragraph. I know it made my head hurt. This is why it matches our
test for a title child element. If the
title is there, no message is output. If the title is missing, then
the test fails and the message is output to the report file.
It becomes easier to accept when you see its inverse, the report
element.
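A small sketch of the two side by side may help. Both children below produce a message for exactly the same chapters (those without a title); in practice you would pick one. The message wording is my own.

```xml
<iso:rule context="chapter">
  <!-- assert: the message is output when the test is FALSE -->
  <iso:assert test="title">A chapter should have a title.</iso:assert>
  <!-- report: the message is output when the test is TRUE -->
  <iso:report test="not(title)">This chapter has no title.</iso:report>
</iso:rule>
```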
To recap. We want to check if each chapter element has a
title. The context attribute on the rule
element is set to chapter. The assert
statement uses the test attribute to test
that such a child element exists. If the test fails, then the text contained within the assert
element is output in the report! That
completes the description of the first element. A little tedious, but
I hope worthwhile.
Before moving on to other elements, we should check that it all
works in practice. If all the tests pass, this is really quite
boring. It works on the principle that no news is good news, so a
test which passes does nothing. So the assert
should not report anything, since our input file
complies with our single constraint!
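To see the assertion actually fire, one could temporarily add a chapter that breaks the constraint. The c4 chapter below is invented for this purpose and is not part of input.xml as shown in Example 2.1.

```xml
<!-- Hypothetical extra chapter, added inside doc after chapter c3 -->
<chapter id="c4">
  <!-- No title child: the test "title" is false here, so the assert message is output -->
  <para>Chapter 4 content</para>
</chapter>
```

With this chapter in place, the assert message would be expected once, for c4 only; the three original chapters still pass silently.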