Validation of Markdown with Schematron

Posted on February 16, 2020 by Rick Jelliffe

The latest 22.0 version of the Oxygen XML editor (or really, IDE) has a feature I have not seen before, which is to validate Markdown with Schematron.

Of course, it has always been possible to transform from non-XML to XML and then validate that. And Schematron of course allows diagnostic messages in terms of the original notation. However, the XPaths generated will be in terms of the XML, not the original markup. I am not sure which approach Oxygen takes, but I can imagine several methods:

The forward transformation includes a back-link reference in the XML (XHTML) to the Markdown (e.g. to line and column numbers);
The XHTML DOM data structure has pointers to the Markdown data structures;
A version of XPath is made to operate directly on Markdown (unlikely);
Only fragments are offered for validation, so the code knows what the original locus was regardless of how it was transformed.
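
To make the first method concrete, here is a minimal sketch in Python. It is not how Oxygen does it (I do not know that); it just assumes a toy converter for a tiny Markdown subset that stamps each generated element with a hypothetical data-line attribute, and a hand-written check standing in for a Schematron rule. The function names and the attribute name are my own inventions for illustration.

```python
import re
import xml.etree.ElementTree as ET

LINE_ATTR = "data-line"  # hypothetical back-link attribute, not an Oxygen convention


def markdown_to_xml(text):
    """Convert a tiny Markdown subset (ATX headings, one-line paragraphs)
    to XHTML-like elements, recording each element's source line."""
    body = ET.Element("body")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines produce no element
        match = re.match(r"(#{1,6})\s+(.*)", line)
        if match:
            element = ET.SubElement(body, "h%d" % len(match.group(1)))
            element.text = match.group(2).strip()
        else:
            element = ET.SubElement(body, "p")
            element.text = line.strip()
        element.set(LINE_ATTR, str(lineno))
    return body


def check_heading_levels(body):
    """Stand-in for a Schematron assert: a heading must not skip a level.
    Returns (markdown_line, message) pairs, so the report is in terms of
    the original Markdown rather than an XPath into the XML."""
    messages = []
    previous = 0
    for element in body:
        if re.fullmatch(r"h[1-6]", element.tag):
            level = int(element.tag[1])
            if level > previous + 1:
                messages.append(
                    (int(element.get(LINE_ATTR)),
                     f"heading level jumps from {previous} to {level}")
                )
            previous = level
    return messages


SAMPLE = "# Title\n\nIntro text.\n\n### Too deep\n"
problems = check_heading_levels(markdown_to_xml(SAMPLE))
for line, message in problems:
    print(f"line {line}: {message}")
```

In a real setup the check would be an actual Schematron assert, and its diagnostic could simply print the value of the back-link attribute on the context node.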

I guess the key point here is that the Markdown to XML transformation is very simple. But it is still a really interesting issue: if we have a long pipeline of, say, 20 different transformations through three or four schemas and wrappers/splitters, how can we tie the result of validation to the original input? How much can be done statically by code analysis? How much can be done by piecing together logs? How much can be done by decorating the XML with trace or provenance info?
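
As a rough illustration of the last option, decorating the XML with provenance, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the prov attribute name, the two toy stages (a rename and a wrapper), and the identifier scheme of file name plus document-order index. The only point is that if every stage copies the attribute onto whatever it derives from an input element, a failure found at the end of the pipeline can still name its origin.

```python
import xml.etree.ElementTree as ET

PROV = "prov"  # hypothetical provenance attribute name


def tag_source(root, name):
    """Stamp every element of the original input with '<name>#<index>',
    where index is the element's position in document order."""
    for index, element in enumerate(root.iter()):
        element.set(PROV, f"{name}#{index}")
    return root


def rename_stage(root, mapping):
    """Toy transformation: build a new tree with renamed elements,
    copying the provenance attribute onto each derived element."""
    def convert(element):
        new = ET.Element(mapping.get(element.tag, element.tag))
        new.text = element.text
        new.set(PROV, element.get(PROV))
        for child in element:
            new.append(convert(child))
        return new
    return convert(root)


def wrap_stage(root, wrapper_tag):
    """Toy wrapper stage: the wrapper has no source element, so it is
    marked as generated rather than given a made-up origin."""
    wrapper = ET.Element(wrapper_tag)
    wrapper.set(PROV, "generated:wrap_stage")
    wrapper.append(root)
    return wrapper


source = tag_source(
    ET.fromstring("<doc><para>one</para><para/></doc>"), "input.xml"
)
result = wrap_stage(
    rename_stage(source, {"doc": "article", "para": "p"}), "envelope"
)

# Final validation, standing in for a Schematron rule: p must not be empty.
failures = [
    element.get(PROV)
    for element in result.iter("p")
    if not (element.text or "").strip()
]
print(failures)
```

This only works when each stage cooperates, and stages that split or merge elements would need some policy for what provenance the results carry, which is where the static analysis and log questions come back in.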