Getting a handle on DITA-OT extension points #DITA-OT #XSLT


Chris Papademetrious
 

Hi everyone,

It's time to learn how to develop proper DITA-OT content-processing plugins! I need to replace some of my very ugly CSS hacks with proper XSLT-based content adjustments. In addition, I would like to explore how the processing pipeline works so I can implement (and share!) some sort of cross-book link solution.

The DITA-OT documentation lists plenty of extension points. However, how do I learn what each of these points does? Some questions:

  • Is there a way to dump the content that flows through a given extension point, so I can see what content is being operated on?
  • Is there a way to list all the extension points in application order, so I could apply the previous item to do a before/after diff to see the changes between each extension point?

I always thought extension points provided additive content processing, where you apply additional XSLT code. However, when Radu provided me with a plugin to save keyref information in an HTML5 attribute, I saw that his plugin copied the entire template of

<xsl:template name="commonattributes">
...
</xsl:template>

out of

<DITA-OT>/plugins/org.dita.html5/xsl/topic.xsl

then added a few lines at the end for the new attribute. Now I am confused, which leads to more questions!

  • Will I always be copying some part of an existing DITA-OT template to augment it?
  • Or are there times when I just provide the additional template functionality by itself?
  • How do I know which XSLT files in the DITA-OT directory correspond with which extension point, to know what to copy?

I don't mind running experiments to learn. I really want to "instrument" the processing pipeline to dump out the content state at each step, then set up an interactive diff where I can see what each stage does, where information appears and disappears. This would make it easier to understand where best to wedge into the pipeline. Are the hooks there to do this?

Thanks!

 - Chris


Radu Coravu
 

Hi Chris,

Please see some suggestions below:

Is there a way to dump the content that flows through a given extension point, so I can see what content is being operated on?
You can try something like this:

<xsl:template match="/">
<xsl:message>ENTIRE CONTENT <xsl:copy-of select="."/></xsl:message>
<xsl:next-match/>
</xsl:template>
That <xsl:next-match/> should let the default processing continue, but I cannot guarantee that logging the entire document this way will work; it depends on how the XSLT stylesheets include each other.
Any xsl:message output should appear in the DITA-OT console view.
What usually works for me is logging smaller parts: I add a template with my own match and see which elements it gets applied to:

<xsl:template match="*[contains(@class, ' topic/image ')][@outputclass='test']">
<xsl:message>IMAGE CONTENT <xsl:copy-of select="."/></xsl:message>
<!-- continue with the default processing for this element -->
<xsl:next-match/>
</xsl:template>
You can also set the "clean.temp" parameter to "no"; after publishing, look in the transformation's temporary files folder, which contains the preprocessed content that the XSLT stylesheets are applied to.

Will I always be copying some part of an existing DITA-OT template to augment it?
Or are there times when I just provide the additional template functionality by itself?
How do I know which XSLT files in the DITA-OT directory correspond with which extension point, to know what to copy?
Usually I look in the Oxygen Attributes view for the @class attribute of the element that I want to style differently. For example, the DITA <image> element has @class="- topic/image ". Using the last token in the class attribute value (in this case 'topic/image'), I use the Find/Replace in Files utility to search the DITA-OT "plugins" folder for templates that match and process that element.
For example, if I want to style the HTML5 output, I search the "org.dita.html5" plugin folder for the template matching that element. Because both the Oxygen WebHelp output and the CSS-based PDF output are built on HTML5, using the "dita.xsl.html5" extension point to provide your custom XSLT stylesheet might fix things in both outputs.
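
For reference, a plugin contributes a stylesheet to that extension point through its plugin.xml. This is only a minimal sketch: the plugin id com.example.html5.custom and the path xsl/custom.xsl are made-up names for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- plugin.xml: hypothetical plugin id and file path, for illustration only -->
<plugin id="com.example.html5.custom">
  <!-- make sure the base HTML5 plugin is present -->
  <require plugin="org.dita.html5"/>
  <!-- contribute xsl/custom.xsl to the HTML5 topic transformation -->
  <feature extension="dita.xsl.html5" file="xsl/custom.xsl"/>
</plugin>
```

The plugin folder still has to be installed (integrated) into the toolkit before the stylesheet takes effect.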

Regards,
Radu

Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com



ekimber@contrext.com
 

For the most part you'll need to look at the implementation of a given extension point to see what it does.

As for overriding versus extending: that depends in large part on how the templates you're extending were implemented. Sometimes they are implemented in a way that allows for simple extension without having to copy a lot of code, and sometimes they aren't, so you just have to look at the base code and take it on a case-by-case basis.

As Radu points out, xsl:next-match can be a friend when you're just adding something before or after (or around) what the base template generates. But if you need to modify what the base template is generating and the template itself doesn't offer extension points (for example, by not doing apply templates when it could have) then you have to copy the base template and modify it. Of course, in that case it would be ideal to contribute an update to that template that provides the extension features you needed, but I realize we don't always have the luxury of implementing our immediate extensions and enhancements to the base OT code.
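
The "before or after (or around)" case can be sketched roughly like this. The choice of the note element and the comment text are just illustrations, and it assumes the stylesheet ends up with higher precedence than the base template for the same element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical example: wrap the base output for note elements.
       Assumes this template takes precedence over the base template
       for the same element (for example, because it is imported later). -->
  <xsl:template match="*[contains(@class, ' topic/note ')]">
    <!-- emitted before whatever the base template generates -->
    <xsl:comment> note starts </xsl:comment>
    <!-- hand over to the next-best matching template (the base one) -->
    <xsl:next-match/>
    <!-- emitted after whatever the base template generates -->
    <xsl:comment> note ends </xsl:comment>
  </xsl:template>

</xsl:stylesheet>
```

Nothing from the base template is copied here; the override only adds output around it.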

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 12/9/19, 7:35 PM, "Chris Papademetrious" <dita-users@groups.io on behalf of chrispitude@...> wrote:

[...]

The DITA-OT documentation lists plenty of extension points <https://www.dita-ot.org/dev/extension-points/plugin-extension-points.html>. However, how do I learn what each of these points does?

[...]


Chris Papademetrious
 

Thanks Radu and Eliot for the guidance!

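Based on the guidance here, I put together the following DITA-OT "instrumentation" utility:

chrispy-snps/DITA-instrument-pipeline <https://github.com/chrispy-snps/DITA-instrument-pipeline> (github)

Here's an example of what comes out the other end (see the numbered XML output files):

https://github.com/chrispy-snps/DITA-instrument-pipeline/tree/master/example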

I'll readily admit that I don't understand what's coming out the other end. :)  For example, I'm not sure if <xsl:message> prints the content before or after the default actions at that stage are applied. And I'm still not sure how or where the full HTML5 output file is put together; I was expecting to see that somewhere in here.

 - Chris


ekimber@contrext.com
 

The order that you see messages depends on the order that the XSLT engine actually processes things, which may not be the order they occur in the XSLT source.

In particular, variables will typically not be resolved until referenced, so if you have something like this:

<xsl:variable name="foo" as="element()*">
<xsl:message>+ [DEBUG] Constructing variable "foo" ...</xsl:message>
</xsl:variable>

<xsl:message>+ [DEBUG] Foo has been constructed</xsl:message>

<xsl:message>+ [DEBUG] Value of foo is: <xsl:sequence select="$foo"/></xsl:message>

You will probably see the message "+ [DEBUG] Foo has been constructed" before the message "+ [DEBUG] Constructing variable "foo"..."

You can, of course, force evaluation of a variable by using it in a message.

When I use messages for debugging XSLT, I'm very careful to put enough contextual information in the message so I can tell where in the code it was emitted: usually the mode in effect, the template name or match, and the names of the parent and context elements (if it's an element-processing template).
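
A message along those lines might look like the sketch below. The file name custom.xsl and the choice of the p element are assumptions for illustration, not anything from the OT code:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical debugging template for paragraphs -->
  <xsl:template match="*[contains(@class, ' topic/p ')]">
    <xsl:message>
      <!-- where the message comes from: file, mode, match pattern -->
      <xsl:text>+ [DEBUG] custom.xsl, default mode, match=topic/p: </xsl:text>
      <!-- what is being processed: parent name, context name, id (if any) -->
      <xsl:value-of select="concat('parent=', name(..), ', context=', name(.), ', id=', @id)"/>
    </xsl:message>
    <!-- continue with the default processing -->
    <xsl:next-match/>
  </xsl:template>

</xsl:stylesheet>
```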

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 12/10/19, 1:05 PM, "Chris Papademetrious" <dita-users@groups.io on behalf of chrispitude@...> wrote:

[...]

chrispy-snps/DITA-instrument-pipeline <https://github.com/chrispy-snps/DITA-instrument-pipeline> (github)

https://github.com/chrispy-snps/DITA-instrument-pipeline/tree/master/example

[...]


Chris Papademetrious
 

Hi Eliot,

The <xsl:message> instrumentation is applied only to the root element:

<xsl:template match="/*">
    <xsl:message>BEGIN 'dita.xsl.conref'</xsl:message>
    <xsl:message><xsl:copy-of select="."/></xsl:message>
    <xsl:message>END 'dita.xsl.conref'</xsl:message>
    <xsl:next-match/>
</xsl:template>

and I think this should output the entire root element after all subelements have been processed, but before the root element has been processed (which in retrospect, I'm not sure is very useful).

Another thing I'm trying to figure out is why I get different results if I use match="/*" versus match="/". The former includes XML PIs (expected), but there are other differences too.

I'm wondering if I need to wedge into the processing pipeline in some other way, but I don't know enough about how this works yet.

 - Chris


ekimber@contrext.com
 

The XPath "/" matches the document node, which may have PIs and comments before or after the root element. "/*" matches the root element of the document.

You're outputting the copy of the input before the call to xsl:next-match, so it will definitely be emitted first.

Also, you can use <xsl:sequence select="."/> instead of xsl:copy-of, although it will come to the same thing.

Since xsl:sequence select="." doesn't need to make a literal copy of the selected nodes, it might be more memory-efficient than xsl:copy-of, which does.

With XSLT 2+ there's really no reason to prefer xsl:copy-of over a simple select as far as I know.

Cheers,

E.

--
Eliot Kimber
http://contrext.com




Radu Coravu
 

Hi Chris,

I actually wanted at some point to find the time to blog about various XSLT writing tips; I added a link to this discussion to my open issue:

https://github.com/oxygenxml/blog/issues/4

There is a Saxon function which can be used in xsl:messages to print the stack trace of current templates:

<xsl:message><xsl:value-of select="saxon:print-stack()" xmlns:saxon="http://saxon.sf.net/"/></xsl:message>
I also wrote a topic about using Oxygen's XSLT debugger to debug DITA OT PDF output:

https://www.oxygenxml.com/doc/versions/21.1/ug-editor/topics/debugging-pdf-transformation.html

but that's for the classic XSL-FO-based output. If you are interested, I could try to come up with some steps for the HTML5+CSS to PDF transform as well.

Regards,
Radu

Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com



Chris Papademetrious
 

Hi Eliot,

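I found a StackOverflow answer that clarified a lot of my confusion on how XSLT works:

"In what order do templates in an XSLT document execute, and do they match on the source XML or the buffered output?" <https://stackoverflow.com/questions/1531664/in-what-order-do-templates-in-an-xslt-document-execute-and-do-they-match-on-the>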

I had to throw away my mental model of the input "morphing" into the output. I see now that there are separate input, transformation, and result trees, and that selection always applies to the input tree, and that once content is pushed to the result tree, your window of opportunity to modify it is gone (in that transform).

So regardless of where I place the <xsl:message> to print the XML, it will always be from the input XML tree and not the result XML tree. There is no way to show the pre- and post-transformation results via debug messages, at least not from the same XSLT.

Radu, I had a look at your debugging topic and also a YouTube video on the debugger, and it looks very powerful. I need to spend some time learning this. A version of that for HTML5 would be useful, particularly since PDF Chemistry and WebHelp both use that format.

Eliot, you mentioned that I'll need to copy/modify or extend depending on how the original code is written. I want to add my own attribute to the "commonattributes" processing in the html5 transform, but I don't want to copy/modify it. I tried to use <xsl:next-match> to make it incremental:

<!-- extend "commonattributes" processing -->
<xsl:template name="commonattributes">
  <xsl:param name="default-output-class"/>

  <!-- add my own attribute -->
  <xsl:attribute name="data-foo" select="'bar'"/>

  <!-- continue with the next template -->
  <xsl:next-match>
    <xsl:with-param name="default-output-class" select="$default-output-class"/>
  </xsl:next-match>
</xsl:template>

but then I get the following error:

Error: The following error occurred while executing this line:
/home/chrispy/dita-ot/plugins/org.dita.html5/build_dita2html5.xml:158: net.sf.saxon.trans.UncheckedXPathException: An attribute node (id) cannot be created after a child of the containing element. Most recent element start tag was output at line 1893 of module topic.xsl

What am I missing? The plugin is attached if you want to try it.

 - Chris


ekimber@contrext.com
 

To debug your result, you can capture it in a variable, put it in a message (or, more likely, write it to a file), and then send it to the result, e.g.:

<!-- In the root-level processing or wherever you would normally be generating the main result -->
<xsl:variable name="result" as="node()*">
<xsl:apply-templates/> <!-- do whatever you would normally do -->
</xsl:variable>
<!-- write a debug copy of the result to a secondary output file -->
<xsl:result-document href="result.xml">
<xsl:sequence select="$result"/>
</xsl:result-document>
<!-- then emit the same result as the normal output -->
<xsl:sequence select="$result"/>

To override a named template like "commonattributes" you have to copy and replace it--next-match doesn't really make sense in that context because you've arrived at that template by calling it by name, not by matching, so there's really no "next match" (I'm sure there is a template that will match but you don't really know, in the context of the commonattributes template what the incoming match context was so you, as the XSLT author, have no easy way to predict what the next matching template might be).

The error message you're seeing means just what it says: between the time you output the attribute in the commonattributes and the time that another attribute was generated, something other than an attribute was generated: a text node, an element node, a comment, anything other than an attribute.

That is, whatever template matched next likely output something (probably a new element) and then ended up generating another attribute.

This is almost certainly because of your next-match call.

This is an example of where having used a named template rather than apply-templates makes the base code harder to extend. It would make more sense to have a separate "generate attributes" mode that is applied to the current input element, with a default template that does what commonattributes does and that matches all elements. With that approach you could then simply add a higher-precedence template to the mode to add the attributes you want, rather than having to completely replace commonattributes with your version of it.
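
To make the idea concrete, here is a rough sketch of that design. It is not how the OT is implemented today: the mode name gen-attributes is invented for illustration, and the "base" template is reduced to copying @id.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "Base" code (hypothetical): default attribute generation for all
       elements, standing in for what commonattributes does. -->
  <xsl:template match="*" mode="gen-attributes">
    <xsl:copy-of select="@id"/>
  </xsl:template>

  <!-- "Base" code (hypothetical): the element template applies the mode
       to the current element before generating any child content. -->
  <xsl:template match="*[contains(@class, ' topic/p ')]">
    <p>
      <xsl:apply-templates select="." mode="gen-attributes"/>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- Extension code: a higher-priority template in the same mode adds
       one attribute, then lets the default template add the rest. -->
  <xsl:template match="*[contains(@class, ' topic/p ')]"
                mode="gen-attributes" priority="10">
    <xsl:attribute name="data-foo" select="'bar'"/>
    <xsl:next-match/>
  </xsl:template>

</xsl:stylesheet>
```

Because the attribute templates are reached by matching rather than by name, xsl:next-match has a well-defined meaning here, and every attribute is emitted before the element's children.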

Cheers,

E.
--
Eliot Kimber
http://contrext.com


On 12/23/19, 5:47 AM, "Chris Papademetrious" <dita-users@groups.io on behalf of chrispitude@...> wrote:

[...]

"In what order do templates in an XSLT document execute, and do they match on the source XML or the buffered output?" <https://stackoverflow.com/questions/1531664/in-what-order-do-templates-in-an-xslt-document-execute-and-do-they-match-on-the>

[...]


Chris Papademetrious
 

I found this promising message, which mentions a way to call the overridden named template in XSLT 3.0:

https://www.oxygenxml.com/archives/xsl-list/201307/msg00081.html

But when I tried it:

<!-- continue with the next template -->
<xsl:call-template name="xsl:original">
  <xsl:with-param name="default-output-class" select="$default-output-class"/>
</xsl:call-template>

(and also updated the XSLT version to 3.0 in the header), DITA-OT 3.4 complained with

     [xslt] Static error in xsl:call-template/@name on line 15 column 44 of html5.xsl:
     [xslt]   XTSE0080: Namespace prefix xsl refers to a reserved namespace

So close to an element template augmentation approach!!

 - Chris


ekimber@contrext.com
 

Looks like a question for Mike Kay to be asked on the Saxon help list.

Cheers,

E.

--
Eliot Kimber
http://contrext.com




Chris Papademetrious
 

Eliot - so the good news is, XSLT 3.0 can indeed chain named templates! The bad news is, the templates must be stored in packages, which then requires a change in the transformation command line to reference those packages.


While XSLT packages might be a nice way of organizing plugins in the future, it looks like I'll be sticking to the copy-and-modify solution for now.
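
For anyone curious, the package-based override looks roughly like the sketch below. The package names are invented, and it assumes the base code were itself published as an XSLT 3.0 package that exposes "commonattributes" as a public template, which is not the case for the OT stylesheets today:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical extension package; both package names are made up. -->
<xsl:package name="http://example.com/html5-ext"
             package-version="1.0"
             version="3.0"
             xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumes a base package that declares "commonattributes"
       with visibility="public" so that it can be overridden. -->
  <xsl:use-package name="http://example.com/html5-base" package-version="*">
    <xsl:override>
      <!-- the override must keep a signature compatible with the original -->
      <xsl:template name="commonattributes">
        <xsl:param name="default-output-class"/>

        <!-- add my own attribute -->
        <xsl:attribute name="data-foo" select="'bar'"/>

        <!-- xsl:original is only available inside xsl:override -->
        <xsl:call-template name="xsl:original">
          <xsl:with-param name="default-output-class" select="$default-output-class"/>
        </xsl:call-template>
      </xsl:template>
    </xsl:override>
  </xsl:use-package>

</xsl:package>
```

Outside of xsl:override, the name xsl:original is rejected, which matches the XTSE0080 error above.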