
Re: Xpath for validating if href in xref element is pointing to <fn> element #XSLT

Shaurabh
 

Hi Chris,

I am not trying to validate whether all footnotes are used. I am just trying to find the count of all xrefs that target footnotes. As I said earlier, I am able to find the count of all xrefs used inside <tgroup>, irrespective of their target element, using the XPath below:

<xsl:variable name="fnXCount" select="count($prmElement/descendant::*[contains(@class,' topic/xref ')][@href])"/>

But I want the total count of only those xrefs that point to a <fn> element.


Thanks,
Shaurabh


Re: Xpath for validating if href in xref element is pointing to <fn> element #XSLT

Chris Papademetrious
 

Hi Shaurabh,

Because <fn> elements with IDs don't generate content themselves, I keep them self-contained inside the table that references them. This way, they move along with the table if I copy/move/conref it to a different topic.

Are you trying to validate that all footnotes are used, and none have gone silently unused?

Is this check going into a Schematron file, or is it being put into some other XSLT file?
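
In the meantime, here's an untested sketch of one way to count only the footnote targets. It assumes XPath 2.0 and that every footnote reference is a same-document URI whose fragment ends in the <fn> element's @id, as in your sample:

<xsl:variable name="fnXCount"
    select="count($prmElement/descendant::*[contains(@class, ' topic/xref ')]
            [@href]
            [tokenize(@href, '/')[last()] =
             root($prmElement)/descendant::*[contains(@class, ' topic/fn ')]/@id])"/>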

 - Chris


Xpath for validating if href in xref element is pointing to <fn> element #XSLT

Shaurabh
 

I have a test.dita file that contains a table element, as shown below. As you can see, a few cells inside the <tgroup> element contain <xref> elements that target a <fn> element. I want the total count of <xref> elements targeting <fn> elements.
I am able to get the count of xref elements that have an href attribute using the XPath below:

<xsl:variable name="prmElement" select="."/>
<xsl:variable name="fnXCount" select="count($prmElement/descendant::*[contains(@class,' topic/xref ')][@href])"/>

In addition to the above, I want to validate whether @href points to a <fn> element. Please help me improve the above XPath for the fn count validation.
test.dita

<topic id="test_overview">
    <title id="title_id">Test Title</title>
    <body>
        <table>
            <title>My table</title>
            <desc>
                <p>
                    <fn id="fn_5fc">My first footnote.</fn>
                </p>
                <p>
                    <fn id="fn_vfc">My second footnote.</fn>
                </p>
            </desc>
            <tgroup cols="3">
                <colspec colname="col1" colwidth="1.01*" colnum="1"/>
                <colspec colname="col4" colwidth="1.01*" colnum="2"/>
                <colspec colname="col5" colwidth="1*" colnum="3"/>
                <thead>
                    <row>
                        <entry>Col1</entry>
                        <entry>Col2</entry>
                        <entry>Col3</entry>
                    </row>
                </thead>
                <tbody>
                    <row>
                        <entry align="center">Cell <xref
                                href="test.dita#test_overview/fn_5fc"
                            />
                        </entry>
                        <entry align="center">Cell <xref
                                href="test.dita#test_overview/fn_5fc"
                            />
                        </entry>
                        <entry align="center">Cell <xref
                                href="test.dita#test_overview/fn_vfc"
                            />
                        </entry>
                    </row>
                </tbody>
            </tgroup>
        </table>
    </body>
</topic>
 


Regards,
Shaurabh


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Radu Coravu
 

Hi Scott,

Looks like an out-of-memory error. If you are using the DITA-OT from the command line, there is some documentation about this in the DITA-OT docs:

https://www.dita-ot.org/dev/topics/other-errors.html#troubleshooting__out-of-memory-error

https://www.dita-ot.org/dev/topics/increasing-the-jvm.html
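
If I recall those pages correctly, the fix amounts to giving the Java process a larger heap before running the build, something like this (the 4g value is just an example; size it to your machine):

set ANT_OPTS=-Xmx4g          (Windows)
export ANT_OPTS="-Xmx4g"     (Linux/macOS)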

Regards,

Radu

Radu Coravu
Oxygen XML Editor
On 6/13/21 01:35, scott ashmead via groups.io wrote:

Hi friends,

Has anyone received and fixed this error while generating a PDF using DITA-OT 3.6.1?
I think this is related to garbage collection, but I don't know why I would get this while generating a 350-page PDF. Is that too large?

transformation failed. C:\dita-ot-3.6.1\plugins\org.dita.pdf2.fop\build_fop.xml:145: java.lang.OutOfMemoryError: GC overhead limit exceeded

Thank you,
Scott

  


Re: Links that are both local and cross-deliverable in shared topics

ekimber@contrext.com
 

Cross-deliverable links are a challenge for a number of reasons, but chief among them is the potential configuration complexity inherent in the fact that a single set of root maps can produce many different deliverables of different types and with different configurations. If there's an exact one-to-one mapping from source root maps to deliverables the problem is easy, but as soon as you can have two or more deliverables from a single root map, the problem gets much more challenging.

The DITA *source* markup makes it possible to unambiguously create a reference to any element in any topic in the context of a specific root map. This is necessary but not sufficient.

When you produce a given deliverable from a root map, you have to be able to control how the source-to-source links translate to deliverable-to-deliverable links when there are multiple possible deliverables for a given target root map.

That's exactly the scenario you described with the use of filtering applied to root maps to produce multiple deliverables from a single root map reflecting different filtering conditions.

Thus there has to be some way to configure the deliverable production process so that sets of related deliverables are produced correctly, i.e., for a set of inter-linked root maps, some way to ensure that the same filtering conditions (or the *correct* set of filtering conditions) is applied to a set of deliverable production processes so that the resulting deliverables are correctly linked to each other. This requires some sort of deliverable production project manager that gives you a way to configure the deliverable generation details.

For example, if you need to produce a set of interlinked deliverables that all reflect macOS, you need a way to specify not just the filtering but the deliverable URIs *as published*, so that the links will point to the right place.

Using different wrapper maps that have different ditavalrefs and result in different deliverable names is one way to do that, and it might be the easiest, but it seems like that could quickly get unwieldy, confusing, or just impractical.

DITA OT's new project facility seems like at least a start for this kind of production processing configuration manager but more is probably required to fully coordinate the production of multiple interlinked deliverables that have different input parameters (different filtering conditions, etc.) and different result details (different publishing locations for the deliverables).

It sounds like you're able to impose simplifying assumptions that make the problem easier to solve in your environment, which is good--keeping it as simple as you can and still meeting requirements is always a good idea.

A challenge for a tool like DITA OT is that a more general solution starts to become pretty complex pretty quickly, which makes it less likely anyone will take a stab at implementing it...

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 6/13/21, 8:29 PM, "Chris Papademetrious" <main@dita-users.groups.io on behalf of chrispitude@gmail.com> wrote:

Hi Eliot,

The bug with Oxygen occurs when option #2 is used: (1) the current book A has its own local scope A, (2) it references a peer book B with scope B, and (3) you drag-and-drop a topic from map B into a topic in map A to create a cross-book link. The link is incorrectly created as "B.B.topic1" instead of "B.topic1". SyncroSoft has confirmed it's a bug.

It would be hard to separate writers from their maps here. Each technical writer is assigned 1-3 products, and they own the user guides and reference manuals for those products. Because the map controls both PDF and OLH content delivery, the writers work with maps as tightly as they do the topics.

So, I try to empower the writers to manage their own maps by


* Keeping map structures and rules as simple as possible
* Using Schematron rules to keep writers on the right path
* Relying on Oxygen to make things like cross-book link creation easy

So far, this combination has allowed us to be successful without the need for map/key managers and such, and without the writers having to get too much into the technical DITA stuff. For the most part, it's just writing and Oxygen GUI stuff. And I'll ride that for as long as I can! (And quite frankly, we're not resourced for anything more.)

We're already using peer-book scopes, so local scopes should be a straightforward conceptual extension for the writers to learn. Once this double-scoping bug is fixed in Oxygen, we should be in good shape for creating cross-book links in topics reused across books, which is really cool.

The next hurdle is creating cross-book links between books that use profiling conditions to publish one map to multiple deliverables. Writers want the ability to cross-link to specific conditional versions of books, and also to cross-link automatically between same-condition books. I've had some success prototyping this with a "wrapper map," which is a simple <map> that has only a <ditavalref> and a <mapref> to a full <bookmap>. I plan to share more about that here when the loose ends are ironed out.

The tip about books being able to provide overriding key definitions is a good one - thank you!

- Chris


Re: Links that are both local and cross-deliverable in shared topics

Chris Papademetrious
 

Hi Eliot,

The bug with Oxygen occurs when option #2 is used: (1) the current book A has its own local scope A, (2) it references a peer book B with scope B, and (3) you drag-and-drop a topic from map B into a topic in map A to create a cross-book link. The link is incorrectly created as "B.B.topic1" instead of "B.topic1". SyncroSoft has confirmed it's a bug.

It would be hard to separate writers from their maps here. Each technical writer is assigned 1-3 products, and they own the user guides and reference manuals for those products. Because the map controls both PDF and OLH content delivery, the writers work with maps as tightly as they do the topics.

So, I try to empower the writers to manage their own maps by

  • Keeping map structures and rules as simple as possible
  • Using Schematron rules to keep writers on the right path
  • Relying on Oxygen to make things like cross-book link creation easy
So far, this combination has allowed us to be successful without the need for map/key managers and such, and without the writers having to get too much into the technical DITA stuff. For the most part, it's just writing and Oxygen GUI stuff. And I'll ride that for as long as I can! (And quite frankly, we're not resourced for anything more.)

We're already using peer-book scopes, so local scopes should be a straightforward conceptual extension for the writers to learn. Once this double-scoping bug is fixed in Oxygen, we should be in good shape for creating cross-book links in topics reused across books, which is really cool.

The next hurdle is creating cross-book links between books that use profiling conditions to publish one map to multiple deliverables. Writers want the ability to cross-link to specific conditional versions of books, and also to cross-link automatically between same-condition books. I've had some success prototyping this with a "wrapper map," which is a simple <map> that has only a <ditavalref> and a <mapref> to a full <bookmap>. I plan to share more about that here when the loose ends are ironed out.
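
Roughly, the wrapper looks like this (file names are made up; <ditavalref> is the DITA 1.3 branch-filtering element):

<map>
    <ditavalref href="macos.ditaval"/>
    <mapref href="widget_userguide.ditamap"/>
</map>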

The tip about books being able to provide overriding key definitions is a good one - thank you!

 - Chris


preserve submap metadata in merged file #pdf #PDF

Leigh White
 

Hi all,

I'm following up on an old thread that doesn't seem to have reached a conclusion, or at least not a conclusion that I can understand. In short, I am trying to preserve the metadata of submaps in a bookmap. I'm using OT 3.5.4, PDF. I have tried adding preprocess.clean-map-check.skip=true to my build.properties file and to configuration.properties. I have also added it to my plugin's plugin.xml file. None of these seems to have any effect: the submaps' metadata does not appear in the merged file. Was that even the intent?

The thread I am referring to (https://github.com/dita-ot/dita-ot/pull/2739 and https://dita-ot.slack.com/archives/C02DW51E3/p1620311367058800) is about 4 years old, and a lot has changed since then.

I do see mode="preserve-submap-title-and-topicmeta" in maprefImpl.xsl, and it appears to be default behavior, but that is not what I am seeing. The metadata I am trying to preserve is straightforward topicmeta metadata, like so:

<topicmeta>
    <critdates>
      <created date="1999-08-11"/>
      <revised modified="2019-10-25"/>
    </critdates>
    <permissions view="internal"/>
    <metadata>
      <audience type="user"/>
      <prodinfo>
        <prodname>Widget</prodname>
        <brand>Pro3</brand>
      </prodinfo>
    </metadata>
    <category>XYZ</category>
    <othermeta name="product-code" content="wid-1234"/>
    <othermeta name="docnumber" content="WIDP34567"/>
    <othermeta name="doctype" content="User Guide"/>
    ...

So... is there a straightforward (or even a non-straightforward) way to preserve this metadata in the merged file?

Thanks,
Leigh


java.lang.OutOfMemoryError: GC overhead limit exceeded

scott ashmead
 

Hi friends,

Has anyone received and fixed this error while generating a PDF using DITA-OT 3.6.1?
I think this is related to garbage collection, but I don't know why I would get this while generating a 350-page PDF. Is that too large?

transformation failed. C:\dita-ot-3.6.1\plugins\org.dita.pdf2.fop\build_fop.xml:145: java.lang.OutOfMemoryError: GC overhead limit exceeded

Thank you,
Scott


Re: Word2Dita?

Leigh White
 

Hi all...I'd like to clarify that IXIASOFT has acquired Stilo's AuthorBridge product, but we have not acquired Stilo the company. In case anyone was curious. :-) https://www.ixiasoft.com/ixiasoft-announces-authorbridge-acquisition/.

Best,
Leigh


Re: Word2Dita?

ekimber@contrext.com
 

I concur--if you have easy access to Stilo it might be the fastest route to DITA from your Word doc.

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 6/10/21, 7:49 PM, "teamwis" <main@dita-users.groups.io on behalf of dfanster@gmail.com> wrote:

To the best of my knowledge, you might want to have Stilo International do the work for you. Because Ixiasoft acquired Stilo early on, your company should have no problem using Stilo to do the conversion, depending on your budget. Go to https://www.stilo.com/migrate-dita/ for more info.

Cheers
Ray

On 6/10/21, Nancy Roberts <nanr93@gmail.com> wrote:
> Hi all,
>
> For the first time since 2009, I need to convert a Word doc to DITA that is
> too big to be done manually. What is the Word2Dita plugin? What does it plug
> in to? The OT? My company uses the Ixiasoft CMS, if that makes a difference.
> Any advice is much appreciated.
>
> Thanks,
> Nancy
>


--
Keep an Exacting Eye for Detail


Re: Word2Dita?

teamwis
 

To the best of my knowledge, you might want to have Stilo International do the work for you. Because Ixiasoft acquired Stilo early on, your company should have no problem using Stilo to do the conversion, depending on your budget. Go to https://www.stilo.com/migrate-dita/ for more info.

Cheers
Ray

On 6/10/21, Nancy Roberts <nanr93@gmail.com> wrote:
Hi all,

For the first time since 2009, I need to convert a Word doc to DITA that is
too big to be done manually. What is the Word2Dita plugin? What does it plug
in to? The OT? My company uses the Ixiasoft CMS, if that makes a difference.
Any advice is much appreciated.

Thanks,
Nancy





--
Keep an Exacting Eye for Detail


Re: Anyone Using Elasticsearch to Index DITA Content?

ekimber@contrext.com
 

That sounds very interesting. I'll take a look at that docs-bulk.html link.

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 6/10/21, 10:44 AM, "Toshihiko Makita" <main@dita-users.groups.io on behalf of tmakita@antenna.co.jp> wrote:

I developed a DITA full-text search pilot project last year on AWS Elasticsearch, before the conflict between AWS and elastic.co.
The search is integrated into the DITA-to-HTML (or .php) publishing result. Here are several things I did:

* Use "curl" (or "awscurl") to generate the index in AWS Elasticsearch.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
* Convert the DITA map & topics into JSON and execute the "bulk" operation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
* Develop a PHP program that accepts search requests from the client browser and returns the search results from Elasticsearch as JSON.
* The JSON search results are rendered by JavaScript and displayed in the browser.
* By clicking a search result, the user can reach the target Web page.

It was a very exciting experience because I had to learn AWS operations and develop PHP and JavaScript (TypeScript) programs, which I had never worked with before.
Unfortunately, it is still a pilot project. However, it will be integrated into the user's Web publishing system in the future.

--
/*--------------------------------------------------
Toshihiko Makita
Development Group. Antenna House, Inc. Ina Branch
Web site:
http://www.antenna.co.jp/
http://www.antennahouse.com/
--------------------------------------------------*/


Re: Images and scaling

ekimber@contrext.com
 

One possible workaround is to use ImageMagick to generate XML with the details of each image, from which a transform can then make informed decisions about sizing in the DITA being generated.
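
For example, ImageMagick's identify command can emit such XML directly; the element and attribute names here are arbitrary, and you'd wrap the output in a root element before parsing:

identify -format '<image href="%f" width="%w" height="%h"/>\n' images/*.png > image-sizes.xml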

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 6/10/21, 10:55 AM, "Wayne Brissette" <main@dita-users.groups.io on behalf of wbrisett@att.net> wrote:



Kristen James Eberlein wrote on 2021-06-10 10:41:
> Use the @scale attribute on image, and do not use @height or @width.
>
> Kris Eberlein

Yeah, I was afraid that was going to be the proper answer here... this is all part of our custom Markdown-to-DITA tooling, and thus they want me to support the HTML-ish scale: 40% this way, and 60% that way... ;) I think I'll just have to tell them to use one value.

Thanks Kris!

Wayne


Re: Images and scaling

Wayne Brissette
 

Kristen James Eberlein wrote on 2021-06-10 10:41:
Use the @scale attribute on image, and do not use @height or @width.

Kris Eberlein
Yeah, I was afraid that was going to be the proper answer here... this is all part of our custom Markdown-to-DITA tooling, and thus they want me to support the HTML-ish scale: 40% this way, and 60% that way... ;) I think I'll just have to tell them to use one value.

Thanks Kris!

Wayne


Re: Anyone Using Elasticsearch to Index DITA Content?

Toshihiko Makita
 

I developed a DITA full-text search pilot project last year on AWS Elasticsearch, before the conflict between AWS and elastic.co.
The search is integrated into the DITA-to-HTML (or .php) publishing result. Here are several things I did:

* Use "curl" (or "awscurl") to generate the index in AWS Elasticsearch.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
* Convert the DITA map & topics into JSON and execute the "bulk" operation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
* Develop a PHP program that accepts search requests from the client browser and returns the search results from Elasticsearch as JSON.
* The JSON search results are rendered by JavaScript and displayed in the browser.
* By clicking a search result, the user can reach the target Web page.

It was a very exciting experience because I had to learn AWS operations and develop PHP and JavaScript (TypeScript) programs, which I had never worked with before.
Unfortunately, it is still a pilot project. However, it will be integrated into the user's Web publishing system in the future.
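
For reference, the "bulk" payload is newline-delimited JSON, one action line plus one source document per topic. It looks roughly like this (the index name and field names here are only illustrative):

POST /dita-topics/_bulk
{ "index": { "_id": "installing_widget" } }
{ "title": "Installing the Widget", "body": "...", "href": "tasks/installing.html" }
{ "index": { "_id": "removing_widget" } }
{ "title": "Removing the Widget", "body": "...", "href": "tasks/removing.html" }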

-- 
/*--------------------------------------------------
 Toshihiko Makita
 Development Group. Antenna House, Inc. Ina Branch
 Web site:
 http://www.antenna.co.jp/
 http://www.antennahouse.com/
 --------------------------------------------------*/


Re: Images and scaling

Kristen James Eberlein
 

Use the @scale attribute on image, and do not use @height or @width.
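
For example, to render an image at 45% of its intrinsic size (hypothetical file name):

<image href="diagram.png" scale="45"/>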

Kris Eberlein

On Jun 10, 2021, at 11:32 AM, Wayne Brissette via groups.io <wbrisett=att.net@groups.io> wrote:

So, I've been asked to help with a scaling issue that an author is having with a larger image. I'm a little confused about some of the wording in the 1.2 Spec (we're still on 1.2 ... yeah, I know...). ;)

The Spec states that pixels are the default, so I would assume that an attribute of height="45" would be 45 pixels. OK, but then it also states that the valid units are pc, px, pt, mm, cm, and in. All of that makes sense. But then it states that if height is provided but not width, the width is scaled by the same amount as the height, and likewise for width if it is provided but not height.

So, is there a way to use a percentage instead of a hard value? It doesn't look like it, but the OT seems to have the logic to scale, so I might be able to use 45 to indicate that an image needs to be scaled to 45% of the original size.

-Wayne





Images and scaling

Wayne Brissette
 

So, I've been asked to help with a scaling issue that an author is having with a larger image. I'm a little confused about some of the wording in the 1.2 Spec (we're still on 1.2 ... yeah, I know...). ;)

The Spec states that pixels are the default, so I would assume that an attribute of height="45" would be 45 pixels. OK, but then it also states that the valid units are pc, px, pt, mm, cm, and in. All of that makes sense. But then it states that if height is provided but not width, the width is scaled by the same amount as the height, and likewise for width if it is provided but not height.

So, is there a way to use a percentage instead of a hard value? It doesn't look like it, but the OT seems to have the logic to scale, so I might be able to use 45 to indicate that an image needs to be scaled to 45% of the original size.

-Wayne


Re: Links that are both local and cross-deliverable in shared topics

ekimber@contrext.com
 

Note that in method 1, if you set up the indirect keydefs for the topics that may be in other root maps, authors can blindly refer to the unqualified keys; it is then up to map authors to ensure that the referenced keys are bound to either the same-publication use of the topics or the other-publication uses. You can do that, for example, by having shared sets of keydefs that "export" the key names for the referenceable topics of each publication so they can be referenced without scope qualification. These keydef maps can easily be generated once you have the pattern set up.

So in your example, for Book A you could generate a map that provides the necessary mapdef to Book A's map and a set of indirection keydefs for use from Book B:

book_a/keydefs/book_a_cross-deliverable-keydefs.ditamap:

<map>
  <title>Cross-Deliverable Keydefs for Book A</title>
  <topicgroup>
    <mapdef keyscope="book_a" scope="peer" href="../book_a.ditamap"/>
    <keydef keys="topic_01" keyref="book_a.topic01"/>
    ...
  </topicgroup>
</map>

Then in book_b.ditamap:

<map>
  <mapref href="../book_a/keydefs/book_a_cross-deliverable-keydefs.ditamap"/>
  ...
</map>

The main challenge here is that unqualified keys need to be unique across all your publications, which usually suggests defining key names that reflect the subject of the topic rather than its structural position or source filename ("install_framitz" rather than "topic_01"), so that the key name is both meaningful to authors and more sensibly unique across all topics regardless of use context.

Then in a topic you can simply have:

<p>Perform task <xref keyref="install_framitz"/> ...</p>

Without having to worry about whether the framitz installation topic is in the same deliverable or a different one.

Note also that it's *map authors* who manage the mapping of key names to uses of topics in the appropriate context.

Map authors have absolute power over key-to-resource bindings (because a root map can override any keydef in any included submap).

This fact suggests that if you've got the kind of re-use and referential sophistication you've described then you need one or more people in a Map Manager job role, responsible for setting key naming policy, key management policy and practice, and construction of the maps themselves, at least as regards key definitions.

Ideally, authors responsible for writing topics (and not for creating maps necessarily) should not have to worry about the key definition details.

Another thing I'll suggest is *put keyscope on everything*, at least on each chapter.

As I've used key scopes more I've come to appreciate how much they help--when you need them you can't do without them and when you don't need them they don't get in the way (or shouldn't).

But for example, by having a keyscope on each chapter, it allows you to easily refer to topics within a chapter from outside clearly and unambiguously. It can also serve to make the subject of a chapter clearer, e.g.:

<bookmap>
  ...
  <chapter keyscope="installation" keys="installation" href="./installation/topic-001.dita">
    <topicref keys="prereqs" href="./installation/topic-002.dita"/>
    <topicref keys="remove-cover" href="./installation/topic-003.dita"/>
    <topicref keys="insert-cart" href="./installation/topic-004.dita"/>
    <topicref keys="replace-cover" href="./installation/topic-003.dita"/>
  </chapter>
  ...
</bookmap>

From within the chapter, topics can refer to unqualified keys, i.e., <p>You must satisfy the prerequisites defined in <xref keyref="prereqs"/> ...</p>

And from other chapters you use the key scope: <p>Cover must be removed. See <xref keyref="installation.remove-cover"/>.</p>

Note that all the keys in this example are semantic while the topic filenames are arbitrary and meaningless. This separates the names used to identify and refer to topics in the map and the publication content from the topic storage details. The information nature of the topics will not change but their storage details *will change*.

When you said that my option #2 didn't work--what aspect of it didn't work? The cross-deliverable link is the same in both #1 and #2. The only thing option #2 provides is a way to have a scope-qualified reference work when the target key is in the same root map.

Finally, keep in mind that you can override any scope-qualified key definition by defining a higher-precedence key whose name includes the scope name, i.e.:

<map>
  <keydef keys="installation.prereqs" href="./topics/installation/prereq-for-ios15.dita" platform="iOS15"/>
  ...
  <chapter keyscope="installation" ...>

Cheers,

E.
--
Eliot Kimber
http://contrext.com


On 6/9/21, 6:32 PM, "Chris Papademetrious" <main@dita-users.groups.io on behalf of chrispitude@gmail.com> wrote:

Thanks Eliot! Excellent summary of the problem and the situation.

Your method #2 (adding a map-level @keyscope) is what I tried, but it doesn't seem to work in Oxygen. It doesn't form the cross-book links properly, and if I create them by hand, they don't resolve.

I will try method #1 tomorrow. However, I'm not sure how well it would work in practice, as the writers want to share entire chapters between books, with a variety of cross-references to both books within the chapters. Trying to create all the local/cross-book flavors of indirection links would be quite maddening!

The good news is, your reply and suggestion for method #2 gives me the confidence that this should be fixable in Oxygen. Thank you so much!

- Chris


Re: Word2Dita?

Chander Aima
 

Hi Nancy,
 
XMLmind offers a free online conversion service (DOCX to DITA), but this service is limited to 3 conversions per day per IP address.
 
You can find more info here:
https://www.xmlmind.com/w2x/docx_to_dita.html 
https://www.xmlmind.com/w2x/online_w2x.html#about
 
Regards,
Chander


Re: Anyone Using Elasticsearch to Index DITA Content?

despopoulos_chriss
 

This sounds totally cool.  At my current gig (Turbonomic) we're supporting an export of data to ElasticSearch...  Basically an export of Kafka "documents"...  JSON objects.  You can read the JSON into ElasticSearch and then do lots with it, including interesting analysis and visualizations.  This approach seems to be loose about the details of the JSON you send it.  So there seems to be leeway in what you do.

I would think that LwDITA would be easier to translate into JSON...  In fact, isn't there some thinking going on about making a JSON implementation of LwDITA? 

Our product does supply-chain analysis of entities to find the best provider of resources to each consumer, and to give advantage to consumers that in turn provide more value (resources) to the overall system.  It's designed to manage a network, but that's a matter of naming the entities and resources.  You could hijack the base model and overlay it on other domains (much like specialization works)...  To do that for a body of DITA you would need to convert the DITA to JSON.  Long-winded...  I've been thinking about doing this for a while.  Sadly, work in the software salt mines doesn't leave the time to get to it.  But some thoughts...

JSON to drive analysis probably should not try to replicate the full document, so much as replicate the structure.  If you want the analysis to map back to the actual content, use references to IDs. 

You probably don't need to replicate the full structure.  Depending on the analysis you want to perform, you can get away with dipping into the structure at different depths. 
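
For example, a topic might boil down to something like this (field names invented on the spot):

{ "id": "t-0412", "type": "task", "title": "Replace the cover", "children": ["t-0413", "t-0414"], "xrefs": ["t-0201"], "words": 280 }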

Things you could do include:
  • Stats similar to the DITAMap Metrics Report, but maybe more powerful.
  • Change tracking and management -- A super-DIFF that can list where things have changed.  Mapping back to the content you could render change bars or some such.  So then you could have change tracking without adding cruft to the source.  (I might even use that...)
  • Pattern discovery -- This is what I want to play with.  For example, you could merge a search index with your structural representation in the JSON, and then look for elements with lexical similarities...  Elements that use the same words.  From there you could assemble a map of topics that answers just a specific question...  Personalized content.  You would have to start with analyzing the patterns that arise -- something ElasticSearch should be good at.

It never occurred to me to try this with ElasticSearch or similar...  It sounds like gangs of fun!

I have done some DITA to JSON, but only to turn a topic into an array for a walk-through tour.  This would be bigger...  I think the first step is to design what you want ES to analyze, and then decide how to get that out of the DITA.  I would start humbly, and try to grow on that.

cud
