Word2Dita?


Nancy Roberts
 

Hi all,

For the first time since 2009, I need to convert a Word doc to DITA that is too big to be done manually. What is the Word2Dita plugin? What does it plug in to? The OT? My company uses the Ixiasoft CMS, if that makes a difference. Any advice is much appreciated.

Thanks,
Nancy


ekimber@contrext.com
 

Word2DITA is a general Word-to-DITA conversion framework. It does not require DITA Open Toolkit in any way, but because it is often used with DITA content, it is packaged as a DITA Open Toolkit plug-in for convenience. You can run it as an OT plug-in if that's useful, although I think most people use it standalone.

Word2DITA is part of the larger DITA4Publishers project (dita4publishers.org) but to get the latest version you should go to the Word2DITA project directly:

https://github.com/dita4publishers/org.dita4publishers.word2dita/releases

The documentation for the Word2DITA framework is here:

http://www.dita4publishers.org/d4p-users-guide/user_docs/d4p-users-guide/word2dita/word2dita-intro.html

I do maintain the Word2DITA code (you can see there are some recent bug fixes) but I haven't had the bandwidth to package it as cleanly as I've wanted to, which is why it's a bit scattered about.

Cheers,

E.

--
Eliot Kimber
http://contrext.com




Nancy Roberts
 

Thank you, Eliot. Can I download and install Word2Dita by itself, or do I need the full dita4publishers project?

Also, once I'm done with the conversion, I want to import the DITA output into the Ixiasoft CMS. I assume there's no conflict there?

Thanks,
Nancy


ekimber@contrext.com
 

Yes, you can use the Word2DITA package by itself--it does not require any other part of the DITA4Publishers project.

It will require some setup to define the mapping from your Word documents to the DITA you want, but it is capable of generating DITA with whatever aspects you need, and the result should be importable into Ixiasoft without issue. For example, you can create both maps and topics from the same set of Word paragraphs, including submaps, as long as you have appropriate paragraph styles (e.g., Heading 1, Heading 2).

Word2DITA requires a little familiarity with typical XML tools and, often, a little light XSLT work to get going but you can, for example, run it directly on the files inside a DOCX file from OxygenXML using the Archive Viewer and transformation configuration (which is how I almost always run it).
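For readers unfamiliar with the format: a DOCX file is an ordinary ZIP archive, which is why the files inside it can be opened and processed directly. A minimal Python sketch, using only the standard library, that lists those files. It is not part of Word2DITA, and the file name `manual.docx` in the usage note is a hypothetical example.

```python
import zipfile


def list_docx_parts(docx):
    """Return the names of the files stored inside a DOCX (a ZIP archive).

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()
```

Calling `list_docx_parts("manual.docx")` on a real document normally shows `word/document.xml` (the main body) and `word/styles.xml` (the style definitions) among the entries.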

I tried to document Word2DITA as well as I could in the time I had, but it's not always easy to get going. Feel free to ask questions here or reach out to me directly if you have any difficulty getting it set up.

Beyond simply getting the transform running, there are two main challenges. The first is working out how the styles you have in your Word documents (or possibly don't yet have) should map to DITA structures and then expressing that as a Word2DITA style-to-tag map. The second, if necessary, is doing any post-processing cleanup on the mapping-generated DITA using XSLT through the Word2DITA "final fixup" extension point. Cleanup is usually pretty straightforward XSLT but it's still XSLT....

If your Word documents are consistently styled, even if it's with the built-in generic styles, you should be able to get a pretty good result.

If your Word documents are not consistently styled then it will be a bigger challenge. You will need to either clean up the Word docs to add styles or make the styling more consistent (which you can often do with Word's advanced search and replace features), or put more work into Word2DITA preprocessing or post-processing extensions. If your Word docs are just not consistent enough, you may be better served by one of the commercial conversion services or Word conversion tools.
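One rough way to judge how consistently a document is styled is to count which paragraph styles it actually uses. The following is a minimal Python sketch under stated assumptions, not a Word2DITA feature: it reads `word/document.xml` from the DOCX and tallies the `w:pStyle` value of every paragraph, including paragraphs inside tables. The function name and the `"(default)"` label are my own choices for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_paragraph_styles(docx):
    """Count how often each paragraph style ID occurs in the main document part."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    counts = Counter()
    for para in root.iter(f"{{{W}}}p"):
        style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        # Paragraphs without an explicit style use the document's default style.
        style_id = style.get(f"{{{W}}}val") if style is not None else "(default)"
        counts[style_id] += 1
    return counts
```

Note that the values reported are internal style IDs (for example `Heading1`), which can differ from the display names shown in Word (for example "Heading 1"). A document where nearly everything lands under `"(default)"` is likely to need styling cleanup before conversion.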

Word2DITA was designed specifically for environments where there is good control over the Word styling, such as in Publishing workflows where an Editor can ensure that manuscripts are styled appropriately before being converted to DITA. It is not intended to be (and will not work as) an "any Word to good DITA" transform.

That said...

Word2DITA will produce *parseable* DITA topics from any Word document *as long as* the first non-skipped paragraph in the document is mapped to a topic because any non-mapped paragraph in the Word document is automatically mapped to <p>. The result will probably not be that useful but is a result...
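The fallback rule described above can be illustrated with a toy Python sketch. This is not Word2DITA's code, and it does not use Word2DITA's style-to-tag map format; the `TOPIC_STYLES` and `BODY_MAP` names and the `to_topic` function are hypothetical and only mimic the described behavior: the first paragraph must map to a topic, and any unmapped paragraph becomes `<p>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings, for illustration only.
TOPIC_STYLES = {"Heading 1"}      # styles that start a topic
BODY_MAP = {"Code": "codeblock"}  # styles with an explicit body-level tag


def to_topic(paragraphs):
    """Build one DITA-like topic from a list of (style, text) pairs."""
    first_style, first_text = paragraphs[0]
    if first_style not in TOPIC_STYLES:
        raise ValueError("first paragraph must map to a topic")
    topic = ET.Element("topic", id="t1")
    ET.SubElement(topic, "title").text = first_text
    body = ET.SubElement(topic, "body")
    for style, text in paragraphs[1:]:
        # Any style without a mapping falls back to a plain <p>.
        ET.SubElement(body, BODY_MAP.get(style, "p")).text = text
    return ET.tostring(topic, encoding="unicode")
```

With this sketch, a paragraph in an unknown style still ends up in the output as `<p>`, so the result parses even though the structure carries little meaning, which is the trade-off described above.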

Cheers,

E.
--
Eliot Kimber
http://contrext.com




Nancy Roberts
 

Thanks! All we really need is DITA content that is clean enough for a reasonable import into Ixiasoft. None of us know XSLT, but we don't mind doing a little cleanup once we're in Ixia. It's an 85-pager that's almost all tables. If Word2Dita can get us cleanish tables in DITA, we're good. I'll definitely take you up on your kind offer if we run into any bumps.

Best,
Nancy


ekimber@contrext.com
 

If it's just one doc just contact me offline and I'll convert it for you.

Word2DITA does really well with tables except for one known issue related to a particular row and column spanning use case.

Cheers,

E.

--
Eliot Kimber
http://contrext.com




Chris Brand
 

Did you try copying and pasting the Word content (maybe chapter by chapter) into Oxygen? That way, tables, lists, etc. convert rather nicely to DITA elements. If it's just one doc and not a large number of them, I would do it manually. The effort for 85 pages cannot be that huge.

Also, there's a Word-to-DITA conversion function in Oxygen itself that you might want to try.

Greez,

Chris.

 



Chander Aima
 

Hi Nancy,
 
XMLmind offers a free online conversion service (DOCX to DITA) but this service is limited to 3 conversions per day and per IP address. 
 
You can find more info here:
https://www.xmlmind.com/w2x/docx_to_dita.html 
https://www.xmlmind.com/w2x/online_w2x.html#about
 
Regards,
Chander


teamwis
 

You might want to have Stilo International do the work for you. To the
best of my knowledge, Ixiasoft acquired Stilo some time ago, so your
company should have no problem using Stilo for the conversion,
depending on your budget. See https://www.stilo.com/migrate-dita/ for
more info.

Cheers
Ray






--
Keep an Exacting Eye for Detail


ekimber@contrext.com
 

I concur--if you have easy access to Stilo it might be the fastest route to DITA from your Word doc.

Cheers,

E.

--
Eliot Kimber
http://contrext.com




Leigh White
 

Hi all...I'd like to clarify that IXIASOFT has acquired Stilo's AuthorBridge product, but we have not acquired Stilo the company. In case anyone was curious. :-) https://www.ixiasoft.com/ixiasoft-announces-authorbridge-acquisition/.

Best,
Leigh


Nancy Roberts
 

Thank you so much for your advice, everyone. Sorry for the late reply to your kind offers.

Eliot, our Legal department has a couple of questions before they greenlight using Word2Dita:
1. They need to see some sort of license. I assume the Dita4Publishers license covers Word2Dita? 
2. In this link, which libraries are relevant? https://www.dita4publishers.org/d4p-users-guide/user_docs/d4p-users-guide/html5-plugin/general/licence.html 

Thanks,
Nancy


ekimber@contrext.com
 

All of DITA for Publishers uses the same Apache 2 license as DITA Open Toolkit. If that’s not clear in the Word2DITA project I’ll see what I can do.

 

The D4P HTML5 transform is obsolete and its dependencies only apply to it *as deployed* and have nothing whatsoever to do with Word2DITA.

 

That is, even if you use the D4P HTML5 transform to generate HTML, the JavaScript libraries are only used in the deployed content, not in the generation process.

 

If you are using the D4P HTML5 transform I strongly encourage you to work to replace it with something new. The latest Open Toolkit versions of the HTML5 transform do as much as or more than the D4P HTML5 transform and the technology for deploying responsive modern web sites has evolved dramatically since we last updated the D4P HTML5 transform.

 

You can contact me directly (ekimber@...) if you have more detailed questions about the legal details of using Word2DITA.

 

Cheers,

 

E.

 

--

Eliot Kimber

http://contrext.com

 

 

 
