How correct is your Simplified Chinese Index? #DITA-OT #PDF
Hi List,
If you publish DITA documents that contain <indexterm> elements, you can generate index pages by specifying backmatter/booklists/indexlist. If your publication also includes a Simplified Chinese localization, the index is generated by sorting the <indexterm> elements using the following sort keys:
pinyin-reading/strokes/radical/GB0 code
However, there is a troublesome problem in making index pages. A Hanzi (Chinese character) sometimes has multiple pinyin readings, and only the most frequently used reading is adopted for sorting and grouping <indexterm> elements.
For instance:
Item 1 should be grouped under "T" and item 2 should be grouped under "D", according to their readings.
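To make the problem concrete, here is a minimal sketch, not taken from any real collator. It uses the two index terms and the two readings of 调 (tiao2, diao4) that appear in the log later in this post. The function name group_by_char_reading and the one-entry reading table are hypothetical, for illustration only. It shows that when each character has a single fixed reading, both terms land in the same group whichever reading is chosen, so one of them is always misplaced.

```python
def group_by_char_reading(term, char_reading):
    """Return the index group letter for a term, using one fixed
    pinyin reading per character (hypothetical helper)."""
    first_char = term[0]
    return char_reading[first_char][0].upper()


TERMS = ["调速系统", "调查结果"]

# Try each candidate reading as "the" single reading of 调.
for reading in ("tiao2", "diao4"):
    table = {"调": reading}  # illustrative one-entry table
    groups = {term: group_by_char_reading(term, table) for term in TERMS}
    print(reading, groups)
```

With either reading, both terms receive the same letter, while the correct result needs one term under "T" and the other under "D".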
Here is a sample PDF result generated by the PDF2 plug-in (DITA-OT 3.5.1).

This is a well-known problem and cannot be avoided as long as the index-sorting program uses ICU (http://userguide.icu-project.org/collation) or the Java collator directly. (Both collators may use the pinyin readings defined in the Unihan database.)

We (Antenna House) have been working on this problem and have developed a new dictionary-based index sorting in the I18N Index Library (https://www.antennahouse.com/i18n-index-library). It is still under development, but it can already generate correct results for the above example. (Output via the PDF5-ML plug-in, https://github.com/AntennaHouse/pdf5-ml)

The dictionary-based index sorting outputs the following log:
[xslt] [readKeyFile][DEBUG] Unihan database entry=41377
[xslt] [readDictionaryFile][DEBUG] Dictionary entry=189082
[xslt] [readDictionaryFile][DEBUG] User dictionary entry=5
[xslt] [getKey][DEBUG] Processing indexterm=调速系统
[xslt] [processHanziKey][DEBUG] Got pinyin from dictionary! word=调速 pinyin=tiao2 su4
[xslt] [processHanziKey][DEBUG] Got pinyin from dictionary! word=系统 pinyin=xi4 tong3
[xslt] [getKey][DEBUG] Processing indexterm=调查结果
[xslt] [processHanziKey][DEBUG] Got pinyin from dictionary! word=调查结果 pinyin=diao4 cha2 jie2 guo3
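The idea behind this log can be sketched as follows. This is only an illustration under stated assumptions, not the I18N Index Library's actual implementation: the greedy longest-match segmentation, the function names pinyin_key and group_letter, and the empty per-character fallback table are all hypothetical. The three dictionary entries are the ones shown in the log.

```python
# Word readings copied from the log above.
WORD_READINGS = {
    "调速": "tiao2 su4",
    "系统": "xi4 tong3",
    "调查结果": "diao4 cha2 jie2 guo3",
}

# Assumed per-character fallback for text not covered by the dictionary.
CHAR_FALLBACK = {}


def pinyin_key(term, words=WORD_READINGS, chars=CHAR_FALLBACK):
    """Build a pinyin sort key by greedy longest match against a word
    dictionary (an assumed strategy), falling back to single characters."""
    syllables = []
    max_len = max(map(len, words), default=1)
    i = 0
    while i < len(term):
        # Try the longest candidate word first, then shorter ones.
        for length in range(min(max_len, len(term) - i), 0, -1):
            chunk = term[i:i + length]
            if chunk in words:
                syllables.extend(words[chunk].split())
                i += length
                break
        else:
            # No dictionary word starts here: use the character reading,
            # or the character itself if no reading is known.
            syllables.append(chars.get(term[i], term[i]))
            i += 1
    return " ".join(syllables)


def group_letter(term):
    """First letter of the pinyin key, used as the index group heading."""
    return pinyin_key(term)[0].upper()


for term in sorted(["调速系统", "调查结果"], key=pinyin_key):
    print(group_letter(term), term, pinyin_key(term))
```

Because the reading is chosen per word rather than per character, 调速系统 is grouped under "T" and 调查结果 under "D". A real implementation would also need tie-breaking on the remaining sort keys, which this sketch leaves out.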
The log shows that the dictionary-based method is useful for generating Simplified Chinese index pages. We hope to refine this library function so that it generates index pages automatically and more accurately. If you are interested in this library, could you provide your Simplified Chinese DITA publication data for evaluation?
Hope this helps your DITA publishing.
Regards,
--
/*-----------------------------------------------------------------------
Toshihiko Makita
Development Group. Antenna House, Inc. Ina Branch
E-Mail tmakita@...
Web site:
http://www.antenna.co.jp/
http://www.antennahouse.com/
------------------------------------------------------------------------*/