For the way most (automated) indexing tools work, you are right: an index built that way just repeats words from the titles or body text and adds little value over a well-organised TOC. Done right, indexing requires an indexer who understands the subject matter and the likely target audiences, and who adds index terms that are NOT in the running text, in particular synonyms that apply in specific contexts (which are almost impossible to inject with automated tools). This also means that an index cannot simply be translated, because each language group has its own synonyms and idioms that the index needs to handle.

Sorting a translated index is the least of your problems with indexing.

Smart Information Design
Amsterdam, Netherlands
Cell: +31 646 854 996

On 30 Aug 2021, 08:45 +0200, teamwis <dfanster@...>, wrote:
In real life, I doubt that people are very passionate about sorting or
indexing in PDF. To be honest, at Sony Ericsson the Antenna House I18N
utility was implemented in the DITA OT toolchain from within SDL
Trisoft (or Tridion Docs), but nobody seemed to know what value
indexing brings. By the way, at that time Sony Ericsson translated
into up to 56 different languages, including Japanese, Korean, Thai,
and both Traditional and Simplified Chinese.

In other words, even with AHF you will probably end up doing something
that adds no value. And in a regulated industry such as the one Varian
is in, it is really uncommon for a regulation, if any, to require a
PDF index for a specific language.


On 8/28/21, ekimber@... <ekimber@...> wrote:
The DITA Community project is (was) my
attempt to implement general I18N support for Open Toolkit, including both
locale-aware sorting and grouping as well as other locale-specific features
(such as line breaking and word detection).
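
To make "locale-aware sorting and grouping" concrete, here is a minimal sketch of the idea. It is not code from the project described above: it uses the JDK's built-in java.text.Collator as a stand-in for ICU4J, and the class name IndexSortSketch and both method names are hypothetical, chosen only for illustration. The first-letter grouping is a deliberate simplification; real index grouping for languages such as Korean or Chinese needs locale-specific group definitions.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of DITA-OT or any plugin.
public class IndexSortSketch {

    // Sort index terms using the collation rules of the given locale
    // instead of raw code-point order.
    public static List<String> sortForLocale(List<String> terms, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(collator);
        return sorted;
    }

    // Group already-sorted terms under a heading made from the upper-cased
    // first character. Assumption: one heading per first letter, which is
    // only adequate for simple alphabetic scripts.
    public static Map<String, List<String>> groupByFirstLetter(
            List<String> sortedTerms, Locale locale) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String term : sortedTerms) {
            if (term.isEmpty()) {
                continue; // nothing to group on
            }
            // Take the first full code point, not just the first char.
            int end = term.offsetByCodePoints(0, 1);
            String heading = term.substring(0, end).toUpperCase(locale);
            groups.computeIfAbsent(heading, k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("zoom", "Antenna", "battery", "Zone");

        // Code-point order puts all upper-case initials first.
        List<String> naive = new ArrayList<>(terms);
        naive.sort(null); // natural String ordering
        System.out.println(naive);   // [Antenna, Zone, battery, zoom]

        // Collator order interleaves upper and lower case as a reader expects.
        List<String> sorted = sortForLocale(terms, Locale.ENGLISH);
        System.out.println(sorted);  // [Antenna, battery, Zone, zoom]

        System.out.println(groupByFirstLetter(sorted, Locale.ENGLISH));
        // {A=[Antenna], B=[battery], Z=[Zone, zoom]}
    }
}
```

Passing a different Locale to Collator.getInstance changes the ordering rules; how well the JDK's own collators cover a given language varies, which is one reason to use ICU4J for production work.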

It also includes an open-source Simplified Chinese dictionary-based
collator, which offers an open-source alternative to Antenna House’s
licensed Simplified Chinese collator (definitely buy theirs if you have the

Unfortunately, life took some turns and I haven’t been able to maintain this
project for the last several years, so it might need a little attention. The
main issue I was running into (and that *should* be solvable) is the automatic
registration of Java extension functions with Saxon through Open Toolkit. It
was one of those things that worked in my local development environment but
not in other environments, and then I had to put it down and never came back
to it.

If it’s not working it shouldn’t take much to make it work, just somebody who
can attend to the Java-and-OT details. Otherwise the processing should all
be solid as far as the grouping and sorting and other ICU4J-based stuff




Eliot Kimber

Hi there,

We are currently adding Indonesian, Korean and Vietnamese as new languages
to our CMS, and I was wondering whether anybody has experience with index
sorting for these languages: do the default configuration files in the i18n
plugin of the DITA-OT generally work for them, or is customization required?


Keep an Exacting Eye for Detail
