Re: Anyone Using Elasticsearch to Index DITA Content? <ekimber@...>

I have created the GitHub project as a place to capture my experimentation with using Elasticsearch to store and query DITA content.


It’s super minimal at the moment—I’m mostly using it to drive my learning of Elasticsearch by giving me a data set I understand.


At the moment there’s just one very small XSLT transform, intended to be run against the output of the built-in Open Toolkit normalization transform (transtype “dita”). For each input file the transform generates a JSON file in which each element is represented by a separate JSON “document” (in Elasticsearch terms). Each document captures the element’s XML and DITA details, the parentage of each non-root element, and the element’s full text, including the text of any subelements. This lets you quickly find both leaf and ancestor elements that contain given text. (Writing this just now, I think I also need to capture the child elements of each non-leaf element.)
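To make the one-document-per-element idea concrete, here is a rough Python sketch of the same shape of output. It is not the XSLT from the project. The field names (`id`, `tagname`, `dita_class`, `parent`, `ancestors`, `text`) and the `e1`, `e2`, ... ID scheme are made up for illustration, and it assumes the input is a single well-formed normalized DITA file.

```python
import json
import xml.etree.ElementTree as ET


def element_docs(xml_text):
    """Return one flat dict per element, in document order.

    All field names here are hypothetical; they are not taken from the
    actual transform.
    """
    root = ET.fromstring(xml_text)
    docs = []

    def walk(elem, parent_id, ancestors):
        doc_id = f"e{len(docs) + 1}"  # simple sequential ID (assumption)
        # Full text of the element including all descendants, with
        # whitespace normalized. Joining fragments with a space keeps
        # sibling blocks apart but can also split a word that has
        # inline markup in the middle of it.
        text = " ".join(" ".join(elem.itertext()).split())
        docs.append({
            "id": doc_id,
            "tagname": elem.tag,
            "dita_class": elem.get("class"),  # DITA @class, if present
            "parent": parent_id,              # None for the root element
            "ancestors": list(ancestors),     # tag names from the root down
            "text": text,
        })
        for child in elem:
            walk(child, doc_id, ancestors + [elem.tag])

    walk(root, None, [])
    return docs


if __name__ == "__main__":
    sample = (
        '<topic class="- topic/topic "><title class="- topic/title ">Install</title>'
        '<body class="- topic/body "><p class="- topic/p ">Run the setup tool.</p>'
        '</body></topic>'
    )
    print(json.dumps(element_docs(sample), indent=2))
```

Because every ancestor's document repeats the text of its descendants, a text query matches the leaf and each of its containers, which is the behavior described above. The cost is a larger index.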


This is all very much “as is” and I make no guarantee I’ll do anything more with it after next week but it’s there and maybe it will be useful.







Eliot Kimber




From: <> on behalf of "ekimber@..." <ekimber@...>
Reply-To: <>
Date: Thursday, June 10, 2021 at 4:46 PM
To: <>
Subject: Re: [dita-users] Anyone Using Elasticsearch to Index DITA Content?


That sounds very interesting. I'll take a look at that docs-bulk.html link.







Eliot Kimber



On 6/10/21, 10:44 AM, "Toshihiko Makita" < on behalf of tmakita@...> wrote:


    I developed a DITA full-text search pilot project last year using AWS Elasticsearch, before the conflict between AWS and

    The search is integrated into the DITA-to-HTML (or .php) publishing output. Here are several things I did:


    * Use "curl" (or "awscurl") to create the index in AWS Elasticsearch.

    * Convert the DITA map and topics into JSON and execute the "bulk" operation.

    * Develop a PHP program that accepts search requests from the client browser and returns the search results from Elasticsearch as JSON.

    * The JSON search results are processed by JavaScript and displayed in the browser.

    * By clicking a search result, the user can reach the target Web page.
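For anyone following along, the "bulk" step in the list above could look roughly like the Python sketch below. It only builds the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint expects: an action line followed by the document source, one pair per document, with a trailing newline at the end. The index name `dita-topics` and the document fields are hypothetical, and this is not the code used in the pilot project.

```python
import json


def to_bulk_ndjson(docs, index_name):
    """Build a _bulk request body: one action line plus one source line per doc.

    Assumes each doc is a dict with an "id" key (a made-up convention for
    this sketch).
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        {"id": "e1", "tagname": "title", "text": "Install"},
        {"id": "e2", "tagname": "p", "text": "Run the setup tool."},
    ]
    # "dita-topics" is a hypothetical index name.
    print(to_bulk_ndjson(docs, "dita-topics"), end="")
```

The resulting text can be saved to a file and posted to `_bulk` with curl or awscurl, using the `Content-Type: application/x-ndjson` header.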


    It was a very exciting experience because I had to learn AWS operations and develop PHP and JavaScript (TypeScript) programs, none of which I had worked with before.

    Unfortunately it is still a pilot project. However, it will be integrated into the user Web publishing system in the future.




     Toshihiko Makita

     Development Group. Antenna House, Inc. Ina Branch















