This sounds totally cool. At my current gig (Turbonomic) we're supporting an export of data to ElasticSearch... basically an export of Kafka "documents", which are JSON objects. You can read the JSON into ElasticSearch and then do a lot with it, including interesting analysis and visualizations. ElasticSearch seems to be loose about the details of the JSON you send it, so there's leeway in what you do.
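To make that concrete, here's a sketch of the kind of loose JSON "document" you might push through Kafka and into ElasticSearch. The field names are invented for illustration; ES will index pretty much whatever shape you give it.

```python
import json

# A hypothetical document about a DITA topic -- every field name here is
# made up, since ES doesn't demand a fixed schema up front.
doc = {
    "type": "topic",
    "id": "installing-widgets",
    "title": "Installing Widgets",
    "wordcount": 412,
    "references": ["widget-overview", "widget-troubleshooting"],
}

# This is the string you'd hand to Kafka / POST to an ES index.
payload = json.dumps(doc)
print(payload)
```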
I would think that LwDITA would be easier to translate into JSON... In fact, isn't there some thinking going on about making a JSON implementation of LwDITA?
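For what it's worth, here's a guess at what a JSON rendering of a simple LwDITA topic might look like. The key names are my own invention for illustration, not any official mapping.

```python
import json

# An invented JSON shape for a minimal LwDITA topic -- title, short
# description, and a body of simple blocks. Not an official schema.
topic = {
    "topic": {
        "id": "intro",
        "title": "Introduction",
        "shortdesc": "What this guide covers.",
        "body": [
            {"p": "LwDITA keeps the element set small."},
            {"ul": [{"li": "XDITA"}, {"li": "HDITA"}, {"li": "MDITA"}]},
        ],
    }
}

print(json.dumps(topic, indent=2))
```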
Our product does supply-chain analysis of entities to find the best provider of resources for each consumer, and to give an advantage to consumers that in turn provide more value (resources) to the overall system. It's designed to manage a network, but that's just a matter of how you name the entities and resources. You could hijack the base model and overlay it on other domains (much as specialization works)... To do that for a body of DITA, you would need to convert the DITA to JSON. That's a long-winded way of saying I've been thinking about doing this for a while. Sadly, work in the software salt mines doesn't leave the time to get to it. But some thoughts...
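As a toy illustration of overlaying that consumer/provider model on a DITA domain: imagine topics as entities that "provide" reusable resources (say, conref targets) that other topics "consume". All the names below are invented for the sake of the sketch.

```python
# A toy entity/resource graph: each entity lists what it provides and
# what it consumes. The entity and resource names are invented.
entities = {
    "topic-a": {"provides": ["warnings-conref"], "consumes": []},
    "topic-b": {"provides": [], "consumes": ["warnings-conref"]},
}

def providers_of(resource):
    """Find the entities that provide a given resource."""
    return [name for name, e in entities.items() if resource in e["provides"]]

print(providers_of("warnings-conref"))  # -> ['topic-a']
```

The real analysis would weigh how much value each provider feeds back into the system, but even this flat lookup shows the basic shape of the overlay.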
JSON to drive analysis probably should not try to replicate the full document, so much as replicate the structure. If you want the analysis to map back to the actual content, use references to IDs.
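A minimal sketch of that "structure, not content" idea: walk a DITA fragment and keep only element names and ids, so analysis results can be mapped back to the source by id. The sample topic here is invented.

```python
import xml.etree.ElementTree as ET

# An invented DITA fragment to walk.
dita = """
<topic id="t1">
  <title>Sample</title>
  <body>
    <section id="s1"><p>Some text.</p></section>
  </body>
</topic>
"""

def skeleton(elem):
    """Reduce an element to its tag, id (if any), and children."""
    node = {"tag": elem.tag}
    if elem.get("id"):
        node["id"] = elem.get("id")
    children = [skeleton(c) for c in elem]
    if children:
        node["children"] = children
    return node

print(skeleton(ET.fromstring(dita)))
```

The text content is thrown away entirely; anything the analysis flags comes back as an id you can chase into the real DITA.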
You probably don't need to replicate the full structure. Depending on the analysis you want to perform, you can get away with dipping into the structure at different depths.
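"Dipping in" at different depths can be as simple as a depth limit on the same kind of walk: stop descending once the analysis doesn't need finer structure. Again, the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# An invented one-line DITA fragment.
dita = "<topic id='t1'><body><section id='s1'><p>text</p></section></body></topic>"

def skeleton(elem, depth):
    """Capture structure down to `depth` levels, then stop."""
    node = {"tag": elem.tag, "id": elem.get("id")}
    if depth > 0:
        node["children"] = [skeleton(c, depth - 1) for c in elem]
    return node

root = ET.fromstring(dita)
print(skeleton(root, 1))   # topic plus its direct children only
print(skeleton(root, 3))   # the tree down three levels
```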
Things you could do include:
It never occurred to me to try this with ElasticSearch or similar... It sounds like gangs of fun!
I have done some DITA to JSON, but only to turn a topic into an array for a walk-through tour. This would be bigger... I think the first step is to design what you want ES to analyze, and then decide how to get that out of the DITA. I would start humbly and grow from there.
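That topic-to-array conversion is roughly this kind of thing: flatten a task's steps into an array a walk-through tour could step over. The sample task and the output shape are invented for illustration, not my actual code.

```python
import json
import xml.etree.ElementTree as ET

# An invented DITA task with two steps.
task = """
<task id="setup">
  <title>Set up the widget</title>
  <taskbody>
    <steps>
      <step><cmd>Unpack the box.</cmd></step>
      <step><cmd>Plug it in.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

root = ET.fromstring(task)

# Pull each step's command text into a flat array the tour UI can walk.
tour = [cmd.text for cmd in root.iter("cmd")]
print(json.dumps(tour))  # -> ["Unpack the box.", "Plug it in."]
```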