paragraph usage

Mark Nazimova <mark_nazimova@...>
 

 

I'm new to DITA, and just beginning to convert some old documents.  I have several related questions about p.  Pardon me if the answers should be obvious.

 

· When a context (e.g., result, dd, or li) can contain unmarked text or a p element, and one wants to include a single paragraph of verbiage, is the "best practice" to specify p or to leave it as unmarked text?

(My preference is always to specify p, since I think explicit markup makes for easier maintenance, easier reuse, and easier transformation.)

What do people usually choose to do, and what factors do they take into account?

 

· The p element cannot be directly nested, which makes sense to me.  However, so far as I can tell from the 1.3 Language Reference, p can contain ul, and ul can contain p.  (I'm going to be referring to ul only, but my questions apply to other kinds of lists also, especially ol.)

According to DITA usage rules and design philosophy, is it invalid--or valid but poor design--to indirectly nest p’s this way?   

For the sake of simplicity, let's consider the body and p contexts for ul, and ignore other possible contexts.  When ul is introduced by some verbiage in a paragraph, I assume that the ul should always be embedded within that p element? 

In all other cases, should it be independent of the p element, and be embedded directly within body? 

 

Example 1, ul embedded within p, p continues past ul:

Benefits are determined by employment status. If your status is

  • full-time
  • three-quarter time
  • on leave

then you are eligible for tuition assistance for the following term.

 

Example 2, ul embedded within p, p ends with ul:

See your supervisor about any issue relating to:

  • Change in work schedule.
  • Vacation schedule.
  • Request for tuition reimbursement.

 

Example 3, ul not embedded within p:

I had a hard time thinking of a realistic example of a list not introduced by some verbiage, and everything I did think of was already accounted for by other structures, such as a simple list within a reference topic.  Which implies that ul will almost always be embedded in p or something related?

Don R. Day <dond@...>
 

--- In dita-users@..., "Mark Nazimova"
<mark_nazimova@i...> wrote:


I'm new to DITA, and just beginning to convert some old documents.
I have several related questions about p. Pardon me if the answers
should be obvious.



· When a context (e.g., result, dd, or li) can contain
unmarked text or a p element, and one wants to include a single
paragraph of verbiage, is the "best practice" to specify p or to
leave it as unmarked text?

(My preference is always to specify p, since I think explicit
markup makes for easier maintenance, easier reuse, and easier
transformation.)

What do people usually choose to do, and what factors do they take
into account?

I would agree with this best practice, Mark. The mixed content model
for these "base class" elements allows information architects to
model a specialization of that context to require block content only
(much like a DocBook listitem requires a paragraph element) or to
require text or phrase only (say for a dd specialized as a hover tip).

Even if a writer does use textual data before the first paragraph in
a list item, the processing intent (whether the current tools truly
achieve it or not) is to render that text as if it were a virtual
paragraph, in which the first paragraph of a block has no intrinsic
leading space, and subsequent paragraphs (non-first) initiate their
own leading space.
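
To make the "virtual paragraph" idea concrete, here is a minimal sketch in Python (standard library only) that turns the leading bare text of a list item into an explicit p. The function name wrap_leading_text and the BLOCKS set are assumptions made for this illustration; they are not part of DITA or of any DITA tool, and the set is not a complete list of block elements.

```python
import xml.etree.ElementTree as ET

# Assumption: a small, incomplete set of block-level element names.
BLOCKS = {"p", "ul", "ol", "dl", "sl", "note", "codeblock", "pre", "lq",
          "fig", "table", "simpletable"}

def wrap_leading_text(li):
    """Wrap bare text (and inline phrases) at the start of an <li> in a <p>.

    Returns True if the element was changed, False if it was left alone.
    """
    # Inline children that come before the first block-level child.
    leading = []
    for child in list(li):
        if child.tag in BLOCKS:
            break
        leading.append(child)

    has_text = bool((li.text or "").strip())
    if not has_text and not leading:
        return False  # already block-only content, nothing to wrap

    para = ET.Element("p")
    para.text = li.text
    li.text = None
    for child in leading:
        li.remove(child)    # the child's tail text travels with it
        para.append(child)
    li.insert(0, para)
    return True

item = ET.fromstring(
    "<li>Bare text with <b>bold</b> phrase.<p>Second paragraph.</p></li>"
)
wrap_leading_text(item)
print(ET.tostring(item, encoding="unicode"))
```

Run against a list item that already holds only block content, the function returns False and changes nothing, so it can be applied repeatedly to the same topic.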



· The p element cannot be directly nested, which makes
sense to me. However, so far as I can tell from the 1.3 Language
Reference, p can contain ul, and ul can contain p. (I'm going to be
referring to ul only, but my questions apply to other kinds of lists
also, especially ol.)

According to DITA usage rules and design philosophy, is it invalid--
or valid but poor design--to indirectly nest p's this way?

DITA allows many seemingly loose things like indirectly nested
paragraphs for the sake of specialization. The design thought was not
so much "What can we limit and enforce in the basic, single-use DTD"
as "What can we allow in the basic DTD that might be of interest to a
specialization somewhere down in a hierarchy of specializations?"

DITA has a more subtle inherent design quirk owing to XML's DTD
rules, which cannot express SGML's inclusion and exclusion rules. In
SGML, you could control the policy that a footnote could not
contain another footnote nested in a paragraph, for example. Norm
Walsh pointed this out in an article he wrote for XML.com on
converting DTDs from SGML to XML. DITA has exactly that same issue,
and thus it has some number of instances where the best practice
is "just because you can do it doesn't mean you should."
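
Since the DTD cannot forbid the pattern, a team that wants to keep an eye on it can audit for it outside the parser. The following is a rough sketch in Python, standard library only, that reports every p with another p somewhere among its ancestors. The function name nested_paragraphs is hypothetical, and the check is an illustration of a house rule, not something DITA itself prescribes.

```python
import xml.etree.ElementTree as ET

def nested_paragraphs(root):
    """Yield the tag path of every <p> that has a <p> ancestor."""
    def walk(elem, path, inside_p):
        here = path + [elem.tag]
        if elem.tag == "p" and inside_p:
            yield "/".join(here)
        for child in elem:
            # Once we are below a <p>, every descendant is "inside" it.
            yield from walk(child, here, inside_p or elem.tag == "p")
    yield from walk(root, [], False)

sample = ET.fromstring(
    "<body>"
    "<p>Intro<ul><li><p>Nested</p></li><li>Plain</li></ul></p>"
    "<ul><li><p>Fine</p></li></ul>"
    "</body>"
)
for hit in nested_paragraphs(sample):
    print(hit)
```

Only the first list is flagged here; a p inside a list that sits directly in body has no p ancestor and passes.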


For the sake of simplicity, let's consider the body and p contexts
for ul, and ignore other possible contexts. When ul is introduced by
some verbiage in a paragraph, I assume that the ul should always be
embedded within that p element?

In all other cases, should it be independent of the p element, and
be embedded directly within body?

Again, the base infotypes for DITA will seem surprisingly
inclusive. The requirements for a specialization will
determine how you restrict the content model rules.

For software documentation, it is not at all uncommon to find
constructs with this apparent structure:

<p>
The syntax for all frame relay sections, FRInterface, FRCircuit, and
FRCrtGroup (for devices that support frame relay), is:
<codeblock>
[<apiname>FRInterface</apiname><var>*</var>]
</codeblock>
where
<var>*</var> is the frame relay interface number.
</p>

This is used often enough that I suspect there is a design pattern
underlying the usage that could be expressed as a sort of parameter
list specialization. But my point is that the natural containing unit
for the complete expression IS the paragraph. Your example using the
embedded ul could not be expressed in HTML as an authoring language
because HTML does not allow such nesting. For authors, these natural
groupings are particularly useful within structured editors because
they can easily pick the element and delete the entire thing or move
it as a unit from one location to another.

As a best practice, I would let the scope of the expression or
thought be the rule for whether to include a list within a
paragraph. Your examples certainly seem to qualify for inclusion.

I'll also note that when a discourse moves from conventional
paragraphs to a structure such as this, there is often an implicit
sectional chunking of some sort going on in the writer's mind. The
example I cited above came from an API description following the
section heading "Syntax". DITA sections do not require titles
(unless you specialize the requirement!), so if there IS a degree of
separation from one of these sets to the next, perhaps the containing
markup for both p and ul could be an untitled section.

I can't make too specific a recommendation here... it depends on your
content and on the business rules you wish you could enforce on that
type of content (except that with DITA specialization, you CAN).
I'd suggest that, having done that analysis, you
could devise a specialization of the infotype that will in fact
enforce how writers are allowed to organize this content.


Example 1, ul embedded within p, p continues past ul:

<p>Benefits are determined by employment status. If your status is

<ul><li>full-time</li>

<li>three-quarter time</li>

<li>on leave</li></ul>

then you are eligible for tuition assistance for the following
term.</p>

If I were authoring this in HTML, I would have to use two separate
paragraphs, which to me is not only an unnatural structural break in
the discourse but also three times as much work to select and move.
This looks like a safe, not-unexpected usage in DITA.
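
The split that HTML authoring would force can be sketched as a small transformation: hoist each list out of the paragraph and put the text on either side into its own p. This is only an illustration in Python with the standard library. The name split_paragraph and the choice to hoist only ul and ol are assumptions of the sketch, and it does not describe what the DITA transforms actually do.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: only lists are hoisted in this sketch.
HOISTED = {"ul", "ol"}

def _has_content(p):
    return bool((p.text or "").strip()) or len(p) > 0

def split_paragraph(p):
    """Return sibling elements equivalent to <p>, with lists moved outside it."""
    parts = []
    current = ET.Element("p")
    current.text = p.text
    for child in p:
        if child.tag in HOISTED:
            if _has_content(current):
                parts.append(current)
            block = copy.deepcopy(child)
            trailing = block.tail     # text that followed the list inside <p>
            block.tail = None
            parts.append(block)
            current = ET.Element("p")
            current.text = trailing
        else:
            current.append(copy.deepcopy(child))  # inline phrase stays in the <p>
    if _has_content(current):
        parts.append(current)
    return parts

source = ET.fromstring(
    "<p>If your status is"
    "<ul><li>full-time</li><li>on leave</li></ul>"
    "then you are eligible.</p>"
)
for part in split_paragraph(source):
    print(ET.tostring(part, encoding="unicode"))
```

The one authoring unit becomes three sibling elements (p, ul, p), which is the extra selecting and moving described above. A paragraph that ends with its list yields just two.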


Example 2, ul embedded within p, p ends with ul:

<p>See your supervisor about any issue relating to:

<ul><li>Change in work schedule.</li>

<li>Vacation schedule.</li>

<li>Request for tuition reimbursement.</li></ul></p>

Again, I also see this as a single logical unit, so I see no problem
with your markup. Each list item individually completes the sentence
indicated by the parent paragraph.


Example 3, ul not embedded within p:

I had a hard time thinking of a realistic example of a list not
introduced by some verbiage, and everything I did think of was
already accounted for by other structures, such as a simple list
within a reference topic. Which implies that ul will almost always
be embedded in p or something related?

A good hypothesis, but not fully provable given the vagaries of
human communication! I wouldn't go so far as to
enshrine that rule as a general constraint. The %basic.block;
parameter entity in the topic.mod file establishes that
the typical sections of discourse may contain a number of things like
ul as a peer of p. This just sets up the possibility for many
subsequent kinds of specializations, say one that uses ONLY ul within
a section, much like a list of parameter values or files in a man
page description.


Regards,
--
Don Day
IBM Lead DITA Architect

Mark Nazimova <mark_nazimova@...>
 

Don,
 
Not to beat a dead horse, but I was hearing a mixed message in your response, and I'd like to clarify my understanding.  (Though I heard your primary message loud and clear, that "it depends on your content and on the business rules.")
 
 
 
   [MN 1]  When a context (e.g., result, dd, or li) can contain unmarked text or a p element, and one wants to include a single paragraph of verbiage, is the "best practice" to specify p or to leave it as unmarked text?  (My preference is always to specify p, since I think explicit markup makes for easier maintenance, easier reuse, and easier transformation.)
 
   [DD 1]  I would agree with this best practice, Mark.  The mixed content model for these "base class" elements allows information architects to model a specialization of that context to require block content only....
 
[MN 2]  Okay, so you're saying that it's reasonable to mark verbiage within a list item as a paragraph.  For example,
  • An include file contains information that you wish to include in multiple documents. The file is simply a FrameMaker file that you can import by reference.

       [MN 1]  When ul is introduced by some verbiage in a paragraph, I assume that the ul should always be embedded within that p element?
     
       [DD 1]  ... the natural containing unit for the complete expression IS the paragraph.  ...  As a best practice, I would let the scope of the expression or thought be the rule for whether to include a list within a paragraph. Your examples certainly seem to qualify for inclusion.
     
    [MN 2] And you're saying here that it's reasonable to include an unordered list within a paragraph if the list is part of the thought expressed by that paragraph.  For example,

    Benefits are determined by employment status. If your status is

    • full-time

    • three-quarter time

    • on leave

    then you are eligible for tuition assistance for the following term.


     
     
     
       [MN 1]  p can contain ul, and ul can contain p. According to DITA usage rules and design philosophy, is it invalid--or valid but poor design--to indirectly nest p's this way?
     
       [DD 1]  DITA allows many seemingly loose things like indirectly nested paragraphs for the sake of specialization.  ...  [DITA] has some number of instances where the best practice is "just because you can do it doesn't mean you should."
     
    [MN 2]  I infer from this that it's poor practice to take advantage of the ability to indirectly nest paragraphs, but that would contradict your two previous statements.  Shall I assume--in keeping with your point "it depends on your content and on the business rules"--that  if our content model supports indirectly nesting paragraphs in the ways described above, it's valid to do so?
     
     
    Thanks,
    Mark Nazimova
     

    Don R. Day <dond@...>
     

    --- In dita-users@..., "Mark Nazimova"
    <mark_nazimova@i...> wrote:
    Don,

    Not to beat a dead horse, but I was hearing a mixed message in your
    response, and I'd like to clarify my understanding. (Though I heard
    your primary message loud and clear, that "it depends on your content
    and on the business rules.")

    One way to approach your questions is to look at the issues from a
    different perspective: imagine that you have already defined a
    particular specialization that requires block-like content for an
    element based on <li>. DITA allows you to generalize this specialized
    content back to the base topic infotype. Such content generalized
    from the restrictive specialized DTD would never have mixed content
    (text mingled with elements as content of the <li>). By reverse
    logic, if your desired specializations do not yet exist but you might
    eventually migrate in that direction, then your best practice for
    authors using the base DTDs should be to eschew any direct text or
    phrase content of an <li>.

    On the other hand, if your eventual specialization posits a structure
    in which the base <li> would never have a paragraph, you know what you
    need to tell your writers--eschew using paragraphs in a list item.

    Once you have your specializations, some of these best practice issues
    go away because the DTDs will enforce the intended requirements.

    [MN 1] When a context (e.g., result, dd, or li) can contain unmarked
    text or a p element, and one wants to include a single paragraph of
    verbiage, is the "best practice" to specify p or to leave it as
    unmarked text? (My preference is always to specify p, since I think
    explicit markup makes for easier maintenance, easier reuse, and
    easier transformation.)

    [DD 1] I would agree with this best practice, Mark. The mixed content
    model for these "base class" elements allows information architects
    to model a specialization of that context to require block content
    only....


    [MN 2] Okay, so you're saying that it's reasonable to mark verbiage
    within a list item as a paragraph. For example,

    <li><p>An include file contains information that you wish to include in
    multiple documents. The file is simply a FrameMaker file that you
    can import by reference.</p></li>

    In my most assertive voice, "Yeah, whatever." :-) Even though fully
    blocked content might be easier to maintain, DITA also permits you to
    specialize a list item's content to be more HTML-like, *if that is
    what your authors or information architectures mandate*. But yes, it
    is reasonable to recommend always using a paragraph in a list item. I
    won't get into whether this is always right--at the base level, DITA
    is agnostic by design about your intent.

    To argue by an analogous case, based on general principles of
    well-structured User Assistance, I would recommend always using a
    <shortdesc> as a virtual first paragraph in every topic, knowing that
    it will become a hover tip in any links to that topic in standard DITA
    processing. Yet I do not believe that <shortdesc> should be required
    in the base topic content model because I can imagine that someone
    might have non-UA specializations that can justifiably omit the
    element and forgo the inherent processing benefits for UA. Knowing
    that <shortdesc> does have benefits for users of my UA topics created
    in DITA, I endorse the use of <shortdesc> as a best practice across
    the board for User Assistance. On the other hand, neither the user
    nor the author receives any processing benefit from proscribing the
    use of text in a list item UNLESS there is a design goal that makes
    such a best practice beneficial to the eventual up-migration of your
    topics to an intent-based specialization that enshrines that practice
    in a required content rule.

    [MN 1] When ul is introduced by some verbiage in a paragraph, I
    assume that the ul should always be embedded within that p element?

    [DD 1] ... the natural containing unit for the complete expression
    IS the paragraph. ... As a best practice, I would let the scope of
    the expression or thought be the rule for whether to include a list
    within a paragraph. Your examples certainly seem to qualify for
    inclusion.

    [MN 2] And you're saying here that it's reasonable to include an
    unordered list within a paragraph if the list is part of the thought
    expressed by that paragraph. For example,

    <p>Benefits are determined by employment status. If your status is
    <ul><li>full-time</li>
    <li>three-quarter time</li>
    <li>on leave</li></ul>
    then you are eligible for tuition assistance for the following term.</p>

    It's reasonable. As I mentioned before, HTML's rule for paragraph
    content is much less friendly for authoring this kind of material.
    The DITA transforms will handle turning the nested DITA authoring view
    into an appropriately serialized rendering view in HTML.

    [MN 1] p can contain ul, and ul can contain p. According to DITA
    usage rules and design philosophy, is it invalid--or valid but poor
    design--to indirectly nest p's this way?

    [DD 1] DITA allows many seemingly loose things like indirectly
    nested paragraphs for the sake of specialization. ... [DITA] has
    some number of instances where the best practice is "just because
    you can do it doesn't mean you should."

    [MN 2] I infer from this that it's poor practice to take advantage
    of the ability to indirectly nest paragraphs, but that would
    contradict your two previous statements. Shall I assume--in keeping
    with your point "it depends on your content and on the business
    rules"--that if our content model supports indirectly nesting
    paragraphs in the ways described above, it's valid to do so?

    I am certain that the loose content models of DITA's base infotypes
    contain many other such apparently illogical nestings of elements.
    The substitution mechanism used to introduce domain vocabulary back
    into content models will also turn up some interesting surprises from
    time to time. All I can say is that if the rules were more
    constrained, so too would be the expressiveness of DITA as an
    extensible architecture. If the DITA content models allow creating
    surprising content, then by definition it is valid XML, but some cases
    may run contrary to common sense or preferred practice. In reality,
    the only time I have seen DITA pushed to illogical limits is when
    using automated content generators that create XML instances according
    to all possible permutations allowed in a DTD. This is not how most
    writers create content, thank goodness.

    If someone puts a paragraph within a list item within a paragraph, I'm
    inclined to let it go. I might ask whether that nested list should
    have been a simple list, in which case it would not allow a paragraph
    as content. Or I might challenge the writer to analyze what the
    information is trying to model. Often, a structure such as this:

    sdfkjfd
    asldkfja
    asldkfd (maybe a paragraph)
    ...

    is really a case of this simpletable:

    sdfkjfd asldkfja
    ------- --------
    asldkfd
    (maybe a paragraph)
    ...

    or perhaps a definition list, or even a case of using a paragraph as
    if it were a logical section (a unit of discourse that supports a
    sub-argument of the topic). In this case, recasting the content into a
    section might resolve the illogical nesting unless the writer really
    has a bad habit of recursive writing. Time for some behavior
    modification?

    But if the nested structure truly represents a design pattern that is
    repeatedly appearing in your information, perhaps it should be
    analyzed as a possible domain specialization so that you can convert
    it from a seemingly illogical practice into just the right data
    structure that you need.

    Thanks,
    Mark Nazimova

    I'm sort of enjoying this, Mark. These are hard questions, and you
    might be frustrated that DITA has a somewhat more elusive quality than
    other DTDs you may have worked with. In practice, IBM is handling
    many thousands of DITA topics, some of which exhibit the very cases
    you've mentioned, and the company has not imploded yet. It's a
    characteristic of loosely-coupled applications that they interoperate
    with high tolerance, and perhaps that's what makes DITA so interesting
    to work with. I'd appreciate hearing thoughts from others about
    Mark's observations.

    --
    Don Day
    IBM Lead DITA Architect