DITA-OT pre-processing architecture

This topic describes the set of steps commonly known as the pre-processing stage of a DITA build. These steps typically run at the start of any build using the DITA-OT, regardless of the final output format.

Each step described corresponds to one Ant target in the build pipeline. The general Ant target "preprocess" will call all of the targets described here.

Generate lists (gen-list)

The gen-list step examines the input files and creates lists of topics, images, document properties, or other content. These lists are used by later steps in the pipeline. For example, one list includes all topics that make use of the conref attribute; only those files are processed during the conref stage of the build. This step is implemented in Ant and Java.

The result of this step is a set of list files in the temporary directory, including dita.list and dita.xml.properties.

List file property | List file | List property | Usage
canditopicsfile | canditopics.list | canditopicslist |
chunkedditamapfile | chunkedditamap.list | chunkedditamaplist |
chunkedtopicfile | chunkedtopic.list | chunkedtopiclist |
codereffile | coderef.list | codereflist | Topics with coderef
conreffile | conref.list | conreflist | Documents that contain conref attributes that need to be resolved during preprocessing.
conrefpushfile | conrefpush.list | conrefpushlist |
conreftargetsfile | conreftargets.list | conreftargetslist |
copytosourcefile | copytosource.list | copytosourcelist |
copytotarget2sourcemapfile | copytotarget2sourcemap.list | copytotarget2sourcemaplist |
flagimagefile | flagimage.list | flagimagelist |
fullditamapandtopicfile | fullditamapandtopic.list | fullditamapandtopiclist | All of the ditamap and topic files that are referenced during the transformation. These may be referenced by href or conref attributes.
fullditamapfile | fullditamap.list | fullditamaplist | All of the ditamap files in dita.list
fullditatopicfile | fullditatopic.list | fullditatopiclist | All of the topic files in dita.list
hrefditatopicfile | hrefditatopic.list | hrefditatopiclist | All of the topic files that are referenced with an href attribute
hreftargetsfile | hreftargets.list | hreftargetslist | Link targets
htmlfile | html.list | htmllist | Resource files
imagefile | image.list | imagelist | Image files that are referenced in the content
keyfile | key.list | keylist | List of keys. The format is: keyname "=" link "(" source ")". Both href and source URLs are relative to the base directory.
keyreffile | keyref.list | keyreflist | Topics and maps that have key references.
outditafilesfile | outditafiles.list | outditafileslist |
relflagimagefile | relflagimage.list | relflagimagelist |
resourceonlyfile | resourceonly.list | resourceonlylist |
skipchunkfile | skipchunk.list | skipchunklist |
subjectschemefile | subjectscheme.list | subjectschemelist |
subtargetsfile | subtargets.list | subtargetslist |
tempdirToinputmapdir.relative.value | | |
uplevels | | |
user.input.dir | | | Absolute input directory path
user.input.file.listfile | | | Input file list file
user.input.file | | | Input file path, relative to input directory
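
The key.list entry format described above (keyname "=" link "(" source ")") is simple enough to read with a few lines of code. The following Python sketch is for illustration only and is not part of DITA-OT; the names KeyEntry and parse_key_entry and the sample entry are hypothetical.

```python
import re
from typing import NamedTuple


class KeyEntry(NamedTuple):
    name: str    # key name
    link: str    # target of the key, relative to the base directory
    source: str  # file that defines the key, relative to the base directory


# keyname "=" link "(" source ")"
_KEY_LINE = re.compile(r"^(?P<name>[^=]+)=(?P<link>.*)\((?P<source>[^()]*)\)$")


def parse_key_entry(line: str) -> KeyEntry:
    """Parse one key.list entry; raise ValueError if it does not match."""
    match = _KEY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a key entry: {line!r}")
    return KeyEntry(match["name"], match["link"], match["source"])


# Hypothetical entry, made up for this example.
entry = parse_key_entry("install=tasks/install.dita(root.ditamap)")
print(entry.name, entry.link, entry.source)
```

The sketch assumes that neither the key name contains "=" nor the source path contains parentheses.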

Debug and filter (debug-filter)

The debug-filter step processes all referenced DITA content and creates copies in a temporary directory for use during the remainder of the build.

As the files are copied, the following modifications are made:

  • The files are filtered according to entries in any specified DITAVAL file.
  • Debug information is inserted into each element (using the xtrf and xtrc attributes). These values allow messages later in the build to reliably indicate the original source of the error. For example, a message may trace back to the fifth <ph> element in a specific source document. Without these attributes, that count may no longer be available due to filtering and other processing.
  • Column names in tables are adjusted to use a common naming scheme. This is done only to simplify later conref processing; for example, if a table row is pulled into another table, this ensures that a reference to "column 5 properties" will continue to work in the fifth column of the new table.

This step is implemented in Java.
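
The reason the debug attributes matter once content has been filtered can be shown with a short Python sketch. It is not DITA-OT code: the function names, the sample topic, and the "name:count" layout used here for xtrc are assumptions made for illustration, and the real module applies DITAVAL rules rather than a hard-coded attribute value.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def add_debug_attributes(root: ET.Element, source_path: str) -> None:
    """Record the source file and a per-name element count on every element."""
    seen = Counter()
    for elem in root.iter():
        seen[elem.tag] += 1
        elem.set("xtrf", source_path)
        # Assumed layout: "<element name>:<count within the document>".
        elem.set("xtrc", f"{elem.tag}:{seen[elem.tag]}")


def filter_excluded(root: ET.Element, attribute: str, excluded: set) -> None:
    """Remove elements whose attribute carries any excluded value."""
    for parent in list(root.iter()):
        for child in list(parent):
            values = set(child.get(attribute, "").split())
            if values & excluded:
                parent.remove(child)


SOURCE = """<topic id="t"><title>T</title><body>
<p><ph audience="admin">one</ph> <ph>two</ph> <ph audience="admin">three</ph></p>
<p><ph>four</ph> <ph>five</ph></p>
</body></topic>"""

topic = ET.fromstring(SOURCE)
add_debug_attributes(topic, "src/sample.dita")
filter_excluded(topic, "audience", {"admin"})

# After filtering, "five" is only the third <ph> left,
# but its xtrc value still identifies it as the fifth in the source.
remaining = topic.findall(".//ph")
print(len(remaining), remaining[-1].get("xtrc"))
```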

Copy related files (copy-files)

The copy-files step copies related non-DITA resources to the output directory, such as HTML files referenced in a map or images referenced by DITAVAL files.

Conref push (conrefpush)

The conrefpush step resolves "conref push" references. The conref push feature was added in the DITA 1.2 specification, and the associated processing is available in DITA-OT version 1.5 and later. This step only processes documents that use conref push (or that are updated due to the push action). The step is implemented in Java.

Conref (conref)

The conref step resolves traditional conref attributes, processing only the documents that use the conref attribute. Each map or topic is processed with XSLT to resolve the attributes.

As part of the process, IDs within referenced content are changed as they are pulled into the new location. This is done in order to ensure that IDs within the original (referencing) topic remain unique.

If an element with an ID is pulled into a new context along with a cross reference that points to it, both the ID and the reference are updated so that they remain valid in the new location. For example, a referenced topic may include a section like the following:
<topic id="referenced_topic">
  <title>...</title>
  <body>
    <section id="sect"><title>Sample section</title>
      <p>Look at the next figure <xref href="#referenced_topic/fig">here</xref>.</p>
      <fig id="fig"><title>Sample</title>
        <p>This is a rather useless figure, but it
           illustrates a point.</p>
      </fig>
    </section>
  </body>
</topic>
If the section is referenced with a conref attribute, the ID on the <fig> element will be modified to ensure it remains unique inside the new topic. At the same time, the <xref> element will also be modified so that after the conref is resolved, it remains valid as a local reference. If the topic pulling in a new copy of the section has the id "new_topic", then the pulled copy of the section may look something like this in the intermediate document.
<section><title>Sample section</title>
  <p>Look at the next figure <xref href="#new_topic/d1e25">here</xref>.</p>
  <fig id="d1e25"><title>Sample</title>
    <p>This is a rather useless figure, but it
       illustrates a point.</p>
  </fig>
</section>

In this case, the ID of the figure has been changed to a generated value of "d1e25". At the same time, the <xref> element has been updated to use that new generated ID, so that the reference stays local in the updated topic.
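
The ID handling described above can be sketched in Python as follows. This is a simplified illustration, not the XSLT that DITA-OT runs: the function name pull_with_new_ids and the "gen1"-style generated IDs are hypothetical, and only cross references of the form "#topicid/elementid" are handled.

```python
import copy
import itertools
import xml.etree.ElementTree as ET


def pull_with_new_ids(referenced: ET.Element, source_topic_id: str,
                      new_topic_id: str) -> ET.Element:
    """Copy an element, regenerate nested IDs, and retarget local xrefs."""
    pulled = copy.deepcopy(referenced)
    pulled.attrib.pop("id", None)  # the ID of the pulled element itself is dropped
    counter = itertools.count(1)
    renamed = {}
    for elem in pulled.iter():
        if "id" in elem.attrib:
            old = elem.get("id")
            # Hypothetical generator; the sample above shows a value like "d1e25".
            renamed[old] = f"gen{next(counter)}"
            elem.set("id", renamed[old])
    prefix = f"#{source_topic_id}/"
    for xref in pulled.iter("xref"):
        href = xref.get("href", "")
        if href.startswith(prefix) and href[len(prefix):] in renamed:
            xref.set("href", f"#{new_topic_id}/{renamed[href[len(prefix):]]}")
    return pulled


REFERENCED = ET.fromstring("""<topic id="referenced_topic"><title>T</title><body>
<section id="sect"><title>Sample section</title>
<p>Look at the next figure <xref href="#referenced_topic/fig">here</xref>.</p>
<fig id="fig"><title>Sample</title></fig>
</section></body></topic>""")

section = REFERENCED.find(".//section[@id='sect']")
pulled = pull_with_new_ids(section, "referenced_topic", "new_topic")
print(ET.tostring(pulled, encoding="unicode"))
```

Working on a deep copy leaves the referenced topic untouched, so the same section can be pulled into several topics with independent generated IDs.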

Move metadata (move-meta-entries)

The move-meta-entries step pushes metadata back and forth between maps and topics. For example, index entries and copyrights in the map are pushed into affected topics, so that topics may be processed later in isolation while retaining all relevant metadata.

This step is implemented in Java.
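
As a rough illustration of one direction of this exchange, the Python sketch below copies copyright information from a topicref in a map into the prolog of the referenced topic. It is not the DITA-OT implementation: the function name push_copyright and the sample content are hypothetical, the topic is assumed to use a plain <body>, and the ordering rules of the <prolog> content model are ignored.

```python
import copy
import xml.etree.ElementTree as ET


def push_copyright(topicref: ET.Element, topic: ET.Element) -> None:
    """Copy <copyright> from a topicref's <topicmeta> into the topic's <prolog>."""
    topicmeta = topicref.find("topicmeta")
    if topicmeta is None:
        return
    prolog = topic.find("prolog")
    if prolog is None:
        prolog = ET.Element("prolog")
        body = topic.find("body")
        # Simplification: place <prolog> just before <body>, or at the end.
        index = list(topic).index(body) if body is not None else len(topic)
        topic.insert(index, prolog)
    for item in topicmeta.findall("copyright"):
        prolog.append(copy.deepcopy(item))


MAP = ET.fromstring(
    '<map><topicref href="a.dita"><topicmeta><copyright>'
    '<copyryear year="2012"/><copyrholder>Example Corp</copyrholder>'
    '</copyright></topicmeta></topicref></map>')
TOPIC = ET.fromstring('<topic id="a"><title>A</title><body><p>Text</p></body></topic>')

push_copyright(MAP.find("topicref"), TOPIC)
print(ET.tostring(TOPIC, encoding="unicode"))
```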

Resolve keyref (keyref)

The keyref step examines all keys defined in the source material, and updates key references appropriately. Links that make use of keys are updated so that any href value is replaced by the appropriate target; key-based text replacement is also evaluated. The keyref mechanism was defined as part of the DITA 1.2 standard, and is available in DITA-OT 1.5 and later.

This step is implemented in Java.
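
The two effects described above (href replacement and key-based text replacement) can be sketched in Python. This is a simplified model rather than the Java module: the function names and sample content are hypothetical, the first definition of a key in document order is assumed to win, href values are copied without being rebased for the referencing file, and keyref values of the form "key/elementid" are not handled.

```python
import xml.etree.ElementTree as ET


def collect_keys(ditamap: ET.Element) -> dict:
    """Map each key name to the first element that defines it."""
    keys = {}
    for definition in ditamap.iter():
        for name in definition.get("keys", "").split():
            keys.setdefault(name, definition)  # simplification: first one wins
    return keys


def resolve_keyrefs(topic: ET.Element, keys: dict) -> None:
    """Replace href values and fill empty elements from key definitions."""
    for elem in topic.iter():
        definition = keys.get(elem.get("keyref", ""))
        if definition is None:
            continue
        if definition.get("href"):
            elem.set("href", definition.get("href"))
        keyword = definition.find("topicmeta/keywords/keyword")
        is_empty = not (elem.text or "").strip() and len(elem) == 0
        if keyword is not None and is_empty:
            elem.text = keyword.text


MAP = ET.fromstring("""<map>
  <keydef keys="install" href="tasks/install.dita"/>
  <keydef keys="product"><topicmeta><keywords>
    <keyword>Example Product</keyword></keywords></topicmeta></keydef>
</map>""")
TOPIC = ET.fromstring(
    '<topic id="t"><title>T</title><body>'
    '<p>See <xref keyref="install"/> for <ph keyref="product"/>.</p>'
    '</body></topic>')

resolve_keyrefs(TOPIC, collect_keys(MAP))
print(ET.tostring(TOPIC, encoding="unicode"))
```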

Resolve code references (coderef)

The coderef module resolves references made with the <coderef> element, which was added in DITA 1.2. This module is available in DITA-OT 1.5 and later.

The <coderef> element is used inside <codeblock> to reference code stored externally in non-XML documents. During pre-processing, this Java module pulls the referenced content into the <codeblock> element.
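
The effect of this module can be sketched in Python as follows. The sketch is illustrative only: the function name resolve_coderefs and the sample file are hypothetical, the referenced file is assumed to be UTF-8 text, and href values are resolved against a single base directory.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def resolve_coderefs(codeblock: ET.Element, base_dir: Path) -> None:
    """Replace each <coderef> child of a <codeblock> with the referenced text."""
    for coderef in list(codeblock):
        if coderef.tag != "coderef":
            continue
        text = (base_dir / coderef.get("href")).read_text(encoding="utf-8")
        text += coderef.tail or ""  # keep any text that followed the element
        index = list(codeblock).index(coderef)
        if index == 0:
            codeblock.text = (codeblock.text or "") + text
        else:
            previous = codeblock[index - 1]
            previous.tail = (previous.tail or "") + text
        codeblock.remove(coderef)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "hello.py").write_text('print("hello")\n', encoding="utf-8")
    block = ET.fromstring('<codeblock><coderef href="hello.py"/></codeblock>')
    resolve_coderefs(block, base)
    print(ET.tostring(block, encoding="unicode"))
```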

Resolve map references (mapref)

The mapref module resolves references from one map to another.

Maps may reference other maps using markup similar to the following:
<topicref href="other.ditamap" format="ditamap"/>
The DITA 1.2 standard added a new element that allows this sort of reference without setting the format attribute:
<mapref href="other.ditamap"/>

In either case, the element that references the other map is replaced by the topic references from the other map. Relationship tables are pulled into the referencing map as a child of the root element (<map> or a specialization of <map>).

This step is implemented in XSLT.
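
The replacement behavior described above can be modeled with a short Python sketch. It is not the XSLT used by DITA-OT: the function names and the load_map callback (which stands in for reading a map from disk) are hypothetical, href values in the pulled map are not rebased, and circular map references are not detected.

```python
import copy
import xml.etree.ElementTree as ET


def is_map_reference(elem: ET.Element) -> bool:
    """True for <mapref> and for <topicref format="ditamap">."""
    return elem.tag == "mapref" or (
        elem.tag == "topicref" and elem.get("format") == "ditamap")


def resolve_maprefs(ditamap: ET.Element, load_map) -> None:
    """Inline referenced maps; reltables move to the root of the referencing map."""
    pulled_reltables = []

    def expand(parent: ET.Element) -> None:
        for child in list(parent):
            if not is_map_reference(child):
                expand(child)
                continue
            target = copy.deepcopy(load_map(child.get("href")))
            expand(target)  # resolve nested map references first
            index = list(parent).index(child)
            parent.remove(child)
            for item in target:
                if item.tag == "reltable":
                    pulled_reltables.append(item)
                else:
                    parent.insert(index, item)
                    index += 1

    expand(ditamap)
    ditamap.extend(pulled_reltables)


MAPS = {"other.ditamap": ET.fromstring(
    '<map><topicref href="b.dita"/><topicref href="c.dita"/>'
    '<reltable><relrow><relcell/></relrow></reltable></map>')}
root = ET.fromstring(
    '<map><topicref href="a.dita"/><mapref href="other.ditamap"/>'
    '<topicref href="d.dita"/></map>')

resolve_maprefs(root, MAPS.__getitem__)
print([(child.tag, child.get("href")) for child in root])
```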

Pull content into maps (mappull)

The mappull step pulls content from referenced topics into maps, and cascades data within maps.

This step uses XSLT to make the following changes to the map:
  • Titles are pulled from referenced DITA topics and replace the navigation title specified on the topicref. If the locktitle attribute is set to "yes", the value in the map is not changed.
  • The <linktext> element is set based on the title of the referenced topic, unless it is already specified locally.
  • The <shortdesc> element is set based on the short description of the referenced topic, unless it is already specified locally.
  • When a local DITA topic is referenced, the type attribute is set on the topicref based on the type of topic referenced. For example, a reference to a task topic will end up with type="task".
  • Inheritable attributes, such as toc or print, are made explicit on child topicref elements. This allows any future step to work with the attributes directly, without reevaluating the cascade behavior.
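
Two of the changes listed above, title pulling and attribute cascading, can be sketched in Python. This is a simplified model, not the XSLT implementation: the function names are hypothetical, the navigation title is modeled as a navtitle attribute, only the toc and print attributes are cascaded, and the topic_titles dictionary stands in for reading titles from the referenced topics.

```python
import xml.etree.ElementTree as ET

CASCADING = ("toc", "print")  # the two attributes named above; the real set is larger


def cascade(elem: ET.Element, inherited: dict) -> None:
    """Make inherited attribute values explicit on descendant topicrefs."""
    current = dict(inherited)
    for name in CASCADING:
        if name in elem.attrib:
            current[name] = elem.get(name)
        elif name in current and elem.tag == "topicref":
            elem.set(name, current[name])
    for child in elem:
        cascade(child, current)


def pull_titles(ditamap: ET.Element, topic_titles: dict) -> None:
    """Set navtitle from the referenced topic unless locktitle is "yes"."""
    for topicref in ditamap.iter("topicref"):
        title = topic_titles.get(topicref.get("href"))
        if title is not None and topicref.get("locktitle") != "yes":
            topicref.set("navtitle", title)


root = ET.fromstring("""<map>
  <topicref href="a.dita" navtitle="Old title" toc="no">
    <topicref href="b.dita" navtitle="Keep me" locktitle="yes"/>
  </topicref>
</map>""")

cascade(root, {})
pull_titles(root, {"a.dita": "Installing", "b.dita": "Configuring"})
print(ET.tostring(root, encoding="unicode"))
```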

Chunk topics (chunk)

The chunk step is a Java module that breaks apart and assembles referenced DITA content based on the chunk attribute in maps.

The following values are recognized on the chunk attribute, based on definitions provided in the DITA specification. These values were initially defined in the DITA 1.1 specification, with significant clarifications in the DITA 1.2 specification.
  • select-topic
  • select-document
  • select-branch
  • by-topic
  • by-document
  • to-content
  • to-navigation
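
A chunk attribute holds one or more of these tokens separated by spaces. The Python sketch below splits a value into tokens and groups them; it is illustrative only. The function name parse_chunk and the group labels ("selection", "policy", "rendering") are informal names chosen for this example, and rejecting unrecognized tokens with an error is a choice made here, not documented DITA-OT behavior.

```python
CHUNK_TOKENS = {
    "select-topic": "selection",
    "select-document": "selection",
    "select-branch": "selection",
    "by-topic": "policy",
    "by-document": "policy",
    "to-content": "rendering",
    "to-navigation": "rendering",
}


def parse_chunk(value: str) -> dict:
    """Group the space-separated tokens of a chunk attribute by kind."""
    groups = {"selection": [], "policy": [], "rendering": []}
    for token in value.split():
        kind = CHUNK_TOKENS.get(token)
        if kind is None:
            raise ValueError(f"unrecognized chunk token: {token!r}")
        groups[kind].append(token)
    return groups


print(parse_chunk("to-content select-branch"))
```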

Pull content into topics (topicpull)

The topicpull module pulls content into <xref> and <link> elements (if needed).

For <xref> elements, if the <xref> does not contain link text, the target is examined and link text is pulled. For example, a reference to a topic will pull the title of the topic; a reference to a list item will pull the number of the item. If the <xref> element references a topic that has a short description, and the <xref> element does not already contain a child <desc> element, a <desc> element is created with the short description of the target.

The process is similar for <link> elements. If the <link> does not have a child <linktext> element, one is created with the appropriate link text. Similarly, if the <link> element does not have a child <desc> element, and the short description of the target can be determined, a <desc> is created with the short description of the target.

This step is implemented in XSLT.
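
The <xref> handling described above can be sketched in Python. This is a simplified illustration rather than the XSLT implementation: the function name pull_xref_text is hypothetical, the targets dictionary stands in for locating and reading the referenced topics, and only references to whole topics are handled.

```python
import xml.etree.ElementTree as ET


def pull_xref_text(topic: ET.Element, targets: dict) -> None:
    """Fill empty <xref> elements with the target title and add a <desc>."""
    for xref in list(topic.iter("xref")):
        target = targets.get(xref.get("href"))
        if target is None:
            continue
        has_text = bool((xref.text or "").strip()) or any(
            child.tag != "desc" for child in xref)
        if not has_text:
            xref.text = target.findtext("title")
        shortdesc = target.findtext("shortdesc")
        if shortdesc and xref.find("desc") is None:
            desc = ET.SubElement(xref, "desc")
            desc.text = shortdesc


TARGET = ET.fromstring(
    '<topic id="install"><title>Installing</title>'
    '<shortdesc>How to install.</shortdesc><body/></topic>')
TOPIC = ET.fromstring(
    '<topic id="t"><title>T</title><body><p><xref href="install.dita"/> and '
    '<xref href="install.dita">custom text</xref></p></body></topic>')

pull_xref_text(TOPIC, {"install.dita": TARGET})
print(ET.tostring(TOPIC, encoding="unicode"))
```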