Re: DITA and Git: Minimizing merge conflicts #version-control #change-management

Jean-Noël AVILA

On 18/12/2019, Kristen James Eberlein wrote:

This might be slightly off-topic, but certainly Git is version control for many companies' DITA implementations.

Short version: What tactic have you found best to minimize merge conflicts for new technical writers working in Git?

Gruesome detail: I am working with a client who stores their DITA content in GitLab. They are using SourceTree as a client. Before I was brought on-board, there was turnover and the team reduced to a manager and two technical writers, all brand-new to DITA, Git, and version control. Infrastructure engineers set up an automated build for them and recommended that they use the following practices:

  • A development branch that is merged into master every two weeks when new content is delivered to the public-facing Web site.
  • "Feature" branches for technical writers to use to work on new content that might span several development cycles. These branches are created from the most recent development branch.

These feature branches are problematic. Often by the time that the technical writer wants to merge them into the current development branch, there are many merge conflicts. Or the technical writer wants to selectively add in some (but not all) changes that have been made in the feature branch.

We need a bit more information on the workflow here. If changes happen only in a directory dedicated to documentation where only writers add content, that means that the changes introduced over time are touching the same parts of certain files. There are several causes and ways to deal with that:

  • The editing tools may be rewriting entire files even when only a small chunk changes. Git is a version control system that is mainly line-based, so it is very important that the XML be formatted so that its semantic structure maps as closely as possible onto lines, and that editing tools change only the minimum number of lines. Please also note that indentation can be a big cause of conflicts if writers don't follow exactly the same formatting rules. Diffing the changes before committing should prevent big changes from inadvertently landing in the main repo.
  • Writers have to touch the same files even for small changes. Git also has a strong opinion on files as units of content. The solution is more a matter of distributing information across different files to minimize overlapping changes (the architecture of the repo). Don't be afraid to multiply the number of files instead of cramming as much as possible into one big container file. The one-big-container approach with XML files can lead to big issues because it can cause Git to make merge errors: repeated structures of lines in the file produce similar context lines that may confuse the merge algorithm.
  • In the same way that files are units of data, branches must be units of change. A branch can only be merged as a whole. So the rule is: one branch, one feature. Note that workflows with feature branches require that files be split to the correct granularity of information, so that different branches don't introduce changes into the same files.
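To illustrate the first point, here is a minimal sketch in Python. It uses the standard difflib module as a stand-in for a line-based diff; it is not Git's own algorithm, and the XML fragment and the changed_lines helper are made up for the example. It compares a one-line wording change with the same change saved by an editor that also re-indents the file.

```python
import difflib

# Hypothetical DITA-like fragment, one element per line.
original = """<ul>
  <li>Install the package.</li>
  <li>Run the installer.</li>
  <li>Restart the service.</li>
</ul>
""".splitlines(keepends=True)

# Minimal edit: the wording of a single line changes.
minimal_edit = [
    line.replace("Run the installer", "Run the setup wizard")
    for line in original
]

# Same wording change, but the editor also re-indented every <li> line
# (2 spaces -> 4 spaces) when saving.
reformatted_edit = [
    line.replace("  <li>", "    <li>") for line in minimal_edit
]


def changed_lines(before, after):
    """Return (removed, added) line counts as seen by a line-based diff."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


print(changed_lines(original, minimal_edit))      # (1, 1)
print(changed_lines(original, reformatted_edit))  # (3, 3)
```

The second result shows every list item as changed although only one sentence was edited. Any other branch that touches one of those lines would then conflict with this one.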

I'm trying to think of processes this team can use that will minimize merge conflicts. Here are some of the approaches I've considered:

  1. Update the feature branch frequently by pulling the current development branch into the feature branch.

This is recommended practice.

  2. Leave the feature branch alone. Do not update it by pulling the content from the current development branch into the feature branch. When it comes time to merge the feature branch into the development branch:
    1. Try to do it automatically.
    2. If there are significant merge conflicts, build a "patch of modified files" by diffing 1) the feature branch as it was first created, and 2) the feature branch at its current point in time. Apply the patch to the current development branch.

This can be done by first pulling the development branch into the feature branch, resolving the conflicts there, and then merging the feature branch back into the development branch.
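As a rough sketch of that sequence, the Python helper below lists the Git commands in order. The function names, the branch names, and the "origin" remote are assumptions for illustration; the team's actual names in GitLab may differ, and the same steps can be done from SourceTree.

```python
import subprocess


def sync_and_merge_commands(feature, development, remote="origin"):
    """Return the Git commands, in order, for the flow described above.

    Branch and remote names are placeholders chosen for this example.
    """
    return [
        ["git", "fetch", remote],
        ["git", "checkout", feature],
        # Bring the latest development work into the feature branch.
        # Any conflicts show up, and are resolved, on the feature branch.
        ["git", "merge", f"{remote}/{development}"],
        ["git", "checkout", development],
        # Assumption: the local development branch has no private commits,
        # so it can simply be fast-forwarded to the remote state.
        ["git", "merge", "--ff-only", f"{remote}/{development}"],
        # The feature branch already contains development, so this merge
        # should not produce new conflicts.
        ["git", "merge", feature],
    ]


def run(commands, cwd="."):
    """Run the commands in a repository; stops at the first failure,
    for example when a merge reports conflicts to resolve by hand."""
    for argv in commands:
        print("$", " ".join(argv))
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Only print the plan; call run(...) inside a real repository.
    for argv in sync_and_merge_commands("feature/install-guide", "development"):
        print(" ".join(argv))
```

Running the first three commands regularly, not only at merge time, is the "update the feature branch frequently" practice from the first approach: conflicts then arrive in small batches while the writer still remembers the changes.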

Have I missed something? Any advice? I have not been able to think of any non-manual solutions to the "I want to merge some but not all work from my feature branch in the development branch" problem. And a CCMS is not an option.


Kristen James Eberlein
Chair, OASIS DITA Technical Committee
Principal consultant, Eberlein Consulting
+1 919 622-1501; kriseberlein (skype)
