DITA and Git: Minimizing merge conflicts #version-control #change-management
This might be slightly off-topic, but Git is certainly the version control system behind many companies' DITA implementations. Short version: What tactic have you found best to minimize merge conflicts for new technical writers working in Git?
Gruesome detail: I am working with a client who stores their DITA content in GitLab. They are using Sourcetree as a client. Before I was brought on board, there was turnover, and the team was reduced to a manager and two technical writers, all brand-new to DITA, Git, and version control. Infrastructure engineers set up an automated build for them and recommended that they use the following practices:
These feature branches are problematic. Often by the time that the technical writer wants to merge them into the current development branch, there are many merge conflicts. Or the technical writer wants to selectively add in some (but not all) changes that have been made in the feature branch. I'm trying to think of processes this team can use that will minimize merge conflicts. Here are some of the approaches I've considered:
--
Best, Kris Kristen James Eberlein Chair, OASIS DITA Technical Committee Principal consultant, Eberlein Consulting www.eberleinconsulting.com +1 919 622-1501; kriseberlein (skype)
|
|
despopoulos_chriss
We use Git in a team of tech writers. We do a few things...
I'll point out that we do docs as code, in that we deliver our source as part of the product. That means we not only have to merge into the product repository, but we have to cherry-pick into different branches of the product repository. So far so good! Hope this helps!
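Cherry-picking the same documentation commit into several product branches, as Chris describes, is repetitive enough to script. The following is a minimal Python sketch under stated assumptions: it only builds the git commands (a dry run) rather than running them, the remote is assumed to be `origin`, and the commit ID and release-branch names are hypothetical. The `-x` option is a real `git cherry-pick` flag that records the original commit ID in the new commit message, which helps when tracing which fixes reached which release.

```python
import shlex


def cherry_pick_plan(commit: str, branches: list[str], remote: str = "origin") -> list[list[str]]:
    """Build (but do not run) the git commands that copy one commit onto each branch."""
    plan = [["git", "fetch", remote]]
    for branch in branches:
        plan += [
            ["git", "checkout", branch],
            ["git", "pull", remote, branch],       # start from the latest remote state
            ["git", "cherry-pick", "-x", commit],  # -x notes the source commit in the message
            ["git", "push", remote, branch],
        ]
    return plan


if __name__ == "__main__":
    # Hypothetical commit ID and branch names, for illustration only.
    for cmd in cherry_pick_plan("a1b2c3d", ["release/4.2", "release/4.3"]):
        print(shlex.join(cmd))
```

To actually run the plan, each entry could be passed to `subprocess.run(cmd, check=True)` so that the script stops at the first failure; a conflicting cherry-pick has to be resolved by hand before continuing.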
|
|
Hi Kris,
Could you please let us know why the technical writers are supposed to check in on a developer/feature branch? This sounds to me like a very risky situation in the first place.
I am not familiar with Git, but would it be possible to set up a folder or repository for the writers where they can write independently of the developer/feature branches? And maybe the developer branches could reference this folder or repository to avoid these conflicts? I've found something about using Git submodules in such a case: https://stackoverflow.com/questions/36554810/how-to-link-folder-from-a-git-repo-to-another-repo
In Subversion, our tech writers have their own folder, but they work in the same repository as the developers, so we can at least use svn:externals.
Hope this helps somehow,
Christina
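Christina's suggestion of keeping documentation in its own repository and linking it into the product repository corresponds to Git submodules. A hedged Python sketch, assuming a hypothetical docs repository URL and a `docs` folder in the product repository; it only builds the commands. The product repository records one specific docs commit, so updating the docs means moving that recorded pointer and committing it.

```python
import shlex


def add_docs_submodule_plan(docs_url: str, path: str = "docs") -> list[list[str]]:
    """Commands run once in the product repository to link in the docs repository."""
    return [
        ["git", "submodule", "add", docs_url, path],  # writes .gitmodules and pins a docs commit
        ["git", "commit", "-m", f"Add documentation submodule at {path}"],
    ]


def update_docs_submodule_plan(path: str = "docs") -> list[list[str]]:
    """Commands that move the pinned docs commit to the tip of its tracked remote branch."""
    return [
        ["git", "submodule", "update", "--remote", path],
        ["git", "add", path],  # stage the newly pinned commit
        ["git", "commit", "-m", f"Update documentation submodule at {path}"],
    ]


if __name__ == "__main__":
    # Hypothetical URL; substitute the real docs repository.
    for cmd in add_docs_submodule_plan("https://gitlab.example.com/acme/docs.git"):
        print(shlex.join(cmd))
    for cmd in update_docs_submodule_plan():
        print(shlex.join(cmd))
```

Anyone cloning the product repository would then use `git clone --recurse-submodules` so that the `docs` folder is populated.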
|
|
despopoulos_chriss
Christina, even if you have a separate repository for your documentation source, the usual Git workflow divides it into a main branch and a develop branch, and common practice is to use various feature branches as your personal sandboxes. Often (as is the case in our environment) you never see or know about main.
|
|
despopoulos_chriss
One thing I didn't address is merging a SUBSET of your feature branch into develop... Wow... That never even occurred to me. I guess I would handle that by controlling my commits to my local version of the feature branch, and only push out committed changes to the remote version of the feature branch. And then I would merge from the remote version, not the local. Caveat here... Our environment is set up with GitLab to handle merges, and it requires (or maybe just REALLY encourages) that you merge from a remote branch. So that kind of split is possible.
But I don't recommend it, for a number of reasons:
|
|
Alastair Dent
Using Git requires a change in thinking. All authors need to abandon the ideas of 'check out', 'check in', and 'lock'; those don't really apply to Git. New concepts that are absolutely essential to learn are 'remote', 'origin', and 'branch'. When an author clones a repository (pulling a copy of the main repository), they create a local repository, and their work takes place in that repository. Good practice is usually to create a local branch for new work. *Always* update your local repository before starting a new piece of work (fetch and pull the latest updates from origin). DITA files are numerous and small, so it is unusual for two authors to edit the same file at the same time. Frequent updates and pushes are the way to work. This is no different from how programmers work on code.
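Alastair's advice (update before starting new work, then work on a local branch) can be captured as a short, repeatable checklist. A minimal Python sketch that builds the commands, assuming the shared branch is `develop` and the remote is `origin`; the branch name in the usage example is hypothetical. The final push is optional, but publishing the branch early turns the frequent pushes he recommends into a plain `git push`.

```python
import shlex


def start_work_plan(feature: str, base: str = "develop", remote: str = "origin") -> list[list[str]]:
    """Refresh the shared base branch, then start a new local branch from it."""
    return [
        ["git", "checkout", base],
        ["git", "pull", remote, base],           # fetch + merge the latest shared work first
        ["git", "checkout", "-b", feature],      # new local branch for this piece of work
        ["git", "push", "-u", remote, feature],  # publish it and set up tracking for later pushes
    ]


if __name__ == "__main__":
    for cmd in start_work_plan("update-install-topics"):  # hypothetical branch name
        print(shlex.join(cmd))
```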
|
|
Jean-Noël AVILA
On 18/12/2019, Kristen James Eberlein wrote:
We need a bit more information on the workflow here. If changes happen only in a directory dedicated to documentation, where only writers add content, then conflicts mean that the changes introduced over time are touching the same parts of certain files. There are several causes and ways to deal with that:
This is recommended practice.
This can be done by pulling the development branch into the feature branch first, solving conflicts, and then merging the feature branch back into the develop branch.
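Jean-Noël's order of operations (resolve conflicts on the feature branch first, then merge back) splits naturally into two steps with a human in between. The sketch below is hypothetical helper code that builds the commands, assuming `develop` and `origin`; the point of the split is that the second step should only run once the feature branch is conflict-free.

```python
import shlex


def sync_feature_plan(feature: str, base: str = "develop", remote: str = "origin") -> list[list[str]]:
    """Step 1: bring the latest base branch into the feature branch."""
    return [
        ["git", "checkout", feature],
        ["git", "pull", remote, base],  # conflicts, if any, surface here, on the feature branch
    ]


def merge_back_plan(feature: str, base: str = "develop", remote: str = "origin") -> list[list[str]]:
    """Step 2 (after conflicts are resolved and committed): merge the feature into the base."""
    return [
        ["git", "checkout", base],
        ["git", "pull", remote, base],
        ["git", "merge", feature],
        ["git", "push", remote, base],
    ]


if __name__ == "__main__":
    for cmd in sync_feature_plan("feature/new-ui") + merge_back_plan("feature/new-ui"):
        print(shlex.join(cmd))
```

Because the feature branch already contains develop after step 1, the merge in step 2 is normally conflict-free, unless develop has moved again in the meantime.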
|
|
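Thank you, Chris, Alastair, and Jean-Noël for your feedback. I do want to set some additional context here:
* I am familiar with and comfortable with Git, although it is new to the technical writers on this project. (And perhaps frightening to the manager.) I started using Git when the DITA-OT project moved to it from SVN, and I'm still grateful for all the help that Jarno Elovirta, Robert Anderson, and Roger Sheen gave me as I learned Git.
* Yes, good practices for Git are just standard, good software development practices: Keep feature branches discrete; update feature branches by pulling from the current development repository frequently; commit thoughtfully and frequently with good commit messages, so that cherry picking is a possibility.
* For this project, the technical writers are using oXygen, and we can control formatting through common project files to reduce conflicts.
That said, version control with Git can have a steep ramp-up for technical writers. Any thoughts about how best to minimize that? Useful resources? And how best to plan a process for the scenario in which a writer wants to merge some (but not all) content from a feature branch into the development branch? In the absence of granular commits with descriptive commit messages, cherry picking is not a realistic option. So far I've handled this manually: asking the writer for the names of the files that she wanted merged; merging them into a local copy of the development branch; resolving merge conflicts and QA errors (errors in DITA source, spelling); and eventually merging into the development branch.
Best, Kris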
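Kris's manual process (ask for the file names, bring just those files onto a copy of develop, then QA) can be scripted with `git checkout <branch> -- <paths>`, which takes the named files as they exist on another branch. A hedged sketch with hypothetical branch and file names; the `merge-subset` staging branch is an assumption, not part of the described process. Note the trade-off: this copies each file wholesale instead of merging it, so any changes made to the same files on develop since the feature branch diverged are overwritten and need to be caught during review.

```python
import shlex


def take_files_plan(feature: str, paths: list[str], base: str = "develop",
                    remote: str = "origin", staging: str = "merge-subset") -> list[list[str]]:
    """Build commands that copy only the named files from `feature` onto a branch cut from `base`."""
    if not paths:
        raise ValueError("name at least one file to take from the feature branch")
    return [
        ["git", "checkout", base],
        ["git", "pull", remote, base],
        ["git", "checkout", "-b", staging],  # hypothetical review branch for QA
        # Copies each file's feature-branch version into the working tree and index.
        # This is not a three-way merge: develop's version of these files is replaced.
        ["git", "checkout", feature, "--", *paths],
        ["git", "commit", "-m", f"Take {len(paths)} file(s) from {feature}"],
    ]


if __name__ == "__main__":
    # Hypothetical branch and file names.
    for cmd in take_files_plan("feature/new-ui", ["topics/login.dita", "maps/user-guide.ditamap"]):
        print(shlex.join(cmd))
```

If the feature branch exists only on the server, its remote name (for example `origin/feature/new-ui`) would be passed instead. After DITA validation and spell-checking on the staging branch, it can be merged into develop like any other branch.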
|
|
ekimber@contrext.com
For an Oxygen-using client that uses git on the back end I implemented a custom set of actions using the Oxygen git plugin as my code base that automates all the git actions for users, including creating feature branches, checking for updates, letting users know that their local branch is out of date, and so on.
This was not a trivial effort, but it wasn't that hard either. Unfortunately this was a work for hire, so I can't share the code, but it was mostly an exercise in figuring out how to automate the git actions and do the necessary error handling, as well as my personal struggle with how to do UI programming in Java and Oxygen (not something I had done much of to that point).
But looking at Kris' initial question, I think the key is how long feature branches go without being updated to reflect the develop branch. Updating them frequently will avoid most merge conflicts and limit the ones that do occur to, hopefully, clear reasons. As long as authors are not simultaneously working on the same topics or maps, or inadvertently modifying files they shouldn't (for example, by reformatting the markup in a file whose content hasn't otherwise changed), there should not be many merge conflicts. But ensuring this requires communication among the team members.
Cheers, E.
-- Eliot Kimber http://contrext.com
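Eliot's plugin code isn't available, and it was Java built on the Oxygen Git plugin; the following is only a hedged Python sketch of one check he mentions, telling a writer that their branch is out of date. It assumes the shared branch is `develop` on remote `origin`. `git rev-list --count A..B` counts the commits reachable from B but not from A, which here means the develop commits the feature branch is still missing. The command runner is a parameter so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], str]


def git_output(cmd: list[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def commits_behind(branch: str, remote: str = "origin", base: str = "develop",
                   run: Runner = git_output) -> int:
    """Count commits on remote/base that `branch` does not contain yet."""
    out = run(["git", "rev-list", "--count", f"{branch}..{remote}/{base}"])
    return int(out.strip())


def out_of_date_message(branch: str, remote: str = "origin", base: str = "develop",
                        run: Runner = git_output) -> str:
    """Fetch, then describe how far `branch` lags behind remote/base."""
    run(["git", "fetch", remote])  # updates remote-tracking refs only; local branches are untouched
    behind = commits_behind(branch, remote, base, run)
    upstream = f"{remote}/{base}"
    if behind == 0:
        return f"{branch} is up to date with {upstream}."
    return f"{branch} is {behind} commit(s) behind {upstream}; merge or rebase before continuing."
```

Run from inside a clone, `print(out_of_date_message("feature/new-ui"))` would fetch and report (the branch name is hypothetical). A growing count is exactly the drift Eliot suggests keeping small.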
|
|
despopoulos_chriss
Regarding this:
And how to best plan a process for the scenario in which a writer wants to merge some (but not all content) from a feature branch into the development branch? In the absence of granular commits with descriptive commit messages, cherry picking is not a realistic option.
===========
My first instinct is to say, just don't. OK, that's not an answer. My second instinct is to slap your wrist for the lack of granular commits and descriptive messages. I do consider that a viable answer. Have a meeting and instill good habits! (If you're worried about merging too many commits into develop at one go, look up squash commits.)
The practice of creating a local feature branch, and then pushing that feature branch out to origin, should be your starting point. Now you have two branches: one that's local and one that's remote. As you work on the local, when you reach a milestone you can add/commit the affected files, and then push them out to the remote. Note that push only sends out committed changes. So you can have a number of files that you have changed on local, plus a number of other files you have changed AND committed on local. When you push, only the latter set goes out to remote. At any time, you can merge your remote into develop, because it only has finished files. So you can use this as a way to stage your work so that, WITH PROPER PLANNING AND GOOD HABITS, you can incrementally merge your work into develop.
There is a fly in this ointment. In all cases, you MUST REGULARLY (at least once a day, plus just before you plan any merge into develop) check out develop, pull, check out your feature branch, then merge develop... And then push the merged-into feature branch out to the remote feature branch. Only then will the remote version of your feature branch be in sync with ongoing changes in develop. Here's the problem... If you have uncommitted changes in your local feature branch, git will not allow a merge from develop that would overwrite them (for obvious reasons). All is not lost...
Git has a workaround. Before you do the merge, execute git stash. This hides all your uncommitted changes. Then you can merge develop into your feature branch, push, and then execute git stash pop to bring back your hidden changes. This almost covers all your bases. But there's still a fly in the ointment... By default, stash will not hide a new file. If you have created a file in your local branch and Git is not tracking it yet, a plain git stash leaves it in place. So... I believe that in nearly all reasonable cases, this can handle your situation. You need:
I'll say that we work this way all the time. I was first introduced to Git in this project -- we switched from svn. We have had people on the project with varying git experience... Some people with zero experience, and not much command-line confidence either. Neither fire nor poisonous toads have rained from the sky. Keep your eyes open, and think ahead. And yes, you might need to bend some ways of thinking or doing if you want to get the most out of this tool. That's how technology works...
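Chris's routine, sketched in Python under stated assumptions (`develop` on `origin`, hypothetical branch and file names; the functions build commands and do not run them). `publish_finished_plan` commits and pushes only finished files, `daily_sync_plan` is his daily develop-into-feature merge, and `stash_sync_plan` wraps that merge in the stash workaround. Two details are encoded there: the new-file gap concerns untracked files, which plain `git stash` leaves in place but `--include-untracked` (short form `-u`) sets aside; and when there is nothing to stash, Git creates no stash entry, so an unconditional `git stash pop` would apply an older entry or fail. The hypothetical `needs_stash` helper decides this from `git status --porcelain` output.

```python
import shlex


def daily_sync_plan(feature: str, base: str = "develop", remote: str = "origin") -> list[list[str]]:
    """Refresh develop, merge it into the feature branch, and publish the result."""
    return [
        ["git", "checkout", base],
        ["git", "pull", remote, base],
        ["git", "checkout", feature],
        ["git", "merge", base],  # resolve any conflicts here, on the feature branch
        ["git", "push", remote, feature],
    ]


def publish_finished_plan(feature: str, paths: list[str], message: str,
                          remote: str = "origin") -> list[list[str]]:
    """Commit only the named, finished files and push them to the remote feature branch."""
    return [
        ["git", "add", "--", *paths],  # needed for brand-new files; harmless for tracked ones
        # With paths given, commit records only those files, even if other changes are staged.
        ["git", "commit", "-m", message, "--", *paths],
        ["git", "push", remote, feature],
    ]


def needs_stash(porcelain_status: str, include_untracked: bool) -> bool:
    """Decide from `git status --porcelain` output whether `git stash push` would save anything."""
    lines = [line for line in porcelain_status.splitlines() if line.strip()]
    if not include_untracked:
        lines = [line for line in lines if not line.startswith("??")]  # "??" marks untracked files
    return bool(lines)


def stash_sync_plan(feature: str, stash_needed: bool, base: str = "develop",
                    remote: str = "origin", include_untracked: bool = False) -> list[list[str]]:
    """Set aside uncommitted work, run the daily sync, then restore the work."""
    sync = daily_sync_plan(feature, base, remote)
    if not stash_needed:
        return sync
    stash = ["git", "stash", "push"]
    if include_untracked:
        stash.append("--include-untracked")  # also set aside files Git is not tracking yet
    # "stash pop" can conflict if the merge touched the same lines; the entry is then kept.
    return [stash, *sync, ["git", "stash", "pop"]]


if __name__ == "__main__":
    for cmd in publish_finished_plan("feature/new-ui", ["topics/login.dita"], "Finish login topic"):
        print(shlex.join(cmd))
    for cmd in stash_sync_plan("feature/new-ui", stash_needed=True, include_untracked=True):
        print(shlex.join(cmd))
```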
|
|
I really appreciate everyone's feedback and advice; it reminds me what a wonderful community we have here! Chris, thanks especially for your most recent e-mail.
Best, Kris
|
|
despopoulos_chriss
On Thu, Dec 19, 2019 at 05:03 AM, Kristen James Eberlein wrote:
The technical writer works hard on improving the content. But approval for deploying the content falls to product management. When they do review the content, they want to tweak words, and so postpone deploying the content for a cycle. Next cycle, they want to cherry pick what is deployed: "I'm good with this; it can go. No, this needs more work; it should wait for another release."
===========
Yikes... What a PM nightmare. Why is PM always so difficult to work with? What you're trying to do is apply a tool that counts on technical discipline to a workflow that counts on social lack of discipline.
The first thing I would do as a manager is lay out the manual process that is required to support this lack of discipline, and assign a cost to it. Nothing is free. If they want to insist on their workflow, then they need to assume the cost in both budget and schedule. You could expect a combination of them increasing their discipline, and build managers coming up with ways to more easily accommodate their requirements.
I know that doesn't specifically answer your question, but I can't help myself. Using local/remote versions of the feature branch, you could just hold off on commits until you have the OK from PM. That plus stash will work up until you need to add new files.
Another thing that might help could be smaller feature branches. How is it possible, for a given "feature", that some changes are good to go and others are not? Are these all changes about the same thing? In what world is it OK to correct half of a feature and leave the other half incorrect? I would explore whether their concept of a "feature" is the same as yours, and see if you can adjust the chunking of your feature branches.
Not coming up with much more... You need to brainstorm on this. But the key thing I see here is that it is not strictly a technical problem. I think there's a social aspect as well. I'm a naturally obnoxious person, and so I would shine some light on the social problems.
Not clear that you have that option...
|
|
Alastair Dent
I would avoid stashing in this case. Push commits for individual files, and get those merged when approved. The mindset needs to change to think in small units. Many software teams work in a DEV/Master (or DEV/Release) branch model: small changes are made, checked, and pushed to Dev. Using a tool like Gerrit as a gatekeeper between Git and Dev helps here; Gerrit 'holds' a patch in its own branch, allowing review with comments in a UI. Amendments to the commit can be made, pushed to Gerrit, then merged (to Dev) when complete. Get a nightly CI build set up and review the output. When that is acceptable for a release, merge to Master, tag, build from Master, and release.
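Alastair's "small units" approach, one commit per file so that each change can be approved or held back independently, is easy to sketch. A hedged Python example with hypothetical file names and messages; it builds commands only. The push uses Gerrit's documented `refs/for/<branch>` convention, which creates a separate review for each pushed commit; Gerrit also expects a Change-Id line in each commit message, normally added by its commit-msg hook.

```python
import shlex


def per_file_commits_plan(changes: dict[str, str]) -> list[list[str]]:
    """One commit per file, so each can be reviewed and merged (or held back) on its own."""
    plan = []
    for path, message in changes.items():
        plan.append(["git", "add", "--", path])
        plan.append(["git", "commit", "-m", message, "--", path])
    return plan


def gerrit_review_push(target: str = "develop", remote: str = "origin") -> list[str]:
    """Push local commits to Gerrit for review against `target` (Gerrit's refs/for/ convention)."""
    return ["git", "push", remote, f"HEAD:refs/for/{target}"]


if __name__ == "__main__":
    # Hypothetical file names and messages.
    plan = per_file_commits_plan({
        "topics/install.dita": "Clarify install prerequisites",
        "topics/upgrade.dita": "Add upgrade warning",
    })
    for cmd in plan + [gerrit_review_push()]:
        print(shlex.join(cmd))
```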
|
|
You've nicely summed up the fundamental problem: "What you're trying to do is apply a tool that counts on technical discipline to a workflow that counts on social lack of discipline."
More background: The product managers are not software engineers; they are marketing guys. Yes, this is not just a technical problem in search of a solution. Change management is definitely required, and I certainly can clarify for the tech doc manager that "the problem" is not DITA or Git; a big component of "the problem" is the product managers' workflow.
As a consultant, I cannot be obnoxious. I think most people would describe me as outspoken and persistent.
Best, Kris
|
|
Kris,
The long-running feature branches are indeed the challenge in this scenario, but this is a common issue for which Git provides multiple solutions.
As others have suggested, one way to minimize merge conflicts is to regularly merge the development branch back into any previously spawned feature branches. This ensures that feature branches contain the latest upstream code in addition to the feature-specific changes, and prevents surprises later when the feature branch is merged to the development branch, as any conflicts are resolved within the feature branch rather than waiting for the final merge to develop. However, this approach tends to create a rather chaotic revision graph, as relevant changes are interspersed with multiple merge commits, which can make it difficult to focus on the changes in the feature itself.
Where a cleaner history is preferred, feature branches can be rebased onto the development branch, effectively snipping them off from the outdated point where they initially diverged and stitching them back on to the tip of the development branch with the latest changes. Git novices may find rebasing difficult to grasp, but the commit graph helps to visualize the results, and Sourcetree's interactive rebase tool guides users through this process without touching the command line.
Atlassian provides a dedicated tutorial with good explanations of the differences between merging and rebasing: https://www.atlassian.com/git/tutorials/merging-vs-rebasing
Hope that helps,
Roger
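Roger's rebase alternative, sketched in Python under stated assumptions (`develop` on `origin`; hypothetical names). It builds commands only; Sourcetree offers the same operations through its UI. Because rebasing rewrites the branch's commits, a feature branch that has already been pushed must be force-pushed afterward, which is why a rebased branch should have only one writer. The sketch fetches only the base branch, so that `--force-with-lease` still compares the remote feature branch against the writer's last known copy of it and refuses to overwrite someone else's push.

```python
import shlex


def rebase_plan(feature: str, base: str = "develop", remote: str = "origin",
                already_pushed: bool = True) -> list[list[str]]:
    """Replay the feature branch's commits on top of the latest remote base branch."""
    plan = [
        # Fetch only the base branch, leaving the feature's remote-tracking ref as the lease.
        ["git", "fetch", remote, base],
        ["git", "checkout", feature],
        ["git", "rebase", f"{remote}/{base}"],  # stops on conflicts, commit by commit
    ]
    if already_pushed:
        # Rebased commits get new IDs, so the remote branch must be overwritten.
        # --force-with-lease refuses if the remote branch differs from our last fetched copy.
        plan.append(["git", "push", "--force-with-lease", remote, feature])
    return plan


if __name__ == "__main__":
    for cmd in rebase_plan("feature/new-ui"):  # hypothetical branch name
        print(shlex.join(cmd))
```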
|
|
Jean-Noël AVILA
What you are pointing out is basically rewriting the history of feature branches, which goes against the way Git is normally used. In this case, more advanced Git management is needed, with all the required caveats.
So first of all, if we are to rewrite history, it's mandatory not to introduce merge commits. Basically, that means rebasing feature branches. Be careful to make people understand that these rebased feature branches cannot be modified by more than one writer.
Then, when it's time to cherry pick the changes, I would switch to "expert mode":
* Check out the branch and rebase it on top of develop.
* Reset the branch to develop while keeping the working copy. The history now appears to be at develop, but the working copy holds all the changes of the branch, so they are all available for selecting the hunks to add to the next (and only) commit.
* Stage, hunk by hunk, the changes that go in the commit (for later merge into develop), leaving in the working copy the changes that won't make it into develop.
* Commit the changes to be merged, and stash the rest.
* Check out develop, spawn a new branch, unstash, and commit as the "not merged" branch.
* Merge the first branch into develop.
This is quite involved, but that's the minimum needed to rewrite history.
JN
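Jean-Noël's expert-mode steps, written out as a command list in a hedged Python sketch (`develop` as the base; branch names and the commit message are hypothetical). One deliberate addition: the stash uses `--include-untracked`, because after the reset any file that exists only on the feature branch shows up as untracked, and a plain stash would leave it behind. For the same reason, `git add --patch` will not offer those new files; any that should ship must be staged explicitly with `git add <file>` before committing.

```python
import shlex


def split_feature_plan(feature: str, rest_branch: str, base: str = "develop",
                       message: str = "Approved changes") -> list[list[str]]:
    """Commands for the 'expert mode' split: ship some hunks, park the rest on another branch."""
    return [
        ["git", "checkout", feature],
        ["git", "rebase", base],    # 1. replay the feature on top of develop
        ["git", "reset", base],     # 2. mixed reset: branch points at develop, files keep feature content
        ["git", "add", "--patch"],  # 3. interactively stage only the hunks that should ship
        ["git", "commit", "-m", message],                 # 4. commit what ships...
        ["git", "stash", "push", "--include-untracked"],  # ...and park the rest, new files included
        ["git", "checkout", base],  # 5. branch off develop for the leftovers
        ["git", "checkout", "-b", rest_branch],
        ["git", "stash", "pop"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", f"Not merged: remaining changes from {feature}"],
        ["git", "checkout", base],  # 6. merge the approved commit (a fast-forward here)
        ["git", "merge", feature],
    ]


if __name__ == "__main__":
    for cmd in split_feature_plan("feature/new-ui", "feature/new-ui-not-merged"):
        print(shlex.join(cmd))
```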
|
|