Is Git really the best possible version control system imaginable?
Git is a truly great piece of software (thanks, Linus!), but is this really the best possible version control system imaginable?
In this two-part post I will explore some of Git’s limitations and present some ideas on how to improve version control. Refining and improving these ideas ultimately led to the development of Divversion, which improves source code control in fast-paced development environments, while always remaining fully compatible with existing Git repos.
Let’s start by identifying the following issues with the current incarnation of source code control via Git:
1) The branch divergence problem
Let’s say you implement a new feature and you use a separate feature branch to do this, as you have been well-trained to do. Unfortunately, you forgot your normally exemplary Git hygiene and did not constantly update your feature branch with the upstream changes while you wrote your code.
After a few days (or hours), you are ready to merge your changes back into the upstream main (or development) branch. And surprise! You now have 37 merge conflicts to deal with. Fixing these will probably be a strong contender for the most frustrating part of your day.
This is the branch divergence problem: Without constantly incorporating upstream changes, you end up with a highly divergent branch at the end of your feature development, which has to be painfully merged back in. OK, you say, you should have incorporated the upstream changes (say, via rebase) more often. Correct, but that brings us to…
2) The all-or-nothing conflict resolution problem
When you incorporate upstream changes early and often, you also get the merge conflicts early and often. There is nothing you can do about this except solve them in your branch before continuing with your work. This constant solving of conflicts tends to be disruptive and time-consuming.
But Git will not let you continue merging when there are conflicts. So, you either go through this unpleasant task again and again or you give in to the normal human response to an unpleasant task and avoid doing it, leading to the branch divergence problem above.
The issue here is that conflict resolution in Git is always all-or-nothing: You cannot resolve just some conflicts and leave the others for some time. Experience shows that some of these conflicts actually go away by themselves once work progresses further. But Git will not let you continue with a merge while there are unresolved conflicts.
3) The multi-merge problem
The two problems above get much worse when there is a lot of activity in the repo (Agile, I’m looking at you!). Blink and there may now be lots of pull requests to be merged in. Here’s the catch: A stressed-out senior dev accepts the first pull request of the day between meetings, which then changes the main (or dev) branch, so all your painstaking work to resolve conflicts with the (now outdated) previous main branch commit may have to be done all over again.
This kind of friction quickly piles up and more and more time is spent on these “administrative” tasks, taking away valuable time (and energy) from solving coding problems, fixing bugs, testing, etc.
Wouldn’t it be great if Git could merge lots of heads together at the same time? It can, in fact, via the 🐙 octopus merge. Have you ever used it? I bet you haven’t. It’s not very well known because it’s much less capable at resolving conflicts automatically (more on this later), so the octopus merge is more of academic interest at present.
“The problem isn’t the problem. The problem is your attitude about the problem!”
This headline is perhaps a bit unfair — how else could Git handle the above situations? Thinking about the three problems a bit more, a while back I asked myself the following question:
What if a file could exist in a partially merged state, where mergeable sections have been resolved, but conflicts in other sections are just left “undecided”?
It turns out that such “Schrödinger’s files”, together with a merge engine capable of handling them, can solve all the above problems!
How? This is best explained with an example:
Say, at some point in time, a file in a repo contains a few lines only:
Original version:
We
are
the
Borg
Two users, Hugh and Locutus, both edit the file “at the same time” (meaning only that they do not see each other’s changes while they edit) to the following versions:
Hugh’s version:
I
am
Hugh
La Forge
Locutus’s version:
I
am
Locutus of
Borg
Now, perhaps upon reconnecting to the Collective, the versions need to be merged. This is easy for the first two lines since they have been changed by both users to the same words “I” and “am”, respectively. Git does exactly that: If a line is changed to the same thing by both users, then this change is accepted.
The last line was only changed in Hugh’s version (to “La Forge”), but left alone in Locutus’s version. In this case, Git assumes that the change should be accepted. Indeed, Git compares both versions to their common ancestor (the “Original version” above) and then looks at the changes made by either user: Here, Hugh replaced the last line (“Borg”) by “La Forge”, but Locutus kept the last line as-is, so Git reasons that this is a non-conflicting change that should be accepted.
For the second-to-last line, however, we have a merge conflict: Both Hugh and Locutus changed this line (to “Hugh” and “Locutus of”, respectively) and we (= Git) cannot decide which version should be accepted. At this point, Git throws in the towel and reports a conflict, which needs to be manually resolved, potentially stopping all assimilations currently in progress.
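The three rules just described (same change on both sides, change on one side only, different changes on both sides) can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: every edit replaces a line in place, so all three versions have the same number of lines. It is not how Git's merge machinery is implemented, and the names `Conflict`, `merge_line` and `three_way_merge` are made up for this sketch.

```python
from typing import List, NamedTuple, Union


class Conflict(NamedTuple):
    """A line that both sides changed, each to something different."""
    ours: str
    theirs: str


MergedLine = Union[str, Conflict]


def merge_line(base: str, ours: str, theirs: str) -> MergedLine:
    """Apply the three-way rule to a single line."""
    if ours == theirs:
        # Both sides agree (same change, or nobody touched the line).
        return ours
    if theirs == base:
        # Only our side changed the line: accept our change.
        return ours
    if ours == base:
        # Only their side changed the line: accept their change.
        return theirs
    # Both sides changed the line differently: undecidable.
    return Conflict(ours, theirs)


def three_way_merge(base: List[str], ours: List[str],
                    theirs: List[str]) -> List[MergedLine]:
    """Merge line by line; assumes in-place replacements only."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles line-for-line replacements")
    return [merge_line(b, o, t) for b, o, t in zip(base, ours, theirs)]


original = ["We", "are", "the", "Borg"]
hugh = ["I", "am", "Hugh", "La Forge"]
locutus = ["I", "am", "Locutus of", "Borg"]

merged = three_way_merge(original, hugh, locutus)
for line in merged:
    print(line)
```

The first, second and fourth entries of `merged` come out as plain strings, while the third is a `Conflict` holding both candidates.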
If we were to print out the conflict via Git, we would get something like the following:
I
am
<<<<<<< Hugh
Hugh
=======
Locutus of
>>>>>>> Locutus
La Forge
This just expresses that there is a conflict in the second-to-last line and Hugh’s version is “Hugh” whereas Locutus’s version is “Locutus of” (see here for more on this syntax). Note that the ancestor content (“the”) is not shown (if you want to see it, you need to set merge.conflictStyle).
But is this really all that can be said?
Going back to the analogy with a box containing Schrödinger’s cat, which is both dead and alive simultaneously (in “superposition”) before the box has been opened, we could also say that our file exists in two partially merged states at the same time:
Hugh’s partially merged version:
I
am
Hugh
La Forge
Locutus’s partially merged version:
I
am
Locutus of
La Forge
The point here is that what could be merged has been merged (the first, second and last lines). The conflict of course remains — what else could one do? — but we just keep Hugh’s third line in Hugh’s “view” of the file and Locutus’s third line in Locutus’s “view” of the file.
Crucially, at this point, neither Hugh nor Locutus needs to resolve the conflict! Both Hugh and Locutus can go about their business and only pick one choice when they are finally ready to resolve their conflict, maybe when their feature branches are merged back into the main branch.
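To make the two "views" concrete, here is a minimal Python sketch that merges what can be merged and leaves each user's own line in place wherever the two edits collide. It assumes, as in the example, that edits only replace lines in place (all versions have the same length). The function name `partial_merge_views` is hypothetical and this is not a description of how any real tool stores such a state.

```python
from typing import List, Tuple


def partial_merge_views(base: List[str], ours: List[str],
                        theirs: List[str]) -> Tuple[List[str], List[str], List[int]]:
    """Return (our view, their view, indices of undecided lines)."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles line-for-line replacements")

    our_view: List[str] = []
    their_view: List[str] = []
    undecided: List[int] = []

    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            # Same on both sides, or only we changed it: both views take ours.
            our_view.append(o)
            their_view.append(o)
        elif o == b:
            # Only they changed it: both views take theirs.
            our_view.append(t)
            their_view.append(t)
        else:
            # Real conflict: each view keeps its own line for now.
            our_view.append(o)
            their_view.append(t)
            undecided.append(index)

    return our_view, their_view, undecided


original = ["We", "are", "the", "Borg"]
hugh = ["I", "am", "Hugh", "La Forge"]
locutus = ["I", "am", "Locutus of", "Borg"]

hugh_view, locutus_view, undecided = partial_merge_views(original, hugh, locutus)
print(hugh_view)
print(locutus_view)
print(undecided)
```

The two views differ only at the indices listed in `undecided`, which is exactly the information needed to come back and settle the conflict later.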
Can Git do this?
If the above “merge what can be merged and leave the rest alone” strategy can be implemented, this would solve the all-or-nothing conflict resolution problem: We can now safely postpone the resolution of conflicts that we do not want to (or are not able to) resolve at a given time. The file remains in a “Schrödinger state” (or, using more precise language, a superposition of states) in the lines that could not automatically be resolved.
Then, we can also fix the first problem: Just keep incorporating all upstream changes continuously — automatically of course — and if conflicts occur keep our own changes.
Can Git do this? Kind of! You just need to use the -Xours option with the merge command as
git merge -Xours main
This will merge main into the current branch, but automatically resolve all conflicts with our version.
However, very few people seem to be doing this, or even know about -Xours. I can only speculate as to the reasons for this, but it is probably because the feature did not always exist in Git and is a bit confusingly documented: There is also an ours "merge strategy", which is different from the -Xours option (to the standard ort merge strategy). See here if you want to know more about this. In any case, the end result seems to be that only the local Git guru has ever really used -Xours, so this does not help. Remember, you should be doing this while working on your feature branch!
So, let’s assume for a moment that you are the local Git guru. You follow this strategy and merge in the upstream changes via git merge -Xours main. Excellent, the conflict is resolved, at least locally in your branch. You fixed it - by selfishly choosing your own version. Is this really what we want, though? It would be good to keep at least a record of the fact that this was a conflict in the first place. This information is now lost, so if you all of a sudden change your mind and decide to resolve the conflict by keeping the original version, you cannot actually do this anymore, since there is no longer a (documented) conflict!
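What "keeping a record" could look like is sketched below in Python: conflicts are still resolved in favor of our side, but each one is logged so the decision can be revisited. This is purely illustrative and relies on the same simplification as the Borg example (in-place line replacements only). `ConflictRecord`, `merge_favoring_ours` and `take_theirs` are hypothetical names, and nothing here reflects what Git does internally.

```python
from typing import List, NamedTuple, Tuple


class ConflictRecord(NamedTuple):
    """Remembers a conflict that was auto-resolved in favor of our side."""
    line: int
    base: str
    ours: str
    theirs: str


def merge_favoring_ours(base: List[str], ours: List[str],
                        theirs: List[str]) -> Tuple[List[str], List[ConflictRecord]]:
    """Resolve every conflict with our line, but log what was overridden."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles line-for-line replacements")

    merged: List[str] = []
    records: List[ConflictRecord] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)   # no conflict: our line is the right answer
        elif o == b:
            merged.append(t)   # only they changed it: take their change
        else:
            merged.append(o)   # conflict: pick ours, remember the rest
            records.append(ConflictRecord(index, b, o, t))
    return merged, records


def take_theirs(merged: List[str], record: ConflictRecord) -> List[str]:
    """Change of mind: switch one recorded conflict to the other side."""
    revised = list(merged)
    revised[record.line] = record.theirs
    return revised


original = ["We", "are", "the", "Borg"]
hugh = ["I", "am", "Hugh", "La Forge"]
locutus = ["I", "am", "Locutus of", "Borg"]

merged, records = merge_favoring_ours(original, hugh, locutus)
revised = take_theirs(merged, records[0])
print(merged)
print(records)
print(revised)
```

Because the record also stores the ancestor line, going back to the common ancestor's content would be just as easy as switching sides.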
There is also another problem: The above strategy can work reasonably well for incorporating upstream changes (in main) into feature branches. It is really tedious for exchanging changes between two or more feature branches. Maybe two developers are working on different aspects of a feature and want to stay in sync. In this case, there would have to be constant pushing and fetching to keep things in sync, which is not very practical.
More variations on this theme exist, but in the interest of staying awake, let’s move on.
Double, double toil and trouble
Let’s recall the final problem mentioned above, namely the multi-merge problem. It centered on the fact that everything gets worse when trying to tie more than one branch together at the same time. Why would we want to do this?
The most important use-case for many branches being continuously merged together that I can think of is collaborative coding. I do not mean live code collaboration, where you can see each other’s editing as it is typed. This is great for editing a file together while chatting about the code on a call. Otherwise, I usually want my colleagues to keep their mitts out of my code as I edit, thank you very much. On the other mitt, however, it is not uncommon that indeed several people work on the same file at the same time. This in fact happens all the time when refactoring, or even when using several machines (home/work/etc.).
Currently, this is almost a complete no-go with Git’s inbuilt tooling. There is the 🐙 octopus merge mentioned above, but it is very restrictive: If you merge three branch heads and two heads make the same change (not one and not all), it’s reported as a conflict, even though it is pretty reasonable to accept that change. Also, there is no -Xours option for the octopus merge, so what we discussed before cannot be done at the same time.
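The more lenient rule argued for here (accept a change as long as every head that touched the line agrees on it) can be sketched in Python for a single line. This shows the desired behavior only and is not a description of Git's octopus implementation. The function name `merge_heads_line` and the sample head contents are made up for illustration.

```python
from typing import List, Tuple, Union


def merge_heads_line(base: str, heads: List[str]) -> Union[str, Tuple[str, ...]]:
    """Merge one line across any number of heads.

    Returns the merged line, or a tuple of competing candidates
    when the heads that changed the line disagree.
    """
    # Distinct values among the heads that actually changed the line.
    candidates = sorted({head for head in heads if head != base})
    if not candidates:
        return base              # nobody changed the line
    if len(candidates) == 1:
        return candidates[0]     # all changers agree: accept the change
    return tuple(candidates)     # changers disagree: real conflict


# Two of three heads make the same change, the third leaves the line alone.
two_agree = merge_heads_line("the", ["Hugh", "Hugh", "the"])

# Two heads change the line differently.
disagree = merge_heads_line("the", ["Hugh", "Locutus of", "the"])

# No head touches the line.
untouched = merge_heads_line("the", ["the", "the", "the"])

print(two_agree)
print(disagree)
print(untouched)
```

Under this rule the "two out of three" case is accepted without fuss, and only genuinely competing edits are flagged.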
By the way, if you want to get your hands dirty, then you can clone the repo https://github.com/divversion/octopus-demo.git and play around with the octopus merge (in addition to the git clone ... you may want to do a git pull --all --tags to make sure to get all branches and tags). The tag expected-octopus-abc points to the commit that contains the octopus merge result of the branches alice, bobby, and clara into main. The tag expected-octopus-ours-abcd points to the commit that contains the octopus merge result of the branches alice, bobby, clara, and david into main, using a fictional -Xours merge option from main's perspective.
Where is this going, if anywhere?
I hope to have convinced you by now that an improvement to Git that allows for Schrödinger-like partially merged file states, that automates upstream change incorporation, and that can handle multi-branch merges, would be a good thing.
Indeed, the above ideas were the starting point for the development of Divversion. It offers collaborative coding with continuous syncing, “following” branches, reconciliation between branches, and a friendly file history functionality.
If you want to give it a spin, then head over to www.divversion.com. You can also find docs and tutorials at docs.divversion.io.
If you want to know more and learn about how to actually make these ideas work, then stay tuned for Part II of this post.
Thanks for reading!