git subtrees: a tutorial
Scenarios on git subtrees
This is a tutorial on how to use git subtrees. In this example, we will work on “parent”, the repository that consumes a library called “my-subproject”. To understand why you would use git subtrees, read my article Modularizing Medium’s iOS codebase.
There is a repo called “parent” and we are about to import a library called “my-subproject”. Here’s how they look:
[Screenshots: parent, my-subproject, and my-subproject’s commits]
Note that I manually prefixed the commit titles with [parent] and [vinibaggio/my-subproject], just to make it clearer where each commit comes from.
Now I want to add my-subproject to parent using git subtrees. To do that, I use git subtree add:
git remote add my-subtree git@github.com:vinibaggio/my-subproject.git
git subtree add --prefix=vendor my-subtree master
Notice that I added the library’s repository as a remote, just as I would my own. This simplifies the commands, since I don’t have to type the repository’s address every time. Then git subtree add copies that repository’s code into the path inside parent given by --prefix. The last argument, master, is the branch you are pulling code from (my-subtree/master).
Adding the remote and then the subtree looks like this:
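To try the add step without touching GitHub, here is a minimal, self-contained sketch. It assumes git with the bundled git subtree command is installed, and it replaces git@github.com:vinibaggio/my-subproject.git with a throwaway local repository. The file names (lib.txt, app.txt) and commit messages are hypothetical. The last two commands are the ones above.

```shell
#!/bin/sh
# Sketch: reproduce "git subtree add" with local repositories.
# Assumptions: git ships the subtree command; files and messages are hypothetical.
set -eu

# Throwaway identity so commits work on a clean machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"

# Local stand-in for git@github.com:vinibaggio/my-subproject.git
git init -q my-subproject
cd my-subproject
git symbolic-ref HEAD refs/heads/master
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Initial commit"
cd ..

# The consuming repository.
git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/master
echo "app v1" > app.txt
git add app.txt
git commit -q -m "[parent] Initial commit"

# The two commands from the article, pointed at the local stand-in.
git remote add my-subtree "$WORK/my-subproject"
git subtree add --prefix=vendor my-subtree master
```

Afterwards, parent’s history contains a single merge commit that brings the library in under vendor/, and my-subproject itself still has only its original commit.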
After pushing to master, here’s how parent looks:
No changes were done to my-subproject.
Changing my-subproject from parent
Now I need to get some work done, so, working directly from parent, I add new files to my-subproject and commit them to my project:
After pushing to parent, here’s how both parent and my-subproject look:
[Screenshots: parent, and my-subproject, which has no changes]
You can keep working on the parent project as much as you need to. For instance, I am going to create a commit that has changes in both parent and my-subproject:
Now we want to be good citizens and contribute our library changes back to the original repository. This doesn’t need to happen often, but when it does, you use git subtree push. Let’s push our changes back to my-subproject:
See those -n 1…5 lines? This is git going through the commits and picking the changes that should go to the library’s repository. At the end, two commits need to be pushed to the vinny-some-awesome-work branch of the library’s repository. But what happens to the other files, outside the prefix directory? They get filtered out.
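The push can be sketched locally as well. Assuming the same local stand-in repositories and hypothetical files as before (plus a hypothetical helper.txt), the script below makes one library-only commit and one commit touching both trees from inside parent, then runs git subtree push to a new vinny-some-awesome-work branch. Inspecting that branch afterwards shows the filtering described above: the contents of vendor/ sit at the library’s root, and app.txt, which lives only in parent, is absent.

```shell
#!/bin/sh
# Sketch: "git subtree push" filters parent's history down to vendor/.
# Assumptions: git ships the subtree command; files and messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"

# Local stand-in for the library (non-bare is fine: we push to a branch it hasn't checked out).
git init -q my-subproject
cd my-subproject
git symbolic-ref HEAD refs/heads/master
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Initial commit"
cd ..

git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/master
echo "app v1" > app.txt
git add app.txt
git commit -q -m "[parent] Initial commit"
git remote add my-subtree "$WORK/my-subproject"
git subtree add --prefix=vendor my-subtree master

# Work from parent: one library-only commit, then one touching both trees.
echo "helper v1" > vendor/helper.txt
git add vendor/helper.txt
git commit -q -m "[vinibaggio/my-subproject] Add helper"
echo "lib v2" > vendor/lib.txt
echo "app v2" > app.txt
git commit -q -a -m "[parent] Change parent and my-subproject"

# Split vendor/ out of parent's history and push it as a new branch of the library.
git subtree push --prefix=vendor my-subtree vinny-some-awesome-work
```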
Let’s open a pull request in my-subproject from this branch and observe the changes:
Let’s see the changed files:
Everything looks as expected, so I merge the pull request.
Bringing updates from my-subproject back to parent
Imagine that someone made awesome contributions to my-subproject and you want to pull these new changes back into parent:
Now I’m on my feature branch, where I created these two commits:
f455188 Implementing feature
a75255c Introducing bug
Let’s import our new changes with git subtree pull:
git subtree pull --prefix=vendor my-subtree master
This runs a pull using the “subtree” merge strategy, which tells git to inspect the incoming patches and recognize that they should be applied inside a subtree of our current project (parent). Quoting the Git handbook: “it’s pretty amazing”.
This will generate a merge commit, like so (note that I customized the merge commit message):
This is the end result of merging this branch into master.
Notice the commits near the highlighted area: they are repeated. This is git doing its magic when pulling with the subtree strategy: it brings in all the commits from the other repository, and they sit alongside your own repository’s versions of the same changes.
For this reason, it’s good practice to use the --squash option when pulling library changes into parent. I reverted the changes, so parent#master looks like this:
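The repetition can be reproduced with a local sketch of a pull without --squash. Repositories, files and commit messages are hypothetical stand-ins, as before. Two commits land in my-subproject directly; git subtree pull then merges them into vendor/ with a custom merge message (the -m option), and parent’s log ends up containing the upstream commits themselves.

```shell
#!/bin/sh
# Sketch: "git subtree pull" without --squash replays upstream commits into parent.
# Assumptions: git ships the subtree command; files and messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"

git init -q my-subproject
cd my-subproject
git symbolic-ref HEAD refs/heads/master
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Initial commit"
cd ..

git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/master
echo "app v1" > app.txt
git add app.txt
git commit -q -m "[parent] Initial commit"
git remote add my-subtree "$WORK/my-subproject"
git subtree add --prefix=vendor my-subtree master

# Meanwhile, someone contributes to my-subproject directly.
cd "$WORK/my-subproject"
echo "lib v2" > lib.txt
git commit -q -a -m "[vinibaggio/my-subproject] Upstream feature"
echo "docs" > README.txt
git add README.txt
git commit -q -m "[vinibaggio/my-subproject] Upstream docs"

# Back in parent: merge them into vendor/ with a custom merge message.
cd "$WORK/parent"
git subtree pull --prefix=vendor -m "Merge my-subproject updates into vendor" my-subtree master
```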
I then recreated the branch “vinny-feature” with the same commits, but now using the squash flag:
git subtree pull --prefix=vendor --squash my-subtree master
After merging, this is how master looks:
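Here is the same scenario with --squash, as a sketch under the same assumptions. The upstream commits are collapsed into a single commit whose title starts with “Squashed 'vendor/'”, and that commit is then merged. As a result, the individual upstream commit titles no longer show up in parent’s log.

```shell
#!/bin/sh
# Sketch: "git subtree pull --squash" collapses upstream commits into one.
# Assumptions: git ships the subtree command; files and messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Keep git merge from opening an editor for the default merge message.
export GIT_MERGE_AUTOEDIT=no

WORK=$(mktemp -d)
cd "$WORK"

git init -q my-subproject
cd my-subproject
git symbolic-ref HEAD refs/heads/master
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Initial commit"
cd ..

git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/master
echo "app v1" > app.txt
git add app.txt
git commit -q -m "[parent] Initial commit"
git remote add my-subtree "$WORK/my-subproject"
git subtree add --prefix=vendor my-subtree master

# Upstream work in my-subproject.
cd "$WORK/my-subproject"
echo "lib v2" > lib.txt
git commit -q -a -m "[vinibaggio/my-subproject] Upstream feature"
echo "docs" > README.txt
git add README.txt
git commit -q -m "[vinibaggio/my-subproject] Upstream docs"

# Pull into parent, squashing the upstream commits into one.
cd "$WORK/parent"
git subtree pull --prefix=vendor --squash my-subtree master
```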
Long story short: when you pull changes, squash them. Another good practice to avoid this repetition is to keep changes to a subproject in their own commits. This isn’t mandatory, but it avoids repeated commit titles, and the commits will make sense on their own when pushed back to the original repository.
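Splitting mixed work into per-repository commits needs nothing subtree-specific: staging by path is enough. In this sketch the vendor/ directory is created by hand rather than with git subtree add, so it runs with plain git. File names and commit titles are hypothetical.

```shell
#!/bin/sh
# Sketch: one piece of work, committed as a library commit plus a parent commit.
# Assumption: vendor/ stands in for a directory added with git subtree add.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"
git init -q parent
cd parent
mkdir vendor
echo "lib v1" > vendor/lib.txt
echo "app v1" > app.txt
git add .
git commit -q -m "[parent] Initial commit"

# A single change that touches both the library and parent.
echo "lib v2" > vendor/lib.txt
echo "app v2" > app.txt

# Commit the library half on its own, so it reads well when pushed upstream...
git add vendor
git commit -q -m "[vinibaggio/my-subproject] Update lib"
# ...then commit the parent half separately.
git add app.txt
git commit -q -m "[parent] Adopt new lib"
```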
Advantages of using subtrees
The advantages of subtrees over submodules are mostly about workflow. Since subtrees require no changes to the project’s workflow and no new commands for everyday work, they make developers’ lives much, much easier. Merge conflicts make (more) sense and are much easier to deal with.
Caveats
Contributing back and forth between repos is definitely more complicated. To simplify potential merge conflicts, pull changes from the libraries in separate pull requests. Also, rebasing after subtree pulls doesn’t work (on rebases, git loses track of the --prefix, so you end up with a big mess in your project’s root).
Finally, it takes a bit of time to understand how the commands are structured and why all these options exist, but it is well worth it, and it removes friction for every developer on your team.