git subtrees: a tutorial

Scenarios on git subtrees

This is a tutorial on how to use git subtrees. In this example, we will work on “parent”, the repository that consumes a library called “my-subproject”. To understand why you would use git subtrees, read my article Modularizing Medium’s iOS codebase.


There is a repo called “parent” and we are about to import a library called “my-subproject”. Here’s how they look:

parent:

Parent’s commits
Parent’s files

my-subproject:

my-subproject’s commits

my-subproject’s file

Note that I manually prefixed the commit titles with [parent] and [vinibaggio/my-subproject], just to make it clear where each commit comes from.

Now, I want to add my-subproject to parent, using git subtrees. To do that, I have to use a command called “git subtree add”:

git remote add my-subtree git@github.com:vinibaggio/my-subproject.git
git subtree add --prefix=vendor/ my-subtree master

Notice that I added the library’s remote as if it was my own. This will significantly simplify the commands, so that I don’t have to specify the repository’s address all the time. Then, you use subtree add to add that repo’s code into a path in the parent’s project, specified by prefix. The last parameter, master, is the branch you are pulling code from (my-subtree/master).
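To try these two commands without touching GitHub, here is a minimal sketch that builds both repositories locally in a temporary directory. The file names (lib.txt, app.txt), the demo identity, and the local paths are stand-ins I made up for illustration; only the remote name, the prefix, and the branch come from the commands above.

```shell
#!/bin/sh
set -e

# Throwaway identity so commits work even if git is not configured here.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)

# Stand-in for the library repository (a local path instead of GitHub).
git -c init.defaultBranch=master init -q "$DEMO/my-subproject"
cd "$DEMO/my-subproject"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Add lib.txt"

# Stand-in for the parent repository; it needs at least one commit.
git -c init.defaultBranch=master init -q "$DEMO/parent"
cd "$DEMO/parent"
echo "parent code" > app.txt
git add app.txt
git commit -q -m "[parent] Add app.txt"

# The same two commands as above, pointed at the local stand-in.
git remote add my-subtree "$DEMO/my-subproject"
git subtree add --prefix=vendor/ my-subtree master

# The library's files now live under vendor/ and its commits are in the log.
ls vendor
git log --oneline
```

Because the remote is just a name, swapping the local path for the real repository address is the only change needed to go from this sketch to the real setup.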


Adding the remote and then adding the subtree looks like this:

After pushing to master, here’s how parent looks:

Parent, after adding the subtree
parent/vendor
Note how the commits are interwoven. Also, “Initial commit” is the commit generated by GitHub.

No changes were done to my-subproject.

Changing my-subproject from parent

Now I need to get some work done, so I add new files to my-subproject and add them to my project, working directly from parent:

After pushing to parent, here’s how both parent and my-subproject look:

parent:

Parent with the new commit

my-subproject has no changes:

You can keep working on the parent project as much as you need to. For instance, I am going to create a commit that has changes both in parent and my-subproject:

Now we want to be good people and contribute our changes to the library back to the original repository, which doesn’t need to happen too often. But when it does, you have to use git subtree push. Let’s push our changes back to my-subproject:

See those -n 1…5 lines? This is git going through the commits and picking the changes that should go to the library’s repo. At the end, there are two commits that need to be pushed to the vinny-some-awesome-work branch, both rewritten relative to the library directory. But what happens to the other files, outside of the prefix directory? They get filtered out.
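The push and its filtering can be reproduced locally. This is a sketch under assumptions: the repositories are throwaway local stand-ins, and the file names, commit messages, and demo identity are invented for illustration. The remote name, the vendor prefix, and the vinny-some-awesome-work branch are the ones used in this tutorial.

```shell
#!/bin/sh
set -e

# Throwaway identity so commits work even if git is not configured here.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)

# Local stand-in for the library repository.
git -c init.defaultBranch=master init -q "$DEMO/my-subproject"
cd "$DEMO/my-subproject"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Add lib.txt"

# Local stand-in for parent, with the library added under vendor/.
git -c init.defaultBranch=master init -q "$DEMO/parent"
cd "$DEMO/parent"
echo "parent code" > app.txt
git add app.txt
git commit -q -m "[parent] Add app.txt"
git remote add my-subtree "$DEMO/my-subproject"
git subtree add --prefix=vendor/ my-subtree master

# One commit that touches only the library, one that touches both sides.
echo "new helper" > vendor/helper.txt
git add vendor/helper.txt
git commit -q -m "[parent] Add helper to my-subproject"
echo "more parent code" >> app.txt
echo "more library code" >> vendor/lib.txt
git add app.txt vendor/lib.txt
git commit -q -m "[parent] Change parent and my-subproject"

# Push only the vendor/ history to a new branch on the library remote.
git subtree push --prefix=vendor my-subtree vinny-some-awesome-work

# Files on the pushed branch: app.txt is absent and paths carry no vendor/ prefix.
git -C "$DEMO/my-subproject" ls-tree -r --name-only vinny-some-awesome-work
```

The second commit changed app.txt as well, but only its vendor/ half reaches the library branch, which is why mixed commits are safe to push this way.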

Let’s open a pull request in my-subproject from this branch and observe the changes:

The PR. There are no actual changes since all the files are blank

Let’s see the changed files:

Everything looks as expected, so I merge the pull request.

Bringing updates from my-subproject back to parent

Imagine that someone made awesome contributions to my-subproject and you want to pull these new changes back into parent:

Pushing new changes to my-subproject directly

Now, I am in my feature branch and created these two commits:

f455188 Implementing feature
a75255c Introducing bug

Let’s import our new changes with git subtree pull:

git subtree pull --prefix=vendor my-subtree master

This will execute a pull using the “subtree” merge strategy. The strategy is necessary so that git inspects the incoming changes and recognizes that they should be applied in a subtree of our current project (parent) rather than at its root. Quoting the Git handbook: “it’s pretty amazing”.
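Here is a small local sketch of that pull, assuming throwaway repositories in a temporary directory. The file names and the demo identity are hypothetical, and the -m flag is one way to customize the merge commit message. The new file is committed at the root of the library but lands under vendor/ in parent.

```shell
#!/bin/sh
set -e

# Throwaway identity so commits work even if git is not configured here.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)

# Local stand-in for the library repository.
git -c init.defaultBranch=master init -q "$DEMO/my-subproject"
cd "$DEMO/my-subproject"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Add lib.txt"

# Local stand-in for parent, with the library added under vendor/.
git -c init.defaultBranch=master init -q "$DEMO/parent"
cd "$DEMO/parent"
echo "parent code" > app.txt
git add app.txt
git commit -q -m "[parent] Add app.txt"
git remote add my-subtree "$DEMO/my-subproject"
git subtree add --prefix=vendor/ my-subtree master

# Someone commits to the library directly, at the root of its own repo.
cd "$DEMO/my-subproject"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "[vinibaggio/my-subproject] Implementing feature"

# Back in parent, merge the library's master into vendor/.
cd "$DEMO/parent"
git subtree pull --prefix=vendor \
    -m "Pulling changes from my-subproject into parent" \
    my-subtree master

# feature.txt shows up under vendor/, not at the root of parent.
ls vendor
```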

This will generate a merge commit, like so (note that I customized the merge commit message):

pulling changes from my-subproject into parent

This is the end result of merging this branch into master.

State of master

Notice the commits near the highlighted area. See that the commits are repeated? This is git doing its magic on the pull with the subtree strategy. It brings in all the commits from the other repo, and they sit alongside your own repo’s versions of them.

For this reason, it’s good practice to use the “squash” option when merging changes back. I reverted the changes so parent#master looks like this:

Master reverted

I then recreated the branch “vinny-feature” with the same commits, but now using the squash flag:

git subtree pull --prefix=vendor --squash my-subtree master

After merging, this is how master looks:

State of master after merging

Long story short, if you pull changes, make sure you merge the commits with squash. Another good practice to avoid this repetition is to make changes to the subprojects in separate commits that touch only the subproject. This is not obligatory, but it makes sure commit names are not repeated, and when those commits are pushed back to the original repository, they make sense on their own.
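To see what squash does to the history, here is a local sketch under the same assumptions as before: throwaway repositories, a made-up demo identity, and hypothetical file names. The two library commits reuse the titles from the feature branch above. After the squashed pull, neither title appears as a commit of its own in parent.

```shell
#!/bin/sh
set -e

# Throwaway identity so commits work even if git is not configured here.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)

# Local stand-in for the library repository.
git -c init.defaultBranch=master init -q "$DEMO/my-subproject"
cd "$DEMO/my-subproject"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "[vinibaggio/my-subproject] Add lib.txt"

# Local stand-in for parent, with the library added under vendor/.
git -c init.defaultBranch=master init -q "$DEMO/parent"
cd "$DEMO/parent"
echo "parent code" > app.txt
git add app.txt
git commit -q -m "[parent] Add app.txt"
git remote add my-subtree "$DEMO/my-subproject"
git subtree add --prefix=vendor/ my-subtree master

# Two new commits made directly in the library.
cd "$DEMO/my-subproject"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Implementing feature"
echo "oops" >> feature.txt
git commit -q -a -m "Introducing bug"

# Pull them into parent as one squashed commit plus a merge commit.
cd "$DEMO/parent"
git subtree pull --prefix=vendor --squash \
    -m "Pulling changes from my-subproject into parent" \
    my-subtree master

# Commit titles in parent: the two library commits are not listed individually.
git log --format=%s
```

The individual library commits are still summarized in the body of the squashed commit, so the information is not lost, it just stops cluttering the log of parent.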

Advantages of using subtrees

The advantages of subtrees over submodules are mostly related to workflow. Subtrees require no change to the project’s day-to-day workflow and no new commands for developers who only work on parent, which makes their lives much, much easier. Merge conflicts will make (more) sense and they’re much easier to work with.

Caveats

Contributing back and forth between repos is definitely more complicated. To simplify potential merge conflicts, pulling changes from the libraries should be done in separate pull requests. Also, rebasing after a subtree pull doesn’t work (on rebases, git loses track of the --prefix, so you will have a big mess in your project’s root).

Finally, it takes a bit of time to understand how the commands are formatted and why there are all these options, but it is very valuable, and it removes friction for all the developers on your team.