git subtrees: a tutorial

This is a tutorial on how to use git subtrees. In this example, we will work on “parent”, the repository that consumes a library called “my-subproject”. To understand why you would use git subtrees, read my article Modularizing Medium’s iOS codebase.

Before we start, here’s how the two repositories look:

parent:

[Image: Parent’s commits]
[Image: Parent’s files]

my-subproject:

[Image: my-subproject’s commits]
[Image: my-subproject’s file]

Note that I manually prefixed the commit titles with [parent] and [vinibaggio/my-subproject] to make it easier to see where each commit comes from.

Now I want to add my-subproject to parent using git subtrees. To do that, I use a command called “git subtree add”.

Notice that I added the library’s remote as if it were my own. This significantly simplifies the commands, since I don’t have to spell out the repository’s address every time. Then you use subtree add to copy that repository’s code into a path in the parent project, specified by the --prefix option. The last parameter, master, is the branch you are pulling code from (the remote’s master branch).
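
The original command listing isn’t preserved in this copy, so here is a minimal, self-contained sketch of the same two steps. It assumes a local repository standing in for vinibaggio/my-subproject on GitHub, a remote named my-subproject, and the prefix vendor/my-subproject (the screenshots only show a vendor directory, so the exact path is a guess).

```shell
set -eu
# Throwaway identity so commits work in a clean environment (assumption)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# Local stand-in for github.com/vinibaggio/my-subproject (assumption)
git -c init.defaultBranch=master init -q my-subproject
echo "library" > my-subproject/README.md
git -C my-subproject add README.md
git -C my-subproject commit -q -m "[vinibaggio/my-subproject] Initial commit"

# The consuming repository
git -c init.defaultBranch=master init -q parent
cd parent
echo "app" > README.md
git add README.md
git commit -q -m "[parent] Initial commit"

# Register the library as a remote, as if it were our own
git remote add my-subproject "$WORK/my-subproject"
# Copy my-subproject's master into vendor/my-subproject (hypothetical prefix),
# keeping the library's history
git subtree add --prefix=vendor/my-subproject my-subproject master
```

Without --squash, git subtree add records a merge commit whose second parent is the library’s own history, which is why the two repositories’ commits end up interwoven in parent’s log.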

Adding the remote and then the subtree looks like this:

[Image: adding the remote and the subtree]

After pushing to master, here’s how parent looks:

[Image: Parent, after adding the subtree]
[Image: parent/vendor]
[Image: parent’s commit history. Note how the commits are interwoven. Also, “Initial commit” is the commit generated by GitHub.]

No changes were made to my-subproject.

Changing my-subproject from parent

Now I need to get some work done, so I add new files to my-subproject and commit them, working directly from parent:

[Image: committing new files in my-subproject from parent]

After pushing to parent, here’s how both parent and my-subproject look:

parent:

[Image: Parent with the new commit]

my-subproject has no changes:

[Image: my-subproject, unchanged]

You can keep working on the parent project as much as you need to. For instance, I am going to create a commit that has changes in both parent and my-subproject:

[Image: a commit with changes in both parent and my-subproject]
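
A sketch of the two kinds of commits made from parent in this section: one that only touches files under the prefix, and one that touches both parent’s own files and the subtree. It repeats the assumed setup from the previous sketch (local stand-in repository, remote my-subproject, prefix vendor/my-subproject); the file names feature.txt and app.txt are hypothetical.

```shell
set -eu
# Assumed setup: local stand-in library, remote "my-subproject",
# hypothetical prefix vendor/my-subproject
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"
git -c init.defaultBranch=master init -q my-subproject
echo "library" > my-subproject/README.md
git -C my-subproject add README.md
git -C my-subproject commit -q -m "[vinibaggio/my-subproject] Initial commit"
git -c init.defaultBranch=master init -q parent
cd parent
echo "app" > README.md
git add README.md
git commit -q -m "[parent] Initial commit"
git remote add my-subproject "$WORK/my-subproject"
git subtree add --prefix=vendor/my-subproject my-subproject master

# Commit 1: a new file inside the subtree, committed like any other file
echo "feature" > vendor/my-subproject/feature.txt
git add vendor/my-subproject/feature.txt
git commit -q -m "[vinibaggio/my-subproject] Add feature.txt"

# Commit 2: a single commit touching both parent and the subtree
echo "uses feature" > app.txt
echo "more" >> vendor/my-subproject/feature.txt
git add app.txt vendor/my-subproject/feature.txt
git commit -q -m "[parent] Use feature.txt"

# The library repository is untouched: it still has only its initial commit
git -C "$WORK/my-subproject" log --oneline
```

No special commands are involved: inside parent, the subtree is just a directory, so plain git add and git commit apply.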

Now we want to be good citizens and contribute our library changes back to the original repository. This doesn’t need to happen very often, but when it does, you use git subtree push. Let’s push our changes back to my-subproject:

[Image: output of git subtree push]

See those -n 1…5 lines? That’s git walking through the commits and picking out the changes that should go to the library’s repository. In the end, two commits with changes in the library directory need to be pushed to the vinny-some-awesome-work branch. But what happens to the other files, outside the prefix directory? They get filtered out.
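
A sketch of contributing back with git subtree push, under the same assumptions as the earlier sketches (local stand-in repository, remote my-subproject, prefix vendor/my-subproject, hypothetical file names). The branch name vinny-some-awesome-work is the one from the article. The commit deliberately touches a file outside the prefix (app.txt) so you can check that it is filtered out.

```shell
set -eu
# Assumed setup: local stand-in library, remote "my-subproject",
# hypothetical prefix vendor/my-subproject
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"
git -c init.defaultBranch=master init -q my-subproject
echo "library" > my-subproject/README.md
git -C my-subproject add README.md
git -C my-subproject commit -q -m "[vinibaggio/my-subproject] Initial commit"
git -c init.defaultBranch=master init -q parent
cd parent
echo "app" > README.md
git add README.md
git commit -q -m "[parent] Initial commit"
git remote add my-subproject "$WORK/my-subproject"
git subtree add --prefix=vendor/my-subproject my-subproject master

# One commit touching both parent (app.txt) and the subtree (feature.txt)
echo "feature" > vendor/my-subproject/feature.txt
echo "uses feature" > app.txt
git add app.txt vendor/my-subproject/feature.txt
git commit -q -m "[parent] Add feature.txt and use it"

# Extract the history of vendor/my-subproject and push it to a new branch
# of the library; changes outside the prefix are left out
git subtree push --prefix=vendor/my-subproject my-subproject vinny-some-awesome-work
```

git subtree push is essentially git subtree split followed by git push. If you’d rather inspect the extracted history before pushing, git subtree split --prefix=vendor/my-subproject -b <branch> writes it to a local branch instead.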

Let’s open a pull request in my-subproject from this branch and observe the changes:

[Image: The PR. There are no actual changes since all the files are blank.]

Let’s see the changed files:

[Image: the PR’s changed files]

Everything looks as expected, so I merge the pull request.

Bringing updates from my-subproject back to parent

Imagine that someone made awesome contributions to my-subproject and you want to pull these new changes back into parent:

[Image: Pushing new changes to my-subproject directly]

Now I’m on my feature branch, and I’ve created these two commits:

Let’s import our new changes with git pull:

This executes a pull using the “subtree” merge strategy. It’s needed so that git inspects the incoming changes and works out that they should be applied to a subtree of the current project (parent) rather than to its root. Quoting the Git handbook: “it’s pretty amazing”.
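
A sketch of pulling the library’s new commits into a feature branch of parent with the subtree merge strategy, under the same assumed setup as before; the library commit and its file awesome.txt are hypothetical. The --no-rebase and --no-edit flags are not from the article: recent git versions may refuse to pull diverged branches unless pull.rebase is configured or --no-rebase is given, and --no-edit keeps the default merge message instead of opening an editor.

```shell
set -eu
# Assumed setup: local stand-in library, remote "my-subproject",
# hypothetical prefix vendor/my-subproject
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"
git -c init.defaultBranch=master init -q my-subproject
echo "library" > my-subproject/README.md
git -C my-subproject add README.md
git -C my-subproject commit -q -m "[vinibaggio/my-subproject] Initial commit"
git -c init.defaultBranch=master init -q parent
cd parent
echo "app" > README.md
git add README.md
git commit -q -m "[parent] Initial commit"
git remote add my-subproject "$WORK/my-subproject"
git subtree add --prefix=vendor/my-subproject my-subproject master

# Someone contributes directly to the library (hypothetical commit)
echo "awesome" > "$WORK/my-subproject/awesome.txt"
git -C "$WORK/my-subproject" add awesome.txt
git -C "$WORK/my-subproject" commit -q -m "[vinibaggio/my-subproject] Awesome contribution"

# In parent, on a feature branch: merge the library's master with the subtree
# strategy, which shifts the incoming tree under vendor/my-subproject
git checkout -q -b vinny-feature
git pull --no-rebase --no-edit -s subtree my-subproject master
```

The subtree strategy guesses which directory the incoming tree belongs in. If it ever guesses wrong, the default strategy with -X subtree=vendor/my-subproject states the prefix explicitly.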

This will generate a merge commit, like so (note that I customized the merge commit message):

[Image: Pulling changes from my-subproject into parent]

This is the end result of merging this branch into master.

[Image: State of master]

Notice the commits near the highlighted area. See how they are repeated? That’s git doing its magic on the pull with the subtree strategy: it brings in all the commits from the other repo, and they sit alongside your own repo’s versions of the same changes.

For this reason, it’s good practice to use the “squash” option when pulling library changes into parent. I reverted the changes so parent#master looks like this:

[Image: Master reverted]

I then recreated the branch “vinny-feature” with the same commits, but now using the squash flag:
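
A sketch of the same pull with the squash flag, under the same assumed setup and hypothetical library commit as the previous sketch. With --squash, git updates the working tree and index but does not create a commit, so an explicit git commit follows; its message is hypothetical.

```shell
set -eu
# Assumed setup: local stand-in library, remote "my-subproject",
# hypothetical prefix vendor/my-subproject
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"
git -c init.defaultBranch=master init -q my-subproject
echo "library" > my-subproject/README.md
git -C my-subproject add README.md
git -C my-subproject commit -q -m "[vinibaggio/my-subproject] Initial commit"
git -c init.defaultBranch=master init -q parent
cd parent
echo "app" > README.md
git add README.md
git commit -q -m "[parent] Initial commit"
git remote add my-subproject "$WORK/my-subproject"
git subtree add --prefix=vendor/my-subproject my-subproject master

# Someone contributes directly to the library (hypothetical commit)
echo "awesome" > "$WORK/my-subproject/awesome.txt"
git -C "$WORK/my-subproject" add awesome.txt
git -C "$WORK/my-subproject" commit -q -m "[vinibaggio/my-subproject] Awesome contribution"

# Same subtree-strategy pull, but squashed: no merge commit is created
git checkout -q -b vinny-feature
git pull --no-rebase --squash -s subtree my-subproject master
# Record the squashed result as a single ordinary commit
git commit -q -m "[vinibaggio/my-subproject] Pull latest changes from my-subproject"
```

The library’s commits themselves never enter parent’s history this way, which is what keeps them from showing up twice in the log.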

After merging, this is how master looks:

[Image: State of master after merging]

Long story short: if you pull changes, make sure you squash them. Another good practice to avoid this repetition is to make separate commits whenever you change a subproject. This isn’t mandatory, but it ensures commit titles aren’t duplicated, and the commits will make sense on their own when pushed back to the original repository.

Advantages of using subtrees

The advantages of subtrees over submodules are mostly about workflow. Subtrees require no changes to the project’s workflow and no new commands for day-to-day work, which makes developers’ lives much, much easier. Merge conflicts make (more) sense, and they’re much easier to deal with.

Caveats

Contributing back and forth between repos is definitely more complicated. To keep potential merge conflicts simple, pull changes from the libraries in separate pull requests. Also, rebasing after subtree pulls doesn’t work: on a rebase, git loses track of the --prefix, so you end up with a big mess in your project’s root.

Finally, it takes a while to understand how the commands are structured and why there are so many options, but it’s well worth it, and it removes friction for every developer on your team.

Written by

Breaking things on the internet. Engineer at large.
