Working with git submodules

A walkthrough


A Git submodule is a fast and neutral way to re-use code across multiple projects and in different technologies.

The essence of a Git submodule is that it’s just a directory in our project which points to, well, another git repository. Like a git symbolic link.

Module files are available for each clone, but are not stored within the main project’s repository online.

Throughout this article, tilde ~ marks our “projects” directory.

Chapter 1 — Module Repository Within a Repository

We’ll create a simple project: An HTML page with some styles.

The main project

Prepare an empty new repo at <MAIN_PROJECT_REPO>

mkdir -p ~/main-project && cd ~/main-project
git init
echo -e "<html>\n\t<head>\n\t\t<link rel=\"stylesheet\" href=\"styles/module/reset.css\" />\n\t</head>\n\t<body>\n\t\t<h1>Heading</h1>\n\t\t<p>Paragraph</p>\n\t</body>\n</html>" > index.html
git add .
git commit -m "add HTML file"
git remote add origin <MAIN_PROJECT_REPO>
git push -u origin master

Let’s assume our reset.css file is a part of a re-usable collection.

A re-usable repo

Have an empty repo at <MODULE_REPO>

mkdir -p ~/module && cd ~/module
git init
echo -e "* { margin:0; padding:0 }\nhtml, body { min-height:100% }\nbody { background:#eee; color:#454545; font:normal 16px sans-serif; }" > reset.css
git add .
git commit -m "add CSS reset"
git remote add origin <MODULE_REPO>
git push -u origin master

To avoid confusion, let’s delete this repository from our local machine, so there is only an online copy of this repository:

cd ../
rm -rf module

A module

Include this repository as a module in our main project:

cd ~/main-project
mkdir styles
git submodule add <MODULE_REPO> styles/module

A new file, .gitmodules, was created in our root directory. This file is a list of all submodules and respective repositories. It is created and managed by git. Do not edit it manually.

[submodule "styles/module"]
	path = styles/module
	url = <MODULE_REPO>

Once we’ve included the module in our main project, the workflow within our module’s directory is performed just like in any other repository, and the containing repository updates its pointer to the module’s commit.

A module’s .git file points to the part of its parent’s git directory that manages this specific module (.git/modules/<MODULE_PATH>). The module’s git data is actually stored by the main project. This way the main project tracks which commit each submodule is on, and is aware of their states.
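To see this on disk, here is a minimal sketch that can be run in a scratch shell. Everything in it is an assumption made for the demo rather than part of the walkthrough: the temporary directory, the demo author identity, and the local path standing in for <MODULE_REPO> (which is why protocol.file.allow is overridden; recent Git versions refuse local-path submodule clones by default).

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox, not ~/main-project
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny stand-in for <MODULE_REPO>
git init -q module
echo "* { margin:0; padding:0 }" > module/reset.css
git -C module add reset.css
git -C module commit -q -m "add CSS reset"

# A stand-in for the main project, with the module added as a submodule.
# protocol.file.allow is only needed because the "remote" is a local path.
git init -q main-project
cd main-project
git -c protocol.file.allow=always submodule add -q "$work/module" styles/module
git commit -q -m "add styles module"

# The submodule's .git is a one-line file, not a directory
gitfile=$(cat styles/module/.git)
echo "$gitfile"

# In the main project's tree the submodule is a single entry of type "commit"
entry=$(git ls-tree HEAD styles/module)
echo "$entry"
```

The first line printed should be a gitdir: pointer into .git/modules/styles/module. The second should be a single tree entry with mode 160000 and type commit, which is how Git records the commit a submodule is pinned to.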

Editing the module’s content and updating its repository

We have a working repository. We can work, commit and push code straight to the module itself.

Ignoring the fact that this is another repository, edit a file in the module from within the main project:

echo "h1 { color: green; }" >> styles/module/reset.css

Notice there is an un-committed change in the main project. The module contains some changes, so it is considered “dirty”:

git status
modified: styles/module (modified content)
git diff
diff /styles/module
-Subproject commit <SHA1_ID>
+Subproject commit <SHA1_ID>-dirty

Step into the module, commit and push the changes:

cd styles/module
git commit -am "colour change"
git push origin master
cd -

There is still an un-committed change in the main project, but now it is pointing to the new commit ID:

git status
modified: styles/module (new commits)
git diff
diff /styles/module
-Subproject commit <OLD_SHA1_ID>
+Subproject commit <NEW_SHA1_ID>

Think of the commit ID as a version. Across branches and tags, the commit ID is what’s important. Once we commit and push in the submodule, the main project’s working tree points to the submodule’s new commit. Now we need to commit that update in the main project.
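The difference between the commit the main project has recorded and the commit currently checked out in the submodule can be inspected directly. The following is a sketch under the same kind of assumptions as any scratch demo: a temporary directory, a demo identity, and a local path standing in for <MODULE_REPO> (hence the protocol.file.allow override). Nothing is pushed anywhere.

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q module
echo "* { margin:0; padding:0 }" > module/reset.css
git -C module add reset.css
git -C module commit -q -m "add CSS reset"

git init -q main-project
cd main-project
git -c protocol.file.allow=always submodule add -q "$work/module" styles/module
git commit -q -m "add styles module"

# Commit a change inside the submodule
echo "h1 { color: green; }" >> styles/module/reset.css
git -C styles/module commit -q -am "colour change"

recorded=$(git ls-tree HEAD styles/module | awk '{print $3}')   # commit the main project pins
checked_out=$(git -C styles/module rev-parse HEAD)              # commit in the working tree
echo "recorded:    $recorded"
echo "checked out: $checked_out"

# A leading "+" marks a submodule whose checkout differs from the recorded commit
status=$(git submodule status)
echo "$status"

# Committing in the main project moves the pointer to the new commit
git commit -q -am "styles/module submodule updates"
pinned=$(git ls-tree HEAD styles/module | awk '{print $3}')
echo "pinned now:  $pinned"
```

Before the final commit the two IDs differ; afterwards the main project pins the same commit that is checked out in the submodule.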

git commit -am "styles/module submodule updates"

Recap: We have included a submodule in our project, updated its contents, and pushed back to its repository, all from our main project.

Chapter 2 — Distribution Across Projects

A Git submodule is not only a fast and easy way to include the same directory in multiple projects. The real beauty is inherited from Git; it’s distributed — each clone is capable of updating the code.

Create a second project. Again an HTML page and some styles.

Prepare an empty new repo at <OTHER_PROJECT_REPO>

mkdir -p ~/other-project && cd ~/other-project
git init
echo -e "<html>\n\t<head>\n\t\t<link rel=\"stylesheet\" href=\"styles/module/reset.css\" />\n\t</head>\n\t<body>\n\t\t<h1>Another</h1>\n\t\t<p>Project</p>\n\t</body>\n</html>" > index.html
git add .
git commit -m "add HTML file"
git remote add origin <OTHER_PROJECT_REPO>
git push -u origin master

Include the same styles module:

git submodule add <MODULE_REPO> styles/module

Edit the CSS file, regardless of it being part of a module:

echo -e "* { margin:0; padding:0 }\nhtml, body { min-height:100% }\nbody { background:#eee; color:#454545; font:normal 16px sans-serif; }\nh1 { color: red; }" > styles/module/reset.css

Again, we have modified content in a submodule. Commit these changes to a branch so we can create a request for peer review before merging the changes to the submodule, and create a respective branch on the main project to go with it.

cd styles/module
git checkout -b change-header-colour
git commit -am "Change header colour"
git push origin change-header-colour
cd -
git checkout -b a-module-update
git commit -am "update module pointer"
git push origin a-module-update

Now merge everything back to the master branch.

cd styles/module
git checkout master
git merge change-header-colour
git push origin master
cd -
git checkout master
git merge a-module-update
git push origin master

We’ve updated the submodule to suit this project, but the first project is still pointing to a previous commit. That’s good: we don’t want it to update before we’ve checked that everything works correctly.

We’ll return to the main project for some maintenance work, and try to see if any of our submodules can be updated and tested.

cd ~/main-project
git submodule update --remote styles/module
git status

If we don’t know if or which submodules were updated, we can perform a “pull” on all submodules:

git submodule foreach git pull origin master
git status
modified: styles/module (new commits)

We can now test our project. Once we see everything’s fine, we commit the main project with the updated submodule pointers.
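The whole round trip (upstream change, update, test, commit) can be rehearsed in a sandbox first. This is a sketch, assuming a temporary directory, a demo identity, and a local path standing in for <MODULE_REPO>; the branch is forced to be called master only so the demo does not depend on local Git defaults.

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q module
git -C module symbolic-ref HEAD refs/heads/master   # name the branch master regardless of defaults
echo "* { margin:0; padding:0 }" > module/reset.css
git -C module add reset.css
git -C module commit -q -m "add CSS reset"

git init -q main-project
cd main-project
git -c protocol.file.allow=always submodule add -q "$work/module" styles/module
git commit -q -m "add styles module"

# Meanwhile, another project lands a new commit in the module
echo "h1 { color: red; }" >> "$work/module/reset.css"
git -C "$work/module" commit -q -am "Change header colour"

# Bring the module's latest commit into our checkout
git -c protocol.file.allow=always submodule update -q --remote styles/module
changed=$(git status --short)   # the pointer moved but is not committed yet
echo "$changed"

# ...run the project's own tests here, then record the new pointer
git commit -q -am "update styles/module to latest"
pinned=$(git ls-tree HEAD styles/module | awk '{print $3}')
latest=$(git -C "$work/module" rev-parse HEAD)
echo "pinned: $pinned"
```

Until the final commit, the main project’s history still pins the old module commit, so backing out is just a matter of running git submodule update again.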

Chapter 3 — Consumption in Teams

When pulling a project’s latest changes, submodules do not get updated automatically. You’ll have to update the submodules:

git submodule update

If there’s a new submodule, we’ll have to update and init:

git submodule update --init

And if there are nested submodules in those modules we’ll have to update recursively:

git submodule update --recursive

The init and recursive flags are non-destructive. If the submodule’s recorded commit is unchanged, nothing will happen. It is safe to pass both flags after every pull.

Since we don’t want a developer to have to tell the team that everyone needs to update their submodules, it is good practice to add a Git hook on “post-merge”, which runs after each pull.

cd ~/main-project
printf '#!/bin/sh\ngit submodule update --init --recursive\n' > .git/hooks/post-merge
chmod +x .git/hooks/post-merge

This hook can prevent a lot of confusion working in teams.
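Hooks under .git/hooks are local to each clone and are not transferred by clone or pull, so every teammate needs the hook installed. One possible way to share it, sketched here as an assumption rather than something the walkthrough prescribes, is to keep the script in a tracked directory (the name .githooks is hypothetical) and point Git at it with the core.hooksPath setting. The demo runs in a temporary directory with a demo identity and simulates a pull with a local merge.

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q main-project
cd main-project

# Keep the hook in a tracked directory so it travels with the repository
mkdir .githooks
cat > .githooks/post-merge <<'EOF'
#!/bin/sh
# Runs after every successful merge, which includes "git pull"
echo "post-merge: updating submodules"
git submodule update --init --recursive
EOF
chmod +x .githooks/post-merge
git add .githooks
git commit -q -m "add shared post-merge hook"

# Each teammate opts in once per clone
git config core.hooksPath .githooks

# Simulate a pull that brings in new work by merging a side branch
git checkout -q -b feature
echo "<h1>Heading</h1>" > index.html
git add index.html
git commit -q -m "add HTML file"
git checkout -q -
log=$(git merge --no-ff -q -m "merge feature" feature 2>&1)
echo "$log"
```

The merge output should include the line echoed by the hook, showing that it fired. The one-time git config step still has to be run in every clone, since configuration is not shared either.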

⚠️ When cloning a new project, include all submodules from the get-go with the recursive flag:

git clone --recursive <MAIN_PROJECT_REPO>
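If a project was cloned without the flag, the submodule directories exist but are empty until they are initialised. A minimal sketch of that situation and its fix, assuming a temporary directory, a demo identity, and local paths standing in for both <MODULE_REPO> and <MAIN_PROJECT_REPO> (hence the protocol.file.allow override):

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q module
echo "* { margin:0; padding:0 }" > module/reset.css
git -C module add reset.css
git -C module commit -q -m "add CSS reset"

git init -q main-project
git -C main-project -c protocol.file.allow=always submodule add -q "$work/module" styles/module
git -C main-project commit -q -m "add styles module"

# A plain clone leaves the submodule directory empty...
git clone -q main-project plain-clone
before=$(ls -A plain-clone/styles/module | wc -l)
echo "entries before: $before"

# ...until the submodules are initialised and updated
git -C plain-clone -c protocol.file.allow=always submodule update -q --init --recursive
after=$(ls -A plain-clone/styles/module | wc -l)
echo "entries after:  $after"
```

After the update, styles/module contains reset.css, which is the state a recursive clone would have produced in one step.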

Chapter 4 — Extracurricular

Git configurations

Set up Git to show updates to submodules on git status:

git config status.submodulesummary 1

Point to a branch instead of a commit

Set the project to always update from a specific module’s branch (latest commit). Usually we’d set this to ‘master’ or ‘stable’.

git config -f .gitmodules submodule.<MODULE_NAME>.branch <BRANCH_NAME>

(Where in our examples the submodule’s name was “styles/module”)

  1. Deploy jobs that update submodules with the --remote flag will always get the latest version of that branch.
  2. Next time we update submodules with --remote, the latest commit in the specified branch will be checked out.
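For the submodule named in our examples, the command and its effect on .gitmodules could look like the sketch below. It works on a scratch copy of the file in a temporary directory; the URL is a placeholder and master is just an example branch name.

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"

# A .gitmodules like the one from Chapter 1 (the URL is a placeholder)
cat > .gitmodules <<'EOF'
[submodule "styles/module"]
	path = styles/module
	url = https://example.com/module.git
EOF

# Record the branch that "git submodule update --remote" should follow
git config -f .gitmodules submodule.styles/module.branch master

# Read the setting back, then show the resulting file
branch=$(git config -f .gitmodules --get submodule.styles/module.branch)
echo "$branch"
cat .gitmodules
```

The setting lands as a branch line inside the existing submodule section. Because .gitmodules is a tracked file, the change only reaches the rest of the team once it is committed and pushed.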

Automatic fetch and merge (module, not project)

If we don’t want to navigate into the module and fetch it manually, we can update from the remote repository. Now we only have to commit the change in our parent project.

git submodule update --remote

Adding the --rebase or --merge flag integrates the remote changes into the branch currently checked out in the submodule, instead of leaving the submodule in a detached HEAD state.
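A small sketch of the --merge variant, to check that the submodule stays on its branch. It assumes a temporary directory, a demo identity, and a local path standing in for <MODULE_REPO> (hence the protocol.file.allow override); the branch is forced to be called master only so the demo does not depend on local Git defaults.

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q module
git -C module symbolic-ref HEAD refs/heads/master   # name the branch master regardless of defaults
echo "* { margin:0; padding:0 }" > module/reset.css
git -C module add reset.css
git -C module commit -q -m "add CSS reset"

git init -q main-project
cd main-project
git -c protocol.file.allow=always submodule add -q "$work/module" styles/module
git commit -q -m "add styles module"

# A new commit appears upstream
echo "h1 { color: red; }" >> "$work/module/reset.css"
git -C "$work/module" commit -q -am "Change header colour"

before=$(git -C styles/module symbolic-ref --short HEAD)   # branch the submodule is on
git -c protocol.file.allow=always submodule update -q --remote --merge styles/module
after=$(git -C styles/module symbolic-ref --short HEAD)    # would fail on a detached HEAD
echo "branch before: $before, after: $after"
```

Here the merge is a fast-forward, so the submodule’s branch simply moves to the upstream commit. If the submodule had local commits of its own, --merge would create a merge commit and --rebase would replay them on top.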

Pushing from a project with modules

The --recurse-submodules flag has some useful options. If set to check, it’ll block us from pushing the project until all submodules have been pushed:

git push --recurse-submodules=check

Its dangerous sibling, on-demand, will just go into each submodule and push it before pushing the parent project.

git push --recurse-submodules=on-demand
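To avoid typing the flag on every push, the same behaviour can be made the default through the push.recurseSubmodules setting. A minimal sketch, using a scratch repository in a temporary directory as a stand-in for the real project:

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
git init -q main-project   # scratch repo standing in for the real project
cd main-project

# Make "check" the default for every push from this clone
git config push.recurseSubmodules check

mode=$(git config --get push.recurseSubmodules)
echo "$mode"
```

This is a per-clone setting; choosing check over on-demand here is only a suggestion for the cautious default.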

Batch work

We’ve mentioned foreach. Basically, it steps into each submodule and runs the command that follows.

For example, step into each submodule and:
1. Reset all the changes
2. Remove untracked files

git submodule foreach git reset --hard
git submodule foreach git clean -fd
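The command given to foreach also has access to a few variables describing the submodule it is running in, such as $name and $sha1 (the commit recorded by the parent project). A sketch of a one-line report, assuming a temporary directory, a demo identity, and a local path standing in for <MODULE_REPO> (hence the protocol.file.allow override):

```shell
set -eu
work=$(mktemp -d)   # throwaway sandbox
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q module
echo "* { margin:0; padding:0 }" > module/reset.css
git -C module add reset.css
git -C module commit -q -m "add CSS reset"

git init -q main-project
cd main-project
git -c protocol.file.allow=always submodule add -q "$work/module" styles/module
git commit -q -m "add styles module"

# Single quotes stop the outer shell from expanding $name and $sha1 itself;
# --quiet suppresses the "Entering ..." line printed for each submodule
report=$(git submodule --quiet foreach 'echo "$name $sha1"')
echo "$report"
```

With a single submodule this prints one line: the submodule’s name followed by the commit the main project has pinned.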

Chapter 5 — Fallibles

Some use cases may become a little confusing.

Introducing a submodule in a feature branch

When we introduce a new submodule in a feature branch and then, while it’s waiting for peer review, check out our main branch (or any other one), the module directory is still in the filesystem, now as untracked files.

Naturally, we’ll want to remove the files. But when we check out the feature branch containing the submodule again, it’s important to remember that the module directory will be empty and the submodule needs to be updated (git submodule update). It’s a hassle, but it’s usually over by the time we’ve incorporated the submodule into our main project.

A submodule was updated while I was in a feature branch

This one can be a little frustrating because we didn’t touch any submodule, but someone else did on the main branch. We can just update (git submodule update) each time we switch branches; once we’ve merged or rebased from the main branch we’ll be aligned again.

Updating a submodule in a feature branch

We’ll encounter a different behaviour when our feature branch introduces a new version (commit ID) for a submodule. When we check out the main branch, we’ll have to update the submodule (git submodule update) so we have its respective version. Then, when we check out the feature branch again, we update to its respective version. Again, once we’ve merged, this whole mess is over with.

Switching from directory to a submodule

A new submodule cannot populate an existing directory. If we switch one of our project’s directories over to a submodule, we’ll first have to remove the directory from both the filesystem and the index:

rm -rf <DIRECTORY>
git rm -r <DIRECTORY>
git submodule add <MODULE_REPO> <DIRECTORY>

In such cases expect inconsistencies between branches. If the feature branch has this directory as a submodule and the main branch has it as a regular directory, we may need to manually check out the files after switching between branches.

Git-scm Resources

Git submodule documentation
Pro Git Book — Submodules