Project Management with GitHub Sub-modules

Kaizin
3 min read · Feb 5, 2024

I have been working on an LLM chat project.
It began as a test of the OpenAI library, calling the Assistants API from Python code.
I then added RAG modules to retrieve data from a vector DB and Oracle tables.
I wanted to run that code behind a chat UI, so I finally added a Streamlit chat UI.
After all of those tasks, my project was a mess: Python files, FAISS files, and some documents all sat in the same folder.

I separated the files into four sectors by purpose: LLM, DB, Streamlit, and Utility.

Separating the files and putting them in different folders is not a hard task.

What I realized while separating the files is that these sectors could be reused in other projects.
But since they all live in the same repository, cloning only the part I want is troublesome.

Set up sub-modules

The solution for this is Git sub-modules, which work with repositories hosted on GitHub.

I made a super repository that covers all the code, and a sub repository for each sector.

To add these repositories as parts of the super repository, Git provides the submodule command. Unfortunately, GitHub Desktop does not support this. Run the command below in Git Bash from the super repository folder.

It specifies the branch to track and adds the repository as a folder, with the given folder name, inside the super repository folder.

git submodule add -b <branch name> <your repository address.git> <folder name>

After running the above command for each repository, a .gitmodules file and the sub-module folders show up in the super repository. Unlike a normal folder, a sub-module folder is stored in the super repository as a reference to a specific commit of its own repository.

Each sub-module folder is connected to its own isolated repository.
The initial commit records the sub-modules in the super repository, so the files in each sub-module show up at the version they had when that commit was made.
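
To make the setup concrete, here is a minimal sketch that runs the same command against throwaway local repositories instead of GitHub. The names llm, super, and llm_chat.py are hypothetical stand-ins for one sector and the super repository, and the protocol.file.allow setting is only there because the sub-module URL is a local path in this demo.

```shell
#!/bin/sh
# Sketch: add one sub-module to a super repository, using temporary local repos.
# llm, super and llm_chat.py are hypothetical names chosen for this demo.
set -eu

WORK="$(mktemp -d)"
cd "$WORK"

# Identity for the demo commits, so no global Git config is required.
export GIT_AUTHOR_NAME="demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="demo" GIT_COMMITTER_EMAIL="demo@example.com"

# 1. A sub repository standing in for one sector (here: LLM).
git init -q llm
git -C llm symbolic-ref HEAD refs/heads/main
echo 'print("hello")' > llm/llm_chat.py
git -C llm add llm_chat.py
git -C llm commit -q -m "Add LLM module"

# 2. The super repository.
git init -q super
git -C super symbolic-ref HEAD refs/heads/main

# 3. Add the sub repository as a sub-module that tracks main.
#    protocol.file.allow is needed only because the URL is a local path.
git -C super -c protocol.file.allow=always \
    submodule --quiet add -b main "$WORK/llm" llm
git -C super commit -q -m "Add llm sub-module"

# 4. Inspect what was recorded.
cat super/.gitmodules             # path, url and branch of the sub-module
git -C super ls-tree HEAD llm     # mode 160000: a commit reference, not a normal folder
git -C super submodule status     # the commit the sub-module is currently at
```

With a real project, the local path in step 3 would be replaced by the repository address on GitHub, and the -c option would not be needed.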

After that moment, however, commits in the super repository do not pick up the latest version of the files in each sub-module repository.
Even if new commits are made in a sub-module repository, the super repository does not follow them.

Update and manage sub-modules

When the command below is run, each sub-module is moved to the latest commit of its specified branch. In other words, until an update is done, the sub-module code in the super repository is not affected by any changes. The sub-module repository can make commits or branches without affecting the super repository. After updating, commit the change in the super repository to record the new version.

git submodule update --remote
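
The pinning behaviour can be checked with a small experiment. This is a sketch under the same assumptions as a local demo: llm and super are hypothetical throwaway repositories, and protocol.file.allow is set only because the sub-module URL is a local path.

```shell
#!/bin/sh
# Sketch: the super repository keeps its recorded commit until "update --remote".
# llm, super and llm_chat.py are hypothetical names chosen for this demo.
set -eu

WORK="$(mktemp -d)"
cd "$WORK"
export GIT_AUTHOR_NAME="demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Setup: one sub repository and a super repository that tracks its main branch.
git init -q llm
git -C llm symbolic-ref HEAD refs/heads/main
echo 'print("v1")' > llm/llm_chat.py
git -C llm add llm_chat.py
git -C llm commit -q -m "v1"

git init -q super
git -C super symbolic-ref HEAD refs/heads/main
git -C super -c protocol.file.allow=always \
    submodule --quiet add -b main "$WORK/llm" llm
git -C super commit -q -m "Add llm sub-module"

# Helper: the commit currently checked out in the sub-module folder.
pinned() { git -C super/llm rev-parse HEAD; }
BEFORE="$(pinned)"

# A new commit lands in the sub repository.
echo 'print("v2")' > llm/llm_chat.py
git -C llm commit -q -am "v2"
LATEST="$(git -C llm rev-parse HEAD)"

# The super repository has not moved.
STILL="$(pinned)"

# Move the sub-module to the latest commit of the tracked branch.
git -C super -c protocol.file.allow=always submodule --quiet update --remote
AFTER="$(pinned)"

# Record the new commit reference in the super repository.
git -C super add llm
git -C super commit -q -m "Bump llm sub-module"

echo "before update: $BEFORE"
echo "still pinned:  $STILL"
echo "after update:  $AFTER"
echo "latest in llm: $LATEST"
```

The first two printed commits are the same and the last two are the same, which is the whole point: nothing in the super repository changes until the update is requested and committed.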

What if we want to change the branch to track?
In that case, we should update the .gitmodules file using the command below.
<submodule_path> is the local path to the sub-module folder, which is also the sub-module's name by default.

git config -f .gitmodules submodule.<submodule_path>.branch <branch_name>

For example, a sub-module at libs/example that tracks main has this entry in .gitmodules:

[submodule "libs/example"]
    path = libs/example
    url = https://github.com/username/example.git
    branch = main

If no branch is specified, it tracks the default branch of the sub-module's remote repository, which is usually main or master.
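
Switching the tracked branch can be tried out the same way. The sketch below assumes a hypothetical sub repository llm with a main branch and a dev branch, again built locally so that it runs without GitHub.

```shell
#!/bin/sh
# Sketch: change the branch a sub-module tracks, then update to it.
# llm, super and the dev branch are hypothetical names chosen for this demo.
set -eu

WORK="$(mktemp -d)"
cd "$WORK"
export GIT_AUTHOR_NAME="demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Sub repository with two branches: main and dev (dev is one commit ahead).
git init -q llm
git -C llm symbolic-ref HEAD refs/heads/main
echo 'print("main")' > llm/llm_chat.py
git -C llm add llm_chat.py
git -C llm commit -q -m "main version"
git -C llm checkout -q -b dev
echo 'print("dev")' > llm/llm_chat.py
git -C llm commit -q -am "dev version"
git -C llm checkout -q main
DEV="$(git -C llm rev-parse dev)"

# Super repository that initially tracks main.
git init -q super
git -C super symbolic-ref HEAD refs/heads/main
git -C super -c protocol.file.allow=always \
    submodule --quiet add -b main "$WORK/llm" llm
git -C super commit -q -m "Add llm sub-module"

# Change the tracked branch in .gitmodules (the sub-module name here is "llm").
git -C super config -f .gitmodules submodule.llm.branch dev

# Move the sub-module to the latest commit of the new branch.
git -C super -c protocol.file.allow=always submodule --quiet update --remote

# Record both the edited .gitmodules and the new commit reference.
git -C super add .gitmodules llm
git -C super commit -q -m "Track dev branch of llm"

cat super/.gitmodules
git -C super submodule status
```

Recent Git versions also provide git submodule set-branch, which writes the same setting into .gitmodules; the git config form shown in this article works as well.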

Pros and Cons

Sub-module management has its own pros and cons.
Used correctly, it allows precise version control and collaboration without affecting the main project. However, it can cause confusion and make the repository difficult to control if sub-modules multiply without consideration.

If you check your project carefully and use appropriate sub-modules, it will increase the efficiency of your work.

Pros:

  1. Version Control on Dependencies
  2. Separate Repository Management
  3. Cleaner Project Structure
  4. Efficient Collaboration

Cons:

  1. Complexity in Management
  2. Potential for Repository Pollution

Next story:

https://medium.com/@gonchogo/how-to-clone-repository-with-sub-modules-d188a22859d1
