Git: how to contribute to Azure docs without cloning a 20 GB+ Git repository
I occasionally contribute to the Azure documentation. What I appreciate about the Azure documentation is that its markdown source code is managed in version control with Git. Moreover, this source code is accessible to customers on GitHub, empowering them to accomplish two key tasks:
- Track documentation changes with git
- Propose PRs to improve the documentation
The challenge of working on the azure-docs repository is that a full Git clone consumes more than 20 GB of disk space. Editing files and proposing pull requests entirely through the GitHub web interface remains feasible, but for more complex contributions spanning multiple files, it is better to have the files locally and work with your favorite editor.
I attended FOSDEM 2024, where Scott Chacon delivered an insightful talk titled “So You Think You Know Git?” During the presentation, he shared valuable tips on effectively managing large repositories like the one you encounter with Azure documentation.
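My workflow is now as follows:
First, I clone the repository without downloading the Git blobs (--filter=blob:none) and without checking out any files (--no-checkout). The --sparse option sets the clone up for sparse checkout, so that only the folders I ask for will later appear in the working tree.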
git clone --no-checkout --sparse --filter=blob:none https://github.com/MicrosoftDocs/azure-docs
The resulting folder size is just 1.5 GB.
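To confirm that the clone really is a partial one, you can read back the filter that Git recorded for the remote. The helper below is a minimal sketch: the function name partial_clone_filter is made up for this example, and it assumes the remote is called origin, which is the default after git clone.

```shell
# Print the partial-clone filter recorded for the "origin" remote.
# Prints nothing and returns non-zero if no filter is configured (a full clone).
partial_clone_filter() {
    git -C "$1" config --get remote.origin.partialclonefilter
}

# Usage, assuming the clone above created ./azure-docs:
#   partial_clone_filter azure-docs   # should print the filter given at clone time (blob:none)
#   du -sh azure-docs                 # on-disk size of the clone
```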
The next step is to check out only the folder I need. For instance, if I want to contribute to the AKS documentation, I check out only the subfolder articles/aks.
cd azure-docs
git sparse-checkout add articles/aks/
git checkout main
Under the hood, what occurs is that the configuration file .git/info/sparse-checkout is updated.
You can verify the configuration either by examining the contents of that file directly or by running the following command:
git sparse-checkout list
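To watch the mechanism in isolation, without downloading anything, the following sketch builds a tiny throwaway repository and restricts its working tree to one folder. The directory names, file names, and commit identity are invented for the demo, and it uses git sparse-checkout set (which replaces the list of folders) instead of add, since the demo repository starts out as a normal full checkout.

```shell
# Build a throwaway repository with two folders under articles/.
demo=$(mktemp -d)
git init -q "$demo"
mkdir -p "$demo/articles/aks" "$demo/articles/other"
echo "aks page"   > "$demo/articles/aks/a.md"
echo "other page" > "$demo/articles/other/b.md"
git -C "$demo" add .
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "init"

# Restrict the working tree to articles/aks only.
git -C "$demo" sparse-checkout set articles/aks

# Show the configured folders (prints: articles/aks).
git -C "$demo" sparse-checkout list

# Show the file that Git rewrote behind the scenes.
cat "$demo/.git/info/sparse-checkout"

# articles/other has been removed from the working tree but is still in history.
ls "$demo/articles"
```

The exact contents of .git/info/sparse-checkout depend on the Git version and on whether cone mode is active, so treat the file as something to inspect, not something to edit by hand.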
When checking out the main branch, you’ll notice a download operation starting. This is because we cloned the Git repository without the blobs, so the data is downloaded when it is actually needed.
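If you are curious how much content is still waiting on the server, git rev-list can list objects that are reachable from a commit but not stored locally, marking each with a leading question mark. The helper below is a rough sketch with a made-up function name, count_missing_objects. To my knowledge the --missing=print option only reports the missing objects and does not fetch them.

```shell
# Count objects reachable from HEAD that are not yet stored locally.
# Missing objects are printed by rev-list with a leading "?".
count_missing_objects() {
    git -C "$1" rev-list --objects --missing=print HEAD | grep -c '^?' || true
}

# Usage, assuming the partial clone lives in ./azure-docs:
#   count_missing_objects azure-docs
# In a full clone the count is 0; in a blob-less clone it should shrink
# as you check out more folders.
```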
At this point, you can work normally, editing multiple files locally and creating new commits. For further details, refer to the Git sparse checkout documentation.