Git: how to contribute to Azure docs without cloning a 20 GB+ Git repository

Saverio Proto
Published in Microsoft Azure
2 min read · Apr 9, 2024
Generated with AI Microsoft Image Creator with the following prompt: “artwork that depicts the tree representation of a git code repository, with multiple commits, tags, branches and merges. The graph connecting the commit objects should be futuristic and giving the idea of a large amount of data.”

I occasionally contribute to the Azure documentation. What I appreciate about the Azure documentation is that its markdown source code is managed in version control with Git. Moreover, this source code is accessible to customers on GitHub, empowering them to accomplish two key tasks:

  • Track documentation changes with Git
  • Propose PRs to improve the documentation

The challenge with the azure-docs repository is that a full Git clone takes more than 20 GB of disk space. You can still edit files and propose pull requests entirely through the GitHub web interface, but for more complex contributions spanning multiple files it is better to have the files locally and work in your favorite editor.

I attended FOSDEM 2024, where Scott Chacon delivered an insightful talk titled “So You Think You Know Git?” During the presentation, he shared valuable tips on effectively managing large repositories like the one you encounter with Azure documentation.

My workflow is now as follows:

I clone the repository without downloading the Git blobs and without checking out any files.

git clone --no-checkout --sparse --filter=blob:none https://github.com/MicrosoftDocs/azure-docs

The resulting folder size is just 1.5 GB.
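
To see what those three flags leave on disk without downloading anything large, here is a minimal sketch that runs the same clone against a throwaway local repository. The sandbox layout (`origin`, `clone`, the three sample files) is hypothetical and only stands in for azure-docs. The two `uploadpack.*` settings are an assumption needed for the local stand-in to honour `--filter`; GitHub already does so.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox: a tiny local repository stands in for azure-docs.
work=$(mktemp -d)
git init -q -b main "$work/origin"
cd "$work/origin"
mkdir -p articles/aks articles/storage
echo "AKS intro" > articles/aks/intro.md
echo "Storage intro" > articles/storage/intro.md
echo "Readme" > README.md
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial import"
# Assumption for the local stand-in only: allow filtered and on-demand fetches.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Same flags as the real clone. The file:// URL makes Git use the normal
# transport, so the blob filter is applied instead of a plain local copy.
git clone -q --no-checkout --sparse --filter=blob:none "file://$work/origin" "$work/clone"
cd "$work/clone"

# The clone records that origin can supply missing objects on demand.
promisor=$(git config remote.origin.promisor)
filter=$(git config remote.origin.partialclonefilter)
echo "promisor=$promisor filter=$filter"

# Commits and trees are present; objects that were filtered out are
# printed with a leading "?".
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "objects not downloaded yet: $missing"

# --no-checkout leaves the working tree empty apart from the .git directory.
ls -A
```

The sandbox lives in a temporary directory (`$work`) that can be deleted afterwards.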

The next step is to check out only the folder I need. For instance, if I want to contribute to the AKS documentation, I check out only the subfolder articles/aks.

cd azure-docs
git sparse-checkout add articles/aks/
git checkout main

Under the hood, this updates the configuration file .git/info/sparse-checkout.

You can verify the configuration either by examining the contents of that file directly or by running the following command:

git sparse-checkout list

When checking out the main branch, you’ll notice a download operation starting. This is because we cloned the Git repository without the blobs, so the data is downloaded when it is actually needed.
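
The sparse checkout and the on-demand download can be observed end to end in a small sandbox. This is a sketch under stated assumptions: the local `origin` repository and its sample files are hypothetical stand-ins for azure-docs, and the `uploadpack.*` settings are only there so the local stand-in accepts filtered fetches.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox repository with two documentation folders.
work=$(mktemp -d)
git init -q -b main "$work/origin"
cd "$work/origin"
mkdir -p articles/aks articles/storage
echo "AKS intro" > articles/aks/intro.md
echo "Storage intro" > articles/storage/intro.md
echo "Readme" > README.md
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial import"
git config uploadpack.allowFilter true          # assumption: local stand-in only
git config uploadpack.allowAnySHA1InWant true   # assumption: local stand-in only

git clone -q --no-checkout --sparse --filter=blob:none "file://$work/origin" "$work/clone"
cd "$work/clone"

# Select one folder, then populate the working tree.
git sparse-checkout add articles/aks/
git checkout -q main

# Both views of the sparse-checkout configuration.
git sparse-checkout list
cat .git/info/sparse-checkout

# Files that actually exist on disk: the selected folder plus top-level files.
find . -path ./.git -prune -o -type f -print | sort

# Blobs outside the selected paths were never requested and are still missing.
still_missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "objects still not downloaded: $still_missing"
```

In this sandbox the `articles/storage` folder never appears on disk, and its content is not fetched until a command actually needs it.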

At this point, you can work normally, editing multiple files locally and creating new commits. For further details, refer to the Git sparse checkout documentation.

Saverio Proto
Microsoft Azure

Customer Experience Engineer @ Microsoft - Opinions and observations expressed in this blog post are my own.