Hacking Git Directories

How to reconstruct source code from an exposed .git directory

Vickie Li
The Startup
6 min read · Dec 25, 2019

When attacking an application, obtaining the application’s source code can be extremely helpful for constructing an exploit. This is because some bugs, like SQL injections, are way easier to find using static code analysis compared to black-box testing.

Obtaining an application's source also often means getting a hold of developer comments, hardcoded API keys, and other sensitive data. So the source code of an application should always be protected from public view.

Finding .git directory information leaks

One way that applications accidentally expose source code to the public is through an exposed .git directory.

When a developer uses Git to version control a project’s source code, a .git directory is used to store all the version control information of the project, including the commit history of project files. If the project root is deployed as the web root, this directory ends up at project.com/.git. Normally, the .git folder should not be accessible to the public. But sometimes the .git folder is accidentally made available, and this is when information leaks happen.

To check if an application’s .git folder is exposed, simply go to the application’s root directory, for example project.com, and add /.git to the URL. There are three possible outcomes when you browse to the /.git directory:

  • If you get a 404 error, this means that the .git directory of the application is not made available to the public, and you won’t be able to leak information this way.
  • If you get a 403 error, the .git directory most likely exists on the server, but you won’t be able to directly access the folder’s root, and therefore will not be able to list the files contained in the directory. Individual files inside it may still be downloadable.
  • If you don’t get an error and the server responds with the document tree of the .git directory, you can directly browse the folder’s contents and retrieve any information contained in it.
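The three outcomes above can be sketched as a small status-code check. This is only an illustration, not a full scanner: the function names classify_git_response and probe are hypothetical, and real servers may also answer with redirects or custom error pages, which this sketch simply reports as inconclusive.

```python
import urllib.error
import urllib.request


def classify_git_response(status: int) -> str:
    """Map the HTTP status returned for /.git/ to one of the outcomes above."""
    if status == 404:
        return "not exposed"
    if status == 403:
        return "listing forbidden, files may still be readable"
    if status == 200:
        return "directory may be browsable"
    return "inconclusive"


def probe(base_url: str) -> str:
    """Request <base_url>/.git/ and classify the response (needs network access)."""
    url = base_url.rstrip("/") + "/.git/"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return classify_git_response(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; err.code is the status.
        return classify_git_response(err.code)
```

Only run probe against applications you have permission to test.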

Reconstructing project source from .git directory

If directory listing is enabled, an attacker can simply browse through the files and retrieve the leaked information. She can also use the wget command in recursive mode (-r) to mass-download the contents of the directory.

But if directory listing is not enabled and the directory’s files are not shown, there are still ways for an attacker to reconstruct the entire .git directory. To understand how this is done, we must first understand the structure of .git directories.

.git directory structure

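The .git directory is laid out in a specific way. Listing its contents on the command line (for example, with ls .git inside a project) would typically show entries such as HEAD, config, description, hooks, index, info, logs, objects, and refs.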

Here are a few standard files and folders in the .git directory that are important in reconstructing the project’s source.

  • The /objects folder

The /objects directory is used to store Git objects. This directory contains additional folders that each have two-character names. These subdirectories are named after the first two characters of the SHA1 hash of the Git objects stored in them.

Within these subdirectories, there are files named after the remaining 38 characters of the SHA1 hash of the Git object stored in each file.

For example, listing .git/objects returns the list of two-character folders, and listing one of those folders, such as .git/objects/0a, reveals the Git objects stored in that particular folder.

Git objects are stored in /objects according to the first two characters of their SHA1 hash. For example, the Git object with a hash of 0a082f2656a655c8b0a87956c7bcdc93dfda23f8 will be stored with the file name of 082f2656a655c8b0a87956c7bcdc93dfda23f8 in the directory .git/objects/0a.
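The naming rule above is easy to express in code. The following is a minimal sketch, assuming the 40-character SHA1 hashes used in this article; the function name object_path is hypothetical.

```python
import re


def object_path(sha1_hex: str) -> str:
    """Return the path of a loose Git object relative to the project root."""
    sha1_hex = sha1_hex.lower()
    if not re.fullmatch(r"[0-9a-f]{40}", sha1_hex):
        raise ValueError("expected a 40-character hexadecimal SHA1 hash")
    # First two characters name the folder, the remaining 38 name the file.
    return f".git/objects/{sha1_hex[:2]}/{sha1_hex[2:]}"


print(object_path("0a082f2656a655c8b0a87956c7bcdc93dfda23f8"))
```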

Git stores different types of objects in .git/objects. An object stored here can be a commit, a tree, a blob, or an annotated tag. You can determine the type of an object by running git cat-file -t followed by the object’s hash.

Commit objects store the hash of the commit’s directory tree object, along with its parent commit, author, committer, date, and message. Tree objects contain the directory listings for commits. Blob objects contain copies of files that were committed (read: actual source code!). Tag objects contain information about tagged objects and their associated tag names.

You can display the contents of a Git object by running git cat-file -p followed by the object’s hash.
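To make the object format more concrete, here is a rough sketch of what reading a loose object involves. It assumes the standard loose-object layout (zlib-compressed data consisting of a "type size" header, a NUL byte, then the body) and does not handle objects stored in packfiles. The function name parse_loose_object is hypothetical.

```python
import zlib


def parse_loose_object(compressed: bytes) -> tuple[str, bytes]:
    """Return (type, body) for the raw bytes of a file under .git/objects/."""
    raw = zlib.decompress(compressed)
    # The header and body are separated by the first NUL byte.
    header, _, body = raw.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body


# Example with a blob built in memory rather than read from a repository.
sample = zlib.compress(b"blob 6\x00hello\n")
print(parse_loose_object(sample))
```

For a blob, the returned body is the committed file itself; commit and tree bodies need further parsing.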

  • The /config file is the Git configuration file for the project.
  • The /HEAD file contains a reference to the current branch, for example ref: refs/heads/master.

Confirming that files are accessible

If you are not able to access the .git directory listing, you’ll need to confirm that the folder’s contents are indeed available to the public. You can do this by trying to access the config file of the .git directory.

If this file is accessible, you might be able to download the entire contents of the .git directory.
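Some servers answer every request with a 200 status and a custom error page, so it can help to check that the response body actually looks like a Git config file. The heuristic below is an assumption of this sketch (it looks for the [core] section that Git normally writes), and the function name looks_like_git_config is hypothetical.

```python
def looks_like_git_config(body: str) -> bool:
    """Heuristic: a Git config file normally contains an INI-style [core] section."""
    lines = [line.strip().lower() for line in body.splitlines()]
    return any(line.startswith("[core]") for line in lines)


sample_config = "[core]\n\trepositoryformatversion = 0\n\tbare = false\n"
print(looks_like_git_config(sample_config))
print(looks_like_git_config("<html><body>Page not found</body></html>"))
```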

Downloading the files

If you cannot access the /.git folder’s directory listing, you have to download each file you want instead of recursively downloading from the directory root.

But how do you find out which files on the server are available when object files have complex paths such as “.git/objects/0a/72e6850ef963c6aeee4121d38cf9de773865d8”?

You start with file paths that you already know exist, like “.git/HEAD”! Reading this file will give you a reference to the current branch (for example, .git/refs/heads/master) that you can use to find more files on the system.
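Turning the contents of HEAD into the next path to request can be sketched as follows. The function name ref_path_from_head is hypothetical; the sketch assumes HEAD is either a symbolic reference of the form "ref: refs/heads/<branch>" or, in a detached state, a bare commit hash.

```python
from typing import Optional


def ref_path_from_head(head_text: str) -> Optional[str]:
    """Return the .git-relative path of the branch file named in HEAD, if any."""
    content = head_text.strip()
    prefix = "ref: "
    if content.startswith(prefix):
        return ".git/" + content[len(prefix):].strip()
    # Detached HEAD: the file holds a commit hash directly, not a branch reference.
    return None


print(ref_path_from_head("ref: refs/heads/master\n"))
```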

The .git/refs/heads/master file contains the hash of the branch’s latest commit object. Examining that object shows that it is a commit and that it is associated with a tree object, 0a72e6850ef963c6aeee4121d38cf9de773865d8.
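Pulling the tree hash out of a decompressed commit body can be sketched like this. The function name commit_links is hypothetical, and the sample commit text (author name, email, timestamp, message) is made up for illustration; only the tree hash comes from the example above.

```python
def commit_links(commit_body: str) -> tuple[str, list[str]]:
    """Return (tree_hash, parent_hashes) from the text of a commit object."""
    tree = None
    parents = []
    for line in commit_body.splitlines():
        if line == "":
            break  # a blank line ends the headers; the commit message follows
        key, _, value = line.partition(" ")
        if key == "tree":
            tree = value
        elif key == "parent":
            parents.append(value)
    if tree is None:
        raise ValueError("no tree line found in commit")
    return tree, parents


sample_commit = (
    "tree 0a72e6850ef963c6aeee4121d38cf9de773865d8\n"
    "author Example Dev <dev@example.com> 1577232000 +0000\n"
    "committer Example Dev <dev@example.com> 1577232000 +0000\n"
    "\n"
    "Initial commit\n"
)
print(commit_links(sample_commit))
```

The parent hashes are returned as well because following them is one way to recover older versions of the code.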

Now examine the tree object stored at 0a72e6850ef963c6aeee4121d38cf9de773865d8. Its entries list the files and subdirectories of the commit together with their object hashes.

Bingo! You discover some source code files and additional tree objects to explore.
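Unlike commits, tree objects are stored in a binary format: each entry is a mode, a space, a file name, a NUL byte, and then the 20 raw bytes of the entry’s SHA1 hash. The sketch below parses a decompressed tree body under that assumption. The function name parse_tree and the file names in the sample are hypothetical.

```python
def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    """Return (mode, name, sha1_hex) for each entry in a tree object's body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8", errors="replace")
        sha1_hex = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, sha1_hex))
        pos = nul + 21
    return entries


# Sample tree built in memory: one file (mode 100644) and one subdirectory (mode 40000).
sample_tree = (
    b"100644 index.php\x00" + bytes.fromhex("0a082f2656a655c8b0a87956c7bcdc93dfda23f8")
    + b"40000 src\x00" + bytes.fromhex("0a72e6850ef963c6aeee4121d38cf9de773865d8")
)
for entry in parse_tree(sample_tree):
    print(entry)
```

Entries with mode 40000 are subdirectories, so their hashes point to further tree objects to download.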

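On a remote server, you would instead request each of these files over HTTP: first project.com/.git/HEAD, then project.com/.git/refs/heads/master, then object files such as project.com/.git/objects/0a/72e6850ef963c6aeee4121d38cf9de773865d8.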
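As a rough sketch of the remote case, the helper below builds the URL of an object file and downloads its raw (still compressed) bytes. The function names object_url and download_object are hypothetical, and https://project.com stands in for a target you are authorized to test.

```python
import urllib.request


def object_url(base_url: str, sha1_hex: str) -> str:
    """Build the URL of a loose object under an exposed .git directory."""
    return f"{base_url.rstrip('/')}/.git/objects/{sha1_hex[:2]}/{sha1_hex[2:]}"


def download_object(base_url: str, sha1_hex: str) -> bytes:
    """Fetch the compressed object file (needs network access)."""
    with urllib.request.urlopen(object_url(base_url, sha1_hex), timeout=10) as response:
        return response.read()


print(object_url("https://project.com", "0a72e6850ef963c6aeee4121d38cf9de773865d8"))
```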

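On a remote server like this, you will need to decompress the downloaded object file before you read it, because Git stores these object files zlib-compressed. This can be done with a short script, for example using Ruby’s Zlib library.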
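The same decompression step can also be sketched in Python, which is the choice made for this example rather than anything the workflow requires. The sketch additionally checks that the download is intact, relying on the fact that in a SHA1-based repository an object’s hash is the SHA1 of its decompressed contents (header included). The function name inflate_and_verify is hypothetical.

```python
import hashlib
import zlib


def inflate_and_verify(compressed: bytes, expected_sha1: str) -> bytes:
    """Decompress a downloaded object file and check it against its hash."""
    raw = zlib.decompress(compressed)
    actual = hashlib.sha1(raw).hexdigest()
    if actual != expected_sha1.lower():
        raise ValueError(f"hash mismatch: expected {expected_sha1}, got {actual}")
    return raw


# Example using an object built in memory instead of a real download.
content = b"blob 6\x00hello\n"
object_hash = hashlib.sha1(content).hexdigest()
print(inflate_and_verify(zlib.compress(content), object_hash))
```

A mismatch usually means the server returned something other than the object, such as an HTML error page.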

Finding useful information

After recovering the project’s source code, you can grep for hardcoded credentials, encryption keys and developer comments for quick wins. You should also look for new and deprecated endpoints and record them for further analysis.
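A first pass over the recovered files can be automated with a few regular expressions. The patterns below are illustrative assumptions, not an exhaustive or authoritative list, and they will produce both false positives and misses. The names PATTERNS and scan_text are hypothetical.

```python
import re

# Illustrative patterns only; tune them for the codebase you are reviewing.
PATTERNS = {
    "password assignment": re.compile(
        r"(?i)(password|passwd|pwd)\w*\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
    "key or token assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\w*\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "developer note": re.compile(r"\b(TODO|FIXME|HACK)\b"),
}


def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, line) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings


sample_source = 'db_password = "hunter2"\n# TODO remove debug endpoint\nprint("ok")\n'
for finding in scan_text(sample_source):
    print(finding)
```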

If you have time, you can simply browse through the entire recovered codebase to find potential vulnerabilities.

Thanks for reading. And remember: trying this on systems where you don’t have permission to test is illegal. If you’ve found a vulnerability, please disclose it responsibly to the vendor. Help make our Internet a safer place.
