Structure of Git Objects

Git is, at its core, a content-addressable filesystem, and its data is stored in .git/objects. In this post, I will explain the structure of Git objects by looking at an actual repository.

Types of Objects

First of all, prepare a repository to use as an example.

$ git init
Initialized empty Git repository in /tmp/blog_test/.git/
$ ls -lah .git/objects/
total 0
drwxr-xr-x 4 xxx 2028347278 128B 8 29 12:54 .
drwxr-xr-x 9 xxx 2028347278 288B 8 29 13:07 ..
drwxr-xr-x 2 xxx 2028347278 64B 8 29 12:54 info
drwxr-xr-x 2 xxx 2028347278 64B 8 29 12:54 pack
# Commit something
$ echo "Hellow World" > sample.txt
$ git add .
$ git commit -m "Initial commit"
[master (root-commit) ae269c7] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 sample.txt
$ ls .git/objects/
4e 4f ae info pack
# Check files
$ find .git/objects -type f
.git/objects/ae/269c7ccc5ce608b60038ef6923e82fb167d774
.git/objects/4e/bf5763311653c68f44db6b66c300192a7e11c4
.git/objects/4f/52b57b2a3a96457d18049ea34c6085de0e09a4

Three directories were added, each containing one file. Each object is identified by a 40-digit hexadecimal SHA-1 hash computed from a short header plus the object's contents. The subdirectory is named after the first 2 characters of the hash, and the filename is the remaining 38 characters.
You can check the object type with git cat-file -t.

$ for i in ae26 4ebf 4f52; do git cat-file -t $i; done
commit
tree
blob

So this repository contains three types of Git objects: commit, tree, and blob. The following sections explain the contents of each type.
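As a rough illustration of the layout described above, here is a small Python sketch that maps an object ID to its path and decodes a stored object. It assumes the object is stored "loose" (not inside a packfile) as zlib-compressed bytes of the form "<type> <size>\0<content>". The function names are made up for this post and are not part of Git.

```python
import zlib


def object_path(object_id, git_dir=".git"):
    """Loose-object path: first 2 hex chars are the directory, the other 38 the filename."""
    return f"{git_dir}/objects/{object_id[:2]}/{object_id[2:]}"


def decode_loose_object(compressed):
    """Split a zlib-compressed loose object into (type, size, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")   # header ends at the first NUL byte
    obj_type, size = header.decode("ascii").split(" ")
    return obj_type, int(size), content


if __name__ == "__main__":
    # Assumption: these are the bytes Git would compress for the blob in this post.
    stored = zlib.compress(b"blob 13\0Hellow World\n")
    print(object_path("4f52b57b2a3a96457d18049ea34c6085de0e09a4"))
    print(decode_loose_object(stored))
```

To try it on a real repository, read the file at object_path(...) in binary mode and pass its bytes to decode_loose_object.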

Blob Objects

You can print an object's content with git cat-file -p.

$ git cat-file -p 4f52
Hellow World

Blob is an abbreviation of “Binary Large Object”. A blob holds just the content of a file; attributes such as the filename and permissions are not included.
For example, when you update one file and add another, the blob objects change as follows.

$ echo "Update a sent" > sample.txt
$ echo "Hello World2" > sample2.txt
$ git add .
$ git commit -m "Second commit"
$ find .git/objects -type f
.git/objects/b2/b6f00d3432b3a12bc47e2ae31ee679f2baae92 ※new blob
.git/objects/ae/269c7ccc5ce608b60038ef6923e82fb167d774 ※commit
.git/objects/4e/bf5763311653c68f44db6b66c300192a7e11c4 ※tree
.git/objects/d2/fc8330756fc2dd131ff428a48e5a402d515cfe ※new tree
.git/objects/f8/6effb19a7ee6cea51166c3a1438ba313794fc8 ※new blob
.git/objects/4f/52b57b2a3a96457d18049ea34c6085de0e09a4 ※blob
.git/objects/7f/f510652d06bb1268da28c08e28b41f42c4b0fe ※new commit
# Check blob objects
$ git cat-file -p 4f52
Hellow World
$ git cat-file -p f86e
Update a sent
$ git cat-file -p b2b6
Hello World2

Two blob objects have been added (notice that the blob object 4f52 has not been updated or deleted). In other words, a blob object is a snapshot of a file's content at one point in time.
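The snapshot behavior can be imitated with a toy content-addressed store. This is only a sketch: ObjectStore and blob_id are hypothetical names, a dictionary stands in for .git/objects, and the ID is computed as the SHA-1 of a "blob <size>" header, a NUL byte, and the content.

```python
import hashlib


def blob_id(content):
    """SHA-1 over the header 'blob <size>', a NUL byte, and the content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class ObjectStore:
    """Toy stand-in for .git/objects: maps object ID to content, never overwrites."""

    def __init__(self):
        self.objects = {}

    def add(self, content):
        oid = blob_id(content)
        self.objects.setdefault(oid, content)   # existing objects are left untouched
        return oid


store = ObjectStore()
first = store.add(b"Hellow World\n")      # sample.txt in the first commit
updated = store.add(b"Update a sent\n")   # sample.txt in the second commit
other = store.add(b"Hello World2\n")      # sample2.txt in the second commit
again = store.add(b"Hellow World\n")      # same content again, under any filename

print(len(store.objects))   # 3: the old blob is kept, the duplicate adds nothing
print(first == again)       # True: the ID depends only on the content
```

Because the filename is not part of the blob, two files with identical content share one object.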

Commit Objects

$ git cat-file -p ae26
tree 4ebf5763311653c68f44db6b66c300192a7e11c4
author Tetsuro Ohyama <tetsuro.ohyama@grvo.net> 1535598879 +0700
committer Tetsuro Ohyama <tetsuro.ohyama@grvo.net> 1535598879 +0700

Initial commit
$ git cat-file -p 7ff5
tree d2fc8330756fc2dd131ff428a48e5a402d515cfe
parent ae269c7ccc5ce608b60038ef6923e82fb167d774
author Tetsuro Ohyama <tetsuro.ohyama@gree.net> 1535603579 +0700
committer Tetsuro Ohyama <tetsuro.ohyama@gree.net> 1535603579 +0700

Second commit

A commit object contains a reference to a tree object, the author, the committer, and the commit message. Every commit except the initial one also has a reference to its parent commit (not to the parent's tree). A commit can have multiple parents, for example when it is a merge commit.
A commit object is metadata only; unlike a blob, it holds no file content.
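To make the fields concrete, here is a minimal Python sketch that parses the text printed by git cat-file -p for a commit. parse_commit is a hypothetical helper written for this post; it only handles the headers shown above and ignores others (for example signatures).

```python
def parse_commit(text):
    """Parse `git cat-file -p <commit>` output into a dict."""
    # Headers and message are separated by the first blank line.
    headers, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)   # a merge commit has several parent lines
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


sample = (
    "tree d2fc8330756fc2dd131ff428a48e5a402d515cfe\n"
    "parent ae269c7ccc5ce608b60038ef6923e82fb167d774\n"
    "author Tetsuro Ohyama <tetsuro.ohyama@gree.net> 1535603579 +0700\n"
    "committer Tetsuro Ohyama <tetsuro.ohyama@gree.net> 1535603579 +0700\n"
    "\n"
    "Second commit\n"
)
commit = parse_commit(sample)
print(commit["tree"])
print(commit["parents"])
print(commit["message"].strip())
```

For the initial commit the parents list would simply be empty.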

Tree Objects

$ git cat-file -p 4ebf
100644 blob 4f52b57b2a3a96457d18049ea34c6085de0e09a4 sample.txt
$ git cat-file -p d2fc
100644 blob f86effb19a7ee6cea51166c3a1438ba313794fc8 sample.txt
100644 blob b2b6f00d3432b3a12bc47e2ae31ee679f2baae92 sample2.txt

A tree object contains one line per file or subdirectory. Each line has the file mode (permissions), the object type (blob or tree), the object hash, and the filename.
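The four fields can be pulled apart with a few lines of Python. This sketch parses the human-readable listing from git cat-file -p, not the binary tree format; TreeEntry and parse_tree_listing are names I made up, and filenames that Git would print in quoted form are not handled.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str        # e.g. "100644" for a regular file
    kind: str        # "blob" or "tree"
    object_id: str   # 40-character hex hash
    name: str        # file or directory name


def parse_tree_listing(text):
    """Parse the lines printed by `git cat-file -p <tree>`."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Three whitespace-separated fields, then the name (which may contain spaces).
        mode, kind, object_id, name = line.split(None, 3)
        entries.append(TreeEntry(mode, kind, object_id, name))
    return entries


listing = (
    "100644 blob f86effb19a7ee6cea51166c3a1438ba313794fc8 sample.txt\n"
    "100644 blob b2b6f00d3432b3a12bc47e2ae31ee679f2baae92 sample2.txt\n"
)
for entry in parse_tree_listing(listing):
    print(entry.name, "->", entry.kind, entry.object_id[:4])
```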

How to Calculate an Object Hash

If you are interested, you can calculate the hash yourself with the following formulas.

Commit Objects

sha1("commit <size of the commit object (bytes)>\0<contents of commit object>")
# e.g.,
$ (printf "commit %s\0" $(git cat-file -p ae26 | wc -c); git cat-file -p ae26) | openssl sha1
ae269c7ccc5ce608b60038ef6923e82fb167d774

Blob Objects

The calculation is the same as for a commit object, except that the type in the header is blob.

sha1("blob <size of the blob object (bytes)>\0<contents of blob object>")
# e.g.,
$ (printf "blob %s\0" $(git cat-file -p f86e | wc -c); git cat-file -p f86e) | openssl sha1
f86effb19a7ee6cea51166c3a1438ba313794fc8
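The same formula can be written without Git or openssl. Below is a minimal Python sketch; git_object_id is a hypothetical helper name, and the content must be the exact bytes of the object (including the trailing newline that echo adds).

```python
import hashlib


def git_object_id(obj_type, content):
    """SHA-1 of '<type> <size>', a NUL byte, then the content; size is in bytes."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 Git repository:
print(git_object_id("blob", b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Content of sample.txt after the second commit in this post.
print(git_object_id("blob", b"Update a sent\n"))
```

You can compare the second line of output with the f86e... ID shown above. The same function works for commits if you pass "commit" and the raw commit text.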

Tree Objects

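The hash of a tree object uses the same "tree <size>\0<contents>" scheme, but the contents are binary: each entry is stored as the mode, a space, the filename, a NUL byte, and then the raw 20-byte SHA-1 of the referenced blob or tree (not the 40-character hex string that git cat-file -p prints). For that reason you cannot simply pipe the output of git cat-file -p into openssl sha1.
See: https://stackoverflow.com/questions/14790681/what-is-the-internal-format-of-a-git-tree-object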
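For the curious, here is a hedged Python sketch of that binary encoding, based on the format described in the linked answer. encode_tree and tree_id are hypothetical names, and the sort is a simplification: real Git orders entries by name while treating directory names as if they ended with "/", which this sketch does not do.

```python
import hashlib


def encode_tree(entries):
    """Serialize (mode, name, hex_object_id) tuples into raw tree content."""
    body = b""
    # Assumption: plain byte-wise sort by name (fine for files only).
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)   # raw 20 bytes, not 40 hex characters
    return body


def tree_id(entries):
    """SHA-1 of 'tree <size>', a NUL byte, then the binary entries."""
    body = encode_tree(entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


# The empty tree has a well-known ID:
print(tree_id([]))   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# The single entry of the first commit's tree in this post.
print(tree_id([("100644", "sample.txt", "4f52b57b2a3a96457d18049ea34c6085de0e09a4")]))
```

You can compare the second line of output with the 4ebf... tree ID shown earlier. Note that subdirectory entries use the mode 40000 in the binary form, even though git cat-file -p prints it as 040000.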
