Understanding Git and GitHub
Part 1
You will most likely be reading this if you are a programmer, or if you work on medium to large projects. Or, well, if you are as enthusiastic about gaining knowledge as I am. You may also be interested in this if you will be working remotely with other users, especially if you will be creating Open Source software.
For the past couple of months, I have been seeing the word ‘Git’ thrown around the internet in forums, blogs and videos. At one point, I even got a free download of the book ‘Getting Good with Git’ on Nettuts. I registered on the social coding website GitHub about three months ago, tried for hours to understand it, then dropped it. I am quite familiar with CVS and Subversion because I have hosted a few plugins on WordPress (you will need to know a bit of those to host your plugins on WordPress’ servers). I just checked the website and the link for a free copy of Getting Good with Git is no longer there. If you want a copy, just leave a comment and I will email it to you.
Through a bit of research and a bit of trial and error, I finally understand how to go about Git and GitHub. I will be going through some of my own personal experiences, but permit me to lift a few things from an article I read in a Harvard.edu document by Charles Duan.
Why Git?
The purpose of Git is to manage a project, or a set of files, as they change over time. Git stores this information in a data structure called a repository.
A Git repository contains, among other things, the following:
- A set of commit objects.
- A set of references to commit objects, called heads.
The Git repository is stored in the same directory as the project itself, in a subdirectory called .git. Note differences from central-repository systems like CVS or Subversion:
- There is only one .git directory, in the root directory of the project.
- The repository is stored in files alongside the project. There is no central server repository.
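Because there is only one .git directory, at the root of the project, a tool can work out which repository a file belongs to by walking up the directory tree until it finds one. Here is a minimal Python sketch of that idea. The function name find_repo_root is my own, and the sketch assumes an ordinary repository where .git is a directory; it is not how Git itself is implemented.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest directory at or above `start` that contains a
    .git subdirectory, or None if there is no such directory."""
    current = Path(start).resolve()
    # Check the starting directory first, then each of its ancestors.
    for directory in [current, *current.parents]:
        if (directory / ".git").is_dir():
            return directory
    return None


# Usage: look for the repository that contains the current directory.
print(find_repo_root("."))
```

Run from anywhere inside a project, this prints the project root if the project is a Git repository, and None otherwise.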
What is a Commit?
A commit object contains three things:
- A set of files, reflecting the state of a project at a given point in time.
- References to parent commit objects.
- A SHA-1 name, a 40-character string that uniquely identifies the commit object. The name is composed of a hash of relevant aspects of the commit, so identical commits will always have the same name.
The parent commit objects are those commits that were edited to produce the subsequent state of the project. Generally a commit object will have one parent commit, because one generally takes a project in a given state, makes a few changes, and saves the new state of the project. The upcoming article on merging will explain how a commit object could have two or more parents.
A project always has one commit object with no parents. This is the first commit made to the project repository.
Based on the above, you can visualize a repository as a directed acyclic graph of commit objects, with pointers to parent commits always pointing backwards in time, ultimately to the first commit. Starting from any commit, you can walk along the graph by parent commits to see the history of changes that led to that commit.
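To make the commit graph concrete, here is a toy model in Python. It is only a sketch: the names Commit, make_commit and history are my own, and real Git hashes more fields (author, date, message) in its own object format. What it does show is a commit as a snapshot plus parent references, named by a 40-character SHA-1 hash, and how history is recovered by following parents.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    files: tuple    # sorted (path, content) pairs: the project snapshot
    parents: tuple  # names (hex digests) of the parent commits

    @property
    def name(self):
        # Hash the snapshot together with the parent names.
        # Assumption: this payload format is invented for illustration.
        payload = repr((self.files, self.parents)).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


def make_commit(store, files, parents=()):
    """Create a commit, save it in `store` under its name, return the name."""
    commit = Commit(tuple(sorted(files.items())), tuple(parents))
    store[commit.name] = commit
    return commit.name


def history(store, start):
    """Follow parent links from `start` back to the first commit."""
    seen, stack, order = set(), [start], []
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        stack.extend(store[name].parents)
    return order


store = {}
first = make_commit(store, {"README": "hello"})  # no parents
second = make_commit(store, {"README": "hello, world"}, parents=[first])
print(history(store, second))  # names of `second`, then `first`
```

Because the name is derived from the content, building the same snapshot with the same parents a second time gives back the same name.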
The idea behind Git is that version control is all about manipulating this graph of commits. Whenever you want to perform some operation to query or manipulate the repository, you should be thinking, “how do I want to query or manipulate the graph of commits?”
What is a Head?
A head is simply a reference to a commit object. Each head has a name. By default, there is a head in every repository called master (newer setups, including repositories created on GitHub, may name it main instead). A repository can contain any number of heads. At any given time, one head is selected as the “current head.” This head is aliased to HEAD, always in capitals.
Note this difference: a “head” (lowercase) refers to any one of the named heads in the repository; “HEAD” (uppercase) refers exclusively to the currently active head. This distinction is used frequently in Git documentation. I also use the convention that names of heads, including HEAD, are set in italics.
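The difference between a head and HEAD can be sketched with another toy model in Python. The Repo class and the commit names c1 and c2 are made up for illustration, and this is not Git's real storage format. Heads are a mapping from a name to a commit, and HEAD is whichever head is currently selected.

```python
class Repo:
    def __init__(self):
        self.heads = {}          # head name -> commit name
        self.current = "master"  # the head that HEAD is an alias for
        self.parents = {}        # commit name -> parent commit name

    @property
    def HEAD(self):
        # HEAD resolves to whatever commit the current head points at.
        return self.heads.get(self.current)

    def commit(self, name):
        # The new commit's parent is the commit HEAD pointed at before,
        # and afterwards the current head points to the new commit.
        self.parents[name] = self.HEAD
        self.heads[self.current] = name


repo = Repo()
repo.commit("c1")  # hypothetical commit names
repo.commit("c2")

repo.heads["draft"] = "c1"  # a second head, pointing at an older commit
repo.current = "draft"      # select it as the current head
print(repo.HEAD)            # the commit the "draft" head points at
```

Selecting a different head changes what HEAD means, but it does not move any of the other heads.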
Creating a repository
To create a repository, create a directory for the project if it doesn’t exist, enter it, and run the command git init. The directory does not need to be empty. (For the benefit of this write-up, I will be using my own illustrations. Download the GitHub Bash client from github.com. That’s where you can control Git. When you download it, you also get a GUI. In my opinion, it’s better to get familiar with the shell.)
This will create a .git directory in the [project] directory.
To create a commit, you need to do two things:
- Tell Git which files to include in the commit, with git add. If a file has not changed since the previous commit (the “parent” commit), Git will automatically include it in the commit you are about to perform. Thus, you only need to add files that you have added or modified. Note that it adds directories recursively, so git add . will add everything that has changed.
- Call git commit to create the commit object. The new commit object will have the current HEAD as its parent (and then, after the commit is complete, HEAD will point to the new commit object).
As a shortcut, git commit -a will automatically add all modified files (but not new ones).
Note that if you modify a file but do not add it, then Git will include the previous version (before modifications) in the commit. The modified file will remain in place in your working directory.
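The steps above can be tried out with a short Python script that drives the real git command. This is a sketch under a few assumptions: git is installed and on your PATH, the file names are made up, and the user name and email are placeholders passed with -c so the script does not depend on your Git configuration. It creates a repository, makes a first commit, then modifies one file without adding it while committing a new file.

```python
import pathlib
import subprocess
import tempfile

# Placeholder identity, so commits work without any global Git config.
IDENTITY = ["-c", "user.name=Example User", "-c", "user.email=user@example.com"]


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *IDENTITY, *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        git(repo, "init")                        # creates the .git directory
        has_git_dir = (repo / ".git").is_dir()

        (repo / "notes.txt").write_text("first draft\n")
        git(repo, "add", ".")                    # stage everything, recursively
        git(repo, "commit", "-m", "First commit")

        # Modify notes.txt but do NOT add it; add a brand-new file instead.
        (repo / "notes.txt").write_text("second draft\n")
        (repo / "todo.txt").write_text("write part 2\n")
        git(repo, "add", "todo.txt")
        git(repo, "commit", "-m", "Add todo list")

        committed = git(repo, "show", "HEAD:notes.txt")  # version in the new commit
        on_disk = (repo / "notes.txt").read_text()       # version in the working directory
        return has_git_dir, committed, on_disk
```

Calling demo() returns the committed and on-disk contents of notes.txt side by side. The second commit still carries the old "first draft" text, because the modified file was never added, while the copy in the working directory keeps the new text.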
In the next articles on Git, I will be talking about:
- Branching
- Merging
- Collaborating