Git Basics 1

Git is a distributed version control system. If a document is under version control, this simply means that a record is kept of all changes made to it — enabling the user to return to a previous state of the document (or entire program) with ease.

When you work in software teams (and many other types of teams besides; we also used version control systems in my previous career as a lawyer), you need an organised way of tracking what changes are being made to a document, particularly as several people may be working on the same document at once. A good VCS will enable you to resolve conflicts between those changes smoothly. You may also wish to revert to a previous version of a document and discard attempted changes that did not lead to a useful result, which means you are free to try out a possible solution knowing that you can easily return to a previous state if it does not work out. Further, a clear history of changes to a program is often instrumental in finding the cause of an unexpected bug.

Git is by far the most popular VCS in the software world. It is free and open source, and was created by Linus Torvalds, who had earlier created the Linux operating system kernel. It is a distributed system: the ‘final’ version of the project, including the record of all historical changes, is not held in only one place (a central repository) from which developers check out one document, or one document’s history of changes, at a time. Instead, every copy of the repository (or ‘repo’) contains the entire project’s history. Cloning a repo downloads all of that history, and each later fetch or pull downloads any new changes made since. This means that if one system containing the project is lost for any reason, every developer who cloned it still has the full history as of their last pull. It also lets a developer work on any part of the project with or without an internet connection, committing changes to their local repo whenever they like.
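A minimal sketch of what ‘distributed’ means in practice, using a local folder as a stand-in for another developer’s repo. The folder names, file name and commit messages are illustrative assumptions, and the git config lines only set an identity for these throwaway repos so the commits succeed on a machine without one. After cloning, the original is deleted outright, yet the clone still holds the complete history and can keep committing with no connection to anything.

```shell
# Throwaway workspace for the sketch.
work=$(mktemp -d)
cd "$work"

# Stand-in for someone else's repo, given three commits of history.
git init -q original
cd original
git config user.name "Example Dev"        # identity for this sketch only
git config user.email "dev@example.com"
for n in 1 2 3; do
  echo "version $n" > notes.txt
  git add notes.txt
  git commit -q -m "Update notes to version $n"
done
cd ..

# Cloning copies the whole history, not just the latest files.
git clone -q original my-copy

# Simulate losing the original system completely.
rm -rf original

cd my-copy
git log --oneline            # all three commits are still listed

# Committing is purely local: no network or other repo is involved.
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "offline edit" >> notes.txt
git commit -q -am "Add an offline edit"
git rev-list --count HEAD    # 4
```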

Although it is not required, a team of developers will often have the equivalent of a ‘central repo’ in the form of a remote repo, usually hosted on a service such as GitHub (which is not the same thing as Git itself!). Even though every developer’s clone holds the full history, we often like to think of the GitHub repo as hosting the ‘ultimate’ version of the project. Developers only push their changes there when they are happy with the record of changes they have made (amending previous snapshots if required, and so on).

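Much of Git’s popularity comes from its features. It is smart about the history of the file tree: for example, it works out when a file has been renamed or moved by comparing content, without needing to be told, and tools such as git blame can follow lines of code that have been moved or copied between files. It also identifies every piece of content by its SHA-1 hash (a checksum calculated from the content itself), so any corruption of or tampering with stored content, such as secret alteration by an attacker, changes the hash and can be detected. (SHA-1 is no longer considered cryptographically strong, and newer versions of Git can also create repositories that use SHA-256.)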
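To see content hashing at work, git hash-object prints the hash Git would give a piece of content without storing anything. This sketch assumes a default Git setup that uses SHA-1; the hash in the comment is for exactly this input, including the trailing newline that echo adds.

```shell
# Git names content by a hash of that content, so the same bytes
# always produce the same name, on any machine.
echo 'test content' | git hash-object --stdin
# d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Changing a single character gives a completely different hash,
# which is how altered or corrupted content is detected.
echo 'test content!' | git hash-object --stdin
```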

I have already installed Git and will not be explaining the basic setup here.

Basic Git Flow

Initialising a Repository

If you are beginning a brand new project, go to the root folder of the project and enter:

git init

This creates a new .git subdirectory, which essentially means that you have placed the project under Git version control locally.
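A quick way to confirm what git init did. The folder name my-project is only an example; in practice you would run git init in your real project’s root folder.

```shell
# Example project folder (any existing folder works the same way).
mkdir -p my-project
cd my-project
git init

# Everything Git records lives in this hidden subdirectory.
ls -a .git    # includes HEAD, config, objects and refs, among others

# Git now treats this folder as a working tree under version control.
git rev-parse --is-inside-work-tree    # true
git status                             # reports that there are no commits yet
```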

As a working developer, however, you will often be joining a project that is already underway, with a remote repo saved on GitHub. To clone such a repo, run the following command from the directory that should contain the project. Git creates a new subdirectory named after the remote repo (you can pass a different folder name as a second argument) and downloads the project into it:

git clone <repo url>

The files will magically appear!
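A sketch of cloning that runs without a network. A local repo with one commit stands in for the GitHub repo (its path, file and commit message are hypothetical); with a real project you would paste the URL GitHub shows you instead. It also shows where the new folder ends up and that the clone is already connected to its source.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the GitHub repo: a local repo with one commit.
mkdir -p remote
git init -q remote/shop
git -C remote/shop config user.name "Example Dev"      # identity for this sketch only
git -C remote/shop config user.email "dev@example.com"
echo "# Shop" > remote/shop/README.md
git -C remote/shop add README.md
git -C remote/shop commit -q -m "Add README"

# Run clone from the folder that should contain the project.
git clone -q remote/shop            # creates ./shop, named after the repo
git clone -q remote/shop my-shop    # or choose the folder name yourself

ls shop                             # README.md
git -C shop remote -v               # 'origin' already points at the source
```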

Now add your first code or code changes: for example, write a failing test, then write the source code to make it pass. At this point, you may wish to record these changes in your project’s history.

gitignore

If you are setting up a project for the first time, you’ll want to add a .gitignore file before your first commit. This is simply a list of file name patterns that Git should ignore, so that matching files are never added to its history of file changes. You don’t need those swap files and .iml files and so on saved to the version history! They are unnecessary and a potential source of conflict with other developers’ equivalent files.

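The simplest way to create a .gitignore file for your project is to go to gitignore.io and generate a standard template based on your language and the other software you are using, such as your IDE. Copy the template and save it as ‘.gitignore’ in your root directory. From then on, Git will not ‘see’ matching files: they will not be listed as untracked by git status, and they will not be picked up when you stage a whole folder with git add (they will still exist in your actual files, of course). Note that .gitignore only affects untracked files; a file that has already been committed stays under version control until you remove it from the index with git rm --cached.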

You can then run git status, which shows the state of the working directory and the staging area, including which files are untracked and which tracked files have been modified. If it lists any other files that you do not want under version control, add patterns for them to .gitignore. In the project I have just begun, for example, I added ‘*.iml’ to ensure that all .iml files would be ignored.
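A minimal sketch of .gitignore in action, written by hand rather than generated. The file names and the two patterns are illustrative assumptions: *.iml matches IntelliJ module files and *.swp matches Vim swap files. git check-ignore -v reports which line of which ignore file matched a path, which is handy when a file is unexpectedly ignored (or not).

```shell
# Throwaway repo for the sketch.
workdir=$(mktemp -d)
cd "$workdir"
git init -q

# One file worth tracking, plus two that should never be committed.
echo 'class App {}' > App.java
touch project.iml .App.java.swp

# One pattern per line; '*' matches any run of characters in a name.
printf '%s\n' '*.iml' '*.swp' > .gitignore

git status --short                 # lists only .gitignore and App.java
git check-ignore -v project.iml    # names the .gitignore line that matched
```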

Saving changes

A Git commit is equivalent to a traditional ‘save’ in normal document modification.

There are three ‘areas’ in the Git workflow. You edit files in the working directory. When you’re ready to save a file, you ‘stage’ its changes using git add, which records the file’s current contents in the staging area. When you’ve finished updating all the files that belong in the current save, you commit the contents of the staging area with git commit.
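The three areas can be watched one step at a time with git status --short, whose two-letter codes show whether a file is untracked (??) or staged as a new file (A). The file name and message are hypothetical, and the git config lines only give this throwaway repo an identity so the commit succeeds.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Dev"        # identity for this sketch only
git config user.email "dev@example.com"

# 1. Working directory: create or edit files as normal.
echo "Hello" > greeting.txt
git status --short    # ?? greeting.txt   (untracked)

# 2. Staging area: choose which changes go into the next commit.
git add greeting.txt
git status --short    # A  greeting.txt   (staged)

# 3. Project history: record the staged snapshot as a commit.
git commit -q -m "Add greeting"
git log --oneline     # one commit, "Add greeting"
```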

You can add the -m flag to supply the commit message directly on the command line; without it, Git opens your text editor so you can write one. The conventional format for a message is a brief one-line summary of the whole commit, then a blank line, then a fuller explanation. Many teams write messages in the present tense (“[This commit] Changes this/Updates that”) so that they read like actions on the repo; Git’s own contribution guidelines prefer the imperative mood (“Change this/Update that”), as if giving orders to the codebase.
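Passing -m more than once is a convenient way to get the summary-plus-body layout from the command line: Git joins the values as separate paragraphs, with a blank line between them. The file and message text below are made-up examples, and the git config lines exist only so the sketch runs on a machine with no identity set.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Dev"        # identity for this sketch only
git config user.email "dev@example.com"
echo "total = price * qty" > basket.txt
git add basket.txt

# First -m: the one-line summary. Second -m: the fuller explanation.
git commit -q -m "Add basket total calculation" \
  -m "Multiply price by quantity so the checkout can display a total."

git log -1 --format=%s    # the summary line only
git log -1 --format=%b    # the body only
```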

The staging area essentially allows you to group related changes (across documents) into highly focussed snapshots/milestones before committing them to the project history. It is a buffer zone between the working directory and the project history, in the same way that each developer’s local repo acts as a buffer before their work is pushed to the remote. If required, a developer can go back and edit local commits before pushing them, so it is often helpful to think of the remote as the centralised ‘ultimate’ history of the project, even though everyone has the full history saved to their system as of their last pull.
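A sketch of grouping changes into focused commits: two unrelated edits sit in the working directory at the same time, but staging them separately produces two self-contained snapshots. The last step uses git commit --amend --no-edit to fold a forgotten change into the most recent commit while keeping its message. File names and contents are hypothetical. Amending replaces the commit with a new one, so it is best kept to commits you have not yet pushed.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Dev"        # identity for this sketch only
git config user.email "dev@example.com"

# Two unrelated edits sitting in the working directory at once.
echo "# My project" > README.md
echo "def add(a, b): return a + b" > calc.py
echo "assert add(2, 3) == 5" > test_calc.py

# Stage and commit each group of related changes on its own.
git add README.md
git commit -q -m "Add README"
git add calc.py test_calc.py
git commit -q -m "Add add() with a test"

# Before pushing, revise the latest snapshot: fold in a forgotten
# change and keep the existing message.
echo "def double(a): return add(a, a)" >> calc.py
git add calc.py
git commit -q --amend --no-edit

git log --oneline    # still two commits; the second now includes double()
```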

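A key difference between Git and many other VC systems is that each Git commit records a snapshot of the entire project, rather than simply a list of changes to each file. Files that have not changed are not stored again: the new snapshot simply points to the copy Git already holds. This makes many Git operations fast, because a particular version of a file does not need to be rebuilt by replaying a chain of diffs; Git can look it up directly in its internal object database. (Git does compress its stored objects, sometimes as deltas against one another, but that is an internal storage optimisation rather than part of the model.)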
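A sketch that makes the snapshot model visible. The file names are hypothetical. The syntax <commit>:<path> asks git rev-parse for the ID of the stored object holding that file’s contents in a given commit, and git ls-tree lists everything in a commit’s snapshot. The unchanged file resolves to the same object in both commits.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example Dev"        # identity for this sketch only
git config user.email "dev@example.com"

echo "unchanged" > a.txt
echo "first" > b.txt
git add a.txt b.txt
git commit -q -m "First snapshot"

echo "second" > b.txt
git commit -q -am "Second snapshot"

# Both snapshots list every file, but unchanged a.txt points at the
# very same stored object; only b.txt got a new one.
git rev-parse HEAD~1:a.txt HEAD:a.txt    # two identical IDs
git rev-parse HEAD~1:b.txt HEAD:b.txt    # two different IDs
git ls-tree HEAD                         # full listing of the second snapshot
```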

When you are happy with a commit, or perhaps after you have made several commits, you can push them to a remote repo so that there is an external backup of your project. If you did not clone your repo from a remote initially, you will likely need to create a new remote repo to connect to, and the easiest place for new developers to do this is GitHub.

You will then need to establish a connection to this remote repo with the following command:

git remote add origin <repo url>

The ‘origin’ part is a nickname for the repo URL, so that from now on any reference to ‘origin’ means the repo at that URL. If you cloned your repo, this connection was set up for you automatically, under the name ‘origin’.
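A minimal sketch of connecting a repo to a remote and checking the result. To keep it runnable offline, a local bare repository (one with no working files, the kind a server typically holds) stands in for the GitHub repo; with a real project you would use the URL GitHub gives you. The paths are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the empty repo created on GitHub.
git init -q --bare remote.git

git init -q project
cd project
git remote add origin "$workdir/remote.git"

git remote -v                # lists 'origin' with its fetch and push URLs
git remote get-url origin    # the URL that 'origin' now stands for
```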

You can then ‘push’ your changes to the remote repo using:

git push origin master

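(Here ‘master’ is the name of the local branch you want to push to the remote called ‘origin’. Many newer repositories, including those created on GitHub, call their default branch ‘main’ instead, in which case you would push main.)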
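An end-to-end sketch of a first push, again using a local bare repo as a stand-in for GitHub (paths, file and message are hypothetical). git branch -M master renames the current branch to master, because a fresh repo may be given a different default name depending on your Git configuration. The -u (--set-upstream) flag is optional; it records origin/master as the branch’s upstream so that later a plain git push or git pull knows where to go.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q --bare remote.git            # stand-in for the GitHub repo

git init -q project
cd project
git config user.name "Example Dev"       # identity for this sketch only
git config user.email "dev@example.com"
echo "Hello" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M master                     # name the branch 'master' to match the text

git remote add origin "$workdir/remote.git"
git push -u origin master

# The remote now has the branch, at the same commit as the local one.
git ls-remote origin master
```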

The commands above are enough to set up your first Git local repo and connected remote repo, enabling you to keep your project under version control all within one central master branch.

The next topic to cover is using Git branches.
