Git for Beginners: From 0 to 0.001!

Piyush Deshmukh
Sep 29, 2016

Recently I started learning Git, and realized that it is one of the best tools I have ever used. Linus Torvalds wrote a marvelous, very fast version control system that beats others in the same domain for many purposes. However, for now, let’s not focus on any sort of comparison but get introduced to it.

Why should someone use a VCS at all?

Version control systems are a real helping hand when it comes to maintaining your project code. They eliminate a lot of problems in the first place, for instance, keeping different versions of your code from different points in time, or removing portions of your code from particular versions without wiping them out of existence altogether.

Consider the scenario in which you write some code on Monday, make some changes to it on Tuesday, add some more files the next day, and change those files on Thursday, only to realize at the start of the weekend that one of the 3 files you edited on Tuesday contains wrong changes that must be reverted to the previous version. What do you do now? Re-implement the stuff that you coded on Tuesday and spoil your weekend correcting your mistakes, or simply get the version of the file as it was before those changes?

Well, I would prefer the second option and enjoy my weekend. Git saved me!

Can anyone use it?

Git is a version control system designed by Linus Torvalds, and it is the second reason why I am a fan of this guy. Like the first one, Linux, it is open source, so you can start using it right now to maintain your projects: just download and install it. In this article, we shall look at the command line version, although there are many GUI clients in case you are interested.

Git is Distributed

There are mainly two types of version control systems around: centralized and distributed. A centralized version control system is one in which the whole project and its metadata are stored at a central location, i.e. the remote server, and all clients have to work against that server to make any changes to the project. In contrast, a distributed VCS stores everything not only at a central remote location but also on each client computer, basically making many full copies of the repository on different systems. The central location is only meant for sharing code and collaborating. In the case of Git, GitHub not only provides a remote server to share your code, but also many other features that help people collaborate on projects.

The thing that I love most about distributed systems is what happens when something goes wrong at a node. Say the central remote server loses all its data: with a centralized VCS you are in big trouble, but with a distributed one, the node with the most up-to-date copy can simply push the whole history back to the server. So the problem is almost resolved, with minimal loss.

So what is the least I need to get started?

Roughly speaking, Git stores your work in the form of commits. A commit is nothing but a snapshot of the entire project tree, i.e. the whole layout from the project’s root directory, at one point in time. To put a project under version control, go to its root directory and issue the init command.

git init

The moment you start making changes to your project (which includes adding, deleting, and modifying files), you are, according to Git, making changes in the working directory. Git won’t track them unless you explicitly tell it to do so. When you ask Git to track your files (we will get to the command shortly), it puts them in a special region known as the staging area. Even then, your changes are not yet stored in the Git repository (aka the .git directory). To put your files safely under versioning, they have to end up in the Git repository.

git add .

The git add command followed by a space and a period adds to the staging area all the files in the current directory and below that have changed since the last commit, including new ones. If you don’t want to stage everything, you can add files individually by name.

git add filename.ext

At any point, you can find out which files are in which state: are they untracked, modified, or placed in the staging area? Run the status command to know this.

git status

Having done this, our changes are just staged, not committed! To store these changes in our local repository, we issue a commit command.

git commit -m "my first commit"

The commit command takes the -m flag to specify the commit message, which should be a short message that describes technically what has changed in the project. So the message written above is a useless one. I repeat, it is just a placeholder; messages like this should not be written in practice, because when seen later in the history or the logs, they reveal nothing but nonsense.

There is also a -a flag for the commit command, which commits changes to already-tracked files straight from the working directory, bypassing the staging area (new, untracked files still have to be added first). I neither use it nor recommend it: it is always a good habit to see exactly what goes into your commit, so skipping the staging area is a bad habit.
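Putting the steps so far together, here is a minimal sketch of a first commit made in a throwaway repository. The temporary directory, the file name notes.txt, the identity, and the commit message are all made up for illustration; nothing here comes from a real project.

```shell
# Work in a temporary directory so nothing real is touched (hypothetical sandbox).
repo=$(mktemp -d)
cd "$repo"

git init -q
# Identity for this sandbox only; normally you would set it once with --global.
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt
git status --short              # notes.txt is reported as untracked (??)

git add notes.txt               # move the file to the staging area
git status --short              # notes.txt is now reported as added (A)

git commit -q -m "Add notes.txt with a greeting"
commit_count=$(git rev-list --count HEAD)
echo "commits so far: $commit_count"
```

Running git status between the steps is the easiest way to see a file travel from the working directory to the staging area and finally into the repository.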

Revealing the past!

Having done this plenty of times, you have stuff stored in the Git repository, and now you may be interested in looking back at what was done when. Run the git log command to see the commits so far with their hash values and commit messages.

git log

Git also provides another way to visualize this, which is my personal favorite: passing the --graph and --oneline flags to git log gives a compact, one-line-per-commit view of the history, drawn as a graph all the way back to the first commit.

Now, if we want to explore what the repository looked like at the moment one of those commits was made, we can do

git checkout <hashvalue> 

This enables us to browse the project at that particular time instant when that commit was made.
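To make the time travel concrete, here is a small sketch that records two versions of a file, looks at the log, visits the first commit by its hash, and then comes back. The file name and contents are invented for the example, and "git checkout -" (return to the previously checked-out branch) is a shortcut this article has not covered.

```shell
repo=$(mktemp -d)                # hypothetical sandbox repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"
first=$(git rev-parse HEAD)      # remember the hash of the first commit

echo "version 2" > notes.txt
git add notes.txt
git commit -q -m "Update notes.txt"

git log --graph --oneline        # compact history, newest commit on top

git checkout -q "$first"         # browse the project as of the first commit
old_content=$(cat notes.txt)
git checkout -q -                # go back to the branch we came from
new_content=$(cat notes.txt)
echo "then: $old_content / now: $new_content"
```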

Undoing Stuff!

One of the questions that may arise in your mind is: what if I entered a command wrongly, can I undo it? Fortunately, the answer is yes. In this case, you may be looking for either of two things:

  • Removing staged files
  • Undoing commits

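What if you mistakenly ran the ‘git add’ command on a file that you did not plan to include in the next commit? One of the simplest approaches is to unstage the file by resetting its entry in the staging area to the last commit. Your edits in the working directory are left untouched.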

git reset HEAD <filename>

If you made a wrong commit, maybe with a wrong commit message or with wrong content, you would again use the reset command

git reset --hard HEAD~1

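This undoes the last commit and makes the current branch (and with it HEAD) point to the commit preceding it. Be careful: --hard also throws away the changes from that commit and any uncommitted changes in the working directory. There exists another option that you may be interested in.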
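Both kinds of undo can be tried safely in a scratch repository. The sketch below is only an illustration under made-up names (app.txt, draft.txt): it first unstages a file that was added by mistake, then discards a regretted commit.

```shell
repo=$(mktemp -d)                # hypothetical sandbox repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "good" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

# 1. Unstage a file that was added by mistake.
echo "scratch" > draft.txt
git add draft.txt
git reset -q HEAD draft.txt      # leaves the staging area, stays on disk
staged=$(git diff --cached --name-only)

# 2. Throw away the last commit entirely.
echo "bad" > app.txt
git add app.txt
git commit -q -m "A commit we regret"
git reset -q --hard HEAD~1       # branch moves back one commit, app.txt is restored
content=$(cat app.txt)

echo "staged files: '$staged' / app.txt contains: $content"
```

Note that the untracked draft.txt survives the hard reset, because reset only touches files that Git tracks.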

Interacting with the remote repository

All that we saw till now happened on your own system, i.e. in the local repository. What about sharing your local repository with a remote computer? You can use services like GitHub or Bitbucket as the central remote repository from where anyone can access your project. The only thing to worry about is that the central repository and all the local repositories should be kept in sync. This is achieved by two commands, one to upload stuff to the remote repository and another to download stuff from it. ‘origin’ is the default name (alias) given to the URL of the remote repository.

git push origin master

The push command uploads the commits on your local master branch to the master branch of the remote named ‘origin’, so that the remote master ends up matching your local one. There exists an analogous command to update your local branch from the remote branch.

git pull origin master

It should be noted that the pull command itself runs ‘git fetch’ followed by ‘git merge’ to download and apply the commits, bringing your local branch in sync with the remote one.
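You can rehearse push and pull without any network. In the sketch below, a bare repository on disk stands in for GitHub, and two local repositories play two collaborators. All names (server.git, alice, bob, readme.txt) are hypothetical. The second update is done with fetch and merge by hand, to show what pull does for you.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# A bare repository on disk plays the role of the remote server (assumption).
git init -q --bare server.git

# Alice creates a commit on master and pushes it.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$sandbox/server.git"
echo "v1" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
git push -q origin master

# Bob downloads Alice's work with pull.
cd "$sandbox"
git init -q bob
cd bob
git symbolic-ref HEAD refs/heads/master
git remote add origin "$sandbox/server.git"
git pull -q origin master
after_pull=$(cat readme.txt)

# Alice pushes a second commit.
cd "$sandbox/alice"
echo "v2" > readme.txt
git add readme.txt
git commit -q -m "Update readme"
git push -q origin master

# Bob does by hand what pull does: fetch first, then merge.
cd "$sandbox/bob"
git fetch -q origin master
git merge -q FETCH_HEAD
after_fetch_merge=$(cat readme.txt)
echo "$after_pull -> $after_fetch_merge"
```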

The pull and push commands are useful when you are already working on a project, but what if you found an awesome project and want to contribute to it? How would you start?

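Create a new folder, navigate to it, and run the ‘git clone’ command along with the URL of the remote repository. Note that clone creates its own subfolder, named after the repository, inside the current folder.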

mkdir the_project
cd the_project
git clone <url corresponding to the remote repository>

This will download all the commits and the history of the project since its initial commit. Then you can work on it using the regular Git commands.

The .gitignore File

By now, I am sure you have encountered another problem besides mistyped commands. Did you notice the temporary files generated by your text editor, your compiler, or in any other way (I don’t care how)? The point is that they are generated and we don’t want to track them; we don’t want Git to even think about them. For instance, I generally work in Python, so it’s common to see .pyc files lying around. Since I don’t want them to be tracked, I simply mention them in the .gitignore file. The contents would simply be

*.pyc

Long story short, .gitignore is a hidden file that tells Git which files or types of files should not be tracked. It usually lives in the root directory of your project. There are ready-made .gitignore files available online for different languages, which you can download and customize as per your needs.
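Here is a quick way to check that the pattern does what we want. It is a sketch in a scratch repository with made-up file names; app.pyc is just an empty file pretending to be compiled output.

```shell
repo=$(mktemp -d)                 # hypothetical sandbox repository
cd "$repo"
git init -q

echo "print('hi')" > app.py
touch app.pyc                     # pretend the interpreter left this behind

echo "*.pyc" > .gitignore         # ignore every file whose name ends in .pyc

status_output=$(git status --porcelain)
echo "$status_output"             # lists .gitignore and app.py, but not app.pyc

# check-ignore exits with 0 when the given path is ignored
if git check-ignore -q app.pyc; then
    echo "app.pyc is ignored"
fi
```

The .gitignore file itself shows up in the status output, since it is a regular file that you normally commit along with the rest of the project.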

How actually should I maintain things?

Now, as a novice, you can keep repeating the git add and git commit commands again and again, but this isn’t how version control should be done. You need to play with branches. A branch is a linear sequence of commits. A branch may emerge from and merge into another branch. Branching means diverging from the main line of development and continuing the work without affecting the parent branch. This enables you to work on two different modules of the project simultaneously without them interfering with each other, and then merge all the changes later. The important thing to note here is that by default Git provides you with a branch named “master” (newer setups and hosting services often call it “main” instead). So all the commits that you have made till now were on the branch “master”.

You can create and delete branches, and since their implementation in Git is lightweight, creating, deleting, and switching between them is pretty fast compared to other VCSs. But since many branches can exist in parallel, how does Git know which branch we wish to work on? Git maintains a special pointer known as HEAD, which points to the tip of the branch on which we are currently working.

Every commit points to its parent commit, i.e. the commit that came right before it; Git does not store pointers from a parent to its children. By analogy, the earliest commit is the oldest ancestor and the latest commit is the youngest child. Branch names are themselves pointers to a commit, to be specific, the latest commit on that branch. HEAD is usually a pointer to a branch name.

To change the position of HEAD, we use the checkout command.

git checkout <hashvalue>
git checkout <branch name>

In both cases, we change where HEAD points. But whenever you are working on your project, make sure your HEAD points to a branch and not directly to a commit. If you check out a commit, Git notifies you that you are in a detached HEAD state. Commits you make in this state are saved, but they do not belong to any branch, so once you switch away they are hard to get back to, and it’s not a good idea to rely on remembering the hash value of a commit.

To switch back to the master branch, do

git checkout master

To see which branches exist, simply type

git branch

This displays the list of branches, and the branch you are currently on is preceded by an asterisk.
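We have switched between branches but never created one, so here is a minimal sketch of that. The branch name feature-login and the files are invented for the example, and "git checkout -b" (create a branch and switch to it in one step) is a flag this article has not introduced.

```shell
repo=$(mktemp -d)                     # hypothetical sandbox repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

git checkout -q -b feature-login      # create the branch and move HEAD onto it
echo "login form" > login.txt
git add login.txt
git commit -q -m "Add login form"

git checkout -q master                # HEAD points to master again
git branch                            # lists both branches, '*' marks master

current=$(git rev-parse --abbrev-ref HEAD)
branch_count=$(git branch | wc -l | tr -d ' ')
echo "on $current, $branch_count branches in total"
```

After switching back to master, login.txt is gone from the working directory: it lives only in the commit on feature-login until the two branches are merged.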

Okay, switching back and forth between branches is done, but how does this help us maintain our project? Generally speaking, the work on a project can be divided into categories such as development, testing, adding a new feature, or fixing a bug. One proposed model says that branches can be organized in the same way.

The master branch should always hold release-ready stable versions, i.e. anything on the master branch should be ready to be deployed. There should be another branch, let’s say ‘development’, which runs parallel to the master branch. All the development work is done on this branch, and after adding enough features on top of the previous release, we merge it into the master branch, indicating a new release. In turn, how do we maintain the development branch? For each individual feature, you can create a separate branch (using the feature’s name as the branch name would be a good idea), write awesome code, make some commits, run unit tests, and when done, merge that branch into the development branch. The same goes for bug-fixes. And you would be done, if only you knew the one thing I haven’t told you till now: how to merge two branches!

Merging is one of those concepts that are not only hard to explain, but also hard to perform. Let’s give it a try, and in case you don’t understand, it’s okay for now, especially if you don’t have any prior experience with Git.

Let’s say you want to merge the development branch into the master branch (BTW congrats on the new release). So you switch to the master branch and then run the merge command.

git checkout master
git merge development

Remember, first make sure that you are on the destination branch into which you want to merge the contents. Then, with the git merge command, mention the name of the branch whose contents you wish to merge. By default, Git creates a merge commit signifying that the merge operation was performed (this doesn’t apply to a fast-forward merge). You can change this behavior with flags: --no-ff forces a merge commit even when a fast-forward is possible, while --no-commit performs the merge but stops before creating the commit, so you can inspect the result first.

Git uses different merging strategies, fast-forward, recursive, octopus, to name a few! But we won’t be talking about them; instead, we should be concerned about what happens if merging fails.

The problem with merging is that you often get ambiguities. Imagine you have two different branches b1 and b2 (don’t dare to use those names in real life, they are copyrighted for this article only) in which the same part of the same file has been changed in different ways. What do you expect when you merge b2 into b1? Which version do you wish to keep, the one belonging to b1 or the one belonging to b2? This is known as a conflict. To resolve it, you need to edit the affected files by hand, pick the final contents, and then complete the merge. Stack Overflow has a good question on this if you are looking for follow-up reading.
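If you would like to see a conflict before one surprises you in a real project, the following sketch manufactures one on purpose. It reuses the names b1 and b2 from above; the file settings.txt and its contents are made up. Both branches change the same line, so the merge stops and leaves conflict markers in the file, which we then resolve by hand in favor of b2.

```shell
repo=$(mktemp -d)                     # hypothetical sandbox repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b b1                 # b1 changes the line one way
echo "color = green" > settings.txt
git add settings.txt
git commit -q -m "Use green"

git checkout -q master
git checkout -q -b b2                 # b2 changes the same line another way
echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Use blue"

git checkout -q b1
merge_failed=no
git merge b2 > /dev/null 2>&1 || merge_failed=yes   # the merge stops with a conflict
markers=$(grep -c '^<<<<<<<' settings.txt)          # count conflict blocks in the file
cat settings.txt                                    # shows both versions between markers

# Resolve by hand: decide on the final contents, stage them, and commit.
echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Merge b2 into b1, keeping blue"
resolved=$(cat settings.txt)
echo "conflict: $merge_failed, resolved to: $resolved"
```

In a real conflict you would open the file in your editor and combine the two versions thoughtfully instead of overwriting it, but the add-then-commit step that completes the merge is the same.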

Conclusion

I hope you have got at least a little idea, if not all of it, and I expect that you shall be using this tool to maintain your next big project. If you still want to learn more, I would recommend Pro Git by Scott Chacon. It has a detailed description not only of how to use and work with Git, but also of the internals and how Git works under the hood. And lastly, our all-time friend, the documentation, exists!

If you have any queries or want to give feedback on this article (I would really appreciate that), the comment section is waiting for you.

Happy Coding…!!
