Understanding Git and GitHub

I first started using Git and GitHub a few years ago when I began a full-stack web development program called Launch School. After learning the basics at the beginning of the program, students are encouraged to commit each problem and assignment from there on out, which I did as I progressed through the courses. Here is where I went wrong: thinking I was being efficient, I committed the assignments in bulk at the end of each lesson. I had learned about the tools, practiced with the exercises, and become comfortable with the basic commands, but in a way, I was missing the point of Git and GitHub completely. In truth, the role these tools play in a developer’s workflow is simple, and I’m willing to bet you are already familiar with some of the underlying concepts.

Let’s start with Git. Git is a distributed version control system. Okay, but what does that really mean? Let’s break it down.

First, it’s a version control system. Linus Torvalds, the creator of Git, described the tool as a ‘stupid content tracker’. He said, “…you can just see git as a filesystem - it’s content-addressable, and it has a notion of versioning, but I really really designed it coming at the problem from the viewpoint of a filesystem person…” (Source). Linus’ definition of a filesystem is undoubtedly more detailed than yours or mine. Nonetheless, this reference provides a useful starting point for understanding the fundamentals of Git.

Many people have organized their documents into folders, worked with files in a word processor, and, at one point or another, learned the value of a save button the hard way. If I think back to my high school papers, I didn’t just save one time like I was doing with my coding assignments. I saved dozens of times before turning in a paper to my teacher. I would click ‘save’ at the end of each night, after I wrote a new paragraph, and if I took a break to get a snack. A word processor has features like the save button to help you manage your work.

Git is simply a more powerful tool to help you manage your work, and any type of work at that, not just code. It keeps track of not only the latest version of your file but every version you commit, which can be extremely useful when you need to go back to a previous state. This was the bigger picture I initially missed.

Let’s think of a file as having a linear timeline. There are two directions you can go: you can move forward to save your changes, or move backward to undo them. The Git commands to save a version of a file are git add <filename>, which stages the file, and git commit -m "<description>", which records the staged version with a short message. The Git commands to undo are only a little more complex, as you have a choice of several options depending on your situation.
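
As a minimal sketch of moving forward on that timeline, the commands below save two versions of one file. It assumes Git is installed and works in a throwaway directory; the file name essay.txt, the commit messages, and the demo identity are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Placeholder identity, set only for this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Save the first version of the file.
echo "First draft" > essay.txt
git add essay.txt
git commit -q -m "Add first draft of essay"

# Edit the file, then save a second version.
echo "Second paragraph" >> essay.txt
git add essay.txt
git commit -q -m "Add second paragraph"

# Count the versions (commits) saved so far on this branch.
commit_count="$(git rev-list --count HEAD)"
echo "Saved versions: $commit_count"
```

Each commit here plays the role of one click on the save button, except that Git keeps every one of them instead of only the latest.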

When you want to throw away the uncommitted changes to a single file, use git checkout <filename>. (Be careful using git checkout. If you use git checkout with a commit ID instead of a filename, you can end up with a detached HEAD. This is important to know because commits you make on a detached HEAD do not belong to any branch: once you switch away they are easy to lose track of, and Git may eventually clean them up unless you first attach them to a branch, for example with git checkout -b <new-branch>. More info here).
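
Here is a small sketch of discarding an unwanted edit to one file, assuming Git is installed. The file name notes.txt and the demo identity are hypothetical, and everything happens in a temporary directory.

```shell
# Set up a throwaway repository with one committed file.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "Committed text" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make an edit we regret and have not committed.
echo "Oops, unwanted edit" >> notes.txt

# Throw away the uncommitted changes to this one file.
# The "--" tells Git that what follows is a file path, not a branch or commit.
git checkout -- notes.txt

# The file is back to its last committed state.
restored="$(cat notes.txt)"
echo "$restored"
```

On Git 2.23 and later, git restore notes.txt does the same job and avoids the ambiguity between file names and commit IDs.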

When you want to throw away the uncommitted changes to a whole set of files, first use git diff to double-check that you’re not getting rid of any changes you actually want to keep (git diff HEAD also includes changes you have already staged with git add). Once you are sure you want to undo the changes to all your files, use git reset --hard HEAD. With this command, you don’t have to run git checkout on each file individually.
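
The review-then-reset sequence might look like the following sketch. It assumes Git is installed; the file names a.txt and b.txt and the demo identity are invented for the example.

```shell
# Set up a throwaway repository with two committed files.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "Add two files"

# Make unwanted edits to both files.
echo "unwanted" >> a.txt
echo "unwanted" >> b.txt

# Review first: list the files that have uncommitted changes.
# (Plain "git diff" would show the changed lines themselves.)
changed_files="$(git diff --name-only)"
echo "$changed_files"

# Discard every uncommitted change to tracked files in one step.
git reset -q --hard HEAD

# An empty status means the working directory matches the last commit.
remaining="$(git status --porcelain)"
```

Note that git reset --hard only touches files Git is already tracking; brand-new files that were never added stay where they are.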

You can also change your files back to a specific point in the project’s history. First, use git log to view the ID of a previous commit. Copy the commit ID and paste it into this command: git reset --hard <commit ID>. This will throw away any uncommitted changes as well as any commits that came after this point in the project’s history. (So far, we’ve just been talking about undoing local, private changes. Things get a little trickier when undoing public changes. We won’t get into the details in this article, but it’s important to note that git reset should not be used on commits that have already been pushed to a shared repository, as this command may rewrite project history someone else is depending on. More info here).
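
A minimal sketch of going back to an earlier commit, assuming Git is installed. The file story.txt, the commit messages, and the demo identity are hypothetical. In practice you would copy the ID from the git log output by hand; the script looks it up with git rev-parse only so that it can run unattended.

```shell
# Set up a throwaway repository with three versions of one file.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "v1" > story.txt
git add story.txt
git commit -q -m "Version 1"

echo "v2" > story.txt
git add story.txt
git commit -q -m "Version 2"

echo "v3" > story.txt
git add story.txt
git commit -q -m "Version 3"

# View the history: one line per commit, newest first, each with a short ID.
git log --oneline

# Find the ID of the commit two steps back ("Version 1").
target_id="$(git rev-parse HEAD~2)"

# Move the branch and the files back to that commit.
# The two later commits are dropped from this branch's history.
git reset -q --hard "$target_id"

content="$(cat story.txt)"
remaining_commits="$(git rev-list --count HEAD)"
echo "File now contains: $content"
```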

If that last sentence about rewriting history was confusing, don’t worry. We’ll discuss the second part of the Git description now, which will hopefully help you understand that last bit.

Up to this point, we’ve covered how you can use Git to move between multiple versions of files on your local machine. In essence, this is what makes Git a version control system. More specifically, though, Git is a distributed version control system. The ‘distributed’ part refers to how Git allows multiple people to work on a project. Sometimes it is easier to understand a concept if you compare it to its opposite, so let’s start there.

A centralized version control system has one master copy of the project. Each person can pull a part of the project to their local computer, make edits, and push the changes directly to the master copy. A centralized system has some disadvantages, which are perhaps the reasons it’s no longer an industry standard: it requires a connection in order to make changes, it affects everyone when buggy code is committed, and it is at risk of a single point of failure if that one server goes down.

A distributed version control system is different. People contributing to the project pull all the files (and the full history) down to their computer. Each person can make changes locally (online or offline) and, when ready, push their commits back to the server. It’s common practice to push the work to a remote branch, which is a separate copy of the project on the server. This way the changes can be reviewed, and the branch merged with the master copy only when the code is bug-free and ready for production. A distributed version control system is fast, avoids a single point of failure because anyone’s copy of the project can be pushed to the server for recovery, and allows many people to work simultaneously. Two team members can even work on the same files at the same time for completely different features. Git is a very popular, but not the only, distributed version control system.
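
To make the distributed workflow concrete, here is a rough sketch that runs entirely on one machine. A bare repository in a temporary directory stands in for the server; that stand-in, along with the directory names, the branch name add-greeting, the file names, and the demo identity, is an assumption made for illustration. With a real hosting service, the clone address would be a URL instead of a local path.

```shell
work_dir="$(mktemp -d)"
cd "$work_dir"

# Build a tiny project, then make a bare copy that plays the role of the server.
git init -q seed
cd seed
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity
echo "Project readme" > README.txt
git add README.txt
git commit -q -m "Add readme"
cd ..
git clone -q --bare seed server.git

# A contributor pulls down the whole project, history included.
git clone -q server.git contributor
cd contributor
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# Work happens locally, on a separate branch, with no connection needed.
git checkout -q -b add-greeting
echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Push the branch to the server so it can be reviewed before merging.
git push -q origin add-greeting

# Ask the server which branches it has; the new one should be listed.
server_branches="$(git ls-remote --heads origin)"
echo "$server_branches"

# The contributor's copy holds the full history: the seed commit plus the new one.
local_commits="$(git rev-list --count HEAD)"
```

The master copy on the server is untouched by the push; the new work sits on its own branch until someone merges it.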

To use either a centralized or distributed version control system, the project needs to be hosted on a server or computer somewhere. Not every developer or business, however, wants to or is able to host their own server. For those using Git to manage their project, this is where GitHub comes in. Put simply, GitHub is a very popular, but, again, not the only, Git repository hosting service. This tool provides easy collaboration for team members working on projects using Git. Click here to see a short, animated video about how GitHub facilitates this collaboration.

Git and GitHub are simply tools to help you manage your files, get them back to a working state when you mess up, and safely share your progress with others. For those who are learning to program, I hope this more basic overview of Git and GitHub will help you keep the bigger picture in mind as you continue your programming journey, learn additional Git commands, and start working with other developers.

Remember: it takes time for a new programming concept to settle into your own mental model or for a new tool to fit properly into your workflow. My understanding of Git, especially the collaboration part, did not click until I started working on a team. When I first started using Git, I was working on my own as a student: I had a naive sense of how things connected, all I saw was the computer in front of me, and I was nervous about messing up my files. Even when I did join a team, I was still a little nervous about messing up the files. This, however, is what circular learning is all about: we learn, we practice, we learn again, sometimes repeating previous lessons. Each time we revisit a topic, we deepen our understanding and become more comfortable.

Other References: