The Ultimate Git Guide to Creating Your First Repo
Addendum: I’ve recently compiled this post and the follow-up post, with a lot of extra content, into a book which can be bought here on my website. If you get any value from this post and would like to learn more about git, please check it out! They are being sold as DRM-free PDFs over Gumroad. You can check out the first chapter here.
Also, be sure to spread the love and share on social media! It would mean the world to me :)
Note: There are a lot of opinions surrounding source code management (SCM). Git is just one way to do it. People will have strong opinions and they might disagree with mine. I wrote this guide as a starting point, so please don’t consider this the end-all-be-all.
The Who
Prior to source code management, organizations often used flat files. This means all changes got copied and pasted into new text files. While this is sufficient for small projects, it’s often terribly lacking for more complex ones. Imagine trying to copy and paste a dozen people’s work into the source code!
Then along came CVS and, later, SVN (Subversion). Both had their drawbacks in that you had a centralized repository for your code: any time you wanted to access the source code, you had to go back to a central server. This could be a problem if you didn’t have access to it, such as if you were off-line. Development, again, resulted in making copies of files, resulting in needless toil and tedium. [1]
Linus Torvalds invented git to help him manage the Linux kernel. This open source project has over 10,000 contributors. It was therefore important to keep many developers together and on the same page when developing code. This led to the choices around decentralization that make git fairly unique. You can watch this video if you’re curious about Linus Torvalds’s mindset when developing git, though keep in mind that he’s very opinionated.
But what about smaller projects? If github’s popularity is any indicator, then it’s the source control of choice. I recall working at a hackathon where one of the team members (we were three in total) accidentally introduced “one line that couldn’t possibly break the application.” Because we used git, we could quickly roll back the changes instead of having to hunt and peck for what could’ve possibly gone wrong.
Git can be a really difficult thing to learn at first, and beginners often wonder “why do this at all?” Git’s power comes from the ability to organize one’s thoughts around snapshots or past states. This means you can see exactly how the code existed at any given state. For example, we can know exactly what code existed on September 7th, 2017 in our production code base. That way, bugs and issues can be tackled quickly, without time wasted.
So let’s begin!
The What
First we’ll create a new local git repo on our computer, then we’ll set it up to sync with one on a remote server. For this we’ll need git installed on our local computer and an online git repository account of some sort. There are many of these online git repository services, which we’ll cover later, but I’ll be using Github for this chapter. Other popular choices include Gitlab and Bitbucket, with reasons for using one over the other varying tremendously. [2] Given Github’s current popularity, I’ll be sticking with them.
Then, we’ll use this git repo to start up a simple project. I’ll stick to text files here for the time being to keep things simple, but you can substitute any kind of file that uses text. These include code and text files, but generally don’t include binaries like Windows executables.
Finally, we’ll push this local git repo to github, complete with our changes. After we make this commit, we’ll make changes and look at what the differences are between files.
During the course we’ll touch on some basic git ideas, but not go into them in depth. These will be reserved for later chapters. The point is to get used to the “rough bare minimum” for git before going into more intense stylings that really leverage the power of the system.
The How
To start, let’s install git. This depends on the platform you’re going to be using. Regardless of which OS you’re using, we’ll be using the command line interface to do this. For Windows this is called “command prompt” and for macOS and Linux that is called “terminal”.
Most of the instructions for Linux and macOS will not vary greatly as they are similar platforms at their core, but Windows is a bit of a tricky beast because it descends from DOS. You can use the Ubuntu compatibility layer in Windows, but the git command line interface should be sufficient.
Windows Install
Start by downloading Git from the Git for Windows Site. Then run the installer. Although there is a git GUI included, I highly recommend against it as git was designed for and runs best as a command line application. If you are uncomfortable with command line applications, this is the best place to start learning how to use one!
MacOS Install
Install git by first installing brew, a package manager for macOS, then running brew install git. Doing so will install git for you. [3] Alternatively, try opening up terminal and running git --version to see if you already have git installed on your computer. It’s common for macOS to install it by default.
Linux Install
You probably already know how to install it, but in case you don’t: use your favorite package manager. You can compile it from source if you’re feeling adventurous, but I always install from the package manager. For Ubuntu, this would be sudo apt-get install git and for RHEL/Fedora/CentOS, this would be sudo yum install git.
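Whichever platform you’re on, it’s worth confirming the install before moving on. A minimal check, assuming only that git should now be reachable from your terminal:

```shell
# Check whether a git executable can be found, then print its version.
if command -v git >/dev/null 2>&1; then
    git --version
else
    echo "git is not installed or not on your PATH"
fi
```

If the second message appears, reopen the terminal and try again before reinstalling; a freshly installed program is sometimes not picked up until a new session starts.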
First Local Repo
Note: from now on I’ll refer to just terminal when I mean either terminal as it exists on macOS or Linux systems or command prompt as it exists on Windows systems. You can also right click on folders in Windows to open up a bash-like shell for running git commands, which might be preferable for you. All screenshots will be from Ubuntu 16.04. [4]
Fire up your terminal and create a directory called git-workspace. We’ll be using this to hold all of our folders. I tend to avoid spaces in my file names as they can be a pain to type out with escape characters. [5]
In the above screenshot, you can see this empty folder. We’ll be creating our project here.
The next command we will run is git init. This initializes (hence init) a git project. [6]
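Put together, the steps so far look roughly like this. It’s a sketch rather than a transcript of my session: the scratch location from mktemp is my addition so the commands can be rerun without clobbering anything.

```shell
# Start from a scratch location so this can be rerun safely (an assumption
# of this sketch; you can just as well work in your home folder).
cd "$(mktemp -d)"

# Create the workspace folder and step into it.
mkdir git-workspace
cd git-workspace

# Turn the folder into a git repository.
git init

# List everything, hidden entries included, to see the new .git folder.
ls -a
```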
Now that it’s created, we can start to do some general housekeeping. We’re going to create an empty text file and add some text. If we don’t, then git will tell us the repo is empty.
So we’ll add a text file. I’ll be using zile and various unix commands like touch, but you can create these text files however you like.
We can see that this file is now created in the repo; however, we have a slight problem: there are excess files in this repo. My text editor of choice, zile, unfortunately creates extra files for backup purposes. You can see these in the screenshot suffixed with a "~" character. Emacs will also add similar files, which get suffixed and prefixed with "#" characters. This is fine when we’re doing normal work, but these files shouldn’t be committed.
We therefore have two options here. One, we can remove them. That’ll work here and it’s what we’ll do for the time being, but another option is to use .gitignore. This is a text file that sits in the main directory, side by side with .git, and it tells git to ignore certain files. We’ll go over this later, but I wanted to let you know that it exists.
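As a small preview of that second option, here is a sketch of a .gitignore that hides editor backup files. The two patterns are my own guess at what covers the zile and Emacs leftovers described above, and the repo is a throwaway one.

```shell
# Work in a throwaway repo so nothing real is touched.
cd "$(mktemp -d)"
git init --quiet

# A real file plus an editor backup file like the ones described above.
echo "Hello World!" > first_file.txt
echo "backup" > first_file.txt~

# Ignore names ending in "~" and names wrapped in "#". The leading
# backslash stops git from reading the second line as a comment.
printf '%s\n' '*~' '\#*#' > .gitignore

# The backup file no longer shows up as untracked.
git status --short
```

Note that .gitignore is itself an ordinary file, so it will show up as untracked until it gets committed along with everything else.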
For now, we’ll remove the file.
Ok, now we’re ready to commit! In your terminal, type git status. This will tell you what is in your directory and what git is actively tracking.
Let’s notice a few things about this screen: we can see which branch we’re on, the last commit made, all of our untracked files, and some additional info from git.
Branches are used by git to distinguish between different paths of development. Generally, different people will keep their own branch and periodically merge back into the master branch. [7] In the untracked files we can see the text file we just added, which gets colored red in my terminal here (this may or may not be the case for your terminal).
Git does not add all files into a commit by default. This is important for workflows: git does not dictate how you structure your commits and tries to be flexible. You may want to separate out different commits from one session if they are part of different feature sets, or you may not want to commit all your code at once if you’re trying to do a quick fix in one specific area.
For us to stage this one file, we can either run git add first_file.txt or git add --all. The latter includes a flag which will stage every change in the working directory, including every file that is "untracked", into the commit. The former is more fine-tuned for adding specific files. It’s generally good practice to add files specifically as you want to, one by one. To keep up these good practices, this is what we’ll do.
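To see the difference staging makes, here is a sketch in a throwaway repo. The second file, notes.txt, is a hypothetical extra I’ve added so there is something left unstaged.

```shell
cd "$(mktemp -d)"
git init --quiet
echo "Hello World!" > first_file.txt
echo "scratch" > notes.txt

# Before staging, both files are listed as untracked ("??").
git status --short

# Stage only the file we want in the commit.
git add first_file.txt

# Now first_file.txt is staged ("A") while notes.txt is still untracked.
git status --short
```

The --short flag is just a compact form of the same information that plain git status prints.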
Our commit now has one file added to it! Git luckily tells us we can remove this file too, which I’ll show as a demonstration below:
Though it’s not good practice, you can use the --all flag as I mentioned before. We’ll add the file back using this flag just to show you how it works.
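The unstage-then-restage round trip can be sketched like this, again in a throwaway repo. I’m assuming the removal git suggests at this point is git rm --cached, which is what it offers before a first commit exists.

```shell
cd "$(mktemp -d)"
git init --quiet
echo "Hello World!" > first_file.txt
git add first_file.txt

# Unstage the file; --cached leaves the file itself on disk.
git rm --cached --quiet first_file.txt
git status --short     # the file is untracked again

# Stage everything in the working directory in one go.
git add --all
git status --short     # the file is staged again
```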
Now let’s commit! There are two ways to do this and I’ll go over both. I’m going to touch on a lot of terms, some of which will get full posts devoted to them, so if you feel confused take a deep breath and keep reading. First is to run git commit -m "<message>" where "<message>" is the message you want to associate with the git commit. This goes in as the header for the git log for that commit. Let’s run that and see what happens:
Whoops! Git needs some extra info here. It’s asking us to set up a default email and name. These are important as they help identify the person making the commit. They may not seem useful because for now it’s just us, but as projects build up we’ll want to tell commits from different users apart. I’ll type in my email and name here.
Success! We can see that git successfully committed our changes and told us what happened. 1 file got “changed” (in this case created) and 3 lines of code were inserted into it. Git will also tell us of deletions if there were any.
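Setting the identity looks roughly like this. The name and email are placeholders, and to keep the sketch harmless it stores them for one throwaway repo only; adding --global right after config makes them the default for every repo, which is what git’s own hint suggests.

```shell
cd "$(mktemp -d)"
git init --quiet

# Set the identity for this repository only (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.com"

# Read the values back to confirm they were stored.
git config user.name
git config user.email
```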
Notice the wording here:
[master (root-commit) 6ce670b] my first commit!
Git shows only the first line of the message here, so it pays to keep that line short; our message of “my first commit!” clearly is. It also displays a hash, “6ce670b”. This may not seem important, but it is one of git’s big features: by using hashes, we can quickly jump between different commits using commands like git checkout. [8] It allows us to identify the commits made. The hash displayed here is actually just the shortened version, which is good enough for labeling; the longer version can be displayed if we run git log.
This is a “prettified” version of the git commit. We can see the full commit hash displayed in brown, the author, the date it was committed, and the message in the log (note again that the colors may or may not display for you). When assembling a long list of code changes in a complex repo, this helps us track down who made changes where, when, and why. We’ll be getting into that more as we progress.
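If you’d like to see the short and long hashes side by side without reading through the whole log, something like the following works. It’s a sketch in a throwaway repo with a placeholder identity, so the hashes you get will differ from mine.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
echo "Hello World!" > first_file.txt
git add first_file.txt
git commit --quiet -m "my first commit!"

# Abbreviated hash, like the one in the commit summary.
git rev-parse --short HEAD

# Full hash, like the one git log shows (40 characters with git's
# default SHA-1 object format).
git rev-parse HEAD

# One line per commit: abbreviated hash followed by the header.
git log --oneline
```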
Now to explore the second type of commit: let’s edit our one file, fill it with random text and run git commit, which will open up the default text editor we set with git and allow us to type out our git log there as well.
There are a couple of points here to note. One is that we now have a modified file, which git is telling us about. It’s currently not staged, so it won’t be added to the commit (in fact, git is telling us there’s nothing to commit because nothing is staged). We also have an untracked file from zile, my text editor. We could remove that like last time, but I’ll leave it in. Git defaults to not touching it, so it will ignore it unless we tell it otherwise.
Let’s stage our changes from “first_file.txt” by running git add first_file.txt and then commit the changes with git commit. We first need to set up a text editor for git to call when making changes. [9] I’ll use my favorite editor zile, but you can substitute whatever editor you like. Popular ones include vi for Linux and textedit for macOS.
git config --global core.editor "zile"
We’ll go over the command word by word. config refers to changing git’s default configuration. The --global flag refers to the global setting of this configuration, meaning it will apply to every repo I work in. Without it, the config changes will just apply to this repo. I like zile and don’t intend to change it anytime soon so I’ll keep this global. Then we set the value of core.editor to be "zile", my editor of choice.
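To see the difference between the two scopes, you can ask git where a setting came from. This sketch leaves out --global on purpose and uses vi as a stand-in editor, so it only touches a throwaway repo.

```shell
cd "$(mktemp -d)"
git init --quiet

# Without --global the setting is stored in this repo's .git/config only.
git config core.editor "vi"

# Show the effective value together with the file it was read from.
git config --show-origin --get core.editor
```

A repo-level value wins over a global one, which is handy if a single project needs a different setup.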
Now we can run our second commit.
Then we click “enter” and voila, we enter a commit page.
We can see the text editor now open with info about our upcoming commit. As it tells us, the # symbol is used to denote comments. These will not appear in the git log once we commit. Git uses these comments to let us know what changes are being made and which files did not get staged or are still untracked. Because I did not add or stage my first_file.txt~, it will not appear in the commit. We can write multi-line commit messages here now, which is pretty neat.
A git log entry should have both a header and a body: you have the “what” as the header and the “why” as the body. This will allow for others to quickly glance at your code and have an understanding of what’s going on without needing to run through every code change. I’ll fill out the git commit message here with some info. It’s considered good etiquette to keep these lines around 70 characters in length so they render well on terminals.
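The editor isn’t the only way to get a header and a body. As a sketch, passing -m twice gives the same shape from the command line; the message text and identity here are made up for illustration.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
echo "Hello World!" > first_file.txt
git add first_file.txt

# The first -m becomes the header; the second becomes the body,
# separated from the header by a blank line.
git commit --quiet \
    -m "Add greeting file" \
    -m "The project needs a starting point, so begin with a greeting."

git log -1 --format=%s   # header only
git log -1 --format=%b   # body only
```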
Now when we run status it’ll show that we’re all good, with the extraneous file still being untracked.
So how can we look at our changes over time? We’ll run git log to see them.
Look at that! All of our commits sitting nice and pretty. We can see our headers and body of messages, which is nice, along with authors, which for now is just us.
If we want to analyze differences in the code, we can run git diff. The exact phrasing of the command is below. Note that I don’t use the full commit hash, as the shortened hashes are unique enough that it’s not necessary.
# git diff <1st_commit_hash> <2nd_commit_hash>
git diff 6ce670b ce8ff52
The exact hashes will be different for you, so use the commented command if you are following along with this book. You’ll see that this produces something like below, which again is dependent on what’s in your code.
This looks a bit sloppy at first glance, but as you get used to seeing these, the patterns will emerge to make them useful. At top, we see the diff command details. diff is actually a very old unix command that will display the difference between two files. What git is telling us here is that it’s running this diff command across the two commits’ versions of the file.
Next, we get the specifics of the file in question. “Hello World!” exists in both commits, so it’s denoted in white with no extra “+” or “-” sign prepended to it. This means it’s stayed constant across the commits. This is followed by a blank line and then a line with a “-” prepended. We’ve deleted this line from commit 6ce670b and git is telling us so. We then added the lines prepended with "+", which include new code and a blank line. Git also lets us know that there’s no newline at the end of the file, which is sometimes important.
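If you don’t have two commits handy, this sketch builds a tiny history and diffs it. The file contents are invented for the example, and instead of copying hashes it uses HEAD~1 and HEAD, which are git’s names for the previous and the latest commit.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"

# First version of the file.
printf 'Hello World!\n\nold line\n' > first_file.txt
git add first_file.txt
git commit --quiet -m "first version"

# Second version: the last line changes.
printf 'Hello World!\n\nnew line\n' > first_file.txt
git add first_file.txt
git commit --quiet -m "second version"

# Unchanged lines get no prefix, removed lines get "-", added lines "+".
git diff HEAD~1 HEAD
```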
So what is git doing here exactly? Well, what it shows us is just the changes between the two versions of the file. Under the hood, git records each commit as a snapshot and, when it packs its storage, compresses similar versions down to their differences, so it usually isn’t keeping every full copy on disk. How it does this specifically is very complex, but it makes use of some clever hashing to do so.
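For the curious, git ships a few low-level commands that let you peek at what a commit actually holds. This is only a sketch in a throwaway repo with a placeholder identity; we won’t need these commands for everyday work.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
echo "Hello World!" > first_file.txt
git add first_file.txt
git commit --quiet -m "my first commit!"

# The commit itself is an object, addressed by its hash.
git cat-file -t HEAD

# The commit points at a tree: a listing of file names and the hashes
# of their contents at that moment.
git cat-file -p 'HEAD^{tree}'

# Hashing the file by hand gives the same hash the tree lists for it.
git hash-object first_file.txt
```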
Setting up a Mirror Repository
Now we’re going to set up a mirror repo. This will allow us to keep our local code base on the internet in addition to locally. As stated before, we’ll use the very popular Github.
So we’ll log into Github, whose URL is http://github.com/. There we’ll see the below screen (note that I took this in August of 2017).
Then we’ll register a new account. They will ask for your email and password; it’s a straightforward process. Note that Github allows for unlimited free public git repos but not any private git repos. For those, one needs to pay. You can get free private git repos through either Gitlab or Bitbucket. This guide will focus on Github.
Following registration, one will see a screen like below, which will enable you to start a new project. Note that you will need to confirm your account as well.
Now we can create our repo! Click “Start a project” and start your new project. You’ll need to give your repo a name via the “Repository Name” field and an optional description. Again, Github will only let you make a private repository if you choose to pay for it at $7/month. Do not initialize with a README.md; I will explain why in a minute.
This repo will be empty, which is exactly what we want. Github gives instructions on what to enter, but we’re a little farther ahead so we’ll skip some of it.
First we need to tell our local git repo where the github repo is located. This involves the git remote command. So go to your git-workspace directory and run the following command:
# git remote add origin <url_to_repo>
git remote add origin https://github.com/sheepsneck/first_repo.git
Git is flexible in what formats it can use for sending the code. What we’ll use here is the HTTPS format, shown above. The same repo in SSH format would be git@github.com:sheepsneck/first_repo.git; we’ll touch on the SSH format in later chapters.
Now we’ll break down the command. git remote deals with everything that our local git config is using to talk with remote repositories like the empty one on github. When we call 'add', we are registering a new remote repository for it to talk to. This comes with a nickname like 'origin' (also called an alias) and a URL. The format of the url looks like a website url and that’s because it’s an https format.
Below is a screenshot of the github repo url being added to our local git repo. I ran the command git remote get-url origin to verify that it is indeed inside of the repo.
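In text form, adding and checking the remote looks roughly like this. The sketch uses a throwaway repo; adding a remote doesn’t contact github at all, so it works even before the github repo exists.

```shell
cd "$(mktemp -d)"
git init --quiet

# Register the remote under the nickname "origin". No network access
# happens yet; git only stores the URL in the repo's config.
git remote add origin https://github.com/sheepsneck/first_repo.git

git remote get-url origin   # the URL for one remote
git remote -v               # every remote with its fetch and push URLs
```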
Now we’ll do our first push! Run the following command to push to the github remote repository:
git push -u origin master
The command git push will take whatever is on our local repository and attempt to put it into the remote repository. Git will compare the two: if the remote repository has nothing that the local one lacks, the push goes through. If the remote has commits that our local repository doesn’t, git will reject the push and ask that a merge be made first. We’ll go into that in a later chapter. For now, git sees that there’s nothing on the remote repository and will push the changes through.
The -u flag sets up tracking: our local branch will remember the matching branch on the remote repository, so later pushes and pulls know where to go without us spelling it out.
Lastly, the origin master portion of the command refers to which remote repository nickname we are pushing to and which branch we’re using. We already know origin refers to the github repo, so what’s master? Well, that’s the branch name. Git uses branches for different lines of development. We can check these branches by running git branch.
As you can tell, we only have a ‘master’ branch. In the future, we’ll tackle projects with many branches, but for now we’ll just keep this one branch. Know that git will set up a ‘master’ branch by default unless we specify otherwise.
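If you want to rehearse the push without touching github, a bare repository on your own disk can stand in for the remote. Everything in this sketch is local and disposable: the scratch paths and the identity are placeholders, and the branch is forced to be called master so it matches this guide even on installs that pick another default name.

```shell
scratch="$(mktemp -d)"

# A bare repository on disk plays the part of the github repo.
git init --quiet --bare "$scratch/remote.git"

# Our local repository, with one commit on a branch called master.
git init --quiet "$scratch/local"
cd "$scratch/local"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
echo "Hello World!" > first_file.txt
git add first_file.txt
git commit --quiet -m "my first commit!"

# Point origin at the stand-in remote, then push and set up tracking.
git remote add origin "$scratch/remote.git"
git push --quiet -u origin master

# List local branches along with the remote branch each one tracks.
git branch -vv
```

After this, a plain git push from the same branch goes to origin master without any extra arguments.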
At the push, you will be prompted for a username and password. Enter the ones you used to create your github account.
If the git push was successful, then you’ll see information from git letting you know how many objects (read: files) were included in that git push. You’ll also see information relating to compression of the files, the url used, and the branch it was pushed to.
Now we can look at github to see our changes.
We’ll notice a few things here. One is that we can see the number of commits present for this repo. We also can see the number of branches associated with it. We also know the number of tagged releases and the number of contributors active. Github has handy utilities for us to clone or download (via tarball) the whole git repo. We can see the timeline on the last git commit and how long ago it was.
Let’s look at the ‘Insights’ tab and see what else is present on this repo.
Well, we can see that github defaults to telling us of the community aspect. This is useful for open source members to gauge popularity and use of a codebase, but for us it’s not so exciting. In fact, it’s not really important outside of the open source community. What is cool is the ‘Network’ button.
It may not look it yet, but this network will be really interesting as time goes on. We can use it to visualize branches by seeing where features came out of, where they got merged in the ‘master’ branch, and what users wrote them. For now, this is what it looks like. If we hover over a node in the network, we can see what commit that node corresponds to.
If we click it, we’ll be taken to our git commit message and a diff.
Wow, doesn’t that look much better? The terminal can be confusing to use and many people learn to take a strong liking to visualizing code data with tools from a git repo company like Github, Bitbucket, or Gitlab. The information here is identical to running git diff or git log <hash_id> but is presented in a prettier format here. All online git hosting companies will have some feature like this.
We can also scroll down to the bottom of the page and put comments on code. This is invaluable if you are an organization: a product manager can make comments on people’s committed code during a code review to let them know where they agree or disagree. We’ll get into more detail on this in a later chapter.
Summary & Wrap-Up
So that’s your first created git repo! It may seem long and arduous, but as you commit code back and forth, you’ll grow to love how git allows for easy branches, easy merging, and quick reversions in case something goes wrong. By using powerful visualizations like ‘Network’ on Github, we can better leverage these features, creating wonderful images that really show how our code base is coming along instead of just guessing.
And that’s that!
If you liked this post, please spread the good news! You might also like my full length book, which includes this tutorial along with four others and includes two free ebooks on using git in practice. You’ll learn all the hard parts of git involving branching, rebasing, reverting, and working with multiple developers.
There’s a follow up to this post as well on operating as a solo developer.
The Why
Q. Why use git at all? Why not use “flat foot” files?
Keeping metadata like who made what change when can get difficult quickly when using flat files. It creates bulk if kept as a series of comments and can be almost impossible to enforce if kept as separate text files. By using git, you’ll always know that there are logs and you’ll always be able to find the person to blame. All this is accounted for and can be accessed on a need-be basis instead of having to always look at some marker in the text file that can change.
It also makes one-line changes safe. Not discussed here but in later chapters, commands like git revert and git reset can be used to quickly undo a commit or move the current local git project back to another hash, which can save the day when that one line of code change causes the whole server to tank!
Q. Why doesn’t git like binaries? Why only code?
Git uses special algorithms under the covers to track changes. These rely on only storing changes when necessary. Git doesn’t bother keeping 6 full versions of a text file for 6 different commits, but instead can store just the changes in lines. This allows for small code bases and quicker changes.
Binaries on the other hand are very large and can change drastically with a small change. So git has to spend a lot of effort with them. Therefore, it just keeps a different version of the binary for every commit. That’s not good: repos will fill up quickly if you keep piling in binaries. Everybody who commits to that repo will be carrying around a very large binary in their source code base as a result, resulting in slower performance and much frustration.
Q. Why use online repositories? What about hosting your own, and how do they differ?
Online repositories allow you to keep copies of your code base for backup as well as collaboration. Different users can now make changes to the code base to enable good collaborative efforts. This gets complicated, so a future chapter of the book is reserved for this.
The different online repositories differ largely on look and feel and less on functionality these days, though many on stackoverflow and other sites might disagree with the author on this.
Some, especially large companies, will choose to host their own code repos entirely. Bitbucket is one such provider; you can also just use their service if you don’t have the need to host it yourself. Gitlab is an open source alternative that will allow for relatively quick setup.
Of particular interest is Gogs, which is simple to set up and will run on limited hardware like a Raspberry Pi. Setting one up isn’t covered in this book, but you can set up a git repo using guides like this one if you’re interested.
Q. Why use the command line and not a GUI?
There are GUIs like SourceTree and GitKraken that one can use instead of the terminal client, but I have never found them useful.
They often rely on extreme standardization of git practices. That means if you use git in any way outside of how they expect it, then they’ll melt down and can virtually destroy your git repo by making it impossibly convoluted. Sometimes mistakes happen in your git code repos and it is a nightmare trying to do the proper surgical procedures. It’s impossible if your GUI is working against you.
Also, they can trap you. SourceTree is an Atlassian product along with Bitbucket and there’s little help for you if you’re not using Bitbucket. GitHub Desktop is a Github product and it’s likewise true with Github. Thus leaving for another code repo provider can be impossible in such situations.
Lastly, some, like Gitkraken, are not free if you’re using it privately. As you come to rely on tools, it can be annoying to realize you’ll have to pay at some point. Git, by the way, is totally open source under the GPL v2 license.
Q. Why aren’t you using git commit -a?
Some guides, including git’s suggestions, say to use git commit -a. This will stage every modified or deleted file that git already tracks and then commit them; brand new, untracked files are left alone. That’s not always helpful if you’re trying to be fine-tuned about your git committing. It can, in some instances, be necessary if you have a large number of files to commit. In our case, this wasn’t the situation so we instead chose to commit one by one.
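Here is a small sketch of what -a does and doesn’t pick up, using a throwaway repo and a placeholder identity. The second file, notes.txt, is a hypothetical extra that never gets added.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
echo "Hello World!" > first_file.txt
git add first_file.txt
git commit --quiet -m "my first commit!"

# Change the tracked file and create a brand new, untracked one.
echo "another line" >> first_file.txt
echo "scratch" > notes.txt

# -a stages the modified tracked file and commits it in one step.
git commit --quiet -a -m "update first_file.txt"

# Only the untracked file is left over; it was not part of the commit.
git status --short
```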
Footnotes
[1]: Ok, ok, this isn’t strictly true, but I find it generally true. If you want to seek out more info on SVN, CVS, and other source code management systems, I suggest you look them up for yourself. Also of note is Git’s closest antecedent in Bitkeeper. Both SVN and CVS have workarounds and addons and extensions to do what Git does, but it’s not nearly as simple or intuitive as how Git works.
[2]: Most of this ends up being personal philosophy. Some people don’t like the politics of Github while others detest error tracking in Bitbucket. To each his own!
The important difference as far as I’m concerned is what the free tier allows for: Bitbucket gives you unlimited private repos, Github gives you unlimited public repos, and Gitlab gives you unlimited both.
[3]: Brew is popular for installing packages of all sorts, including Node.js.
[4]: Actually GalliumOS on a chromebook, but its the same thing.
[5]: Spaces need to be typed out with backslashes. This means a folder called “Hello World!” looks like Hello\ World!.
[6]: In macOS and Linux, the initialization will create a hidden folder; hidden folders and files are prefixed by a period. This hidden folder is called .git and it contains a lot of additional information inside of it (Note: if you just run ls, then you won’t see it). We will rarely touch these files as git does most of the work. I’ll go into them on a need-by-need basis. To satisfy any curiosity you might have, I’ll explore the folder.
[7]: This will be gone over in later posts but I’ll briefly describe it: The idea is that you have a local in-progress copy of the code base on your personal computer that is separate from the master copy that is your running application. From time to time, you’ll make the master copy of your application what’s currently on your personal computer.
[8]: There are quite a few ways to do this actually. Another related command is git rebase, which replays a branch’s commits on top of a different commit or branch. Managing multiple branches is a key concept in Git and will be explored more fully in the chapters to come.
[9]: Git does have a default text editor, which is sometimes GNU Nano and other times vi, both of which I dislike personally. If your text editor can be called from the terminal, then it can be set here.