A Visual Introduction to Git

Local and remote repositories, data movement between layers, and options for hosting repositories

Danny Lee
The Startup
7 min read, Jun 6, 2020


source: Presentations — A Visual Introduction to Git (pdf)

So, how does a git repo on my computer get to a remote repo like Azure, Bitbucket, GitHub or GitLab?

If we abstract out all the network technology, it’ll look like this. Our git repo is stored on our personal computer, which is connected to a router in our home, work, school or café via an Ethernet cable or Wi-Fi. The router connects to a modem, which, in turn, connects to an Internet Service Provider (ISP). That ISP connects to many other networks, which together make up the Internet. Somewhere on the other side of that network connection are our remote git repository’s servers and storage devices.

On one side is our Local Repository. That’s just a fancy name for a designated file folder on our computer. This folder contains a hidden .git sub-folder (aka sub-directory), which holds git’s own data, including the repo-level settings in .git/config (the user-wide settings live in a separate .gitconfig file in your home folder). At the top level there just might be a .gitignore file, along with all your program files and folders.
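To make that folder layout concrete, here is a minimal sketch you can run in a terminal. The folder name my-project, the user name "Ada Example" and the file app.py are made up for illustration, and everything happens in a temporary folder so nothing on your machine is touched.

```shell
set -e
# Hypothetical scratch location (an assumption, not from the article).
repo="$(mktemp -d)/my-project"
mkdir -p "$repo" && cd "$repo"

git init -q                          # creates the hidden .git sub-folder
git config user.name "Ada Example"   # stored in .git/config (repo-level setting)
echo "*.log" > .gitignore            # top-level list of files git should ignore
echo 'print("hello")' > app.py       # an ordinary program file

ls -A                                # lists .git, .gitignore and app.py
git config --local --list            # prints the settings stored in .git/config
```

Everything git knows about the project lives inside .git; delete that one sub-folder and you are left with a plain folder of files.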

source: Presentations — A Visual Introduction to Git (pdf)

On the other side is our Remote Repository, which may or may not match the file and folder structure of our Local repo.

What’s going on Locally?

Let’s focus on our machine (laptop/desktop). On our local machine, git keeps track of changes in our Local repo by allocating different workspaces to help us classify and manage our changes. This keeps our changes neatly organized, lets us work on multiple threads in parallel, and tracks our changes along the way so we can undo and redo them.

THE WORKSPACE

Let’s start with the Workspace, or Working Directory. This is where you make changes when you use your text editor to write code. It is the copy of your files that git watches to track the changes you are currently making.

source: Presentations — A Visual Introduction to Git (pdf)

THE STAGING AREA

Next, let’s talk about Staging/Index.

This is where we put changes that we would like to keep; they will go into our next commit. It’s like a save-point in a video game. You can continue to work in the Workspace after you save, but if you decide you want to return to that save-point, it’s very easy.
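Here is a small sketch of that save-point idea, assuming a throwaway repo and a made-up file called notes.txt. The two-letter codes printed by git status --short show where a change currently lives: the first column is the Staging area, the second is the Workspace.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"   # hypothetical scratch repo
git init -q

echo "first draft" > notes.txt
git status --short          # "?? notes.txt": only in the Workspace, untracked

git add notes.txt           # copy the current content into the Staging area
git status --short          # "A  notes.txt": staged for the next commit

echo "second thought" >> notes.txt
git status --short          # "AM notes.txt": staged copy plus a newer, unstaged edit

git checkout -- notes.txt   # return to the save-point: restore the staged copy
cat notes.txt               # first draft
```

Note that the last step throws the unstaged edit away for good, so it is worth a git status check before using it on real work.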

source: Presentations — A Visual Introduction to Git (pdf)

But, you can only work on one copy of the folder at a time. So, what happens if you’re working on something and your boss comes over and she gives you a list of must-do-right-nows?

Git gives you a way to put away your other work, get back to a clean slate (that previous save-point), take care of your must-do-right-nows and later on, bring back the work you stashed away.

It’s got a clever name too:

THE STASH

Stuff you stash in the stash can be pulled out later and reapplied in many ways! Speaking of many, git doesn’t just store one stashed change; it can hold several.

They are stored as a Stack-like structure. The most recently stashed change will go to the top of the stack (i.e. stash@{0}).

Photo credits: code screen by Pixabay from Pexels, card catalog (modified) by Anastasia Shuraeva from Pexels

Some example commands [docs]:

  • git stash push -m "some text": saves a stash with the message "some text" (older tutorials use git stash save "some text", which is now deprecated)
  • git stash pop: reapplies the most recent stash to the working dir. and removes it from the stack
  • git stash apply: reapplies the most recent stash to the working dir. but leaves it on the stack
  • git stash list: shows the stored stashes, with their messages
  • git stash branch add-new-branch stash@{2}: creates a new branch from the specified stash
  • git stash drop stash@{1}: deletes the specified stash
  • git stash clear: deletes all stashes
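Putting a few of these together, the boss-walks-over scenario might look roughly like this. It is only a sketch in a throwaway repo; the file names, messages and the user "Ada Example" are invented for the example.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"   # hypothetical scratch repo
git init -q
git config user.name "Ada Example"
git config user.email "ada@example.com"

echo "v1" > report.txt
git add report.txt && git commit -q -m "Add report"

echo "half-finished idea" >> report.txt    # work in progress
git stash push -q -m "wip: report idea"    # put it away; the Workspace is clean again
git status --short                         # prints nothing

echo "urgent fix" > hotfix.txt             # the must-do-right-now
git add hotfix.txt && git commit -q -m "Urgent fix"

git stash list                             # stash@{0}: On <branch>: wip: report idea
git stash pop -q                           # bring the stashed work back, drop the entry
git status --short                         # " M report.txt"
```

If you would rather keep the entry around after reapplying it, swap git stash pop for git stash apply.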

We spend the bulk of our time in the Working Directory (or workspace). When we have changes that are almost ready, we can add them to the Staging (or Index) area. And if something urgent comes up while we’re in the middle of our work, we can use the Stash to set our changes aside without losing them.

LOCAL REPOSITORY

Now, that leaves us with just one last piece of the puzzle, the Local Repository.

Git’s M.O. is to snapshot the tracked files in our project, saving the:

  • state of the files
  • date
  • author
  • commit message
  • a unique hash (sequence of letters and numbers)

Git then stores the snapshot in the Local repository, and tracks past and future changes.
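You can see each of those pieces on a real commit. The sketch below makes one commit in a throwaway repo (the file readme.txt and the author "Ada Example" are made up) and then asks git to print the fields listed above.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"   # hypothetical scratch repo
git init -q
git config user.name "Ada Example"
git config user.email "ada@example.com"

echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"

# One line per field: unique hash, author, date and commit message.
git log -1 --format='hash:    %H%nauthor:  %an <%ae>%ndate:    %ad%nmessage: %s'

# The state of the files recorded in this snapshot.
git ls-tree -r --name-only HEAD     # readme.txt
```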

An Analogy

Think of it like taking a photo with a Polaroid camera. You can look through the viewfinder and change the composition all you want (coding changes), but until you hold the camera steady (git add) and press down that button (git commit), there’s no snapshot. After it prints and develops, if you write the date and photographer on it and put it in order in an album, you know what comes before and after that particular snapshot in time.

Once we commit our changes to the Local Repository, the snapshot is saved. Git will track what changed in the directory from past snapshots, and also what changes in the future, relative to the current snapshot.

But it’s still on your laptop (local computer). Your teammates can’t access your changes.

REMOTE REPOS

Which brings us to Remote repositories. We can take the changes on our local machine and send an exact copy to a remote storage location: the remote repository. From there, others can access it, upload their own changes and view the history of all the changes.

Later, when our teammates have completed their changes, we can download those changes into our local repository (git fetch), or download them and merge them with the work in our Workspace in one step (git pull).
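You don’t need an account anywhere to try this out. In the sketch below, a bare repository in a temporary folder stands in for the hosted remote, and a second clone plays the part of a teammate. All paths and names here are assumptions made for the example.

```shell
set -e
base="$(mktemp -d)"                        # hypothetical scratch location
git init -q --bare "$base/remote.git"      # stand-in for a hosted remote repo

# Our machine: clone, commit locally, then send the commit to the remote.
git clone -q "$base/remote.git" "$base/mine"
cd "$base/mine"
git config user.name "Ada Example"
git config user.email "ada@example.com"
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"
git push -q origin HEAD                    # Local Repository -> Remote

# A teammate's machine: a fresh clone now contains our commit.
git clone -q "$base/remote.git" "$base/teammate"
git -C "$base/teammate" log --oneline      # one line ending in "Add feature"
```

With a real host, the only thing that changes is the address you clone from and push to.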

Data Flow Between Layers

A really confusing part of git is understanding how commands like add, commit, push and pull move data between the layers of the Local Repository and the Remote.

For some time after beginning to use a Version Control System like git, it’s not really clear where data is being saved and how these different storage areas come together to make our lives easier.
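One way I find helpful to think about it: each command moves data exactly one hop. The sketch below walks a change out to a stand-in remote (a bare repo in a temporary folder) and then brings a teammate’s change back in. The names and paths are invented for illustration.

```shell
set -e
base="$(mktemp -d)"                    # hypothetical scratch location
git init -q --bare "$base/remote.git"  # stand-in for the Remote
git clone -q "$base/remote.git" "$base/mine" && cd "$base/mine"
git config user.name "Ada Example"
git config user.email "ada@example.com"

echo "one" > file.txt                  # the change lives in the Workspace
git add file.txt                       # Workspace -> Staging area
git commit -q -m "One"                 # Staging area -> Local Repository
git push -q -u origin HEAD             # Local Repository -> Remote

# Simulate a teammate pushing a second commit from another clone.
git clone -q "$base/remote.git" "$base/teammate"
( cd "$base/teammate"
  git config user.name "Bo Example"
  git config user.email "bo@example.com"
  echo "two" >> file.txt
  git commit -q -am "Two"
  git push -q origin HEAD )

git fetch -q origin                    # Remote -> Local Repository (Workspace untouched)
cat file.txt                           # still just "one"
git merge -q --ff-only '@{u}'          # Local Repository -> Workspace
cat file.txt                           # now "one" and "two"
```

Running git pull would do the last two hops (fetch, then merge) in a single command.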

They say a picture is worth a thousand words, and once I saw this diagram at sselab.de, git became much more understandable. I’ve designed my own version, with muted colors, using PowerPoint’s wonderful shapes 😊:

source: Presentations — A Visual Introduction to Git (pdf)

Git Remote Repository Hosting

source: Presentations — A Visual Introduction to Git (pdf)

LOCAL GIT REPOSITORY

As you can see from this illustration, there’s a way to host your remote git repository locally. There are many reasons you might want to do this, besides just the curiosity of learning how it works. The major reason I can see is that an individual, organization or company wants complete control over the security and availability of their data. You don’t even have to connect a self-hosted repository to the Internet or any outside network, and it can be available only to those on the local network. Other reasons might include being able to move all your data at once (as a hard drive), rather than having to clone each repository from an online, remote resource. You also don’t need to worry about companies shutting down or altering their terms of service, although keeping the git server software itself maintained and up to date then becomes your concern.
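At its very simplest, a self-hosted remote is just a bare repository sitting in a folder that other machines can reach. Here is a rough sketch, assuming a temporary folder that stands in for the server; on a real setup the path might be a shared drive or an SSH address such as user@server:/srv/git/project.git (both are hypothetical examples).

```shell
set -e
base="$(mktemp -d)"                      # hypothetical stand-in for the server's disk
mkdir -p "$base/srv"
git init -q --bare "$base/srv/project.git"

# An existing local repo that we want to publish to our own server.
mkdir "$base/work" && cd "$base/work"
git init -q
git config user.name "Ada Example"
git config user.email "ada@example.com"
echo "data" > data.txt
git add data.txt && git commit -q -m "Initial commit"

git remote add origin "$base/srv/project.git"   # point "origin" at the self-hosted repo
git push -q -u origin HEAD                      # upload our history to it
git remote -v                                   # origin now shows the self-hosted path
git ls-remote --heads origin                    # our branch now exists on the "server"
```

The projects listed below add the nice extras on top of this, like a web interface, user accounts and access control.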

Here are some open source software projects that will allow you to get your own local git server up and running. I haven’t gone into depth, but if there’s interest I’ll do more research and put up reviews in the future.

Self-hosted Git Repository Software

HOSTED GIT REPOSITORIES

There are plenty of reasons for not wanting to maintain your own git server. Many of them are the same reasons cloud computing has taken off. Specialists taking care of your server storage, maintenance and backup generally do a better job at a lower cost than a generalist who has to care for the server, along with local area networking, and helping your CMO figure out why their email isn’t updating.

GitHub is certainly the most well known, and was acquired by Microsoft in 2018, so it seems that it will continue to have industry support for many years to come. But! It’s not the only managed remote repo game in town. Popular alternatives include MS Azure, Atlassian’s Bitbucket, GitLab and SourceForge.

Here’s a more complete list. As with the self-hosted options, if there’s any interest I’d be glad to look deeper into these different companies to weigh the benefits and disadvantages of each. If you’re interested, let me know!

Well, that’s it for this week. Thanks for reading and as always, I love to hear comments, suggestions and applause 😉.
