Getting Started with Git

Why Git

Git is a distributed version control system (DVCS): a tool that records changes to files over time so that you can return to earlier versions. In a distributed version control system, the whole source code, including its complete history, is copied to each developer’s local machine. The figure below shows the concept of distributed version control systems. One advantage of a DVCS over a centralized version control system is that if the data on the server is lost for some reason, it can be recovered from any developer’s local copy. An advantage over a local version control system is that it allows seamless collaboration, and because every copy holds the full history, you can keep working even when you are offline.

Distributed version control system. (source)

How does git work

As shown in the figure below, git tracks the files in a folder by taking a snapshot of all of them each time you commit. The figure shows five snapshots because we committed five times. If a file has not changed since the last commit, git does not store it again; instead, the new snapshot stores a link to the identical file already stored by the previous commit.

Different snapshots of data stored in git over time (source)
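Git can tell that a file is unchanged because it names every stored file by a hash of its content. The sketch below, assuming git's default SHA-1 object format (newer versions can also use SHA-256), computes the ID that git hash-object would print for a file with no line-ending conversion; the function name git_blob_hash is our own. Identical bytes always give the identical ID, which is what lets a new snapshot point at an object that is already stored.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    # Git prefixes the content with the header "blob <size in bytes>\0"
    # and takes the SHA-1 of header + content (default SHA-1 object format).
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = git_blob_hash(b"This is the first time I changed this file.")
same = git_blob_hash(b"This is the first time I changed this file.")
edited = git_blob_hash(b"This is the first time I changed this file.\n"
                       b"This is the second change in this file.")

print(first == same)       # True: same content, so the stored object can be reused
print(first == edited)     # False: new content, so git stores a new object
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty file)
```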
Files in a git directory move between three stages:
  • Modified: A file is in this stage when it has been changed but the change is not yet staged or committed. A file in the modified stage can be tracked or untracked.
  • Staged: A file is in this stage when you have changed it and marked the change as ready to commit. A file in this stage is tracked.
  • Committed: A file is in this stage when all its changes have been committed. The changes are stored as a snapshot that you can access later.
Different stages of tracked and untracked files in git (source)
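The command git status --porcelain reports the same stages in a compact, script-friendly form: one line per file that is not cleanly committed, starting with two status characters, X for the staging area and Y for the working directory (?? marks an untracked file, and !! an ignored one when --ignored is given). The sketch below maps those characters onto the stages above. classify is a hypothetical helper that handles only the simple "XY path" form (not renames or -z output), and the sample lines are what we would expect for the state reached later in the Tracking files section, not captured output; notes.txt is a made-up extra file. Committed, unchanged files do not appear in the output at all.

```python
def classify(line: str) -> tuple[str, list[str]]:
    # Porcelain v1 lines have the form "XY path": X is the staging-area status,
    # Y is the working-directory status, and one space separates them from the path.
    # Renamed entries ("R  old -> new") are not split in this sketch.
    x, y, path = line[0], line[1], line[3:]
    if (x, y) == ("?", "?"):
        return path, ["untracked"]
    if (x, y) == ("!", "!"):
        return path, ["ignored"]
    states = []
    if x != " ":
        states.append("staged")    # the staging area differs from the last commit
    if y != " ":
        states.append("modified")  # the working directory differs from the staging area
    return path, states


# Assumed porcelain lines; notes.txt is a hypothetical untracked file.
for entry in ["AM README.md", " M python_code.py", "?? notes.txt"]:
    print(classify(entry))
```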

One-time setup

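On Windows, we will use Git Bash as the command line interface for git. Git Bash emulates a bash environment on Windows and lets you use all git features from the command line, plus most standard UNIX commands. You can download Git Bash using this link. Once it is installed, set the name and email address that git records as the author of every commit, and make main the default branch name for new repositories: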

git config --global user.name "FirstName LastName"
git config --global user.email "email_address@gmail.com"
git config --global init.defaultBranch "main"
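git config --global stores these settings in a plain-text file in your home directory, usually ~/.gitconfig, and git config --global --list prints them. The sketch below shows what that file typically contains after the three commands (an assumed sample, not copied from a real machine) and reads it with Python's configparser. configparser copes with this simple subset of git's config syntax but not all of it (for example git's escape sequences and subsection quoting), so treat it as an illustration only.

```python
import configparser

# Assumed contents of ~/.gitconfig after the three commands above
# (git writes each key on its own tab-indented line under its section).
GITCONFIG = """\
[user]
\tname = FirstName LastName
\temail = email_address@gmail.com
[init]
\tdefaultBranch = main
"""

# interpolation=None keeps configparser from treating "%" in values specially.
config = configparser.ConfigParser(interpolation=None)
config.read_string(GITCONFIG)

# configparser lower-cases key names; git treats key names case-insensitively too,
# so "defaultBranch" and "defaultbranch" find the same setting.
print(config["user"]["name"])           # FirstName LastName
print(config["init"]["defaultBranch"])  # main
```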

Set up a new repository

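Let’s fire up the Git Bash command line interface, create a new directory at a location of your choice with the mkdir command, and change into the new git_repository directory. Inside it, git init creates an empty repository (a hidden .git folder that stores the history). Next, create a file python_code.py with touch and add the code shown below to it in a text editor. Finally, git status shows that the repository has no commits yet and that python_code.py is untracked.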

mkdir "E:/git_repository"
cd "E:/git_repository"
git init
touch python_code.py

# python_code.py (add this code in a text editor)
def git_tutorial():
    print("This is the first edit in this file")

if __name__ == "__main__":
    git_tutorial()

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        python_code.py

nothing added to commit but untracked files present (use "git add" to track)

Tracking files

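Add the python_code.py file to the staging area with the git add command so that we can commit it, and check the status of the directory again. The output shows that there are no commits on the main branch yet and that there is a change in the staging area waiting to be committed. Next, create a README.md file and commit the staged python_code.py with git commit, where -m supplies the commit message. git status now lists README.md as untracked, because it was never added. After staging README.md, I edited both files, adding a line of text to README.md and a second print statement to python_code.py. The status therefore shows README.md twice: as a staged new file (the empty version that was added) and as modified but not staged (the later edit). Finally, git add --all stages every change, and the second commit records them.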

git add "python_code.py"
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   python_code.py

touch README.md
git commit -m "Initial commit to python_code.py"
git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README.md

nothing added to commit but untracked files present (use "git add" to track)

git add "README.md"
git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   README.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md
        modified:   python_code.py

git add --all
git commit -m "the second commit in directory"
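The staging area sits between the working directory and the commit history. The sketch below is a toy model of those three areas, not how git stores data (git keeps hashed objects inside .git, not Python dictionaries); ToyRepo and its methods are hypothetical names, and the file contents are placeholders. Replaying the steps above reproduces the statuses git printed: README.md is untracked after the first commit, then both staged and modified.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    # Hypothetical toy model of git's three areas.
    working: dict[str, str] = field(default_factory=dict)        # working directory
    index: dict[str, str] = field(default_factory=dict)          # staging area
    commits: list[dict[str, str]] = field(default_factory=list)  # one snapshot per commit

    def head(self) -> dict[str, str]:
        # Snapshot of the latest commit (empty before the first commit).
        return self.commits[-1] if self.commits else {}

    def add(self, path: str) -> None:
        # Like `git add <path>`: copy the file's current content into the staging area.
        self.index[path] = self.working[path]

    def add_all(self) -> None:
        # Like `git add --all`, except that this toy ignores deleted files.
        for path in self.working:
            self.add(path)

    def commit(self) -> None:
        # Like `git commit`: record a snapshot of everything in the staging area.
        self.commits.append(dict(self.index))

    def status(self) -> dict[str, list[str]]:
        result: dict[str, list[str]] = {}
        head = self.head()
        for path in sorted(set(self.working) | set(self.index)):
            if path not in self.index:
                result[path] = ["untracked"]
                continue
            states = []
            if self.index[path] != head.get(path):
                states.append("staged")    # staging area differs from the last commit
            if self.working.get(path) != self.index[path]:
                states.append("modified")  # working directory differs from staging area
            if states:
                result[path] = states
        return result


repo = ToyRepo()
repo.working["python_code.py"] = "one print"
repo.add("python_code.py")
repo.working["README.md"] = ""
repo.commit()
print(repo.status())  # {'README.md': ['untracked']}
repo.add("README.md")
repo.working["README.md"] = "first change"
repo.working["python_code.py"] = "two prints"
print(repo.status())  # {'README.md': ['staged', 'modified'], 'python_code.py': ['modified']}
```

Calling repo.add_all() followed by repo.commit() then leaves repo.status() empty, matching git's clean working tree.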

Ignoring files from tracking

Let’s create a file Cred.txt with the touch command; this file will store the author’s credentials. You don’t want git to track this file, because it will contain confidential information. You can tell git to ignore it by listing it in a .gitignore file: create .gitignore, open it, and write Cred.txt on its own line. Then stage and commit the .gitignore file. After the commit, git status reports a clean working tree: Cred.txt is not listed as untracked, because git ignores it.

touch .gitignore
touch Cred.txt
# add the line Cred.txt to .gitignore (or type it in a text editor)
echo "Cred.txt" >> .gitignore
git add .gitignore
git commit -m "Committing .gitignore file"
git status
On branch main
nothing to commit, working tree clean
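Git decides what to ignore by matching each path against the patterns in .gitignore. The function below is a deliberately simplified sketch of that matching (is_ignored is our own name, not a git API), assuming the .gitignore sits in the repository root: a pattern without a slash matches a file name at any depth, and a pattern containing a slash is matched against the path from the root. It leaves out negation (!), directory-only patterns with a trailing slash, ** and other rules, and Python's fnmatch lets * cross slashes, which git does not. To ask git itself, run git check-ignore -v Cred.txt.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    # Simplified sketch of .gitignore matching; see the limitations noted above.
    name = PurePosixPath(path).name
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments are skipped, as in git
        if "/" in pattern:
            # Anchored pattern: match the whole path from the repository root.
            if fnmatchcase(path, pattern.lstrip("/")):
                return True
        elif fnmatchcase(name, pattern):
            # Unanchored pattern: match the file name at any depth.
            return True
    return False


patterns = ["# secrets", "Cred.txt", "*.log"]
print(is_ignored("Cred.txt", patterns))         # True
print(is_ignored("config/Cred.txt", patterns))  # True: no slash, so any depth matches
print(is_ignored("python_code.py", patterns))   # False
```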

Inspecting changes to files

Until now we have used git status to see which files have changed. To see what changed within a file, we can use git diff. Let’s edit python_code.py and README.md: I added print("This is the third edit in this file") to python_code.py and the line This is the second change in this file. to README.md. Then stage only the changes in python_code.py and check the status. Run without arguments, git diff shows changes that are not yet staged (the working directory compared with the staging area), so it lists only README.md; git diff --staged shows changes that are staged (the staging area compared with the last commit), so it lists only python_code.py.

git add "python_code.py"
git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   python_code.py

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md

git diff
diff --git a/README.md b/README.md
index 1ee9ca2..abb2fc3 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
-This is the first time I changed this file.
\ No newline at end of file
+This is the first time I changed this file.
+This is the second change in this file.
\ No newline at end of file

git diff --staged
diff --git a/python_code.py b/python_code.py
index 0d7c240..134a401 100644
--- a/python_code.py
+++ b/python_code.py
@@ -1,6 +1,7 @@
 def git_tutorial():
     print("This is the first edit in this file")
     print("This is the second edit in this file")
+    print("This is the third edit in this file")
 
 if __name__ == "__main__":
     git_tutorial()
\ No newline at end of file
  • diff --git a/README.md b/README.md marks a diff in git’s extended format. a/README.md and b/README.md are the versions of README.md before and after the change; a/ and b/ are imaginary directory prefixes that tell the two versions apart.
  • index 1ee9ca2..abb2fc3 gives abbreviated hashes of the file’s content before and after the change. The code 100644 is the “mode bits,” indicating that this is a regular file (not executable and not a symbolic link).
  • --- a/README.md and +++ b/README.md assign a symbol to each version: lines prefixed with - are in the a/ version but missing from the b/ version, and lines prefixed with + are missing from the a/ version but present in the b/ version.
  • @@ -1 +1,2 @@ is the hunk header. -1 means the hunk starts at line 1 and covers one line of the a/ version of README.md (the count is omitted when it is 1), and +1,2 means it starts at line 1 and covers two lines of the b/ version.
  • -This is ... shows content from the a/ version and +This is ... shows content from the b/ version of README.md. The first line appears as removed and re-added because it gained a newline at its end; the \ No newline at end of file marker records that the last line of a version has no trailing newline.
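Python’s standard difflib module writes the same unified diff format, which makes it easy to experiment with hunk headers. The sketch below rebuilds the README.md hunk and parses its header; parse_hunk_header is a hypothetical helper. One difference from git’s output: difflib does not track a missing newline at the end of a file, so it shows the first line as unchanged context rather than as removed and re-added.

```python
import difflib
import re

# README.md before and after the second change (lines without trailing newlines).
old = ["This is the first time I changed this file."]
new = ["This is the first time I changed this file.",
       "This is the second change in this file."]

diff = list(difflib.unified_diff(old, new, "a/README.md", "b/README.md", lineterm=""))
print("\n".join(diff))

# "@@ -start[,count] +start[,count] @@"; a missing count means exactly one line.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header: str) -> tuple[int, int, int, int]:
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (int(old_start), int(old_count or "1"),
            int(new_start), int(new_count or "1"))


print(parse_hunk_header(diff[2]))  # (1, 1, 1, 2)
```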

Undoing changes

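We can use the git restore command to undo changes we don’t want to commit. In the last section, we added a print statement to python_code.py and a line of text to README.md. Let’s undo these changes. git restore <file> discards unstaged changes by overwriting the file in the working directory with its staged version. git restore --staged <file> unstages a file, resetting its staged version to the last commit while keeping the edit in the working directory. Because the change to python_code.py was staged, we first unstage it and then discard it.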

git restore README.md
git restore --staged python_code.py
git restore python_code.py
git status
On branch main
nothing to commit, working tree clean
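Both forms of git restore copy a file’s content from one area to another: --staged copies from the last commit into the staging area, and the plain form copies from the staging area into the working directory. The sketch below models this with dictionaries for the last commit (head), the staging area (index) and the working directory; the function names and placeholder contents are hypothetical and only mirror the three commands above.

```python
def restore_staged(head: dict[str, str], index: dict[str, str], path: str) -> None:
    # Like `git restore --staged <path>`: reset the staged copy to the last commit.
    # The working directory is not touched, so the edit survives as unstaged.
    if path in head:
        index[path] = head[path]
    else:
        index.pop(path, None)  # a file that was never committed becomes untracked again


def restore_worktree(index: dict[str, str], working: dict[str, str], path: str) -> None:
    # Like `git restore <path>`: overwrite the working copy with the staged copy.
    # Raises KeyError for a path that is not in the staging area.
    working[path] = index[path]


# Placeholder contents for the state at the end of the previous section.
head = {"python_code.py": "two prints", "README.md": "one line"}
index = {"python_code.py": "three prints", "README.md": "one line"}
working = {"python_code.py": "three prints", "README.md": "two lines"}

restore_worktree(index, working, "README.md")       # git restore README.md
restore_staged(head, index, "python_code.py")       # git restore --staged python_code.py
restore_worktree(index, working, "python_code.py")  # git restore python_code.py
print(working == index == head)  # True: nothing left to commit
```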

Checking commit history

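You can check the history of commits with the command git log. You can pass arguments to this command to customize the output; for instance, --pretty=oneline shows each commit on a single line: the full commit hash followed by the commit message. To look at the files as they were at an earlier commit, pass that commit’s hash, or any unambiguous prefix of it, to git checkout. This leaves you in a “detached HEAD” state; git switch - returns you to the branch you were on before.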

git log --pretty=oneline
a28c1d2c61a72ec31457a25f676422dc03ec1b9e (HEAD -> main) Committing .gitignore file
6dfd8c4e0a864bd455f6c20ea0645af3f40585e1 the second commit in directory
646bc1346e4aacaaf115b16194af8035a71d6ffa Initial commit to python_code.py
git checkout 646bc1346e
git switch -
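git checkout 646bc1346e works because git accepts any unambiguous prefix of a commit hash that is at least four characters long. The sketch below splits --pretty=oneline output into hashes and messages and resolves a prefix against them, using the hashes from the log above. parse_oneline and resolve_prefix are our own names, and unlike git, which checks the prefix against every object in the repository, this only searches the listed commits.

```python
LOG = """\
a28c1d2c61a72ec31457a25f676422dc03ec1b9e (HEAD -> main) Committing .gitignore file
6dfd8c4e0a864bd455f6c20ea0645af3f40585e1 the second commit in directory
646bc1346e4aacaaf115b16194af8035a71d6ffa Initial commit to python_code.py
"""


def parse_oneline(text: str) -> list[tuple[str, str]]:
    # Each line is "<40-character hash> <decorations and message>".
    entries = []
    for line in text.splitlines():
        if line.strip():
            commit_hash, _, message = line.partition(" ")
            entries.append((commit_hash, message))
    return entries


def resolve_prefix(prefix: str, hashes: list[str]) -> str:
    # Require at least 4 hex digits and exactly one matching hash.
    prefix = prefix.lower()
    if len(prefix) < 4:
        raise ValueError("abbreviated hashes need at least 4 characters")
    matches = [h for h in hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} commits")
    return matches[0]


commits = parse_oneline(LOG)
print(resolve_prefix("646bc1346e", [h for h, _ in commits]))
# 646bc1346e4aacaaf115b16194af8035a71d6ffa
```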

Sharing your work

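Until now, we have worked on the local computer, and all the files were stored on the local disk. You can share your repository with your colleagues using GitHub. First, create a remote GitHub repository using the self-explanatory instructions given here. Then git remote add registers that repository under the short name origin, git push origin main uploads your local main branch to it, and git pull origin main fetches your colleagues’ commits and merges them into your local branch. A colleague can get their own copy of the repository, with its full history, by passing its URL to git clone.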

git remote add origin https://github.com/YOUR-USERNAME/YOUR-REPOSITORY-NAME.git
git push origin main
git pull origin main
git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY-NAME.git

Conclusion

In this article we learned about git and GitHub. We looked at how git works and set it up using Git Bash. We also learned how to set up local and remote repositories and the commands used to control versions of files in a local repository. Finally, we learned the commands used to ignore files, inspect changes, undo changes, and check the commit history.

Pushpendra Sharma

MASc | IIT | Data Science Practitioner | Machine Learning Engineer