GitHub for Biologists

Proto Bioengineering
9 min read · Nov 15, 2022

Or, How to Stop Saving 100 Versions of Your Code to Your Cluttered Desktop

Do your research projects look like the folder below? Do you have 5 copies of the same file with different versions of your code?

Do you comment out huge chunks of code just in case you’ll need them later?

Hint: it’s missing an `int` somewhere.

Coding wasn’t supposed to be so messy, was it? Didn’t we leave the chaos of pipettes and petri dishes for the cleanliness of a laptop? How do software engineers do this every day?

Well, software engineers actually don’t. They solved this code clutter problem a long time ago with “Git.”

Photo by Yancy Min on Unsplash

Git is a tool that keeps track of all the versions of your code. It stores every change you’ve committed in a hidden folder named .git, out of your way. While you only see one file, cell_analysis.py, Git remembers every version you committed before it, down to the specific lines and characters that were changed.

Green lines are new code. Red lines are deleted lines of code.

Git allows you to backtrack to old versions of your code, save pieces of it for later, or switch between different versions with a few commands. That way you can make 5 different attempts at solving the same problem, and save each of them as a separate version, while keeping 1 working version as the “main” version.
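
In Git, these parallel versions are called branches. Here is a minimal sketch of the idea, assuming a Unix-style shell (Terminal on Mac, or the Git Bash program that ships with Git for Windows). The branch name attempt-1, the file contents, and the demo name and email are all made up for illustration:

```shell
# Work in a throwaway folder so nothing real is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"  # placeholder email, this repo only

# The working version
echo 'print("count cells")' > cell_analysis.py
git add cell_analysis.py
git commit -q -m "Add working cell counter"
git branch -M main                        # make sure this branch is called "main"

# Attempt 1 lives on its own branch
git checkout -q -b attempt-1
echo 'print("count cells faster")' > cell_analysis.py
git add cell_analysis.py
git commit -q -m "Try a faster counting approach"

# Switch back to main: the file returns to the working version
git checkout -q main
cat cell_analysis.py
```

The experiment is still saved on attempt-1, and git checkout attempt-1 would bring it back.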

Is Git the Same as GitHub?

Git and GitHub are used in concert with each other, but are not the same thing.

Git is a “command line tool.” GitHub is a website.

Git is a tool on your computer that is used to track versions of your code. It can be used via the command line (shown below) or by a visual tool. If you have a code project for analyzing gecko genomes, you will make a Git “repository” (basically a folder of code) and save your gecko-analyzing code to that repository. Repositories are free, they live on your computer, and you could make a million of them if you wanted to.

GitHub is a website, built around Git, that people around the world use to showcase and collaborate on code. Bitbucket and GitLab offer the same kind of service. People take their own Git repositories and upload them to GitHub, Bitbucket, GitLab, or any similar site to share them with the world or with their coworkers and labmates, or to keep a private copy for themselves.

How to Actually Use Git

This article makes heavy use of the command line. If you’re new to the command line, check out our beginner article on using it here.

The Steps for Using Git

First, download Git.

After that, there are only 4 steps you really need to know to start.

  1. Create a Git “repository” (basically a folder of code)
  2. Tell Git which files to add to the repository
  3. Take a “snapshot” of the code files as they are
  4. Send the repository to GitHub

And every time you make a change to your code after this, you will tell Git to take a new “snapshot,” just in case you want to go back to that snapshot later. Git will keep track of all the snapshots.

We have to tell Git which files to add, because maybe we don’t want Git to accidentally copy our 100 GB database of DNA or the Word document with all of our secret research ideas and upload them to GitHub.
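
Besides naming files one by one, Git also reads a file called .gitignore that lists things it should never track. The steps in this article do not rely on it, so treat this as an optional sketch. The file names are hypothetical and a Unix-style shell (Terminal on Mac, Git Bash on Windows) is assumed:

```shell
# Work in a throwaway folder
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Hypothetical project files: one script, one huge data file, one private doc
touch analysis.py dna_database.fasta secret_ideas.docx

# One pattern per line: files Git should leave alone
printf '%s\n' 'dna_database.fasta' '*.docx' > .gitignore

git add .             # adds everything EXCEPT what .gitignore matches
git status --short    # lists what is staged for the next snapshot
```

Only .gitignore and analysis.py should be listed as staged. The database and the Word document stay on your computer.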

The 4 Steps in Code

The first time you make a repository will look like the following. Type the commands below in either Terminal (Mac) or Command Prompt (Windows).

You can pick a folder on your computer with some of your own code to try these steps (though it’s safest to make a new folder with some blank files, since you’re new to Git).

Navigate to that folder in the command line and run the following:

1. Create a Git repository.

git init

2. Tell Git which files to add to this “version” of the repository.

git add my_code1.py my_code2.py

3. Save a “snapshot” of them with a short message explaining any changes that you made. The -m means “message”.

git commit -m "I made a new git repo"

4. Push your repository to GitHub. This is the hardest part, since it requires some linking of accounts, so it is covered in detail in the section below. The git branch -M main line names your branch “main”; older versions of Git call it “master” by default, which would make the push fail.

git remote add origin git@github.com:USERNAME/REPOSITORY.git
git branch -M main
git push -u origin main

An example of some commits and their messages:

Make your commit messages more detailed than these. These are vague as an example.

The first 3 steps above are 80% of what you need to know to use Git. The 4th takes some setup at first, but after that, is as easy to do as writing git push.

The steps of Git are merely:

  1. make code changes
  2. tell Git to add the changes to the current “commit” (or snapshot)
  3. commit them

Change your code, add, commit, repeat.

After that, we can send our repository to GitHub (which is optional).
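
The loop can be sketched end to end. This assumes a Unix-style shell (Terminal on Mac, Git Bash on Windows). The file contents, commit messages, and demo identity are invented for the example, and git log --oneline is used at the end to list the snapshots:

```shell
# Work in a throwaway folder
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"  # placeholder email, this repo only

# Round 1: change, add, commit
echo 'print("load gecko genome")' > my_code1.py
git add my_code1.py
git commit -q -m "Add script that loads the gecko genome"

# Round 2: change, add, commit
echo 'print("count GC content")' >> my_code1.py
git add my_code1.py
git commit -q -m "Report GC content after loading"

# One line per snapshot, newest first
git log --oneline
```

Each line of the log starts with a short ID for that commit, followed by its message. This is why descriptive messages pay off later.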

How to Publish Code to GitHub

The 4th step from above, where we actually publish the code to GitHub, is the most complicated at first (account creation, etc.), but once the setup is done, it will only require one step:

git push

git push tells your computer to send your updated code to GitHub. Each time you make a new commit, you must also push it to GitHub for GitHub to know about the change.

To publish code to GitHub, we need to:

  1. Create a GitHub account
  2. Create a new empty repository on GitHub
  3. Tell Git on our computer where the link to our GitHub repository is (git@github.com:USERNAME/REPOSITORY.git)
  4. Use git push from the command line to send the code from our computer to GitHub
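
Steps 3 and 4 can be rehearsed without a GitHub account. In this sketch a second folder on your own computer (fake_github.git, a made-up name) stands in for the empty GitHub repository. With the real thing you would pass the git@github.com:USERNAME/REPOSITORY.git address to git remote add instead. A Unix-style shell (Terminal on Mac, Git Bash on Windows) is assumed:

```shell
work_dir="$(mktemp -d)"

# Stand-in for the empty repository you would create on GitHub
git init -q --bare "$work_dir/fake_github.git"

# Our own project with one commit in it
mkdir "$work_dir/my_project"
cd "$work_dir/my_project"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"  # placeholder email, this repo only
echo 'print("hello")' > my_code1.py
git add my_code1.py
git commit -q -m "Add first analysis script"
git branch -M main                        # make sure this branch is called "main"

# Step 3: tell Git where the remote repository lives and call it "origin"
git remote add origin "$work_dir/fake_github.git"
git remote -v                             # shows the address Git will push to

# Step 4: send the commits; -u remembers "origin main" for later plain pushes
git push -q -u origin main
```

After the first push with -u, a plain git push is enough for later commits.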

Setting up GitHub for Windows and Mac OS

These articles and videos cover the technical aspects of connecting your computer’s Git to your GitHub account:

The YouTube videos above are free and cover in-depth technical things, like authenticating with SSH (i.e., getting GitHub and your computer to trust each other so that bad guys can’t mess with your repositories).

Using Git in the Real World

Now that you have GitHub set up, here is an example of using Git and GitHub day-to-day as a scientist.

Say we have a code project that analyzes gecko genomes and all our gecko code is in a folder called Gecko Genome Analysis on our computer.

We have these files in the Gecko Genome Analysis folder:

To create a Git repo and start tracking changes to our code, we do the following:

  1. Open Terminal or Command Prompt.

2. Navigate to the Gecko Genome Analysis folder via the command line. If your Gecko Genome Analysis folder lives in your Documents folder, you can get to it in Terminal by typing cd Documents then cd Gecko\ Genome\ Analysis. (The \ symbols help the command line know that the spaces are part of the folder name.) In Command Prompt on Windows, put the name in quotes instead: cd "Gecko Genome Analysis". If you’re lost, pwd (or plain cd in Command Prompt) will tell you which folder you’re in.

Get to your Gecko code folder with `cd`, then verify that you are in the right folder with `pwd`.

3. Type git init and press Enter. This creates a new Git repo to track the current folder (Gecko Genome Analysis).

4. Type git add . to add all of our files to the staging area. The . is important: it means “the current folder” (on Windows, Mac, and Linux alike), so it tells Git to add everything in that folder.

5. Type git commit -m "First commit" to commit the files. Our message, "First commit", lets others looking at the repo know that this is the first code ever added to this repository. Later commits might say “Fixed user input” or “Added lizard analysis option.”

Now, we have our first Git repo!
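
To check what Git thinks at each stage, git status reports it. This sketch replays the same steps in a throwaway folder, assuming a Unix-style shell (Terminal on Mac, Git Bash on Windows). The names gecko.py and genome_utils.py are hypothetical:

```shell
work_dir="$(mktemp -d)"
mkdir "$work_dir/Gecko Genome Analysis"
cd "$work_dir/Gecko Genome Analysis"      # quotes handle the spaces in the name
touch gecko.py genome_utils.py

git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"  # placeholder email, this repo only
git status --short    # "??" marks files Git is not tracking yet

git add .
git status --short    # "A" marks new files staged for the next commit

git commit -q -m "First commit"
git status --short    # prints nothing: every change is committed
```

Running git status before git add or git commit is a cheap habit that prevents most surprises.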

But what if we want to make a change to our code? The steps are almost identical.

If we want to add a new file — maybe one that allows analysis of lizard genomes for comparison — we can create that file, put new code in it, then git add and git commit that file.

For example, if we make a brand new lizard.py file, we can:

Adding a new file to our Git repo via `git add/commit`.
  1. Stage it with git add lizard.py (first line on the right image above)
  2. Commit it with git commit -m "Added basic Lizard class" (second line above)

That’s it. We have made our second commit!

To send our second commit to GitHub, if we set up GitHub according to the sections above, we can write git push and it will send our commit to GitHub. (If you get an error, try git push -u origin main.)

Then, to make a change to lizard.py and send that change to GitHub as our third commit, we’ll do the same thing:

  1. git add lizard.py
  2. git commit -m "Added another function"
  3. git push
The process for making a change to our lizard code and committing it to Git.

Bonus: To see exactly what changed since the last commit, type the following after git add and before git commit:

git diff --staged

The output of this is shown in green above on the right.
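
To try the command on a throwaway file, here is a small sketch assuming a Unix-style shell (Terminal on Mac, Git Bash on Windows). The contents of lizard.py and the demo identity are invented:

```shell
# Work in a throwaway folder
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"  # placeholder email, this repo only

# First version of the file, committed
printf 'class Lizard:\n    pass\n' > lizard.py
git add lizard.py
git commit -q -m "Added basic Lizard class"

# Edit the file, then stage the edit (but do not commit yet)
printf 'class Lizard:\n    legs = 4\n' > lizard.py
git add lizard.py

# Compare the staged file with the last commit:
# lines starting with "-" were removed, lines starting with "+" were added
git diff --staged
```

Once the change is committed, git diff --staged prints nothing, because nothing is waiting in the staging area.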

How to Undo Changes in Git

This article from Atlassian covers git reset, the command for undoing changes in Git.

There are a few ways to use git reset, such as git reset --soft, which uncommits things but lets you keep all the code. And there is git reset --hard, which discards the unwanted commits along with their code and any uncommitted changes to tracked files. More details and options for backtracking to old code are available from Atlassian.
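
The difference between the two is easiest to see on a throwaway repository. This sketch assumes a Unix-style shell (Terminal on Mac, Git Bash on Windows). The file contents, messages, and demo identity are made up, and HEAD~1 is Git's way of saying “one commit before the latest”:

```shell
# Work in a throwaway folder
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"  # placeholder email, this repo only

# A good commit, followed by one we regret
echo 'legs = 4' > lizard.py
git add lizard.py
git commit -q -m "Add leg count"

echo 'legs = 6' > lizard.py
git add lizard.py
git commit -q -m "Change leg count (a mistake)"

# --soft: remove the last commit from history, but leave the file as edited
git reset -q --soft HEAD~1
git log --oneline      # only "Add leg count" remains
cat lizard.py          # still the edited version

# --hard: also throw the edit away. Git does not ask for confirmation.
git reset -q --hard HEAD
cat lizard.py          # back to the committed version
```

Because --hard throws away work without asking, it is worth running git status first to see what would be lost.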

Why Share Your Code with the World on GitHub?

GitHub is the largest website for sharing Open Source code with the world.

Companies, like Google and Netflix, share code from some of their big projects with the world. Check out this list of Open Source projects from big tech companies.

In the same way that you publish your research to journals so that others can use your methods or cite your research, software engineers publish their code so that others can build off of it for as little as the cost of a coffee donation, or for free.

From Homebrew to Linux to Python to JavaScript, the world is open source.

If you want to contribute to projects across the world, you can make your own copy of somebody else’s repo (“fork” it), make some changes, and offer that code back to the original creator (by making a “pull request”), who can merge that code into their repo if they like what you wrote. Likewise, other programmers can contribute to your public repos via the same fork/pull request workflow, if you like what they wrote as well. But this is entirely up to you, the owner of the repository.

Questions and Feedback

If you have questions or feedback, email us at protobioengineering@gmail.com or message us on Instagram (@protobioengineering).

If you liked this article, consider supporting us by donating a coffee.

Terminology

Command Prompt — the command line tool on Windows

Commit — (verb) to add a snapshot of your code to your Git repository

Commit — (noun) a snapshot of code in a Git repository

Git — a tool to track your code that can be accessed via Terminal (Mac) or Command Prompt (Windows)

GitHub — a website that allows you to share Git repositories with your coworkers or the world

Terminal — the command line tool on Mac OS
