Step-by-Step Guide to Contributing to Open Source Projects

Dirk Avery
16 min read · Dec 5, 2018

Contributing code, tests, or documentation to your favorite open source projects is very rewarding. You’ll gain experience. Or, you might need a feature that no one else has implemented yet. Regardless of your reason, you’ll feel a sense of accomplishment when your contribution is accepted and used by the community.

Whether you’re developing in Rust, Kotlin, Python, TypeScript, or Go, or writing documentation, adding tests, or contributing to a different project, the process of contributing will be the same as I outline here. The specifics I’m using to make this guide more concrete are as follows:

  1. Go language, one of the hottest “new” languages
  2. Terraform’s AWS provider, an open-source project
  3. macOS development platform
  4. GitHub Git host
  5. Command-line Git because, frankly, Git should always be command line

If you can’t describe what you are doing as a process, you don’t know what you’re doing. — W. Edwards Deming

1. Install Git

Git is the de facto way of tracking project changes. Most projects are hosted on GitHub, Bitbucket, or GitLab. To communicate with these hosting services, we’ll need to install Git.

Because Git is updated regularly, we’ll use a package manager to make life easier. On macOS, that means Homebrew.

$ brew install git

Now, whenever we want to freshen Git up, we simply run these commands. The first updates brew and the second upgrades Git.

$ brew update
$ brew upgrade git

To avoid having to enter the username and password for our chosen Git host (i.e., GitHub) over and over, we can connect Git with the osxkeychain helper.

$ git config --global credential.helper osxkeychain

2. Install the Development Language

Since in our example we’re going to use Go, that’s what we’ll install. If you’re aiming at another language, install that instead.

To begin, we’ll open a terminal and set up the environment.

$ export GOPATH=~/go
$ export GOROOT="$(brew --prefix golang)/libexec"
$ export PATH="${PATH}:${GOPATH}/bin:${GOROOT}/bin"
$ mkdir -p "${GOPATH}/src/github.com"

We’ll add GOPATH, GOROOT, and PATH to our profile so that when we open a new terminal, it will already be ready to go. On macOS, there are several profile files. We’ll make the changes to ~/.bash_profile and add these lines, being careful not to upset any existing commands.

export GOPATH=~/go
export GOROOT="$(brew --prefix golang)/libexec"
export PATH="${PATH}:${GOPATH}/bin:${GOROOT}/bin"

Now that our environment is ready, we can install Go:

$ brew install go

While we’re at it, we should probably install godoc and golint.

$ go get golang.org/x/tools/cmd/godoc
$ go get golang.org/x/lint/golint

3. Fork the Open Source Project

Forking a project (i.e., repository) means we’re going to create a remote copy of the project for our own purposes. This is different from cloning, which creates a local copy.

For this step, we’re going to need a GitHub account. If you’ve made it to this point in life without one, you should have a long think about your priorities and then head to GitHub and create one.

Creating our fork is easy. Go to the project we’re going to work on and click “Fork.”

Using GitHub to fork a project.

After a few minutes, we’ll have our very own copy (i.e., fork) of the project on GitHub. Our fork has the same repository (i.e., project) name, but is under our account instead of the original user account.

Making the fork and making changes to the fork have no impact on the original repository (except that projects that get forked a lot look cooler).

A forked project on GitHub

4. Clone our Fork

Forking the project gave us a remote copy of the project. We could edit the project remotely using the GitHub interface. However, for anything other than the simplest changes, we’ll want a local copy so that we can edit and test locally.

It is very important to be in the correct directory when you run the clone command. Git will, by default, put the clone in subdirectories of the current directory. Go is quite picky about where files are located so we’ll be careful to get it right.

We’ll start by drilling down from the GOPATH and making a new directory for our clone.

$ cd $GOPATH
$ cd src/github.com
$ mkdir terraform-providers
$ cd terraform-providers

The path $GOPATH/src/github.com is a common parent directory for Go open source projects hosted on GitHub. We made a new directory terraform-providers that matches the original user/account of our forked repository. Even though we’re going to clone from our user/account, the code internals will expect the original account name.

With the parent directory set up, we’re almost ready to clone the repository. The URL for cloning is easy to find on the GitHub page for our forked repository.

Getting a clone URL from GitHub

With the URL copied, we are ready to clone, or, in other words, create a local copy of the repository.

$ git clone https://github.com/YakDriver/terraform-provider-aws.git
$ cd terraform-provider-aws/

5. Set Up a Remote

We have now had our hands on three repositories: the original, the fork (our remote copy), and the clone (our local copy).

Right after we fork and clone, our copies are exactly the same as the original. However, time stops for no open-source contributor. Within days, hours, or possibly even seconds, the maintainers of the open source project will make changes to the original project. Our fork and clone will not automatically receive these changes and so we will fall behind. To avoid this happening, we first need to tell Git about what we’ve done. Then, Git can help us keep up to date.

Because of the way we forked first and then cloned our fork, Git already knows about one remote repository: our fork. Run this command to verify you already have one remote set up. (YakDriver will be replaced with your GitHub username.)

$ git remote -v
origin https://github.com/YakDriver/terraform-provider-aws.git (fetch)
origin https://github.com/YakDriver/terraform-provider-aws.git (push)

We see that we have one remote called “origin,” with push and fetch endpoints. Now, the clone and fork dots are connected.

However, we also want Git to know about the original, or “upstream,” repository so that we can get updates.

$ git remote add upstream https://github.com/terraform-providers/terraform-provider-aws.git

We’ve added a new remote called upstream. (The remote could have been named anything, but to keep life simple, we’ll call it what it is.)

6. Refresh the Main Branch

The main branch, which is really the trunk of our tree, is usually called master. But, projects vary and there’s no rule that it be called master. These steps assume the main branch is called master, but substitute another name if necessary.

This step should be the reentry point for successive iterations of the process if we make more contributions to the same repository.

Since we want to start with the latest and greatest code from the main repository when we start making our changes, we’re going to use the remote to grab what we need. For example, if we made our fork and clone a year ago, but now we’re ready to go and want to bring them up to date, we can use Git and the remote we just set up in Step 5.

NOTE: Pulling changes from the main repository to our repository can overwrite our hard work! To avoid losing hard work with a tool designed to prevent us from losing hard work, follow these Cardinal Rules.

Git Cardinal Rules of Not Destroying Hard Work:

  1. Only pull to the main branch
  2. Never make changes to the main branch

“Pulling” means that we are going to download changes from the upstream (i.e., original or main) repository and merge them into whichever branch is currently checked out. If that branch also holds our own work, the merge can tangle the two together. Thus, always make sure that you are on the main branch before pulling.

$ git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
$ git pull upstream master
remote: Enumerating objects: 85, done.
remote: Counting objects: 100% (85/85), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 96 (delta 59), reused 56 (delta 42), pack-reused 11
Unpacking objects: 100% (96/96), done.
From https://github.com/terraform-providers/terraform-provider-aws
* branch master -> FETCH_HEAD
c8861c1be..50667a712 master -> upstream/master
Updating c8861c1be..50667a712
...

We’ll make sure that our fork’s main branch is also up to date by uploading to it with a push.

$ git push

7. Create Your Very Own Branch

Now that the main branch is up-to-date, we can create a branch from it. This branch will have the latest, greatest code from the original repository that we can build on.

$ git checkout -b my-new-branch
Switched to a new branch 'my-new-branch'

We are now in our very own safe space. We can make changes without worry. Since we will never pull without switching to the main branch, our changes won’t be overwritten.

A branch is like a separate copy of the entire repository. To the file system and development platform (e.g., Go and VSCode), it only looks like one set of files, but whenever we switch branches, Git switches out all the files in the project.

For example, assume we have a project with 1,000 files and only one branch, the main branch called master. If we create a second branch called with-a, we can add the letter ‘a’ to the end of each of the 1,000 files and commit the change. When we switch back to master, all the letters ‘a’ we added will instantly disappear. When we switch back to with-a, all the letters ‘a’ will reappear.

Thus, a branch tracks all the changes we make to a project within an isolated cocoon. We could have 10 branches for different experiments or features. Branches give us freedom to explore and we can always go back to the original if it all goes wrong.

We can see all our branches easily:

$ git branch

We can switch to an existing branch:

$ git checkout my-existing-branch

Or, we can create a new branch:

$ git checkout -b my-new-branch

New branches are based on where you are when the new branch is created. Thus, we’ll go back to the main branch before creating our new branch for real. In order to remember what the branch is for, we’ll name it after what we’re working on:

$ git checkout master
$ git checkout -b gosimple-vpn

8. Make Your Changes

Using VSCode or Atom (or your preferred IDE), we’ll make changes to the code.

We see that in the file aws/resource_aws_vpn_connection_route.go there is a bit of code that could be simplified. On line 131, we see:

if err != nil {
	return err
}
return nil

We simplify that to one line:

return err

We save the file and we’re ready to test.
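To see why the two forms behave the same, here is a minimal, self-contained sketch. The names doWork, verbose, simplified, and errBoom are hypothetical stand-ins and are not taken from the Terraform AWS provider. The sketch assumes the variable is declared with the interface type error, which is the case where the simplification is safe.

```go
package main

import (
	"errors"
	"fmt"
)

// errBoom is a hypothetical sentinel error used only for this illustration.
var errBoom = errors.New("boom")

// doWork is a hypothetical stand-in for the call whose error gets returned.
func doWork(fail bool) error {
	if fail {
		return errBoom
	}
	return nil
}

// verbose mirrors the original four-line form.
func verbose(fail bool) error {
	err := doWork(fail)
	if err != nil {
		return err
	}
	return nil
}

// simplified mirrors the one-line form.
func simplified(fail bool) error {
	err := doWork(fail)
	return err
}

func main() {
	// Both forms return the same value for the success and failure cases.
	for _, fail := range []bool{false, true} {
		fmt.Println(verbose(fail) == simplified(fail))
	}
}
```

Either way the caller receives nil on success and the same error on failure, so the shorter form loses nothing.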

How much should we do in a single commit or branch?

Opinions and circumstances differ. However, nearly everyone would agree that keeping changes focused and simple is a good idea, especially when we’re newbies. A good rule of thumb is that commits should be atomic and branches should contain a single fix or enhancement.

An “atomic commit” is a change that is so small it is not splittable, since, you know, atoms cannot be split. Consider the maintainers of a project. Can they accept or reject a single commit without it affecting the other commits in a pull request? (For an explanation of what a “pull request” is, see Step 15 below.) On the other hand, a commit should not be a trivial change. Trivial commits should be squashed.

9. Test Your Changes

At this step, we ask ourselves, does a test already exist in the project that tests our contribution? If so, we’ll need to re-run the test or tests to make sure that we haven’t flubbed the proverbial dub.

If no test exists, we next ask, what type of testing is the project using? Integration? Unit? Acceptance? Which harness (e.g., Testify, Go test, Pytest)? We don’t want to reinvent the wheel by setting up a new testing harness for the project, unless that’s our contribution, so we’ll follow what other contributors have done.

If adequate testing of our contribution does not exist, it is our responsibility to add one or more tests. If we’re adding a fix, our added tests should fail without the fix we’re providing. If we’re adding an enhancement, we’ll add tests to thoroughly exercise the new functionality.
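The idea that a test for a fix should fail without the fix can be sketched as follows. Everything here is hypothetical: lastIndexBuggy and lastIndexFixed stand in for a function before and after a fix, and checkLastIndex plays the role of the test. In a real Go project the check would normally live in a _test.go file and use the standard testing package; this sketch keeps it in one runnable file.

```go
package main

import "fmt"

// lastIndexBuggy is a hypothetical function before the fix: it is off by one.
func lastIndexBuggy(xs []int) int { return len(xs) }

// lastIndexFixed is the same hypothetical function after the fix.
func lastIndexFixed(xs []int) int { return len(xs) - 1 }

// checkLastIndex runs a small table of cases against f and returns an
// error describing the first mismatch, or nil if every case passes.
func checkLastIndex(f func([]int) int) error {
	cases := []struct {
		in   []int
		want int
	}{
		{[]int{7}, 0},
		{[]int{1, 2, 3}, 2},
	}
	for _, c := range cases {
		if got := f(c.in); got != c.want {
			return fmt.Errorf("lastIndex(%v) = %d, want %d", c.in, got, c.want)
		}
	}
	return nil
}

func main() {
	// The check should report a failure before the fix and nil after it.
	fmt.Println("before fix:", checkLastIndex(lastIndexBuggy))
	fmt.Println("after fix: ", checkLastIndex(lastIndexFixed))
}
```

If the check passed against the unfixed code as well, it would not be telling us anything about the fix.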

In our example, there are already acceptance tests covering our contribution so let’s make sure that our changes haven’t bollixed anything up.

The Terraform AWS provider uses make to perform tests. Before testing, we’ll need valid AWS credentials set in our environment and, warning, running tests will create AWS resources and we will be charged. Since we modified the file aws/resource_aws_vpn_connection_route.go, we’ll want to run the related test(s):

$ make testacc TESTARGS='-run=TestAccAWSVpnConnectionRoute_basic'

This command is specific to the Terraform AWS provider. Other projects use Go built-in testing or other testing frameworks (e.g., pytest for Python). An example of running a test with standard Go is as follows:

$ go test -timeout 30s github.com/aws/aws-sdk-go/aws/credentials/processcreds -run '^TestProcessProviderStatic$'

If our testing fails, we’ll go back to Step 8 and fix it.

10. Document Your Changes

Most projects include documentation. At this step, we ask ourselves, does this contribution change anything already documented or add new functionality? If the answer to either is “yes,” we should add documentation.

Often documentation is in the same repository. It might be part of the code (e.g., godoc). It might also be in separate HTML files.
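As a rough illustration of documentation that lives in the code, Go doc comments are ordinary comments placed directly above a declaration, and godoc renders them. The Greeting function below is hypothetical and exists only to show the convention of starting the comment with the name being documented.

```go
// Command greet prints a greeting. The package and function names are
// hypothetical and chosen only to illustrate doc comments.
package main

import "fmt"

// Greeting returns a friendly greeting for name.
// If name is empty, it greets the world instead.
func Greeting(name string) string {
	if name == "" {
		name = "world"
	}
	return "Hello, " + name + "!"
}

func main() {
	fmt.Println(Greeting(""))
}
```

If a contribution changed what Greeting does with an empty name, the comment above it would need to change in the same commit.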

In our simple example, our code fix does not change anything already documented and does not add new functionality. Thus, in this rare instance, we get a pass on documentation.

11. Stage and Commit Your Changes

Now that we’ve verified that our change works and passes testing, we are ready to stage and commit the change.

To see what files have changed:

$ git status
On branch gosimple-vpn
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: aws/resource_aws_vpn_connection_route.go
no changes added to commit (use "git add" and/or "git commit -a")

Git, helpfully, tells us exactly what’s going on and what we probably want to do going forward.

NOTE: When contributing to any project, but especially a public project, we need to make sure that everything Git tells us in status is correct. If files have changed that we don’t recognize, don’t stage and commit them! When we build or compile a project, it’s common for files to be changed or created that are not related to the contribution we’re making. Sometimes these are extraneous files that should not be part of the code repository. Such files are usually ignored based on a list of file name patterns in the .gitignore file.

If the build or compile process changed a file, or we changed a file that we did not mean to change, we can undo for a specific file with these commands:

$ git reset HEAD <file to undo>
$ git checkout -- <file to undo>

We want to stage files so we’ll follow Git’s advice:

$ git add aws/resource_aws_vpn_connection_route.go

If, instead, we wanted to stage all changed files, we could use this command:

$ git add .

To commit the staged files with only a commit summary:

$ git commit -m "Add a message in imperative mood"

Commit messages are very important to the long-term success of a project. The commit message consists of a summary and a body. For simple, practically self-explanatory commits, you can omit the body. For all other commits, include a body describing the what and why of the change rather than the how, which can be found in the code itself:

$ git commit

This opens the default editor. The first line of the file is the summary (what you get if you type git commit -m "<Commit Summary>"). Leave a blank line after the summary and then add the body.

Guidelines of Good Git Summaries

  • Less than about 50 characters
  • Sentence case (First letter is capitalized but not every word)
  • No period at the end
  • Imperative mood (i.e., it can fill in this blank: If applied, this commit will _____)
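The mechanical parts of these guidelines can be checked automatically. The sketch below is only an illustration: checkSummary is a hypothetical helper, the hard limit of 50 characters is an assumption (the guideline says "about 50"), sentence case is reduced to checking the first letter, and imperative mood is left to human judgment.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// checkSummary reports which mechanical guidelines a commit summary breaks.
// It is a hypothetical helper; imperative mood is not checked.
func checkSummary(summary string) []string {
	var problems []string
	// Assumption: treat "about 50" as a hard limit of 50 characters.
	if utf8.RuneCountInString(summary) > 50 {
		problems = append(problems, "longer than 50 characters")
	}
	// Sentence case is approximated by requiring a capital first letter.
	first, _ := utf8.DecodeRuneInString(summary)
	if !unicode.IsUpper(first) {
		problems = append(problems, "does not start with a capital letter")
	}
	if strings.HasSuffix(summary, ".") {
		problems = append(problems, "ends with a period")
	}
	return problems
}

func main() {
	for _, s := range []string{
		"Add a message in imperative mood",
		"fixed the thing.",
	} {
		fmt.Printf("%q: %v\n", s, checkSummary(s))
	}
}
```

The first summary breaks none of the checked guidelines, while the second is flagged for its lowercase first letter and its trailing period.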

12. Update Your Branch with Changes Made to Original Repo

If our development takes any time at all, chances are good that changes will have been made to the original repository that aren’t in our branch. This can be handled with a rebase. Basically, we’re going to grab all the changes made to the original and then replay our commits on top of them.

We are not breaking the cardinal rules in this step since we are not pulling to our branch.

NOTE: Making mistakes in a rebase can make for a bad day. Before rebasing, we’ll make sure that all our work is committed in any windows we have open. Or, even better yet, we’ll close all windows (i.e., terminals) that are in the repository directory except one. If we type git status in that one window, we should see a message like this (the middle line appears only once the branch has been pushed to our fork, as in Step 13, and may instead report that we are ahead of the remote):

On branch <our branch>
Your branch is up to date with 'origin/<our branch>'.
nothing to commit, working tree clean

“You know, Hobbes, some days even my lucky rocket ship underpants don’t help.” — Calvin & Hobbes

With that all checked out, we can feel confident in rebasing our branch:

$ git fetch --all
$ git rebase upstream/master

13. Upload Your Changes

So far we’ve only made changes to the local copy (i.e., clone) of the repository. In order for those changes to reach our remote copy (i.e., fork), we’ll have to push them.

Careful readers will have noticed that in addition to our changes only existing locally, our new branch also only exists locally. To verify, we can go to GitHub before executing the next command and we won’t find any sign of the new branch. In order to create the branch remotely, which only needs to be done once per branch, we’ll use this command:

$ git push --set-upstream origin gosimple-vpn

After the branch is created remotely, we can subsequently push to it with a simple command:

$ git push

If we performed a rebase, Git will complain. We rewrote history with the rebase, even though it was only local history, so we’ll have to force push our commit (think Jedi mind-trick):

$ git push -f

NOTE: git push -f is a powerful command. And, “with great power, comes great responsibility.” If you’re working on a branch with others and you push -f on that branch, you can wipe out commits they have already pushed. That would be bad.

Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light.

Only when you are sure that you’re the only one working on a branch, is it safe to push -f. (Of course, that’s not entirely true. Sometimes you have to cross the streams to save the planet. But, be cautious and carefully coordinate with anyone else working on the branch so that they know what you’re doing and what they need to do.)

14. The Pre-Contribution Checklist

A lot of maintainers are very patient and helpful if you make mistakes. However, not all are and most are extremely busy. To save them and ourselves time, we’ll use this checklist before submitting our work in the next step.

Commits and branches on GitHub
  • Everything is there? Is everything in the branch that should be? It is a common mistake to not stage a change before committing. git status would tell us that files have been changed that aren’t included in a commit. It’s also easy to check on GitHub. By making sure we’re on the correct branch and then clicking on the commits, we can see exactly what is in a branch and see the changes to the code. See Step 11.
  • Nothing extra? Is there anything in the branch that shouldn’t be? Did any files sneak in, such as build artifacts or unrelated changes? Again, checking the GitHub UI is easy and often uncovers errors. See Step 11.
  • Testing? Has the contribution been thoroughly tested? We want to contribute good code and the only way to see if it stands up is testing.
  • Tests? Do tests exist to cover this contribution? Thoroughly testing a contribution can usually be made easier and more robust by using a testing framework or harness. That way as we progress, re-running tests becomes trivial (and a nice time to grab a beverage). See Step 9.
  • Documentation? Does this contribution change anything already documented or add new functionality? If so, we need to include documentation. (Also, if the code needs documentation for any tricky bits, we should add that too.) See Step 10.
  • Latest code? Is the contribution sitting on top of the latest code in the main repository? Not rebasing onto the latest code can affect testing and hide conflicts. If the newest code throws off tests or conflicts with our contribution, it’s not the end of the world and we can fix it. However, we should do that before submitting! See Step 12.

15. Create a Pull Request

At this point, we’ve made changes to our local repository (i.e., our clone). We then pushed those changes to our remote repository (i.e., our fork). What remains is making changes to the original repository. Unless you’re a maintainer, you have no control over the original repository.

A pull request (or “PR”) is our request to the maintainers of the original repository that they pull our wonderful new changes from our remote branch (i.e., fork) into their repository.

They have no obligation to pull our changes and may ignore our request or close the PR without even looking at it. However, if we’ve made good changes and communicate well with the maintainers, we may have our contribution pulled by them. That’s the goal anyway. Once they pull our contribution, it then becomes part of the original repository (i.e., the maintainers will merge our contribution into the main codebase).

To create a pull request, go to GitHub, select your fancy new branch, and click “New pull request.”

After that, follow the prompts, enter a compelling comment, and submit your PR.

16. Rinse and Repeat!

Now that we’re all set up, it will be much easier to submit additional PRs to the same repository. Go back to Step 6, and repeat.

Dirk Avery

Cloud engineer, AI buff, patent attorney, fan of cronuts. AWS Certified Solutions Architect — Professional. Go, Python, automation. https://www.hashicorp.com