Working with Git/GitHub When Contributing to an Open Source Project
Every open source project has its own guidelines, whether for improving the documentation, submitting bug reports, writing a feature request or contributing to the source code. Usually, the project has a contribution guide listing the practices to follow. Let’s take a look at the pandas Python package. Among other things, its contribution guide explains the procedure for submitting a pull request. Before getting there, there are several technical hurdles to clear and concepts to become familiar with.
Workflow
Let’s summarize the workflow here. You will work with Git and GitHub. Git is a version control system that allows developers to track changes in their code. It is installed and maintained on your local machine. GitHub is a cloud-based hosting service for Git repositories. There, you can share your code and allow contributors to make revisions and edits.
You will use forks to propose changes, and GitHub has a good tutorial on them. Briefly, a fork is a new repository on your GitHub account that shares code and visibility settings with the original “upstream” repository. The entire workflow is illustrated below.
Get the Repository
Go to the repository you wish to fork on GitHub. Then, click on Fork in the top right corner of the page to fork your own copy of the repository to your account. Finally, create a local clone of your fork with:
git clone https://github.com/YOUR_USERNAME/pandas.git
We assume that you forked the pandas repository in the command line above.
Sync Your Copy
Configure Git to sync your fork with the “upstream” repository:
git remote add upstream https://github.com/pandas-dev/pandas.git
Your local Git client can keep track of many different remote versions of the same repository. By default, when you clone a repository from GitHub, the first remote is named “origin”. The command above adds another remote and names it “upstream”. This will be useful when the upstream version of the repository has code changes, and you want your local branch to include those changes, so that the only difference between your branch and the original repository is the code changes for your feature. To summarize, “origin” will point to the fork located on your GitHub account and “upstream” to the original repository. This is clearly shown on the image of the workflow above.
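To check that both remotes are set up as described, you can list them. The sketch below is only an illustration: it runs in a throwaway directory created with mktemp (an assumption made so it can be executed anywhere without touching a real clone), and it adds "origin" by hand because no clone takes place. YOUR_USERNAME remains a placeholder.

```shell
# Work in a throwaway directory so nothing touches a real clone.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet

# "origin" is normally created by git clone; here it is added by hand.
git remote add origin https://github.com/YOUR_USERNAME/pandas.git
git remote add upstream https://github.com/pandas-dev/pandas.git

# List every remote with its fetch and push URLs.
git remote -v

# Print the URL of a single remote.
upstream_url=$(git remote get-url upstream)
echo "upstream -> $upstream_url"
```

In a real clone of your fork, only the git remote add upstream, git remote -v and git remote get-url lines are needed.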
Branching
The feature that really sets Git apart from nearly every other source code management system is its branching model. It allows and encourages you to have multiple local branches that can be entirely independent of each other. I like working as follows:
- Branch off main (could be master or any other name depending on the project you work on) to create a feature branch:
git checkout -b YOUR_USERNAME/FEATURE_NAME upstream/main
I recommend using YOUR_USERNAME/FEATURE_NAME for the name of the branch to make it clear you are the main developer on this branch.
- Keep it up to date by moving your branch to the newest HEAD of main via:
git pull --rebase upstream main
Note that the longer you wait to rebase, the more you risk having to deal with merge conflicts, especially if the project has a large number of contributors. We recommend that you rebase onto main frequently.
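The two steps above can be tried end to end without a network connection. This is a minimal sketch under stated assumptions: a local repository in a temporary directory plays the role of "upstream", the identity "Demo User" and the file names are made up for the demo, and the clone names its remote "upstream" directly instead of going through a fork.

```shell
demo_dir=$(mktemp -d)

# A local repository that plays the role of "upstream".
git init --quiet "$demo_dir/upstream"
cd "$demo_dir/upstream"
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit --quiet -m "feat: first upstream commit"

# Clone it and call the remote "upstream", as in the text.
git clone --quiet --origin upstream "$demo_dir/upstream" "$demo_dir/work"
cd "$demo_dir/work"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Branch off upstream/main and commit some feature work.
git checkout --quiet -b YOUR_USERNAME/FEATURE_NAME upstream/main
echo "feature" > feature.txt
git add feature.txt
git commit --quiet -m "feat: add feature"

# Meanwhile, main moves forward in the upstream repository.
cd "$demo_dir/upstream"
echo "two" >> file.txt
git commit --quiet -am "fix: second upstream commit"

# Back on the feature branch: replay the local commit on top of the new main.
cd "$demo_dir/work"
git pull --quiet --rebase upstream main

# The history is linear: the feature commit sits on top of both upstream commits.
git log --oneline
commit_count=$(git rev-list --count HEAD)
merge_count=$(git rev-list --count --merges HEAD)
echo "commits: $commit_count, merge commits: $merge_count"
```

Because the feature commit is replayed after the upstream commits, no merge commit is created, which is the property this workflow relies on.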
Don’t Git Pull
…unless you understand how this command behaves and you’re sure that’s what you want.
By default, git pull will perform two distinct actions:
- Making your local Git client aware of the latest commits in the default remote (running git fetch).
- If there are any differences between the commit history of the currently checked-out branch in your local Git client and the latest commits of the corresponding branch in the default remote, they will be merged (running git merge).
If the commit histories of the two branches have diverged (i.e. each branch has at least one commit that’s not present in the other), then Git will automatically create a merge commit. This will make integrating your code back into the codebase more difficult. If there are no commits in your local branch that aren’t present in the remote, then the git merge command will result in a ‘fast-forward’ merge, where the commit history of your local branch is identical to the remote (this is good).
If you do want to run git pull, we encourage running it in a non-default mode with different behavior:
- git pull --ff-only: this will run git fetch as normal but only execute the git merge step if it can be completed with a fast-forward merge (i.e. without creating a merge commit). This will only work if there are no new commits in your local branch.
- git pull --rebase: this will run git fetch as normal and then attempt to rebase any new commits in your local branch (any commits since the history deviated from the remote) on top of the new commits of the remote branch. This will only complete automatically if the distinct commits in the two versions of the branch never edit the same part of the same file.
If neither of these steps can be completed automatically, then your local branch’s commit history will need to be reconciled in a more manual way, e.g. rebasing and manually resolving conflicts.
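Before pulling, it can help to see whether the two histories have diverged. The following sketch is one way to do that, not the only one: it fetches first, then counts the commits that exist on only one side. The repositories, the "Demo User" identity and the commit messages are assumptions made for the demo, and everything lives in a temporary directory.

```shell
demo_dir=$(mktemp -d)

# A local repository standing in for the remote.
git init --quiet "$demo_dir/remote"
cd "$demo_dir/remote"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit --quiet --allow-empty -m "chore: initial commit"

git clone --quiet "$demo_dir/remote" "$demo_dir/local"

# One new commit on each side, so the histories diverge.
git commit --quiet --allow-empty -m "feat: remote-only commit"
cd "$demo_dir/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit --quiet --allow-empty -m "feat: local-only commit"

# Update the remote-tracking branch without touching the local branch.
git fetch --quiet origin

# Commits only in the local branch, and commits only in the remote branch.
ahead=$(git rev-list --count origin/main..HEAD)
behind=$(git rev-list --count HEAD..origin/main)
echo "ahead by $ahead, behind by $behind"

# With diverged histories a fast-forward is impossible, so this pull refuses to run.
if git pull --ff-only --quiet origin main 2>/dev/null; then
    ff_result="fast-forwarded"
else
    ff_result="refused"
fi
echo "git pull --ff-only: $ff_result"
```

If ahead is 0, a fast-forward is possible; if both numbers are non-zero, the branches have diverged and a rebase (or a manual reconciliation) is needed.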
For more information, see the git pull documentation.
Git can be configured to set either of these behaviors as the default when git pull is called. To configure git pull to use fast-forward-only by default, run git config pull.ff only. To instead configure git pull to use a rebase to resolve the commit history by default, run git config pull.rebase true. By default, git config changes configurations on a per-repository basis, but it can alternatively configure behavior across all repositories via the --global flag, e.g. git config --global pull.ff only or git config --global pull.rebase true.
For more information, see the git config documentation.
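To confirm which setting is in effect and where it is stored, you can read the value back. This is a small sketch run in a scratch repository (an assumption, so that your own configuration is left untouched); it deliberately avoids --global.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet

# Repository-level setting, stored in this repository's .git/config only.
git config pull.ff only

# Read the value back; --local restricts the lookup to this repository.
pull_ff=$(git config --local --get pull.ff)
echo "pull.ff = $pull_ff"

# Show which file each pull.* setting comes from.
git config --show-origin --get-regexp '^pull\.'
```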
Commit Message
Some projects require commit messages to be structured in a certain way. On the open source project I am currently working on, commit messages follow this convention:
feat: add hat wobble
^--^  ^------------^
|     |
|     +-> Summary in present tense.
|
+-------> Type: chore, docs, feat, fix, refactor, style, or test.
- chore: (updating grunt tasks etc; no production code change)
- ci: (changes to the CI configuration files and scripts)
- docs: (changes to the documentation)
- feat: (new feature for the user, not a new feature for build script)
- fix: (bug fix for the user, not a fix to a build script)
- perf: (code change that improves performance)
- refactor: (refactoring production code, e.g. renaming a variable)
- style: (formatting, missing semicolons, etc; no production code change)
- test: (adding missing tests, refactoring tests; no production code change)
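A convention like this is easy to check mechanically. Below is a minimal sketch of such a check; the function name is_valid_commit_msg and the exact regular expression are my own assumptions, not something the project prescribes. It only tests that the first line starts with one of the types listed above, followed by a colon, a space and a non-empty summary.

```shell
# Return 0 if the first line of a commit message matches "<type>: <summary>".
# The function name and the pattern are illustrative assumptions.
is_valid_commit_msg() {
    printf '%s\n' "$1" | head -n 1 |
        grep -Eq '^(chore|ci|docs|feat|fix|perf|refactor|style|test): [^[:space:]].*$'
}

# Try it on a few messages.
for msg in "feat: add hat wobble" "added hat wobble" "feature: add hat wobble"; do
    if is_valid_commit_msg "$msg"; then
        echo "ok:     $msg"
    else
        echo "reject: $msg"
    fi
done
```

The same test could be wired into a commit-msg hook, which Git calls with the path of the file holding the proposed message, but whether a project wants that enforced locally is up to its maintainers.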
This is a good way to keep the commit history of the project clean as shown below:
* 1614d2e Merge pull request #652 from Breakthrough-Energy/ben/import
|\
| * 3d17f0f fix: add geographical coordinates to branch and plant data frames and fix bus assignment/naming (#703)
| * 0c7d0d5 Merge pull request #682 from Breakthrough-Energy/ben/profile
| |\
| | * adf9309 refactor: add inflow to column name of carriers with inflow profiles
| | * 0444b31 refactor: simplify logic
| | * 9a4d290 feat: normalize inflow profiles by max
| | * 9c7f629 test: write tests for profile extraction
| | * 7b2de36 feat: extract profiles from pypsa network
| |/
| * 04f2d4d feat: support hydro inflow functionality (#691)
| * b805325 feat: extract substation from arbitrary pypsa networks (#674)
| * 16b190d Merge pull request #675 from Breakthrough-Energy/ben/grideq
| |\
| | * 9707a82 fix: enable grid equality for back converted pypsa networks (#689)
| | * c134725 docs: format docstring and remove note
| | * 835e48e test: add test for storage
| | * 9ff3873 fix: enable roundtrip conversion (#678 and #685)
| | * bcf3bed feat: make FromPyPSA object a Grid object
| |/
| * 3e103fc refactor: create library of constants for grid object, casemat file and pypsa translators (#667)
| * 308d946 ci: update gitignore
| * f7d3899 feat: convert PyPSA storage_units/stores to Grid storage_data (#657)
| * e99c87c feat: convert pypsa Network object to Grid object and profiles
|/
* 4a46483 Merge pull request #701 from Breakthrough-Energy/jen/linearize
|\
| * 152c5b5 feat: port ramp_30 modifications from REISE.jl
| * 1a510d2 refactor: remove overload of linearize_gencost
| * 28d6424 fix: loading grid in analyze state
| * 066deb2 fix: don't scale coal pmin
| * 36f95b1 fix: fillna to prevent downstream errors
| * 68b1f1e fix: invocation of linearize_gencost
| * a06826d feat: wip port grid modifications from reise.jl
| * df4fc45 test: port test case for pmin = pmax
| * 3fa8ca4 test: port linearize_gencost tests from julia
| * a89b0a3 feat: move pmin overrides and cost curve linearization to client side
|/
* 35bd7d4 refactor: generalize area type in check function (#702)
* d83ac6c refactor: generalize generator type in the MockProfileInput class (#699)
* 5a17941 chore: update dockerignore (#700)
* f11f64c Merge pull request #698 from Breakthrough-Energy/ben/dependencies
|\
| * 92ddfbd fix: handle FutureWarning raised by pandas
| * b77b3ed fix: use list instead of set to create column names in data frame
| * b4373b4 ci: generate pipenv lockfile
|/
* b2df44a Merge pull request #697 from Breakthrough-Energy/ben/zenodo
|\
| * 763b1ac ci: remove zenodo_get package
| * 6d49050 feat: allow user to download any version of pypsa-eur
| * 70492f0 ci: update gitignore
| * 5448fd9 feat: create zenodo download manager
|/
* c5ee9e5 refactor: combine hydro and PHS into inflow in model immutables (#695)
* e989af9 style: add slack badge to README (#696)
Clean Up Personal Commit History
If you did not follow the commit message convention or your commit history is messy, use the interactive rebase tool (see this website for more details) to revise your commit history. You will be able to reorder, reword, drop and squash commits. In short:
git rebase -i upstream/BRANCH
where BRANCH is the name of the branch you branched off, e.g., main.
Pushing Changes
When you want your changes to appear publicly on your GitHub page, push your feature branch’s commits to your fork:
git push origin YOUR_USERNAME/FEATURE_NAME
Now your code is on GitHub, but it is not yet a part of the project you are contributing to. For that to happen, a pull request needs to be submitted on GitHub.
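One practical consequence of the rebase-based workflow: once a branch has been pushed, rewriting its commits (by rebasing or amending) means a plain push to the same branch is rejected. The sketch below illustrates this with a bare repository in a temporary directory standing in for your fork; the "Demo User" identity, file names and commit messages are assumptions for the demo. It uses --force-with-lease, which overwrites the remote branch only if it still points where your last fetch or push left it.

```shell
demo_dir=$(mktemp -d)

# A bare repository stands in for your fork on GitHub ("origin").
git init --quiet --bare "$demo_dir/fork.git"
git init --quiet "$demo_dir/work"
cd "$demo_dir/work"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$demo_dir/fork.git"

echo "base" > file.txt
git add file.txt
git commit --quiet -m "chore: initial commit"

# Create the feature branch, commit and push it.
git checkout --quiet -b YOUR_USERNAME/FEATURE_NAME
echo "wobble" > hat.txt
git add hat.txt
git commit --quiet -m "feat: add hat wobble"
git push --quiet origin YOUR_USERNAME/FEATURE_NAME

# Rewrite the pushed commit (here with --amend; a rebase has the same effect).
git commit --quiet --amend -m "feat: add hat wobble to every hat"

# A plain push is now rejected because the remote branch is not an ancestor of the local one.
if git push --quiet origin YOUR_USERNAME/FEATURE_NAME 2>/dev/null; then
    plain_push="accepted"
else
    plain_push="rejected"
fi
echo "plain push: $plain_push"

# Overwrite the remote branch, but only if nobody else has pushed to it in the meantime.
git push --quiet --force-with-lease origin YOUR_USERNAME/FEATURE_NAME

local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin refs/heads/YOUR_USERNAME/FEATURE_NAME | cut -f1)
```

Only force-push to branches you own, such as your feature branch on your fork, and check the project's contribution guide first, since some projects prefer that history is not rewritten once a review has started.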
Pull Request
Pull requests (PRs) are critical to good software development. They help by:
- Reducing code defects
- Keeping the team up to date with new code in the code base
- Teaching each other how to get better at coding
Open a PR as follows:
- Navigate to your repository on GitHub
- Hit the “Compare & Pull Request” button
Below are the tasks you usually have to go through for a PR:
- Keep your PRs small (< 400 lines): short PRs get reviewed faster, get better feedback, and more bugs are caught
- Make sure your commit history is clean
- Ensure you have appropriate tests
- Ensure that checks (e.g. linting and testing) are in a green state
- Fill out the form when creating the PR if the project set up a PR template
- Keep your branch up to date during the entire process
- Perform a merge commit once your PR is approved
Other things to keep in mind
We only covered Git and GitHub here. When contributing to an open source project, you will also have to document your code (e.g. using docstrings), format your code (e.g. using black, flake8, isort) and write unit tests. Each project will have guidelines for these too.