Repository organization for clean codebases

Jolene Langlinais
Jun 19 · 12 min read

Version control with Git enables powerful collaboration, whether within a tight-knit tech team or across a distributed open-source community. Still, proper care must be taken.

Git is used widely in software development and can manage a constantly evolving codebase. Most of its usefulness comes from the ability to branch new work off an existing project and develop it in isolation until it is ready to be integrated back.

Good quality software consists of, among other things, code which is robust, resilient, secure, and performant.

These attributes are achievable by maintaining a foundation of good quality code and a solid history of documentation. Anyone should be able to join the process and easily figure out and track the status of the project.

This is where Git and GitHub come in.


Approach

My experience as a contributor to the Accord Project, an open-source project for smart legal contracts, has led me to adopt and implement a standard of Git workflow and etiquette which I find thorough and effective.

I will be sharing this as a useful reference for others, as well as a historical reference for myself in the future. As is readily apparent by now, I am utilizing both Git and GitHub.

For perspective, you can reference GitHub’s official workflow proposal (GitHub flow).

The majority of my work at the Accord Project is in JavaScript, with some domain-specific language mixed in, but the principles displayed here should apply to any language.


Branch and Fork

Branching

Pros

  • All project work is centralized.
  • Ease of collaboration.
  • Single remote to handle.

Cons

  • Deprecated branches may clutter easily.

Forking

Pros

  • Increased separation between user branches.
  • Primary repository cleanliness.

Cons

  • Difficulty tracking branches.
  • Collaboration requires extra steps.
  • Accessibility is lower for less-experienced Git users.

While there are benefits to both of these approaches, my general rule consists of forking in an open-source project and branching in a smaller or insular team.

There is more incentive for keeping the main repository clean and tidy in open-source, and less chance of quick-and-dirty collaboration on a branch.

Conversely, a tech team would benefit from a centralized repository with trackable branches. Either way will necessitate strict organization.

As an open-source project, the Accord Project follows the fork workflow model.

Still, the master branches should be kept in sync (guide for syncing), and every feature, bug fix, release, or code change should occur in a different branch. This will be discussed more in the next section.

Naming a new branch will be one of the first steps in keeping a consistent tracking system of issues, pull requests, and Git history. The important factor here is consistency.

name/issue-tracker/short-description

  • name: Anything from initials to full name to GitHub username.
  • issue-tracker: A reference to the issue on GitHub or in another source of agile user stories.
  • short-description: One to three words describing the main goal of this branch, separated by hyphens.

Example

irmerk/issue7/new-feature

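In the case of collaborating on a single feature, maintain a single team branch for the feature and individual branches from it. This can follow the previous naming convention, with a shared prefix in place of a personal name. Avoid master as that prefix: Git cannot create a branch named master/... while a branch named master exists.

team/issue14/routing-service // team branch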
irmerk/issue14/routing-service // my branch
someone/issue14/routing-service // someone else’s branch

Personal branches can be merged into the master team branch, which will then be merged with the overall master through a pull request. Delete branches after they are merged.
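
As a rough sketch of how the naming convention could be checked automatically, the shell function below tests a branch name against the name/issue-tracker/short-description pattern. The function name is_valid_branch and the exact character rules are assumptions made for illustration; they are not part of any Accord Project tooling.

```shell
# is_valid_branch NAME
# Succeeds when NAME looks like name/issue-tracker/short-description.
# The character rules below are assumptions for this sketch:
#   name:              letters, digits, dots, underscores, hyphens
#   issue-tracker:     letters, digits, hyphens (for example issue7)
#   short-description: one to three lowercase words separated by hyphens
is_valid_branch() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[A-Za-z0-9._-]+/[A-Za-z0-9-]+/[a-z0-9]+(-[a-z0-9]+){0,2}$'
}

# Example: warn about the branch that is currently checked out.
#   is_valid_branch "$(git rev-parse --abbrev-ref HEAD)" || echo "unexpected branch name"
```

A check like this could run in a local hook or a CI step, but a written convention that everyone follows is the part that matters.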


Rebase and Squash

A common practice for teams is to squash, or condense, long chains of commits into one or a few commits before merging into master.

This is useful when, like me, someone commits frequently and thus would clutter a Git log. Squashing serves to maintain a readable Git log.

Prior to merging a feature branch into the main (master) branch, it should be rebased onto the up-to-date master.

A pull request, discussed later, will be where all the commits of this branch are squashed down to a single buildable commit and merged into master.

Rebasing replays all of your commits on top of another branch (here, master), then replaces your branch with the revised version. This catches you up with master by rewriting the history of your local branch.

Think of it as moving your branch to the tip of master instead of having branched off from an earlier version of master.
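
To make this concrete, here is a minimal sketch that can be run in a scratch directory. It builds a throwaway repository, lets master move ahead of a feature branch, and then rebases the feature branch. The file names, commit messages, and the helper function commit are made up for the demo.

```shell
set -eu

# Throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any Git version
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

# commit FILE MESSAGE: write a file and commit it (helper for this demo only).
commit() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "chore(repo): add base file"

# Start a feature branch and add one commit to it.
git checkout -q -b irmerk/issue7/new-feature
commit feature.txt "feat(demo): add feature file"

# Meanwhile, master moves ahead.
git checkout -q master
commit other.txt "fix(demo): add unrelated fix"

# Replay the feature commit on top of the new tip of master.
git checkout -q irmerk/issue7/new-feature
git rebase -q master

# After the rebase, the branch forks from the tip of master.
master_tip="$(git rev-parse master)"
fork_point="$(git merge-base master HEAD)"
git log --graph --oneline --all
```

After the rebase, master_tip and fork_point hold the same commit hash, which is what "sitting on top of master" means in practice.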

Instead of needing to heavily investigate individual commits from a branch, each merge commit to master should contain all the code for a feature or bug fix. This allows for a much easier investigation process.

Squashing before rebasing means fewer rounds of conflict resolution, but it rewrites history that is already documented on GitHub for everyone, so the result is not the most accurate representation.

Rebasing before squashing keeps the Git log tidy without changing history before it is documented on GitHub.

Interactive rebase

git rebase -i is super useful in several circumstances, such as needing to quickly remove a commit.

If your team has a policy in which any feature commit must also contain tests for that feature, squashing several commits into one can be helpful.

This involves running git rebase -i HEAD~n, where n is the number of commits to revisit. In the editor that opens, leave pick on the first (oldest) commit and change pick to squash on the commits that should be folded into it.
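
To see the effect without touching a real project, here is a minimal sketch that builds a throwaway repository and squashes its last three commits. Instead of editing the todo list by hand, it points GIT_SEQUENCE_EDITOR at a sed command that keeps the first pick and turns the rest into squash. The repository, file name, and commit messages are invented for the demo, and the sed call assumes a sed that supports in-place editing with -i.

```shell
set -eu

# Throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

# Four small commits, as if committing frequently while working.
for i in 1 2 3 4; do
  echo "step $i" >> notes.txt
  git add notes.txt
  git commit -q -m "feat(notes): step $i"
done

n=3   # number of most recent commits to revisit

# GIT_SEQUENCE_EDITOR edits the todo list: line 1 stays "pick",
# every later "pick" becomes "squash".
# GIT_EDITOR=true accepts the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$ s/^pick/squash/'" \
GIT_EDITOR=true \
  git rebase -q -i "HEAD~$n"

commit_count="$(git rev-list --count HEAD)"
git log --oneline
```

In day-to-day work you would normally let the editor open and make the same change by hand; the environment variables only make the sketch repeatable.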

If your project, like the Accord Project, requires a Developer Certificate of Origin sign-off, you may find yourself needing to rapidly change the messages on a series of commits.

As before, change pick to edit on the commits that need the sign-off, then run git commit --amend -s followed by git rebase --continue for each one.

Finally, git push -f to the branch for the pull request. Caution: Force pushing can have dire consequences if used improperly. Consult others if you are unsure how to use it.

Generally, if you do not have large, confusing conflicts, -i (interactive rebase) will be overkill.

Rewriting history

While rewriting history with git rebase is extremely useful in some cases, caution should still be taken. Make sure not to disrupt other people’s history and commits by accident.

Caution: Avoid force pushing to a remote branch other people are working on.

Fear

I still fear rebase and squash. Merge conflicts can be more frequent and seem more difficult. I’ve had experiences of losing work due to incorrectly rebasing.

However, if you frequently commit ongoing work, rebasing complications should be infrequent. Fixing conflicts and git rebase --continue can feel intimidating at first, but continue working with it.

Conflicts resolved during a rebase are recorded in the replayed commits themselves rather than in a separate merge commit. If a rebase is going poorly, git rebase --abort restores the branch to its state before the attempt, and you can try again.

Flow

My recommendation for a general workflow:

  1. Ensure you are currently in master.
    → If working in a fork, fetch:
    git checkout master
    git fetch --all --prune
    git merge --ff-only upstream/master
    git push origin master

    → If working in a branch, pull:
    git checkout master
    git pull origin master
  2. Create a new branch for your feature or bug fix:
    git checkout -b branchName
  3. Make changes with as many commits as necessary. The final commit should build and pass tests.
  4. Make sure your branch sits on top of master (as opposed to branching off another branch). This ensures the reviewer will need only minimal effort to integrate your work by fast-forwarding master:
    git rebase upstream/master
  5. If you have previously pushed your code to a remote branch, you will need to force push. If not, omit the -f flag:
    git push origin branchName -f
  6. Open a pull request in GitHub from this forked branch. Once this has been merged to master, remember to clear out your branch locally:
    git branch -D branchName

Merge

As a non-destructive operation, merging can be nice. However, if the master you are working on is quite active, the Git log can be polluted quickly by the extra commit created by a merge operation.

While many developers are uncomfortable with rebasing and resort to merging master into their branch, merging is not always the best option.

If merging locally instead of rebasing, at the very least squash a pull request to merge into master on GitHub. This allows for greater control of the commit messages in master for the Git log.


CLI

The Command Line Interface (CLI) is a place where you can run all Git commands.

Git graphical user interfaces (GUIs) such as GitKraken and Tower are great. I opt not to use them because they are paid products and because the vast majority of Git solutions found online are written for the CLI.

Moreover, knowing the CLI method of Git allows you to easily navigate a GUI, but the reverse is not necessarily the case.

Learning to interact with Git and GitHub via the CLI will be a great use of your time, especially if you work with open-source projects.

Diff

A good habit to adopt is utilizing git diff prior to committing anything. This allows you to ensure the code being committed is what you expect, all debugging statements are removed, and no junk is included.
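
As one possible way to automate part of this check, the sketch below scans a diff for added lines that still contain JavaScript debugging statements. The function name find_debug_lines and the list of patterns are assumptions for illustration; adjust them to your own project.

```shell
# find_debug_lines
# Reads a unified diff on stdin and prints added lines that still contain
# common JavaScript debugging statements. The pattern list is an assumption.
find_debug_lines() {
  grep -E '^\+' |                        # keep added lines only
    grep -Ev '^\+\+\+ ' |                # drop the "+++ b/file" header lines
    grep -E 'console\.log|debugger' ||   # keep lines with debugging statements
    true                                 # finding nothing is not an error
}

# Usage before committing:
#   git diff --cached | find_debug_lines
```

Reading the full diff yourself is still worthwhile; a filter like this only catches the patterns it has been told about.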

Log

Logs reveal a history of everything that happens in a repository. This tool has many options for displaying commit history in specific ways.

A full log contains a commit hash, author, date, and message. My preferred way of viewing the log is an ASCII graph representing the branch structure:

git log --graph --decorate --pretty=oneline --abbrev-commit

Blame

A method for inspecting who changed what and when in files is git blame. If you code in VS Code, I strongly recommend looking into GitLens, which makes this inspection inline and extremely efficient.


Commits

A single logical change should be captured in a commit. More than one logical change in a commit — an instance in which you may find yourself writing “and” in a commit message — is a good indicator of needing to split into two separate commits.

Commit often — do not let yourself get too far without committing. Small, incremental, and self-contained commits are easier to follow or revert in the future.

While ordering commits logically would be ideal, I recommend committing in the order in which you are working — this is a chronological history of what you did.

Messages

Take a moment for this process — a commit message should not be rushed.

The description of a commit should be well documented so that it proves invaluable to whoever reads it in the future in an attempt to understand why a change was made, even if they have little to no context.

This accessibility is a vital goal for a thorough Git history and workflow.

Include external information references such as issues or pull requests. Anything that will be helpful to others or your future self should be reasoned out now.

The long term success of a project relies on the maintainability of the code and log. A hassle at first pays off as a healthy habit.

Use the terminal, not the editor, when writing a commit message. Committing from the terminal encourages a mindset of describing a change in an incremental way, as well as keeping commits atomic — commits should not need a paragraph of explanation.

This will assist you in creating a pull request message, where an overall change should be captured. More on this later.

Concise and consistent commit messages should be captured by always including the -m <msg> flag to a git commit.

Formatting

A properly formed Git commit subject line should always be able to complete the following sentence:

If applied, this commit will <your subject line here>.

type(scope): subject footer

Types

  • feat — A new feature.
  • fix — A bug fix.
  • docs — Changes to only documentation.
  • style — Changes to formatting (missing semicolons, etc.).
  • refactor — A code change that neither fixes a bug nor adds a feature.
  • test — Adding missing or correcting existing tests.
  • chore — Change to build process or auxiliary tools, or maintenance.

Scope:

  • Focal point of new code or best description for where changes can be found.

Subject:

  • Imperative description of changes, kept under 50 characters (not capitalized and no period).

Footer:

  • GitHub issue reference ID.

Examples
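
As a hypothetical illustration, the subject lines in the comments below fit the format above, and the small shell function checks a subject line against it. The messages, the issue number, the function name is_valid_subject, and the exact pattern are all assumptions made for this sketch.

```shell
# Hypothetical subject lines that follow type(scope): subject
#   feat(routing): add service for contract routes
#   fix(parser): handle empty clause input
#   docs(readme): describe fork workflow
#
# A footer can be supplied as a second -m paragraph, for example:
#   git commit -m "fix(parser): handle empty clause input" -m "Closes #14"

# is_valid_subject SUBJECT
# Succeeds when SUBJECT uses one of the listed types, has a scope,
# starts its description in lowercase, keeps it under 50 characters,
# and does not end with a period.
is_valid_subject() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^(feat|fix|docs|style|refactor|test|chore)\([a-z0-9-]+\): [a-z].{0,47}[^.]$'
}
```

The same check could be applied to a PR title, since the title is meant to follow the same format.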


Pull Requests

A pull request (PR) is one of the best ways to share information. While an issue describes a possible bug or a proposed feature, a PR provides a medium for the changes that are actually being made to the codebase.

Moreover, it is excellent for peer reviews and accountability, as it encourages quality commits. When done well, the commits that construct a PR tell the whole story to those who review the code or examine it in the future.

PRs should consist of a complete addition to the code which contains value. Because the commits inside follow a pattern, the title should be an extension or summary of all the commits inside.

Thus, as mentioned earlier with squashing, the Git log retains the pattern of each commit even after it is tidied up. To emphasize, a PR title should follow the commit message formatting described above.

As a GitHub workflow tool, the innards of a PR are less important than the title maintaining the consistency and efficiency of formatting. This allows Git logs to remain efficient with or without GitHub.

Similar to commits, PRs should be small. A PR which attempts to do multiple things (find yourself writing “and” in the title?) should be split up.

Cleanup

As discussed previously, rebasing prior to creating a PR is a good tidying habit.

Take a moment to merge any extra commits created along the way, or reword commits for clarity. Every commit in a PR should be directly working towards the goal of the title of the PR, or even the related GitHub Issue.

Formatting

GitHub PRs are written in Markdown.

Keep in mind: What does this change do to address the issue, and which side effects may it have? Why was it necessary? Try to pre-empt a reviewer’s questions by being thorough in your information.

Drafts

GitHub offers a useful option for a PR which is not ready to be reviewed quite yet.

If you want to have a source of truth in a PR for others to interact with, prior to actual review, or even just to ensure the code is saved on GitHub and not only on your local machine, open a draft pull request.

Reviews

Setting a standard for PR reviews is important, and being thorough equally so. A reviewer is a guardian of the Git history and code quality. This cannot be stressed enough.

What seems obvious now will surely not be so in months’ or years’ time. Do not feel bad for requesting changes or having changes requested of you. It is better to have pristine code merged into master than to rush through out of desire to be done with a feature.

Strike a balance between keeping PRs flowing, so that further edits or production are not held up, and maintaining quality. Every reviewer should make a judgment on whether an issue is serious enough to block a PR.

However, anyone who contributes to the PR should not be reviewing what is merged into master. Keep a healthy separation between these roles. Generally, it is faster for the PR author to fix the code than for a reviewer to be involved.


GitHub

Issues

Very easy to maintain, issues should be used liberally. Any question, idea, or bug — duplicate or not — should be reason enough to open an issue.

These are the foundation of conversations in projects, so insert points-of-view and establish a concrete record of discussions which can be searched and linked.

Speaking of which, search through a project before opening an issue. While duplicates can be easily closed, this is a courtesy to other contributors.

Also, be thorough and provide as much context as possible. Take a moment to read the project’s documentation on contributing, as it may include guidelines on formatting an issue.

Noise

Stay mindful of what you are presenting for others to read. Is it worth their time? Avoid posting short, rushed answers or responses. Take the time to be worthwhile to yourself and others in your contributions.

Provide as much context as possible while still being helpful. Too little and too much context are both quite awful.

If a significant amount of time has passed, send a gentle bump to reviewers. It can be easy to forget about an active PR.

Open-source

I recommend this guide for contributing to open-source.

GitHub provides great functionalities which should be utilized for the best possible experience of contributing developers.

Issues can be tagged for ease of navigation, protections can be enforced to prevent direct commits to master, and multiple reviewers can be required.

This guide is aimed at providing a framework which fosters a healthy, collaborative atmosphere. All of this can be applied and tweaked for any open-source or proprietary tech team environment.


Conclusion

Contribute the work you would like to see. Politeness, respect, thoroughness, and efficiency are all things we appreciate, so develop and maintain good habits with Git and collaborations will be the best they can be.

Perception is important in open-source, and how others view the Git log can make a big difference. Take time to care for the log with efficient commits and Git etiquette. This is also true for a company and a new hire.

Feel free to contact me with any questions or feedback.

Better Programming

Advice for programmers.

Jolene Langlinais

Written by

Full Stack Engineer. Chef. Consumer of improv.
