When making a private GitHub repository public, audit the pull requests

Ministry of Justice Digital & Technology
Just Tech
May 31, 2019

by Mat Moore (Software Development Profession)

The GOV.UK Service Manual encourages us to make our source code open. This is a great policy because it holds us to account and allows other teams to reuse our code. Software-as-a-service tools that integrate with GitHub also commonly charge extra for private repositories, so working in the open makes it cheaper and easier for us to deliver software.

If you make your code open from the start of a project, it’s easy to do so securely. But what about making existing source code open? In “How to open up closed code”, Anna Shipman explores three approaches for managing this:

  • Cycle all the credentials
  • Rewrite the git history to remove sensitive information
  • Move the code to a new repository bit by bit

When my team decided to make one of our private repositories public, we chose option 2, rewriting the git history to remove sensitive information, because it would be easier for us than rotating credentials. It turned out this was not enough to avoid disclosing a credential, because GitHub keeps the diffs that pull request review comments were made on.

GitHub’s own advice states that:

Once you have pushed a commit to GitHub, you should consider any data it contains to be compromised.

What went wrong

When the project was originally created, we’d mostly avoided committing credentials to the repository. Instead, we used AWS Parameter Store to manage configuration and credentials for each environment — these get passed into the application as environment variables.
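The post doesn't say how the parameters were named or loaded. As an illustration of the pattern only, here is a minimal Python sketch that reads every parameter under one path prefix and exposes the values as environment variables.

It rests on three assumptions:

- `ssm_client` behaves like boto3's SSM client.
- Parameters follow a hypothetical naming scheme such as `/myapp/staging/DATABASE_PASSWORD`.
- The last path segment is the variable name.

```python
import os
from typing import Any, Dict


def load_parameters_as_env(ssm_client: Any, path: str) -> Dict[str, str]:
    """Fetch every Parameter Store value under `path` as {ENV_VAR_NAME: value}.

    `ssm_client` is assumed to behave like boto3.client("ssm"); the naming
    scheme (last path segment == variable name) is a hypothetical convention.
    """
    env: Dict[str, str] = {}
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    # WithDecryption=True returns SecureString parameters as plaintext values.
    for page in paginator.paginate(Path=path, Recursive=True, WithDecryption=True):
        for param in page["Parameters"]:
            # e.g. "/myapp/staging/DATABASE_PASSWORD" -> "DATABASE_PASSWORD"
            env[param["Name"].rsplit("/", 1)[-1]] = param["Value"]
    return env


def export_to_environment(env: Dict[str, str]) -> None:
    """Make the loaded values visible to the application's process."""
    os.environ.update(env)
```

A startup script might call `export_to_environment(load_parameters_as_env(boto3.client("ssm"), "/myapp/staging"))`. The point of the design is that secrets live outside the repository, so nothing sensitive enters the git history in the first place.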

Unfortunately, the repository did contain one credential. We depend on a shared development database, and we’d committed a password for this in the docker-compose.yml file, which is used to run the application locally.

To address this we ran git filter-branch to completely remove the file from the git history.
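The post doesn't record the exact command. A common `git filter-branch` invocation for deleting one file from every commit looks like the following, wrapped in a small Python sketch. The function names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def filter_branch_command(path: str) -> List[str]:
    """Build a `git filter-branch` call that deletes `path` from every commit."""
    return [
        "git", "filter-branch", "--force",
        # Remove the file from each rewritten commit's index; --ignore-unmatch
        # stops `git rm` failing on commits that never contained it.
        "--index-filter", f"git rm --cached --ignore-unmatch {shlex.quote(path)}",
        # Drop commits that become empty once the file is gone.
        "--prune-empty",
        # Point existing tags at the rewritten commits.
        "--tag-name-filter", "cat",
        # Rewrite every branch and tag, not just the current branch.
        "--", "--all",
    ]


def remove_file_from_history(path: str, repo: str = ".") -> None:
    """Run the rewrite in `repo`.

    filter-branch keeps backups of the old refs under refs/original/, which
    still point at the secret; this sketch does not delete them.
    """
    subprocess.run(filter_branch_command(path), cwd=repo, check=True)
```

After `remove_file_from_history("docker-compose.yml")`, the old objects stay in the local repository until refs/original/ is deleted and the reflog is expired and garbage-collected. Git's own documentation now recommends git filter-repo over filter-branch for this kind of rewrite.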

We then recreated our Docker Compose configuration using a separate docker-compose.override.yml file, which is encrypted with git-crypt.
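git-crypt only encrypts files matched by a .gitattributes entry such as `docker-compose.override.yml filter=git-crypt diff=git-crypt`. If that entry is missing or the repository isn't set up, the file is committed in plaintext.

Below is a minimal Python sketch of a check that could run before pushing, for example from a pre-push hook, so that plaintext never reaches GitHub. It makes two assumptions:

- git-crypt marks encrypted blobs with a `\0GITCRYPT\0` header. This is worth confirming against the git-crypt version in use.
- The function names are hypothetical.

```python
import subprocess

# Assumed: git-crypt prefixes every encrypted blob with this 10-byte header.
GIT_CRYPT_HEADER = b"\x00GITCRYPT\x00"


def is_git_crypt_encrypted(blob: bytes) -> bool:
    return blob.startswith(GIT_CRYPT_HEADER)


def committed_blob(path: str, rev: str = "HEAD", repo: str = ".") -> bytes:
    """Read the file exactly as stored in git.

    `git cat-file blob` applies no filters, so git-crypt's smudge filter
    does not decrypt the content here.
    """
    result = subprocess.run(
        ["git", "cat-file", "blob", f"{rev}:{path}"],
        cwd=repo, capture_output=True, check=True,
    )
    return result.stdout


def assert_encrypted(path: str, rev: str = "HEAD", repo: str = ".") -> None:
    if not is_git_crypt_encrypted(committed_blob(path, rev, repo)):
        raise SystemExit(f"{path} is committed in plaintext at {rev}")
```

Running `assert_encrypted("docker-compose.override.yml")` before a push would fail loudly on a misconfigured commit. Running the same check in CI would still flag the problem, but only after the plaintext had already been pushed to GitHub.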

Unfortunately, the first pull request for this didn’t set up git-crypt correctly, so the file was committed in plaintext. When reviewing it, I added a comment on that code to point this out. What I didn’t realise is that this comment would persist even after the commit had been rebased away and the branch deleted. Whenever code is commented on, GitHub keeps the diff alongside the comment and just adds an “outdated” badge if that code has since changed.

This ultimately caused us to leak the credential when we made the repository public. Luckily, we noticed this shortly after and changed the credential.

How to avoid this

Many automated tools for detecting secrets scan the git repository itself, rather than the pull requests, issues and comments stored on GitHub.

The problem is that any comment ever made in a code review could potentially expose secrets. In our case the repository was fairly new, so we could have manually audited GitHub issues and pull requests to check for anything sensitive, but this might not always be feasible.
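Part of that audit can be automated. GitHub's REST API lists every pull request review comment in a repository at `GET /repos/{owner}/{repo}/pulls/comments`, and each review comment includes the `diff_hunk` it was made on.

Below is a minimal Python sketch that scans those comments. The regexes are illustrative only, and the function names are hypothetical. It is no substitute for reading through the comments yourself, since a pattern list will miss secrets it doesn't anticipate.

```python
import json
import re
import urllib.request
from typing import Iterable, Iterator, List, Tuple

API = "https://api.github.com"

# Illustrative patterns only; tune them to the secrets your project uses.
SUSPICIOUS = [
    re.compile(r"(?i)password\s*[:=]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
]


def fetch_all(url: str, token: str) -> Iterator[dict]:
    """Yield every item from a paginated GitHub REST endpoint."""
    page = 1
    while True:
        req = urllib.request.Request(
            f"{url}?per_page=100&page={page}",
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            items = json.load(resp)
        if not items:
            return
        yield from items
        page += 1


def scan_comments(comments: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (comment URL, matched text) for anything that looks like a secret."""
    hits: List[Tuple[str, str]] = []
    for c in comments:
        # Review comments carry the diff they were made on, even when outdated.
        text = (c.get("diff_hunk") or "") + "\n" + (c.get("body") or "")
        for pattern in SUSPICIOUS:
            for m in pattern.finditer(text):
                hits.append((c.get("html_url", ""), m.group(0)))
    return hits
```

For example, `scan_comments(fetch_all(f"{API}/repos/OWNER/REPO/pulls/comments", token))` uses placeholder owner and repository names and a token with access to the private repository. The same scan can be pointed at `/repos/OWNER/REPO/issues/comments`, whose items have a `body` but no `diff_hunk`.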

You could avoid this problem by choosing either of the other two approaches Anna describes in her blog post:

  • cycling all the credentials
  • starting a new repository entirely.

You could also combine rewriting history with moving to a fresh repository. This way, you preserve your code’s history, but you get rid of existing issues and pull requests. This means you lose some of the context around the code, but you only need to sanitise the git repository itself before making the repository public.
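When moving rewritten history into a fresh repository, it matters which refs you push. A local clone may still carry refs/original/*, which are filter-branch's backups. A mirror clone of the old GitHub repository may also carry refs/pull/*, GitHub's pull request heads. Both can point at the unrewritten commits.

Below is a minimal Python sketch that pushes only local branches and tags. The remote URL is a placeholder, and the function names are hypothetical.

```python
import subprocess
from typing import List

# Push only these namespaces; refs/original/* and refs/pull/* are left behind.
PUSH_REFSPECS = ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]


def push_clean_history_command(new_remote_url: str) -> List[str]:
    return ["git", "push", new_remote_url, *PUSH_REFSPECS]


def push_clean_history(new_remote_url: str, repo: str = ".") -> None:
    subprocess.run(push_clean_history_command(new_remote_url), cwd=repo, check=True)
```

`git push` only sends objects reachable from the refs being pushed. Commits that only the old refs point to therefore never reach the new repository. Any branch you want to keep must exist locally under refs/heads/ before pushing.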

If you enjoyed this article, you can follow us on Twitter, read our other blog or check us out on LinkedIn.

If you’d like to come and work with us, please check current vacancies on our job board!

We design, build and support user-centred digital and technology services for the justice system.