Getting an Overview of the Secrets Stored in Plaintext in Your Private GitHub Organization
Summary
Launch a simple one-time scan — Get a quick overview of the exposed secrets.
Deployment guide — Step-by-step technical guide.
Next steps — Go further in securing your GitHub organization.
Context
Public repositories are continuously scanned by Cloud Providers such as AWS or GCP to alert customers when authentication secrets are leaked publicly.
What about secrets exposed internally in private Git code repositories (SaaS or On-Premise)?
What if an internal GitHub account is compromised? What if a developer leaks a GitHub personal access token or an SSH key?
The risk is the leak of your organization's code and of the potential vulnerabilities inside it, or worse: plaintext secrets stored in that code could hand bad actors keys that open multiple doors into your information system.
Launch a simple one-time scan
In this article, I propose a quick and simple way to evaluate, at scale, how many secrets may be present in your repositories.
The result of this scan will enable you to:
- Evaluate to what extent your teams are aware of the risks posed by plaintext secrets in code.
- Collect all the needed data to help you initiate a remediation campaign.
- Evaluate your needs in terms of long-term security tooling on GitHub.
Two existing tools are used for this scan:
- The open source secret scanning engine Gitleaks: https://github.com/zricethezav/gitleaks
It scans the contents and history of a cloned git repository for patterns that you define for your organization.
- The open source SecureCodeBox framework, which automates the execution of various well-known security tools on Kubernetes: https://www.securecodebox.io/docs/scanners/gitleaks/
This tool orchestrates scans at scale:
- Repository retrieval
- Gitleaks execution
- Result export
In addition, you can use cloud services to run Kubernetes, store results, analyze the data, and surface relevant information.
The diagram below represents the complete workflow used for this scan:
Depending on the solutions used within your company and their criticality, you will have to define regular expressions matching your environment and add them to the Gitleaks configuration file.
For a first scan, I recommend targeting the secret types most critical to your organization. Otherwise, you could end up with an excessive amount of data to deal with, as the default configuration file contains many expressions, including generic ones.
Here is an example that targets GCP service accounts and GitHub tokens:
# gitleaks.toml
[[rules]]
id = "Github Personal Access Token"
description = "Github Personal Access Token"
regex = '''ghp_[0-9a-zA-Z]{36}'''

[[rules]]
id = "Google (GCP) Service-account"
description = "Google (GCP) Service-account"
regex = '''\"type\": \"service_account\"'''
This is a minimal working example; see the Gitleaks documentation for advanced configuration.
To test this configuration file, create a repository containing fake secrets of the types you are looking for, and verify that they are detected (needless to say, these credentials should either be revoked first or carry no permissions).
Now that you can match these patterns, you will need to check whether the detected secrets are still usable by trying to authenticate against the providers' APIs with dedicated functions.
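Both checks can be rehearsed offline before anything is deployed. The snippet below is a hypothetical sketch (not part of the toolchain itself) that replays the two rules from the example gitleaks.toml against a fake commit, so you can confirm the regexes fire as expected before loading them into the cluster:

```python
import re

# The two rules from the example gitleaks.toml above
RULES = {
    "Github Personal Access Token": r"ghp_[0-9a-zA-Z]{36}",
    "Google (GCP) Service-account": r"\"type\": \"service_account\"",
}

def matched_rules(text: str) -> list:
    """Return the names of every rule whose regex fires on the given text."""
    return [name for name, pattern in RULES.items() if re.search(pattern, text)]

# Fake secrets only -- never commit real credentials, even for testing
fake_commit = 'token = "ghp_' + "a" * 36 + '"\n{"type": "service_account"}'
print(matched_rules(fake_commit))
```

If both rule names are printed, the patterns are ready; the live validity check against the providers' APIs is a separate step handled by the serverless functions described in the deployment guide.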
Deployment guide
Step 1: Deploy the SecureCodeBox operator and the Gitleaks scan type. The operator chart is customized with a parameter that permits uploading data to a bucket via the S3 protocol.
# Add the secureCodeBox Helm repo
helm repo add secureCodeBox https://charts.securecodebox.io
# Create a new namespace
kubectl create namespace securecodebox-system
# Install the SecureCodeBox operator & CRDs
helm --namespace securecodebox-system upgrade --install securecodebox-operator secureCodeBox/operator -f value-operator.yaml
# Install the SecureCodeBox Gitleaks scan type
helm --namespace securecodebox-system upgrade --install gitleaks secureCodeBox/gitleaks
Modify the following file to match your storage endpoint and bucket name:
# value-operator.yaml
minio:
  # Disable the local MinIO instance
  enabled: false
s3:
  enabled: true
  bucket: gitleaks-results
  endpoint: storage.googleapis.com # or s3.eu-west-1.amazonaws.com
  keySecret: bucket-credentials
Step 2: Configure k8s secrets and configmaps
# Deploy the SSH key used to git clone repositories
kubectl create --namespace securecodebox-system secret generic github-ssh --from-file=/home/username/.ssh/id_rsa
# Deploy the credentials used to upload Gitleaks results to a bucket
kubectl create --namespace securecodebox-system secret generic bucket-credentials --from-literal=accesskey="XXXXXXXXXXXXXXXXXXXXXX" --from-literal=secretkey="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
# Deploy the Gitleaks configuration file
kubectl create configmap gitleaks-config --from-file=gitleaks-config.toml -n securecodebox-system
Step 3: Deploy functions that will verify credential validity based on your needs
Below is an example of Python code hosted in a Cloud Function that tests the validity of GitHub personal access tokens and uploads the findings to GCS when a token is valid:
from google.cloud import storage
import json
import time
from github import Github, BadCredentialsException


def hello_gcs(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    if "id" in event and "findings" in event["id"]:
        # Initialise a client
        storage_client = storage.Client("<hosted GCS project>")
        # Create a bucket object for our bucket
        bucket = storage_client.get_bucket("<bucket name>")
        # Create a blob object from the filepath
        blob = bucket.blob(f"{event['id'].rsplit('/', 3)[1]}/{event['id'].rsplit('/', 3)[2]}")
        # Download the file to a destination
        blob.download_to_filename("/tmp/findings.json")
        with open("/tmp/findings.json") as f:
            github_keys = []
            data = json.load(f)
            upload_data = []
            for entry in data:
                # Retrieve GitHub results
                if entry["name"] == "Github Personal Access Token":
                    github_keys.append(entry["attributes"]["offender"])
            # Test each unique token against the API
            github_keys = list(set(github_keys))
            for key in github_keys:
                if gh_token(key):
                    for entry in data:
                        if entry["attributes"]["offender"][-41:] == key:
                            upload_data.append(entry)
                    gcs_upload("github_pat", upload_data)
                    upload_data.clear()


def gcs_upload(type: str, data: list) -> None:
    """Upload data to a GCS bucket.
    Args:
        type (str): secret type
        data (list): list of data to be uploaded
    Returns:
        None
    """
    storage_client = storage.Client("<hosted GCS project>")
    bucket = storage_client.get_bucket("<bucket name>")
    blob = bucket.blob(f"{type}/{time.strftime('%Y%m%d-%H%M%S')}_{type}_{data[0]['attributes']['email']}.json")
    blob.upload_from_string(json.dumps(data))


def gh_token(token: str) -> bool:
    """Test GitHub PAT validity.
    Args:
        token (str): GitHub PAT
    Returns:
        bool: token is valid or not
    """
    g = Github(token)
    try:
        g.get_organization("<your GitHub org>")
        return True
    except BadCredentialsException:
        return False
Step 4: Build YAML files
I have modified the SecureCodeBox scan template to handle GitHub authentication over SSH:
# SPDX-FileCopyrightText: the secureCodeBox authors
#
# SPDX-License-Identifier: Apache-2.0
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "pods-name"
  annotations:
    metadata.scan.securecodebox.io/git-repo-url: "https://github.com/<githubORG>/template"
spec:
  scanType: "gitleaks"
  # Define a volume and mount it at /repo in the scan container
  volumes:
    - name: repo
      emptyDir: {}
    - name: ssh
      secret:
        secretName: github-ssh
        defaultMode: 0400
    - name: "gitleaks-config"
      configMap:
        name: "gitleaks-config"
  volumeMounts:
    - name: repo
      mountPath: "/repo/"
    - name: "gitleaks-config"
      mountPath: "/config/"
  # Define an init container to run the git clone for us
  initContainers:
    - name: "git-clone"
      env:
        - name: GIT_SSH_COMMAND
          value: "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
      image: bitnami/git
      # Mount the "repo" and "ssh" volumes on the initContainer as well
      volumeMounts:
        - name: repo
          mountPath: "/repo/"
        - name: ssh
          mountPath: "/root/.ssh/"
      # Clone to /repo in the init container
      command:
        - git
        - clone
        # Use a --mirror clone to get the complete repository, otherwise findings
        # may be incomplete. See https://wwws.nightwatchcybersecurity.com/2022/02/11/gitbleed/
        - "--mirror"
        # Authenticated clone over SSH using the mounted key
        - "git@github.com:<githubORG>/template.git"
        - /repo/
      resources:
        requests:
          cpu: 32m
          memory: 64Mi
  parameters:
    # Run Gitleaks in "detect" mode
    - "detect"
    # Point it at the location of the repository
    - "--source"
    - "/repo/"
    # Point it at your own config file
    - "--config"
    - "/config/gitleaks-config.toml"
Use a simple loop to get one yaml file per repo in a new folder named “generated”:
#!/usr/bin/env bash
mkdir generated && cp template.yaml repo_list.txt generated && cd generated || exit
while IFS= read -r line
do
  cp template.yaml "${line}.yaml"
  sed -i "s/template/$line/g" "${line}.yaml"
  cleanline="$(echo "$line" | sed 's/_//g' | sed 's/-//g' | tr '[:upper:]' '[:lower:]')"
  sed -i "s/pods-name/$cleanline/g" "${line}.yaml"
done < "repo_list.txt"
rm template.yaml && rm repo_list.txt
Step 5: Deploy the YAML files on the Kubernetes cluster to run the scans
kubectl apply -n securecodebox-system -f .
After the jobs complete, the Gitleaks output is uploaded to the bucket, which triggers the serverless code that checks credential validity.
Valid credentials are uploaded as input data for remediation.
Example of a valid credential file:
{
  "name": "Github Personal Access Token",
  "description": "The name of the rule which triggered the finding: Github Personal Access Token",
  "osi_layer": "APPLICATION",
  "severity": "MEDIUM",
  "category": "Potential Secret",
  "attributes": {
    "commit": "https://github.com/<your github org>/<repo name>/commit/<commit ID>",
    "description": "Github Personal Access Token",
    "offender": "ghp_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "author": "<commit author email>",
    "email": "<commit author email>",
    "date": "<commit date>",
    "file": "<commit file>",
    "line_number": 1,
    "tags": [],
    "line": "ghp_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  },
  "id": "<scan ID>",
  "parsed_at": "<scan date>"
}
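With several of these files accumulating in the bucket, a short script can turn them into the overview this article aims for. The sketch below is hypothetical (the field names mirror the finding format above; fetching the files from the bucket is left out) and simply counts verified findings per rule:

```python
import json
from collections import Counter

def summarize_findings(findings_json: str) -> Counter:
    """Count verified findings per rule name for a quick overview."""
    findings = json.loads(findings_json)
    return Counter(f["name"] for f in findings)

# Hypothetical sample mirroring the structure of the file above
sample = json.dumps([
    {"name": "Github Personal Access Token", "attributes": {"email": "dev@example.com"}},
    {"name": "Github Personal Access Token", "attributes": {"email": "ops@example.com"}},
    {"name": "Google (GCP) Service-account", "attributes": {"email": "dev@example.com"}},
])
for rule, count in summarize_findings(sample).most_common():
    print(f"{rule}: {count}")
```

A tally like this is usually enough to prioritize a remediation campaign: rotate the most critical secret types first, then work down the list.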
Next steps
Now that you have information on the secrets potentially stored in your private repositories, your second step is remediation. Rewriting commit history is tedious, and you should consider a secret leaked as soon as it is readable by a group of people, so all detected secrets must be rotated.
The best practice to remediate these plaintext secrets is, after rotation, to store them in a centralized secret management platform, validated and referenced by the security team.
Once this is done, you will have to find a way to bring this topic closer to developers in order to detect and remediate such issues earlier in the development lifecycle.
To go further in securing your Github infrastructure, you should consider other topics such as:
- Do you communicate enough to share best practices and security guidelines with your teams to raise awareness?
- Are permissions granted following an RBAC definition? Are users who leave your company automatically removed from your GitHub organization?
- Are you able to detect and auto-remediate abnormal behaviors such as code repository exfiltration?
- Do you have a way to scan your repositories for vulnerabilities?
- Do you implement a scan to detect secrets in your CI/CD pipeline?
Always keep in mind that security is a continuous set of actions, in which proximity to your teams is a key success factor.