Getting an Overview of the Secrets Stored in Plaintext in Your Private GitHub Organization
Summary
Launch a simple one-time scan — Get a quick overview of the exposed secrets.
Deployment guide — Step-by-step technical guide.
Next steps — Go further in securing your GitHub organization.
Context
Public repositories are continuously scanned by Cloud Providers such as AWS or GCP to alert customers when authentication secrets are leaked publicly.
What about secrets exposed internally in private Git code repositories (SaaS or On-Premise)?
What if an internal GitHub account is compromised? What if a developer leaks a GitHub personal access token or an SSH key?
The risk is the leak of your organization's code and of the potential vulnerabilities inside it, or worse: plaintext secrets stored in that code could hand bad actors keys that open multiple doors into your information system.
Launch a simple one-time scan
In this article, I propose a quick and simple way to evaluate, at scale, how many secrets may be present in your repositories.
The result of this scan will enable you to:
- Evaluate to what extent your teams are aware of the risks posed by plaintext secrets in code.
- Collect all the needed data to help you initiate a remediation campaign.
- Evaluate your needs in terms of long-term security tooling on GitHub.
Two existing tools are used for this scan:
- The open source secret scanning engine Gitleaks: https://github.com/zricethezav/gitleaks
It scans the contents and history of a cloned git repository for patterns that you define for your organization.
- The open source SecureCodeBox framework, which automates the execution of various well-known security tools on Kubernetes: https://www.securecodebox.io/docs/scanners/gitleaks/
This tool orchestrates scans at scale:
- Repository retrieval
- Gitleaks execution
- Result export
In addition, you can use cloud services to run Kubernetes, store results, analyze the data, and surface relevant information.
The diagram below represents the complete workflow used for this scan:
Depending on the solutions used within your company and their criticality, you will have to define regular expressions matching your environment and add them to the Gitleaks configuration file.
For a first scan, I recommend targeting the secret types most critical to your organization. Otherwise, you could end up with an excessive amount of data to deal with, as the default configuration file contains many expressions, including generic ones.
Here is an example that targets GCP service accounts and GitHub tokens:
# gitleaks.toml
[[rules]]
id = "Github Personal Access Token"
description = "Github Personal Access Token"
regex = '''ghp_[0-9a-zA-Z]{36}'''

[[rules]]
id = "Google (GCP) Service-account"
description = "Google (GCP) Service-account"
regex = '''\"type\": \"service_account\"'''
This is a minimal working example; see the Gitleaks documentation for advanced configuration.
To test this configuration file, create a repository containing fake secrets of the types you are looking for, and verify that they are detected (needless to say, these credentials should either be revoked first or carry no permissions).
Now that you can match these patterns, you will need to check whether the detected secrets are still usable by trying to authenticate against the providers' APIs with dedicated functions.
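Both checks can be rehearsed offline before anything is deployed. The snippet below is a hypothetical sketch (not part of the toolchain itself) that replays the two rules from the example gitleaks.toml against a fake commit, so you can confirm the regexes fire as expected before loading them into the cluster:

```python
import re

# The two rules from the example gitleaks.toml above
RULES = {
    "Github Personal Access Token": r"ghp_[0-9a-zA-Z]{36}",
    "Google (GCP) Service-account": r"\"type\": \"service_account\"",
}

def matched_rules(text: str) -> list:
    """Return the names of every rule whose regex fires on the given text."""
    return [name for name, pattern in RULES.items() if re.search(pattern, text)]

# Fake secrets only -- never commit real credentials, even for testing
fake_commit = 'token = "ghp_' + "a" * 36 + '"\n{"type": "service_account"}'
print(matched_rules(fake_commit))
```

If both rule names are printed, the patterns are ready; the live validity check against the providers' APIs is a separate step handled by the serverless functions described in the deployment guide.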
Deployment guide
Step 1: Deploy the SecureCodeBox operator and the Gitleaks scan type. The operator chart is customized with a parameter that permits uploading data to a bucket via the S3 protocol.
# Add the secureCodeBox Helm repo
helm repo add secureCodeBox https://charts.securecodebox.io
# Create a new namespace
kubectl create namespace securecodebox-system
# Install the SecureCodeBox operator & CRDs
helm --namespace securecodebox-system upgrade --install securecodebox-operator secureCodeBox/operator -f value-operator.yaml
# Install the SecureCodeBox Gitleaks scan type
helm --namespace securecodebox-system upgrade --install gitleaks secureCodeBox/gitleaks
Modify the following file to match your storage endpoint and bucket name:
# value-operator.yaml
minio:
  # Disable the local MinIO instance
  enabled: false
s3:
  enabled: true
  bucket: gitleaks-results
  endpoint: storage.googleapis.com # or s3.eu-west-1.amazonaws.com
  keySecret: bucket-credentials
Step 2: Configure k8s secrets and configmaps
# Deploy the SSH key used to git clone repositories
kubectl create --namespace securecodebox-system secret generic github-ssh --from-file=/home/username/.ssh/id_rsa
# Deploy the credentials used to upload Gitleaks results to a bucket
kubectl create --namespace securecodebox-system secret generic bucket-credentials --from-literal=accesskey="XXXXXXXXXXXXXXXXXXXXXX" --from-literal=secretkey="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
# Deploy the Gitleaks configuration file
kubectl create configmap gitleaks-config --from-file=gitleaks-config.toml -n securecodebox-system
Step 3: Deploy functions that will verify credential validity based on your needs
Below is an example of Python code hosted in a Cloud Function that tests the validity of GitHub personal access tokens and uploads the findings to GCS when a token is valid:
from google.cloud import storage
import json
import time
from github import Github, BadCredentialsException


def hello_gcs(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    if "id" in event and "findings" in event["id"]:
        # Initialise a client
        storage_client = storage.Client("<hosted GCS project>")
        # Create a bucket object for our bucket
        bucket = storage_client.get_bucket("<bucket name>")
        # Create a blob object from the filepath
        blob = bucket.blob(f"{event['id'].rsplit('/', 3)[1]}/{event['id'].rsplit('/', 3)[2]}")
        # Download the file to a destination
        blob.download_to_filename("/tmp/findings.json")
        with open("/tmp/findings.json") as f:
            github_keys = []
            data = json.load(f)
            upload_data = []
            for entry in data:
                # Retrieve GitHub results
                if entry["name"] == "Github Personal Access Token":
                    github_keys.append(entry["attributes"]["offender"])
            # Test each unique token against the API
            github_keys = list(set(github_keys))
            for key in github_keys:
                if gh_token(key):
                    for entry in data:
                        if entry["attributes"]["offender"][-41:] == key:
                            upload_data.append(entry)
                    gcs_upload("github_pat", upload_data)
                    upload_data.clear()


def gcs_upload(type: str, data: list) -> None:
    """Upload data to a GCS bucket.
    Args:
        type (str): secret type
        data (list): list of data to be uploaded
    Returns:
        None
    """
    storage_client = storage.Client("<hosted GCS project>")
    bucket = storage_client.get_bucket("<bucket name>")
    blob = bucket.blob(f"{type}/{time.strftime('%Y%m%d-%H%M%S')}_{type}_{data[0]['attributes']['email']}.json")
    blob.upload_from_string(json.dumps(data))


def gh_token(token: str) -> bool:
    """Test GitHub PAT validity.
    Args:
        token (str): GitHub PAT
    Returns:
        bool: token is valid or not
    """
    g = Github(token)
    try:
        g.get_organization("<your GitHub org>")
        return True
    except BadCredentialsException:
        return False
Step 4: Build YAML files
I have modified the SecureCodeBox scan template to handle GitHub authentication over SSH:
# SPDX-FileCopyrightText: the secureCodeBox authors
#
# SPDX-License-Identifier: Apache-2.0
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "pods-name"
  annotations:
    metadata.scan.securecodebox.io/git-repo-url: "https://github.com/<githubORG>/template"
spec:
  scanType: "gitleaks"
  # Define a volume and mount it at /repo in the scan container
  volumes:
    - name: repo
      emptyDir: {}
    - name: ssh
      secret:
        secretName: github-ssh
        defaultMode: 0400
    - name: "gitleaks-config"
      configMap:
        name: "gitleaks-config"
  volumeMounts:
    - name: repo
      mountPath: "/repo/"
    - name: "gitleaks-config"
      mountPath: "/config/"
  # Define an init container to run the git clone for us
  initContainers:
    - name: "git-clone"
      env:
        - name: GIT_SSH_COMMAND
          value: "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
      image: bitnami/git
      # Mount the "repo" and "ssh" volumes on the initContainer as well
      volumeMounts:
        - name: repo
          mountPath: "/repo/"
        - name: ssh
          mountPath: "/root/.ssh/"
      # Clone to /repo in the init container
      command:
        - git
        - clone
        # Use a --mirror clone to get the complete repository, otherwise findings
        # may be incomplete. See https://wwws.nightwatchcybersecurity.com/2022/02/11/gitbleed/
        - "--mirror"
        # Authenticated clone over SSH using the mounted key
        - "git@github.com:<githubORG>/template.git"
        - /repo/
      resources:
        requests:
          cpu: 32m
          memory: 64Mi
  parameters:
    # Run Gitleaks in "detect" mode
    - "detect"
    # Point it at the location of the repository
    - "--source"
    - "/repo/"
    # Point it at your own config file
    - "--config"
    - "/config/gitleaks-config.toml"
Use a simple loop to get one yaml file per repo in a new folder named “generated”:
#!/usr/bin/env bash
mkdir generated && cp template.yaml repo_list.txt generated && cd generated || exit
while IFS= read -r line
do
  cp template.yaml "${line}.yaml"
  sed -i "s/template/$line/g" "${line}.yaml"
  cleanline="$(echo "$line" | sed 's/_//g' | sed 's/-//g' | tr '[:upper:]' '[:lower:]')"
  sed -i "s/pods-name/$cleanline/g" "${line}.yaml"
done < "repo_list.txt"
rm template.yaml && rm repo_list.txt
Step 5: Deploy the YAML files on the Kubernetes cluster to run the scans
kubectl apply -n securecodebox-system -f .
After the jobs complete, the Gitleaks output is uploaded to the bucket, which triggers the serverless code that checks credential validity.
Valid credentials are uploaded as input data for remediation.
Example of a valid credential file:
{
  "name": "Github Personal Access Token",
  "description": "The name of the rule which triggered the finding: Github Personal Access Token",
  "osi_layer": "APPLICATION",
  "severity": "MEDIUM",
  "category": "Potential Secret",
  "attributes": {
    "commit": "https://github.com/<your github org>/<repo name>/commit/<commit ID>",
    "description": "Github Personal Access Token",
    "offender": "ghp_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "author": "<commit author email>",
    "email": "<commit author email>",
    "date": "<commit date>",
    "file": "<commit file>",
    "line_number": 1,
    "tags": [],
    "line": "ghp_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  },
  "id": "<scan ID>",
  "parsed_at": "<scan date>"
}
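With several of these files accumulating in the bucket, a short script can turn them into the overview this article aims for. The sketch below is hypothetical (the field names mirror the finding format above; fetching the files from the bucket is left out) and simply counts verified findings per rule:

```python
import json
from collections import Counter

def summarize_findings(findings_json: str) -> Counter:
    """Count verified findings per rule name for a quick overview."""
    findings = json.loads(findings_json)
    return Counter(f["name"] for f in findings)

# Hypothetical sample mirroring the structure of the file above
sample = json.dumps([
    {"name": "Github Personal Access Token", "attributes": {"email": "dev@example.com"}},
    {"name": "Github Personal Access Token", "attributes": {"email": "ops@example.com"}},
    {"name": "Google (GCP) Service-account", "attributes": {"email": "dev@example.com"}},
])
for rule, count in summarize_findings(sample).most_common():
    print(f"{rule}: {count}")
```

A tally like this is usually enough to prioritize a remediation campaign: rotate the most critical secret types first, then work down the list.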
Next steps
Now that you have information on the secrets potentially stored in your private repositories, your second step is remediation. Rewriting commit history is tedious, and you should consider a secret leaked as soon as it is readable by a group of people, so all detected secrets must be rotated.
The best practice to remediate these plaintext secrets is, after rotation, to store them in a centralized secret management platform, validated and referenced by the security team.
Once this is done, you will have to find a way to bring this topic closer to developers in order to detect and remediate such issues earlier in the development lifecycle.
To go further in securing your Github infrastructure, you should consider other topics such as:
- Do you communicate enough to share best practices and security guidelines with your teams to raise awareness?
- Are permissions granted following an RBAC definition? Are users who leave your company automatically removed from your GitHub organization?
- Are you able to detect and auto-remediate abnormal behaviors such as code repository exfiltration?
- Do you have a way to scan your repositories for vulnerabilities?
- Do you implement a scan to detect secrets in your CI/CD pipeline?
Always keep in mind that security is a continuous set of actions, in which proximity to your teams is a key success factor.