GitHub Recon: Finding Sensitive Information

Pawanrawat
5 min read · Feb 17, 2024

Credit: https://miro.medium.com/v2/resize:fit:1125/0*N1fmHtI8gmkH_2Vu.png

Hello hackers, I'm Pawan Rawat and this is my first blog post. It covers my approach to GitHub recon.

What is GitHub?

GitHub is a developer platform that allows developers to create, store, manage and share their code. Apart from code, repositories often also contain API keys, passwords, customer data etc. In other words, GitHub holds a lot of sensitive information that can be useful to a hacker, and a leak of this kind can cost a company thousands of dollars in damage.

What's new in GitHub Search?

GitHub removed the sorting feature, and after the update code search shows only 5 pages of results. You can read more about it here: https://github.blog/changelog/2022-11-09-introducing-an-all-new-code-search-and-code-browsing-experience/

I will be covering the manual way of GitHub dorking:

Manual Code Search or GitHub Dorking

GitHub Code Search is a powerful feature that can be used to search repositories for sensitive data like passwords, API keys, credentials, auth tokens, database files etc. using GitHub dorks. We can search for code within a particular repository or organization.

To search for code across all public repositories, we must be signed in to a GitHub account. Apart from repositories we can also check for code, commits, issues, discussions, packages, wikis and users.

Finding sensitive information on GitHub is not easy: without dorks, we would have to spend a lot of time checking each repository of a particular company or user. GitHub dorking reduces the effort of searching for sensitive information.

We can use boolean operators such as OR, NOT and AND, and we can also use regex to find leaks in GitHub repos.

How to use?

1. Use specialized qualifiers, such as path:, language:, user:, org: and repo:

To search for an exact string, we can surround it in quotes or use regex; quoted strings also work inside qualifiers. We can filter out demo or example data by chaining multiple NOT operators, like NOT example NOT guest NOT localhost NOT fake NOT 1234 NOT 127.0.0.1 NOT test.

path:**/.npmrc _auth
path:*/*config.sh password abc.com NOT example.com
path:*.pem private key
path:src/*.js
language:Python
owner:octocat abc
repo:owner/name

For example:-

path:*.env ( NOT homestead NOT root NOT example NOT gmail NOT sample NOT localhost NOT marutise) password outlook.com
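
The query shape above (a path: qualifier, a few keywords, and a run of NOT terms) can also be assembled in code. This is only a minimal Python sketch: the helper names build_dork and search_url are hypothetical, and the URL form assumes the GitHub web search page accepts q and type=code as query parameters.

```python
from urllib.parse import urlencode

# Placeholder values we want to exclude, taken from the examples in this post.
NOISE = ["example", "guest", "localhost", "fake", "1234", "127.0.0.1", "test"]


def build_dork(path_glob, keywords, exclude=NOISE):
    """Combine a path: qualifier, keywords, and NOT terms into one query."""
    parts = [f"path:{path_glob}"]
    parts += keywords
    parts += [f"NOT {term}" for term in exclude]
    return " ".join(parts)


def search_url(query):
    """Build a github.com code search URL for the query (assumed URL shape)."""
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


query = build_dork("*.env", ["password", "outlook.com"])
print(query)
print(search_url(query))
```

Keeping the noise words in one list makes it easy to add a new false-positive term once and reuse it in every query.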

We can craft the search query according to our needs, use any keywords with it, such as password, authtoken or pwd, and also search other file types.

For example:-

(path:*.xml OR path:*.json OR path:*.properties OR path:*.txt OR path:*.log OR path:*.config OR path:*.conf OR path:*.cfg OR path:*.env OR path:*.envrc OR path:*.prod OR path:*.secret OR path:*.private OR path:*.key) AND (access_key OR secret_key OR access_token OR api_key OR apikey OR api_secret OR auth_token OR authsecret) AND ("sk-" AND (openai OR gpt))
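
Long grouped queries like this are easy to mistype by hand, so here is a small Python sketch that generates one. The helper name or_group is hypothetical and the lists are shortened for illustration; it only builds the query string and does not call GitHub.

```python
def or_group(terms):
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Shortened lists for illustration; extend them as needed.
extensions = ["xml", "json", "env"]
key_names = ["access_key", "secret_key", "api_key"]

query = " AND ".join([
    or_group(f"path:*.{ext}" for ext in extensions),
    or_group(key_names),
    '("sk-" AND (openai OR gpt))',
])
print(query)
```

Because each group is produced by a join, every term is separated by an OR and the parentheses always balance.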

For more info:- https://gist.github.com/win3zz/0a1c70589fcbea64dba4588b93095855

You can read more about GitHub code search syntax here: https://docs.github.com/en/search-github/github-code-search/understanding-github-code-search-syntax

2. Using regular expressions (Regex)

We can use some regex to craft a pattern that will match our string or keyword by surrounding the regex in slashes.

Here are some examples:-

For Stripe keys
/([srp]k_live_[0-9a-zA-Z]{24})/

/sk_live_[0-9a-zA-Z]{24}/

For Slack
/https:\/\/hooks\.slack\.com\/services\/T[a-zA-Z0-9_]+\/B[a-zA-Z0-9_]+\/[a-zA-Z0-9_]+/

For domain search
/https:\/\/[A-Za-z0-9-_]+\.dell\.com\/+/

For searching password
/password=[A-Za-z0-9-_]+/

/:password=[A-Za-z0-9-_]+/ NOT example NOT guest NOT localhost NOT fake NOT 1234 NOT xxx NOT 127.0.0.1 NOT test

For finding IP ranges
/35\.21\.[0-9]{1,3}\.[0-9]{1,3}/
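
Before pasting a pattern into the search bar, it can help to check it locally against made-up sample text. The following Python sketch does that with the standard re module. It is only an approximation: GitHub's regex engine may not behave exactly like Python's, the function name find_leaks is hypothetical, and the sample values are invented, not real credentials.

```python
import re

# Patterns from the examples above, written without the surrounding slashes.
# The hyphen is placed last in each character class so it is read literally.
PATTERNS = {
    "stripe": r"[srp]k_live_[0-9a-zA-Z]{24}",
    "slack_webhook": r"https://hooks\.slack\.com/services/T[a-zA-Z0-9_]+/B[a-zA-Z0-9_]+/[a-zA-Z0-9_]+",
    "password": r"password=[A-Za-z0-9_-]+",
    "ip_range": r"35\.21\.[0-9]{1,3}\.[0-9]{1,3}",
}


def find_leaks(text):
    """Return (pattern name, matched text) pairs found in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            hits.append((name, match.group(0)))
    return hits


# Made-up sample data, not real credentials.
sample = "\n".join([
    "STRIPE=sk_live_" + "a" * 24,
    "db password=hunter2",
    "host 35.21.10.7",
])
for name, value in find_leaks(sample):
    print(name, value)
```

If a pattern misses the sample or matches too much, it is cheaper to find that out here than by burning through the limited search result pages.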

We can also use regular expressions with many qualifiers or boolean operators to search.

For example:-

/:password=[A-Za-z0-9-_]+/ NOT example NOT guest NOT localhost NOT fake NOT 1234 NOT xxx NOT 127.0.0.1 NOT test target.com

/access_token=[A-Za-z0-9-_]+/ NOT example NOT guest NOT localhost NOT fake NOT 1234
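
Combining a regex with NOT terms and a target keyword follows a fixed recipe: escape forward slashes, wrap the pattern in slashes, then append the filters. A minimal Python sketch of that recipe, assuming the input pattern has no escaped slashes yet; regex_dork is a hypothetical helper name.

```python
def regex_dork(pattern, exclude=(), extra=()):
    """Wrap a regex in slashes and append NOT terms and extra keywords.

    Assumes forward slashes in `pattern` are not escaped yet.
    """
    escaped = pattern.replace("/", r"\/")
    parts = [f"/{escaped}/"]
    parts += [f"NOT {term}" for term in exclude]
    parts += list(extra)
    return " ".join(parts)


print(regex_dork(r"access_token=[A-Za-z0-9_-]+", exclude=["example", "guest"]))
print(regex_dork(r"https://[A-Za-z0-9_-]+\.dell\.com/+"))
print(regex_dork(r":password=[A-Za-z0-9_-]+", exclude=["test"], extra=["target.com"]))
```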

We can craft regex according to our needs. Regular expressions also let us search for exact strings that contain characters we can't type into the search bar.

You can look for useful regex here:- https://github.com/databricks/security-bucket-brigade/blob/3f25fe0908a3969b325542906bae5290beca6d2f/Tools/s3-secrets-scanner/rules.json

I have received this question many times:-

Ques:- How to perform GitHub recon more accurately after the recent update that limits results to 5 pages?

We will often get 100+ or 1k+ code search results, but we only have access to 5 pages in the new GitHub update. On one page we see 20 repos, which means only 100 GitHub repos across 5 pages. To reduce the number of results, I use more operators together with regex to filter for more useful information. We can also filter with language: qualifiers, both to include and to exclude languages. For example, if I don't want Markdown or HTML code in the search results, I use this: NOT language:Markdown NOT language:html
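
The arithmetic and the language filter can be sketched in a few lines of Python. The page and per-page numbers are the ones stated in this post, and the helper names visible_results and exclude_languages are hypothetical.

```python
PAGES = 5       # pages shown by the new code search, per this post
PER_PAGE = 20   # entries per page, per this post


def visible_results(total_hits):
    """Number of hits we can actually browse out of total_hits."""
    return min(total_hits, PAGES * PER_PAGE)


def exclude_languages(query, languages):
    """Append a NOT language: qualifier for each unwanted language."""
    return " ".join([query] + [f"NOT language:{lang}" for lang in languages])


print(visible_results(1000))
print(exclude_languages("/password=[A-Za-z0-9_-]+/ target.com", ["Markdown", "html"]))
```

Since only a fixed number of hits is ever visible, each extra filter is worth adding as long as the total still exceeds that cap.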

I hope that makes sense!

For key exploitation, check out the KeyHacks repo: https://github.com/streaak/keyhacks

Reference:-
https://docs.github.com/en/search-github/github-code-search/understanding-github-code-search-syntax#using-regular-expressions

https://www.sshell.co/github-code-search

https://docs.github.com/en/search-github/github-code-search/understanding-github-code-search-syntax

I hope you like this :) Feel free to DM me your queries!

Thanks to all!

Follow me for more updates:-

https://twitter.com/PawanRa20262178

https://www.linkedin.com/in/pawan-rawat-00111819b

https://www.instagram.com/chaudhary_pawan_rawat/
