The Easiest Way to Find CVEs at the Moment? GitHub Dorks!

Florian Walter
Feb 8, 2024

In this article, I will demonstrate how I used GitHub dorks to find 24 vulnerabilities in popular open-source projects in just a few weeks, working on it only in the evenings and on weekends (see https://github.com/dub-flow/vulnerability-research for information on all my CVEs).

Before starting this journey, I had already found one CVE: a stored XSS vulnerability in Apache Spark. Around July 2023, I decided to start hunting for more CVEs.

This blog post expands on some aspects of my sponsored talk at Black Hat Europe 2023 in London, titled “CVE Discovery at Scale: Simplifying Complex Vulnerabilities through Source Code Analysis”, focusing on my methodology for finding CVEs and on how I came up with that methodology in the first place.

But Where to Start and How?

I decided to start with PHP apps (and later Node.js and .NET) and picked the finance sector more or less at random. Based on this, I searched for somewhat popular and relevant open-source projects in that area. For me, this meant, ideally, at least a few hundred stars on GitHub, 10k+ downloads (e.g., from NPM or GitHub), and a recent commit (I didn’t want to report a CVE for a project that hasn’t been maintained since 2014 or so). Since I knew I could not spend too much time on this, I started by cloning a variety of interesting projects and running some basic string searches against them to decide whether they deserved a closer look.

This meant searches like "SELECT , "INSERT , exec( and echo $ (i.e., hints of SQL Injection, OS Command Injection and XSS), and skimming through every line that had a hit. This approach let me check for low-hanging fruit in a short amount of time. My idea was to perform a proper white-box penetration test on a project once I had identified a potential vulnerability in its source code. And, to my surprise, this approach worked quite well!
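The first-pass string search described above can be approximated with a short script. This is a minimal sketch, assuming the projects have already been cloned into a local directory; scan_repo and NEEDLES are hypothetical names, and the needles are just the example strings from the paragraph above.

```python
from pathlib import Path

# Hypothetical needle list mirroring the example searches: quoted SQL keywords,
# a command-execution call and echoing a variable.
NEEDLES = ['"SELECT ', '"INSERT ', "exec(", "echo $"]


def scan_repo(root, extensions=(".php",)):
    """Yield (path, line_number, stripped_line) for every line containing a needle."""
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the wanted extensions.
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(needle in line for needle in NEEDLES):
                yield path, lineno, line.strip()
```

Calling scan_repo on a cloned project directory yields every candidate line for manual review. A plain substring match is deliberately crude: it is fast, but all of the triage is left to the human reading the hits.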

First Hit: Economizzer

Economizzer (https://github.com/gugoan/economizzer) is a personal finance manager. By searching "SELECT in the Economizzer project, I got an interesting-looking hit:

It looks like $category_id comes directly from the URL and is concatenated into a SQL query without any validation or prepared statements! At this point, I knew I was going to perform a proper pentest on Economizzer. I was able to exploit the SQL Injection and found 7 CVEs in total.
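To illustrate why concatenating $category_id into the query is dangerous, here is a minimal Python/SQLite analogue. It is not Economizzer’s code (Economizzer is a PHP app); the transactions table and all function names are hypothetical. In the vulnerable version, the input 1 OR 1=1 rewrites the WHERE clause, while the parameterized version treats the same input as a plain value.

```python
import sqlite3


def make_db():
    # Hypothetical schema standing in for a finance app's transactions table.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transactions (id INTEGER, category_id INTEGER, amount REAL)")
    db.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, 1, 9.99), (2, 2, 120.0), (3, 3, 42.5)],
    )
    return db


def by_category_vulnerable(db, category_id):
    # Untrusted input is concatenated straight into the SQL string.
    query = "SELECT id FROM transactions WHERE category_id = " + category_id
    return sorted(row[0] for row in db.execute(query))


def by_category_safe(db, category_id):
    # The value is bound as a parameter, so it cannot change the query's structure.
    query = "SELECT id FROM transactions WHERE category_id = ?"
    return sorted(row[0] for row in db.execute(query, (category_id,)))
```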

This got me thinking: What if I could optimize my searches and perform them on a massive amount of projects?

(At this point, I hadn’t yet figured out how to scale this up and was simply cloning lots of projects and searching them in VSCode.)

Coming up with Proper GitHub Dorks

At this point, I started thinking about good search terms and used ChatGPT to help me formulate regular expressions for GitHub dorks. Some examples are:

  • PHP XSS: /\becho\b.*\$_GET\b/ or /echo\s+\$_REQUEST/
  • PHP XSS: /^.*\becho\s+\$_GET\b.*$/
  • PHP XSS (most FP-prone): /^.*\becho\s+\$\b.*$/
  • PHP SQL Injection: /(SELECT|INSERT|UPDATE|DELETE)\s(.*\$_POST|.*\$_GET|.*\$_REQUEST)/
  • PHP OS Command Injection: /(exec\(|system\(|shell_exec\(|passthru\()(.*\$_POST|.*\$_GET|.*\$_REQUEST)/
  • And, my favorite, Host Header Injection (Node.js & PHP): req.headers.host path:*pass* and /\$_SERVER\['host'\]|gethostname\(\).*(reset|forgot)/
  • .NET Host Header Injection: /(Request\.Headers\["Host"\]|Request\.Host\.Value|HttpContext\.Current\.Request\.Headers\["Host"\]|HttpContext\.Request\.Host\.Value)/ forgot
  • Host Header Injection generic: host path:**/*forgot*/**
  • Insecure Deserialization in PHP: /(unserialize\()(.*\$_POST|.*\$_GET|.*\$_REQUEST)/

So what’s this all about?

The first one searches for echo followed by $_GET on the same line or, alternatively, echo directly followed by $_REQUEST (both prime candidates for XSS). The second one is more explicit and searches for echo $_GET. The third one searches for echo $, which is of course very FP-prone (i.e., it produces lots of false positives) but may allow you to find more complex XSS. The next one searches for the strings SELECT, INSERT, UPDATE or DELETE followed by $_GET, $_POST or $_REQUEST on the same line (a strong indication of SQL Injection). The next one looks for exec(, system(, shell_exec( or passthru( combined with $_GET, $_POST or $_REQUEST (i.e., candidates for OS Command Injection), although I didn’t have any luck with this one. The Host Header Injection dorks, which were my favorites, look for indications of Host Header Injection in the forgot-password functionality (e.g., req.headers.host path:*pass* looks for req.headers.host in files whose path contains pass). The last one looks for insecure deserialization in PHP, i.e., unserialize( combined with $_GET, $_POST or $_REQUEST.
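To check what each regex dork does and does not match before running it on GitHub, the patterns can be tried locally against sample lines. This is a sketch using Python’s re module, which only approximates GitHub’s regex engine; the dork names in DORKS and the matching_dorks helper are hypothetical.

```python
import re

# The regex dorks from the list above, translated to Python's re syntax.
# GitHub's regex engine is not Python's, so treat this as an approximation.
DORKS = {
    "xss": re.compile(r"\becho\b.*\$_GET\b|echo\s+\$_REQUEST"),
    "xss_explicit": re.compile(r"^.*\becho\s+\$_GET\b.*$"),
    "xss_fp_prone": re.compile(r"^.*\becho\s+\$\b.*$"),
    "sqli": re.compile(r"(SELECT|INSERT|UPDATE|DELETE)\s(.*\$_POST|.*\$_GET|.*\$_REQUEST)"),
    "cmdi": re.compile(r"(exec\(|system\(|shell_exec\(|passthru\()(.*\$_POST|.*\$_GET|.*\$_REQUEST)"),
    "deserialization": re.compile(r"(unserialize\()(.*\$_POST|.*\$_GET|.*\$_REQUEST)"),
}


def matching_dorks(line):
    """Return the names of all dorks that match a single line of source code."""
    return [name for name, pattern in DORKS.items() if pattern.search(line)]
```

For instance, matching_dorks('echo "Hello " . $_GET["name"];') returns only ["xss"]: the more explicit second pattern misses output that is concatenated, while the first one still catches it.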

I used these regexes in GitHub’s code search, and boy, did it work well.

You can use any of these dorks and just put them into the GitHub search:

You can find a list of all my GitHub dorks here: https://github.com/dub-flow/github-dorks (I may update this repo over time).
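If you would rather open a dork straight from a script or a notes file, a search URL can be built from it. This sketch assumes GitHub’s web code search accepts URLs of the form https://github.com/search?q=...&type=code; github_code_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def github_code_search_url(dork):
    """Build a GitHub code search URL for a dork (assumes the /search?q=...&type=code format)."""
    # urlencode percent-encodes characters such as ':', '*', '\' and '$' safely.
    return "https://github.com/search?" + urlencode({"q": dork, "type": "code"})
```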

Proof of Concept — 1: couch-auth

couch-auth (https://www.npmjs.com/package/@perfood/couch-auth) is an NPM library that handles user authentication for Node.js/Express using CouchDB or Cloudant, with around 20k NPM downloads (at the time of writing).

It had a Host Header Injection vulnerability in the forgot-password functionality that allowed an attacker to reset any user’s password. To exploit it, you would send a request like:

POST /auth/forgot-password HTTP/1.1
Host: attacker-controlled.com
[redacted for brevity]

{ "email": "joesmith@example.com" }

This would send out a password reset email with a URL like https://attacker-controlled.com/password-reset/<token>. If the victim clicks the link, the password reset token is leaked to the attacker, who can then reset the victim’s password. Note that some email clients, such as Outlook, may automatically inspect all links in emails for malicious content. This means user interaction isn’t necessarily required to leak the password reset token, because the link may be visited automatically.
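The underlying bug class can be sketched in a few lines. The helpers below are hypothetical and written in Python rather than couch-auth’s JavaScript, and APP_BASE_URL stands in for an assumed, server-side configured base URL. The vulnerable variant builds the reset link from the Host header, so the request above yields a link pointing at attacker-controlled.com, whereas the fixed variant never consults the header.

```python
# Assumed trusted, server-side configuration value (hypothetical).
APP_BASE_URL = "https://app.example.com"


def reset_link_vulnerable(headers, token):
    # The link's origin is taken from the attacker-controllable Host header.
    return f"https://{headers['Host']}/password-reset/{token}"


def reset_link_fixed(headers, token):
    # The origin comes from configuration; the Host header is deliberately ignored.
    return f"{APP_BASE_URL}/password-reset/{token}"
```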

The root cause is the following Nunjucks template, which renders the password reset email:

I found this vulnerability through several of my GitHub dorks:

  • First, via req.headers.host path:*pass*
  • And req.headers.host password-reset also reveals it

I got another CVE for this vulnerability.

Now I know what you might be thinking: wasn’t the Nunjucks template also vulnerable to SSTI? Well, that’s what I thought. But it turns out that Nunjucks renders a value like {{ username }} as data rather than evaluating it as template code, and with autoescaping (on by default) it also HTML-escapes the value as long as the safe filter isn’t used (like {{ username | safe }}).

Proof of Concept — 2: phpPgAdmin

Another interesting hit was phpPgAdmin (https://github.com/phppgadmin/phppgadmin). For this one, I used the PHP insecure deserialization dork and got:

This was just one of several deserialization vulnerabilities in the project, and it got me another CVE.
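To illustrate why passing request data to unserialize() is dangerous: in PHP, unserialize() can instantiate arbitrary classes, which can lead to magic methods such as __wakeup() and __destruct() being invoked (PHP object injection). Python’s pickle module has the analogous problem and makes for a compact, runnable demonstration. This is an analogue, not phpPgAdmin’s code; Payload, load_untrusted_pickle and load_untrusted_json are hypothetical names, and eval("6*7") is a harmless stand-in for a real gadget.

```python
import json
import pickle


class Payload:
    """Attacker-side object: __reduce__ tells pickle which callable to invoke on load."""

    def __reduce__(self):
        # Harmless stand-in for a real gadget: make the victim evaluate "6*7".
        return (eval, ("6*7",))


# The attacker serializes the payload and sends it, e.g. in a cookie or query parameter.
attacker_bytes = pickle.dumps(Payload())


def load_untrusted_pickle(data):
    # Dangerous: pickle.loads calls whatever callable the data specifies.
    return pickle.loads(data)


def load_untrusted_json(data):
    # Safer: JSON only produces plain data (dict, list, str, numbers, bool, None).
    return json.loads(data)
```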

Proof of Concept — 3: PHP-Login-System

One of my favorite types of projects to look at is user management libraries. Because of that, I checked out PHP-Login-System (https://github.com/msaad1999/PHP-Login-System) and used the more FP-prone XSS regex from the list above: /^.*\becho\s+\$\b.*$/.

This gave me this hit (among many others of course):

I happily admit that this was a lucky one: the preview GitHub shows for this search result contains the whole data flow of the XSS vulnerability, from source to sink.

Both XSS vulnerabilities can be trivially exploited.
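A minimal illustration of the source-to-sink pattern and its fix, written as a Python analogue: greet_vulnerable mirrors echoing request data straight into HTML, and greet_escaped applies html.escape, which plays the role PHP’s htmlspecialchars() would. Both function names are hypothetical, and this is not PHP-Login-System’s code.

```python
import html


def greet_vulnerable(name):
    # Like echo $_GET['name']: untrusted input is written into HTML as-is.
    return "<p>Hello " + name + "</p>"


def greet_escaped(name):
    # html.escape turns <, >, & and quotes into HTML entities before output.
    return "<p>Hello " + html.escape(name) + "</p>"
```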

Proof of Concept — 4: openSIS-Classic

openSIS-Classic popped up when I was using the SQL Injection regex: /(SELECT|INSERT|UPDATE|DELETE)\s(.*\$_POST|.*\$_GET|.*\$_REQUEST)/

This hit alone was enough for me to decide to perform a proper assessment on openSIS-Classic. Ultimately, I wasn’t able to exploit the SQL Injection because of some character block lists that I was unable to bypass. However, I found 7 CVEs in other places.

That brings home an important observation from my research: web apps with code security smells (like SQL queries that rely on character block lists instead of prepared statements) are likely to have vulnerabilities in other places as well.
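As a sketch of why character block lists are a weak substitute for prepared statements, consider a hypothetical filter that blocks quotes, semicolons and comment markers. It does not reproduce openSIS-Classic’s actual filter, and the students table is made up; the point is that when the value lands in a numeric context, an injection such as 0 OR 1=1 needs none of the blocked characters.

```python
import sqlite3

# Hypothetical block list of characters commonly used in SQL Injection payloads.
BLOCKLIST = ["'", '"', ";", "--"]


def passes_blocklist(value):
    return not any(bad in value for bad in BLOCKLIST)


def make_students_db():
    # Made-up table for the demonstration.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO students VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return db


def student_names(db, student_id):
    if not passes_blocklist(student_id):
        raise ValueError("blocked input")
    # Still concatenated: in a numeric context, no quotes are needed to inject.
    query = "SELECT name FROM students WHERE id = " + student_id
    return sorted(row[0] for row in db.execute(query))
```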

Proof of Concept — 5: Head Start

The next project we look at is Head Start (https://github.com/OpenKnowledgeMaps/Headstart). Using /\becho\b.*\$_GET\b/ got me two interesting hits:

Both ended up being trivially exploitable XSS vulnerabilities.

Proof of Concept — 6: OpenSTAManager

Now let’s look at a simple XSS: OpenSTAManager (https://github.com/devcode-it/openstamanager). Using one of the XSS regexes, /^.*\becho\s+\$_GET\b.*$/, I found yet another XSS vulnerability:

Final Words

The reason I wrote this post was to demonstrate that with a bit of creativity, you can find a ton of vulnerabilities out there — even in popular open-source projects.

Of course, you could (and should!) come up with more GitHub Dorks for different vulnerabilities, and adjust them for different programming languages. Also, it’s important to always run the app and confirm exploitability. You cannot just fire up a GitHub search, find some potentially vulnerable source code, and report it as a vulnerability — you have to prove exploitability!

If you made it this far, I would like to ask you for a favor. If any of these strategies worked for you, I would love to hear about it on LinkedIn. You can find me at: https://www.linkedin.com/in/florian-ethical-hacker/.
