Hunting for XSS with CodeQL
What is CodeQL
Some months ago I was introduced to CodeQL while scrolling through my Twitter feed, and I have been in love with it ever since. As the name suggests, CodeQL is a query language. However, instead of querying entity records in a database, you query a code repository for interesting patterns. For example, imagine you have the following sample Node.js application.
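A minimal sketch of what such an application might look like. Every name in it (config, readFoo, writeFoo) is hypothetical; the only detail taken from the text is a property called foo that is both read and written.

```javascript
// Hypothetical stand-in for the sample application; all names are illustrative.
const config = { foo: 1, bar: 2 };

// A read of the property `foo`.
function readFoo(obj) {
  return obj.foo;
}

// A write to the property `foo`.
function writeFoo(obj, value) {
  obj.foo = value;
  return obj;
}

// One more read (inside readFoo) followed by one write (inside writeFoo).
writeFoo(config, readFoo(config) + 41);

// A read of `foo` at the top level of the module.
console.log(config.foo);
```

A query for reads and writes of foo would be expected to flag the accesses inside readFoo and writeFoo as well as the one in the console.log call, but not the accesses to bar.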
Now, let’s say you want to know where in your project a property named foo is read from or written to. The following CodeQL query will give you the results you are looking for.
Once you have covered the basics, you will learn that CodeQL already ships with a lot of built-in queries that can be used to hunt for the most common types of coding bugs, XSS included. With that said, one way to start hunting for bugs with CodeQL is to simply run the existing queries against an open-source code repository you have cloned.
As an example, you can take the following steps to find the DOM-based XSS vulnerability I discovered in the Discourse project using CodeQL. To replicate the finding, clone the Discourse repository, check out commit bb2c48b0657f6182b852ab76fc190825df5d2b7f, and create a CodeQL database from it.
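The three steps above can be sketched as a small Node.js script that only builds and prints the commands, so you can review them before running anything. This assumes git and the CodeQL CLI are on your PATH; the repository URL, the checkout directory and the database name are my assumptions, not values from the original write-up.

```javascript
// Commit hash taken from the text above.
const COMMIT = 'bb2c48b0657f6182b852ab76fc190825df5d2b7f';
// Assumption: the public Discourse repository on GitHub.
const REPO_URL = 'https://github.com/discourse/discourse.git';

// Build each step as an argv array. `workDir` and `dbName` are
// hypothetical names chosen for this sketch.
function buildSteps(workDir, dbName) {
  return [
    ['git', 'clone', REPO_URL, workDir],
    ['git', '-C', workDir, 'checkout', COMMIT],
    [
      'codeql', 'database', 'create', dbName,
      '--language=javascript',
      `--source-root=${workDir}`,
    ],
  ];
}

// Print the commands instead of executing them.
for (const step of buildSteps('discourse', 'discourse-db')) {
  console.log(step.join(' '));
}
```

If you prefer to run the steps from the script, each array can be handed to child_process.execFileSync(step[0], step.slice(1), { stdio: 'inherit' }).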
After the database is created, you can use the XssThroughDom.ql query to look for potential DOM-based XSS vulnerabilities. If you have set up the CodeQL integration for Visual Studio Code, you should get results that look like the following.
The results consist of paths. Each path connects a source to a sink. In other words, it traces some input to a potentially sensitive function parameter or property assignment. I won’t get into the details of the vulnerability itself, as that is out of the scope of this article. I just used it as an example to show that one can find real-world vulnerabilities in a widely used open-source project simply by using CodeQL’s built-in queries.
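To make the source and sink idea concrete, here is a minimal sketch of the general pattern: text taken from the DOM ends up in a property that is interpreted as HTML. This is not the Discourse vulnerability. The objects below are plain stand-ins for DOM elements so the snippet runs under Node.js, and all names are hypothetical.

```javascript
// Plain-object stand-ins for DOM elements.
const input = { value: '<img src=x onerror=alert(1)>' }; // source: user-controlled text
const output = { innerHTML: '' };

// Vulnerable pattern: the text flows unchanged into an HTML sink.
function renderUnsafe(src, dst) {
  dst.innerHTML = 'Hello ' + src.value; // sink: property assignment
}

// Escape HTML metacharacters so the text cannot become markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Safer pattern: the value is escaped before it reaches the sink.
function renderSafe(src, dst) {
  dst.innerHTML = 'Hello ' + escapeHtml(src.value);
}

renderUnsafe(input, output);
console.log(output.innerHTML); // still contains the raw <img> tag

renderSafe(input, output);
console.log(output.innerHTML); // angle brackets are now entities
```

In a reported path, input.value would be the source and the innerHTML assignment the sink, with the string concatenation as an intermediate step.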
According to GitHub, LGTM is:
A code analysis platform for finding zero-days and preventing critical vulnerabilities
LGTM not only lets you query many of the projects hosted on GitHub, it also continuously analyzes them with CodeQL’s official queries. The results are available to whoever wants to see them. In fact, the vulnerability I found in the Discourse project was visible in LGTM as well, to anyone interested in analyzing the alerts.
A good starting point for those interested in the platform and what it has to offer is analyzing the results for the intentionally vulnerable web application Juice-Shop.
Once my query was done, I started running it against all of my favorite open-source projects that were available in LGTM. I was able to find bugs in GitLab and in one of GitHub’s dependencies. Both have patches available at this point.
Once I got real results from my query, I decided to take a shot at contributing to the CodeQL project itself. At the time of this writing, my query is still under review, but I am confident it will be accepted and that the clipboard API source will eventually be merged into the standard XSS queries.
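For readers wondering why the clipboard matters as a source, here is a minimal sketch of the kind of flow I have in mind. It is neither the query nor any of the bugs mentioned above, just an illustration. The event and element objects are plain stand-ins so the snippet runs under Node.js; in a browser, event.clipboardData is a DataTransfer object and getData is its real method. All other names are hypothetical.

```javascript
// Stand-in for a paste event carrying attacker-influenced HTML.
const fakePasteEvent = {
  clipboardData: {
    getData(type) {
      return type === 'text/html'
        ? '<img src=x onerror=alert(1)>'
        : 'plain text';
    },
  },
};

// Stand-in for an editable element.
const editor = { innerHTML: '', textContent: '' };

// Vulnerable pattern: clipboard HTML (source) is written to innerHTML (sink).
function onPasteUnsafe(event, target) {
  target.innerHTML = event.clipboardData.getData('text/html');
}

// Safer pattern: take the plain-text flavor and assign it as text.
function onPasteSafe(event, target) {
  target.textContent = event.clipboardData.getData('text/plain');
}

onPasteUnsafe(fakePasteEvent, editor);
console.log(editor.innerHTML);

onPasteSafe(fakePasteEvent, editor);
console.log(editor.textContent);
```

The point is that content copied from a malicious page can carry markup, so a handler that pastes the text/html flavor straight into the page behaves much like one that trusts any other user input.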
Tackling closed-source web applications
CodeQL works great for open-source projects, especially the ones already on GitHub. But what about using it to assess closed-source web applications?
I developed a manual approach to it that consists of the following steps:
- Install the Save All Resources Chrome extension (no royalties involved)
- With the extension enabled, use Chrome to navigate to the web application you want to assess
- Use the extension to download the page’s resources. Make sure to select the options highlighted below
- Unpack the zip file to a directory of your choosing and create a CodeQL database from it
- Have fun querying the project
I would love to automate my process to assess closed-source web apps with CodeQL and make it scalable but I simply don’t have the time right now. I hope this article will incentivize the automation monks (e.g. TomNomNom, Jason Haddix) out there to make it happen and that they remember to tell me about it when they do ;)