AI vs. Human: Who is the better Vulnerability Researcher? 🧠🤖

Florian Walter
3 min read · Nov 7, 2023


In the evolving landscape of cybersecurity, the rise of AI tools like ChatGPT has opened new horizons in code analysis and vulnerability research. Like many others, I have been playing around with ChatGPT to assess how well it understands and writes code, and how well it identifies application security flaws. And like many others, I have been thoroughly impressed.

So I asked myself: is it time to hang up my vulnerability researcher boots and just let AI do its thing?

Well, maybe not quite yet…

🕵️ Human Expertise: The Manual Discovery of CVE-2022-31777

Last year, I found CVE-2022-31777, a set of stored cross-site scripting (XSS) vulnerabilities in Apache Spark, purely through source code analysis. This led me to ask: could ChatGPT have uncovered the same vulnerability from the source code alone?

This is one of the XSS vulnerabilities:

cleanData is added to the DOM using the risky .append() function. It is derived from data, the HTTP response body of an AJAX request to the URL returned by getRESTEndPoint(), which is /log. The /log endpoint is defined in WorkerWebUI.scala and returns the result of logPage.renderLog().

renderLog() is defined in LogPage.scala and sends back data from the log files without HTML encoding. Log file data is usually partially derived from user input, i.e. it is attacker-controlled.
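
To make the data flow above more concrete, here is a minimal TypeScript sketch of the pattern. It is not the Spark code: escapeHtml, renderLogUnsafe, renderLogSafe and the sample log lines are hypothetical names and data that I made up for illustration. The sketch only shows what happens when log text reaches an HTML sink with and without encoding, and HTML encoding is shown as one common mitigation, not necessarily the fix that Spark shipped.

```typescript
// Hypothetical helper (not part of Spark): encode the characters that allow
// plain text to be interpreted as markup. "&" must be replaced first.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Stand-in for a backend that returns log content verbatim,
// i.e. without HTML encoding.
function renderLogUnsafe(logLines: string[]): string {
  return logLines.join("\n");
}

// Same stand-in, but every line is HTML-encoded before it is returned.
function renderLogSafe(logLines: string[]): string {
  return logLines.map(escapeHtml).join("\n");
}

// Made-up log content: the second line contains attacker-influenced input.
const logLines: string[] = [
  "INFO Starting job",
  'WARN Unknown app name: <img src=x onerror="alert(1)">',
];

// These strings represent what the frontend would pass to .append().
const unsafeBody: string = renderLogUnsafe(logLines);
const safeBody: string = renderLogSafe(logLines);

console.log(unsafeBody); // still contains a live <img> tag
console.log(safeBody);   // the tag is now inert text
```

If unsafeBody were handed to a DOM sink such as .append(), the browser would parse the img tag and run its onerror handler. With safeBody, the same characters would be displayed as text.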

If you want more information on CVE-2022-31777, I recommend checking out my write-up.

🤖 AI Expertise: Time for ChatGPT to Try

Curious about the capabilities of AI in vulnerability detection, I turned to ChatGPT and uploaded the 3 files involved in the vulnerability’s data-flow (from https://github.com/apache/spark/releases/tag/v3.3.0, which is one of the vulnerable versions).

You can see my conversation with ChatGPT here: https://chat.openai.com/share/2db62d5b-d526-4c37-aeb6-2652c219a6c4.

The AI demonstrated an impressive grasp of potential XSS patterns, flagging concerns with .html() and .prepend() functions in log-view.js and recognizing the implications of unescaped user input in the Scala backend files. It also correctly linked the frontend behavior with backend data handling.

However, this is where the limitations became apparent. While ChatGPT provided valuable insights, it flagged the risky .prepend() function but missed the equally risky .append() function, and it needed precise prompts to navigate the analysis. The AI could not independently grasp the full context and the complexity of the data flow in the way a seasoned human researcher can with a holistic view of the codebase.
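
As a point of comparison, even a naive text search treats .html(), .prepend() and .append() alike. The following TypeScript sketch is my own illustration, not a tool used in the experiment: findRiskySinks and the sample snippet are hypothetical, and the sample is not taken from log-view.js. It is a plain pattern match that lists candidate sinks. It cannot tell whether the data reaching a sink is attacker-controlled, which is exactly the data-flow reasoning discussed above.

```typescript
// One finding of the naive scan: where a candidate DOM sink was seen.
interface SinkHit {
  line: number; // 1-based line number within the scanned source
  sink: string; // name of the jQuery method that matched
  text: string; // the trimmed source line, for context
}

// Hypothetical helper: list calls to jQuery methods that insert HTML.
// This is a text match only, not a data-flow analysis.
function findRiskySinks(source: string): SinkHit[] {
  const hits: SinkHit[] = [];
  const lines: string[] = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    // A fresh regex per line avoids carrying lastIndex state between lines.
    const pattern = /\.(html|append|prepend)\s*\(/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(lines[i])) !== null) {
      hits.push({ line: i + 1, sink: match[1], text: lines[i].trim() });
    }
  }
  return hits;
}

// Made-up frontend snippet, for illustration only.
const sample: string = [
  '$("#log-data").html("");',
  '$("#log-data pre").prepend(olderChunk);',
  '$("#log-data pre").append(newerChunk);',
  'console.log("done");',
].join("\n");

const hits: SinkHit[] = findRiskySinks(sample);
for (const hit of hits) {
  console.log(hit.line + ": ." + hit.sink + "() in " + hit.text);
}
```

A list like this is only a starting point: each hit still has to be traced back to its data source by hand, or with help, to decide whether it is exploitable.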

Also, keep in mind that I only uploaded the 3 relevant files to ChatGPT, which limited the scope dramatically and made finding the vulnerabilities much easier.

I repeated the same experiment with the whole codebase of Apache Spark 3.3.0, and ChatGPT was thoroughly overwhelmed and required lots of help to provide meaningful results. It also wasn’t able to find the connection between log-view.js and WorkerWebUI.scala anymore. You can find this conversation here: https://chat.openai.com/share/94cdef5b-b7c2-4917-a9c0-a6be08a340de.

💡 Conclusion: Advanced Yet Incomplete

While AI has advanced significantly, it still falls short of the seasoned intuition and complex reasoning that human experts bring to the table. AI can scan, suggest, and support, but it cannot yet replace the nuanced investigation that human eyes conduct.

Happy to hear your thoughts on the subject! You can reach me via https://www.linkedin.com/in/florian-ethical-hacker/.
