LLM Agents can Autonomously Hack Websites

Daniel Kang
4 min readFeb 13, 2024

--

LLMs have dramatically increased in their capabilities over the past few years and can now aid in legal planning, solve Olympiad-level geometry problems, and even support scientific research. As their capabilities have increased, so has their potential for dual-use, or harmful uses. Despite this, all of the known harmful capabilities of LLMs are easily searchable:

However, LLMs have recently been embodied as agents capable of taking actions, which has raised questions about the potential for dual use.

In our recent work, we show that LLM agents can autonomously hack websites, answering the question of whether or not LLM agents are capable of concrete harm. These LLM agents can perform complex hacks, such as blind SQL union attacks. Performing such attacks requires the agents to navigate websites and take up to 45+ actions to perform the hacks. Importantly, we show that only GPT-4 and GPT-3.5 are capable of these hacks, with no open-source LLM able to hack websites. Our findings raise questions about the broad deployment of frontier models.

System diagram for enabling LLM agents to hack websites

In the remainder of this blog post, we provide an overview of LLM agents, describe our experimental findings in-depth, and conclude with thoughts on the deployment of LLMs. See our full paper for more details!

LLM Agents

In the past year, researchers have developed methods to give LLMs the ability to take actions. When LLMs can take actions, they are commonly called LLM agents. The most common way for LLMs to take actions is to use APIs via function calls. To do so, the LLM must produce text conforming to the function call API and parse the results.

Beyond calling APIs, LLM agents can also be enhanced with extended functionality. In our work, we focus on the ability to read documents and use extended context. To read documents, we can produce embeddings for the documents as a form of retrieval-augmented generation (RAG). Documents can encourage the LLM to focus on specific topics. To use extended context, we simply add the result of the previous function call to the running context. We implemented this functionality via the OpenAI assistants API and LangChain.

Enabling LLM Agents to Autonmously Hack Websites

Agent Setup

In order to hack websites, we used a variety of LLMs with the ReAct framework. To implement these agents, we used LangChain. For OpenAI’s models, we used the Assistants API. We additionally gave the agents 1) the ability to interact with websites via Playwright, 2) six documents to read from (covering web hacking), and 3) detailed system instructions.

Hacking Websites

In order to ensure that our hacking attempts did not harm any real websites or people, we started by creating 15 sandboxed websites with 15 different vulnerabilities. These vulnerabilities ranged from simple SQL injections to complex, multi-step vulnerabilities requiring taking several actions to exploit. See our full paper for a complete list of all of these vulnerabilities.

We used a total of 10 LLMs, including GPT-4, GPT-3.5, and 8 open-source models rated highly on Chatbot Arena. We attempted to hack each website 5 times and recorded a success if any attempt succeeded:

Success rate for LLM agents hacking websites.

As we can see, GPT-4 can hack 11 out of the 15 websites. However, every open-source LLM fails on every vulnerability. Our results show a strong “scaling law” with the capability of LLMs.

Understanding LLM Agent Capabilities

GPT-4 is capable of performing complex hacks, such as a difficult SQL union attack. Performing such an attack requires taking many steps, including:

  1. Navigating between pages to determine which page to attack.
  2. Attempting a default username and password.
  3. Using the resulting information to attempt a SQL injection.
  4. Reading the source code to determine that the SQL query contains a _GET parameter.
  5. Determining that the query exposes a SQL union attack.
  6. Performing the SQL union attack itself.

These attacks can take as many as 48 steps to complete, showing the capability of GPT-4.

Hacking Real Websites

Finally, we turned to hacking real websites. We carefully ensured that GPT-4 did not compromise any real websites or personal data by only detecting the vulnerability and not exploiting it. We curated approximately 50 real websites to test and applied our agent to these websites.

GPT-4 was able to find a vulnerability in one of the websites, showing that GPT-4 can hack real websites.

Conclusions

Researchers have speculated about the effects of advanced LLMs on a range of domains, including cybersecurity. In our work, we show that LLM agents can autonomously hack websites, showing the potential for these agents in cybersecurity offense. We further show a strong scaling law for the ability of LLMs to hack websites, with every open-source model failing but GPT-4 achieving a 73% success rate.

As LLMs become more capable, cheaper, and easier to deploy, the barrier for malicious hackers to use these LLMs will drop. Although the actual deployment of LLMs for these circumstances has yet to be seen, other offensive technologies have been widely deployed.

Similar to other dual-use technologies, we believe it will become increasingly important for deployers of LLMs to carefully consider their use. In particular, the unfettered deployment of highly capable open-source LLMs may further exacerbate the potential uses of LLMs for hacking. Although we do not have answers to these considerations, we hope our work spurs discussion in this direction.

Written by Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang

--

--