Introducing Hakrawler: A Fast Web Crawler for Hackers

Luke Stephens (@hakluke)
Jan 3
Hakrawler Output Example

Hakrawler?

For a long time, I’ve wanted a tool that can extract all URL endpoints from an application and simply dump them to the command line. So I created one!

Here’s the tool: https://github.com/hakluke/hakrawler

The URLs are extracted by spidering the application, querying the Wayback Machine, and parsing robots.txt and sitemap.xml files.
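
To make the robots.txt and sitemap.xml sources concrete, here is a minimal sketch in Go of how paths and URLs could be pulled out of those two files. It is not taken from Hakrawler's source; the function names robotsPaths and sitemapURLs and the sample inputs are assumptions made purely for illustration, and fetching the files over HTTP is left out.

```go
package main

import (
	"bufio"
	"encoding/xml"
	"fmt"
	"strings"
)

// robotsPaths (hypothetical helper) returns the paths named in the
// Allow and Disallow rules of a robots.txt body.
func robotsPaths(body string) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments such as "# private".
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		field := strings.ToLower(strings.TrimSpace(parts[0]))
		value := strings.TrimSpace(parts[1])
		if (field == "allow" || field == "disallow") && value != "" {
			paths = append(paths, value)
		}
	}
	return paths
}

// urlSet mirrors the <urlset><url><loc> layout of a sitemap.xml file.
type urlSet struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// sitemapURLs (hypothetical helper) returns every <loc> value in a sitemap body.
func sitemapURLs(body string) ([]string, error) {
	var set urlSet
	if err := xml.Unmarshal([]byte(body), &set); err != nil {
		return nil, err
	}
	var urls []string
	for _, u := range set.URLs {
		urls = append(urls, strings.TrimSpace(u.Loc))
	}
	return urls, nil
}

func main() {
	robots := "User-agent: *\nDisallow: /admin/ # private\nAllow: /public\n"
	fmt.Println(robotsPaths(robots))

	sitemap := `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`
	urls, err := sitemapURLs(sitemap)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(urls)
}
```

A real crawler would join each robots.txt path with the target's base URL before reporting it, and would also need to handle sitemap index files, which this sketch ignores.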

The tool also collects any subdomains it finds along the way. As far as I know, this subdomain enumeration method is not currently used by any other popular subdomain enumeration tools, so it may help to uncover some additional targets.
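
As a rough illustration of the idea, the sketch below filters a list of discovered URLs down to the unique hostnames that belong to a target domain. It is an assumption about how such collection could work, not Hakrawler's actual code; subdomainsOf and the sample URLs are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// subdomainsOf (hypothetical helper) returns the unique hostnames found in
// urls that are either root itself or end in "."+root, sorted alphabetically.
func subdomainsOf(root string, urls []string) []string {
	root = strings.ToLower(root)
	seen := map[string]bool{}
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil {
			continue // skip anything that is not a parseable URL
		}
		// Hostname strips any port; relative URLs yield an empty host.
		host := strings.ToLower(u.Hostname())
		if host == root || strings.HasSuffix(host, "."+root) {
			seen[host] = true
		}
	}
	hosts := make([]string, 0, len(seen))
	for h := range seen {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	return hosts
}

func main() {
	found := []string{
		"https://api.example.com/v1/users",
		"http://example.com/",
		"https://cdn.example.com:8443/app.js",
		"https://notexample.com/",
		"/relative/path",
	}
	for _, h := range subdomainsOf("example.com", found) {
		fmt.Println(h)
	}
}
```

The suffix check includes the leading dot so that a lookalike such as notexample.com is not mistaken for a subdomain of example.com.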

For installation and usage details, see the repository’s readme.

Features

  • Easily chainable with other tools (accepts hostnames from stdin, dumps plain URLs to stdout using the -plain flag)
  • Collects URLs by crawling each page of the application, following links
  • Collects URLs from the Wayback Machine
  • Collects URLs from robots.txt
  • Collects URLs from sitemap.xml
  • Discovers new domains and subdomains belonging to the target as it finds them during the crawling process
  • Written in Golang
  • Variable scope can be set to narrow down or expand results
  • Can export results into files containing raw HTTP requests, which may be parsed by other tools such as SQLMap

Alternatives

There are other tools that provide similar outcomes to Hakrawler, but they weren’t quite what I needed. The most obvious choice is the Burp Suite “spider” option, which is my first choice for an exhaustive crawl of a single application. However, it is tightly coupled with the rest of Burp Suite, resource intensive, and not designed for crawling large lists of domains, a capability that is extremely useful for bug bounties or wide-scope pentests.

The closest alternative is Photon by s0md3v. Much of the functionality in Hakrawler was inspired by Photon. It’s close to what I was looking for and it has a lot of great features, but it isn’t ideal for crawling large domain sets or tool chaining. Photon is also written in Python. While Python is my native language, I really wanted this tool to be in Golang for speed’s sake and to (hopefully) cut down on system resource usage. If you’re looking for a similar tool in Python though, check out Photon!

Contributions

In the end, I decided to write a custom tool for this in Golang. I chose Golang for speed: it does not rely on an interpreter and has native support for concurrency.
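
For readers new to Golang, the sketch below shows roughly what that native concurrency support looks like when applied to a list of hosts: a fixed pool of goroutines pulls hostnames from a channel and records the results. It is a generic worker-pool pattern, not Hakrawler's actual code; crawlAll and the stand-in crawl function are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll (hypothetical helper) runs crawl for every host using a fixed
// number of worker goroutines and collects the URLs returned for each host.
func crawlAll(hosts []string, workers int, crawl func(string) []string) map[string][]string {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the sends below would block forever
	}
	jobs := make(chan string)
	results := make(map[string][]string)
	var mu sync.Mutex // guards results, which is shared between workers
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for host := range jobs {
				urls := crawl(host)
				mu.Lock()
				results[host] = urls
				mu.Unlock()
			}
		}()
	}

	for _, h := range hosts {
		jobs <- h
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	// Stand-in for a real crawl: pretend each host exposes a single URL.
	fakeCrawl := func(host string) []string {
		return []string{"https://" + host + "/"}
	}
	hosts := []string{"a.example.com", "b.example.com", "c.example.com"}
	for host, urls := range crawlAll(hosts, 2, fakeCrawl) {
		fmt.Println(host, urls)
	}
}
```

Capping the number of workers keeps resource usage predictable when the input is a large list of domains.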

The only problem was that I had never coded anything in Golang. I watched a Golang tutorial on YouTube, and then spent a week or so coding Hakrawler while Googling stupid Golang questions. If you’re looking to learn Golang, this is the video I watched:

https://www.youtube.com/watch?v=YS4e4q9oBaU

Currently, the code is a bit messy and could be more efficient, but it works! I’m releasing it as a beta in the hope that the infosec/Golang community will help me to improve on it over time. If you want to contribute, there are a bunch of feature requests and bugs you can work on in the “Issues” section of the repository. I’ll add anyone who makes a significant contribution to the “Contributors” section of the GitHub readme, and if we cross paths IRL, I owe you a beer/coffee!
