The Pandora Bucket Unleashed
How to enumerate buckets (with no brute-force) for your personal bucket search engine
TL;DR: Spoiler alert! This post illustrates techniques for enumerating bucket names without brute-forcing.
In February 2017 we went online with an unstable version of the website www.buckhacker.com. In recent days we discovered a new project that does almost the same thing as our original project, but with several improvements (I personally love the filtering by file extension and the fact that the index is not limited to the first 1,000 results; we had come across the same ideas).
In the last months, www.buckhacker.com was accessible to a limited group of friends in order to participate in bug bounty programs. On top of this, I'm looking for a new job, so I have less time to work on the public version of the project (P.S. If you have any tips for a cool job position, drop me a message).
Just one note: we are not related to www.thebuckhacker.com (a copycat). That site appeared a few days after we decided to go offline.
Now, bucket enumeration is not a new thing, but I am here to disclose our techniques for it. Currently it seems we have more buckets than any other website/project: we ran a check this morning and the list contains around 220,000 unique bucket names.
So how do we get this list of buckets? Here is the “magic” explained.
Project Sonar FDNS Dataset
The FDNSv2 dataset is widely known among bug bounty hunters for subdomain enumeration. Basically, you download the dataset and grep (zgrep) for amazonaws.com. Why grep for amazonaws.com and not s3.amazonaws.com? Because buckets can appear under different domain names, e.g. regional endpoints such as bucket.s3-eu-west-1.amazonaws.com or bucket.s3.us-west-2.amazonaws.com.
With this technique, on one of the largest FDNSv2 datasets, you can find around 60,000 bucket names. One suggestion is to also download the datasets from past years (back to February 2017), where you can find a lot of additional bucket names. Another resource is the dataset of the older FDNSv1 project; I suggest using all of these datasets for the same operations. We have already done all this work and parsed the data to drop duplicates, but since the Rapid7 Open Data website requires registration, we have no intention of releasing this dataset publicly. With this technique, we were also able to discover a large number of subdomain takeover vulnerabilities, as described here.
Since you are grepping through these files anyway, we recommend also grepping for the storage domains of other cloud providers (e.g. DigitalOcean, Google, etc.).
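As a rough sketch, the grep-and-extract step could look like the following Python. It assumes the FDNS dataset's gzipped JSON-lines format (one record per line with `name` and `value` fields) and matches virtual-hosted-style S3 hostnames; the regex, function names, and field handling are our own illustration, not the original tooling:

```python
import gzip
import json
import re

# Virtual-hosted-style S3 hostnames: <bucket>.s3.amazonaws.com or a
# regional endpoint like <bucket>.s3-eu-west-1.amazonaws.com /
# <bucket>.s3.us-west-2.amazonaws.com. Bucket names are 3-63 chars.
S3_HOST_RE = re.compile(
    r"^(?P<bucket>[a-z0-9][a-z0-9.\-]{1,61}[a-z0-9])"
    r"\.s3([.-][a-z0-9-]+)?\.amazonaws\.com$"
)

def bucket_from_hostname(hostname):
    """Return the bucket name if the hostname is an S3 endpoint, else None."""
    m = S3_HOST_RE.match(hostname.lower().rstrip("."))
    return m.group("bucket") if m else None

def buckets_from_fdns(path):
    """Stream a gzipped FDNS JSON-lines file, yield unique bucket names."""
    seen = set()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if "amazonaws.com" not in line:
                continue  # cheap pre-filter before JSON parsing
            try:
                record = json.loads(line)
            except ValueError:
                continue
            # Both the queried name and the answer value can be S3 hosts
            # (e.g. a corporate CNAME pointing at a bucket).
            for field in ("name", "value"):
                bucket = bucket_from_hostname(str(record.get(field, "")))
                if bucket and bucket not in seen:
                    seen.add(bucket)
                    yield bucket
```

Streaming with `gzip.open` avoids decompressing the multi-gigabyte dataset to disk, and the substring pre-filter keeps JSON parsing off the hot path for the vast majority of records.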
Other Subdomain Discovery Services
From the FDNS dataset we can extract the Amazon S3 bucket domain names and then run subdomain discovery tools against this list.
Here we made some improvements, such as bypassing the 100-result limit on VirusTotal and using different REST APIs to extract additional bucket names. We will probably release our internal tool (YellowSubDomain, ysd.py) for bucket/subdomain enumeration soon.
With this method you can find around another 60,000 bucket names (of course there will be some overlap with the FDNS dataset).
I suggest running this scan periodically and paying for the premium version of some of these services to get more results.
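Since the sources overlap, combining them comes down to normalization and de-duplication. A minimal sketch (the function name is ours), assuming each source is just an iterable of discovered bucket names:

```python
def merge_bucket_lists(*sources):
    """Merge bucket-name lists from different sources (FDNS datasets,
    subdomain discovery services, ...) into one de-duplicated sorted
    list. S3 bucket names are lowercase, so comparison is normalized."""
    merged = {
        name.strip().lower()
        for source in sources
        for name in source
        if name.strip()
    }
    return sorted(merged)
```

Running this after every periodic scan keeps the master list free of the duplicates mentioned above.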
Brute-Force
Actually, we did not use any brute-force technique. However, we think it would be quite easy to implement an incremental bucket-name brute-force (bucket names start at 3 letters/numbers).
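An incremental brute-force of this kind could be sketched as follows; this is our illustration, not the project's code. The existence check relies on documented S3 behavior: a HEAD request to a bucket endpoint returns 404 for a nonexistent bucket and 200 or 403 for an existing one (403 meaning it exists but you lack access); redirects for other regions are left out for brevity:

```python
import itertools
import string
from urllib.error import HTTPError
from urllib.request import Request, urlopen

ALPHABET = string.ascii_lowercase + string.digits

def incremental_candidates(length):
    """Yield every candidate bucket name of the given length (a-z, 0-9),
    starting from the minimum bucket-name length of 3."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)

def bucket_exists(name, timeout=5):
    """HEAD <name>.s3.amazonaws.com: 404 means no such bucket;
    200/403 means the bucket exists."""
    req = Request(f"https://{name}.s3.amazonaws.com/", method="HEAD")
    try:
        urlopen(req, timeout=timeout)
        return True
    except HTTPError as err:
        return err.code != 404
```

Note the search space explodes quickly: 36^3 is about 47,000 names, but 36^4 is already about 1.7 million, so rate limiting and a sensible cutoff length are essential.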
On our GitHub you can find a word list of common bucket prefixes that can be used to brute-force buckets for specific domain names.
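Such a prefix word list is typically combined with a target name in both orders and with common separators; a small sketch of that idea (the function and sample words are our own, not the contents of the actual word list):

```python
import itertools

def wordlist_candidates(prefixes, target, separators=("-", ".", "")):
    """Combine common bucket prefixes (e.g. backup, dev, logs) with a
    target name: prefix-target, target-prefix, prefix.target, etc."""
    seen = set()
    for prefix, sep in itertools.product(prefixes, separators):
        for candidate in (f"{prefix}{sep}{target}", f"{target}{sep}{prefix}"):
            if candidate not in seen:
                seen.add(candidate)
                yield candidate
```

For example, `wordlist_candidates(["backup"], "acme")` yields `backup-acme`, `acme-backup`, `backup.acme`, `acme.backup`, `backupacme`, and `acmebackup`, each of which could then be fed to an existence check.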
Super Secret Technique :)
We developed two other techniques to enumerate buckets, and these gave us the most important results in bug bounty programs. They are the only techniques we will not disclose for now, partly because they are not yet fully implemented and automated. I'm sure that if you play with bucket enumeration, you will come across these techniques too.
I think that in the cloud era we need to think about vulnerability discovery in a different way. The FDNS dataset and subdomain discovery techniques are powerful tools for finding several types of vulnerabilities on cloud providers. We are entering a new era where finding vulnerabilities can look like mining terabytes of domain-name data.