Hunting After Secrets Accidentally Uploaded To Public S3 Buckets
Intro
As part of our security team's routine, we review publications about new security breaches to learn about their root causes and map them to our existing controls.
I was assigned to examine the recent research into SEGA's asset exposure, along with its causes, outcomes and possible remediations.
My research found that one of the main issues was that files containing secrets, which allowed access to their cloud environment, had been mistakenly stored in a public S3 bucket.
To address this risk, I developed the S3crets Scanner — a tool designed to perform scheduled or on-demand scans that hunt for various types of secrets across the organization's publicly accessible buckets and trigger alerts upon findings.
Could It Be That Easy?
Since we often hear about incidents that involve misconfigured S3 buckets, I was intrigued to see how many open buckets with sensitive data I could find.
Using buckets.grayhatwarfare.com (a database of open buckets), I was able to manually browse through many public buckets and uncover sensitive information in some of them. At that point, I decided to automate the process.
The proof-of-concept script was simple: fetch bucket addresses from the grayhatwarfare database, recursively iterate through each bucket's directories, download any textual object found, and scan it for secrets using TruffleHog3 (an open-source scanner).
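Below is a minimal sketch of such a PoC in Python. It assumes the bucket names were already exported from grayhatwarfare and are supplied as a plain list; the bucket names, the textual-suffix filter and the bare trufflehog3 invocation are illustrative placeholders rather than the original script.

```python
import subprocess
from pathlib import Path

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# Anonymous S3 client - public buckets can be read without any credentials.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

TEXTUAL_SUFFIXES = (".txt", ".json", ".yaml", ".yml", ".env", ".cfg", ".xml", ".csv")
DOWNLOAD_DIR = Path("downloads")
DOWNLOAD_DIR.mkdir(exist_ok=True)

def scan_bucket(bucket_name: str) -> None:
    """Download textual objects from a public bucket and scan them for secrets."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.lower().endswith(TEXTUAL_SUFFIXES):
                continue
            # Flatten the key path so it can be stored as a single local file.
            local_path = DOWNLOAD_DIR / key.replace("/", "_")
            s3.download_file(bucket_name, key, str(local_path))

    # Scan the downloaded files with trufflehog3 (flags omitted; they vary by version).
    subprocess.run(["trufflehog3", str(DOWNLOAD_DIR)], check=False)

# Bucket names exported from buckets.grayhatwarfare.com (placeholder values).
for bucket in ["example-public-bucket-1", "example-public-bucket-2"]:
    scan_bucket(bucket)
```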
The results were surprisingly interesting: I found secrets including private keys, AWS keys, GitHub tokens and JWTs.
I even found, on a random public S3 bucket, a note from a white hat hacker urging the owner to properly configure their bucket ACL:
PoC Insights
By default, AWS disables public access for newly created buckets and objects. However, during my investigation, I noticed that many users assume that not setting a bucket to "Public" automatically means its objects are private, which is a common mistake.
There are two permission states that lead to file exposure:
- Public — everyone has one or more of the following permissions: list objects, write objects, or read and write permissions.
- Objects can be public — the bucket itself is not public, but anyone with the appropriate permissions can grant public access to individual objects.
Here are a few real-life examples:
- Some users simply upload files or whole directories to public buckets (usually through automation) without paying much attention to their content.
- DevOps engineers configure bucket ACLs using Infrastructure as Code (Terraform, for example), which may lead to misconfiguration as Terraform skeletons are shared among peers.
Research Takeaways
As part of our company’s security posture, we’re following AWS security best practices and are also using commercial cloud security tools to monitor our infrastructure.
We're also developing playbooks to automatically enrich alerts, notify the relevant user who performed the suspected action and initiate remediations.
However, what can we do to validate that no sensitive data is being uploaded to buckets that are public by design?
To strengthen the organization's security posture and avoid similar incidents, it's mandatory to conduct periodic internal scans of public S3 buckets to hunt for files containing secrets.
So I started by searching for an open-source tool to implement in our organization. Unfortunately, I couldn't find any framework flexible and scalable enough to scan across multiple, changing accounts, which was necessary to fit our needs.
Therefore, we decided to develop an in-house tool for this purpose — the S3crets Scanner.
The open-source code is available on GitHub — https://github.com/Eilonh/s3cret_scanner
S3crets Scanner Workflow
The flowchart below illustrates how the scanner should be implemented in an organization. Note that, as an alternative, I added an option to replace the first step (getting the public buckets from the CSPM) with reading the account names and IDs from a CSV file.
Technical Breakdown
1. Mapping the Public Buckets
To obtain a list of relevant buckets and their corresponding accounts dynamically, we use a Cloud Security tool to generate an API query for the relevant data.
The request generates a GraphQL query for all the S3 buckets and validates the results against the OPA policy engine, looking for any of the following configurations set to False:
- “BlockPublicAcls”
- “BlockPublicPolicy”
- “IgnorePublicAcls”
- “RestrictPublicBuckets”
** This can also be achieved by using the AWS SDK's get_public_access_block method to query the above configuration.
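As an example, here is a hedged sketch of that SDK alternative with boto3. The bucket name handling is illustrative; note that a bucket with no public access block configuration at all raises NoSuchPublicAccessBlockConfiguration, which likewise means nothing is blocked.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def is_potentially_public(bucket_name: str) -> bool:
    """Return True if any of the four public-access-block settings is disabled."""
    try:
        config = s3.get_public_access_block(Bucket=bucket_name)[
            "PublicAccessBlockConfiguration"
        ]
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            # No block configuration exists - public access is not restricted.
            return True
        raise
    return not all(
        config.get(setting, False)
        for setting in (
            "BlockPublicAcls",
            "BlockPublicPolicy",
            "IgnorePublicAcls",
            "RestrictPublicBuckets",
        )
    )

print(is_potentially_public("my-example-bucket"))
```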
The results are stored in two dictionaries:
- Account_id_mapping — a key-value mapping of account names to their corresponding IDs — {myaccount: 12345}
- Buckets_results — a mapping of each account name to a list of its bucket names — {myaccount: [mybucket_1, mybucket_2]}
2. Validating File Exposure
Once we have the public buckets and the buckets with the "Objects can be public" configuration, we perform the following actions:
- Filter for objects that were last modified within the last 24 hours, using JMESPath.
- Iterate over the objects.
- Download the textual files.
- Scan for secrets.
- Create a meaningful alert on the results.
These operations are performed using different methods of the Boto3 low-level S3 client.
- Listing the objects:
The ListObjects method is limited to 1,000 results per call, which, in some cases, is insufficient. Therefore, we use the client's get_paginator method to overcome this limitation.
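Here is a minimal sketch of that pagination step, combined with the JMESPath filter for recently modified objects via the paginator's search() method. The bucket name is a placeholder, and the extra quoting inside the expression is needed because JMESPath's to_string() JSON-encodes the timestamp.

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Keep only objects modified in the last 24 hours. str() of an aware datetime
# matches the "YYYY-MM-DD HH:MM:SS+00:00" form produced by to_string(LastModified).
cutoff = str((datetime.now(timezone.utc) - timedelta(hours=24)).replace(microsecond=0))

pages = paginator.paginate(Bucket="my-public-bucket")
recent_keys = pages.search(f"Contents[?to_string(LastModified) >= '\"{cutoff}\"'].Key")

for key in recent_keys:
    if key is not None:  # pages without Contents yield no keys
        print(key)
```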
- Listing the object ACL:
To determine whether an object is public, we look for one of two grantee groups:
- "AllUsers" — the object is exposed to any public user.
- "AuthenticatedUsers" — the object is exposed to any user who has an AWS account and an active token.
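A sketch of that ACL check with the client's get_object_acl method, matching grants against the two predefined AWS group URIs (the bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

PUBLIC_GROUP_URIS = (
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
)

def is_object_public(bucket: str, key: str) -> bool:
    """Return True if the object's ACL grants access to a public AWS group."""
    acl = s3.get_object_acl(Bucket=bucket, Key=key)
    return any(
        grant["Grantee"].get("Type") == "Group"
        and grant["Grantee"].get("URI") in PUBLIC_GROUP_URIS
        for grant in acl["Grants"]
    )

print(is_object_public("my-public-bucket", "path/to/object.txt"))
```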
3. Downloading & Scanning the Public Files
The automation normalizes the file name (essentially replacing the forward slashes in the full path with underscores) and downloads the public files to a "downloads" folder in the script's current directory.
It also alerts on encrypted files that cannot be scanned but are highly sensitive, such as .p12 and .pgp files.
Then, we use Trufflehog3 to scan the files in the "downloads" folder with a set of predefined entropy- and regex-based rules, as well as a set of custom rules that we designed to match specific internal sensitive tokens and PII.
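As a rough sketch of this stage, the snippet below flags encrypted file types for alerting and hands the rest of the "downloads" folder to trufflehog3 through its CLI. The custom-rule configuration and output flags depend on the installed trufflehog3 version, so only the plain invocation is shown here.

```python
import subprocess
from pathlib import Path

DOWNLOAD_DIR = Path("downloads")
ENCRYPTED_SUFFIXES = (".p12", ".pgp")

# Alert on encrypted files that cannot be scanned but are highly sensitive.
for path in DOWNLOAD_DIR.rglob("*"):
    if path.suffix.lower() in ENCRYPTED_SUFFIXES:
        print(f"[ALERT] Potentially sensitive encrypted file found: {path}")

# Run trufflehog3 against the downloaded files. Custom rule files and output
# format options should be added according to the installed version's documentation.
result = subprocess.run(
    ["trufflehog3", str(DOWNLOAD_DIR)],
    capture_output=True,
    text=True,
)
print(result.stdout)
```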
Conclusion
Storage as a Service is commonly used, but it can also be an entry point into many organizations, as a single configuration mistake can easily expose sensitive information.
Developing an automated scanner as an additional layer of security can help organizations prevent the next breach.
Our SecOps team is constantly brainstorming what could be done next to better secure our company, and turning these ideas into automation.