X9: How Resultados Digitais can automatically detect sensitive data in its code repositories

Vitoria Rio
Published in Ship It!
4 min read · Oct 9, 2020

As companies grow and ship an ever-growing number of features, developers constantly create and integrate new code through commits, pull requests, and so on. With the rise in popularity of agile methodologies, a faster approach to vulnerability analysis is becoming a must. If a commit contains sensitive information, it must be addressed as soon as the code reaches the repository, since it may eventually lead to a data leak.

However, delving into hundreds of commits and pull requests is no easy task, even for a team of skilled security engineers. That's where X9 comes in. X9 is an open-source tool created to automate the detection of sensitive information in an organization's GitHub repositories. It automatically analyzes code and sends notifications through Slack in near-real time.

Architecture

GitHub lets us read several kinds of events, such as commits and pull requests, through GitHub Webhooks. For instance, when a developer creates a PR, an event is created on the GitHub platform, which then sends a message payload with detailed information about that event to a pre-configured route (in this case, the X9 interface). This approach works especially well because GitHub lets us configure webhooks for the whole organization, so X9 can analyze every repository in that scope.

By configuring the events for the whole organization, X9 not only guarantees that all repositories are being analyzed, but also streamlines the analysis process, giving security engineers more flexibility and control over the company’s potential security threats.
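To make the webhook flow concrete, here is a minimal Python sketch of what a receiver could do with an incoming event. It is not X9's code: the function names are hypothetical, and it assumes the webhook was configured with a shared secret, in which case GitHub signs the raw body and sends the result in the `X-Hub-Signature-256` header, with the event type in `X-GitHub-Event`.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the expected value.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def summarize_event(event_name: str, body: bytes) -> dict:
    """Pull out the fields a scanner would need from a push or pull_request payload."""
    payload = json.loads(body)
    summary = {
        "event": event_name,  # value of the X-GitHub-Event header
        "repository": payload["repository"]["full_name"],
        "clone_url": payload["repository"]["clone_url"],
    }
    if event_name == "pull_request":
        summary["branch"] = payload["pull_request"]["head"]["ref"]
    elif event_name == "push":
        # Push payloads carry a full ref such as "refs/heads/main".
        summary["branch"] = payload["ref"].split("/", 2)[-1]
    return summary
```

A receiver would reject any request whose signature does not verify before handing the summary to a worker.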

Upon receiving the event payload, the app clones the repository's branch and performs several security tests. Each event is processed by an independent worker, and the number of workers can be scaled up when throughput increases. All detected vulnerabilities are then stored in a PostgreSQL database and reported to a Slack channel. The Slack interface allows one to directly create an issue on GitHub or flag the vulnerability as a false positive.
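The clone-and-scan step can be sketched roughly as follows. This is an illustration under assumptions rather than X9's implementation: the helper names are hypothetical, a thread pool stands in for whatever worker mechanism X9 uses, and the shallow clone (`--depth 1`) is only one possible choice that fetches the tip of a branch without its history.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def build_clone_command(clone_url: str, branch: str, dest: str) -> List[str]:
    """Build a git command that shallow-clones a single branch into `dest`."""
    return ["git", "clone", "--depth", "1", "--branch", branch, clone_url, dest]


def process_events(events: Iterable[dict],
                   handle: Callable[[dict], List[str]],
                   workers: int = 4) -> List[List[str]]:
    """Run `handle` on each event in a pool of workers.

    Returns one list of findings per event, in the same order as `events`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, events))
```

In a real worker, `handle` would run the clone command (for example with `subprocess.run`), scan the checked-out files, and return the findings. Raising `workers` is the sketch's equivalent of scaling up.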

Analysis

X9 uses specific signatures and patterns to detect sensitive data in code. These signatures are regular expressions tailored for this purpose, such as r'[a-zA-Z0-9]*\@emailexample.com', which may detect personal emails, or a signature that detects AWS Access Keys inside file contents (the contents context is defined further ahead).
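As an illustration of how such signatures behave, the sketch below applies two patterns to a piece of text. The email pattern is a slightly tightened variant of the one quoted above (the dot is escaped and at least one character is required before the `@`). The AWS pattern is an assumption, not necessarily the one X9 ships: it relies on the fact that long-term AWS access key IDs are 20 characters long and start with `AKIA`. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Variant of the email pattern quoted in the text, with the dot escaped so it
# only matches a literal ".".
EMAIL = re.compile(r"[a-zA-Z0-9]+@emailexample\.com")

# Assumed pattern (not necessarily X9's): "AKIA" followed by 16 uppercase
# letters or digits, delimited by word boundaries.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

SIGNATURES = {"email": EMAIL, "aws_access_key_id": AWS_ACCESS_KEY_ID}


def find_secrets(text: str) -> List[Tuple[str, str]]:
    """Return (signature name, matched text) pairs found in `text`."""
    return [(name, match.group(0))
            for name, pattern in SIGNATURES.items()
            for match in pattern.finditer(text)]
```

Running `find_secrets` on a file containing AWS's documented placeholder key `AKIAIOSFODNN7EXAMPLE` reports it under `aws_access_key_id`.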

The application also allows the user to configure custom regular expressions, enabling a more domain-oriented approach. There are four types of signature contexts for X9:

  • filename: Searches for complete filenames
  • extension: Searches for specific extensions (e.g.: .pem, .db)
  • path: Searches for the complete path of a predetermined folder or file in the repository
  • contents: Searches for patterns inside the files
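The four contexts can be pictured as choosing which part of a file a pattern is applied to. The sketch below is one possible reading of the list above, not X9's actual data model: the `Signature` class, its fields, and the choice of full match versus search for each context are all assumptions.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Signature:
    name: str
    context: str  # one of: "filename", "extension", "path", "contents"
    pattern: str  # regular expression applied to the selected context


def matches(sig: Signature, file_path: str, contents: str) -> bool:
    """Apply the signature to the part of the file its context selects."""
    path = PurePosixPath(file_path)
    if sig.context == "filename":
        # Complete filenames: the whole name must match.
        return re.fullmatch(sig.pattern, path.name) is not None
    if sig.context == "extension":
        # path.suffix includes the leading dot, e.g. ".pem".
        return re.fullmatch(sig.pattern, path.suffix) is not None
    if sig.context == "path":
        return re.search(sig.pattern, str(path)) is not None
    if sig.context == "contents":
        return re.search(sig.pattern, contents) is not None
    raise ValueError(f"unknown context: {sig.context}")
```

For example, `Signature("private key", "extension", r"\.pem")` flags `keys/server.pem` without reading the file, while a `contents` signature only fires on what is inside it.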

Notifications and actions

Through Slack Webhooks, X9 sends detailed information about each finding to a user-defined channel. It is important to note that any possibly sensitive information is obfuscated before it is included in the message.
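A notification of this kind could be assembled roughly as in the following sketch. The masking rule (keep the first four characters), the message wording, and the `action_id` values are assumptions made for illustration. The payload shape follows Slack's Block Kit format, which incoming webhooks accept as a JSON body.

```python
import json


def obfuscate(secret: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and mask the rest."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


def build_slack_message(repo: str, file_path: str, signature: str, match: str) -> str:
    """Build the JSON body for a Slack webhook, with two action buttons."""
    text = (f"*{signature}* found in `{repo}` at `{file_path}`: "
            f"`{obfuscate(match)}`")
    payload = {
        "text": text,  # fallback text shown in notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": text}},
            {"type": "actions", "elements": [
                {"type": "button", "action_id": "open_issue",
                 "text": {"type": "plain_text", "text": "Open Issue"}},
                {"type": "button", "action_id": "false_positive",
                 "text": {"type": "plain_text", "text": "False Positive"}},
            ]},
        ],
    }
    return json.dumps(payload)
```

The raw match never appears in the payload. Only the masked form leaves the scanner.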

The interactive components (the Open Issue and False Positive buttons) respectively create an issue directly on GitHub and dismiss the finding as a false positive. When an issue is created from the message, it is opened on the vulnerable repository.
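For reference, opening an issue programmatically goes through GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues`. The sketch below only prepares such a request without sending it. The title and body wording are invented for illustration and do not reproduce X9's issue format, and the function name is hypothetical.

```python
import json
import urllib.request


def build_issue_request(repo: str, token: str, signature: str,
                        file_path: str, masked_match: str) -> urllib.request.Request:
    """Prepare (but do not send) a request to GitHub's "create an issue" endpoint."""
    body = {
        "title": f"[X9] {signature} found in {file_path}",
        "body": ("A possible secret was detected.\n\n"
                 f"- File: `{file_path}`\n"
                 f"- Match (obfuscated): `{masked_match}`\n"),
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Only the obfuscated match goes into the issue body, so the secret is not copied into yet another place.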

Dashboard

X9 also has a dashboard for vulnerability visualization. This dashboard is an optional separate application that does not interfere with the security analysis. It uses OpenID Connect for login authentication.

The dashboard is a table view that allows filtering by repository name and vulnerability type. If created, the issue’s link will be shown as well.
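The filtering described above maps naturally onto a parameterized query. In the sketch below, Python's built-in `sqlite3` stands in for PostgreSQL so the example runs anywhere, and the table name, column names, and sample rows are hypothetical rather than X9's actual schema.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few made-up findings."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE vulnerabilities ("
               "repository TEXT, kind TEXT, file_path TEXT, issue_url TEXT)")
    db.executemany("INSERT INTO vulnerabilities VALUES (?, ?, ?, ?)", [
        ("org/api", "aws_access_key_id", "config.py", None),
        ("org/api", "email", "README.md", "https://github.com/org/api/issues/1"),
        ("org/web", "email", "docs/team.md", None),
    ])
    return db


def list_vulnerabilities(db: sqlite3.Connection,
                         repository: Optional[str] = None,
                         kind: Optional[str] = None) -> List[Tuple]:
    """Return rows, optionally filtered by repository name and vulnerability type."""
    query = "SELECT repository, kind, file_path, issue_url FROM vulnerabilities"
    clauses, params = [], []
    if repository is not None:
        clauses.append("repository = ?")
        params.append(repository)
    if kind is not None:
        clauses.append("kind = ?")
        params.append(kind)
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    # Values are always passed as parameters, never interpolated into the SQL.
    return db.execute(query + " ORDER BY repository, file_path", params).fetchall()
```

Calling `list_vulnerabilities(db, kind="email")` returns the email findings across all repositories, with the issue link filled in only where an issue was created.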

Final Thoughts

X9 is still a work in progress and is actively being updated. Contributions are welcome :).

Check it out at github.com/ResultadosDigitais/x9.

References

This project was inspired by Shhgit and Gitleaks, which provided most of the signatures and analytic strategy.
