Whispers: Advanced secrets detection

Skyscanner Engineering
Apr 27, 2020

Introduction

Skyscanner’s Continuous Integration and Delivery (CI/CD) pipeline is built to support tens of thousands of deployments per day. The frequency of production deployments cannot come at the expense of security. In fact, security processes need to be integrated within the CI/CD pipeline. That’s why we added continuous security validation at each step of the pipeline, from development to production, to help ensure our applications stay secure.

We’ve previously discussed Skyscanner initiatives for improving code security:

As part of the pipeline, we began using SonarQube™ for code quality purposes. This validation happens before developers commit their code, because SonarQube™ is integrated into the developer’s IDE. We decided we could leverage SonarQube™ further by also checking for vulnerable coding patterns. During this process we identified great existing plugins like Findsecbugs for Java, but we also noticed the lack of static code analysis plugins for Python and Node.js. We decided to write the missing plugins in order to achieve full coverage of our main languages (Python, Java and Node.js). We started with Sonar Secrets to provide early feedback to developers, alerting them to the security risks associated with hardcoded credentials. Providing developers with feedback early on allowed us to shift our security controls to the left, enabling developers to meet our internally defined Security Standards before production code goes live.

The Sonar Secrets plugin for SonarQube™ (https://github.com/Skyscanner/sonar-secrets) is built by the Skyscanner Product Security Squad and is designed to identify hardcoded secrets such as passwords, API tokens, AWS credentials, and others. However, this was still not enough: we had no coverage of the static text files that are frequently used to specify service configuration and that often contain hardcoded secrets.

So we decided to rethink this process and this is how Whispers came to be.

Whispers

Whispers is a static code analysis tool designed for parsing various common data formats in search of hardcoded credentials and dangerous functions: https://github.com/Skyscanner/whispers

We’ve been running Whispers in production for some time now as part of our CI/CD pipeline. It performs the same function as the Sonar Secrets plugin but extends coverage to static text files, allowing for the detection of passwords, API tokens, AWS keys, private keys, hashed credentials, and authentication tokens.

Whispers is intended for the analysis of static structured text files such as JSON, YAML, XML and so on. The program works by parsing known formats and extracting key-value pairs, where the key is the field name and the value is the potentially hardcoded secret assigned to that key. Each pair then goes through a series of validation steps to determine whether a hardcoded secret is present. This process distinguishes between static text, variables, placeholders, function calls, and so on, which greatly improves detection capabilities and reduces the number of false positives. During the run, Whispers reports each hardcoded secret it finds as a JSON-formatted item.
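To make the extract-then-validate idea concrete, here is a minimal sketch in Python. It is not the Whispers implementation: the function names, the placeholder pattern and the sample data are all assumptions made for illustration, and the real validation steps are more extensive.

```python
import json
import re

# Hypothetical heuristic: values that look like ${VAR}, {{ var }}, <placeholder>
# or $VAR are treated as placeholders or variables, not hardcoded secrets.
PLACEHOLDER = re.compile(r"^(\$\{.*\}|\{\{.*\}\}|<.*>|\$[A-Za-z_][A-Za-z0-9_]*)$")


def flatten(node, key=""):
    """Yield (key, value) pairs for every scalar leaf in parsed JSON data."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from flatten(child, name)
    elif isinstance(node, list):
        for child in node:
            yield from flatten(child, key)
    else:
        yield key, node


def looks_hardcoded(value):
    """Return True for non-empty static strings that are not placeholders."""
    if not isinstance(value, str) or not value.strip():
        return False
    return PLACEHOLDER.match(value.strip()) is None


def candidate_pairs(text):
    """Parse a JSON document and keep only pairs with a static string value."""
    return [(k, v) for k, v in flatten(json.loads(text)) if looks_hardcoded(v)]


sample = '{"db": {"user": "svc", "password": "hunter2", "token": "${API_TOKEN}"}, "port": 5432}'
print(candidate_pairs(sample))
```

In this sketch the `token` entry is dropped because its value is a variable reference, and `port` is dropped because it is not a string. The remaining pairs would then be handed to the rules to decide what gets reported.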

As a nice “side effect” of Whispers being written in Python, it is trivial to parse Python source code and traverse the generated Abstract Syntax Trees (ASTs) looking for secrets. Some additional work and parsing is required to cover Java, JavaScript and other languages. Python AST parsing is currently implemented at a very basic level, looking for dangerous functions such as eval, exec, os.system and others that allow arbitrary code execution.
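The AST approach can be sketched with Python's standard `ast` module. This is an illustrative example rather than the code Whispers ships: the `DANGEROUS` set and the helper names are assumptions, and the sketch does not resolve aliased imports such as `from os import system`.

```python
import ast

# Illustrative list only; taken from the functions named in the text.
DANGEROUS = {"eval", "exec", "os.system"}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.system' from a call target, if possible."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def dangerous_calls(source):
    """Return sorted (line, name) tuples for calls to functions in DANGEROUS."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS:
                found.append((node.lineno, name))
    return sorted(found)


code = "import os\nos.system('ls')\nresult = eval(user_input)\nprint('ok')\n"
print(dangerous_calls(code))  # [(2, 'os.system'), (3, 'eval')]
```

Because the source is only parsed and never executed, this kind of check is safe to run on untrusted code.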

Features

Configuration

There are several configuration options available in Whispers. It’s possible to include/exclude results based on file path, key, or value. File path specifications are interpreted as globs. Keys and values accept regular expressions and several other parameters. There is a default configuration file built in that will be used if you don’t provide a custom one.

You can use this as a template for creating your own config. The fastest way to tweak detection (i.e. remove false positives and unwanted results) is to copy the default config into a new file, adapt it, and pass it as an argument to Whispers.
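The exclusion logic described above can be pictured roughly as follows. The dictionary layout, the patterns and the `is_excluded` helper are hypothetical and exist only for this sketch; the default config file in the repository is the reference for the real schema.

```python
import re
from fnmatch import fnmatch

# Hypothetical config layout, for illustration only.
config = {
    "exclude": {
        "files": ["tests/*", "*.lock"],      # interpreted as globs
        "keys": [r"^example_"],              # regular expressions
        "values": [r"^(changeme|dummy)$"],   # regular expressions
    }
}


def is_excluded(path, key, value, cfg):
    """Return True if a result should be dropped according to the exclude section."""
    ex = cfg.get("exclude", {})
    if any(fnmatch(path, glob) for glob in ex.get("files", [])):
        return True
    if any(re.search(p, key, re.IGNORECASE) for p in ex.get("keys", [])):
        return True
    return any(re.search(p, value, re.IGNORECASE) for p in ex.get("values", []))


print(is_excluded("tests/fixtures/a.yml", "password", "hunter2", config))  # True
print(is_excluded("src/app.yml", "password", "hunter2", config))           # False
```

An include section could work the same way with the test inverted, so that only matching results are kept.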

Plugins

All parsing functionality is implemented via plugins. Each plugin implements a class with the pairs() method that runs through files and returns the key-value pairs to be checked for secrets.

Plugins generate key-value pairs that first go through different checks to establish whether there is a hardcoded value present. Next, Whispers uses rules to know what it should report.
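As a rough picture of what a plugin might look like, the sketch below defines a hypothetical plugin for `KEY=VALUE` files. Only the existence of a `pairs()` method comes from the description above; the class name, the method signature and the `scan` helper are assumptions made for this example.

```python
import tempfile
from pathlib import Path


class DotenvPlugin:
    """Hypothetical plugin: yields key-value pairs from KEY=VALUE lines."""

    def pairs(self, filepath):
        for line in Path(filepath).read_text().splitlines():
            line = line.strip()
            # Skip blank lines, comments and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            yield key.strip(), value.strip().strip("'\"")


def scan(filepath, plugin):
    """Collect every pair a plugin produces for one file."""
    return list(plugin.pairs(filepath))


def demo():
    """Write a small sample file to a temporary directory and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        env = Path(tmp) / ".env"
        env.write_text("# settings\nDB_USER=svc\nDB_PASSWORD='hunter2'\n")
        return scan(env, DotenvPlugin())


print(demo())  # [('DB_USER', 'svc'), ('DB_PASSWORD', 'hunter2')]
```

Keeping the plugin limited to producing pairs means the hardcoded-value checks and the rules stay the same for every file format.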

Rules

Rules specify the actual things that should be detected. There are several common ones that come built in, such as AWS keys and passwords, but the tool is made to be easily expandable with new rules.

Rule configuration allows defining different levels of severity per rule, which enables us to block deployments depending on the reported severity level.
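One way to picture rules with severities is the sketch below. The `Rule` class, the rule names and the severity labels are assumptions for illustration and do not reflect the actual rule format used by Whispers. The AWS pattern relies on the widely documented `AKIA` prefix of access key IDs, and the sample key is the example value used in AWS documentation.

```python
import re
from dataclasses import dataclass

# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = ["INFO", "MINOR", "MAJOR", "CRITICAL", "BLOCKER"]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a pair matches when both key and value patterns match."""
    name: str
    severity: str
    key_pattern: str = r".*"
    value_pattern: str = r".*"

    def matches(self, key, value):
        return bool(
            re.search(self.key_pattern, key, re.IGNORECASE)
            and re.search(self.value_pattern, value)
        )


RULES = [
    Rule("aws-access-key-id", "BLOCKER", value_pattern=r"^AKIA[0-9A-Z]{16}$"),
    Rule("password", "CRITICAL", key_pattern=r"pass(word|wd)?"),
]


def findings(pairs):
    """Return (rule name, severity, key) for every pair that matches a rule."""
    return [(r.name, r.severity, k) for k, v in pairs for r in RULES if r.matches(k, v)]


def should_block(results, threshold="CRITICAL"):
    """Return True if any finding is at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    return any(SEVERITY_ORDER.index(sev) >= limit for _, sev, _ in results)


pairs = [
    ("aws_key", "AKIAIOSFODNN7EXAMPLE"),
    ("db_password", "hunter2"),
    ("region", "eu-west-1"),
]
results = findings(pairs)
print(results)
print(should_block(results))  # True
```

A build step could call something like `should_block` and exit with a non-zero status to stop a deployment when the threshold is reached.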

Execution

Whispers can be executed from the command line, and the output will look like this:

Alternatively, you can execute it from Python:

Make sure to check out https://github.com/Skyscanner/whispers/blob/master/README.md for more information!

GitHub Integration

Our main objective was to get more complete coverage of hardcoded secret detection across our entire code base. In order to provide feedback to developers as early as possible, we integrated Whispers with GitHub so that it comments on PRs and flags its findings.

At Skyscanner we have this implemented as part of our security build step within our CI/CD pipeline. We created a custom config file to tailor results to our particular scenarios, and pass it to Whispers along with the project source code directory.

whispers -c config.yml /project/src

Results are fed into our internal vulnerability management solution, which aggregates statistics across the entire organisation and enables a more efficient incident response.

What’s next?

Please join the project and submit more plugins for other known filetypes you might have in your systems!

Looking forward, we have plans to release more security tools we’re using at Skyscanner. Stay tuned!

About the authors:

Our names are Artem Tsvetkov and Christian Martorella. We are part of the Security Tribe and are both based in Barcelona. We look after the security of our Products, Software Development Lifecycle and Security Engineering to protect our travellers.

Skyscanner Engineering

We are the engineers at Skyscanner, the company changing how the world travels. Visit skyscanner.net to see how we walk the talk!