Improving product reliability by imposing constraints as part of the CI/CD process

Pedro Moura
Published in Revolut Tech · 14 min read · Nov 3, 2023

At Revolut, we aim to consistently provide efficient, high-quality, and secure services. Our primary goals are to enhance product quality, increase customer satisfaction, and reduce business risk.

To achieve those goals, we should aim to be as proactive — rather than reactive — as possible. This means fixing any point of failure before it reaches production by identifying, reporting, and potentially blocking deployments that can impact the reliability of our products.

In this article, we’ll explain how we achieve those goals using automated risk calculation, data analysis, and constraints that push for the mitigation of open vulnerabilities and the fixing of reported bugs identified in our products.

Challenges faced by Revolut

Each application or service has its own specifications, with different technology stacks and architectures, bringing a diverse set of security challenges and vulnerabilities. Because of this, application risk cannot be based only on reported security vulnerabilities; it must also take into account the context of that specific application.

Continuous scanning, reporting, visibility, and risk evaluation are essential to providing the best security advice and automated security controls.

Nowadays there are several types of security scanners (SAST, SCA, DAST, IaC, etc.) that help security professionals identify and report vulnerabilities, and each of them can come from a different third-party provider. For further information on these, read our article on continuous security.

Data scraped from different sources creates friction when findings need to be grouped and mitigations provided. It can also increase the mean time to identify vulnerabilities and the mean time to provide mitigations. To avoid this negative impact on the workflow, security must be shifted left.

Without a centralised source of truth for security-related application data, much of our time is wasted searching for information and findings across segmented platforms, and finding the right people to provide the necessary details for security assessments.

It’s also important to control the applications being deployed to production. By calculating the dynamic risk of applications and counting the open bugs during CI/CD stages, it becomes possible to block the most vulnerable applications from being deployed. This ensures that no critical vulnerabilities or bugs reach production, improving product reliability.

This raised the following main problems:

  • How can we push for the mitigation of existing vulnerabilities?
  • How can we push for the fixing of existing bugs?
  • How can data from different sources be centralised?
  • How can a risk score be dynamically calculated using a mathematical expression that considers not only security findings but also the context of the application?
  • How can that mathematical expression be kept scalable, flexible, and tunable for different situations?
  • How can we control which applications are deployed using constraints?
  • When emergencies occur, how can we effectively bypass a constraint decision?

Proposed Ecosystem Solution — DARC

As explained in this article, Security Drone is an all-in-one security scanner framework developed by the Application Security Team that can be integrated into the SDLC. Security Drone exposes an HTTP API that receives webhooks from our code repository to initiate a security scan against the source branch of each Pull Request (PR). Taking advantage of this existing solution, we decided to implement constraints at the PR level, shifting left as much as possible. This way, besides scanning for security vulnerabilities, Security Drone also gains blocking functionality at each PR.

Although this solution was already in place, it wasn’t enough. We still needed to collect and store all necessary data from different data sources to provide the most accurate decisions based on risk and number of bugs. With that in mind, over the last year, we created a new application called Dynamic Application Risk Calculator (DARC).

DARC is an automated security platform that provides constraining decisions based on the correlation of internal data sources.

DARC is used to calculate the risk associated with each application, as well as evaluate how many bugs a team has open. With those computational capabilities, a blocking mode is implemented at the PR stage using Security Drone integration to scan every new and updated PR.

To support such features, DARC provides company-wide integration with several internal data sources, from which data is frequently collected.

DARC is divided into the following sub-components:

  • API — a backend service responsible for managing requests and integrations
  • Collector — scheduled tasks (apscheduler) that update the DARC DB with data from internal data sources, refreshing data at regular intervals that depend on the data source (see the sketch below)
  • DB — a database that stores the necessary data
  • UI — a frontend that presents all the data
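
To give an idea of how the Collector works, here’s a minimal sketch using apscheduler. The job names, data sources, and refresh intervals are illustrative, not the real configuration:

```python
from apscheduler.schedulers.blocking import BlockingScheduler

# Illustrative collection jobs: the real Collector pulls from several
# internal data sources and writes the results into DARC-DB.
def collect_vulnerabilities():
    ...  # fetch findings from the security scanners and upsert them into DARC-DB

def collect_bugs():
    ...  # fetch open bugs per team and upsert them into DARC-DB

scheduler = BlockingScheduler()
# Each source refreshes at its own interval (the values here are made up).
scheduler.add_job(collect_vulnerabilities, "interval", hours=1)
scheduler.add_job(collect_bugs, "interval", hours=6)
scheduler.start()
```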

Added value

Blocking Mode

Two different types of blocking mode were defined in our ecosystem: one related to the application risk score and another to the number of open bugs. Although both follow the same main logic flow and produce the same outcomes, they are kept distinct to improve auditability and traceability.

Risk-based blocking mode performs assessments based on the target application of the PR, which Security Drone can directly map using the repository slug. Then, the target application is sent in a request to DARC API, where the risk score will be obtained based on the security-related information associated with that target application.

Bugs-based blocking mode works by team. Security Drone maps the owner of the PR to their assigned team, then a request is sent to DARC API where the number of open bugs for that team is obtained.
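
As a rough sketch of how Security Drone could query DARC for these two decisions (the endpoint paths, host, and response shapes below are illustrative, not the real DARC API contract):

```python
import requests

DARC_API = "https://darc.internal.example"  # placeholder host

def risk_decision(target_application: str) -> dict:
    """Ask DARC for the risk score of the PR's target application."""
    resp = requests.get(f"{DARC_API}/risk", params={"application": target_application})
    resp.raise_for_status()
    return resp.json()  # e.g. {"score": 77.15, "decision": "block"}

def bugs_decision(team: str) -> dict:
    """Ask DARC how many open bugs the PR owner's team has."""
    resp = requests.get(f"{DARC_API}/bugs", params={"team": team})
    resp.raise_for_status()
    return resp.json()  # e.g. {"open_bugs": 12, "decision": "pass"}
```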

When a blocking decision is made, besides blocking the PR, a message explaining why it was blocked is added to the target PR, with a link to DARC-UI where engineers can see the main inputs that triggered the blocking mode.

For emergency or otherwise justifiable reasons, we also implemented a derogation mechanism that allows engineers in senior roles to bypass the blocking mode.

DARC also focuses on improving visibility and awareness around blocking mode, so besides adding feedback in the PR, it also sends weekly messages to global internal communication channels with the teams and applications that are currently above or near the thresholds.

Also, to improve traceability, we created an audit log mechanism that keeps track of all blocked PRs (bugs- or risk-related) and their respective derogations.

Blocking Mode Sequence Diagram

Risk-Based Blocking Mode

The main goal of risk-based blocking mode isn’t only to reduce the number of vulnerabilities, but also to prevent vulnerabilities from reaching the production environment, consequently reducing the mitigation cost and decreasing business risk.

In DARC, the risk score is stateless, meaning that it is only calculated at the moment of each request using previously collected data related to the application context and vulnerabilities waiting to be fixed.

Risk is usually defined as likelihood multiplied by impact. But at a lower level, it should depend on several factors, including the context of the application, the number of vulnerabilities found by each scanner, and their associated severity. However, if the approach to defining application risk scores requires manual work and doesn’t have flexible metrics, it’d be hard to scale and maintain. For instance, if we want to integrate another scanner, vulnerabilities found by that scanner should be seamlessly integrated into the risk calculation. The same applies if another useful attribute from the context of the application needs to be integrated. Therefore, a flexible and scalable mathematical expression is needed, where we can define different weights for each input.

Risk Factors

To calculate the risk score, DARC considers the findings from our security scanners as well as internal findings, evaluating the severity of each finding based on its CVSS score. We ensure that all reported vulnerabilities have a CVSS score and a defined target application, so that each finding can later be associated with its respective application. Security findings are used to calculate the Risk Factor Vulnerability (RFV) value.

Most of our scanners’ rules are created and tuned internally, with a target of less than 10% false positive rate. This is a trustworthy value that enables us to calculate RFV with precision and very low uncertainty.

Another component of the risk calculation is the Risk Factor Context (RFC). RFC contains several child risk factors (RFs) from the application context; as a baseline, we included:

  • [RF1] Data Stored or Processed — Related to the data sensitivity level, personal data, and applicable regulations. Risk should increase in line with the sensitivity of the data the application handles
  • [RF2] Exposure — Internally accessible services with internally registered DNS aren’t exposed to the public, so risk should be lower in such cases. Other types of network specification can also be considered here, for example firewall rules, or whether the service integrates with third-party services
  • [RF3] Maintenance / Lifecycle — Security Drone scans new code, so if a service isn’t actively maintained and no new code is pushed, we won’t have recent scans. New vulnerabilities appear every day and we’re constantly creating new rules for our scanners, so it’s important to scan regularly. Besides that, services that aren’t actively deployed won’t have the latest versions of internally updated packages (OS or software). Those are the main reasons risk should increase when an application isn’t actively maintained
  • [RF4] Confidentiality, Integrity, and Availability (CIA) — This is the core of information security: the higher the impact on each of these properties, the higher the risk must be
  • [RF5] Security — Our security monitoring gives us valuable information for this RF, for example the number of security incidents or identified attacks against an application. These can increase the risk, as they show that malicious actors frequently target the application

Risk Mathematical Expression

To be flexible and resilient, the DARC mathematical expression is based on a weighted arithmetic mean.

The weighted average (x̄) is equal to the sum of each RF weight (wᵢ) multiplied by its RF value (xᵢ), divided by the sum of the weights:
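
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$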

RFs can have different weights, which gives us the ability to adjust the importance of some attributes compared with others. This lets us decide how much an attribute contributes to the final risk value and adapt the weight of each source metric in the way we find most meaningful. It also provides dynamic variation, and the result is never undefined when some (but not all) of the RFs aren’t applicable or available.

This approach ensures scalability in terms of new source metrics that need to be added.
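
A minimal sketch of this weighted mean in Python, assuming each RF is represented as a (weight, value) pair and RFs that aren’t applicable or available are simply omitted:

```python
def weighted_mean(factors: list[tuple[float, float]]) -> float:
    """Weighted arithmetic mean of (weight, value) pairs.

    Non-applicable or unavailable RFs are simply left out of the list,
    so the result stays defined as long as at least one RF is present.
    """
    total_weight = sum(weight for weight, _ in factors)
    if total_weight == 0:
        raise ValueError("at least one weighted risk factor is required")
    return sum(weight * value for weight, value in factors) / total_weight

# For example: weighted_mean([(3, 100), (1, 30)]) == 82.5
```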

This approach was based on the scientific investigation presented in the article A Delphi Study to Categorize Security of Health Data and Provide Risk Assessment for Mobile Apps.

How DARC Score is Calculated

Baseline metrics

The following baseline metrics were defined:

  • Output value is defined between min=0 and max=100
  • Weights for each RF have values low=1, medium=2 and high=3

We also defined the following thresholds:

  • 0–50 → Pass
  • 50–75 → Warning
  • 75–100 → Block
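
For illustration, mapping a score onto these outcomes could look like the sketch below. How the exact boundary values 50 and 75 are treated is our own assumption, since the ranges above share their endpoints:

```python
def outcome(score: float) -> str:
    """Map a 0-100 score onto the defined thresholds."""
    if score >= 75:
        return "Block"
    if score >= 50:
        return "Warning"
    return "Pass"
```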

Calculating RFC

The impact values available for each RF in RFC are:

  • Low=0
  • Isolated=30
  • Medium=60
  • High=100

The weights were defined to give more importance to the RF values we consider most significant. Now, let’s consider the following sample data inputs for a random application context and their respective impact values for each RF:

[RF1] Data Stored or Processed (weight = 3)

  • Stores confidential data and is regulated by PCI DSS → High=100

[RF2] Exposure (weight = 3)

  • Internal service (not exposed to public access) but integrated with one third-party provider → Isolated=30

[RF3] Maintenance / Lifecycle (weight = 1)

  • Actively maintained and deployed every week → Low=0

[RF4a] Confidentiality (weight = 3)

  • High=100

[RF4b] Integrity (weight = 3)

  • High=100

[RF4c] Availability (weight = 3)

  • Medium=60

[RF5] Security (weight = 1)

  • No open incidents and some attempted attacks in recent months → Isolated=30

By applying the previous RF values and their respective weights, the RFC value can be obtained with the weighted arithmetic mean:

RFC = (3x100 + 3x30 + 1x0 + 3x100 + 3x100 + 3x60 + 1x30) / (3+3+1+3+3+3+1) = 70.6

This PoC shows that, besides being easily scalable, this isn’t a static approach. The weights and impact values of each RF can be adapted whenever there are meaningful reasons to do so, giving us the ability to evolve further down the road. We also made sure that low impact values for RFs are rarely used, to keep a better balance between RFC and RFV; otherwise, RFC could considerably decrease the final score.
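
Reproducing the RFC calculation above in a short, self-contained snippet:

```python
# Weights and impact values for RF1, RF2, RF3, RF4a, RF4b, RF4c, RF5 (in order)
weights = [3, 3, 1, 3, 3, 3, 1]
impacts = [100, 30, 0, 100, 100, 60, 30]

# Weighted arithmetic mean of the context risk factors
rfc = sum(w * x for w, x in zip(weights, impacts)) / sum(weights)
print(round(rfc, 1))  # 70.6
```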

Calculating RFV

RFV is calculated based on reported vulnerabilities, using the severity of each security finding. All findings have a defined CVSS score, which is used as the base for a risk multiplier. The risk multiplier avoids low values and increases the output when high or critical vulnerabilities are reported. Considering this, the rules for the RFV calculation are:

  1. Target application has a critical vulnerability (CVSS >= 9) → RFV = 100
  2. Target application has a high vulnerability (7 <= CVSS <= 8.9) → RiskMultiplier = 3
  3. Target application has a medium vulnerability (4 <= CVSS <= 6.9) → RiskMultiplier = 2
  4. If only low vulnerabilities (0.1 <= CVSS <= 3.9) are present → RiskMultiplier = 1

Then, the final mathematical expression for RFV is:

RFV = sum(vulns[CVSS]) x RiskMultiplier

For rule number 1, RFV is automatically set to the maximum of 100, since any critical vulnerability represents maximum risk.

Now let’s consider an application that has four reported vulnerabilities:

  1. SAST finding with 8.2 CVSS score
  2. Internal finding with 7.9 CVSS score
  3. SCA finding with 6.5 CVSS score
  4. DAST finding with 5.3 CVSS score

RFV = (8.2 + 7.9 + 6.5 + 5.3) x 3 = 83.7
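
The rules above can be sketched in code as follows. This follows the published rules literally; whether RFV is additionally capped at 100 in the non-critical case isn’t stated, so no cap is applied here:

```python
def rfv(cvss_scores: list[float]) -> float:
    """Risk Factor Vulnerability for one application's open findings."""
    if not cvss_scores:
        return 0.0
    if any(score >= 9 for score in cvss_scores):
        return 100.0       # rule 1: any critical vulnerability means maximum risk
    if any(7 <= score <= 8.9 for score in cvss_scores):
        multiplier = 3     # rule 2: at least one high vulnerability
    elif any(4 <= score <= 6.9 for score in cvss_scores):
        multiplier = 2     # rule 3: at least one medium vulnerability
    else:
        multiplier = 1     # rule 4: only low vulnerabilities
    return sum(cvss_scores) * multiplier

print(round(rfv([8.2, 7.9, 6.5, 5.3]), 1))  # 83.7
```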

DARC score

To obtain the final DARC score, RFC and RFV values must be correlated. The same logic of weighted arithmetic mean is used, allowing us to change the importance of each by manipulating the weights or even adding another risk factor. This allows us to be adaptable and easily scalable. At the moment, both RFC and RFV have the same weight (3) since we felt they have the same importance and will bring a better balance.

Considering the previous values of RFC=70.6 and RFV=83.7:

DARC score = (3x70.6 + 3x83.7) / (3 + 3) = 77.15

The obtained DARC score is above the defined threshold (75), therefore all the PRs for the application in this example would be blocked.
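
Putting the two worked examples together, the final score and decision can be checked in a couple of lines (weights of 3 for both RFC and RFV, as described above):

```python
rfc, rfv = 70.6, 83.7                        # values from the worked examples
darc_score = (3 * rfc + 3 * rfv) / (3 + 3)   # weighted arithmetic mean of RFC and RFV
print(round(darc_score, 2))                  # 77.15 -> above the 75 threshold, so PRs are blocked
```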

Bugs-Based Blocking Mode

The main goal of bugs-based blocking mode is to reduce the number of bugs by prioritising bug resolution over future development, increasing product quality and customer satisfaction.

The flow for this blocking mode is similar to the logic behind the risk-based blocking mode, but the final decision is simpler to obtain.

DARC Collector regularly fetches all reported bugs and stores them in DARC-DB, associating each bug with its respective team. Then, when Security Drone requests a bugs blocking-mode validation, DARC computes how many bugs each team has in certain states (triaged and not closed).

If the team of the PR owner has more bugs than the allowed threshold, the PR will be blocked until bugs are fixed and the threshold is respected.
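
A simplified sketch of that check is below. The threshold value and the exact bug states are illustrative:

```python
BUG_THRESHOLD = 10                             # illustrative value, not the real threshold
BLOCKING_STATES = {"triaged", "in_progress"}   # illustrative: triaged and not yet closed

def bugs_blocking_decision(team: str, bugs: list[dict]) -> bool:
    """Return True if the team's open-bug count is above the allowed threshold."""
    open_bugs = sum(
        1 for bug in bugs
        if bug["team"] == team and bug["state"] in BLOCKING_STATES
    )
    return open_bugs > BUG_THRESHOLD
```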

Relaunch Blocking Validation

Each approver action in our code repository will trigger a blocking validation flow by sending a webhook to Security Drone. This action is used to relaunch blocking validation and create derogations.

If the owner of a PR wants to relaunch the blocking validation flow after decreasing the risk or fixing bugs (to remove the blocking mode once the thresholds are respected), they can simply approve their own PR. This validates the new risk score or bug count against the thresholds. At this stage, the PR still can’t be merged, since every PR needs to be reviewed and approved by another engineer as part of our peer review process.

Derogation Mechanism

In exceptional or urgent situations where the blocking mode needs to be bypassed, a derogation mechanism was created to override the block.

This mechanism is only used in exceptional or urgent situations and not as a utility to skip fixing vulnerabilities or bugs. The main requirements for this mechanism are:

  • Authorisation: only engineers in the allowed approvers list will be able to create a derogation and unblock the PR. We keep this list short and restricted to AppSec, Heads of Engineering, and Directors
  • How to create a derogation: an authorised engineer just needs to approve the blocked PR. A webhook is sent to Security Drone, which validates with DARC-API whether the approver is on the allowed list. If so, the PR is unblocked
  • Tracking: DARC-DB will store each derogation occurrence for audit purposes
  • Alerting: each time a derogation occurs, we receive a notification in a dedicated channel
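
A sketch of the derogation check is below. The approver identifiers and the shape of the audit record are illustrative:

```python
ALLOWED_APPROVERS = {"appsec-lead", "head-of-engineering"}   # illustrative identifiers

def handle_approval(pr_id: str, approver: str, audit_log: list[dict]) -> bool:
    """Unblock the PR if the approver is authorised, recording the derogation."""
    if approver not in ALLOWED_APPROVERS:
        return False                     # regular approval: the blocking decision stands
    audit_log.append({"pr": pr_id, "approver": approver, "action": "derogation"})
    # In the real flow, Security Drone unblocks the PR and a notification
    # is sent to the dedicated alerting channel.
    return True
```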

Derogation Sequence Diagram

What have we achieved?

To allow teams and engineers to comply with the thresholds and create awareness around DARC, we deployed the first version in warning mode, meaning that teams and applications above the thresholds only received warning messages in the PRs. This warning period lasted for 2 weeks, and a positive effect was felt — the number of non-compliant applications and teams decreased.

After the activation of the blocking mode, we achieved:

  1. An automated and flexible way to quantify risk for each application
  2. An effective way of increasing the quality of our products, with constraints implemented within a shift-left approach (at the PR phase). It takes approximately 5 seconds to block a PR after its creation or update
  3. More visibility and control over the global risk and quality of our products
  4. An ecosystem to quickly identify and report (PR and weekly notifications) applications and teams that aren’t compliant with our thresholds for risk and bugs, respectively, increasing global awareness
  5. UI that provides us with means to quickly access and filter all necessary security data for our daily work
  6. A forecast feature that allows us to predict which applications and teams are likely to become non-compliant with the thresholds
  7. Audit logs for every relevant action that’s taken under the ecosystem
  8. All PRs are evaluated and triaged automatically. Since the activation of the blocking mode (~2 months ago), DARC has evaluated 165,670 PRs and blocked 1,832
  9. Currently only 1% of our applications are above the defined risk threshold — the percentage was higher when blocking mode was announced
  10. A decrease of ~31% in the number of open bugs
  11. Clear metrics showing that teams’ proactivity around risk and bugs has increased

Next Steps

DARC will always be under development, as it can be widely integrated and its potential use cases are numerous. Our roadmap includes various items, some of which are:

  • Decrease thresholds to achieve even better results and, in turn, better products
  • Add more risk factors to DARC risk calculation, from internal sources
  • Improve DARC user interface
  • Use DARC capabilities to implement regular scans and on-demand scans
  • Integrate with more custom security scanners
  • Add other types of blocking modes

Credits

Credits go to every Revolut AppSec engineer involved in the design and development of DARC and Security Drone:

Pedro Moura, Arsalan Ghazi, Krzysztof Pranczk, Alejandro Ulises, Roger Norton, Kevin Borras
