Mitigating Risky Pull Requests with Monocle Risk Advisor (Part 2)

By David Trejo, a member of Chime’s Security Engineering Team

Talent at Chime
Life at Chime
6 min readApr 20, 2023

--

In case you missed it, check out Part 1: Monocle: How Chime creates a proactive security & engineering culture (Part 1)

At Chime, we’re big fans of Gitops, where all application and infrastructure changes are done via pull requests. In this article, we’ll share the guardrails and tooling we’ve put in place to prevent mistakes before pull requests are merged.

When a pull request is merged, there’s a chance the new code will introduce bugs, misconfiguration, or vulnerabilities. Following security best practices should be made easy for developers, so we created tooling that gives security feedback on pull requests, where it’s simple to make edits.

Chime’s guidelines for pull requests

We expect all production repositories and pull requests to follow a few rules to reduce the risk of defects, with an emphasis on reliable and secure software.

For now, we want all pull requests (PRs) to satisfy the following prior to being merged (this changes over time):

  • PRs must have 1 or more code reviews
  • PRs to Ruby repositories must not introduce dependency confusion vulnerabilities
  • Repositories must use security-approved base images, which we patch automatically on a cadence
  • New code does not introduce critical or high vulnerabilities (keep an eye out for our upcoming article on Overwatch, with details on how we find and fix these)

How we apply rules to production pull requests with Risk Advisor

Here’s a demo of how we set up the Monocle GitHub app to receive pull request events and post an informative Risk Advisor status to pull requests. If a person merges a pull request while failing a Risk Advisor rule, then a Risk Acceptance JIRA ticket is created. Their team channel is informed, as well as the security team’s on-call person, making it easy for everyone to follow up and see when the engineer resolves the risk.

Education and Reporting

We are very happy with how Risk Advisor interacts with engineers. Risk Advisor is not a blocking check, which allows engineers to get their work done even during an incident. We realize that we don’t always have the context of why something is being done. For example, our security scanners might be down, and if an engineer responds to an incident they might merge a Pull Request without having our scanner successfully complete. We let them go through with the change and follow up later to make sure the code was written safely.

As engineers work in Github they get real-time feedback from Risk Advisor statuses at the bottom of their pull requests, where they can click through to learn about the rules and why they’re important.

Reporting is simple because Risk Advisor creates JIRA tickets. JIRA tickets can be easily transitioned, resolved, and put into dashboards for the Security team and Compliance team to monitor.

Trends and Data

We find that the majority of risk acceptances come from two places:

  • An engineer creates a new project and merges PRs without setting up all the controls
  • We’ve recently rolled out a new rule that teams are still working to follow

After getting in touch with an engineer who has accepted a risk, it’s typically very quick to enable the control and return things to a good state. Many engineers enable the control on their own, without a reminder from a security person.

Since putting Risk Advisor into production in late 2021, it has created 81 tickets, out of which 48 were remediated within 1 business day of creation. The remaining 33 tickets were closed out as low risk by the service owners. Reactions from engineers are positive and prompt.

We only see ~3 risk acceptances a month across all ~70 services, so the workload is relatively low for engineers, security people, and compliance.

Audit Controls (e.g. SOC1, SOC2)

Before Risk Advisor, our auditors reviewed random samples of pull requests for whether the controls were followed; this would take many hours. At other organizations, we’ve seen as much as 2000 hours per year of program management and engineering time spent on audit preparation.

Thanks to Risk Advisor, our auditors love us. The tooling we’ve built allows them to easily track deviations from our controls and see them fixed. Shout out to our Audit and Security Risk Governance teams, who worked extremely hard and excelled in their recent audits!

Want to skip the technical details? Jump to the conclusion: “Future Work: Policy Builder UI.”

Risk Advisor Architecture Diagram

Sequence of events

A couple of implementation details: Risk acceptances are specific to the final commit on the pull request (if you add more code and then merge, you’ll need to accept risk again). Rule assertions happen at the moment the pull request is merged–this allows Risk Advisor to catch disabled Github branch protection settings, for which we don’t receive webhook events. Depending on the rule, e.g. 1+ approvals, Risk Advisor only checks the branch protection settings–Github is still responsible for blocking merges without approvals with its built-in functionality.

Rego Policies

Here’s a sample of the code for a policy; it’s relatively simple to write once you become familiar with Rego. Rego is used heavily in other parts of the organization.

default branch_protection = false

branch_protection {
some i
some j
input.repo.scores[i].score_type == "security"
input.repo.scores[i].score_factors[j].name == "default_branch_protected"
input.repo.scores[i].score_factors[j].pass == true
}

rules_raw := [
{
"name": "branch_protection",
"pass": branch_protection,
"reason": "All code should have at least one code review, including administrator merges.",
"call_to_action": call_to_action("branch_protection"),
},
# [...]
]


rules = [rule | rule := rules_raw[_]; is_boolean(rule.pass)]


merge {
passing := {i | rules[i].pass}
count(passing) == count(rules)
}

Future work: Policy Builder

We hope to allow teams to customize the Rego rules that run on their repositories using a web UI. Some exciting work has been done in this area, about which you can learn more in this talk: Chime: Getting OPA the Data it Craves with Custom Rego Batch Loading Functions by Ed Paget and Donovan Lampa.

Ideally, it’d be easy for teams to create rules like “no PR merges outside of EST business hours, or after 4 pm on Fridays.” or “no PR merges if the new code introduces a Critical or High Vulnerability that can be fixed”. While we currently use Risk Advisor to add security and compliance guardrails, the implementation is generic and enables any team to add their own custom policies.

I hope you enjoyed this article! Please feel free to send follow-up questions to us using security åt chime dot com, or ask me at @dtrejo@infosec.exchange on Mastodon. I’ll be giving a talk about Monocle at BSides SF ’23 on April 22nd, so if you’re around, consider dropping by!

Have a great week,
David Trejo

In case you missed it, check out part 1: Monocle: How Chime creates a proactive security & engineering culture (Part 1)

--

--