Detection Development Lifecycle

Snowflake’s implementation of the Detection Development Lifecycle


This post was co-authored with Tammy Truong and Michele Freschi.

TL;DR

This post highlights Snowflake’s implementation of the Detection Development Lifecycle, a well-defined process to produce and maintain threat detections. Our lifecycle is broken down into six phases: Requirements Gathering, Design, Development, Testing and Deployment, Monitoring, and Continuous Testing. Establishing a well-defined Detection Development Lifecycle improves the quality of the detections built, provides robust documentation, helps the team scale, and serves as the foundation for program metrics.

What is the Detection Development Lifecycle?

Software engineers follow the Software Development Lifecycle (SDLC) to build successful applications, but what about detection engineers? Security teams tasked with spotting and stopping threat actors should carefully consider how they build and maintain detection logic. Otherwise, detections will suffer from low fidelity, false positives will cause alert fatigue, and more attacks will go undetected until after a breach has already occurred.

Detection engineers at leading security teams are adopting Detection-as-Code and applying it within the “Detection Development Lifecycle”. This is a critical component for any threat detection program and it means having a well-defined process to produce and maintain detections. In our previous post on the Threat Detection Maturity Framework, we highlighted the Detection Development Lifecycle and, given its importance, we wanted to share the process that we use here at Snowflake.

Threat actors are increasingly well funded and highly capable, and they constantly innovate to take advantage of new technologies and paradigms such as cloud and serverless computing, so threat detection teams will never be able to build detections for every possible adversary technique. Successful defenders therefore need a repeatable process for building detections, with risk- and intel-based prioritization and continuous monitoring and testing to ensure high fidelity. At a macro level, these processes divide into two areas: Detection Creation and Detection Maintenance. The diagram below covers the Detection Development Lifecycle used at Snowflake:

Detection Development Lifecycle implementation at Snowflake

As shown in the diagram above, the Detection Development Lifecycle is made up of six phases:

  1. Requirements Gathering
  2. Design
  3. Development
  4. Testing and Deployment
  5. Monitoring
  6. Continuous Testing

Let’s dive into each phase of the Detection Development Lifecycle:

Requirements Gathering

The purpose of this phase is to provide a single point of entry for the Threat Detection Team to receive and prioritize all detection requests. It is essential that any team with a role in protecting the enterprise is aware of the process for submitting requests to the Detection team. At Snowflake, detection requests are submitted by the following teams:

  • Product and Corporate Security
  • Threat Intelligence
  • Incident Response
  • Audit and Compliance
  • Red Team
  • Threat Detection (internal)

During this phase, technical details are collected from relevant stakeholders in order to streamline the build process of requested detections. During the intake process at Snowflake, we collect information such as the detection goal, the target system and its function, the vulnerability, risk, threat model, and desired alerting methods (Slack, Jira, etc.). After a detection request is submitted, it is prioritized according to a risk- and intel-based prioritization process. This methodical approach enables strategic planning, resource allocation, and effective work delegation. In addition, metrics are collected around detection coverage to drive improvements to the prioritization process.
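As a rough illustration of what an intake record captures, the sketch below models a detection request as a simple data structure. The field names and example values are assumptions for the purpose of illustration, not our actual intake form.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: these field names mirror the intake details described above,
# not Snowflake's actual request form.
@dataclass
class DetectionRequest:
    requestor: str                  # e.g. "Threat Intelligence", "Red Team"
    detection_goal: str             # behavior the detection should catch
    target_system: str              # system in scope and its function
    vulnerability: str              # weakness or exposure being addressed
    risk: str                       # impact if the threat goes undetected
    threat_model: str               # pointer to the relevant threat model / attack tree
    alert_destinations: List[str] = field(default_factory=list)  # e.g. ["Slack", "Jira"]
    priority: str = "unranked"      # set later by risk- and intel-based prioritization

# Hypothetical example request.
request = DetectionRequest(
    requestor="Threat Intelligence",
    detection_goal="Detect a service account authenticating from a new network location",
    target_system="Internal CI/CD platform",
    vulnerability="Long-lived service account credentials",
    risk="Credential theft leading to supply-chain compromise",
    threat_model="https://wiki.example.internal/threat-models/cicd",  # hypothetical link
    alert_destinations=["Slack", "Jira"],
)
```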

At Snowflake, we have learned that it is key for Threat Detection to engage Product and Corporate Security teams early on to identify requirements for effectively monitoring new features and systems. We work closely to achieve an understanding of what is being built in order to assist with risk identification and mitigation. This includes capturing logging requirements, validating mitigations, reviewing threat models and attack trees, identifying detection opportunities, and designing systems to maximize detection efficacy.

Design

Every detection should have a clearly defined goal. Once work commences on a detection, the goal is converted into a detection strategy. We use Palantir’s Alerting and Detection Strategy (ADS) Framework for this, which provides a robust, uniform documentation framework for all of our detections. Since Threat Detection and Incident Response (IR) are separate teams at Snowflake, we have found it critical during this phase to share the ADS with the IR team, as the recipient of the alerts, and request their review. This gives IR ownership and awareness of the detections being built, along with the ability to influence the detection pipeline, and it provides their team time to review the documentation and build triage playbooks.
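For readers unfamiliar with ADS, the sketch below lists the sections defined in Palantir’s public framework as a simple template; the exact sections and wording of our internal template may differ.

```python
# Minimal ADS skeleton based on the sections in Palantir's public ADS framework;
# treat this as an illustrative template rather than Snowflake's internal document.
ADS_TEMPLATE = {
    "Goal": "Short statement of the behavior the detection is intended to catch.",
    "Categorization": "Mapping to the relevant MITRE ATT&CK tactic/technique.",
    "Strategy Abstract": "High-level description of how the detection works.",
    "Technical Context": "Details on the target system, data source, and relevant logs.",
    "Blind Spots and Assumptions": "Known gaps and assumptions the logic relies on.",
    "False Positives": "Expected benign activity that may trigger the alert.",
    "Validation": "Steps to reproduce a true positive and confirm the alert fires.",
    "Priority": "Severity levels and the criteria for each.",
    "Response": "Triage and playbook guidance for the IR team receiving the alert.",
    "Additional Resources": "Links to threat models, tickets, and references.",
}
```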

Development

After a new detection’s design has been completed, it is converted into code. Standardization is vital to maintaining quality and effectiveness, which is why we have built a template ensuring that every detection has a set of common fields and a link to its ADS, so the goal of the detection is clearly defined in the code. At Snowflake, we use Panther as our detection platform sitting atop our security data lake, which enables us to build both scheduled queries and stream-based detections. One of the most important aspects of developing detections is tagging them with metadata such as the MITRE tactic, technique, and sub-technique, the system/platform, the data source, and the audit finding. Tagging has not only helped us during audits and other detection inquiries, but it is also foundational for measuring detection coverage.
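To make this concrete, here is a minimal sketch of a Panther-style Python rule. The log shape, field names, metadata values, and allowlist are all assumptions for the example, not one of our actual detections; in Panther, metadata such as severity, tags, and MITRE mappings is typically declared in an accompanying YAML spec rather than in the Python body.

```python
# Illustrative metadata the accompanying spec might carry (values are hypothetical):
#   RuleID: Example.ServiceAccount.NewLocation
#   Tags:   ["MITRE:TA0001", "MITRE:T1078", "platform:cicd", "datasource:auth_logs"]
#   Reference: link to the ADS document for this detection

KNOWN_REGIONS = {"us-west-2", "us-east-1"}  # illustrative allowlist


def rule(event):
    """Fire when a service account authenticates from an unexpected region."""
    if event.get("actor_type") != "service_account":
        return False
    return event.get("source_region") not in KNOWN_REGIONS


def title(event):
    """Human-readable alert title surfaced to the IR team."""
    return (
        f"Service account {event.get('actor_name', '<unknown>')} "
        f"authenticated from unexpected region {event.get('source_region', '<unknown>')}"
    )
```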

Testing and Deployment

Once the code for a detection is developed, it is tested for accuracy, precision, and alert volume. At Snowflake, we conduct both historical and real-time testing. Historical testing involves running the detection against past data. Real-time testing consists of enabling the detection in a test queue to ensure it meets the acceptance criteria. These criteria are established between the Threat Detection and IR teams and define an acceptable alert volume and false positive rate. If the logs needed to validate the detection are not present, a tool like Red Canary’s Atomic Red Team, or an exercise with the Red Team, can help generate the activity for testing. After testing is completed, detections are peer reviewed and managed in a version control system, which is crucial to ensuring that Detection-as-Code principles are followed.
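To illustrate the idea behind historical testing, the generic sketch below replays past events through a rule function and checks the would-be alert volume against a budget. The `backtest` helper, the threshold, and the toy events are assumptions for illustration, not our actual test harness or acceptance criteria.

```python
from typing import Callable, Dict, Iterable


def backtest(rule: Callable[[Dict], bool], events: Iterable[Dict], max_alerts: int) -> bool:
    """Return True if the rule's would-be alert volume over past data is within budget."""
    total = 0
    alerts = 0
    for event in events:
        total += 1
        if rule(event):
            alerts += 1
    print(f"{alerts} would-be alerts across {total} historical events")
    return alerts <= max_alerts


# Toy historical data and a trivial rule, for illustration only.
historical_events = [
    {"actor_type": "service_account", "source_region": "eu-central-1"},
    {"actor_type": "user", "source_region": "us-west-2"},
]


def example_rule(event):
    return event.get("actor_type") == "service_account" and event.get("source_region") != "us-west-2"


assert backtest(example_rule, historical_events, max_alerts=5)
```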

After a detection is deployed, it needs to be maintained. As with any code in production, there are always updates that need to be made in order to address bugs or other issues found. The Threat Detection Team needs to partner closely with all relevant stakeholders to ensure there is a strong feedback loop in place.

Monitoring

The purpose of this phase is to continuously monitor the performance of deployed detections, review assumptions and gaps, and decommission if needed.

Monitoring is supported by the following core processes:

  • Detection Improvement Requests (DIR): Any code can have bugs, and there needs to be a process to identify, track, and resolve them. At Snowflake, we call these Detection Improvement Requests, and they are submitted to resolve bugs, tune false positives, enrich detections, update the ADS, or even rebuild the detection logic. Additionally, collecting metrics on DIRs gives us insight into the quality and performance of our detections; for example, recurring DIRs may indicate a low-fidelity detection that should be flagged for review (a minimal sketch of this kind of check follows this list).
  • Detection Decommission Requests: For Threat Detection Engineers, disabling a detection can feel instinctively wrong — no one wants to be responsible for disabling the detection that in retrospect could have detected the successful attack or adversary. However, from the SOC or IR perspective, alert fatigue carries its own risks and sometimes disabling a noisy detection is the right decision. For checks and balances, we established a Temporary and Permanent Detection Decommission process that includes documenting each request with the justification and supporting information. Building a robust process here can empower the team to make the right call on disabling detections.
  • Detection Reviews: Deployed detections are regularly reviewed to ensure that they remain applicable to the organization. This is a crucial process in the Monitoring phase because it prevents the accumulation of noisy or irrelevant detections as the Threat Detection Team matures and the number of detections grows. Another benefit of this process is that it creates learning opportunities: engineers read and review other team members’ code and can gain knowledge about systems or platforms they may never have worked with before. We established an annual review process to evaluate various components of a detection, such as the goal, scope, relevance, assumptions, and detection logic. The reviews are documented, and the output of the process is either no action (the detection needs no updates), a Detection Improvement Request (DIR), or a Detection Decommission Request.
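The kind of review-flagging check referenced above can be sketched in a few lines. Everything here is illustrative: the detection IDs, thresholds, and the assumption that alert outcomes and DIR counts are exported per detection are examples, not our actual metrics pipeline.

```python
from collections import Counter

# Illustrative alert outcomes per detection, e.g. exported from a case-management system.
alert_outcomes = [
    ("Example.ServiceAccount.NewLocation", "false_positive"),
    ("Example.ServiceAccount.NewLocation", "true_positive"),
    ("Example.ServiceAccount.NewLocation", "false_positive"),
    ("Example.MFA.Disabled", "true_positive"),
]
# Illustrative count of DIRs filed against each detection.
dir_counts = Counter({"Example.ServiceAccount.NewLocation": 3})

FP_RATE_THRESHOLD = 0.5   # illustrative, not Snowflake's actual criteria
DIR_COUNT_THRESHOLD = 2   # illustrative

totals, false_positives = Counter(), Counter()
for detection_id, outcome in alert_outcomes:
    totals[detection_id] += 1
    if outcome == "false_positive":
        false_positives[detection_id] += 1

for detection_id, total in totals.items():
    fp_rate = false_positives[detection_id] / total
    if fp_rate > FP_RATE_THRESHOLD or dir_counts[detection_id] > DIR_COUNT_THRESHOLD:
        print(
            f"Flag {detection_id} for review: "
            f"FP rate {fp_rate:.0%}, DIRs filed {dir_counts[detection_id]}"
        )
```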

Continuous Testing

Continuous testing of detections is how mature threat detection teams ensure that each detection is accomplishing its intended goal. There are many tools out there to help with this, but we partner with our Red Team and conduct Purple Teaming exercises to help us test. Conducting Purple Team exercises not only helps foster a strong relationship between Red and Blue Teams, but also serves as a learning opportunity. The output of Continuous Testing can be no action if the detection alerts appropriately, a Detection Improvement Request, or a request for a new detection.

Benefits and Adoption

There are many benefits to establishing a Detection Development Lifecycle for the Threat Detection Team and we want to highlight a few:

  • Quality. Having and following a robust Detection Development Lifecycle will lead to building high-quality detections.
  • Metrics. Collecting metrics such as detection coverage and quality, as well as defining key performance indicators, will help identify areas for improvement and drive program planning.
  • Documentation. “Code is the documentation” does not work because non-technical individuals are also consumers of internal processes. Quickly whipping together documentation around how the team went about building a detection or why a detection was disabled is painful and not scalable. Having sound documentation will allow the SOC or IR Team to better understand and triage the detection, reduce the time a Detection Engineer needs to review, understand and update a detection, and provide support for compliance and audit requests.
  • Scale. This is one of the first documents that a new Detection Engineer should read when joining the team. After reviewing, they should have a thorough understanding of the Threat Detection process end to end. This also keeps the team accountable and ensures processes are repeatable.

The Detection Development Lifecycle discussed in this article reflects the Threat Detection Team’s implementation at Snowflake. Every organization is unique and implementations will vary. Even within our organization, the Detection Development Lifecycle is a living document that we regularly update and improve as our environment and processes change. Creating a robust lifecycle is not free of challenges, as it requires the team to forge strong partnerships with stakeholders, establish roles and responsibilities, and have the resources to support these processes. We’d love to connect with our peers to exchange ideas and learn how others are solving similar challenges, so please feel free to reach out to Haider Dost, Tammy Truong, and Michele Freschi.
