From soup to nuts: Building a Detection-as-Code pipeline

Part 2 of 2

David French
threatpunter
11 min read · Jul 27, 2023


Image generated by Bing Image Creator

Welcome to part two of this blog series where I’m sharing an end-to-end process for building and implementing a Detection-as-Code pipeline.

Part one provides an introduction to the principles and benefits of Detection-as-Code (DAC) and a process for configuring the tools I used to build and implement a DAC pipeline. I also walked through a detection engineering workflow using Terraform and Sumo Logic.

In this post, I’m going to include detailed explanations and example code for creating CI/CD workflows to test our DAC pipeline and deploying changes to dev & production. I’ll also cover how to handle alert payloads with Tines, test detections, and validate the alert pipeline. Finally, I’ll demonstrate how the DAC pipeline works end-to-end with a practical use case — detecting & responding to suspicious Okta behavior.

Let’s go!

Creating a CI/CD workflow with GitHub Actions

At this point, we’re comfortable with creating detections as code using the Terraform provider for Sumo Logic and validating & applying changes. The example Terraform code from part 1 has been pushed to the GitHub repo and applied in Sumo Logic.

The next logical step is to configure a CI/CD workflow where our code is tested and validated whenever a change occurs in our GitHub repo. Every security team creates tests so they can say with confidence that their logging, detection, and alerting capabilities are always working, yes? Good 👍.

I’m going to configure these workflows using GitHub Actions. In production, the main branch in the repo will be protected, meaning that pull requests must be reviewed & approved by members of the security team and tests must complete successfully before any changes are merged. This encourages collaboration amongst the team and minimizes the risk of things like a broken rule being pushed to production and leading to false negatives.
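Branch protection is normally configured in the repo settings, but for reference, the same rules can also be applied through the GitHub REST API. Below is a minimal Python sketch, assuming the requests library; the repo name, token, and required status check name are placeholders rather than values from my project.

import requests

REPO = "threat-punter/detection-as-code"  # placeholder repo
HEADERS = {"Authorization": "token <github-personal-access-token>"}  # placeholder token

# Require an approving review and a passing status check before changes
# can be merged into the protected main branch.
response = requests.put(
    f"https://api.github.com/repos/{REPO}/branches/main/protection",
    headers=HEADERS,
    json={
        "required_status_checks": {
            "strict": True,
            "contexts": ["Test & Deploy in Dev"],  # placeholder check name
        },
        "enforce_admins": True,
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        "restrictions": None,
    },
)
response.raise_for_status()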

The first GitHub Actions workflow (Test & Deploy in Dev) can be found in my example repo under the .github/workflows folder. Sorry, I can’t link to the workflow files directly, as Medium thinks I’m trying to attack it for some reason.

The Test & Deploy in Dev workflow executes the following steps whenever code changes are made in a pull request, and it will stop and log an error if any of these steps fail.

  1. Validate the Terraform configuration and check it for formatting issues.
  2. Apply the Terraform configuration to the Sumo Logic dev environment.
  3. Execute tests (which I’m calling “triggers” in this project) to trigger our detection rules.
  4. Sleep for 10 minutes 😴.
  5. Validate that alerts were generated by the detection rules that were triggered.

I’ll explain steps 3 and 5 in more detail in an upcoming section.

Again, if any of the above steps fail, the workflow will stop and generate an error. An engineer/analyst won’t be able to merge changes to the main branch of the GitHub repo unless the tests complete successfully. 🙂
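For reference, the sequence the workflow runs boils down to something like the Python sketch below. It shells out to the Terraform CLI and to the project’s detections_cli module; the real workflow file defines these as separate GitHub Actions steps with credentials supplied via secrets, so treat this as an illustration of the flow rather than the workflow’s actual implementation.

import subprocess
import time

def run(command: list[str]) -> None:
    """Run a command and raise if it fails, mirroring how a failed step
    stops the GitHub Actions workflow."""
    print(f"Running: {' '.join(command)}")
    subprocess.run(command, check=True)

run(["terraform", "init"])                    # initialize providers/backend
run(["terraform", "fmt", "-check"])           # 1. check formatting
run(["terraform", "validate"])                #    validate the configuration
run(["terraform", "apply", "-auto-approve"])  # 2. apply to the Sumo Logic dev environment
run(["python", "-m", "detections_cli", "--run-all-triggers"])  # 3. trigger the detection rules
time.sleep(600)                               # 4. wait 10 minutes for alerts to fire
run(["python", "-m", "detections_cli", "--validate-alerts"])   # 5. validate that alerts were generated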

When the workflow runs and completes successfully, it’ll look something like the screenshot below.

Validating and testing the Detection-as-Code pipeline using GitHub Actions — Success!

When the workflow fails due to something like a Terraform syntax error, it’ll look something like the screenshot below. For this scenario, you could actually use a pre-commit hook to validate your Terraform configuration before it even gets committed to GitHub, but that’s a topic for another day.

Validating and testing the Detection-as-Code pipeline using GitHub Actions — Error 😢

Handling alert payloads with Tines

Before I explain my methodology for testing the detection rules and validating the DAC pipeline, let’s talk about how Tines can be used to handle alert payloads sent from Sumo Logic (or whatever SIEM you use).

Tines stories are a collection of interconnected actions working towards a singular mission. They can be viewed as analogous to use cases or playbooks. For example, you might have a Phishing story, an Infected Endpoint story, or a DDoS story. To support sharing, stories can be imported and exported.

As a reminder, you can sign up for the free community edition of Tines here. I find it intuitive to use, and it makes it easy to automate workflows without writing any code. They also have a story library to use as a starting point or inspiration for creating custom playbooks.

My first Tines story, shown in the screenshot below, handles alert payloads received from Sumo Logic and does the following:

  1. Receive alert payloads (JSON strings) sent from Sumo Logic to the Tines webhook. This is the entry point for the story. Remember the webhook connection that we configured in Sumo Logic via Terraform in part 1?
  2. Parse the JSON string.
  3. Explode the array of alerts into individual alert objects.
  4. Create a GitHub issue for each alert.

I’ll talk about the remaining actions in the story later.

My story is based on this one from the Tines story library and can be found here.
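If it helps to see that data flow as code, the story’s logic is roughly equivalent to the Python sketch below. The webhook receipt and authentication are simplified, and the alert field names are illustrative rather than Sumo Logic’s exact payload schema.

import json
import requests

GITHUB_REPO = "threat-punter/detection-as-code"  # example repo
HEADERS = {"Authorization": "token <github-personal-access-token>"}  # placeholder token

def handle_webhook(payload: str) -> None:
    """Parse the JSON payload, explode the array of alerts, and create one
    GitHub issue per alert (what the Tines story does with no-code actions)."""
    alerts = json.loads(payload)    # 2. parse the JSON string
    for alert in alerts:            # 3. explode the array into individual alerts
        create_github_issue(alert)  # 4. create a GitHub issue for each alert

def create_github_issue(alert: dict) -> None:
    response = requests.post(
        f"https://api.github.com/repos/{GITHUB_REPO}/issues",
        headers=HEADERS,
        json={
            "title": f"Alert: {alert.get('ruleName', 'Unknown rule')}",  # illustrative field name
            "body": json.dumps(alert, indent=2),
        },
    )
    response.raise_for_status()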

When our Okta detection rule is triggered and Sumo Logic sends an alert payload to this Tines story, here’s what the newly created GitHub issue looks like. Pretty! 😍

GitHub issue created by Tines for Sumo Logic alert

I recommend watching the following presentation by Chris Long for a deeper dive on using GitHub for case management: The Future Of Security Operations Roadshow: Using Github For Automated Case Management.

Testing detections and validating the alert pipeline

It’s crucial for a security team to know that their logging, detection, and alerting capabilities are working as expected. Unfortunately, the following scenario is still commonplace for many Blue Teams:

  1. An analyst/detection engineer creates a detection rule in the security team’s SIEM, EDR, or other solution.
  2. The detection works today: it matches on the relevant security events and alerts on a behavior as it’s logged in the relevant system(s) 🙌.
  3. A disruption occurs in logging, or a field name in the vendor’s logging schema changes unexpectedly, and the detection breaks.
  4. Days, weeks, or months pass by without the security team being alerted to the failure or “drift” in their environment.
  5. Sometimes the Blue Team discovers the broken detection after attacker or Red Team activity is missed somewhere in the attack lifecycle. Sad times ☹️.

For the purposes of triggering our detection rules and validating our alert pipeline, the GitHub Actions workflow I mentioned earlier (.github/workflows/test_and_deploy_in_dev.yml) runs some Python code that I wrote to do the following:

  1. Trigger the detection rules that were created earlier via Terraform.
  2. Verify that alerts (GitHub issues) were generated by the detection rules that were triggered.

The code to trigger our detection rule can be executed using a simple command-line interface and can be found here. The code to create, update, and delete the Okta user for testing purposes can be found here and here.

Again, if the GitHub Actions workflow fails at any step, an error will be raised. These tests can be run on a regular basis to alert the security team to any failures early on so they can fix problems and be confident that their monitoring & detection capabilities are working. Cool.
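Under the hood, a trigger is just a handful of Okta API calls. Here’s a simplified Python sketch of what the Assign Admin Role to Okta User trigger does; the endpoints are Okta’s standard user and role management APIs, but the function names and profile values are illustrative rather than the exact code in the repo.

import requests

OKTA_ORG_URL = "https://example.okta.com"             # placeholder Okta org
HEADERS = {"Authorization": "SSWS <okta-api-token>"}  # placeholder API token

def create_user(email: str) -> str:
    """Create and activate a test user; return its ID."""
    response = requests.post(
        f"{OKTA_ORG_URL}/api/v1/users",
        params={"activate": "true"},
        headers=HEADERS,
        json={"profile": {"firstName": "Test", "lastName": "User",
                          "email": email, "login": email}},
    )
    response.raise_for_status()
    return response.json()["id"]

def assign_admin_role(user_id: str, role: str = "READ_ONLY_ADMIN") -> None:
    """Assign an admin role to the user -- the behavior the rule detects."""
    requests.post(
        f"{OKTA_ORG_URL}/api/v1/users/{user_id}/roles",
        headers=HEADERS,
        json={"type": role},
    ).raise_for_status()

def deactivate_and_delete_user(user_id: str) -> None:
    """Clean up: deactivate the test user, then delete it."""
    requests.post(f"{OKTA_ORG_URL}/api/v1/users/{user_id}/lifecycle/deactivate",
                  headers=HEADERS).raise_for_status()
    requests.delete(f"{OKTA_ORG_URL}/api/v1/users/{user_id}",
                    headers=HEADERS).raise_for_status()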

The output below is logged when I run the code to trigger all detection rules: python -m detections_cli --run-all-triggers.

A number of actions are taken in the Okta organization to trigger the detection rule: Create a new user, assign admin permissions to the user, and finally deactivate & delete the user.

python -m detections_cli --run-all-triggers
22-Jul-23 22:39:42 UTC | INFO | <module> | detections_cli started
22-Jul-23 22:39:42 UTC | INFO | <module> | Running all rule triggers
22-Jul-23 22:39:42 UTC | INFO | main | Executing trigger 'Assign Admin Role to Okta User' (Trigger ID: a17971ab-3980-4936-92e0-d65d9f448204)
22-Jul-23 22:39:42 UTC | INFO | create_user | Attempting to create new Okta user test.alice.smith@threatpunter.com
22-Jul-23 22:39:43 UTC | INFO | create_user | Created new Okta user test.alice.smith@threatpunter.com (ID: 00uaivca4j4SROqoj5d7)
22-Jul-23 22:39:43 UTC | INFO | assign_admin_role | Attempting to assign admin role 'READ_ONLY_ADMIN' to Okta user ID 00uaivca4j4SROqoj5d7
22-Jul-23 22:39:44 UTC | INFO | assign_admin_role | Assigned admin role 'READ_ONLY_ADMIN' to Okta user ID 00uaivca4j4SROqoj5d7
22-Jul-23 22:39:44 UTC | INFO | deactivate_user | Attempting to deactivate Okta user ID 00uaivca4j4SROqoj5d7
22-Jul-23 22:39:44 UTC | INFO | deactivate_user | Deactivated Okta user ID 00uaivca4j4SROqoj5d7
22-Jul-23 22:39:44 UTC | INFO | delete_user | Attempting to delete Okta user ID 00uaivca4j4SROqoj5d7
22-Jul-23 22:39:45 UTC | INFO | delete_user | Deleted Okta user ID 00uaivca4j4SROqoj5d7
22-Jul-23 22:39:45 UTC | INFO | main | Ending trigger 'Assign Admin Role to Okta User' (Trigger ID: a17971ab-3980-4936-92e0-d65d9f448204)

Below is the output of the command used to validate that alerts were generated by each trigger: python -m detections_cli --validate-alerts. The code searches for GitHub issues that match the detection rule name and indicators that were used to trigger the detection rule. When matching issues are found, they’re labeled and closed.

The code I wrote to validate the alert pipeline can be found here.
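A simplified version of that validation logic looks something like the Python sketch below; the matching approach and labels follow the log output shown next, but the function shape is illustrative rather than a copy of the repo’s code.

import requests

GITHUB_API = "https://api.github.com/repos/threat-punter/detection-as-code"
HEADERS = {"Authorization": "token <github-personal-access-token>"}  # placeholder token

def validate_alert(rule_name: str, indicators: list[str]) -> bool:
    """Search open GitHub issues for one matching the rule name and the
    test indicators, then label and close it."""
    response = requests.get(f"{GITHUB_API}/issues", headers=HEADERS,
                            params={"state": "open", "per_page": 100})
    response.raise_for_status()

    for issue in response.json():
        text = f"{issue['title']} {issue.get('body') or ''}"
        if rule_name in text and all(indicator in text for indicator in indicators):
            # Matching alert found: label it as a test alert and close it
            requests.patch(
                issue["url"],
                headers=HEADERS,
                json={"state": "closed",
                      "labels": ["test", "alert", "sumo_logic", "okta"]},
            ).raise_for_status()
            return True
    return False  # no matching alert, so the pipeline check should fail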

python -m detections_cli --validate-alerts
22-Jul-23 22:50:02 UTC | INFO | <module> | detections_cli started
22-Jul-23 22:50:02 UTC | INFO | <module> | Validating alerts created by rule triggers
22-Jul-23 22:50:02 UTC | INFO | search_issues | Searching for GitHub issues at https://api.github.com/repos/threat-punter/detection-as-code with query params {'state': 'open', 'per_page': 100}
22-Jul-23 22:50:02 UTC | INFO | search_issues | Found 3 GitHub issues at https://api.github.com/repos/threat-punter/detection-as-code/issues with query params {'state': 'open', 'per_page': 100}
22-Jul-23 22:50:02 UTC | INFO | validate_alerts | Checking for alerts created by trigger 'Assign Admin Role to Okta User'
22-Jul-23 22:50:02 UTC | INFO | validate_alerts | Checking for alerts created for rule 'Administrator Role Assigned to Non-Admin User Account' (Rule ID: 97d6c856-93e8-40e3-9af7-f797a5c1435b)
22-Jul-23 22:50:02 UTC | INFO | check_for_matching_alerts | Found alert for rule 'Administrator Role Assigned to Non-Admin User Account' (Rule ID: 97d6c856-93e8-40e3-9af7-f797a5c1435b)
22-Jul-23 22:50:02 UTC | INFO | check_for_matching_alerts | Found test indicators in alert for rule 'Administrator Role Assigned to Non-Admin User Account' (Rule ID: 97d6c856-93e8-40e3-9af7-f797a5c1435b). GitHub issue: https://github.com/threat-punter/detection-as-code/issues/114
22-Jul-23 22:50:02 UTC | INFO | validate_alerts | Found 1 matching alerts for trigger 'Assign Admin Role to Okta User' and indicators for rule 'Administrator Role Assigned to Non-Admin User Account' (Rule ID: 97d6c856-93e8-40e3-9af7-f797a5c1435b)
22-Jul-23 22:50:02 UTC | INFO | update_issue | Updating GitHub issue https://api.github.com/repos/threat-punter/detection-as-code/issues/114 with params {'state': 'closed', 'labels': ['test', 'alert', 'sumo_logic', 'okta']}

Below is what it looks like when one of the above steps fails and an error occurs. Note the highlighted exception. The security team should investigate the cause of the error and fix it. 🦸🏻‍♀️

Detection rule alert validation failure in GitHub Actions

Deploying changes to production

Once the Test & Deploy in Dev GitHub Actions workflow completes successfully and the pull request is reviewed & approved, it’s time to merge those changes into the protected main branch in the DAC repo.

For the purposes of this project, I created a Deploy to Prod GitHub Actions workflow that runs when code changes are merged to main. It does what it says on the tin. You can find the file under .github/workflows/deploy_to_prod.yml in my example repo.

Putting it all together: Detecting and responding to suspicious Okta behavior

Let’s recap everything that we’ve built before testing our DAC pipeline end-to-end, including having some more fun with Tines to execute response actions. We’ve covered the following steps so far:

  • Configured Sumo Logic to ingest Okta system logs.
  • Utilized the Terraform provider for Sumo Logic to create detections as code and a webhook connection to send alert payloads to another application (Tines).
  • Built a Tines story to process alert payloads received from Sumo Logic and create GitHub issues for new alerts.
  • Created a GitHub repo for the security team to develop, manage, and collaborate on detections as code with version control, review, and approval processes.
  • Implemented a CI/CD pipeline to validate Terraform configuration, apply it to a dev environment, trigger detections, and validate that alerts were generated.

Let’s say that the security team receives an alert for the detection rule Administrator Role Assigned to Non-Admin User Account and determines that the activity is indeed malicious.

There’s a Suspend Okta User Account link in the GitHub issue for this alert. When clicked, it sends a request back to the Tines story I mentioned earlier. This is accomplished using the PROMPT function in Tines.

Suspending the Okta user account from the alert (GitHub issue)

Once the link in the GitHub issue is clicked, another Tines story is triggered, Suspend Okta User Account with MFA Approval. I’ve included a screenshot of this story below.

This story interacts with the Okta and GitHub APIs to carry out the following actions:

  • Send an Okta push factor challenge to a member of the security team to approve or reject the request to suspend the Okta user account.
  • Check whether the push factor challenge was approved or rejected.
  • Retrieve the target user account object from Okta, i.e. the user account to be suspended.
  • Suspend the Okta user account.

To maintain an audit trail for the security team, the Tines story also adds comments to the GitHub issue, logging the actions taken and the outcomes.

A Tines story that suspends an Okta user account after sending an MFA push factor challenge to the security team

The Tines story pictured below handles the MFA push factor challenge that’s sent to a member of the security team. This is a customized version of this story from the Tines story library. Thanks again to John Tucker for pointing me to this story.

Sending an MFA push factor challenge to an Okta user account
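Taken together, the core Okta calls behind these two stories boil down to something like the Python sketch below. The push factor transaction polling is simplified and the factor ID lookup is assumed to have happened already; in practice, Tines performs all of this with no-code actions.

import time
import requests

OKTA_ORG_URL = "https://example.okta.com"             # placeholder Okta org
HEADERS = {"Authorization": "SSWS <okta-api-token>"}  # placeholder API token

def suspend_with_mfa_approval(approver_id: str, push_factor_id: str,
                              target_user_id: str) -> bool:
    """Send an Okta Verify push challenge to the approver and suspend the
    target user account only if the challenge is approved."""
    # Send the push factor challenge to the security team member
    response = requests.post(
        f"{OKTA_ORG_URL}/api/v1/users/{approver_id}/factors/{push_factor_id}/verify",
        headers=HEADERS,
    )
    response.raise_for_status()
    poll_url = response.json()["_links"]["poll"]["href"]

    # Check whether the challenge was approved or rejected
    result = "WAITING"
    while result == "WAITING":
        time.sleep(5)
        result = requests.get(poll_url, headers=HEADERS).json()["factorResult"]

    if result != "SUCCESS":
        return False  # rejected or timed out, so take no action

    # Approved: suspend the target Okta user account
    requests.post(f"{OKTA_ORG_URL}/api/v1/users/{target_user_id}/lifecycle/suspend",
                  headers=HEADERS).raise_for_status()
    return True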

As I mentioned earlier, the Tines story is configured to document the sequence of activities in the GitHub issue. In the following example, the Okta user account was suspended successfully after the MFA push factor challenge was accepted.

Creating an audit trail in a GitHub issue using Tines
Viewing the suspended user account in the Okta admin console

Wrap up

That’s a wrap for this blog series. Thanks for reading — I hope you enjoyed it, and if you learned something new, that makes me happy.

Here’s what I covered to build a Detection-as-Code pipeline — from soup to nuts!

  • Principles and benefits of Detection-as-Code including references to awesome existing work by members of the security community. You can find the list of references in part 1.
  • Configuring an arsenal of tools to build and implement a Detection-as-Code pipeline from scratch.
  • Detection Engineering as code using Terraform and Sumo Logic with a practical Okta threat detection use case.
  • Creating CI/CD workflows to test the Detection-as-Code pipeline and deploy changes to dev/prod.
  • Writing custom code to test and validate the Detection-as-Code and alerting pipeline.
  • Automating Security Operations workflows using Tines to respond to suspicious Okta behavior.

As a reminder, you can find my example code in the following GitHub repo: threat-punter/detection-as-code-example.

Cheers!
