Alerting and Detection Strategy Framework

Alerting and Detection Strategies (ADS) are a critical requirement for an effective Incident Response (IR) and Detection team. Well-functioning IR teams invest time and energy into developing new hypotheses for alerting. When done well, a new ADS can provide a very strong signal of anomalous and potentially malicious activity. When done poorly, an IR team can flood itself with low-signal alerts, creating fatigue and morale issues and ultimately generating a net-negative impact on its mission of detecting and eradicating evil.

This blog post details how Palantir’s IR team develops ADS, the process framework we use, and the pitfalls we have experienced in early alert development. Additionally, we are releasing example ADS on our GitHub repository. The GitHub project provides the necessary building blocks for organizations looking to adopt this framework and improve the efficacy of their detection strategies, and we hope that sharing a subset of our internal ADS helps inspire discussion and the sharing of alerts more broadly.

The Case for a Framework

An ADS framework is a set of documentation templates, processes, and conventions concerning the design, implementation, and roll-out of ADS. Prior to agreeing on such a framework, we faced major challenges with the implementation of alerting strategies: the lack of rigor, documentation, peer review, and an overall quality bar allowed low-quality alerts to be deployed to production systems. This frequently led to alerting apathy, or to additional engineering time and resources being spent fixing and updating alerts.
 
Some potential issues experienced by ad-hoc or immature alert development include:

The alert does not have sufficient documentation.

  • The on-call engineer may not be familiar with the software or target OS.
  • There may be a lack of clear guidance on what could constitute a false positive.
  • There may be no response plan for a true positive hit.

The alert is not validated for durability.

  • Alerts may be based on narrow criteria, which creates false negatives.
  • Alerts may be based on overly broad criteria, which creates false positives.
  • Alerts may rely on incorrect assumptions, inconsistent data sources, or techniques.

The alert is not reviewed prior to production.

  • Change control may not be required.
  • Priorities of alerting may be inconsistent or ill-defined.
  • Alerts may not have true-positive testing as part of the implementation process.

ADS Framework

Our ADS framework was designed to mitigate the above-mentioned issues. It helps us frame hypothesis generation, testing, and management of new ADS. The framework has the following sections:

  • Goal
  • Categorization
  • Strategy Abstract
  • Technical Context
  • Blind Spots and Assumptions
  • False Positives
  • Validation
  • Priority
  • Response
  • Additional Resources

Each section must be completed and reviewed prior to deploying a new ADS. This guarantees that any given alert has sufficient documentation, is validated for durability, and is reviewed prior to production deployment. All of our production or draft alerts are documented according to this template, and the documentation is stored in a durable, version-controlled, and centralized location (e.g., Wiki, GitHub entry, etc.).
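As a sketch, an ADS record written against this template might be stored as a Markdown file with one heading per section (the alert name and section contents below are placeholders):

```markdown
# ADS Example: <Alert Name>

## Goal
Short description of the behavior this ADS detects.

## Categorization
Mapping to the relevant MITRE ATT&CK tactic and technique.

## Strategy Abstract
High-level walkthrough: data sources, detection logic, enrichment, false positive minimization.

## Technical Context
Background a responder needs to triage any alert from this ADS.

## Blind Spots and Assumptions
Known assumptions and conditions under which the ADS will not fire.

## False Positives
Known benign scenarios and their suppressions.

## Validation
Steps to generate a representative true-positive event.

## Priority
Criteria for each alerting level (e.g., High, Medium, Low).

## Response
Triage and investigation steps for the responder.

## Additional Resources
Related internal and external references.
```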

Subsequent changes to the ADS, such as modifying a whitelist, altering the strategy, or noting additional blind spots, should be incorporated into the ADS documentation at the time of modification. Internally, we use a model similar to GitHub’s, where any feature request or bug is opened as an issue against the respective ADS. One of the on-call engineers triages the bug or feature request, makes the change, and then updates the ADS documentation before closing the issue out. This process ensures that ADS documentation stays current while leaving an audit trail of changes.
 
Let us now look at the sections in more detail.

Goal gives a short, plaintext description of the type of behavior the ADS is supposed to detect.
 
Categorization provides a mapping of the ADS to the relevant entry in the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) Framework. ATT&CK provides a language for various post-exploitation techniques and strategies that adversaries might use. 
 
Mapping to the ATT&CK framework allows for further investigation into the technique, provides a reference to the areas of the killchain where the ADS will be used, and can further drive insight and metrics into alerting gaps. In our environment, we have a knowledge base which maps all of our ADS to individual components of the MITRE ATT&CK framework. When generating a hypothesis for a new alert, an engineer can simply review where we are strongest — or weakest — according to individual ATT&CK techniques. 
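A minimal sketch of such a knowledge base, assuming each ADS record is tagged with ATT&CK technique IDs (the rule IDs, ADS names, and tracked technique list below are purely illustrative):

```python
from collections import defaultdict

# Hypothetical catalog mapping ADS records to MITRE ATT&CK technique IDs.
ads_catalog = {
    "RID-001: Little Snitch Discovery": ["T1518.001"],  # Security Software Discovery
    "RID-002: LaunchAgent Persistence": ["T1543.001"],  # Launch Agent
    "RID-003: Ingress Tool Transfer":   ["T1105"],
}

def coverage_by_technique(catalog):
    """Invert the catalog: technique ID -> list of ADS covering it."""
    coverage = defaultdict(list)
    for ads, techniques in catalog.items():
        for t in techniques:
            coverage[t].append(ads)
    return coverage

def gaps(catalog, tracked_techniques):
    """Techniques we track but have no ADS for -- candidates for new alerts."""
    covered = coverage_by_technique(catalog)
    return sorted(t for t in tracked_techniques if t not in covered)

tracked = ["T1518.001", "T1543.001", "T1105", "T1059"]  # illustrative subset
print(gaps(ads_catalog, tracked))  # → ['T1059']
```

Inverting the mapping this way lets an engineer query both directions: which techniques a given ADS covers, and which tracked techniques have no coverage at all.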
 
Strategy Abstract is a high-level walkthrough of how the ADS functions. This describes what the alert is looking for, what technical data sources are used, any enrichment that occurs, and any false positive minimization steps. 
 
Technical Context provides detailed information and background needed for a responder to understand all components of the alert. This should appropriately link to any platform or tooling knowledge and should include information about the direct aspects of the alert. The goal of the Technical Context section is to provide a self-contained reference for a responder to make a judgement call on any potential alert, even if they do not have direct subject matter expertise on the ADS itself. 
 
Blind Spots and Assumptions are the recognized issues, assumptions, and areas where an ADS may not fire. No ADS is perfect and identifying assumptions and blind spots can help other engineers understand how an ADS may fail to fire or be defeated by an adversary.
 
False Positives are the known instances of an ADS misfiring due to a misconfiguration, an idiosyncrasy in the environment, or another non-malicious scenario. These false positive alerts should be suppressed within the Security Information and Event Management (SIEM) tool to prevent alert generation when a known false positive event occurs.
 
Validation lists the steps required to generate a representative true positive event which triggers this alert. This is similar to a unit test and describes how an engineer can cause the ADS to fire. This can be a walkthrough of steps used to generate an alert, a script to trigger the ADS (such as Red Canary’s Atomic Red Team Tests), or a scenario used in an alert testing and orchestration platform. 
 
Priority describes the various alerting levels that an ADS may be tagged with. While the alert itself should reflect the priority when it is fired through configuration in your SIEM (e.g., High, Medium, Low), this section details the criteria for the specific priorities. 
 
Response lists the general response steps to take in the event that this alert fires. These steps instruct the next responder on the process of triaging and investigating an alert.
 
Additional Resources are any other internal, external, or technical references that may be useful for understanding the ADS.

Case Study: Alerting for EmPyre

Now that we have walked through the ADS Framework, we will describe how an IR engineer builds and deploys a production alert. In this example, we’ll look at the development of an ADS focused on identifying EmPyre, a Python-based MacOS implant, on compromised workstations. In this scenario, the following steps occur:

  1. A catalyst (e.g., Tweet, article, observed actor activity) inspires an alert idea for an IR engineer. In this instance, the engineer reads an article about EmPyre and, since we have a large fleet of Macs, decides this is a detection strategy worth pursuing.
  2. The IR engineer starts development on the alert. They download EmPyre in an analysis environment and look at their telemetry sources. These sources include host-based logs and tools, network logs, and other forensic artifacts.
  3. The IR engineer discovers an interesting pattern in EmPyre: a special check that EmPyre performs to identify whether Little Snitch, a MacOS application firewall, is running on the host. In its default state, the presence of Little Snitch will cause EmPyre to exit.
  4. The IR engineer decides to write an ADS based around detection of the Little Snitch check. They believe this check will be good enough to catch unmodified EmPyre, as it is a default function, and will likely not have significant false positives.
  5. The IR engineer spins up a new ADS record in their alerting GitHub repository.

The IR engineer looks at process execution events in the analysis environment and identifies the specific command executed by EmPyre to look for Little Snitch:

/bin/sh -c ps -ef | grep Little\ Snitch | grep -v grep

Digging further into the EmPyre source code reveals the explicit code for this check:

try:
    if safeChecks.lower() == 'true':
        launcherBase += "import re, subprocess;"
        launcherBase += "cmd = \"ps -ef | grep Little\ Snitch | grep -v grep\"\n"
        launcherBase += "ps = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)\n"
        launcherBase += "out = ps.stdout.read()\n"
        launcherBase += "ps.stdout.close()\n"
        launcherBase += "if re.search(\"Little Snitch\", out):\n"
        launcherBase += " sys.exit()\n"
except Exception as e:
    p = "[!] Error setting LittleSnitch in stager: " + str(e)
    print helpers.color(p, color='red')

The IR engineer decides to broaden the alert, focusing on any query for Little Snitch invoked through the grep binary:

grep Little\ Snitch

The IR engineer annotates this behavior and performs a historical search through their SIEM for any exact matches for this artifact. Looking through the last 90 days of process execution activity surfaces only hits from their own testing, plus some hits from systems management or updater software. This suggests the alert will be high fidelity, as there do not appear to be any legitimate invocations of the same command in the environment.
 
The IR engineer annotates the whitelist entries and drafts an alert for their SIEM (e.g., a scheduled query, a transform, etc.) to look specifically for the command invocation. The ADS template is filled out with a first draft of each section. 
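A minimal sketch of such a draft, assuming process events expose a command line and a parent process name (the whitelisted parent names here are hypothetical stand-ins for the engineer's annotated entries, not real tooling):

```python
import re

# Core detection: the grep-based Little Snitch query observed in testing.
# Matches "grep Little\ Snitch" and "grep Little Snitch" variants.
DETECTION = re.compile(r"grep Little\\? ?Snitch")

# Illustrative whitelist: parent processes known to run benign checks.
WHITELISTED_PARENTS = {"mgmt-agent", "littlesnitch-updater"}

def should_alert(event):
    """Return True if a process event should fire the draft ADS."""
    if not DETECTION.search(event.get("cmdline", "")):
        return False  # command line never references the Little Snitch query
    return event.get("parent") not in WHITELISTED_PARENTS

events = [
    {"parent": "python",     "cmdline": "/bin/sh -c ps -ef | grep Little\\ Snitch | grep -v grep"},
    {"parent": "mgmt-agent", "cmdline": "/bin/sh -c ps -ef | grep Little\\ Snitch | grep -v grep"},
    {"parent": "zsh",        "cmdline": "/usr/bin/top"},
]
print([should_alert(e) for e in events])  # → [True, False, False]
```

In a production SIEM this logic would live in a scheduled query or transform rather than application code, but the shape is the same: match the artifact, then subtract the known-good entries.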
 
At this juncture, the IR engineer is at an ADS review checkpoint. The reviewer performs an inspection of the alert and validates that the ADS is functional from source to SIEM, including true-positive testing. The ADS is given a rule ID (RID) used for internal tracking, and the IR engineer starts the move to production deployment. While this generates more work than simply writing an alert, it provides several benefits for the quality of the ADS:

  • The ADS must be fully complete, which guarantees a baseline of documentation.
  • Localized testing, research, false positive minimization, and true positive validation must occur to complete all sections of the ADS. This has the side-effect of also populating your security tools with alert-relevant data for analysis.
  • The ADS must be reviewed by another engineer, which can identify areas of improvement, enhanced strategies, and methods for reducing false positives.

The completed ADS for the Little Snitch case study is as follows: 
 
Goal: Detect attempts by potentially malicious software to discover the presence of Little Snitch on a host by looking for process and command line artifacts.
 
Categorization: These attempts are categorized as Discovery / Security Software Discovery.
 
Strategy Abstract: The strategy will function as follows:

  • Record process and process command line information for MacOS hosts using endpoint detection tooling.
  • Look for any explicit process or command line references to Little Snitch.
  • Suppress known-good processes and command line arguments
    - Little Snitch Updater
    - Little Snitch Installer
    - Health checks for Little Snitch
  • Fire alert on any other process or command line activity.

Technical Context: Little Snitch is an application firewall for MacOS that allows users to generate rulesets around how applications can communicate on the network. 
 
In the most paranoid mode, Little Snitch will launch a pop-up notifying the user that an application has deviated from a ruleset. For instance, the following events could trip an interactive alert:

  • A new process is observed attempting to communicate on the network.
  • A process is communicating with a new IP address or port which differs from a ruleset.

Due to the intrusive nature of Little Snitch popups, several MacOS implants will perform explicit checks for processes, kexts, and other components. This usually manifests through explicit calls to process (ps) or directory listing (ls) commands with sub-filtering for Little Snitch.
 
For instance, an implant could look for the following components:

  • Running Little Snitch processes
  • Little Snitch Kexts
  • Little Snitch Plists
  • Little Snitch Rules

Blind Spots and Assumptions: This strategy relies on the following assumptions:

  • Endpoint detection tooling is running and functioning correctly on the system.
  • Process execution events are being recorded.
  • Logs from endpoint detection tooling are reported to the server.
  • Endpoint detection tooling is correctly forwarding logs to SIEM.
  • SIEM is successfully indexing endpoint detection tooling logs.
  • Attacker toolkits will perform searches to identify if Little Snitch is installed or running.

A blind spot will occur if any of the assumptions are violated. For instance, the following would not trip the alert:

  • Endpoint detection tooling is tampered with or disabled.
  • The attacker implant does not perform searches for Little Snitch in a manner that generates a child process.
  • Obfuscation occurs in the search for Little Snitch which defeats our regex.

False Positives: There are several instances where false positives for this ADS could occur:

  • Users explicitly performing interrogation of the Little Snitch installation, e.g., grepping for a process, searching for files.
  • Little Snitch performing an update, installation, or uninstallation, e.g., we miss whitelisting a known-good process.
  • Management tools performing actions on Little Snitch, e.g., we miss whitelisting a known-good process.

Known false positives include:

  • Little Snitch Software Updater

Most false positives can be attributed to scripts or user behavior looking at the current state of Little Snitch. These are either trusted binaries (e.g., our management tools) or are definitively benign user behavior (e.g., the processes performing interrogation are child processes of a user shell process).
 
Validation: Validation can occur for this ADS by performing the following execution on a MacOS host:

/bin/sh -c ps -ef | grep Little\ Snitch | grep -v grep

Priority: The priority is set to medium under all conditions.

Response: In the event that this alert fires, the following response procedures are recommended:

  • Look at management tooling to identify if Little Snitch is installed on the host.
    - If Little Snitch is not installed on the host, this may be more suspicious.
  • Look at the process that triggered this alert. Walk the process chain.
    - What process triggered this alert?
    - What was the user the process ran as?
    - What was the parent process?
    - Are there any unusual discrepancies in this chain?
  • Look at the process that triggered this alert. Inspect the binary.
    - Is this a shell process?
    - Is the process digitally signed?
    - Is the parent process digitally signed?
    - How prevalent is this binary?
  • Does this appear to be user-generated in nature?
    - Is this running in a long-running shell?
    - Are there other indicators this was manually typed by a user?
    - If the activity may have been user-generated, reach out to the user via our chat client and ask them to clarify their behavior.
  • If the user is unaware of this behavior, escalate to a security incident.
  • If the process behavior seems unusual, or if Little Snitch is not installed, escalate to a security incident.
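The process-chain portion of these steps could be sketched as a small triage helper, assuming endpoint telemetry exposes parent PIDs and code-signing status (all field names and the example processes below are hypothetical):

```python
SHELLS = {"bash", "zsh", "sh", "fish"}

def walk_chain(event, events_by_pid):
    """Yield the triggering process and each ancestor, root-most last."""
    current = event
    while current is not None:
        yield current
        current = events_by_pid.get(current.get("ppid"))

def triage(event, events_by_pid):
    """Collect the process-chain facts a responder checks before escalating."""
    chain = list(walk_chain(event, events_by_pid))
    return {
        "user": event.get("user"),
        "chain": [p["name"] for p in chain],
        "parent_is_shell": len(chain) > 1 and chain[1]["name"] in SHELLS,
        "unsigned_in_chain": [p["name"] for p in chain if not p.get("signed", False)],
    }

# Hypothetical telemetry: a grep for Little Snitch spawned from a user shell.
procs = {
    1:   {"pid": 1,   "ppid": None, "name": "launchd", "signed": True},
    300: {"pid": 300, "ppid": 1,    "name": "zsh",     "signed": True, "user": "alice"},
    301: {"pid": 301, "ppid": 300,  "name": "grep",    "signed": True, "user": "alice"},
}
print(triage(procs[301], procs))
```

A parent chain ending in an interactive shell, with signed binaries throughout, supports the "user-generated" hypothesis and a chat follow-up with the user rather than immediate escalation.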

This ADS record may also be found on our public GitHub repository.

More ADS Examples

The ADS Framework can be found on our public GitHub repository, where we have also included several internal ADS that we have developed as examples for reference.
 
We encourage other organizations to develop their own unique ADS and find a mechanism to share them with the broader InfoSec community. While there are operational security considerations around publicly acknowledging and documenting internal alerts, we hope these examples spur greater sharing and collaboration, inspire detection enhancements for other defenders, and ultimately increase the operational cost for attackers.

Further Reading and Acknowledgements

We would like to extend thanks to the following for their contributions to the InfoSec community, or for assisting in the development of the ADS Framework:


Authors and Contributors
Art W., Chris L., Dane S., Joshua B., Richard S.