Blueprint for Threat Intel to Detection Flow (Part 7)

Anton Chuvakin
Published in Anton on Security
Feb 13, 2024


This blog series was written jointly with Amine Besson, Principal Cyber Engineer, Behemoth CyberDefence and one more anonymous collaborator.

In this blog (#7 in the series), we will cover more details on the TI to detection flow, and stop just before testing (that's saved for Part 8).

So, OK, what do we do next?

A blueprint to break down intelligence into detections

It’s all well and good to know what we need and want, what we don’t want, how we should set up operations, but what exactly must be done now in my organization?

Detection Engineering (DE) today is often an ad hoc craft (despite “engineering” in the name…) that only a few places do predictably and consistently well. So let’s introduce a few more precise steps of the workflow — the exact specifics should fall into place depending on the details of the processes adopted.

Read cover-to-cover

Obviously (right?), a threat intel analyst would carefully read any incoming piece of information header to footer, front to back… right? The fact of the matter is: time is short, data is complicated and shortcuts are taken. This hurts the proper parsing out of the relevant pieces of new intel that should go further down the pipeline. Instead, focus on a few related news articles or a single report (or a small series of reports), and dedicate at least half a day to fully digest all the new input before calling the shots on what should be pushed further down the process. When some aspect of the threat, the attack path or the TTP is not understood, the only way to solve the problem is to do additional research before continuing the intel review, so that no context is lost. Ideally (and this is pushing it a bit, since it is not the TI job), also think about any key details that may make the detection engineer’s job easier…

This seems obvious to anybody doing deep analysis work, yet it is often the weakest part, and the one most hurtful to the downstream detection creation work. You may have seen it already: a war room full of eager colleagues who all disagree on what to do in the face of a new threat, which can often be traced back to different levels of information ingested (and interpreted).

Parse out blocks of related threat info

Now that intel is correctly considered in its entirety, the parsing work begins. Identify blocks of data within the intel which describe a threat. You should be able to pull out individual, sometimes related objects such as “Compromise X component using Y feature”, “Exfiltrate data over Z method”, “Connect to their infrastructure via A or B methods” and so on. Extract information like which platforms are involved, what the threat actor would need before being able to perform the TTP etc. If this happens in the cloud, add more information on types of accounts and such.
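To make this concrete, here is a minimal sketch (in Python, purely for illustration) of what one such parsed threat block could look like; the field names and the example values are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ThreatBlock:
    """One parsed block of threat info, pulled out of a larger intel report."""
    title: str                        # e.g. "Exfiltrate data over DNS tunneling"
    description: str                  # self-sufficient summary of the behavior
    platforms: list[str] = field(default_factory=list)      # e.g. ["Windows", "AWS"]
    preconditions: list[str] = field(default_factory=list)  # what the actor needs first
    techniques: list[str] = field(default_factory=list)     # e.g. ATT&CK technique IDs
    related_blocks: list[str] = field(default_factory=list) # links to existing threat objects
    sources: list[str] = field(default_factory=list)        # reports and research notes

block = ThreatBlock(
    title="Exfiltrate data over DNS tunneling",
    description="Actor encodes stolen data into DNS queries to attacker-controlled domains.",
    platforms=["Windows", "Linux"],
    preconditions=["Code execution on an internal host", "Outbound DNS allowed"],
    techniques=["T1048.003"],  # illustrative ATT&CK mapping
    sources=["https://example.org/threat-report"],
)
```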

Ensure that the information is enriched with the research that was done alongside reading the incoming intel. The threat description must suffice on its own, or be able to redirect to other existing threat objects. The idea is to avoid having the downstream detection engineer play a threat analyst on TV; instead, they can focus on building resilient and effective detection content.

Also, ensure that the data is deduplicated: if an existing threat description was already created, new intel should only update it (reinforcing the searchability needs of any strong TI to DE system).

Next, re/de-prioritize in quick passes: first by domain (if there is a strong cloud focus and the organization is known to have insufficient cloud security, priority goes up), then by platform (if the product on which the threat relies is not used, priority goes down). Finally, if detection coverage metrics are available, roughly evaluate whether this threat is relevant, particularly relevant, or likely already covered. Frameworks like ATT&CK come in handy for rapidly filtering threat objects.
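As a hedged illustration of these quick passes, a scoring function along these lines could work; the weights, attribute names and thresholds below are assumptions to be tuned per organization, not part of any standard.

```python
def prioritize(threat: dict, org: dict) -> int:
    """Quick-pass priority score for a parsed threat object (higher = more urgent)."""
    score = 50  # neutral baseline

    # Pass 1 - domain: raise priority where the organization is known to be weak
    if threat.get("domain") == "cloud" and org.get("cloud_security_maturity") == "low":
        score += 20

    # Pass 2 - platform: drop priority if the affected product is not in use
    if threat.get("platform") not in org.get("platforms_in_use", []):
        score -= 30

    # Pass 3 - coverage: de-prioritize techniques that existing detections already cover
    techniques = set(threat.get("techniques", []))
    if techniques and techniques <= org.get("covered_techniques", set()):
        score -= 25

    return score
```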

Backlog new knowledge items for DE

To enter the DE lifecycle, the threat must be entered into some form of ticketing, issue tracking or project management system. Which system matters far less than the fact that there is, in fact, a system. The threat must be described alongside any accompanying metadata, labels, etc., and be picked up by the DE team based on capacity and prioritization.
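For illustration only, a backlog entry might carry fields such as these; the ticketing system, labels and statuses are placeholders for whatever your team already uses.

```python
backlog_item = {
    "title": "Detect DNS tunneling exfiltration (Threat-0042)",
    "threat_ref": "Threat-0042",   # link back to the threat description
    "labels": ["exfiltration", "dns", "cloud"],
    "priority": "high",            # e.g. the output of the quick-pass scoring
    "status": "backlog",           # backlog -> in progress -> testing -> production
    "notes": "Cloud DNS logs may need to be enabled before detection work starts.",
}
```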

Threats may be rejected (well, deprioritized, pushed to the end of the queue) on the basis that detection is known to be impossible, or if proven mitigations are already in place. Naturally, the “detection is impossible” argument should be wielded very, very, very sparingly. There is almost always a way, but yes, occasionally it may call for a new log type to be collected… or, in fact, created. This is the case where TI and DE can (attempt to) jointly influence the infrastructure teams or other security teams.

Identify Invariable Behaviors and Data Sources

Next, the DE’s job will be to read the threat description and understand which behaviors (or combination of behaviors) would be seen as part of the detectable threat scenario (particular network protocols used, user actions, heightened system operations, distinct command lines, etc.). This would often result in a second item of documentation, a detection specification. Sometimes, related or adjacent threats can be addressed by a single detection specification, or more complex mappings may be done. Sometimes the specification will also include guidance on discerning similar yet legitimate activities to reduce false positives later.
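A detection specification could be captured in a structure like the sketch below; the keys and example behaviors are illustrative assumptions, not a formal format.

```python
detection_spec = {
    "threat_ref": "Threat-0042",
    "behaviors": [
        "Unusually high volume of DNS TXT queries from a single host",
        "DNS queries with long, high-entropy subdomains",
    ],
    "data_sources": ["DNS server logs", "EDR network telemetry"],
    "known_benign_lookalikes": [
        "Security or anti-spam products that legitimately issue many TXT lookups",
    ],
    "platforms": ["Windows", "Linux"],
}
```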

Close collaboration with the red team (purple teaming, if you must) can help construct a data set that can be further analyzed when relying on available public information is insufficient. It would also help — a lot — at the next stage where detections are tested. Sometimes, data sources are simply not available and must be acquired (rarely: built), or the particular log source configuration must be changed to acquire relevant data before working on detections. Such improvements are often lengthy processes with much stakeholder engagement that slow down the DE process considerably. This is where the team will need to influence things, not just engineer them.

Build Detections

At this point, the detection team will have in hand a detection specification, mapped to well-documented threat properties. Building the detection (or detections) required to implement the spec is then a matter of a few hours: identifying the data sources and fields within the SIEM or other telemetry database, doing some data transformation when field extraction is required, and configuring the alert parameters — especially the lookback period and the frequency at which the detection query is run.
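As a sketch of what the resulting rule definition might hold (the query syntax and field names are pseudo-SIEM placeholders, not a specific product’s language):

```python
detection_rule = {
    "name": "dns_tunneling_exfiltration",
    "spec_ref": "Spec-0042",
    "query": (
        "source=dns_logs "
        "| where query_type = 'TXT' and length(subdomain) > 60 "
        "| stats count() by src_host "
        "| where count > 500"
    ),
    "lookback": "1h",        # window of data examined on each run
    "run_frequency": "15m",  # how often the detection query is executed
    "severity": "medium",
}
```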

Detection-as-code systems help at this stage to correctly track adjustments to the detection configuration and keep the work close to project management. When anomaly detection or statistical analysis is also required to meet the specification, development time typically goes up, as the DE will need to explore a lot more noise filtering and triage support (yes, ML-based and algorithmic detections are harder to build and harder to test).

Production Readiness

Before rolling a detection to production, the DE should ensure that the rate of false positives is acceptable and reduced as much as possible. The detection query should be run against a large timespan to baseline the expected alert rate. For every hit, the DE should investigate why it triggered, validate whether it is a true positive, and if not, work out how to eliminate the false positive. When it is not possible to further reduce false hits, the response playbook should clearly call out the edge cases to facilitate investigation and to further qualify alerts into incidents.
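One way to approach this baselining, sketched under the assumption that your SIEM exposes some query client (the run_query callable below is a placeholder):

```python
from datetime import datetime, timedelta, timezone

def baseline_alert_rate(run_query, query: str, days: int = 30) -> float:
    """Replay a detection query over a long historical window and estimate alerts per day."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    hits = run_query(query, start=start, end=end)  # placeholder: returns matching events

    # Each hit should then be triaged by the DE: true positive, tunable false
    # positive, or an edge case to document in the response playbook.
    return len(hits) / days
```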

In many SecOps environments, a playbook must be attached to the alert created by the detection, generally starting with a generic investigation playbook (Endpoint, Cloud, SaaS, “Big Enterprise Application”, etc.) before moving to more threat-specific playbooks (Ransomware, Lateral Movement, Data Theft). For the DE, this means either selecting an existing playbook together with the incident response team, or supporting the creation of a new one better suited to the new detection.

Lifecycle management can vary a lot per environment: an acceptance stage with further stakeholder approvals may be added, or a detection validation step may be required before the detection is accepted into the production library.

In our next blog post we’ll go even deeper into detection testing…

UPDATE: the story continues “Testing in Detection Engineering (Part 8)”

Previous blog posts of this series:
