Why is Ground Truth so critical for cybersecurity operations?
Security analysts need quick access to the observable ground truth linked to threat activity in order to detect threats earlier in their lifecycle and perform faster, more accurate threat investigations.
What is ground truth?
Ground truth is a term used in various fields to refer to information provided by direct observation (i.e. empirical evidence) as opposed to information provided by inference.
How is ground truth related to cybersecurity?
Most security analytics solutions identify events and generate security alerts by analyzing secondary and even tertiary derived information. This means the observation they produce (e.g. a correlation or a behavioral anomaly) is drawn from inferential data. There is a good chance the analysis omitted data, and because inferential analysis is not deterministic, its results are inherently unreliable.
In “Security Correlation Then and Now: A Sad Truth About SIEM,” Anton Chuvakin admits:
“The normalized and taxonomized approach in SIEM never actually worked!”
He gives several reasons why: it is impossible to keep up, to test, to develop consistently across vendors, and so on.
Nondeterministic analysis models produce both false positives and false negatives. As a result, analysts must triage most alerts to determine whether the captured activity is even a real threat (threat qualification) and, if so, perform lengthy investigations to understand how the threat originated (root cause) and how far it has spread (threat scope). Does this mean we should abandon our security controls if their analytics require impossibly clean and structured data? Certainly not. It does, however, point to serious impediments that we should understand, along with their impact on our security operations.
How lack of ground truth creates barriers in cybersecurity
Analysis that is not grounded in observable ground truth has three unfortunate side effects.
- Impeded threat detection. According to CrowdStrike’s 2020 Cyber Front Lines Report, “Average dwell time grew 10 days to 95 in 2019, up from 85 in 2018.” In 95 days, it is highly unlikely that an attacker produced no evidence of their attack tactics. Why isn’t that evidence recognized as a threat indicator, leading to earlier detection and circumvention? Because without observable ground truth linking the detected threat indicators, it is impossible to qualify early indicators quickly and accurately.
- Impeded threat investigation. Making connections across attack tactics is a manual exercise. Instead of having immediate access to the ground truth related to all threat progression activity, the security analyst must weed through system and application logs from around the time of the initial suspected attack. Analysts rely on experience and guesswork to work out which entries in the voluminous audit logs, most of them innocuous and unrelated, capture anything relevant to the investigation. It is time-consuming work with a high potential for mistakes, and an analyst can easily miss the details that connect log activity and previous alerts. Keep in mind that these individual threat activities may be separated by hours, days, or even weeks and are hidden among unrelated log activity.
- Inability to implement automation. With an extremely high volume of false positives and no way to quickly corroborate alerts with ground truth, analyst teams cannot immediately trust their alerts. If you can’t trust the alert, you can’t automate actions triggered by the alert. Instead, most organizations automate only the gathering of additional contextual information to help the analyst. Knowing the external IP’s reputation, the user’s identity, and the host’s services helps to prioritize the investigation, but it does not help to quickly understand the threat’s scope or root cause. Security analysts still perform these most difficult parts of the threat investigation manually.
If the security industry is going to take a meaningful step towards reducing dwell, investigation, and remediation time, the solution must automate collecting, filtering, and linking ground truth to each alert and present it in a meaningful way to the analyst.
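To make the idea of linking ground truth to an alert more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any particular product works: the `Event` and `Alert` schemas, the `link_ground_truth` function, and the sample data are all hypothetical, and matching on a shared host or user within a fixed time window is a deliberate simplification. As noted above, real threat activity can be spread over days or weeks, so a fixed window would miss some of it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A directly observed event (hypothetical schema)."""
    timestamp: datetime
    host: str
    user: str
    action: str


@dataclass(frozen=True)
class Alert:
    """An alert raised by a detection tool (hypothetical schema)."""
    timestamp: datetime
    host: str
    user: str
    rule: str


def link_ground_truth(alert, events, window=timedelta(hours=24)):
    """Return observed events that share a host or user with the alert
    and fall within `window` of it, oldest first.

    The entity match and the fixed window are assumptions made for
    illustration only.
    """
    related = [
        e for e in events
        if (e.host == alert.host or e.user == alert.user)
        and abs(e.timestamp - alert.timestamp) <= window
    ]
    return sorted(related, key=lambda e: e.timestamp)


# Made-up sample data, for illustration only.
events = [
    Event(datetime(2021, 3, 1, 9, 0), "web01", "alice", "login from new external IP"),
    Event(datetime(2021, 3, 1, 9, 5), "db02", "bob", "scheduled backup"),
    Event(datetime(2021, 3, 1, 9, 30), "web01", "svc-app", "new service installed"),
    Event(datetime(2021, 3, 4, 9, 0), "web01", "alice", "routine login"),
]
alert = Alert(datetime(2021, 3, 1, 10, 0), "web01", "alice", "suspicious outbound connection")

linked = link_ground_truth(alert, events)
for e in linked:
    print(e.timestamp.isoformat(), e.host, e.user, e.action)
```

In this sketch the unrelated backup on another host and the login three days later are filtered out, so the analyst sees only the two observed events that plausibly belong to the alert, in chronological order.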