A lot of people seem to think that matching technical threat intelligence (TI) to logs for threat detection is a great idea. Some people also think this is very easy.
But before we go there… Did I just use the phrase “threat intelligence” to mean “threat data feeds”? Yes, I did. Frankly, I am tired of fighting this battle (“No, you dummy, this list of ‘bad’ IP addresses is not really ‘threat intelligence’” — “It sure is” — “Is not!” — “Is!!” — “Is not!!!” …). Just like AI, this battle has been lost. Marketers overran the barricades. Hence I’ve used the term “TI” to also include threat indicator feeds and lists (even though I still cringe a little when I say it). BTW, you can perhaps make intelligence from threat feeds and other data, but the list of “bad” objects isn’t intelligence, but at best an observation. So in this post I’d say “threat data” to avoid insulting the intelligence literati …
So, is matching threat data to logs for detection a great idea? Is this very easy? Let’s think about it together…
Now, I’ve argued here (in 2014) and here (in 2016) that threat data is very useful to have inside your SIEM. Most of my arguments from that ancient time still stand. However, threat data usage is much broader than detection — often it is not even centered on detection. It may focus on confirming the detection signals (alerts) produced by other means or on “threat naming” — attaching threat names to algorithmic alerts.
So, first, let’s travel to some perfect world where …
- Logs and other security telemetry (tricked you, eh? :-)) always arrive on time, and
- Logs are always correctly parsed with matchable fields (IP, host name, URL, hash, etc.) correctly labeled, and
- Threat data feeds are accurate, have no false entries, and are timely, and
- Threat data feeds come with useful context about the types of badness, its source, confidence, etc
OK, some of you are already laughing, but please humor me for a minute here. In that world, correlating logs to threat data in near real-time will likely yield useful detections. Such detections may be easier to produce than detecting badness via other methods (dreaded “machine learning”, manually correlating observables, higher-level TTPs, etc). The arguments I make here, BTW, apply less in that perfect world.
However, this is NOT our world. Here on Earth, logs are dirty and threat data feeds are noisy and late. In fact, they are always late because they are observations of past badness, known to at least somebody (human or machine) that cooked the threat data feed in the first place.
Logs are also late sometimes. Occasionally, they are also not parsed right the first time (and only with some products do you get that second chance to parse them correctly). On top of this, threat data often lacks any and all context, to the point that it is embarrassing to use the term “intelligence” to describe it.
As a result, the strategy “get random TI feeds, throw them in your SIEM and match to all the logs that have matchable fields” fails spectacularly. In my analyst years, I’ve heard of cases where even SIEM-vendor-provided threat intel feeds killed all the SOC analyst joy due to their low fidelity. In other words, they were barely context-quality (i.e. OK to serve as context to alerts produced via other means) and for sure not alert-quality. Matching such TI to logs only delivers hatred, horror and FAIL (What can I say? I like drama!)
Hence, to do this in real life, there are conditions and complications:
- If you do plan to “wake people up at 3AM” upon a TI match with logs, TI for you is essentially a signature. This means that you need to dramatically boost the fidelity and certainty of the indicators, and really transition them away from being intelligence to being … well … black/white signatures.
- Better, you feed TI into your detection engineering process and then make signatures out of it. The output of such a process will focus on fidelity much more than context.
- If you plan to raise the low-fidelity “alerts” (“here is something to look at when you have time”) and only present them to humans in the morning, or just use TI matches as hunting clues, things are better off. Here you need context more than fidelity. And nobody is woken up at 3AM!
- In both cases, you really need to think about matching to a subset of logs. You know what happened to that poor dude who matched inbound firewall logs to threat feeds? :-)
- Finally, in one of the future posts, I plan to address the value of “retro-matching” i.e. matching today’s intel to old logs. This is very fun! And, honestly, a much better use of threat data considering the bulk of it is gleaned from observations of past badness.
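To make the conditions in the list above a bit more concrete, here is a minimal Python sketch of the routing logic. Everything in it is an assumption made for illustration: the `Indicator` fields, the 0-100 confidence score, the `curated` flag, the log type names and the threshold are all hypothetical, and no particular SIEM works exactly this way.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical values; tune them for your own feeds and log sources.
ALERT_MIN_CONFIDENCE = 90  # assumed 0-100 confidence score per indicator
# Only a subset of logs is matched; inbound firewall logs are left out on purpose.
MATCHABLE_LOG_TYPES = {"proxy", "dns", "edr", "outbound_firewall"}


@dataclass(frozen=True)
class Indicator:
    value: str        # e.g. an IP address
    feed: str         # which feed it came from
    confidence: int   # assumed 0-100 score
    curated: bool     # True only if your detection engineering vetted it


def route_match(log: dict, indicators: Dict[str, Indicator]) -> Tuple[str, Optional[Indicator]]:
    """Return (action, indicator); action is 'ignore', 'page' or 'morning_review'."""
    # Condition 1: match only the log subset where a hit means something.
    if log.get("log_type") not in MATCHABLE_LOG_TYPES:
        return ("ignore", None)

    indicator = indicators.get(log.get("dest_ip", ""))
    if indicator is None:
        return ("ignore", None)

    # Condition 2: only curated, high-confidence indicators act as signatures.
    if indicator.curated and indicator.confidence >= ALERT_MIN_CONFIDENCE:
        return ("page", indicator)

    # Everything else is a hint for the morning queue or for hunting.
    return ("morning_review", indicator)
```

With a curated indicator at confidence 95, a proxy log hit returns `"page"`; the same hit from an uncurated feed returns `"morning_review"`; and any hit in a log type outside the subset is ignored.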
Therefore, “I got logs and I got TI, let’s just toss both in a blender and pray” is generally bad advice. Do not wake people up off a single TI feed match, unless you personally curated this feed to be super-reliable. Otherwise, matching TI to logs produces hints, not conclusions.
However, using threat data as part of the context provided for a potential alert helps analysts recognize patterns, something humans are inherently good at. Use threat data to help inform decisions, not to make them for you. In cases where there is actual “intelligence” linked to threat data, clearly present that context within an alert to help analysts make informed decisions.
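As a rough illustration of “inform, not decide”, the sketch below attaches threat data to an alert that was produced by some other means, without touching its severity. The alert fields, the shape of the `threat_data` lookup and the function name `enrich_alert` are hypothetical; a real SIEM or SOAR tool will have its own schema.

```python
from typing import Dict, List

# Hypothetical alert fields that may hold a matchable observable.
OBSERVABLE_FIELDS = ("src_ip", "dest_ip", "domain", "file_hash")


def enrich_alert(alert: dict, threat_data: Dict[str, List[dict]]) -> dict:
    """Return a copy of the alert with threat data context attached.

    Severity and every other original field are left unchanged: the
    threat data is context for the analyst, not a verdict.
    """
    enriched = dict(alert)
    context = []
    for field in OBSERVABLE_FIELDS:
        value = alert.get(field)
        if value is None:
            continue
        # One observable can appear in several feeds; keep every entry.
        for entry in threat_data.get(value, []):
            context.append({"field": field, "value": value, **entry})
    enriched["threat_context"] = context
    return enriched
```

An alert with no matching observables simply gets an empty `threat_context` list, so the analyst can see that threat data was checked and had nothing to add.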
Finally, if you recall my “dire” disclaimer about using TI for detection of “top tier” threat actors: if you rely on TI, you are, generally speaking, too late… After all, elite threat actor infrastructure probably will not show up on any feeds until after you are hit…