Is Detection Engineering just glorified googling?

br4dy5
7 min read · Oct 18, 2024


I tend to learn using analogies. To prove to myself that I’ve grasped a concept, I must find a valid analogy.

The other day, while pondering the process of detection engineering, my mind found a parallel with googling. I reasoned that data indexed and available for searching with Google is very similar to the swaths of data indexed in a SIEM or similar tool, where you can access it with a query.

Where I realized they are fundamentally different is in how we accept the results returned. Most people google using a single word or short phrase. We’re conditioned to believe this is sufficient because we typically find what we’re looking for within the first 10 results. Mission accomplished.

Our objective when engineering a detection is a bit different. We can’t just send the SOC a pile of events and say, “Look! One of these is probably something bad that you should investigate further!” No, we have to be good enough to return exactly what we intend to return. To do that, we have to learn the equivalent of Google Dorking. More on this later.

What helps make a strong detection engineer is knowledge of the tools at their disposal, even if they don’t use them all the time. There are plenty of solid detections that only need to look at a few key-value pairs in logs, like parent and child process names, because some combinations are so rarely chained together that their mere existence is suspicious. But there are plenty of other use cases that can only be detected effectively by refining the result set with more advanced logic.
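As a minimal sketch of that first kind of detection in SPL, assuming Sysmon process-creation logs land in a hypothetical endpoint index (the index, source, and exact field values are assumptions; yours will differ):

index=endpoint source="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=1 ParentImage="*\\winword.exe" Image IN ("*\\cmd.exe", "*\\powershell.exe")

Word spawning a shell is rare enough in most environments that the match alone is worth the SOC’s attention.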

Learn the language

So, how do you learn these advanced techniques?

I periodically reference the glossary of available commands for the query language I’m using. I cut my teeth in this industry with Splunk, so I frequented this page. If you’re a Splunker, how many of these have you never used? How many did you not know existed until now? We tend to remember the techniques and approaches we use regularly, but there are probably other very valuable commands that can accomplish a concept you may be trying for the first time. Before you know it, you may transfer that tool from your shed to your daily-worn toolbelt.
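To pick one example: a command like eventstats lets you compare each raw event against an aggregate without collapsing the results, which stats alone can’t do. A rough sketch for surfacing processes that rarely make network connections (the index, Sysmon event code, and threshold are illustrative assumptions):

index=endpoint EventCode=3
| eventstats count as conn_count by Image
| where conn_count < 5

Every surviving event keeps all of its original fields, so whatever you hand the SOC still carries full context.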

Have a target

The most important part about testing these tools is having the data to validate them. This is why I’m a proponent of unannounced Red Team operations. You find out whether the events you expect to occur actually do. It’s like finally playing the game we are always practicing for. But this is only productive if leadership perceives it as an opportunity to strengthen the defense rather than fault the team for any shortcomings. Making mistakes in a scrimmage is okay if you can clean them up before the season starts. (oops, another analogy 😬)

The next best thing is simulating attacker activity to generate logs using tools like MITRE’s Caldera and PurpleSharp. Splunk’s Attack Range complements this type of testing nicely, as do the BOTS datasets. The latter may not be a perfect reflection of your logging, but they can still be valuable if translated to meet your needs.

If you’re able to get the logs, it becomes a simple game of google dorking. You know the thing you’re looking for is in there; how can you make it the first, or better yet, the only, result? Try the same task with a specific webpage or a news article from a particular media outlet. You’ll find yourself increasing the sophistication of your Google-Fu to accomplish your goal.

Return the target

Hm, that sounds like fun, actually. Let’s try it out.

Note: I’m not a proficient Google Dorker since it’s never been a need in any of my roles. But, for today it is, so let’s learn.

Remember that page I said I referenced a lot as a detection engineer with Splunk? Welp, here’s an equivalent for google dorking. Let’s acquaint ourselves. If you prefer UIs, Google’s got you covered. Ever seen this? Let’s put it to the test.

Challenge

Create a Google search to return the Expel blog I wrote about detecting Red Team activity as the only result.

BOOM. Since I knew the “threat” well, I detected it first try:

site:expel.com "brady stouffer"

Refine the scope

Pretty simple, yet powerful.

But in the real world, this query isn’t very tightly scoped to the intended “activity” we’re trying to detect. It worked because this article is the only one I’ve written for Expel’s blog. What happens if I publish another post about something completely unrelated? It would produce another result, or “alert”, that would essentially be a false positive. To prevent that, we could tighten up the query a bit by adding additional logic to ensure we only return posts relating to Red Team activity:

site:expel.com "brady stouffer" intitle:"red team"

Still, pretty simple, but now we’re tapping into some additional capabilities of the query language.

Assess the appropriate aperture

However, while this “detection” is now more narrowly scoped, is it too narrow for our use case?

In my opinion, it is. This logic requires the title of the blog to include a specific string, which introduces the risk of a false negative. For instance, if I write a post about detecting red team activity titled “Detecting sneaky threats from our Blue Team counterparts”, we would miss it.

We find ourselves in a predicament: How narrow do we want to scope our detections?

Do we want to limit our risk of false negatives by opening up the aperture to catch novel or evasive techniques? This will likely increase the volume of alerts we send the SOC, increasing our false positive rate and negatively impacting SOC capacity.

Or, do we want to surface only true positive activity to the SOC? This will decrease the volume of alerts we send, maximizing capacity and reducing the likelihood of alert fatigue, but it will also increase the risk of false negatives.

Understand the risk

Each incurs a risk that we have to weigh: protecting our SOC, or protecting the environment(s) we are responsible for. This is one part of what makes detection engineering hard: understanding the appropriate scope to build your detection so it aligns with your organization’s goals and tolerance for false positives.

This blog isn’t intended to get into metrics — and we’re already on a huge digression — but this is why it’s hard to find agreement on what false positive rate is acceptable, or, flipping it around, what true positive rate you seek to achieve. In my opinion, it comes down to SOC capacity and the organization’s risk tolerance for false negatives.

Evaluate alerting strategies

If you don’t want false negatives, invest in your SOC or, what I prefer, find a way to detect a lot but alert on few. Allow your detection engineers to be aggressive in capturing suspicious activity without having to worry about crushing the SOC. Some method of aggregate alerting is key here, whether it’s something like Risk Based Alerting or Scenario Based Alerting. But those approaches aren’t silver bullets. They can be challenging to implement, and out-of-the-box solutions usually don’t work well, so the adoption of these methods is still a bit behind IMO.
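To make the aggregate idea concrete, here’s a rough sketch in SPL, assuming your detections write their hits to a risk index using Splunk Enterprise Security-style conventions (risk_object, risk_score, and search_name are assumptions that may not match your environment, and the thresholds are arbitrary):

index=risk earliest=-24h
| stats sum(risk_score) as total_risk dc(search_name) as distinct_detections values(search_name) as detections by risk_object
| where total_risk > 100 OR distinct_detections >= 3

Individual detections stay cheap and noisy; only the accumulation of signal across them becomes an alert the SOC sees.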

If you don’t have the ability to invest in your SOC or to mature your detection engineering approach, then you’ll need to accept greater risk for false negatives.

Tune effectively

Returning from the tangent, in my opinion, this is a pretty strong detection for identifying Expel blogs written by me that detail Red Team activity. I’m also going to tighten it up to “tune” out the “activity” we’ve already observed and evaluated.

site:expel.com "brady stouffer" "red team" -intitle:"Red team sneakiness: Splunking for AD certificate abuse"
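The SPL analog of that -intitle: exclusion is usually a NOT clause, or a lookup of findings you’ve already reviewed. A hypothetical sketch, reusing the earlier parent/child example (reviewed_findings is an assumed lookup, not a real one):

index=endpoint EventCode=1 ParentImage="*\\winword.exe" Image="*\\powershell.exe" NOT [| inputlookup reviewed_findings | fields host ParentImage Image]

The subsearch expands into a set of field-value pairs to exclude, so anything you’ve already triaged stops generating alerts.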

So, is it just glorified googling?

At the end of the day, a portion of Detection Engineering really is a bunch of guessing and checking, or if you prefer a fancier term, “iterative querying”. It is very similar to advanced googling. But there are other components of Detection Engineering that are more challenging.

We covered some of them. Let’s recap:

  • Learning the language (understanding the tools and capabilities at your disposal)
  • Knowing the target you’re trying to capture with your query (knowing which events are relevant to the attack you’re trying to detect)
  • Generating the proper target if one doesn’t exist (generating the events so you have something to capture with your query)
  • Returning the intended events (being intentional with the scope of your query)
  • Understanding the risk of searching too narrowly or too broadly (hurting SOC capacity or increasing false negatives)
  • Finding a way to increase detections while maintaining alert volume (alerting on aggregated signal)
  • Keeping your detections tight (continuous improvement through tactical tuning and refinement)

While comparing Detection Engineering to googling is a fun way to think about it, there’s a lot more that goes into building a good detection.

By the way, we didn’t even touch on what data to return once you’ve settled on a query. Thankfully, I’ve already shared my thoughts on this here.

Written by br4dy5

Detection Engineer. Threat Hunter. Splunker. RBA enthusiast.
