The Shuffle automation and detection framework — Open Source SOAR

By Frikky · Published in Shuffle Automation · 8 min read · Apr 26, 2021

Shuffle has been available for almost a year now, and it’s about time we move the detection needle with it. Since we last wrote about Shuffle, it has evolved to the point where “anything is possible”, for better or worse. That’s good, because the product can do a lot, and bad, because it’s not focused enough. That’s why, for the foreseeable future, we’ll write multiple blog posts a month, highlighting exactly how we solve problems with Shuffle today and how we plan to solve more. Maybe we’ll solve yours too.

Our goal is to make the life of analysts easier and more fulfilling, and to get there we need to start simple. What does that mean, though? It means cutting the time you spend jumping between all the browser tabs you have open, and focusing you on your most important tasks by using your existing tools more efficiently.

If you didn’t read our previous pieces on Shuffle, I’d strongly suggest starting there to understand what we’re about. You can get started by setting up Shuffle locally (takes 2 minutes) or using our cloud version (instant).

What are your existing tools, though? And how can they be used to improve your life, or your analysts’ lives? To answer this question we’ve gone deep into all kinds of operations teams and integrated with hundreds of their tools. We’ve also found the root cause of the analyst agony known as alert fatigue. And it’s painfully obvious… your SOC has bad detections. Here’s why.

Starting with Detection

Detection engineering is hard. It’s something everyone should do and some of us actually do, but not something most of us actively focus on. Why? Because it creates more work for ourselves. It means catching MORE bad things, but not too many of them. Why is this a problem? Because we all have limited capacity, and would like to keep our sanity. This has less to do with the detection itself, and more with what comes afterwards: incident handling and time spent.

Detections come in many variants: SIEM, IDS, EDR, IAM, WAF, AV and a lot of other (acronym) tools that have the capability. But do you know how to use all of them well? If you do, great for you and your team, but chances are you don’t. And you shouldn’t have to either. These tools exist to help you detect, and you shouldn’t need to be an expert in every aspect of them. But that’s not reality today. In reality they all work sub-optimally, or not at all. That’s something Shuffle can solve, by giving you a better overview of your resources and making use of them in the best way possible, for detection and otherwise. How?

How we’ll help you get incident context

The Shuffle SOC tool framework

The above image is a simplified view of security operation tools. Everything you need in a SOC can be put into one of those boxes (please comment if you disagree). There are indeed unique selling points to each kind of tool as well, but they generally have the same capabilities. Let’s dig in.

  • Cases: The heart of your operations. The place your analysts SHOULD be working, WITH context. Sadly this tool is the most underutilized today. Example, and a favorite of mine: TheHive
  • IAM: Identity and access management. Why is this important? Because you need to know who someone is, what they have access to, and what they should have access to. Example: Keycloak
  • Assets: Asset management, CMDB, vulnerability management, documentation and more. Everywhere you can find useful host and user relations. Example: Snipe-IT
  • Intel: A broad and misused name. Intel means any intelligence that can help you solve an incident. Most often used with Threat Intelligence providers. Example: MISP, GreyNoise
  • SIEM: Your data lake, often misused as the place to work in, rather than a place to find and detect before moving the data out. Example: Wazuh, ELK
  • Network: Firewalls, IDS/IPS, DNS servers, switches, NSM engines… Example: pfSense
  • Eradicate: EDR, antivirus, PowerShell & Bash… This is a broad category for preventive actions that are or can be taken, usually at the host level. Example: Velociraptor, OSSEC
  • Comms: Email, chat services, SMS etc. Should be used for notifications and for validating semi-automated workflows. Keeping all tickets only in Slack/Teams is NOT a good idea long-term.
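The categories above can also be written down as data, which makes it easy to check where your own tooling has gaps. The snippet below is only a sketch for illustration: the `Category` enum, the `EXAMPLE_TOOLS` mapping and the `categories_missing` helper are hypothetical names made up for this post, not part of Shuffle, and the tool lists simply repeat the examples from the list.

```python
from enum import Enum
from typing import Iterable, List


class Category(Enum):
    """The eight boxes of the SOC tool framework (names are illustrative)."""
    CASES = "Cases"
    IAM = "IAM"
    ASSETS = "Assets"
    INTEL = "Intel"
    SIEM = "SIEM"
    NETWORK = "Network"
    ERADICATE = "Eradicate"
    COMMS = "Comms"


# Example tools per category, taken from the list above.
EXAMPLE_TOOLS = {
    Category.CASES: ["TheHive"],
    Category.IAM: ["Keycloak"],
    Category.ASSETS: ["Snipe-IT"],
    Category.INTEL: ["MISP", "GreyNoise"],
    Category.SIEM: ["Wazuh", "ELK"],
    Category.NETWORK: ["pfSense"],
    Category.ERADICATE: ["Velociraptor", "OSSEC"],
    Category.COMMS: ["Email", "Slack", "Teams"],
}


def categories_missing(inventory: Iterable[str]) -> List[Category]:
    """Return the categories for which the inventory contains no known tool."""
    owned = {name.lower() for name in inventory}
    return [
        category
        for category, tools in EXAMPLE_TOOLS.items()
        if not owned & {tool.lower() for tool in tools}
    ]
```

Calling `categories_missing(["TheHive", "Wazuh"])` would return every category except Cases and SIEM. In practice you would replace the example lists with whatever your team actually runs.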

With that covered though — frameworks are useless unless put into action. Our goal for the SOC tool framework is to help you understand our approach and how it can be incorporated into your processes. We’ll start high level, and eventually get to individual tools and their use-cases — starting with detection.

Example definition of the circle

The image above is a high-level example of how each piece of the framework is used. On the left, you can see us defining Sigma, Snort and Yara, which fit into the Detection area. These are tools we already support, but plan to make WAY easier to use for everyone. Again, good detection and sharing is the goal. In the middle you can see two areas that are human-centered: case management and communication. These CAN fit in with the rest, but in most cases won’t and shouldn’t. On the right you can see the last three pieces, which are used to make the context of an incident easier to understand.

So what’s the next step from here? How do you actually put it into action? Let’s go deeper.

Example WAF exploit remediation with Shuffle

This example shows a case where your external infrastructure is being exploited. To make it a bit more specific, let’s say it’s a webserver behind a WAF (Web Application Firewall). On the left you can see the semi-manual way most teams handle it today, while on the right you can see how it can be optimized. The dotted lines indicate automated actions (usually APIs), while the blue lines are human actions.

Looking closer at the left part of the image, here’s roughly how it goes (Manual):

  1. You get an alert from the WAF, sent to the SIEM. An API integration forwards it to your communication system (e.g. Email or Slack).
  2. You see the message after a while, and wonder what it is, and whether someone else is working on it (if you use shared email with tags, please get a ticketing system).
  3. You start by going back to the WAF for the real context, while also collecting the event logs in the SIEM.
  4. You’ll start exploring the IP it came from, and looking for the service it was targeting to see if it’s vulnerable. What and who owns the service? Is the source IP targeting only you or everyone? What was the exploit? Did it work? Etc.
  5. After an hour of scouring for information, you find that the exploit did indeed work, that the attack hasn’t happened before, that it was targeted, and that you’ll need to remediate. It may be too late for this step, but you decide to isolate the compromised host.

Now, compare this to the right side (Automated):

  1. You get an alert from the WAF (Network), sent to the SIEM, which further forwards it to Shuffle. Shuffle is configured to create alerts in your case management system.
  2. While adding the case, Shuffle also looks for other similar incidents (Cases), and checks who owns the service (IAM & Assets), whether the service is vulnerable (Assets, VMS) and whether the source IP is targeting only you (Intel).
  3. For high-severity incidents only, Shuffle sends a message to your team’s notification channel. It also calls whoever is defined as on-call.
  4. The analyst sees all the required information in their case management system, and decides to take action immediately because there’s a high likelihood of a successful exploit.
  5. After about 5 minutes, the analyst has taken action and isolated the server (which may or may not have been correct) and notified the service owner.
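The automated steps above can be sketched in a few lines of Python. This is not Shuffle’s actual implementation or API, just an illustration of the flow under some assumptions: the alert is a plain dictionary with `rule` and `severity` keys, and `handle_waf_alert`, `Case`, the `enrichers` mapping and the `create_case` / `notify` callbacks are all hypothetical names standing in for your own integrations (case management, IAM, assets, intel, comms).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Case:
    """A case as it would land in the case management system (illustrative)."""
    title: str
    severity: str
    context: Dict[str, Any] = field(default_factory=dict)


def handle_waf_alert(
    alert: Dict[str, Any],
    enrichers: Dict[str, Callable[[Dict[str, Any]], Any]],
    create_case: Callable[[Case], None],
    notify: Callable[[Case], None],
) -> Case:
    """Turn a WAF alert into an enriched case, then notify if it is severe."""
    case = Case(title=f"WAF alert: {alert['rule']}", severity=alert["severity"])

    # Step 2: gather context (similar cases, owner, vulnerabilities, intel).
    # A failing lookup is recorded instead of stopping the whole workflow.
    for name, lookup in enrichers.items():
        try:
            case.context[name] = lookup(alert)
        except Exception as exc:
            case.context[name] = f"lookup failed: {exc}"

    # Step 1: the case, with its context, goes to case management.
    create_case(case)

    # Step 3: only high-severity incidents reach the notification channel.
    if case.severity == "high":
        notify(case)
    return case
```

Each enricher is just a function that takes the alert and returns whatever context it finds, so adding another source (say, a second intel provider) means adding one more entry to the `enrichers` dictionary. Steps 4 and 5 stay with the analyst, who now opens a case that already holds the answers.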

See the difference? In the first example, the analyst had to scour for information and may not have noticed the incident at all, while in the second, the information is at hand the moment it’s needed. The starting point, goal and procedure are the same, but you reach conclusions far faster, even when the process isn’t wholly automated. Over time this has another huge benefit, though: your team won’t suffer from alert fatigue.

Use your resources well

Shuffle isn’t meant to be a silver bullet, but it attempts to make your existing tools into one. We want to empower your employees to do innovative security work rather than work that can be automated. Way too many people (e.g. level 1 SOC analysts) are quitting cyber security before they get to experience the “fun” parts, and automating the repetitive work is the only way to get there. Using your available tools and resources shouldn’t be as hard as it seems at first.

In the coming weeks and months, we’ll further explore actual use-cases that you can try yourself, based on this framework. We’ll tackle phishing, enrichment, tool building and more, and will show you how a good overview of the incident can lead to success for everyone. All of the tools above can be used as a starting point as well (not just Network & SIEM), and we want to explore what happens for each one.

The expansive Shuffle search engine on https://shuffler.io

Want to try Shuffle for yourself? It’s open source, we provide support and use-cases to get you started, and it already has a thriving community on Discord. Feel free to drop by and introduce yourself to fellow security and automation enthusiasts! Want help automating your operations? Get in touch.

As always, feel free to clap (up to 50 times), share this article and follow us on Twitter for updates.
