Detection Engineering in the SOC: Writing a Detection Rule
Hey, RCX Security here. My blog has moved over to Substack as of April 2024 under the Cybersec Cafe. If you are here to sign up or keep reading, go here instead.
Thank you for the support!
Welcome to the first article of a new three-part mini-series at the Cybersec Cafe called “Detection Engineering in the SOC”, covering a real-life use case for Detection Engineering. As a refresher, SOC stands for Security Operations Center, the department within an organization responsible for monitoring, detecting, responding to, and mitigating security threats and incidents. In this mini-series, I’ll be taking a detection end-to-end through the detection lifecycle (DLC) and giving an inside look at the thought process of Security Engineers and how they build out the SOC.
This series spans three articles, so make sure to save the post and subscribe, as I’ll post them over the coming weeks. Or, if you want to get ahead, you can also find the series already published on my blog, the Cybersec Cafe.
In the first article of this series, we’ll start with writing a Detection Rule. In essence, a Detection Rule defines patterns, behaviors, or indicators of compromise (IoCs) that are associated with known threats. These rules are designed to trigger alerts when their logic returns True during log monitoring. The collection of rules generally lives within a SIEM (Security Information and Event Management) system, a tool used to aggregate and analyze log data from various sources. Security Engineers write these Detection Rules inside the SIEM to ensure effective monitoring and alerting based on an organization’s specific security needs.
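To make the idea of "logic returns True, alert fires" concrete, here is a minimal sketch of how a rule engine might evaluate rules against incoming logs. The evaluate() helper and the console_login rule are hypothetical names used only for illustration; a real SIEM handles this loop for you.

```python
def evaluate(rules, events):
    """Return (rule name, event) pairs for every rule whose logic returns True."""
    alerts = []
    for event in events:
        for name, rule in rules.items():
            if rule(event):
                alerts.append((name, event))
    return alerts


# Hypothetical rule: flag any event whose eventName is "ConsoleLogin".
def console_login(event):
    return event.get("eventName") == "ConsoleLogin"


# Two made-up events standing in for ingested logs.
events = [
    {"eventName": "ConsoleLogin", "userIdentity": "RCXCybersecCafe"},
    {"eventName": "ListBuckets", "userIdentity": "RCXCybersecCafe"},
]

alerts = evaluate({"console_login": console_login}, events)
print(alerts)
```

Only the first event produces an alert here, because it is the only one for which the rule returns True.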
Detection Engineering Scenario
For this example, let’s consider an organization that leverages Amazon Web Services for its infrastructure. Amazon Web Services, or AWS, is a widely used cloud computing platform provided by Amazon that offers a broad set of services to allow businesses and organizations to access and utilize computing resources. Cloud services are generally used to avoid the need for upfront investment in physical hardware or infrastructure. Many companies nowadays leverage AWS, or other cloud services such as Google Cloud Platform or Microsoft Azure.
When a company uses a cloud service like AWS, it is best security practice to set up the service behind an MFA provider, ensuring secure access to the different services. However, some use cases may require a different form of authentication into the console. For our scenario, let us assume that such a use case exists in the organization. As the lead Security Engineer, we are tasked with writing a Detection Rule that triggers whenever a log shows that a user authenticated successfully to AWS without using MFA. The business case behind this alert is that some workflows in our environment require logging in with credentials alone, and we need insight into these logins in case credentials are stolen or leaked.
Analyze the Logs
As the Security Engineer taking on this task, let’s first take a look at what a sample log from AWS may look like for this scenario:
{
"accessKeyId":"fahsdjklnjkllnasd",
"accountId":"asdf8ae3hlas",
"awsRegion":"us-west-2",
"consoleLogin":"Success",
"eventCategory":"Management",
"eventId":"adsf38-3n8d-3nd1-9d83-asd83nfla",
"eventName":"ConsoleLogin",
"eventSource":"signin.amazonaws.com",
"eventTime":"2054-03-14T23:12:12Z",
"eventType":"AwsConsoleSignIn",
"eventVersion":"1.10",
"ip":"8.12.54.2",
"managementEvent":true,
"mfaAuthenticated":false,
"mfaUsed":"No",
"readOnly":false,
"recipientAccountId": 234789104,
"type":"AssumedRole",
"userIdentity":"RCXCybersecCafe"
}
Take a deep breath and relax for just a second. I know this can be a bit daunting at first, but this is what a (simplified) log from AWS CloudTrail will look like. Since these logs are being ingested into the SIEM, thousands and thousands of other logs will be ingested using the exact same schema.
We’ve arrived at our first challenge: how do we differentiate this log from the other logs coming in so we can write the logic for this detection? Let’s take a look at the same log with the fields that won’t be of any use to us stripped out. We’re left with the following:
{
"awsRegion":"us-west-2",
"consoleLogin":"Success",
"eventName":"ConsoleLogin",
"eventSource":"signin.amazonaws.com",
"eventTime":"2054-03-14T23:12:12Z",
"eventType":"AwsConsoleSignIn",
"ip":"8.12.54.2",
"mfaAuthenticated":false,
"mfaUsed":"No",
"userIdentity":"RCXCybersecCafe"
}
Much cleaner! Now we have a simpler look at the fields that are of interest to us, and we can start to make sense of what is going on in this log. Let’s take a minute to understand the different fields:
- awsRegion: The region in which the login occurred.
- consoleLogin: The status of the login attempt.
- eventName: Tells us what event happened in this log.
- eventSource: The source the log originated from.
- eventTime: The exact time this event took place.
- eventType: Categorizes the event that took place.
- ip: The IP address the user connected from.
- mfaAuthenticated: Boolean describing whether MFA was used.
- mfaUsed: String describing whether MFA was used.
- userIdentity: The user who took the action.
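If you want to do this trimming programmatically rather than by eye, a small sketch like the following works on any log that has been parsed into a dictionary. The FIELDS_OF_INTEREST tuple and the trim() helper are names I made up for this illustration, and raw_log is a shortened copy of the sample log above.

```python
# Fields we decided are relevant for this detection.
FIELDS_OF_INTEREST = (
    "awsRegion", "consoleLogin", "eventName", "eventSource", "eventTime",
    "eventType", "ip", "mfaAuthenticated", "mfaUsed", "userIdentity",
)


def trim(event, fields=FIELDS_OF_INTEREST):
    """Return a copy of the event containing only the fields of interest."""
    return {key: event[key] for key in fields if key in event}


# Shortened copy of the sample log, already parsed into a dict.
raw_log = {
    "accountId": "asdf8ae3hlas",
    "awsRegion": "us-west-2",
    "consoleLogin": "Success",
    "eventName": "ConsoleLogin",
    "eventVersion": "1.10",
    "mfaAuthenticated": False,
    "mfaUsed": "No",
    "userIdentity": "RCXCybersecCafe",
}

trimmed = trim(raw_log)
print(sorted(trimmed))
```

Fields such as accountId and eventVersion drop out, while everything that helps describe the login stays.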
Great, we now have a sample log to base our Rule off of. But, remembering that some users authenticate to our console using MFA, is it possible that other ConsoleLogin logs exist? It’s likely. At this point, it will be a good idea to start querying in the SIEM, searching for similar logs to help get an idea of how we want to write the logic for our specific use case.
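Every SIEM has its own query syntax, so as a stand-in, here is a rough Python sketch of what that search does: keep the ConsoleLogin events and see how they split by mfaUsed. The events list is made-up sample data (including the "Failure" status value), and console_logins() is a hypothetical helper, not a SIEM API.

```python
from collections import Counter


def console_logins(events):
    """Keep only the events whose eventName is ConsoleLogin."""
    return [e for e in events if e.get("eventName") == "ConsoleLogin"]


# Made-up sample standing in for a SIEM search result.
events = [
    {"eventName": "ConsoleLogin", "consoleLogin": "Success", "mfaUsed": "No"},
    {"eventName": "ConsoleLogin", "consoleLogin": "Success", "mfaUsed": "Yes"},
    {"eventName": "ConsoleLogin", "consoleLogin": "Failure", "mfaUsed": "No"},
    {"eventName": "ListBuckets"},
]

logins = console_logins(events)
by_mfa = Counter(e.get("mfaUsed") for e in logins)
print(by_mfa)
```

A breakdown like this quickly tells us whether ConsoleLogin events with MFA exist alongside the ones without it.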
Just as suspected, we’ve found the following (redacted) ConsoleLogin log:
{
"awsRegion":"us-west-2",
"consoleLogin":"Success",
"eventName":"ConsoleLogin",
"eventSource":"signin.amazonaws.com",
"eventTime":"2054-02-11T12:23:14Z",
"eventType":"AwsConsoleSignIn",
"mfaAuthenticated":true,
"ip":"23.173.198.61",
"samlProviderArn":"arn:aws:iam::1234567890:saml-provider/TestApp",
"mfaUsed":"Yes",
"userIdentity":"RCXCybersecCafe"
}
Although this also seems to be a log for a successful console login, there are some key differences to note. If we take a minute to compare it to the previous log, we find:
- mfaAuthenticated: Now true, so we can see MFA was used here.
- mfaUsed: A string noting that MFA was used in this event.
- samlProviderArn: This new field identifies the application that used SAML to access AWS.
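The same comparison can be sketched in code. The compare_logs() helper below is a hypothetical name, and the two dictionaries are cut down to the fields that matter for the rule (ip and eventTime also differ between the logs but are left out). I am assuming the SAML provider field is keyed as samlProviderArn, the name the rule logic checks for.

```python
def compare_logs(a, b):
    """Summarize how two log events differ, key by key."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


no_mfa = {
    "consoleLogin": "Success",
    "eventName": "ConsoleLogin",
    "mfaAuthenticated": False,
    "mfaUsed": "No",
    "userIdentity": "RCXCybersecCafe",
}
with_mfa = {
    "consoleLogin": "Success",
    "eventName": "ConsoleLogin",
    "mfaAuthenticated": True,
    "mfaUsed": "Yes",
    "samlProviderArn": "arn:aws:iam::1234567890:saml-provider/TestApp",
    "userIdentity": "RCXCybersecCafe",
}

diff = compare_logs(no_mfa, with_mfa)
print(diff)
```

The fields that show up as changed or newly present are exactly the ones worth building conditions around.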
Perfect, we should now have the knowledge we need to create an MVP (Minimum Viable Product) for this Detection Rule! But before we move forward with writing the logic for this detection, I want to take a brief step back to reiterate something I’ve covered in another article, but not yet here. It’s important to remember that the Detection Lifecycle is an iterative process. The initial creation of the Detection Rule is really only the first step, and even though we’ve done an analysis of the logs, we’ll likely have to see how the detection behaves in a production environment to get an idea of its viability. More often than not, a detection will need some tuning, and that’s okay!
Writing the Detection Rule
With that being said, let’s move forward with creating the logic for our Detection Rule. We’ll be writing this rule in Python, but many SIEM platforms use SQL or their own query languages for Detection Rule logic. For our use case, the first order of business is to set the severity of the alert and to filter by the eventName field. In this initial piece, we can ignore any log whose eventName is not ConsoleLogin, filtering out noise.
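severity = "High"

def rule(event):
    global severity    # rule() may raise the module-level severity below
    severity = "High"  # reset to the default for every event
    if event['eventName'] != "ConsoleLogin":
        return False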
We’ve chosen to set the severity to “High” due to the nature of the event. Access to the AWS console without MFA could be an IoC and could lead to devastating consequences if an attacker gets in. But since it is also regular activity for some privileged developers, it is too common an event to rank as “Critical.”
Moving to the next part of the detection logic, let’s imagine we have a built-in helper, check_account_age(), that returns how many days the account has existed. We’ll use it to determine whether the account is less than 3 days old. If so, we’ll upgrade the severity to Critical and return True. Why? Because this could point to a potentially huge compromise: an attacker was able to create a new account and then access the AWS console without using MFA.
    # check_account_age() is assumed to return the account age in days
    if (check_account_age(event['userIdentity']) < 3
            and event['mfaUsed'] == "No"
            and event['consoleLogin'] == "Success"):
        severity = "Critical"
        return True
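Since check_account_age() is imaginary, here is one way it could look, purely as a sketch. I am assuming it returns the account age in whole days, and the ACCOUNT_CREATED lookup table (including the NewHire user) is made-up data; a real version would query an identity provider or asset inventory instead.

```python
from datetime import datetime, timezone

# Hypothetical account creation times, for illustration only.
ACCOUNT_CREATED = {
    "RCXCybersecCafe": datetime(2054, 1, 1, tzinfo=timezone.utc),
    "NewHire": datetime(2054, 3, 13, tzinfo=timezone.utc),
}


def check_account_age(user, now=None):
    """Return the account age in whole days (assumed contract)."""
    now = now or datetime.now(timezone.utc)
    return (now - ACCOUNT_CREATED[user]).days


# Evaluate both accounts at the time of the sample login.
login_time = datetime(2054, 3, 14, 23, 12, 12, tzinfo=timezone.utc)
print(check_account_age("NewHire", login_time) < 3)
print(check_account_age("RCXCybersecCafe", login_time) < 3)
```

With a helper along these lines, the "less than 3 days old" check in the rule becomes a simple numeric comparison.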
The if statement above captures exactly what we would classify as malicious activity. But what if it evaluates to False? We need an else branch that decides whether to trigger an alert based on a few more parameters. This else branch should fire whenever there is a login without MFA. Period. One potential scenario is lateral movement: an attacker took over a legitimate account and leveraged it to gain access. So, let’s write some basic logic to cover this:
    else:
        return (
            event['consoleLogin'] == "Success"
            and event['mfaUsed'] == "No"
            and 'samlProviderArn' not in event
            and event['mfaAuthenticated'] is not True
        )
This else branch first checks that the login was a successful event, and next that MFA was not used in the successful login. As we saw above, the samlProviderArn field is only present in a log for a login that used MFA. So for the third conditional, we ignore events that have that key among their key:value pairs. Lastly, if the mfaAuthenticated field is true, we can also ignore the event. We’ve covered the fields in the log that could point to anomalous activity and joined them with and operators, so that if any one of them evaluates to False, the alert will not fire.
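One caveat with bracket access such as event['mfaUsed']: if a log arrives without that key, Python raises a KeyError instead of quietly returning False. Here is a minimal sketch of the same four conditions written with dict.get(), which returns None for a missing key. The login_without_mfa() name is hypothetical, and whether your SIEM already guards against missing fields is something to check in its documentation.

```python
def login_without_mfa(event):
    """Same four conditions as the else branch, tolerant of missing fields."""
    return (
        event.get("consoleLogin") == "Success"
        and event.get("mfaUsed") == "No"
        and "samlProviderArn" not in event
        and event.get("mfaAuthenticated") is not True
    )


# A log that is missing the MFA fields entirely: no exception, no alert.
partial_event = {"eventName": "ConsoleLogin", "consoleLogin": "Success"}
print(login_without_mfa(partial_event))
```

The trade-off is that a malformed log is silently skipped rather than surfaced as an error, so pick whichever behavior suits your pipeline.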
Putting it all Together
After all of that hard work, let’s take a look at this detection in its entirety:
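severity = "High"

def rule(event):
    global severity    # let rule() update the module-level severity
    severity = "High"  # reset to the default for every event
    # Ignore anything that is not a console login
    if event['eventName'] != "ConsoleLogin":
        return False
    # New account (under 3 days old) logging in without MFA: escalate.
    # check_account_age() is assumed to return the account age in days.
    if (check_account_age(event['userIdentity']) < 3
            and event['mfaUsed'] == "No"
            and event['consoleLogin'] == "Success"):
        severity = "Critical"
        return True
    else:
        # Any other successful login without MFA or SAML
        return (
            event['consoleLogin'] == "Success"
            and event['mfaUsed'] == "No"
            and 'samlProviderArn' not in event
            and event['mfaAuthenticated'] is not True
        )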
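Before deploying, it can help to run the rule against a few sample logs. The sketch below is a self-contained copy of the rule with check_account_age() stubbed out by a made-up ACCOUNT_AGE_DAYS table (assumed to hold ages in days); the BrandNewUser account and the ListBuckets event are invented for the test, and the SAML field is assumed to be keyed as samlProviderArn.

```python
severity = "High"

# Stub data for illustration: account ages in days.
ACCOUNT_AGE_DAYS = {"RCXCybersecCafe": 30, "BrandNewUser": 1}


def check_account_age(user):
    """Stubbed helper: look the age up in the table above."""
    return ACCOUNT_AGE_DAYS[user]


def rule(event):
    global severity
    severity = "High"  # reset to the default for every event
    if event["eventName"] != "ConsoleLogin":
        return False
    if (check_account_age(event["userIdentity"]) < 3
            and event["mfaUsed"] == "No"
            and event["consoleLogin"] == "Success"):
        severity = "Critical"
        return True
    return (
        event["consoleLogin"] == "Success"
        and event["mfaUsed"] == "No"
        and "samlProviderArn" not in event
        and event["mfaAuthenticated"] is not True
    )


no_mfa_login = {
    "eventName": "ConsoleLogin", "consoleLogin": "Success",
    "mfaAuthenticated": False, "mfaUsed": "No",
    "userIdentity": "RCXCybersecCafe",
}
saml_login = {
    "eventName": "ConsoleLogin", "consoleLogin": "Success",
    "mfaAuthenticated": True, "mfaUsed": "Yes",
    "samlProviderArn": "arn:aws:iam::1234567890:saml-provider/TestApp",
    "userIdentity": "RCXCybersecCafe",
}
new_account_login = dict(no_mfa_login, userIdentity="BrandNewUser")
other_event = {"eventName": "ListBuckets"}

# Record (alert fired?, severity) for each sample event.
outcomes = [
    (rule(e), severity)
    for e in (no_mfa_login, saml_login, new_account_login, other_event)
]
print(outcomes)
```

The login without MFA should fire at High, the SAML login should stay quiet, the brand-new account should fire at Critical, and the unrelated event should be ignored.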
Now that we have the logic written out, the following step would be to deploy the detection to the production environment, and wait for alerts to come in.
Congratulations, you’ve completed writing your first Detection Rule!
Now What..?
The next day, after deploying the rule, you notice that an alert has come in for the detection, just as we expected! But now what? What do we do with this alert? We can’t just ignore it. Don’t worry, that’s step two of our “Detection Engineering in the SOC” series. In the next article, we’ll be covering how to write an IR Playbook to triage your alerts. So, if you enjoyed the first part of this series and want to learn more about how to engineer the SOC, follow the Cybersec Cafe to get ahead on the series, or follow RCX on Twitter for daily cybersecurity insights!