Building Better Detection Systems: Introducing KRANG at Carta

John Sonnenschein
Published in Building Carta
5 min read · Mar 28, 2023

Detection engineering is a core function of a modern security operations team. It goes beyond simply writing detection rules: it applies a systematic engineering approach to improve the accuracy of threat detection.

The objective is to build an automated detection system that is adaptable, flexible, and reproducible, and that generates high-quality alerts for security teams to respond to. The method takes a holistic approach to identifying threats before they can do significant damage: understand the organization’s operations and risks, turn an idea of how a threat might manifest into a description of how to detect it, then continuously tune and develop those detections to defend against the threats facing your organization in particular.

Detection-as-Code is the idea that detections should be treated as code: software engineering best practices are applied to detections, using modern, agile CI/CD processes to provide repeatable and attestable changes to detections as a holistic product of the detection engineering team.
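To make that concrete, a Detection-as-Code pipeline can look like any other CI workflow: detection documents live in Git, are validated on every pull request, and are packaged and deployed on merge. The sketch below is purely illustrative (the job names, scripts, and paths are invented, not Carta's actual pipeline):

```yaml
# Illustrative CI workflow for detections-as-code.
# All script and directory names here are hypothetical.
name: detections-ci
on:
  pull_request:
  push:
    branches: [main]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Lint detection YAML
        run: yamllint detections/
      - name: Validate required fields
        run: ./scripts/validate_detections.py detections/
  deploy:
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Splunk app and deploy
        run: ./scripts/build_and_deploy.sh
```

Because every change flows through a pull request, each detection change is reviewed, attestable, and revertable — the properties the Detection-as-Code approach is after.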

[Image: A detection playbook in Splunk SOAR run on a risk event, enriching & merging multiple related events — it creates an artifact, preprocesses it, imports data, and merges it with other related events]

Smarter Alerting

Security alerts based on simple searches (such as a developer logging on to a production system to debug a piece of code) can produce a barrage of alerts on relatively innocuous behavior, overwhelming SOC analysts and causing them either to ignore the alerts or to create ever-expanding exceptions for them. This increases the risk of Type II errors: treating a security event as benign when it may actually be an indicator of the beginning of a real breach!

Alerting could be smarter than this.

Carta’s Security Operations team is implementing a better way to handle alerting: risk-based alerting, driven by correlative searches and data enrichment. Take, for example, a DBA logging in outside working hours from an IP in a city they are not usually in. To evaluate that event, we need to know their working hours and their usual city, attach a score to it, and correlate it with other risky behaviors that may occur before deciding on a course of action. Such schemes depend on highly normalized data: events from different sources must share consistent field names before they can generate meaningful insights and alerts.
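A rough sketch of what that scoring logic might look like in Splunk's search language (SPL). This is illustrative only: the `user_profile` lookup and the `usual_city`, `work_start`, `work_end`, and `src_city` fields are invented for this example, and a production correlation search would be considerably more involved:

```
index=auth user_role=dba
| lookup user_profile user OUTPUT usual_city work_start work_end
| eval hour=tonumber(strftime(_time, "%H"))
| eval risk_score=if(hour<work_start OR hour>work_end, 25, 0)
      + if(src_city!=usual_city, 25, 0)
| where risk_score > 0
| collect index=risk
```

Individual risky behaviors accumulate score in a risk index rather than firing alerts directly; an analyst is only paged when a user's aggregate score crosses a threshold.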

[Image: A risk-based Splunk alert built from multiple alertable events, showing a user exceeding their risk threshold and a table of MITRE ATT&CK techniques used]

Challenges in Data Normalization

Normalizing data from different sources is a major challenge, as vendors use different models for data normalization. Splunk publishes a group of datasets they refer to as the Common Information Model (CIM), and many vendors take advantage of these data models; the AWS add-on, for instance, has field transforms that turn userIdentity into user. However, not all vendors provide Splunk apps that include field extractions and transformations. Teams committed to data normalization must create them manually, which is difficult in a cloud environment where there is no access to the underlying props.conf files.
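For readers unfamiliar with Splunk's configuration layer, the kind of alias a vendor add-on ships might look something like the snippet below. This is an illustrative sketch, not the actual AWS add-on's configuration (which is far more involved):

```
# props.conf (illustrative sketch only)
[aws:cloudtrail]
# Alias CloudTrail's nested userIdentity field to the CIM-standard `user`
FIELDALIAS-cim_user = "userIdentity.userName" AS user
```

In Splunk Cloud there is no filesystem access to edit such files directly, which is why teams fall back to the web UI for this work.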

Splunk’s UI is also ergonomically challenging, offering only a single line for a massively unwieldy transform (extracting a meaningful user field from all the AWS applications that write to CloudTrail currently takes us over 2,500 characters). Changes made in the UI are difficult to audit or revert, and not all options that one might use for detection are documented.

[Image: The Splunk UI for editing a field transform — a single long, truncated line defining a field extraction]

Carta’s Solution: KRANG

Carta’s Security Operations team recognized these challenges as an opportunity to improve our engineers’ quality of life and enhance our alerting and investigative actions. We built KRANG (the Knowledge, Reports, Alerts, & Normalization Generator), an automated framework for applying CI/CD practices to field normalization and detection engineering in a Splunk environment.

[Image: A formatted YAML file defining fields for Route53 log lines with the KRANG tool]

KRANG allows our engineers to focus on writing higher-quality correlative alerts by using simple, human-readable YAML documents with configurable properties, stored in Git and built by CI. From there, they are automatically deployed to Splunk as an app and are available immediately!

At Carta, we’ve adopted the alert format used by Splunk research in their security content repository. This keeps field names consistent and lets us use alerts written by Splunk research without any modifications. We chose not to use the Splunk security content tooling directly, however, because writing our own gives us greater flexibility in how we use and deploy alerts: we can enable them as needed, select our own alert actions, and tailor our alerting to fit our specific needs. Furthermore, the ESCU app generation utility includes many features of the entire ES suite, such as Analytic Stories, that we don’t need for all of our alerting.

[Image: A typical alert written as a KRANG YAML file — someone working through the AWS console — including risk confidence & impact, the context of the alert, and the alertable object, in this case the user]

To make our detection processes even more efficient, we’ve extended the format to allow risk-based alerting fields to be declared outside the tags object. We’ve also simplified the format so that only a name and a search field are strictly required.
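Under that simplified format, a minimal alert document might read like the sketch below. Only `name` and `search` are required by the text above; the risk field names (`risk_score`, `risk_object`, `risk_object_type`) are our guesses at how top-level risk declarations might look, so treat them as illustrative:

```yaml
# Hypothetical minimal KRANG-style alert.
# Only `name` and `search` are strictly required.
name: AWS Console Activity By Service Account
search: >
  sourcetype=aws:cloudtrail eventName=ConsoleLogin
  user_type=service_account
# Illustrative risk-based alerting fields, declared at the
# top level rather than nested under a `tags` object:
risk_score: 40
risk_object: user
risk_object_type: user
```

Keeping the required surface this small lowers the barrier to writing a first detection, while the optional fields let mature detections carry full risk context.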

Our approach to data normalization is similarly streamlined: the presence of a top-level key signals that the corresponding knowledge object is needed. For example, an eventtype top-level key indicates that data model acceleration should be applied, enabling faster summary searches via tstats. Similarly, a macro key indicates the need for search macros, and a sourcetype key indicates the need for field operations. With this approach, our detection engineers can rapidly apply the data normalization necessary for better, faster, and more accurate alerting.
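A sketch of how those conventions might look in practice. Only the top-level key behavior is described above; every field value here (the regex, names, and definitions) is invented for illustration:

```yaml
# Presence of `sourcetype` => generate field extractions/transforms for it.
sourcetype: aws:route53
fields:
  - name: query
    regex: 'query:\s(?<query>\S+)'

# Presence of `eventtype` => apply data model acceleration,
# enabling fast summary searches via tstats.
eventtype: dns_events

# Presence of `macro` => generate a search macro.
macro:
  name: route53_logs
  definition: sourcetype=aws:route53
```

The engineer declares *what* knowledge objects a data source needs; KRANG generates the corresponding Splunk configuration and ships it as part of the app build.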

KRANG simplifies the process of creating field extractions and alerts in Splunk, making them easier to maintain and scale. And today, we’re open sourcing it!

We hope you enjoy it. As always, pull requests are very much welcome: https://github.com/carta/krang
